Computational design of an epitope-specific Keap1 binding antibody using hotspot residues grafting and CDR loop swapping

Therapeutic and diagnostic applications of monoclonal antibodies often require careful selection of binders that recognize specific epitopes on the target molecule to exert a desired modulation of biological function. Here we present a proof-of-concept application for the rational design of an epitope-specific antibody binding with the target protein Keap1, by grafting pre-defined structural interaction patterns from the native binding partner protein, Nrf2, onto geometrically matched positions of a set of antibody scaffolds. The designed antibodies bind to Keap1 and block the Keap1-Nrf2 interaction in an epitope-specific way. One resulting antibody is further optimised to achieve low-nanomolar binding affinity by in silico redesign of the CDRH3 sequences. An X-ray co-crystal structure of one resulting design reveals that the actual binding orientation and interface with Keap1 is very close to the design model, despite an unexpected CDRH3 tilt and VH/VL interface deviation, which indicates that the modelling precision may be improved by taking into account simultaneous CDR loops conformation and VH/VL orientation optimisation upon antibody sequence change. Our study confirms that, given a pre-existing crystal structure of the target protein-protein interaction, hotspots grafting with CDR loop swapping is an attractive route to the rational design of an antibody targeting a pre-selected epitope.

The modelled structure of Keap1 with the G54.1 design indicates that CDRH3 may be exchanged for affinity improvement. a, Predicted Keap1/G54.1 model structure. The six CDRs are highlighted, and the three hotspots grafted from Nrf2 into CDRH2 are shown as sticks. b, Per-CDR-loop decomposition of the Rosetta ΔG score of the designed G54.1 antibody. The contribution of each CDR loop to the Rosetta ΔG score between G54.1 and Keap1 was estimated by truncating that loop from the Fv fragment of the modelled G54.1/Keap1 complex structure.

Supplementary Methods
Computational design methodology. This method is an extension of the hotspot-centric de novo binding protein design approach. The interactions mediated by hotspot residues often involve hydrogen bond networks, tight hydrophobic packing, and strong salt bridges, and are therefore energetically favourable and evolutionarily conserved 1-3 . A geometric hashing method was developed to graft the hotspots, taken either from cognate protein binders or from in silico placements, into antibody scaffold crystal structures. Because the number of available antibody Fv scaffold crystal structures is very limited, it is challenging to design high-affinity antibodies bearing CDRs that form optimal shape and electrostatic complementarity to the selected epitope on the target protein. CDR swapping leverages the large number of sequences and experimentally determined CDR configurations from other antibody scaffolds to construct new chimeric antibody models. Used in tandem, the two strategies may enable fast generation of high-affinity antibodies that target the selected binding site, guided by the hotspot-mediated interaction patterns. Various RosettaScripts XML script files were taken from the 'rosetta_demo/public/rosetta_scripts' subdirectory of the Rosetta 4 installation and customised for the Keap1 antibody design example in this study.
Nrf2 hotspots identification. Three Nrf2 hotspot residues dominating the binding to Keap1 were identified using the in silico alanine scanning script AlaScan.xml. The binding energy of Nrf2 and Keap1 in the complex structure (PDB accession code 2FLU 5 ) was predicted by calculating the Rosetta total energy difference between the bound and unbound structures using the default all-atom forcefield (score12 weights); this quantity is referred to as the Rosetta ΔG score hereafter. Each Nrf2 residue was mutated in silico into alanine, and the three top-ranked Nrf2 residues (Glu79, Thr80, and Glu82), whose Rosetta ΔG scores decreased by at least 0.8 Rosetta energy units (REU) after alanine mutation, were confirmed as hotspots. The hotspot conformations were diversified by generating inverse rotamers starting from the side chain atoms nearest to the Keap1 surface using the InverseRotamers.xml script 6 . Extra rotamer sampling (two half step standard deviations) was performed around all side chain torsion angles.
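The hotspot selection rule described above can be illustrated with a minimal Python sketch. It is not the AlaScan.xml protocol itself: it only shows the thresholding and ranking step applied to per-residue scores that Rosetta would have produced. The function name, the residue labels and the numerical values are hypothetical, and the sign convention (a positive difference meaning weaker predicted binding after alanine mutation) is an assumption that should be flipped if the opposite convention applies.

```python
from typing import Dict, List, Tuple


def select_hotspots(dg_wild_type: float,
                    dg_alanine: Dict[str, float],
                    min_loss: float = 0.8,
                    top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank residues by the loss of predicted binding energy on alanine mutation.

    dg_wild_type: Rosetta dG score of the unmutated complex (REU).
    dg_alanine:   residue label -> Rosetta dG score of its alanine mutant (REU).
    """
    # Assumed convention: loss > 0 means the alanine mutant binds less favourably.
    loss = {res: dg_mut - dg_wild_type for res, dg_mut in dg_alanine.items()}
    # Keep residues whose loss reaches the 0.8 REU threshold quoted in the text.
    candidates = [(res, value) for res, value in loss.items() if value >= min_loss]
    # Largest loss first, then keep the top-ranked residues.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_n]


# Hypothetical scores, for illustration only (not values from this study).
example_scores = {"R1": -18.1, "R2": -19.0, "R3": -17.5, "R4": -20.3, "R5": -20.1}
hotspots = select_hotspots(-20.5, example_scores)
hotspot_names = [name for name, _ in hotspots]
```

With these made-up numbers, only three of the five residues pass the threshold, and they are returned in order of decreasing energy loss.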
Antibody V-region scaffold structures. Antibody V-region scaffold structures with at least one paired VH/VL deposited in the PDB were extracted from the SAbDab database (http://opig.stats.ox.ac.uk/webapps/sabdab) in 2014. Only structures solved by X-ray crystallography were used, including both Fab and scFv formats. If multiple crystal copies with different chain identifiers were available for the same antibody structure, only the first copy appearing in the PDB file was kept. Only the Fv regions were kept from the Fab structures. Abnum 7 was used to renumber the residues in the Fv structures according to the Chothia numbering scheme 8 . Any structures with broken polypeptide CDR loops were discarded. Finally, 1417 antibody Fv scaffold structures were kept for hotspots graft design.
Graft Nrf2 hotspots onto antibody scaffold structures. An in-house residue-based triplet hashing method was implemented to search for the antibody scaffold structures that best accommodate the three Nrf2 hotspots while maintaining the hotspots' original interaction patterns with Keap1. We defined a 'residue triplet' as three virtual triangles that connect the backbone Cα, N and C atoms of three residues, respectively. The triplet is characterised by nine vertices (Vα1, Vα2, Vα3, VN1, VN2, VN3, VC1, VC2 and VC3, corresponding to the positions of the nine backbone Cα, N, and C atoms of the three residues that make up the triplet) and nine edges (Eα1, Eα2, Eα3, EN1, EN2, EN3, EC1, EC2 and EC3, corresponding to the edges of the three triangles). On the hotspots side, any three inverse rotamers, drawn from the three Nrf2 hotspot residues (Glu79, Thr80, and Glu82), were enumerated and compiled into a residue triplet. Each triplet was canonicalised by ensuring that the longest and second longest Cα edges always corresponded to Eα1 and Eα2, respectively. Each triplet was indexed by a unique string key formed by concatenating the round-off (RO) lengths of six edges in order.
All of the non-redundant index keys of hotspots' triplets were stored into a lookup table for fast access to corresponding hotspot triplet's information, including vertex residue types and atomic coordinates to facilitate later grafting onto the CDRs of antibody scaffold structures.
On the antibody scaffold side, any three CDR residues were enumerated and compiled into a triplet. The index key lookup table was generated in the same way as for the hotspot triplets. To find the antibody scaffold structures able to accommodate the three hotspot residues at geometrically matched positions in the CDRs, identical hotspot and antibody scaffold triplets were identified by directly comparing their index keys. Each antibody scaffold was then placed on the hotspots by superimposing the scaffold triplet onto the identical hotspot triplet so as to minimise the RMSD between the two sets of nine vertices of the three triplet triangles. The three scaffold triplet residues were replaced with the corresponding hotspot residues by fitting the hotspot backbone atoms onto those of the antibody triplet residues.
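The triplet indexing and key comparison described above can be sketched in Python as follows. This is an illustrative reconstruction, not the in-house implementation: the function names are hypothetical, the bin width used for rounding edge lengths is an assumption, and the default of Cα and N edges was chosen only because it yields six edges (the text does not state which six of the nine edges enter the key). The superposition step is not shown.

```python
import math
from collections import defaultdict
from itertools import permutations

# A residue is a dict of backbone atom name -> (x, y, z), e.g. {"CA": ..., "N": ..., "C": ...}.


def canonicalise(triplet):
    """Order the residues so that the longest Ca edge is E1 and the second longest is E2."""
    def ca_edge(p, i, j):
        return math.dist(p[i]["CA"], p[j]["CA"])
    # E1 joins residues 0-1 and E2 joins residues 1-2 in the chosen ordering.
    return max(permutations(triplet),
               key=lambda p: (ca_edge(p, 0, 1), ca_edge(p, 1, 2)))


def triplet_key(triplet, atoms=("CA", "N"), bin_width=1.0):
    """Concatenate rounded edge lengths into a string key (assumed edge set and bin width)."""
    ordered = canonicalise(triplet)
    parts = []
    for atom in atoms:
        for i, j in ((0, 1), (1, 2), (2, 0)):
            length = math.dist(ordered[i][atom], ordered[j][atom])
            parts.append(str(round(length / bin_width)))
    return "-".join(parts)


def index_triplets(triplets, **key_options):
    """Build a lookup table: key -> list of canonicalised triplets sharing that key."""
    table = defaultdict(list)
    for triplet in triplets:
        table[triplet_key(triplet, **key_options)].append(canonicalise(triplet))
    return table


def shared_keys(hotspot_table, scaffold_table):
    """Keys present on both sides identify geometrically matched triplets."""
    return sorted(set(hotspot_table) & set(scaffold_table))


# Hypothetical coordinates: a triplet and a translated, re-ordered copy of it.
r1 = {"CA": (0.0, 0.0, 0.0), "N": (-1.2, 0.5, 0.0), "C": (1.0, 1.0, 0.0)}
r2 = {"CA": (5.0, 0.0, 0.0), "N": (4.0, 0.8, 0.3), "C": (6.1, 0.9, 0.0)}
r3 = {"CA": (2.0, 3.0, 0.0), "N": (1.1, 3.6, 0.2), "C": (3.2, 3.5, 0.0)}


def shift(residue, dx, dy, dz):
    return {name: (x + dx, y + dy, z + dz) for name, (x, y, z) in residue.items()}


hotspot_table = index_triplets([(r1, r2, r3)])
scaffold_table = index_triplets([tuple(shift(r, 10.0, -3.0, 2.0) for r in (r3, r1, r2))])
matches = shared_keys(hotspot_table, scaffold_table)
```

Because the key depends only on internal distances and on the canonical residue order, it is unchanged by rigid-body motion or by the order in which the three residues are enumerated, which is what allows matching by direct key comparison. On the hotspot side the triplets would be enumerated over inverse rotamers, and on the scaffold side over combinations of three CDR residues.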
For each antibody design obtained from the hotspots graft, the sidechains of interfacial residues in the antibody scaffold that clashed with Keap1 atoms were mutated to alanine to reduce clashes. The heavy-atom RMSD of the hotspot sidechain atoms before and after replacement was calculated. All residues were repacked and minimised using the ppk.xml script. The filters described below were applied to triage the designs:
• The heavy-atom RMSD of the hotspots before and after replacement onto the antibody scaffold was smaller than 2.0 Å.
• The buried solvent accessible surface area (SASA) upon binding was greater than 1200 Å².
• The shape-complementarity (Sc) score was greater than 0.5.
• The Rosetta ΔG score (binding energy) was lower than 0.0 REU.
The designs that passed the filtering rules were finally ranked by Rosetta ΔG score. The Rosetta ΔG score of each CDR truncation mutant was re-calculated. The contribution of each CDR to binding was estimated as the difference in Rosetta ΔG score between the corresponding CDR truncation mutant and the original G54.1 antibody.
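The triage and ranking of hotspot-graft designs can be summarised in a short Python sketch. The thresholds are those quoted in the text; the `Design` record, its field names and the example values are hypothetical, and the metrics themselves are assumed to have been computed beforehand by Rosetta.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Design:
    name: str
    hotspot_rmsd: float  # heavy-atom RMSD of grafted hotspots, in Angstrom
    buried_sasa: float   # buried SASA upon binding, in square Angstrom
    sc: float            # shape-complementarity score
    dg: float            # Rosetta dG score, in REU


def triage(designs: List[Design]) -> List[Design]:
    """Apply the four hotspot-graft filters, then rank by Rosetta dG (lowest first)."""
    kept = [
        d for d in designs
        if d.hotspot_rmsd < 2.0
        and d.buried_sasa > 1200.0
        and d.sc > 0.5
        and d.dg < 0.0
    ]
    return sorted(kept, key=lambda d: d.dg)


# Hypothetical designs, for illustration only.
candidates = [
    Design("design_a", 1.1, 1500.0, 0.62, -12.0),
    Design("design_b", 2.4, 1800.0, 0.70, -25.0),  # fails the RMSD filter
    Design("design_c", 0.9, 1350.0, 0.55, -18.5),
    Design("design_d", 1.5, 1100.0, 0.66, -9.0),   # fails the SASA filter
]
ranked = triage(candidates)
ranked_names = [d.name for d in ranked]
```

Here only two of the four made-up designs survive, and the one with the lower Rosetta ΔG score is ranked first.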
All the exogenous CDRH3 loops from the antibody scaffold crystal structures used in the previous hotspots graft stage were excised at positions VH93 to VH103 of the Fv structures, with VH93 and VH103 labelled as the CDRH3 anchor residues. To graft an exogenous CDRH3 loop onto G54.1, the original CDRH3 loop of G54.1 was removed at positions VH94 to VH102, leaving VH93 and VH103 as the Fv anchor residues. Each exogenous CDRH3 loop was fitted onto the G54.1 Fv structure by superimposing the backbone atoms of the two sets of anchor residues. The Fv anchor residues of G54.1 were then removed and the grafted exogenous CDRH3 loop was ligated onto the G54.1 Fv by connecting the CDRH3 anchor residues with the neighbouring G54.1 residues (VH92 and VH104). The resulting structures were discarded if the backbone atoms of the new CDRH3 loop clashed with the original G54.1/Keap1 complex structure. Any CDRH3 residue sidechains clashing with G54.1/Keap1 residues were mutated to alanine to reduce clashes. The final structures obtained from the CDRH3 swap were repacked and minimised using the Rosetta ppk.xml script as in Step 2 and ranked by Rosetta ΔG score.
Rosetta sequence design. Two rounds of sequence design were performed to optimise the binding affinities of the designed antibodies from hotspots graft and CDRH3 swap, respectively.
During the first round, starting from the five designed antibody structures that accommodated the three Nrf2 hotspot-mediated Keap1 interaction patterns, each interfacial CDR residue on the antibody side was mutated into every other amino acid type (except cysteine, glycine and proline) to probe the effect of the mutation on the Rosetta ΔG score and so identify mutants potentially able to improve the computed binding energy of the designed antibodies with Keap1. The MutationScanPB.xml script, which computes the change in binding free energy during in silico mutagenesis using a scoring function with a modified electrostatics term 9 , was used to generate the list of single point mutants. The point mutations were ranked by the change in Rosetta ΔG score, or ΔΔG, between each mutant and the corresponding wild type structure. The top-ranked single point mutations were selected and combined (maximum 5 mutations) to generate a variant of the original antibody graft.
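One way to turn a ranked list of single point mutations into a combined variant is sketched below in Python. The text does not specify how mutations were chosen for combination, so several points are assumptions: a negative change in Rosetta ΔG score is taken to mean improved predicted binding, at most one substitution is kept per position, and effects are treated as independent. The function name, position labels and values are hypothetical.

```python
from typing import Dict, List, Tuple

Mutation = Tuple[str, str]  # (position label, substituted amino acid)


def pick_combination(ddg: Dict[Mutation, float],
                     max_mutations: int = 5) -> List[Mutation]:
    """Select up to max_mutations improving substitutions, one per position."""
    best_per_position: Dict[str, Tuple[float, Mutation]] = {}
    for (position, amino_acid), value in ddg.items():
        if value >= 0.0:
            continue  # assumed convention: only negative values improve binding
        current = best_per_position.get(position)
        if current is None or value < current[0]:
            best_per_position[position] = (value, (position, amino_acid))
    # Most favourable change first, truncated to the allowed number of mutations.
    ranked = sorted(best_per_position.values())
    return [mutation for _, mutation in ranked[:max_mutations]]


# Hypothetical scan results in REU, for illustration only.
scan = {
    ("P1", "W"): -1.5,
    ("P1", "Y"): -0.7,
    ("P2", "R"): -2.0,
    ("P3", "F"): 0.4,
    ("P4", "D"): -0.3,
}
variant = pick_combination(scan)
```

With these made-up values, the sketch keeps the better of the two substitutions at P1, drops the unfavourable substitution at P3, and returns three mutations ordered from most to least favourable.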
During the second round, all residues of the swapped CDRH3 loops on G54.1 were allowed to mutate simultaneously into all other amino acid types (excluding glycine, proline, and cysteine), with the backbone conformation of all interfacial residues on the CDRs and Keap1 locally perturbed by the backrub method 10 , using the flexbb-interfacedesign.xml script. Explicit electrostatics was not used in the scoring function. Three iterations of redesign and minimisation were used to increase the likelihood of finding higher-affinity interactions, starting with a soft-repulsive potential (soft_rep weights) and ending with the default all-atom forcefield (score12 weights). Filter rules similar to those described for the hotspots grafting designs were used to triage and rank the resulting CDRH3-swap designed structures:
• The buried SASA upon binding was greater than 2000 Å².
• The Rosetta ΔG score was lower than -20.0 REU.
• The Sc score was greater than 0.6.
Design scoring. All the previously described computational features used for filtering or ranking the designs (Tables S2 and S3) were calculated with the Rosetta InterfaceAnalyzer application 11 :
• Rosetta ΔG score, or binding energy, defined as the difference between the total system energy in the bound and unbound states. In each state, interface residues were allowed to repack.
• Rosetta total energy of the modelled complex structures.
• Buried solvent accessible surface area (SASA), defined as the difference between the total system SASA in the bound and unbound states.
• Shape-complementarity (Sc) score of the modelled antibody/Keap1 complex structures.
• Buried unsaturated polar atoms.

Keap1 expression & purification.
The gene encoding the Kelch domain of Keap1 was cloned into the expression vector pET-28a in frame with an N-terminal His tag and a TEV protease cleavage site. The amino acid sequence of the gene product is GSMGHAPKVGRLIYTAGGYFRQSLSYLEAYNPSDGTWLDLADLQVPRSGLAGCVVGGLLYAVGGRNNSPDGNTDSSA LDCYNPMTNQWSPCAPMSVPRNRIGVGVIDGHIYAVGGSHGCIHHNSVERYEPERDEWHLVAPMLTRRIGVGVAVL NRLLYAVGGFDGTNRLNSAECYYPERNEWRMITAMNTIRSGAGVCVLHNCIYAAGGYDGQDQLNSVERYDVETETW TFVAPMKHRRSALGITVHQGRIYVLGGYDGHTFLDSVECYDPDTDTWSEVTRMTSGRSGVGVAVTME.
The construct was transformed into E. coli strain BL21 (DE3), which was subsequently cultured in 2TY medium containing 25 μg/ml kanamycin at 37 °C. Protein production was induced with 0.3 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at an OD600 of 4. Glycerol-based feed (50 mM MOPS, 1 mM MgSO4/MgCl2, 2% glycerol) was added to the culture immediately after addition of IPTG, and the culture was incubated further at 17 °C overnight. Cells were harvested by centrifugation and lysed in a buffer containing 50 mM Tris pH 8.5, 50 mM NaCl, 10% glycerol, 0.5% Triton X-100, 20 mM imidazole and a sufficient amount of protease inhibitors (Roche). The lysate, pre-cleared by centrifugation, was filtered through a 0.2 μm filter and then mixed with Ni-NTA beads (Qiagen). The beads were washed with 50 mM Tris pH 8, 150 mM NaCl, 50 mM imidazole and 1 mM DTT before Keap1 was eluted with the same buffer supplemented with imidazole to a concentration of 250 mM. After the His tag was cleaved off, the sample was applied to a Ni-NTA (Qiagen) column to remove any Ni-binding contaminating proteins. The flow-through was collected and further purified by size exclusion (Superdex 75, GE Healthcare) and, if necessary, ion exchange (Mono Q, GE Healthcare) chromatography. The purified Keap1 was concentrated and stored in 20 mM Tris pH 7.5 and 5 mM DTT at -80 °C.

Antibody cloning & expression.
Heavy and light chain variable region genes designed in silico were chemically synthesized by DNA2.0, Inc. Transcriptionally active PCR (TAP) was employed to amplify the heavy and light chain variable regions separately and subsequently to introduce DNA sequences encoding the hCMV promoter sequence, the human γ1 CH1 and Cκ (Km3 allotype) constant regions and a poly(A) tail. The resultant constructs contained all of the components required for transient cellular expression. To generate Fab fragments for SPR analysis, HEK-293 cells were transiently transfected with the TAP products using 293Fectin lipid transfection (Life Technologies) according to the manufacturer's instructions.
Crystallographic trials with the top four high-affinity CDRH3-swap antibodies in Fab format failed to yield diffraction-quality crystals in complex with Keap1. To convert LS146 from a Fab to a scFv construct, a gene encoding VH fused to VL through a (Gly4Ser)4 linker, followed by a TEV protease cleavage site and a His10 tag, was synthesized and cloned into a UCB proprietary expression vector by DNA2.0, Inc. The amino acid sequence of the gene product is EVQLVESGGGLVQPGGSLRLSCAASGFAISASSIHWVRQAPGKCLEWVASIDPETGETLYAKSVAGRFTISADTSKNTAY LQMNSLRAEDTAVYYCARAYAGDGVYYADVWGQGTLVTVSSGGGGSGGGGSGGGGSGGGGSDIQMTQSPSSLSAS VGDRVTITCRASQSVSSAVAWYQQKPGKAPKLLIYSASSLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQSYSFPS TFGCGTKVEIKRTENLYFQGHHHHHHHHHH. CHO-S XE cells, a CHO-K1-derived cell line, were transiently transfected with plasmid DNA using electroporation 12 . Cells were removed by centrifugation and the scFv-TEV-His tagged protein was purified by IMAC. The supernatant was filtered through a 0.2 μm filter and then loaded onto a HisTrap excel column (GE Healthcare). The column was washed with 50 mM Tris pH 8, 150 mM NaCl, 45 mM imidazole before the antibody was eluted with 50 mM Tris pH 8, 150 mM NaCl, 250 mM imidazole. After the His tag was removed, the sample was applied to the HisTrap excel column again to remove Ni-binding contaminating proteins. The flow-through was collected and further purified by size exclusion chromatography (Superdex 75, GE Healthcare). Purified antibody was concentrated in 50 mM HEPES pH 7.5, 150 mM NaCl, 5% glycerol, and stored in aliquots at -80 °C until required.
Binding analysis. Surface plasmon resonance (SPR) experiments were carried out on a Biacore 3000 system (GE Healthcare) using reagents from the same manufacturer. Fabs were captured on the surface of CM5 sensor chips via an affinity-purified goat polyclonal F(ab')2 fragment specific for human F(ab')2 (Jackson 109-006-097). The latter was immobilised on the activated carboxymethyl dextran surface via amine coupling as follows: a fresh mixture of 50 mM N-hydroxysuccinimide and 200 mM 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide was injected for 5 min at a flow rate of 10 μl/min, followed by 50 μg/ml anti-human F(ab')2 in 10 mM acetate pH 5.0 buffer for 5 min at the same flow rate. Finally, the surface was deactivated with a 10 min pulse of 1 M ethanolamine•HCl pH 8.5. A reference flow cell on the chip was prepared by omitting the protein from the above procedure, so that in the following experiments sensorgrams were obtained as the response unit difference between the anti-F(ab')2 and reference flow cells. Initial binding of Keap1 to expressed Fabs was assessed by injecting 50 μl supernatant, diluted 1 in 5 in running buffer, over the reference and anti-F(ab')2 flow cells at a flow rate of 10 μl/min, followed by a 150 μl injection of 0, 500 or 5000 nM Keap1 in running buffer at a flow rate of 30 μl/min. After a dissociation phase lasting at least 5 min, the chip surface was regenerated with two 60 s pulses of 40 mM HCl interspersed with a 30 s pulse of 5 mM NaOH at the same flow rate. Association and dissociation kinetics of Keap1 binding to captured Fabs were determined by the same protocol over at least 8 of the following concentrations: 75, 100, 150, 250, 350, 500, 750, 1000, 1500, 2500, 3500 and 5000 nM. Zero Keap1 controls were interspersed between these cycles to correct for baseline drift, and sham-transfected supernatant was assessed at each Keap1 concentration to determine and correct for non-specific binding of Keap1.
Specificity of Fab binding to Keap1 was assessed by competition with a high-affinity Nrf2 peptide analogue, biotin-PEG-LQLDEETGEFLPIQ-amide, corresponding to Nrf2 residues 74 to 87, which comprise the stronger Keap1 binding loop motif. Keap1 binding to captured Fabs in the presence of peptide titrations was followed using the above protocol. Using BIAevaluation™ software, all sensorgrams were first transformed by subtracting a zero Keap1 control cycle and the corresponding non-specific control cycle prior to fitting dissociation and association kinetics. Dissociation constants (KD) were estimated as the logarithmic mean of values measured over at least 6 Keap1 concentrations. IC50 values were calculated using GraphPad Prism™ software by fitting to the log concentration versus normalized response/variable slope model represented by the following equation, where percent inhibition values for the three report points were treated as replicates at each concentration:
$$Y = \frac{100}{1 + 10^{(\log \mathrm{IC}_{50} - X)\,\cdot\,\mathrm{HillSlope}}} \qquad (1)$$
Crystallisation. Keap1 was buffer exchanged into the storage buffer of LS146-scFv (50 mM HEPES pH 7.5, 150 mM NaCl and 5% glycerol) prior to complex formation. This removed DTT from the Keap1 storage buffer and prevented it from breaking the disulphide bonds in the antibody. Keap1 was then mixed with LS146-scFv at a molar ratio of 1:1.5 and incubated at room temperature for 30 minutes. The complex was purified by size exclusion chromatography (Superdex 75™ 26/60, GE Healthcare) and concentrated to 5 mg/ml. Initial crystallisation trials, with 200 nl protein solution plus 200 nl reservoir solution (Qiagen) in sitting-drop vapour-diffusion format, produced crystals in two conditions. Reproduction and optimisation of one of the hit crystallisation conditions (0.2 M sodium acetate and 20% PEG 3350), using seed crystals obtained from the initial screening, generated diffraction-quality crystals.
The crystals were cryoprotected in mother liquor, supplemented with PEG 3350 to 35% (w/v), and vitrified in liquid nitrogen prior to data collection.
Crystallographic data collection and processing. Datasets from crystals of the LS146-scFv/Keap1 complex were collected at the Diamond Light Source synchrotron facility (Didcot, United Kingdom) on beamline I04-1 at a wavelength of 0.917 Å. Molecular replacement was performed using the program PHASER 13 in the CCP4 software suite 14,15 , with Keap1 (PDB accession code 1X2J 16 ) and VH and VK frameworks without CDR loops (PDB accession code 3IVK 17 ) as the search models. The solvent content of the crystal was determined as 46.09%, and there are two copies of the complex in the asymmetric unit. Solutions were found in three stages: the positions of the two copies of Keap1 were searched for and obtained first, followed by the two heavy chains and then the two light chains. Refinement and model building were carried out using Refmac5.4 (REFinement of MACromolecular structures) 18 and COOT (Crystallography Object-Oriented Toolkit) 19 , respectively. The geometric quality of the final model was validated using Rampage 20 , ProCheck 21 , SFCheck 22 , and the validation tools provided by the RCSB Protein Data Bank. Data collection and refinement statistics for LS146-scFv/Keap1 are provided in Table S4.