Structural analysis of biological targets by host:guest crystal lattice engineering

To overcome the laborious identification of crystallisation conditions for protein X-ray crystallography, we developed a method where the examined protein is immobilised as a guest molecule in a universal host lattice. We applied crystal engineering to create a generic crystalline host lattice under reproducible, predefined conditions and analysed the structures of target guest molecules of different size, namely two 15-mer peptides and green fluorescent protein (sfGFP). A fusion protein with an N-terminal endo-α-N-acetylgalactosaminidase (EngBF) domain and a C-terminal designed ankyrin repeat protein (DARPin) domain establishes the crystal lattice. The target is recruited into the host lattice, always in the same crystal form, through binding to the DARPin. The target structures can be determined rapidly from difference Fourier maps, whose quality depends on the size of the target and the orientation of the DARPin.

Patrick Ernst, Andreas Plückthun * & Peer R. E. Mittl *
Three-dimensional structural information is key for the understanding of almost any molecular process in life sciences. Furthermore, it is now an integral part of drug design. Diffraction methods are the prevailing techniques to obtain such information, but the scattering of photons by single molecules is too weak for direct recording. Scattering molecules need to be packed into well-ordered three-dimensional arrays, i.e. macromolecular crystals, to amplify the diffracted waves. Particularly for biological macromolecules, the conditions that favour crystallisation over aggregation are unpredictable, and finding them requires a labour- and time-consuming trial-and-error process.
Since the early days of genetic engineering, mutagenesis methods have been used to improve the likelihood of achieving the crystalline state 1 , and/or the homogeneity of the crystalline molecules (reviewed in refs 2,3 ). However, screening of constructs and crystallisation conditions is still required and successful crystallisation is not guaranteed. The unpredictable quest for suitable crystallisation constructs and conditions still limits protein crystallography, making fast, routine and predictable crystallisation systems highly desirable, ideally in combination with fast phasing by difference Fourier methods. While great progress in electron microscopy 4 paves the way to atomic resolution structures without the crystallisation bottleneck, it is still a rather laborious undertaking. The structure determination by X-ray crystallography using a host:guest approach, perhaps similar to the workflow presented below, could be much faster and more suitable for high-throughput approaches.
Placing the target molecule at well-defined positions in an existing host lattice would be one option to crystallise arbitrary biological macromolecules under predefined conditions (Fig. 1A). This idea is not new; it was pioneered in DNA crystallography (reviewed in ref. 5 ), although creating sufficient order for high resolution may still be a challenge for DNA lattices. It was also tested for small molecule crystallography 6,7 : in the 'metal sponge' technique, target molecules with low molecular weights were soaked into crystalline frameworks of porous metal complexes and the structures were determined by X-ray diffraction. Porous crystals of a putative polyisoprenoid-binding protein from Campylobacter jejuni have been used as a host lattice to study the absorption and release of fluorescent proteins and gold nano-clusters 8,9 , but crystal structures were only determined for small compounds after covalent attachment to the host lattice 10 . In summary, several strategies exist that use preformed crystals to overcome the crystallisation bottleneck, but none of them has succeeded in determining the structures of larger guest molecules.
Here, we show how peptide and protein targets can be reproducibly crystallised as "guests" in a host crystal under predefined conditions, and we explain the advantages and current limitations of the host lattice-display

Results
The auxiliary domain defines the robust host lattice by providing the majority of crystal contacts, and thus also the crystallisation conditions. It serves as a scaffold for rigidly positioning the target-binding domain to immobilise guest molecules at well-defined positions. A suitable auxiliary protein needs to fulfil several premises: (i) It must form a stable crystal lattice with large solvent channels that resists perturbation by target protein binding. (ii) It must diffract X-rays to high resolution with and without guest molecule, to allow an accurate and rapid structure determination by difference Fourier analysis. (iii) It must be easily manipulated, expressed, purified and crystallised under mild conditions to maintain the integrity of the host:guest complex.
Although high solvent content and strong diffraction are usually orthogonal features of protein crystals (reviewed in ref. 11 ), we identified several natural proteins in the Protein Data Bank (PDB) that fulfil the requirements and could serve as auxiliary domains (Table 1). The most promising candidate, endo-α-N-acetylgalactosaminidase from Bifidobacterium longum JCM1217 (EngBF, PDB ID: 2ZXQ), is a 150 kDa protein (devoid of domain 1, UniProtKB Q3T552, residues 340 to 1694) that diffracts to 2 Å resolution and crystallises with 72% solvent in space group P6₅ at neutral pH (25% 2-methyl-2,4-pentanediol (MPD), 3% PEG 20,000, 0.2 M NaCl, 0.01 M MnCl₂, 0.1 M MES at pH 6.9) 12,13 . An additional carbohydrate-binding module (CBM32, not resolved in the electron density (ED) map) is connected to EngBF via a helical bundle whose C-terminus faces the large solvent-filled channel 13 . We replaced the CBM32 domain, after residue 1520 of EngBF, with different target-binding domains through rigid shared-helix fusions, similar to the design of various crystallisation chaperones 14,15 and electron microscopy aids 16,17 .
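To put the 72% solvent content into perspective, the following minimal sketch converts between solvent fraction and Matthews coefficient using the widely used approximation solvent fraction ≈ 1 − 1.23/V_M (V_M in Å³/Da). The constant 1.23 assumes a typical protein partial specific volume and is not a value reported in this study; the function names are purely illustrative.

```python
# Rough Matthews-type estimate relating solvent content and packing density.
# ASSUMPTION: 1.23 Å^3/Da corresponds to a typical protein partial specific
# volume; it is a textbook approximation, not a value taken from this work.
PROTEIN_VOLUME_PER_DA = 1.23


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction for a Matthews coefficient v_m (Å^3/Da)."""
    if v_m <= 0:
        raise ValueError("Matthews coefficient must be positive")
    return 1.0 - PROTEIN_VOLUME_PER_DA / v_m


def matthews_coefficient(solvent: float) -> float:
    """Approximate Matthews coefficient (Å^3/Da) for a given solvent fraction."""
    if not 0.0 <= solvent < 1.0:
        raise ValueError("solvent fraction must lie in [0, 1)")
    return PROTEIN_VOLUME_PER_DA / (1.0 - solvent)


if __name__ == "__main__":
    # 72% solvent, as stated for the EngBF crystal form.
    print(f"V_M at 72% solvent: {matthews_coefficient(0.72):.2f} Å^3/Da")
    # A more densely packed crystal, chosen here only for comparison.
    print(f"Solvent at V_M = 2.4: {solvent_fraction(2.4):.0%}")
```

Under this approximation, 72% solvent corresponds to a Matthews coefficient of about 4.4 Å³/Da, compared with roughly half solvent for a crystal with V_M near 2.4 Å³/Da.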
First, we tested whether the EngBF lattice tolerates the insertion of target-binding domains. To this end, we tested a designed Armadillo repeat protein (dArmRP, 329 residues) 18,19 , the B30.2 domain from sRFPL1 (201 residues) 20 , and a designed ankyrin repeat protein (DARPin, 162 residues) 21 . Suitable scaffolds must be small and rigid to fit in the solvent channel, expose a large paratope to lock the target molecule in a unique conformation, and their N-terminus should be α-helical to permit the rigid fusion concept using a shared helix 14,15 . All fusions crystallised isomorphically and the crystals diffracted to between 1.8 Å and 3.0 Å resolution, proving the feasibility of the fusion approach (Fig. 2). Yet, no continuous ED was visible for the fused domains, suggesting an inherent disorder of the target-binding domains in the host lattice that required further engineering.
During the second design cycle, we rotated the DARPin domain into different orientations by stepwise extending the helical linker as shown in Fig. 3A,B. We proceeded with DARPins, as they are more rigid than dArmRPs, which possess some internal flexibility 22 . Furthermore, the selection of tight binders from DARPin libraries against almost any target molecule is well established 23 . Except for the structure EngBF_DARPin_rot4 (Table 2), where the shared helix was broken and new crystal contacts formed via the DARPin paratope (Fig. 3C), the DARPin domains were still invisible in the ED maps. Again, this result confirmed that the DARPin fusions crystallised easily under the established conditions, but additional crystal contacts were mandatory to obtain sufficient ED. These were engineered by incorporating disulfide bridges in the third design cycle.
Based on the previous findings we selected two different orientations, L1 from rotation 4 and L2 from rotation 9, and introduced additional crystal contacts by inter-molecular disulfide bridges (Fig. 3D). EngBF-DARPin construct L1 has a shorter shared helix, but the molecular packing only permits binding of small targets. To test if a single disulfide bridge stabilises the DARPin and allows target binding, we introduced mutations Lys1655 → Cys and Ser342* → Cys (*refers to a symmetry-related molecule) between the DARPin C-cap and the N-terminus of a symmetry-related EngBF domain. Using these mutations, DARPins B6 and G10 were fused to EngBF using the L1 construct. These DARPins bind a cyclic peptide of 15 amino acids cyclised by a D-Pro-L-Pro unit (c-pep1) 24 . Complexes EngBF-L1-DARPin_B6:c-pep1 and EngBF-L1-DARPin_G10:c-pep1 co-crystallised under the same conditions as native EngBF (Table 2) and diffracted to 2.7 Å and 2.0 Å resolution, respectively. The Cys1655-Cys342* disulfide bridge confers sufficient rigidity to identify the DARPin domain.
The EngBF-L1-DARPin_B6 difference map was sufficiently clear to build residues 6 to 14 from peptide c-pep1 independently of prior structural knowledge (Fig. 4A), and the structure turned out to be virtually identical to the previously determined structure of DARPin_B6:c-pep1. After refinement, the c-pep1 main chain and most side chains were defined in the final ED map (Fig. 4B). The narrow space in the L1 construct causes an additional crystal contact between c-pep1 and the host lattice (Fig. 4C and Table 3). This minor contact does not prevent crystallisation and may confer additional stability on the design.
To show that this strategy works for other small ligands as well, we inserted DARPin_D12 that recognises pep2, which also comprises 15 amino acids like c-pep1 but lacks the D-Pro-L-Pro unit. Again, crystals were obtained under the established conditions and the ligand was visible in the 2.4 Å resolution difference map. Here, residues 1, 2, 6, and 13-15 from pep2 are not resolved in the final ED map, suggesting that internal molecular rigidity, conferred by the cyclisation unit in c-pep1, is required to resolve the target completely (Fig. 4D). In summary, while the L1 construct is useful for rapidly determining structures of small targets under predefined crystallisation conditions, it provides little space for larger targets.
In contrast to L1, the L2 linker orients the DARPin paratope towards the central solvent channel of the EngBF lattice, allowing for larger targets to bind, but with fewer possible crystal contacts. To create a very rigid L2 fusion, a disulfide bridge between Val1406 → Cys and Thr1488 → Cys was used to shift the three-helix bundle and the connected DARPin closer to a symmetry-related EngBF domain. A second intramolecular disulfide bridge (Glu1476 → Cys to Glu1555 → Cys) connects the loop of the EngBF three-helix bundle with the loop between the DARPin N-cap and its first internal repeat to reduce bending motions (Fig. 3D). Three additional disulfides between the DARPin domain and a symmetry-related EngBF domain crosslink the DARPin in the crystal (Cys1064*-Cys1685, Cys1090*-Cys1656, Cys1118*-Cys1617). As a test, we inserted DARPin_3G124, a high-affinity binder for sfGFP 25 .
EngBF-L2-DARPin_3G124 was co-crystallised with sfGFP, again under the established conditions, and the yellow crystal colour suggested that sfGFP was absorbed in the EngBF-L2-DARPin_3G124 lattice (Fig. 5A). The EngBF-L2-DARPin_3G124 crystals diffracted to 2.3 Å resolution in the presence of sfGFP (Table 2) and the ED map confirms that the DARPin_3G124 domain is locked in the desired orientation with the paratope pointing towards the solvent-filled channel of the EngBF host lattice (Fig. 5B,C). This orientation provides sufficient space for larger targets up to 40 kDa, such as sfGFP. After refinement of EngBF-L2-DARPin_3G124 in the absence of the target, residual ED suggests binding of sfGFP, but the ED map is insufficient for placing sfGFP without additional information. Superposition of the DARPin_3G124nc:sfGFP structure (PDB-ID 5MA6 26 ) on EngBF-L2-DARPin_3G124 reveals that the difference map agrees very well with the expected orientation of sfGFP (Fig. 5D). After placing the sfGFP based on the superimposed complex, the EngBF-L2-DARPin_3G124:sfGFP complex was refined at 2.3 Å resolution. Refinement of EngBF-L2-DARPin_3G124 free and in complex with sfGFP yielded very similar Rwork/Rfree values of 0.175/0.206 and 0.171/0.205, respectively. After refinement, sfGFP possesses an elevated B-factor of 214 Å² and the σA-weighted 2mFobs-DFmodel map shows discontinuous density for the sfGFP main chain and no clear side chain density (data not shown). The B-factors vary along the main chain of all EngBF fusion proteins (Figs 4C and 5E). EngBF, the robust scaffold of the crystal lattice, shows equally low B-factors by itself and in all refined fusion constructs, while the B-factors for the DARPin domains and for the targets are higher (Table 4).
Since the B-factor for sfGFP exceeds 200 Å² and its contribution to the improvement of Rfree is marginal, we deleted the sfGFP chain from the final model.
Low RMSDs suggest that the crystal engineering approach has not perturbed the structures of the individual domains. We measured 0.20 Å and 0.72 Å for the superposition of the refined EngBF-L2-DARPin_3G124 structure on isolated EngBF (PDB-ID 2ZXQ, 7646 atoms) and DARPin_3G124nc (PDB-ID 5MA6, 869 atoms), respectively.

Discussion
Our analysis shows that EngBF crystals (and perhaps other host crystals as well) tolerate the insertion of target:binder complexes and still robustly form under the same established crystallisation conditions. The target must be locked in a unique orientation to give a clear ED, which can be achieved by rigid and rigidly connected scaffolds such as the DARPins, which have a very constant geometry 23,27 . From one target to the next, only the binding residues of the DARPin need to be exchanged, as the shape of this binding molecule is very constant, and suitable DARPins can now be routinely selected for up to 95 targets in parallel. We created two different positions for guest molecules. In both cases target molecules line up along the central solvent channel, albeit with different orientations (Fig. 6A,B). Construct L1, with the DARPin domain facing a smaller cavity, can bind spherical targets with diameters up to 20 Å (targets below 3-4 kDa), whereas construct L2 can recognise targets with diameters up to 40 Å (targets below 40 kDa) (Fig. 6A).
The confinement in L1 offers sufficient rigidity to unambiguously refine the conformation of smaller targets, provided that the target itself possesses a rigid three-dimensional structure, whereas the extended space in L2 comes at the expense of reduced molecular rigidity. The yellow colour of the EngBF-L2-DARPin_3G124 crystals in the presence of sfGFP confirms that the target penetrates the host lattice and adsorbs to a higher concentration than present in the mother liquor. Pure adsorption, however, is an insufficient criterion for judging how much the target molecules contribute to diffraction, because they must also be held in a rigid, unique orientation. The residual ED suggests that this is at least partially the case for sfGFP, but the refinement parameters, such as average B-factor and improvement in Rfree, indicate only a marginal contribution of sfGFP, which is currently insufficient for building an independent structural model.
The poor local resolution for sfGFP could be due either to low occupancy or to thermal motions. In solution DARPin_3G124 binds sfGFP with a KD of 22 ± 0.3 nM 25 . Since sfGFP was present during crystallisation at a concentration of 0.5 mg/mL (equivalent to approximately 16 μM, 5-fold molar excess over the EngBF-DARPin fusion, and thus 1000-fold above the expected KD), we can assume that the occupancy is high. As the sfGFP-DARPin complex structure shows the same interface in the EngBF complex as without EngBF, and since there are no clashes, it is reasonable to assume that the dissociation constants of the EngBF-L2-DARPin_3G124:sfGFP complexes in the crystal and in solution are intrinsically similar. Nonetheless, the crystallisation buffer and precipitant may influence the KD, and the crystal lattice could have distorted the interface, such that the occupancy might actually be lower than expected from the affinity and the concentrations used.
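The occupancy argument above can be made concrete with a simple 1:1 binding equilibrium. The sketch below is only an estimate for ideal solution conditions: it takes the quoted KD (22 nM) and sfGFP concentration (about 16 μM), and assumes a fusion-protein concentration of one fifth of that, as implied by the 5-fold molar excess. It ignores any influence of the precipitant or the crystal lattice, and the function name is illustrative.

```python
import math


def fraction_bound(binder_total: float, ligand_total: float, kd: float) -> float:
    """Fraction of binder sites occupied at equilibrium for 1:1 binding.

    All three arguments must use the same concentration unit. The complex
    concentration is the smaller root of the mass-action quadratic
    C^2 - (B + L + Kd) * C + B * L = 0.
    """
    total = binder_total + ligand_total + kd
    complex_conc = (total - math.sqrt(total * total - 4.0 * binder_total * ligand_total)) / 2.0
    return complex_conc / binder_total


if __name__ == "__main__":
    sfgfp_um = 16.0            # approximate sfGFP concentration quoted in the text
    fusion_um = sfgfp_um / 5   # ASSUMPTION: derived from the 5-fold molar excess
    kd_um = 0.022              # 22 nM expressed in micromolar
    occupancy = fraction_bound(fusion_um, sfgfp_um, kd_um)
    print(f"Expected solution occupancy: {occupancy:.3f}")
```

With these inputs the expected occupancy in solution exceeds 99%, which is why low occupancy would require the KD in the crystal to differ substantially from the solution value.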
On the other hand, we observed a pronounced B-factor gradient ranging from below 40 Å² for EngBF to above 150 Å² for the DARPin domain and even higher for the target (Table 4). Typically, the B-factor of the DARPin domain is approximately twice as high as the B-factor for the EngBF domain and the B-factor of the target is always higher than the B-factor of the DARPin domain, because the DARPin domain provides the main lattice contacts for the target. For EngBF-L2-DARPin_3G124 we observed an average B-factor of 150.6 Å² for the DARPin_3G124 domain. Due to the lack of additional crystal contacts, thermal motions of the target are restricted by the DARPin paratope only. The B-factor of the DARPin and the fraction of molecular target surface that is buried in the DARPin interface dictate the B-factor of the target and consequently the precision of its ED. Therefore, a B-factor exceeding 200 Å² can be expected for sfGFP even at full occupancy in the L2 construct. In the crystal of the individual DARPin_3G124:sfGFP complex (PDB-ID 5MA6) the average B-factors are 79.8 and 87.3 Å², respectively, showing no intrinsic flexibility in this complex. We conclude that the rigid embedding of the target-binding domain, which is achieved by the shared helix and the engineered disulfide bridges in our case, is absolutely essential for host-lattice display to reveal sufficient ED for the target. In the future, the engineering of additional crystal contacts of the target-binding domain and an extended paratope will be necessary to constrain the molecular order more effectively and to improve the ED for larger targets. This will be the prerequisite to make this approach truly generic.
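To give the B-factor gradient a physical scale, the following sketch converts isotropic B-factors into root-mean-square atomic displacements using the standard relation B = 8π²⟨u²⟩. It treats each B-factor as purely isotropic and attributes it entirely to displacement, which is a simplification; the labels in the example only echo the values quoted above.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """RMS displacement (Å) for an isotropic B-factor (Å^2), from B = 8*pi^2*<u^2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


if __name__ == "__main__":
    # B-factors quoted in the text; 40 Å^2 is the upper bound given for EngBF.
    examples = {
        "EngBF (about 40 Å^2)": 40.0,
        "DARPin_3G124 (150.6 Å^2)": 150.6,
        "sfGFP (214 Å^2)": 214.0,
    }
    for label, b in examples.items():
        print(f"{label}: rms displacement ≈ {rms_displacement(b):.2f} Å")
```

Under this simplification the displacement grows from roughly 0.7 Å for the host scaffold to about 1.4 Å for the DARPin and 1.6 Å for sfGFP, which is consistent with the loss of side-chain density for the target.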
A host:guest approach like this offers additional advantages. The intrinsic phase problem of X-ray crystallography is reduced to difference Fourier maps, making this technique particularly attractive for structures where simple phasing techniques like molecular replacement cannot be applied. Proteins with intrinsically disordered regions are notoriously difficult to crystallise, but since disorder does not hamper the selection of binders, the system presented here should allow the structural analysis of at least the rigid regions of the target molecule. The solvent channels of the host lattice permit easy access to the target and reduce the impact of crystal lattice forces on the conformation of the target, making this approach also attractive for drug design and time-resolved studies 28 .
In conclusion, this work has laid the foundation for a host:guest approach to protein crystallography, obviating the need to empirically search for crystallisation conditions. While this concept has been discussed for many years, this may be the first practical implementation for larger targets that can be extended into a general
Figure 6. [...] The host lattice is shown as a grey surface and the DARPin domains for constructs L1 and L2 as cartoons in cyan and green, respectively. The view is similar to Fig. 3D. (B) Perspective view of the crystal arrangement with EngBF-L2-DARPin_3G124 in the crystal oriented along the P6₅ symmetry axis. The symmetry elements are schematically drawn into the picture with the organisation of the unit cell shown below the picture. EngBF domain in pale cyan, DARPin in orange, and the superimposed sfGFP (PDB-ID 5MA6) in green.

Materials and Methods
Shared helix and disulfide design. Suitable host lattices were identified using the advanced search tool from the RCSB Protein databank internet service. The databank was queried for lattices with high solvent content and resolution. The results were manually curated in light of the molecular structure, to assess if the host lattice permits the fusion of target-binding domains.
Shared helices were designed according to ref. 15 using the Rosetta molecular modelling suite 29 for the dArmRP and the B30.2 fusions. For the DARPin fusions, shared helix H15flex from ref. 15 was used as a template for the connection between EngBF and the DARPin. Potential disulfides were identified using the Disulfide by Design Server 2.0 30 .
Cloning, expression and purification. DNA encoding different EngBF fusion constructs was cloned into a pQIq vector (a lacI q encoding derivative of pQE30 (Qiagen, Hilden, Germany)), containing an N-terminal sfGFP fusion and a C-terminal His 6 -tag, both cleavable via a 3C protease cleavage site as described in ref. 26 . DNA fragments encoding the respective DARPin fusions with different binding sites and cysteine residues were ordered from IDT (Coralville, USA) or Genewiz (South Plainfield, USA) and cloned into the target vector via a BglII and a HindIII site. Chemocompetent E. coli BL21-Gold cells were transformed with the respective plasmid and used both for cloning and expression. Genes were expressed in 200-400 mL auto-induction 5052 medium 31 for 15 h at 25 °C. The cells were subsequently harvested by centrifugation at 5,000 × g for 10-15 min and resuspended in 15-20 mL washing buffer (20 mM sodium phosphate pH 6.3, 200 mM NaCl, 20 mM imidazole) and lysed by sonication. Cell debris was centrifuged at 20,000 × g for 15-20 min and the supernatant was loaded on 5 mL NiNTA-agarose resin (Qiagen, Hilden, Germany). Columns were washed with 5 column volumes (cv) of washing buffer and protein was eluted using 10 mL elution buffer (20 mM sodium phosphate pH 6.3, 200 mM NaCl, 250-500 mM imidazole). The elution fraction was directly loaded onto 2 mL Sepharose resin coupled with DARPin clamp R7, which binds to GFP with picomolar affinity as described in ref. 26 . The resin was washed with 20 mL crystallisation buffer (20 mM sodium phosphate, pH 6.3, 200 mM NaCl). To cleave the EngBF fusion construct off the column, 2 mL crystallisation buffer containing 1 mg HRV 3C protease were loaded on the column. Cleavage was either carried out overnight at 4 °C or for three hours at 25 °C for constructs L1 and L2 containing the cysteine mutations. 
Cleaved protein and protease were subsequently washed off the GFP-binding column with 10 mL crystallisation buffer and passed through 2 mL Ni-NTA resin columns to remove the His₆-tag peptide and the protease (also carrying a His-tag). Proteins were always freshly prepared and used directly for crystallisation.
Crystallisation and structure determination. Proteins were concentrated to 2-20 mg/mL using Amicon® centrifugal concentrators (50,000 MWCO, Merck Millipore, Massachusetts, USA) and set up for crystallisation in a fine screen around the initial conditions (25% 2-methyl-2,4-pentanediol (MPD), 3% PEG 20,000, 0.2 M NaCl, 0.01 M MnCl₂, 0.1 M MES pH 6.9), changing the pH along the columns (from pH 6 to 7) and the MPD/PEG 20,000 ratio along the rows (MPD from 23% to 27% (v/v) and PEG 20,000 from 5% to 2% (w/v)) in a 96-well format. Three different mother-liquor to protein ratios (1:1, 2:1, 5:1) in 300-400 nL drops were used per well and incubated against 75 µL of reservoir solution at 4 °C. For L1/L2 complex crystallisations, the ligand was added in two-fold (c-pep1) to five-fold (sfGFP) molar excess and incubated for 1-3 h prior to setting up the crystallisation experiment.
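The layout of such a fine screen can be sketched as a small grid generator. The example below assumes an 8 × 12 plate with evenly spaced steps, pH varying across the 12 columns and the MPD/PEG 20,000 pair varying across the 8 rows; the even spacing, the row/column assignment on the physical plate and all names are illustrative assumptions, since the text only gives the ranges.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Well:
    name: str           # plate position, e.g. "A1"
    ph: float           # MES buffer pH
    mpd_percent: float  # MPD, % (v/v)
    peg_percent: float  # PEG 20,000, % (w/v)


def linspace(start: float, stop: float, n: int) -> List[float]:
    """Return n evenly spaced values from start to stop (inclusive)."""
    return [start + (stop - start) * i / (n - 1) for i in range(n)]


def fine_screen(n_rows: int = 8, n_cols: int = 12) -> List[Well]:
    """Build a hypothetical 96-well fine screen around the initial condition.

    ASSUMPTION: linear steps; NaCl, MnCl2 and MES concentrations stay fixed
    and are therefore not listed per well.
    """
    ph_values = linspace(6.0, 7.0, n_cols)    # pH along the columns
    mpd_values = linspace(23.0, 27.0, n_rows)  # MPD rises along the rows
    peg_values = linspace(5.0, 2.0, n_rows)    # PEG falls along the rows
    row_labels = "ABCDEFGH"
    return [
        Well(f"{row_labels[r]}{c + 1}", ph_values[c], mpd_values[r], peg_values[r])
        for r in range(n_rows)
        for c in range(n_cols)
    ]


if __name__ == "__main__":
    for well in fine_screen()[:3]:
        print(well)
```

Pairing rising MPD with falling PEG along the rows follows the description of varying the MPD/PEG ratio; a real dispensing protocol may use different step sizes.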
Crystals grew between day 0 and day 25 and were flash-frozen in liquid nitrogen prior to data collection without any additional cryo-protectant. Diffraction data were collected at a wavelength of 1 Å at beamlines X06SA or X06DA (Swiss Light Source, PSI, Villigen, Switzerland) equipped with an Eiger 16M or Pilatus 2M detector (Dectris, Baden-Wättwil, Switzerland). Data collection and refinement statistics are summarized in Table 2. Data processing was done using XDS, XSCALE and XDSCONV 32 . To match the polar 6₅-screw axis with the deposited diffraction data of EngBF (PDB-ID 2ZXQ), data were re-indexed using the operator (h, k, l) → (k, h, −l) if necessary. Structures were determined by difference Fourier analysis. Model building was done in Coot 33 and refinement using REFMAC5 34 , phenix.refine 35 and BUSTER 36 . Final resolutions of the datasets were determined by paired refinement in pdb_redo 37 according to ref. 38 .
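The re-indexing step can be illustrated on a list of Miller indices. The sketch below reads the operator as (h, k, l) → (k, h, −l), which is an interpretation of the notation in the text; in practice the transformation is applied by the data-processing software rather than by hand, and the function names here are illustrative.

```python
from typing import Iterable, List, Tuple

Miller = Tuple[int, int, int]


def reindex(hkl: Miller) -> Miller:
    """Apply the re-indexing operator (h, k, l) -> (k, h, -l) to one reflection."""
    h, k, l = hkl
    return (k, h, -l)


def reindex_all(reflections: Iterable[Miller]) -> List[Miller]:
    """Re-index every reflection in a dataset, preserving the input order."""
    return [reindex(hkl) for hkl in reflections]


if __name__ == "__main__":
    # Hypothetical indices, used only to show the transformation.
    data = [(1, 2, 3), (0, 4, -5), (2, 2, 1)]
    for old, new in zip(data, reindex_all(data)):
        print(f"{old} -> {new}")
```

Applying the operator twice restores the original indices, so the two indexing choices for the polar axis are simply interchanged.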

Data availability
All data needed to evaluate the conclusions in the paper are present in the main text or the supplementary materials. Plasmids encoding the constructs reported in this study are available for research purposes from the authors. Coordinates and structure factors have been deposited in the Protein Data Bank with the accession codes 6QFO, 6QFK, 6QEV, 6QEP, and 6SH9. Raw diffraction data are available at https://proteindiffraction.org/.