Structure and stability of the designer protein WRAP-T and its permutants

β-Propeller proteins are common natural disc-like pseudo-symmetric proteins that contain multiple repeats (‘blades’), each consisting of a 4-stranded anti-parallel β-sheet. So far, 4- to 12-bladed β-propellers have been discovered in nature, showing large functional and sequential variation. Using computational design approaches, we created perfectly symmetric β-propellers out of natural pseudo-symmetric templates. These proteins are useful tools to study protein evolution of this very diverse fold. While the 7-bladed architecture is the most common, no symmetric 7-bladed monomer has been created and characterized so far. Here we describe such an engineered protein, based on a highly symmetric natural template, and test the effects of circular permutation on its stability. 
Geometrical analysis of this protein and other artificial symmetrical proteins reveals no systematic constraint that could be used to help in engineering of this fold, and suggests sequence constraints unique to each β-propeller sub-family.


Scientific Reports
| (2021) 11:18867 | https://doi.org/10.1038/s41598-021-98391-0 www.nature.com/scientificreports/
Tako8 15 and the nine-bladed propeller Cake9 16 . The latter can also fold as an eight-bladed propeller when only eight repeats are expressed. This structural plasticity may explain the origin of the abundant odd-numbered β-propellers through duplication and fusion events, but it limits the applicability of Cake9 as a building block for larger assemblies. An attempt was made to create a protein that can only adopt the nine-bladed propeller fold by designing a three-fold symmetric protein instead of a nine-fold symmetric one. This resulted in the Scone protein, which interestingly folds as a permuted eight-bladed propeller despite nine repeats being expressed in the open reading frame 8 . These unexpected observations highlight the need for further optimization of the computational design strategies in order to accurately control the folding symmetry of β-propellers.
In this study, we aimed to investigate whether there are geometric patterns linking the 3D structure to the fold symmetry. Perfectly symmetric β-propellers ease this analysis compared to natural pseudo-symmetric propellers. With Pizza6, Tako8, Cake8 and Cake9 we have six-, eight-, and nine-bladed proteins at hand. A five-bladed symmetric propeller was designed by the group of Tawfik. They selected functional proteins from an ancestral sequence library constructed from the sugar-binding protein Tachylectin-2, resulting in a sequence of 47 amino acids that multimerized into a functional five-bladed propeller 17 . The group of Lupas more recently reported a highly repetitive propeller with 14 repeats folding as two seven-bladed propeller domains (PDB:2ymu) in Nostoc punctiforme. They named this protein "WD40-family Recently Amplified Propeller (WRAP)". Similar to Cake, however, this sequence can adopt both multimeric eight- and nine-fold symmetry when fragments are expressed 18 .
However, in nature the most commonly observed number of blades is seven, and so far no monomeric 7-bladed symmetrical propeller protein has been reported. Here, we report the crystallographic structure of a seven-fold repeating monomeric protein, based on 2ymu. We also sought to investigate the influence of the "Velcro" positioning on the stability of the protein 19 . As a full set of designed symmetric β-propellers from five- to nine-fold symmetry is now available, we also analysed and compared their geometry in order to deduce rules which may be used as guidelines or restraints in future designs.

Results
We began our project before a consensus sequence of the 14 WRAP blades was reported by the group of Lupas 18 . We therefore independently aligned all repeats and identified the most common amino acid at each position. The WRAP repeats show almost perfect conservation with very few mutations, and the only position that is not highly conserved is the ninth residue of the inner β-strand, where arginine and tryptophan occur equally often. As neither amino acid dominated, we did not choose either. Choosing arginine might have yielded a supercharged core, potentially preventing stable folding, while choosing tryptophan could have depleted the supply of this single-codon amino acid during expression and therefore reduced the yield of the protein. Hence, we constructed a sequence logo based on 7142 sequences classified in the PROSITE database as WD40 repeats 20 . In agreement with previous studies 21, 22 , we found threonine to be the most common residue at this position. As this manually edited consensus sequence differs in only one residue from the WRAP structures later reported by the Lupas group 18 , we refer to the sequence reported here as WRAP-T. The sequence identity between WRAP-T and the highly repetitive natural propeller is 97%, with a minimum of 95% in the most divergent repeat. The template protein, repeat alignment and sequence logo are shown in Fig. 1.
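The per-position majority vote described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' procedure: the function name `consensus`, the placeholder `X` for tied columns, and the toy alignment are all assumptions made for this example. Tied columns are reported instead of being resolved automatically, mirroring the manual decision needed where arginine and tryptophan occur equally often.

```python
from collections import Counter


def consensus(repeats):
    """Majority-vote consensus of equal-length, pre-aligned repeats.

    Returns the consensus string and the indices of tied columns.
    Tied columns are written as 'X' so they can be decided manually.
    """
    if len({len(r) for r in repeats}) != 1:
        raise ValueError("repeats must be aligned to the same length")
    residues, ties = [], []
    for index, column in enumerate(zip(*repeats)):
        ranked = Counter(column).most_common()
        top_count = ranked[0][1]
        tied = [aa for aa, count in ranked if count == top_count]
        if len(tied) > 1:
            ties.append(index)
            residues.append("X")
        else:
            residues.append(tied[0])
    return "".join(residues), ties


# Hypothetical toy alignment, NOT the WRAP repeats.
toy_repeats = ["ASRDG", "ASWDG", "ASRDG", "ASWDG"]
sequence, tied_columns = consensus(toy_repeats)
print(sequence, tied_columns)
```

A tied column flagged this way could then be settled from an external source, as was done here with the WD40 sequence logo.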
In order to investigate the influence of the "Velcro closure" position we designed four synthetic DNA fragments expressing circularly permuted versions of the protein: nvWRAP-T without a "Velcro strap", v31WRAP-T with a "Velcro" close to the central cavity, v22WRAP-T with a "Velcro" in the middle of the four β-strands and v13WRAP-T with the outer "Velcro closure" (also present in the model protein 2ymu). These "Velcro" mutants were created by shifting the N- and C-termini to the inside of the loop region between β-strands. To avoid unfavourable interactions between the closely spaced termini, the N-terminal residue was removed. We decided to position the termini so that a glycine residue would be removed, theorizing that these residues contribute little to the overall interaction energy and stability of the fold. In the case of v22WRAP-T, this was impossible as the loop lacks a glycine. Hence the central aspartate was removed instead. A model and diagram of the first repeat are shown in Fig. 2.
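At the sequence level, a circular permutation of a perfectly repeating protein amounts to rotating the sequence and, as described above, dropping the new N-terminal residue. The sketch below illustrates this under stated assumptions: the function name `circular_permutant`, its `shift` parameter and the toy sequence are hypothetical and do not reproduce the actual WRAP-T constructs.

```python
def circular_permutant(sequence, shift, drop_n_terminal=True):
    """Rotate `sequence` so that position `shift` becomes the new N-terminus.

    If `drop_n_terminal` is True, the first residue of the rotated
    sequence is removed, as was done to avoid clashes between the termini.
    """
    if not 0 <= shift < len(sequence):
        raise ValueError("shift must lie within the sequence")
    rotated = sequence[shift:] + sequence[:shift]
    return rotated[1:] if drop_n_terminal else rotated


# Hypothetical toy sequence; letters stand in for residues.
toy = "ABCDEFGH"
print(circular_permutant(toy, 3))
print(circular_permutant(toy, 3, drop_n_terminal=False))
```

Choosing `shift` so that the dropped residue is a loop glycine corresponds to the design choice made for three of the four variants.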
The DNA fragments were cloned into a pET28 vector, and expressed in E. coli. Purification with a Ni-NTA column and size-exclusion chromatography yielded protein of high purity, suitable for crystallization. Each protein variant was monodisperse and eluted at the same volume, indicating the same molecular size, which was verified using analytical gel filtration, see Fig. 3. Optimal crystallization conditions were determined using standard screens.
Each protein crystallized in a different condition. High quality X-ray diffraction data-sets (see Table 1) were obtained with high resolution limits between 1.4 and 1.8 Å. Initial models were obtained using molecular replacement, starting with a propeller domain from PDB entry 2ymu. While the space group and packing differ between the crystals, the structure is identical for each protein. The backbone RMSD between each artificial protein and the 2ymu template is below 0.4 Å. The crystal structures also show that the proteins are highly internally symmetric with a backbone RMSD between individual repeats below 0.3 Å. The characteristic hydrogen bridge network in WD40 repeats, between a conserved histidine, serine, aspartate and tryptophan is clearly visible in the electron density, see Fig. 4.
In order to investigate the influence of the "Velcro" position on stability, the CD signal at 218 nm was monitored while raising the protein temperature, see Fig. 3. The resulting curves were fitted with a Boltzmann-sigmoid to determine the melting temperature and ΔH, see Table 2. Thermal reversibility was tested by heating the samples to 85 °C and then cooling them back to 20 °C to compare the spectra. Chemical stability was tested by measuring intensity changes in tryptophan fluorescence at 330 nm under increasing concentrations of guanidine hydrochloride (GdnHCl). From this, the fraction of denatured protein could be calculated by assuming the protein is completely unfolded at the highest concentration. When possible, these data were fitted with a Boltzmann-sigmoid curve to determine the concentration at which 50% of the protein is denatured and the proportionality constant (m), see Table 2. v31WRAP-T remained stable at high GdnHCl concentration, so fitting the data was not possible.

Discussion
A symmetric seven-bladed β-propeller was designed from a highly symmetric protein with unknown function (PDB:2ymu). This design completes the arsenal of designed symmetric β-propellers from five- to nine-fold symmetry. All these proteins consist of a motif between 40 and 49 amino acids long that is repeated multiple times. Some motifs permit multiple symmetries. The Cake sequence, for example, can fold into both an eight- and a nine-bladed propeller depending on the number of repeats expressed 16 , and the WRAP sequence has also been shown to possess this property 18 . Most sequences, however, will only fold properly in a specific symmetry; Pizza will only adopt a six-bladed propeller, even when seven repeats are tandemly placed in a single polypeptide 9 . So far it has proved impossible to predict accurately the correct folding symmetry for a designed sequence, or to assess whether a designed structure will fold correctly. In a recent study we designed a nine-bladed propeller that was found in practice to form an eight-bladed propeller 24 . In order to design a β-propeller ab initio with confidence, improved understanding of these structures is required. Analysis of artificial as well as natural proteins will assist in this process. Until now, no successful design of a seven-bladed propeller has been reported. The first attempt was made by the group of M. Paoli in 2006. They created a consensus sequence, using structural alignments to assess the selected amino acids, but the sequence was not evaluated computationally. The resulting protein had a low stability, and the structure could not be verified, thus the authors concluded that it had adopted a molten globule state. The group of Baker also attempted to create WD40 β-propeller proteins utilizing Rosetta De Novo design 25 with additional structural and sequence constraints from the protein family. 
However, none of their 13 designs, including three seven-bladed propellers, could be structurally verified 26 . This might be attributed to the worse RosettaHoles scores 27 observed for these proteins compared to their other designs, which indicate poorer side-chain packing. We attempted to use our previously reported RE3Volutionary design method 28 , which was successful in designing the Pizza and Cake proteins, to

For the creation of the WRAP-T protein an extremely repetitive natural template was used. This design was probably successful because only limited changes were made to the natural protein, while two previous designs had less than 50% sequence identity to their natural templates. All the WRAP-T permutants expressed well and could be purified easily. Solving the crystal structures revealed that they possess an identical structure. Interestingly, the different positions of the chain termini influenced the preferred crystallisation conditions and crystal packing. While previously designed symmetric proteins typically packed in distinct symmetric ways reflecting their inherent six-, eight- or nine-fold symmetry, here no clear symmetric interactions could be observed within the lattice. This may be expected given the incompatibility of seven-fold internal symmetry with any space group. Despite the non-equivalent positions of each blade within the crystal structures, the proteins proved to be perfectly symmetric, with a TM-score above 0.97 calculated by CE-symm 29 . Analysing the thermal and chemical stability of the WRAP-T permutants revealed that nvWRAP-T, lacking the "Velcro" closure, is just as stable as the variant v13WRAP-T. While v31WRAP-T is considerably more resistant to GdnHCl-induced denaturation than the other proteins, the increase in its thermal stability is modest. 
As expected, v22WRAP-T is the least stable protein, reflecting the fact that this conformation is almost never seen in nature.
In a previous study, we investigated the influence of the "Velcro" strap on the stability of the Pizza protein, which belongs to the NHL family of propeller proteins 19 . It was found that the two "non-Velcro" Pizza variants (having the termini just in front of the first strand or behind the last strand of the blade, respectively) are the least stable, having melting temperatures of 41.78 °C and 45.32 °C compared to 52.48 °C for the regular Pizza. The "Velcro 2-2" variant is the least stable Pizza with a "Velcro", having a melting temperature of 44.13 °C. The "Velcro 3-1" variant is almost as stable as the regular Pizza, with a melting temperature of 52.48 °C. The same trends can be found for chemical denaturation, with nv1Pizza6 having a C_m value of 0.660 M, nv2Pizza6 a value of 0.782 M and v22Pizza6 a value of 0.749 M, compared to a value of 0.949 M for the regular Pizza. Just as with WRAP-T, the inside "Velcro 3-1" position is highly resistant to chemical denaturation.
Figure 2. Construction of the "Velcro" variants. Each "Velcro" protein has the same sequence, but the N-terminal amino acid is shifted within the repeat, resulting in a different location of the "Velcro". The first repeat is shown in colour; underneath each structure a diagram illustrates the order of β-strands within the repeat. This figure was created with PyMOL 23 .

In both Pizza and WRAP-T, the "Velcro 2-2" conformation is the least stabilising, which presumably explains why it is rarely observed in nature. The few exceptions include the RCC1-like domains 30 , which are only distantly related to other β-propellers 4 . For WD40 repeats this decreased stability can be explained through the disruption of the highly conserved hydrogen bonding network between a serine/threonine in β-strand two, an aspartate and tryptophan in β-strand three and a histidine in the connecting loop, see Fig. 4. The importance of this conserved hydrogen bonding network for the WD40 class of proteins was previously confirmed by mutating the amino acids involved 31 . However, the Pizza protein does not possess this network, yet the middle "Velcro" is still the least stable. Possibly burying the charged chain termini deep within the protein is destabilising enough to explain the observed results. In both cases the "Velcro 3-1" position yields the protein with the highest stability, especially against chemical denaturants such as GdnHCl. A striking difference, however, is the stability of the "non-Velcro" variant, the least stable for Pizza but as stable as the natural "Velcro" position in WRAP-T. This may be due to differences between the sequences, as the two different "non-Velcro" variants of Pizza have different stabilities as well. 
However, these two examples alone may not be sufficient to determine a clear correlation, and additional proteins will be needed to confirm whether the inner "Velcro" is consistently the most stable conformation for propeller proteins. Such information may be helpful in engineering β-propeller enzymes for industrial applications such as the production of fructans by fructosyltransferases 32 .

Another goal of this research was the investigation of common trends in the backbone arrangement, especially the blade orientation within the propeller architecture. These findings could serve as guidelines for the de novo design of propeller backbones with a desired symmetry. We performed a structural analysis of the currently reported perfectly symmetric monomeric propeller proteins, but excluded any non-symmetric proteins in order to avoid bias to the backbone structure induced by the pseudo-symmetry of the sequence. The investigated proteins include: Tachylectin-2 (PDB:5c2m), Pizza (PDB:3ww9), WRAP-T (PDB:7big), Tako8 (PDB:6g6n), Cake8 (PDB:6tjg), and Cake9 (PDB:6tjh). Each has a different number of repeats, with the Cake sequence giving both eight- and nine-fold symmetric variants. A structural alignment of a single repeat of each protein allowed for the selection of four reference Cα atoms in each repeat, marked in the structure and sequence alignment in Fig. 5. Two are located on the inside β1-strand, the other two are found in the β3-strand. From these reference points we calculated six distances and four angles to characterize each protein.
These are: the distances from the inner position to the central axis (a, b); the distance between reference points in subsequent repeats, for inner points (c, d) and outer points (e, f); the rotation angle (α); the twist angles of the blade, namely the dihedral angle between the β1-strand and the central axis (β) and the internal dihedral angle between the two strands (γ); and finally the tilt angle between the central axis and the β1-strand (δ). Average values and standard deviations are shown in Table 3, and plots are shown in Fig. 5. Both WRAP-T and Tako belong to the WD40 family of proteins and are marked by a hollow triangle, while the other propellers belong to different families and are marked by a circle. As could be expected, the rotation angle α is entirely dictated by the symmetry, α = 360°/n, with n the number of blades. The channel radius is also highly dependent on symmetry, generally increasing with increasing blade number. Tachylectin-2 is the exception, having a larger central channel than Pizza. A linear regression fit of the channel radii against symmetry number indicates a high correlation (r² > 0.95) for the larger propellers, but only a weak correlation if Tachylectin-2 is included.
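The twist angles β and γ are dihedral angles, which can be computed from four points in space. The sketch below uses the standard atan2 formulation in plain Python. Which four points define each angle (for example, two points on the central axis and two reference Cα atoms on a strand) is an assumption that the text does not spell out, and the coordinates shown are a hypothetical toy case.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1 -> p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: p1 -> p2 lies along the z axis.
angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"{angle:.1f} degrees")
```

The sign of the result depends on the ordering convention of the four points, so only magnitudes should be compared between sources that may use different conventions.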
The inter-blade distances also follow a linear pattern: while the inside inter-blade distance increases with the number of blades, the outside distance decreases. Again Tachylectin-2 is the one exception to this rule. It should be noted that these distances are not independent, and are related following the simple model of an isosceles triangle with two sides of length a enclosing the rotation angle α, c = 2a sin(α/2) = 2a sin(180°/n). The same is true for the relation between d and b. A similar formula can be applied to e and f, with the difference that the distance between the first and third β-strand has to be taken into account. As can be seen from the alignment of all blades in Fig. 5, this distance is nearly identical for all symmetries, thus e and f are also completely determined by a/b and the number of repeats. This results in a linear decrease with r² > 0.8. From these distances it is clear that Tachylectin-2 is the misfit within these proteins, although it is unclear whether this is true for all five-bladed propellers, or specific to Tachylectin-2 and its homologs. The dihedral twist angles show some relation with symmetry, with β increasing while the internal γ angle decreases; however, r² only exceeds 0.5. In this case Tako seems to be the misfit, and excluding this protein yields a much higher correlation, r² > 0.95. The tilt angle δ seems uncorrelated to the blade number, with r² < 0.2. It is interesting to note that for the two Cake proteins, the dihedral angles are nearly identical while the tilt angles are different. It seems that this sequence can adopt different symmetries by increasing the channel radius and decreasing the tilt with respect to the central axis, much as a wire helix can be expanded or contracted by rotating the coils at each end in opposite directions about the central axis.
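The isosceles-triangle relation can be checked numerically. In this model two equivalent reference points in neighbouring blades lie at the same radius a from the central axis and are separated by the rotation angle α = 360°/n, so their distance is the chord 2a sin(α/2). The sketch below assumes the two points also sit at the same height along the axis; the function names and the 10 Å radius are hypothetical and not taken from Table 3.

```python
import math


def rotation_angle_deg(n_blades):
    """Rotation angle between successive blades of an n-fold propeller."""
    return 360.0 / n_blades


def inter_blade_distance(radius, n_blades):
    """Chord between equivalent points in neighbouring blades.

    Both points are assumed to lie at distance `radius` from the axis
    and at the same height, separated by the rotation angle.
    """
    return 2.0 * radius * math.sin(math.pi / n_blades)


# Hypothetical radius of 10 (in angstrom) for five- to nine-fold symmetry.
for n in range(5, 10):
    alpha = rotation_angle_deg(n)
    c = inter_blade_distance(10.0, n)
    print(f"n={n}  alpha={alpha:.1f} deg  c={c:.2f}")
```

At a fixed radius the chord shrinks as n grows, so the observed increase of the inner inter-blade distance with blade number has to come from the channel radius growing faster than the sine term falls.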
While the inter-blade distances obey simple geometric rules depending on the channel radius and symmetry, no design guidelines on blade orientation could be deduced from these few proteins. Tachylectin-2 shows that it is possible to have a larger channel with a low symmetry by including a longer loop between the third and fourth β-strand, filling the gap that would otherwise be created on the outside of the propeller. The Tako protein deviates from the trend and has drastically different angles from Cake8, the other eight-fold protein, showing that the same symmetry can be achieved by very different blade orientations.
Table 2. Stability parameters of the proteins.

It is known that close homologs of ring-shaped protein oligomers may have different numbers of subunits in the complex. For example, the Tryptophan RNA-binding attenuation protein (TRAP) usually consists of eleven subunits, but the variant from B. halodurans (PDB: 3zzl) has twelve. This is caused by a deletion of the last five residues of each subunit, shifting the angle of rotation slightly. Equally, the insertion of a larger amino acid on the inside of the circle also causes a shift in rotation angle, resulting in a conformation with twelve subunits 33 . For the monomeric β-propellers, it seems more difficult to find such rules, as deviations in repeat structure are not pronounced and the same sequence can result in multiple symmetries.

Starting from a highly repetitive β-propeller sequence, we designed a consensus motif sequence that formed a perfectly symmetric 7-bladed β-propeller. This protein completes a set of artificial propeller proteins with five- to nine-fold internal symmetry. A structural analysis of these proteins was performed to deduce their geometric patterns and rules. Although some trends are present in this small dataset, no definitive rules for propeller protein design can be deduced from it. A geometric relation between blade angle, radial distance and inter-blade distance was found, and a trend in angles of orientation could be observed, but exceptions were found to both tendencies. In the future it would be useful to compare proteins with the same symmetry but belonging to different propeller sub-families, in order to determine whether blade orientation is controlled more by the symmetry or by features of the sequence.

Methods
Protein preparation. Linear DNA sequences encoding the variants of WRAP-T were commercially prepared, amplified by PCR and inserted into the pET28 vector using the NdeI and XhoI restriction sites. Proteins were expressed in E. coli BL21(DE3) and purified using published protocols 34 . Protein expression was induced by adding IPTG to a final concentration of 0.5 mM, and subsequently incubating the cells with shaking at 20 °C for 20 h. Cells were harvested by centrifugation and suspended in 50 mM NaH2PO4, 200 mM NaCl and 10 mM imidazole. After lysis by sonication, cell debris was removed and the supernatant loaded onto a 10 mL nickel-sepharose column equilibrated with the same buffer. The column was washed with a similar buffer containing 20 mM imidazole. The protein was finally eluted with buffer containing 300 mM imidazole and digested with thrombin overnight at 4 °C during dialysis into 50 mM NaH2PO4, 200 mM NaCl, 10 mM imidazole. The protease:WRAP-T ratio was 1:200. The protein was re-loaded onto the washed nickel-sepharose column and the same steps were repeated. The fractions containing tag-free WRAP-T were pooled before loading onto a Superdex-200 column equilibrated with 20 mM HEPES and 200 mM NaCl buffer at pH 8.0.
The purified proteins were concentrated to 10 mg/mL and shown to be at least 95% pure by SDS-PAGE. All purified proteins were analysed by size-exclusion chromatography (SEC). The SEC analysis was performed using a Superdex 200 increase 10/300 GL column equilibrated with 20 mM HEPES and 200 mM NaCl buffer at pH 8.0.
Tryptophan fluorescence. Denaturation of the protein samples was measured by observing intrinsic tryptophan fluorescence with a Sapphire2 96-well plate reader following the protocol described in 19 . 2 µL of protein sample (OD280 of 10) were pipetted into 98 µL of guanidinium hydrochloride (GdnHCl) solution to give a final protein concentration of 0.5 µg/mL. GdnHCl concentration was tested from zero to 6 M, in steps of 0.25 M. Tryptophan fluorescence was measured after seven days of storage at 20 °C, by observing the emission intensity at 330 nm after excitation at 280 nm. Samples were prepared and measured in triplicate 35 . Fluorescence measurements from blank samples (containing no protein) were subtracted from the measured values. Normalized values for the fraction of denatured protein were obtained by scaling, with the maximum value set to 1 and the smallest to zero. The data points were fitted to a Boltzmann-sigmoid equation (1) with an in-house Python script utilising the SciPy library 36 .
F is the measured fluorescence, α_N is the F value for the native state with zero denaturant, β_N is the slope of this signal, and α_D and β_D are the corresponding values for the completely denatured state. These parameters are introduced because the signals of the denatured and native states change linearly with the denaturant concentration. C_m is the concentration at which 50% of the protein is denatured, and m is the constant of proportionality (= −∂ΔG_D−N/∂[denaturant]) and has the dimensions of cal/mol/M 37 .
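The in-house script is not reproduced here, so the following is only a sketch of how the processing described above could look. It assumes the widely used two-state linear-extrapolation form, which is consistent with the parameters defined above but may differ in detail from the authors' Eq. (1). The function names, the default temperature of 293.15 K and all numerical values are assumptions made for illustration.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol K), matching m in cal/mol/M


def min_max_normalise(signal, blank):
    """Subtract blanks, then scale so the smallest value is 0 and the largest 1."""
    corrected = [s - b for s, b in zip(signal, blank)]
    low, high = min(corrected), max(corrected)
    if high == low:
        raise ValueError("signal is constant; cannot normalise")
    return [(c - low) / (high - low) for c in corrected]


def two_state_signal(conc, alpha_n, beta_n, alpha_d, beta_d, m, c_m,
                     temp_k=293.15):
    """Assumed two-state model with linear native and denatured baselines.

    conc and c_m are in M, m in cal/mol/M, temp_k in kelvin.
    """
    native = alpha_n + beta_n * conc
    denatured = alpha_d + beta_d * conc
    k = math.exp(m * (conc - c_m) / (R_CAL * temp_k))
    return (native + denatured * k) / (1.0 + k)


# Hypothetical parameters, not values from Table 2.
for conc in (0.0, 3.0, 6.0):
    value = two_state_signal(conc, 0.0, 0.0, 1.0, 0.0, m=2000.0, c_m=3.0)
    print(f"[GdnHCl]={conc:.1f} M  signal={value:.4f}")
```

For an actual fit, a NumPy-vectorised version of `two_state_signal` could be passed to `scipy.optimize.curve_fit` together with sensible starting values for `m` and `c_m`.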
The difference in free energy between the "Velcro" mutants and the natural position of v13WRAP-T was calculated with Eq. (2) 38 .
m′ is the m-value of the mutant and ΔC_m is the difference in the concentration at which 50% of the protein is denatured.

CD spectroscopy. All circular dichroism measurements were performed with a JASCO J-1500 instrument following the protocol described in 19 . Thermal denaturation measurements were performed in a 2 mm path length quartz cuvette using 0.05 mg/mL protein in 20 mM phosphate pH 7.6. The samples were heated from 20 to 85 °C in steps of 0.2 °C while monitoring the CD signal at a wavelength of 218 nm.
The CD signal was fitted to a Boltzmann-sigmoid equation (3) using the same script employed for the chemical denaturation experiments.
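Eq. (3) is not reproduced in the text, so the sketch below shows only one common way to parameterise a thermal melt by a melting temperature and an enthalpy: a two-state van't Hoff model with flat baselines and no heat-capacity term. This is an assumed form, not necessarily the authors' equation, and the function names and numerical values are hypothetical.

```python
import math

R_J = 8.314  # gas constant in J/(mol K)


def fraction_unfolded(temp_k, t_m, delta_h):
    """Two-state van't Hoff fraction unfolded.

    temp_k and t_m are in kelvin, delta_h in J/mol (assumed constant).
    """
    k = math.exp((delta_h / R_J) * (1.0 / t_m - 1.0 / temp_k))
    return k / (1.0 + k)


def cd_signal(temp_k, t_m, delta_h, native, unfolded):
    """CD signal interpolated between flat native and unfolded baselines."""
    f = fraction_unfolded(temp_k, t_m, delta_h)
    return native + (unfolded - native) * f


# Hypothetical melt: T_m = 60 degrees C, delta_h = 400 kJ/mol.
t_m = 333.15
for celsius in (20.0, 60.0, 85.0):
    f = fraction_unfolded(celsius + 273.15, t_m, 400e3)
    print(f"{celsius:.0f} C  fraction unfolded = {f:.3f}")
```

The midpoint of the curve is the melting temperature by construction, and a larger enthalpy gives a steeper transition.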