Design of complicated all-α protein structures

Sakuma, Koya; Kobayashi, Naohiro; Sugiki, Toshihiko; Nagashima, Toshio; Fujiwara, Toshimichi; Suzuki, Kano; Kobayashi, Naoya; Murata, Takeshi; Kosugi, Takahiro; Tatsumi-Koga, Rie; Koga, Nobuyasu

doi:10.1038/s41594-023-01147-9

Article
Open access
Published: 04 January 2024

Nature Structural & Molecular Biology volume 31, pages 275–282 (2024)

Abstract

A wide range of de novo protein structure designs have been achieved, but the complexity of naturally occurring protein structures is still far beyond these designs. Here, to expand the diversity and complexity of de novo designed protein structures, we sought to develop a method for designing ‘difficult-to-describe’ α-helical protein structures composed of irregularly aligned α-helices like globins. Backbone structure libraries consisting of a myriad of α-helical structures with five or six helices were generated by combining 18 helix–loop–helix motifs and canonical α-helices, and five distinct topologies were selected for de novo design. The designs were found to be monomeric with high thermal stability in solution and fold into the target topologies with atomic accuracy. This study demonstrated that complicated α-helical proteins are created using typical building blocks. The method we developed will enable us to explore the universe of protein structures for designing novel functional proteins.

Main

Many naturally occurring protein structures are complicated, lacking distinguishable symmetry and regularity. Prominent examples of such complicated proteins are globin-fold structures with eight irregularly packed α-helices; Kendrew referred to the tertiary arrangement of the secondary structures as being difficult to describe in simple terms¹ (Fig. 1a). In most parts of globin fold structures, two helices adjacent in the sequence are connected crosswise rather than hairpin-like, and the helix–helix packings deviate from the canonical patterns^2,3; this fold does not include internal structural repeats such as α-solenoids^4,5. These asymmetric, irregular and nonrepetitive secondary structure arrangements make it difficult to simply describe globin structures, and many naturally occurring proteins as well.

**Fig. 1: Comparison of the structural complexities of naturally occurring and de novo designed proteins.**

A wide range of all-α protein structures have been designed, but the designs have been limited to simple and ordered structures consisting of α-helices in almost parallel alignment, such as coiled-coil, bundle and barrel structures (Fig. 1b–d and Extended Data Fig. 1)^5–27. Jacobs et al. attempted to design α-helical proteins with more variety¹⁵, but their designs were still bundle-like (the two designs with five α-helices in Fig. 1b). In contrast, the distribution of complexity for naturally occurring all-α protein structures is biased toward the complicated ones (Fig. 1d). This bias probably arises because all-α proteins with complicated spatial arrangements of α-helices can provide diverse and heterogeneous molecular surfaces, enabling specific interactions with binding partners. Moreover, such complicated all-α proteins should make it possible to incorporate a functional site enclosed on nearly all sides by multiple structural elements in three dimensions, as in globins. Therefore, the ability to create protein structures with irregularly packed α-helices would contribute to the design of various functional proteins.

In this article, we sought to develop a computational method to design complicated all-α structures by employing helix–loop–helix (HLH) motifs typically observed in naturally occurring proteins. The developed method enabled us to generate a wide range of α-helical protein structure topologies from bundle-like to complicated by combining the typical HLH motifs and canonical α-helices. Finally, we demonstrated the ability to create complicated all-α proteins by de novo design of five distinct topologies.

Strategy for all-α topology building

Although it has been suggested that the overall tertiary arrangements of helices of naturally occurring α-helical proteins can be approximated by a quasi-spherical polyhedral model²⁸, the major obstacle in designing complicated all-α topologies with irregularly aligned α-helices is the difficulty of determining a priori feasible topologies together with their backbone blueprints, which specify the lengths of secondary structures and loops. This is different from the design of αβ-proteins: there, the topologies are selected in advance by β-strand arrangements (that is, the order and orientations of β-strands in a β-sheet), and the backbone blueprints are derived from a set of rules relating local backbone structures of a few successive secondary structure elements to the preferred tertiary motifs²⁹. Therefore, we attempted to explore all-α topologies, not by preparing them a priori but by generating backbone structure topologies through the combinatorial enumeration of tertiary building blocks (Fig. 2). Moreover, the tertiary building blocks were selected from those typically observed in nature, so that the generated backbone structures are likely to be feasible. The question, then, is whether complicated all-α topologies can be generated from typical building blocks.

**Fig. 2: Strategy for building α-helical backbone structure topologies.**

A typical set of HLH motifs as building blocks

We first attempted to collect a set of HLH tertiary motifs that are typically observed in nature as building blocks. The HLH units consisting of two α-helices and the connecting loop of one to five residues in length were extracted from naturally occurring proteins, then clustered into 18 subgroups based on the five-dimensional feature vectors representing the HLH tertiary geometries³⁰ (Extended Data Figs. 2 and 3 and Methods). The representative 18 HLH motifs corresponding to each cluster density peak exhibited a broad range of bending angles between two helices, such as left- or right-handed helix–turn–helix, helix–corner–helix and kinked helices (Fig. 3a and Extended Data Figs. 3 and 4); the amino acid preference for each motif is shown in Extended Data Fig. 5: Gly at the residues with positive phi backbone torsion angle, helix capping residues immediately before helices such as Asp, Asn, Thr or Ser (refs. ^31,32), and hydrophobic periodicity of helix residues specific to each motif are observed. The 18 HLH motifs are classified into three classes according to the magnitude of the bending angle: hairpin (h), v-shaped (v) and corner (c). The 18 representative HLH motifs were used as building blocks (Fig. 2, top) for generating α-helical backbone structure topologies.

**Fig. 3: 18 HLH tertiary motifs and generated α-helical backbone structures.**

Generation of all-α topologies by combinatorial enumeration

Next, we investigated whether complicated topologies are produced using these typical tertiary motifs. Helical backbone structures composed of five and six helices were built with 90 and 110 residues in the total length, respectively, by combining the set of 18 HLH motifs and canonical α-helices ranging from 5 to 35 residues. The backbone structures were generated by enumerating all the combinations and selecting compact and steric-clash-free structures (Methods): 1,159,937,910 five-helix and 20,878,882,380 six-helix structures were enumerated, and 1,899,355 and 380,869 structures were then selected for each. The resulting topologies exhibited a broad spectrum ranging from helical bundle-like to complicated globular structures, demonstrating that complicated α-helical topologies are created from the typical tertiary motifs and canonical α-helices (Fig. 3b, white bar; Fig. 3c and Extended Data Fig. 6); the helix lengths were also widely distributed in the generated structures (Extended Data Fig. 7). Moreover, we found that the complexities of the generated topologies increase, as tertiary motifs with larger bending angles are included (black, gray and white bars in Fig. 3b). These results highlight the importance of corner-type motifs³³ in building complicated α-helical topologies.

Design of complicated α-helical topologies

From the myriad generated backbone structure topologies, we selected five for de novo design, H5_fold-0, H6_fold-C, H6_fold-Z, H6_fold-U and H7_fold-K (the Arabic numeral after ‘H’ indicates the number of helices) (Fig. 4 and Supplementary Fig. 1), in the following way. We first selected three topologies exhibiting extremely low helix order (HO) values (for the definition, see Fig. 1c and Methods): H5_fold-0, H6_fold-C and H6_fold-Z (Fig. 1d). Next, to test whether all identified HLH motifs could be used for de novo design, we selected H6_fold-U and H7_fold-K, which include all of the HLH motifs not used in the first three and still exhibit lower HO values (Fig. 1d). For all target folds except H5_fold-0, the lengths of the terminal helices were manually elongated to ensure sufficient packing interactions. None of these backbone structures is similar to any known protein structure; H5_fold-0, H6_fold-C, H6_fold-Z and H6_fold-U show a TM-score <0.6, using TM-align³⁴ against the ECOD database³⁵, and H7_fold-K shows a score of 0.610, with a structure of e2bnlA1 (Extended Data Fig. 8). The details of the selected topologies are described in Supplementary Text. For each backbone structure, amino acid sequences were designed through iterations of fixed-backbone sequence optimization and fixed-sequence structure optimization using Rosetta design calculations^36,37. Designs with low energy, tight core packing³⁸ and high compatibility between local sequences and structures²⁹ were selected, and their energy landscapes were explored by 10,000 independent Rosetta ab initio structure prediction simulations starting from an extended conformation³⁹. Ninety-one percent (75 of 82 designs) for H5_fold-0, 45% (18 of 40 designs) for H6_fold-C, 68% (27 of 40 designs) for H6_fold-Z, 67% (60 of 90 designs) for H6_fold-U, and 40% (36 of 90 designs) for H7_fold-K, showed funnel-shaped energy landscapes. Among the designs having funnel-shaped energy landscapes, we selected approximately ten designs for each topology (for the details, see Methods).

**Fig. 4: Backbone structures for the five design target topologies.**

Experimental characterization of designed proteins

We obtained synthetic genes encoding ten designs for H5_fold-0, seven for H6_fold-C, seven for H6_fold-Z, eight for H6_fold-U and eight for H7_fold-K. Some designs (H6_fold-Z, 2; H6_fold-U, 1; H7_fold-K, 2) have weak sequence similarity to known proteins with BLAST E-value <0.005, but the structures are unknown (Supplementary Table 1). The proteins were expressed in Escherichia coli and purified using a Ni²⁺-NTA affinity column. The purified proteins were then characterized by circular dichroism (CD) spectroscopy and size-exclusion chromatography combined with multi-angle light scattering (SEC–MALS). Across all design target topologies, 34 of 40 designed proteins were found to be well expressed and highly soluble, and showed CD spectra typical of α-helical proteins; 27 of the 34 designs were found to be monomeric by SEC–MALS (Supplementary Tables 2–6). Furthermore, the monomeric designs were characterized by ¹H-¹⁵N heteronuclear single quantum coherence (HSQC) nuclear magnetic resonance (NMR) spectroscopy, and 23 designs showed well-dispersed sharp peaks (Supplementary Tables 2–6 and Supplementary Fig. 2). The experimental results for all the designs are summarized in Extended Data Table 1. For each topology, we selected one monomeric design with well-dispersed sharp NMR peaks for NMR structure determination (Fig. 5 and Supplementary Fig. 3). All the designs were found by CD to be highly stable against thermal denaturation up to 170 °C (Fig. 5b,c). The NMR structures were solved at high quality using MagRO-NMRViewJ^40,41 (Table 1, Supplementary Text, Supplementary Figs. 4 and 5 and Supplementary Table 7), and the solved structures were consistent with the design models (Fig. 6 and Supplementary Table 8). For H5_fold-0, one of the designs was solved by X-ray crystallography and was nearly identical to the design model except for the domain swapping in the crystallized condition (Fig. 6, Table 2 and Supplementary Fig. 5). Despite the inclusion of noncanonical helix–helix packing arrangements in each design, the sidechains from distant α-helices were found to be coherently packed to constitute a single hydrophobic core similar to the design model. Notably, the bulky hydrophobic sidechains from the loops and neighboring α-helices also contributed largely to the core: they spiked the core and pinned the loops to the target conformations (Extended Data Fig. 9; for the importance of hydrophobic residues in the HLH motifs on energy landscapes of the designs, see Supplementary Fig. 6b,e). Interestingly, the N- and C-terminal helices of H6_fold-U_Nomur were found to fluctuate despite forming helices (Supplementary Figs. 7–9). Furthermore, in the thermal denaturation, the helical content of H6_fold-U_Nomur decreased gradually before the transition (the second from the bottom in Fig. 5c), and in the chemical denaturation, the m-value, which represents the cooperativity, was lower than those of the other designs (Fig. 5d; note that m-values also depend on protein size, with larger proteins having larger m-values⁴²; therefore, H5_fold-0_Elsa and H5_fold-0_Chantal, which are smaller than the other designs, show lower m-values). These results can be attributed to the low hydrophobicity of the core-forming residues of the C-terminus: almost all of the residues are Ala (Supplementary Fig. 8). We also compared the loop geometries of all HLH motifs at the ABEGO level in the design models and experimental structures (Supplementary Fig. 10 and Supplementary Table 9) (for the importance of helix capping residues in the HLH motifs on energy landscapes of the designs, see also Supplementary Fig. 6c,f). Except for the loop immediately before the C-terminal helix of H6_fold-U_Nomur, all loop geometries of the experimental structures agreed with those of the design models. These results indicate that the difficult-to-describe α-helical proteins are designable with typical building blocks.

**Fig. 5: Characterization of designed proteins.**

Table 1 NMR constraints and structure statistics of the five designed structures

**Fig. 6: Comparison of computational models with experimentally determined structures.**

Table 2 X-ray crystallography data collection and refinement statistics

Discussion

De novo designs of α-helical proteins have focused on structures consisting of parallelly aligned α-helices (Fig. 1), many of which are based on helical structure models such as the helical wheel⁴³ and Crick’s parameterization⁴⁴. We sought to develop a computational method for designing difficult-to-describe α-helical protein structures. We first identified the 18 HLH motifs typically observed in naturally occurring proteins. We then demonstrated that a wide range of globular all-α backbone structure topologies from bundle-like to complicated are generated by combining the 18 typical HLH motifs and canonical α-helices. The key to building complicated α-helical topologies is to include HLH motifs with larger bending angles such as corner-type motifs. The approach of this developed method is regarded as the reverse of blueprint-based design: design target topologies are searched by the combinations of HLH motifs in this approach, whereas design target topologies are predetermined and then local backbone structures favoring the topologies are selected in blueprint-based design.

We succeeded in designing complicated α-helical protein structures with five distinct topologies, three of which, H5_fold-0, H6_fold-C and H6_fold-Z, exhibited structural complexities comparable to the globin fold. The design success rate was as high as that of previous de novo designs, and the designs exhibited high solubility and thermal stability, similarly to previous designs^{29,45,46,47,48,49}. Moreover, the loop geometries of almost all HLH motifs were formed as designed, which must have enabled the designed proteins to fold into the target topologies. These de novo design results indicate that the compact and steric-clash-free backbone structures generated by using the typical HLH motifs are probably designable. In this regard, however, one open question is whether all, or what fraction, of the generated backbone structures can have tight core packing of sidechains. We have demonstrated that the selected five backbone structures are packable through de novo design, but the packability of the other backbone structures has not been clarified and should be addressed in future work.

The computationally generated myriad of complicated all-α structures should provide diverse and heterogeneous molecular surfaces for engineering functions such as binding, enzymatic activity and self-assembly into symmetric oligomers. The myriad of generated structures, which are presumably highly soluble and stable, coupled with the recently developed massive gene synthesis^50,51 and parallel high-throughput screening^17,18,26,52, should make it possible to create proteins with optimal structures for specific functions^17,26.

Methods

Definition of HO

HO is the order parameter that captures the complexities of α-helical proteins. HO is defined as the average of the squared inner products between helix orientation vectors, u_i, over all pairs of N α-helices⁵⁵:

$$\mathrm{HO}=\frac{2}{N(N-1)}\sum_{i<j}^{N}\left(\mathbf{u}_{i}\cdot \mathbf{u}_{j}\right)^{2}.$$

Higher values indicate more ordered helix arrangements, and lower values indicate more complicated ones.
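
As an illustration of the definition, the following minimal sketch computes HO from a list of helix orientation vectors. It assumes that each orientation vector is normalized to unit length before use (the normalization step and the function name `helix_order` are assumptions made here, not taken from the authors' code).

```python
import math
from itertools import combinations


def helix_order(orientations):
    """Mean squared dot product over all pairs of helix orientation vectors.

    The vectors are normalized first (an assumption of this sketch), so
    parallel and antiparallel helix pairs both contribute 1.
    """
    units = []
    for v in orientations:
        norm = math.sqrt(sum(c * c for c in v))
        units.append(tuple(c / norm for c in v))
    pairs = list(combinations(units, 2))  # N(N-1)/2 pairs with i < j
    if not pairs:
        raise ValueError("HO needs at least two helices")
    total = sum(sum(a * b for a, b in zip(u, w)) ** 2 for u, w in pairs)
    return total / len(pairs)


# Toy inputs: an idealized up-down-up bundle and three mutually orthogonal helices.
parallel_bundle = [(0, 0, 1), (0, 0, -1), (0, 0, 1)]
orthogonal = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(helix_order(parallel_bundle))  # 1.0
print(helix_order(orthogonal))       # 0.0
```

Dividing the pair sum by the number of pairs, N(N−1)/2, is the same as the prefactor 2/(N(N−1)) in the equation.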

Analysis of all-α protein structures for de novo designed and naturally occurring proteins

Twenty-two de novo designed all-α protein structures were collected from Protein Data Bank (PDB). To this end, de novo designs were searched by the keyword ‘de novo’ or ‘de-novo’ in PDB as of November 2020, and then all-α structures containing no β-strands were extracted on the basis of the secondary structure assignments by the DSSP algorithm⁵⁷ (for the PDB structures including multiple chains or NMR models, the first chain or model was used). The following four classes of de novo designed proteins were excluded from the dataset: (1) designed proteins created on the basis of backbone structures of naturally occurring proteins, and those with sequence similarity higher than 0.90 (as an exception, the three-helix bundle structure designs (PDB code: 6DS9 and 2A3D) were both included because of their structural dissimilarity); (2) assemblies composed of one or two α-helices (for example, 3R3K and 1U7J); (3) repetitive structures such as α-solenoids (for example, 1MJ0 and 5K7V); (4) membrane proteins.

For naturally occurring all-α protein structures, 7,352 representative structures found in the mainly-α class in the CATH database⁵⁸ with sequence identity less than 40% were used.

For calculating the HO values of the collected structures, secondary structure elements and loops were assigned by DSSP⁵⁷ (α-helices are defined as regions of at least five successive residues assigned as ‘H’ by the DSSP calculation). Note that the secondary structure assignments by DSSP are not always consistent with those originally defined by the authors. For example, the structures designed with three and four α-helices (PDB codes: 4TQL and 1P68, respectively) were assigned four and five α-helices, respectively, due to partially distorted α-helices.
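
The helix definition above can be sketched as follows. The sketch assumes that the DSSP assignment is available as a string with one character per residue; the function name `helix_segments` and the example string are hypothetical.

```python
import re


def helix_segments(dssp, min_length=5):
    """Return (start, end) index pairs (0-based, inclusive) for every run of
    at least `min_length` successive residues assigned 'H'."""
    pattern = re.compile("H{%d,}" % min_length)
    return [(m.start(), m.end() - 1) for m in pattern.finditer(dssp)]


# Hypothetical assignment string: the four-residue 'H' run is too short to count.
example = "--HHHHHHHH-TT-HHHH--HHHHHHS"
print(helix_segments(example))  # [(2, 9), (20, 25)]
```

A distorted helix that DSSP splits into two 'H' runs is therefore counted as two helices, which is the effect noted for 4TQL and 1P68.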

Clustering of HLH units using the five features representing an HLH geometry

A total of 13,667 HLH structures were extracted from 7,280 X-ray structures (secondary structures were assigned by DSSP⁵⁷), obtained from the PISCES server⁵⁹, with resolution ≤2.5 Å, R-factor ≤0.3, sequence length ≥40 and ≤25% sequence identity. We then classified the HLH structures by their loop residue lengths and extracted 13,510 HLH structures in total with loops of one to five residues in length. The extracted HLH structures were clustered for each loop length from one to five using the density clustering algorithm³⁰ (Extended Data Fig. 3), with the five features representing an HLH geometry (Extended Data Fig. 2).
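
The extraction and loop-length classification can be sketched as below. The sketch assumes that helices are given as (start, end) residue-index pairs in sequence order and that everything between two consecutive helices counts as the connecting loop; the names `hlh_units` and `group_by_loop_length` are hypothetical, and the density clustering itself is not shown.

```python
from collections import defaultdict


def hlh_units(helices, min_loop=1, max_loop=5):
    """Pair sequence-adjacent helices whose connecting loop has between
    `min_loop` and `max_loop` residues (inclusive)."""
    units = []
    for (s1, e1), (s2, e2) in zip(helices, helices[1:]):
        loop_length = s2 - e1 - 1
        if min_loop <= loop_length <= max_loop:
            units.append({"helix1": (s1, e1),
                          "loop": (e1 + 1, s2 - 1),
                          "helix2": (s2, e2)})
    return units


def group_by_loop_length(units):
    """Bucket HLH units by loop length, mirroring the per-length clustering."""
    groups = defaultdict(list)
    for unit in units:
        start, end = unit["loop"]
        groups[end - start + 1].append(unit)
    return dict(groups)


# Hypothetical helix boundaries: loops of 3, 9 and 0 residues.
helices = [(2, 9), (13, 20), (30, 40), (41, 50)]
units = hlh_units(helices)
print(units)  # only the first pair has a loop of 1-5 residues
```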

Building backbone structures

α-Helical backbone structures were built using Rosetta by exhaustively sampling conformations that are steric-clash-free (Rosetta vdw score <4.0 using the weight value, 0.1) and have a small radius of gyration (<14 Å; the threshold value corresponds to the peak of the distribution of the radius of gyration for naturally occurring proteins; Supplementary Fig. 11). The structures were built by combining canonical α-helices ranging from 5 to 35 residues (backbone torsion angles, phi, psi and omega, were set to −60.0°, −45.0° and 180.0°, respectively) and the identified 18 HLH motifs (Main and Fig. 3a), with length constraints of 90 and 110 residues for the five- and six-helix proteins, respectively. For generating five-helix structures, 64,440,995 steric-clash-free four-helix structures with 70 residues were first generated, and then an α-helix with 18 types of connecting loops was appended to the C-terminus of the generated four-helix structures so that the total length becomes 90 residues. For generating six-helix structures, an α-helix with 18 types of connecting loops was appended to the N-terminus of the generated five-helix structures so that the total length becomes 110 residues. From these structures, the globular five- and six-helix structures were collected on the basis of the radius of gyration.
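
The enumeration arithmetic and the compactness filter can be sketched as follows. Because the total length is fixed, each appended helix contributes one structure per HLH motif, so every extension step multiplies the count by 18. The radius of gyration here is an unweighted value over whatever coordinates are supplied (for example, Cα atoms); that choice and the function names are assumptions of this sketch, not Rosetta's implementation.

```python
import math

N_HLH_MOTIFS = 18
FOUR_HELIX_CLASH_FREE = 64_440_995  # four-helix, 70-residue structures (from the text)

# One appended helix per motif at each extension step.
five_helix_enumerated = FOUR_HELIX_CLASH_FREE * N_HLH_MOTIFS
six_helix_enumerated = five_helix_enumerated * N_HLH_MOTIFS


def radius_of_gyration(coords):
    """Unweighted radius of gyration (same length unit as the input coordinates)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3))
                  for p in coords) / n
    return math.sqrt(mean_sq)


def is_compact(coords, rg_cutoff=14.0):
    """Apply the <14 Å radius-of-gyration threshold quoted in the text."""
    return radius_of_gyration(coords) < rg_cutoff


print(five_helix_enumerated)  # 1159937910
print(six_helix_enumerated)   # 20878882380
```

The two products match the numbers of enumerated five- and six-helix structures reported in the main text.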

Selection of designs for experimental characterization based on the shapes of energy landscapes

We set three criteria for the selection by the shapes of energy landscapes. First, the overall shape of the landscape should be funnel-like with an apparent and sharp ‘neck’ reaching low-root mean square deviation (RMSD) and low-energy region, which is the hallmark of the foldability specifically into the target conformation. This is the most important criterion on the selection of energy landscape plots: for ill-designed sequences, all conformations remain in the high-RMSD and high-energy regions and do not have such a ‘neck’. Second, the funnel should not have subminima that indicate that the protein has alternative folded states. This is a criterion to exclude the possibility of misfolding and avoid a rugged energy landscape. Third, the ensemble of lowest-RMSD and lowest-energy conformations at the bottom of the funnels should not be away from, and ideally should overlap with, the conformational ensemble in the simulations starting from the target structure. This criterion is not mandatory, but consistency between fragment assembly simulations that offer global sampling and near-native relax simulations helps us to rank the designs with the similar quality in terms of the first and second criteria.

Expression and purification of designed proteins

The genes encoding the designed sequences were synthesized and inserted into pET21b vectors. The whole plasmid constructs were purchased from FASMAC or Eurofins Genomics. The target proteins were overexpressed by IPTG induction in E. coli BL21 Star (DE3) cells cultured in MJ9 minimal media including ¹⁵N ammonium sulfate as the sole nitrogen source and ¹²C glucose as the sole carbon source⁶⁰. The expressed uniformly (U-)¹⁵N-labeled proteins with a 6xHis tag at the C-terminus were purified by Ni²⁺-affinity columns. The purified proteins were then dialyzed against phosphate-buffered saline (PBS) buffer, 137 mM NaCl, 2.7 mM KCl, 10 mM Na₂HPO₄ and 1.8 mM KH₂PO₄, at pH 7.4; this buffer was used for all the experiments except NMR structure determination. The expression level, solubility and purity of each designed protein were evaluated by sodium dodecyl sulfate–polyacrylamide gel electrophoresis. To further confirm them, the samples were analyzed by mass spectrometry (Bruker Daltonics REFLEX III and Thermo Scientific Orbitrap Elite).

Experiments to identify designed proteins exhibiting folding ability

The following three experiments were conducted to evaluate the folding ability of designed sequences: CD spectroscopy, size exclusion chromatography with multi-angle light scattering (SEC–MALS) and ¹H-¹⁵N HSQC NMR spectroscopy. Supplementary Tables 2–6 present the results of the evaluations for each designed sequence for each fold.

CD spectroscopy under 1-bar pressure

Far-UV CD spectra were measured to examine whether the designs show the characteristic spectra of α-helical proteins, by scanning from 260 to 200 nm at 20 °C for ∼15 μM protein samples in PBS buffer on a JASCO J-1500 CD spectrometer. The measurements were performed four times and then averaged.

SEC–MALS

Oligomeric states for the designs in solution were studied by SEC–MALS with miniDAWN TREOS static light scattering detector (Wyatt Technology Corp.) combined with a high-performance liquid chromatography system (1260 Infinity LC, Agilent Technologies) with a Shodex KW-802.5 column (Showa Denko K.K.) for H5_fold-0_Chantal and H6_fold-C_Rei or a Superdex 75 increase 10/300 GL column (GE Healthcare) for H5_fold-0_Elsa, H6_fold-Z_Gogy and H7_fold-K_Mussoc. After the equilibration of the column with PBS buffer, 100 µl of the samples after purification by Ni²⁺-affinity columns were injected. The absorbance at 280 nm was measured by the high-performance liquid chromatography system to give the protein concentrations and intensity of light scattering at 659 nm was measured at angles of 43.6°, 90.0° and 136.4°. These data were analyzed by the ASTRA software (version 6.1.2, Wyatt Technology) using a change in the refractive index with concentration, a dn/dc value, 0.185 ml g⁻¹, to estimate the molecular weight of dominant peaks.

¹H-¹⁵N HSQC NMR spectroscopy

Whether the designs fold into well-packed structures or not was evaluated by ¹H-¹⁵N HSQC 2D-NMR spectroscopy. The purified protein samples were concentrated to 0.2–1.0 mM, and mixed with their 10% volume of D₂O. The experiments were performed at 25 °C on a JEOL JNM-ECA 600 MHz spectrometer, and data were analyzed by JEOL Delta (version 5.3.1).

High-pressure CD spectroscopy for melting temperature (T_m) estimation

For the designs that were evaluated to have the folding ability in the above experiments (one design for each target topology was selected), thermal denaturation was studied by using high-pressure CD spectroscopy. A JASCO J-1500 CD spectrometer was equipped with additional pressure instruments so that the temperature of the solution samples could be scanned from 30 °C to 170 °C under 10 bar. The temperature was increased by 1 °C per minute for ∼15 μM protein samples. Fixed-wavelength measurements at 222 nm were performed at every 1 °C, and wavelength scanning measurements (260 to 200 nm) were performed at 30, 40, 60, 80, 90, 100, 110, 120, 130, 140, 150, 160 and 170 °C. Thermal denaturation was measured once. T_m was estimated by nonlinear fitting to the thermal denaturation CD curve at 222 nm. The nonlinear least-squares analysis was performed with the nls function in R, assuming a two-state unfolding and linear extrapolation model. From this fit, we obtained T_m as the temperature at which the estimated populations of folded and unfolded states become equal.
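
A two-state model with linear baselines can be sketched as follows. The exact functional form used by the authors is not given in this section; the sketch assumes one common parameterization, ΔG(T) = ΔH_m(1 − T/T_m) with no heat-capacity term, and all function and parameter names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def fraction_unfolded(temp_k, tm_k, dh_m_kj):
    """Unfolded population for a two-state transition, assuming
    dG(T) = dH_m * (1 - T/Tm) (an assumption of this sketch)."""
    dg = dh_m_kj * (1.0 - temp_k / tm_k)
    k_unfold = math.exp(-dg / (R_KJ * temp_k))
    return k_unfold / (1.0 + k_unfold)


def cd_signal(temp_k, tm_k, dh_m_kj, folded_line, unfolded_line):
    """Population-weighted CD signal; each baseline is (intercept, slope)."""
    f_u = fraction_unfolded(temp_k, tm_k, dh_m_kj)
    y_folded = folded_line[0] + folded_line[1] * temp_k
    y_unfolded = unfolded_line[0] + unfolded_line[1] * temp_k
    return (1.0 - f_u) * y_folded + f_u * y_unfolded


# Illustrative parameter values only, not fitted results.
tm = 400.0
print(fraction_unfolded(tm, tm, 300.0))  # 0.5
```

At T = T_m the free energy change is zero, so the folded and unfolded populations are equal, which is the definition of T_m used above. Fitting `cd_signal` to the 222 nm curve would be the job of a nonlinear least-squares routine such as the R `nls` function named in the text.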

CD spectroscopy for chemical denaturation

Chemical denaturation with GuHCl was monitored at 222 nm for 2–3 μM protein samples in PBS buffer (pH 7.4) at 25 °C in a 1-cm path length cuvette. The GuHCl concentration was automatically controlled by a JASCO ATS-530 titrator. Chemical denaturation was measured once. The chemical denaturation curves were fit by nonlinear least-squares analysis using a two-state unfolding and linear extrapolation model⁶¹. The free energy change, ΔG, for the unfolding transition and its dependency on the denaturant, m-value, were obtained from the fitting.
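
For reference, the two-state linear extrapolation model is commonly written as below. The baseline symbols (a_F, b_F, a_U, b_U) and the midpoint C_m are notation introduced here for illustration; the exact baseline treatment used in the fit is not stated in this section.

$$\Delta G([\mathrm{D}])=\Delta G_{\mathrm{H_2O}}-m[\mathrm{D}],\qquad f_{\mathrm{U}}([\mathrm{D}])=\frac{e^{-\Delta G([\mathrm{D}])/RT}}{1+e^{-\Delta G([\mathrm{D}])/RT}}$$

$$y([\mathrm{D}])=\left(1-f_{\mathrm{U}}\right)\left(a_{\mathrm{F}}+b_{\mathrm{F}}[\mathrm{D}]\right)+f_{\mathrm{U}}\left(a_{\mathrm{U}}+b_{\mathrm{U}}[\mathrm{D}]\right)$$

Here [D] is the GuHCl concentration, y is the CD signal at 222 nm and ΔG_{H₂O} is the unfolding free energy extrapolated to zero denaturant. The transition midpoint, where f_U = 1/2, is C_m = ΔG_{H₂O}/m, so for a given ΔG_{H₂O} a larger m-value gives a sharper, more cooperative transition.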

Sample preparation for NMR structure determination

The most promising design for each target topology was overexpressed by IPTG induction in E. coli BL21 Star (DE3) cells cultured in MJ9 minimal media containing ¹⁵N ammonium sulfate as the sole nitrogen source and ¹³C glucose as the sole carbon source⁶⁰. The expressed U-¹⁵N,U-¹³C-enriched proteins were purified by Ni²⁺-affinity columns, and dialyzed against PBS buffer. The protein samples were further purified by gel filtration chromatography on an ÄKTA Pure 25 FPLC (GE Healthcare) using a Superdex75 or Superdex75 increase 10/300 GL column (GE Healthcare), which also replaced the PBS buffer at pH 7.4 with the customized buffer for NMR spectroscopy. The following 95% H₂O/5% D₂O buffer conditions for each sample were used: 100 mM NaCl, 5.6 mM Na₂HPO₄, 1.1 mM KH₂PO₄, at pH 7.4 for H5_fold-0_Chantal; 50 mM NaCl, 5.5 mM Na₂HPO₄, 4.5 mM KH₂PO₄, at pH 6.9 for H6_fold-C_Rei; 50 mM NaCl, 3.2 mM Na₂HPO₄, 4.5 mM KH₂PO₄, at pH 6.5 for H6_fold-Z_Gogy; 155 mM NaCl, 3.0 mM Na₂HPO₄, 1.1 mM KH₂PO₄, 10 μM ethylenediaminetetraacetic acid, 0.02% NaN₃, cOmplete protease inhibitor cocktail (Roche), at pH 7.4 for H6_fold-U_Nomur; and 155 mM NaCl, 3.0 mM Na₂HPO₄, 1.1 mM KH₂PO₄, at pH 7.4 for H7_fold-K_Mussoc.

Solution structure determination by NMR

NMR measurements

NMR measurements were performed on Bruker AVANCE III NMR spectrometers equipped with QCI cryo-probes at 303 K. The spectrometers with 600, 700 and 800 MHz magnets were used for the signal assignments and nuclear Overhauser effect (NOE)-related measurements, and those with 700, 900 and 950 MHz magnets were used for residual dipolar coupling (RDC) experiments. For the signal assignments, 2D ¹H-¹⁵N HSQC (echo/anti-echo) and ¹H-¹³C constant-time HSQC for aliphatic and aromatic signals, and 3D HNCO, HN(CO)CACB and HNCACB for backbone signal assignments, were measured; the BEST pulse sequence was applied to the triple-resonance measurements for H6_fold-C_Rei. For structure determination, 3D ¹⁵N-edited NOESY and 3D ¹³C-edited NOESY for aliphatic and aromatic signals (mixing time 100 ms) were performed. For H6_fold-U_Nomur, additional 3D HN(CA)CO, HN(CO)CA, HNCA, HBHA(CO)NH, HBHANH, H(CCCO)NH, CC(CO)NH, 3D ¹³C-HSQC (¹³C-t1) NOESY ¹³C-HSQC, 3D ¹³C-HSQC (¹³C-t1) NOESY ¹⁵N-HSQC and 4D ¹³C-HSQC NOESY ¹³C-HSQC were measured. Except for 3D-edited NOESY, all spectra were acquired using non-uniform sampling (NUS) for H6_fold-U_Nomur and H7_fold-K_Mussoc. For NUS, the sampling ratio was set at 25% for 3D and 6% for 4D with a fixed random seed. The NUS spectra were reconstructed by iteratively re-weighted least squares for 3D spectra and iterative soft thresholding for 4D spectra, with the virtual-echo technique, using the qMDD tool⁶².

For the RDC experiments, 2D in-phase and anti-phase (IPAP) ¹H-¹⁵N HSQC spectra using water-gate pulses for water suppression were measured with or without 6–10 mg ml⁻¹ of Pf1 phage (ASLA Biotech). For confirming the positions of ¹H-¹⁵N signals in the 2D IPAP ¹H-¹⁵N HSQC, 3D HNCO was measured under the identical buffer condition containing Pf1 phage. The α- and β-states of ¹⁵N signals split by ¹H-¹⁵N ¹J-coupling were separately identified for the protein in the isotropic and weakly aligned states, to obtain 1-bond RDC ${}^{1}D_{{}^{1}{\rm{H}}/{}^{15}{\rm{N}}}$ values. For the sample H6_fold-U_Nomur, 3D J-HNCO (without ¹H decoupling for ¹⁵N evolution) was measured at 25% NUS and was used for confirming the α- and β-states of ¹⁵N signal positions overlapped in 2D IPAP spectra. A 3D J-HN(CO)CA spectrum was also measured for H6_fold-U_Nomur to obtain ${}^{1}D_{{}^{1}{\rm{H}}\alpha /{}^{13}{\rm{C}}\alpha }$ values, providing additional alignment data at the identical magnetic field and alignment tensor.

NMR signal assignments

All NMR signals were identified in a fully automated manner using MagRO-NMRViewJ (an upgraded version of Kujira⁴⁰), in which noise peaks were filtered by a deep-learning method using Filt_Robot⁴¹. The FLYA module was used for fully automated signal assignment and structure calculation⁶³ to obtain roughly assigned chemical shifts (Acs), and the trustworthy ones were then selected into the MagRO Acs table. After confirmation and correction of the Acs by visual inspection using MagRO, TALOS+⁶⁴ calculations were performed to predict phi/psi dihedral angles, which were then converted to angle constraints in the CYANA format.

Structure calculation

Several CYANA⁶⁵ calculations were performed using the Acs table, NOE peak table and dihedral angle constraints. The Acs table was exported by the MagRO CYANA module, and the aliased chemical shifts were then automatically calculated according to the spectral widths of the corresponding NOESY spectra. The phi and psi dihedral angle constraints, with their deviations, were derived from TALOS+ predictions based on the chemical shifts of ¹⁵N, ¹³C′, ¹³Cα and ¹³Cβ; only predictions with a high prediction score, classified as ‘Good’, were used. The minimal angle deviation was set at 20°. After several iterations of CYANA calculations, the TALOS+-derived⁶⁴ dihedral angle constraints that showed large violations in nearly all models of the structure ensemble were eliminated.
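
The conversion of TALOS+ predictions into angle constraints can be sketched as follows. This is a minimal illustration under stated assumptions, not the MagRO implementation: the input record layout, the use of the predicted deviation as the half-width of the allowed range, and the output column order (residue number, residue name, angle name, lower bound, upper bound) are all assumed here. Only the ‘Good’ filter and the 20° minimum deviation come from the text.

```python
def talos_to_constraints(predictions, min_dev=20.0):
    """Turn TALOS+-style predictions into angle constraint lines.

    predictions: iterable of dicts with the (hypothetical) keys
                 resid, resname, phi, psi, dphi, dpsi, cls
    min_dev:     smallest half-width allowed for a range, in degrees
    """
    lines = []
    for p in predictions:
        if p["cls"] != "Good":
            continue  # keep only high-confidence predictions
        for name, centre, dev in (("PHI", p["phi"], p["dphi"]),
                                  ("PSI", p["psi"], p["dpsi"])):
            half = max(dev, min_dev)
            # Ranges crossing +/-180 degrees are not wrapped in this sketch
            lines.append(f"{p['resid']:4d} {p['resname']:<4s} {name:<4s} "
                         f"{centre - half:8.1f} {centre + half:8.1f}")
    return lines

# Hypothetical predictions for two residues
example = [
    {"resid": 5, "resname": "ALA", "phi": -62.0, "psi": -41.0,
     "dphi": 8.0, "dpsi": 25.0, "cls": "Good"},
    {"resid": 6, "resname": "GLY", "phi": 80.0, "psi": 10.0,
     "dphi": 30.0, "dpsi": 30.0, "cls": "Warn"},
]
constraint_lines = talos_to_constraints(example)
```

In this example the phi range of residue 5 is widened to ±20° because its predicted deviation (8°) is below the minimum, whereas the psi range keeps its predicted ±25°.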

After the averaged target function of the ensemble fell below 2.0 Å², refinement calculations by Amber12 were carried out for the 20 models with the lowest target functions. The coordinates of final.pdb calculated by CYANA, the distance constraints (final.upl) and the dihedral angle constraints derived from the TALOS+ prediction were converted into Amber format and a topology file using Sander Tools. First, 500 steps of minimization (250 steps of steepest descent and 250 steps of conjugate gradient) were carried out without electrostatic potential and NMR constraints. Second, molecular dynamics simulations with the ff99SB force field using an implicit water system (ionic strength of 0.1 M, cutoff of 18.0 Å) were performed, in which the temperature was gradually increased from 0.0 K to 300.0 K over 1,500 steps, followed by a simulation of 28,500 steps at 300.0 K (1.0 fs time step, 30 ps in total). Finally, 2,000 steps of minimization (1,000 steps of steepest descent and 1,000 steps of conjugate gradient) with distance and dihedral angle constraints were applied under the same conditions as used in the molecular dynamics simulations.

NMR structure validation

The RMSD values were calculated for the 20 structures overlaid to the mean coordinates for the ordered regions, automatically identified by Filt_Robot using multi-dimensional nonlinear scaling⁵⁴.

The RDC back-calculation was performed by PALES⁶⁶ using the experimentally determined RDC values. The averaged correlation between the simulated and experimental values was obtained using all signals except those of residues in overlapped regions of the ¹H-¹⁵N HSQC and those of residues with order parameters below 0.8 as predicted by TALOS+. For the validation of H6_fold-U_Nomur, many signals overlapped in the 2D IPAP-HSQC spectra. To overcome this problem, ${}^{1}J_{{\mathrm{HN}}-{}^{15}{\rm{N}}}$-split 3D HNCO spectra (without ¹H decoupling during the ¹⁵N evolution period) in the isotropic and anisotropic states were measured by NUS (25% data point reduction) to obtain the signal positions of the α- and β-states of ¹⁵N spins at a resolution of 0.3 Hz. ${}^{1}J_{{\mathrm{H}}{\rm{\alpha }}/{}^{13}{\mathrm{C}}{\rm{\alpha }}}$-split 3D HN(CO)CA spectra were also measured under the same conditions to obtain ${}^{1}D_{{}^{1}{\rm{H}}{\rm{\alpha }}/{}^{13}{\rm{C}}{\rm{\alpha }}}$ at a resolution of 0.2 Hz. Initially, the RDC reproducibility of H6_fold-U_Nomur was examined separately using the ${}^{1}D_{{\mathrm{HN}}-{}^{15}{\rm{N}}}$ and ${}^{1}D_{{\mathrm{H}}{\rm{\alpha }}-{}^{13}{\mathrm{C}}{\rm{\alpha }}}$ tables by PALES for all models to confirm that the averaged correlation coefficients were greater than 0.9, and the final correlation coefficients were then calculated with the two tables merged.
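
The residue filtering and correlation step can be sketched as below. This is an illustrative stand-in, not PALES itself: it assumes the back-calculated RDCs are already available per residue and computes a plain Pearson correlation after excluding overlapped residues and those with a predicted order parameter below 0.8. All names and the data layout are hypothetical.

```python
import math

def rdc_correlation(experimental, back_calculated, s2, overlapped, s2_min=0.8):
    """Pearson correlation between experimental and back-calculated RDCs.

    experimental, back_calculated, s2: dicts keyed by residue number
    overlapped: set of residue numbers with overlapped HSQC signals
    s2_min:     residues with a predicted order parameter below this are skipped
    """
    keep = [r for r in experimental
            if r in back_calculated
            and r not in overlapped
            and s2.get(r, 0.0) >= s2_min]
    if len(keep) < 2:
        raise ValueError("need at least two residues after filtering")
    x = [experimental[r] for r in keep]
    y = [back_calculated[r] for r in keep]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```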

Solution structural dynamics of H6_fold-U_Nomur measured by NMR

¹⁵N R₁, R₂ and ¹⁵N-{¹H} NOE experiments

The ¹⁵N R₁, R₂ and ¹⁵N-{¹H} NOE measurements were performed on a uniformly ¹⁵N-labeled H6_fold-U_Nomur protein sample at a concentration of 0.78 mM, under the same solution conditions as used for the structure determination. The measurements were conducted at 303 K on a Bruker 700 MHz Avance-III NMR spectrometer equipped with a cryogenic probe, using a 4-mm-diameter Shigemi NMR tube. The ¹⁵N R₁ and R₂ were obtained by measuring 2D ¹H-¹⁵N HSQC spectra with the inversion-recovery technique and with the temperature-compensated CPMG method, respectively⁶⁷. The steady-state ¹⁵N-{¹H} NOE was obtained by measuring 2D ¹H-¹⁵N HSQC spectra with and without a saturation pulse at each retardation time, acquired by the interleaved method. The 2D ¹H-¹⁵N peaks were automatically identified and assigned using the MagRO software⁴⁰. Some assignments were corrected by visual inspection. The ¹⁵N-{¹H} NOE values were estimated as the peak intensity ratio I/I₀ derived from the 2D HSQC spectra with (I) and without (I₀) the saturation pulse. The I/I₀ data were fitted to an exponential equation, I/I₀ = exp(−R × t), with delay time t (s), to obtain the ¹⁵N relaxation rate constant R (s⁻¹).
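
A minimal sketch of extracting R from the exponential equation I/I₀ = exp(−R × t) is given below. It uses a log-linear least-squares fit through the origin, which is a simplification chosen for illustration; the fitting procedure actually used is not specified in the text, and a weighted nonlinear fit would normally be preferred for noisy data. The function name and the example delays are hypothetical.

```python
import math

def fit_decay_rate(delays_s, intensity_ratios):
    """Estimate R in I/I0 = exp(-R * t) by a log-linear fit through the origin.

    Taking logarithms gives ln(I/I0) = -R * t, so the least-squares slope is
    R = -sum(t * ln y) / sum(t * t). All ratios must be positive.
    """
    num = sum(t * math.log(y) for t, y in zip(delays_s, intensity_ratios))
    den = sum(t * t for t in delays_s)
    return -num / den

# Synthetic, noise-free example with a known rate of 1.5 s^-1
delays = [0.01, 0.05, 0.1, 0.2, 0.4]
ratios = [math.exp(-1.5 * t) for t in delays]
rate = fit_decay_rate(delays, ratios)
```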

2D ¹H-¹⁵N CLEANEX-PM FHSQC experiments

The uniformly ¹⁵N-labeled protein sample of H6_fold-U_Nomur was lyophilized, and 2D ¹H-¹⁵N HSQC data were collected immediately after dissolving the lyophilized sample in 100% D₂O. However, the amide protons of most residues were replaced by deuterium within 10 min after dissolution, probably owing to the high pH of the sample solvent (pH 7.4). This prevented us from obtaining practical H–D exchange rates. Therefore, the exchange rates between the water and amide protons were obtained using the 2D ¹H-¹⁵N CLEANEX-PM FHSQC^68,69 scheme. In this method, the exchange ratio depends only on k_open in the protein folding/unfolding. Because the amide groups would be in the EX1 limit owing to the relatively high pH of 7.4, namely k_close « k, where k_close is the global and/or local folding rate of a protein and k is the exchange rate of an amide group in the unfolded state, the observable solvent exchange rate k_ex would be obtained as the global and/or local unfolding rate of a protein, k_open. The 2D ¹H-¹⁵N FHSQC data without the spin-lock pulse were also measured under the same conditions to obtain the reference, I₀. For the 2D ¹H-¹⁵N CLEANEX-PM FHSQC spectra with different spin-lock times t_m and the reference spectrum, the observed peaks were automatically identified and assigned by MagRO⁴⁰ with manual correction to obtain a normalized list of signal intensities for each residue. The following equation was used to obtain k_ex for each residue:

$$\frac{I}{{I}_{0}}=\frac{{k}_{{{\mathrm{ex}}}}}{{k}_{{{\mathrm{ex}}}}+{R}_{1{\mathrm{A}}}-{R}_{1{\mathrm{B}}}}\times \left\{\exp \left(-{R}_{1{\mathrm{B}}}\times {t}_{m}\right)-\exp \left[-\left({R}_{1{\mathrm{A}}}+{k}_{{{\mathrm{ex}}}}\right)\times {t}_{m}\right]\right\},$$

where R_1B is the apparent longitudinal relaxation rate of water molecules, and R_1A is a mixture of the apparent longitudinal and transverse relaxation rates in the rotating frame for the residue of interest. The values of R_1A and k_ex for each residue, with their errors, were obtained by curve fitting to this equation, assuming R_1B = 0.6 s⁻¹.
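
The build-up equation and a simple way of fitting it can be sketched as follows. The model function transcribes the equation above with R_1B fixed at 0.6 s⁻¹. The fitting routine is a coarse grid search used here only as a dependency-free stand-in for the curve fitting; the grids, mixing times and function names are assumptions, and no error estimation is attempted.

```python
import math

def cleanex_ratio(t_m, k_ex, r1a, r1b=0.6):
    """I/I0 predicted by the CLEANEX-PM build-up equation at mixing time t_m."""
    prefactor = k_ex / (k_ex + r1a - r1b)
    return prefactor * (math.exp(-r1b * t_m) - math.exp(-(r1a + k_ex) * t_m))

def fit_cleanex(mixing_times, ratios, k_grid, r1a_grid, r1b=0.6):
    """Grid search for the (k_ex, R1A) pair with the smallest squared error."""
    best = None
    for k_ex in k_grid:
        for r1a in r1a_grid:
            if abs(k_ex + r1a - r1b) < 1e-12:
                continue  # prefactor undefined at this point
            sse = sum((cleanex_ratio(t, k_ex, r1a, r1b) - y) ** 2
                      for t, y in zip(mixing_times, ratios))
            if best is None or sse < best[0]:
                best = (sse, k_ex, r1a)
    return best[1], best[2]

# Synthetic, noise-free data generated with k_ex = 5.0 and R1A = 20.0 (s^-1)
times = [0.005, 0.01, 0.02, 0.04, 0.08, 0.1]
data = [cleanex_ratio(t, 5.0, 20.0) for t in times]
k_grid = [0.5 * i for i in range(1, 41)]      # 0.5 to 20.0 s^-1
r1a_grid = [float(i) for i in range(1, 51)]   # 1 to 50 s^-1
k_fit, r1a_fit = fit_cleanex(times, data, k_grid, r1a_grid)
```

For real data, a nonlinear least-squares routine with per-point uncertainties would replace the grid search; the model function itself would stay the same.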

X-ray structure determination of H5_fold-0_Elsa

Sample preparation for X-ray structure determination

The gene encoding the designed sequence of H5_fold-0_Elsa in the pET21b vector was digested at the NdeI and XhoI restriction sites and cloned into a pET15b-TEV vector, which has a TEV protease cleavage site instead of the original thrombin site between the designed sequence and the N-terminal 6xHis tag. The designed protein was expressed in E. coli BL21 Star (DE3) cells and purified on a Ni²⁺-affinity column. The N-terminal His tag was then cleaved by TEV protease and removed by passage through a Ni²⁺-affinity column. The protein samples without the His tag were purified by anion-exchange chromatography (HiTrapQ HP 1-ml column, GE Healthcare) followed by gel filtration chromatography (Superdex 75 10/300 GL column) on an ÄKTA Pure 25 FPLC. Mass spectrometry was performed to confirm that the His tag was successfully cleaved.

To assess the effect of the tag cleavage on the oligomeric state and stability, we performed SEC–MALS and thermal denaturation CD experiments under high pressure for the original and tag-cleaved samples of H5_fold-0_Elsa. The solvent was exchanged to PBS at pH 7.4 before these experiments. The results showed that the tag-cleaved protein was also monomeric and had nearly the same denaturation temperature (the second row in Fig. 5c, 106 °C) as the original sample with the C-terminal His tag (Supplementary Fig. 12, 105 °C), which indicates that the removal of the tag and slight differences in the flanking amino-acid sequences do not substantially change the stability and oligomeric state of the designed protein in solution.

Crystallization and X-ray structure determination

The protein sample of H5_fold-0_Elsa at a concentration of 12 mg ml⁻¹ (1.07 mM) was crystallized in a solution of 0.4 M MgCl₂, 0.1 M Tris–HCl (pH 7.5) and 30% PEG 3350, using the sitting-drop vapor diffusion method at 296 K. The obtained crystals were soaked in a solution of 0.4 M MgCl₂, 0.1 M Tris–HCl (pH 7.5), 30% PEG 3350 and 10% glycerol, mounted on cryo-loops (Hampton Research), flash-cooled and stored in liquid nitrogen.
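
As a consistency check on the two stated concentrations, the molecular mass they imply follows from dividing the mass concentration by the molar concentration. This is a derived estimate for illustration, not a value reported for the construct:

$$M\approx \frac{12\ {\mathrm{g}}\,{\mathrm{l}}^{-1}}{1.07\times {10}^{-3}\ {\mathrm{mol}}\,{\mathrm{l}}^{-1}}\approx 1.12\times {10}^{4}\ {\mathrm{g}}\,{\mathrm{mol}}^{-1}\approx 11.2\ {\mathrm{kDa}}.$$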

X-ray diffraction data of the crystal were collected at the BL-1A beamline (λ = 1.1000 Å) of the Photon Factory and processed to 2.3 Å by XDS⁷⁰. After phase determination by molecular replacement with Molrep⁷¹ in the CCP4 suite, using the design model as the search model, the molecular model was built and refined using Coot⁷² and Phenix Refine⁷³. Translation/libration/screw refinement was performed in the late stages of refinement. The refined structure was validated with RAMPAGE⁷⁴. Ramachandran plot statistics showed that 98.8% and 0.00% of residues were in favored and outlier regions, respectively. The crystallographic data collection is summarized in Table 2.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The solution NMR structures of the five designs have been deposited in the PDB under the accession numbers 7BQM (H5_fold-0_Chantal), 7BQN (H6_fold-C_Rei), 7BQQ (H6_fold-Z_Gogy), 7BQS (H6_fold-U_Nomur) and 7BQR (H7_fold-K_Mussoc). The NMR data have been deposited in the Biological Magnetic Resonance Data Bank under the accession numbers 36335 (H5_fold-0_Chantal), 36336 (H6_fold-C_Rei), 36337 (H6_fold-Z_Gogy), 36339 (H6_fold-U_Nomur) and 36338 (H7_fold-K_Mussoc). The crystal structure of H5_fold-0_Elsa has been deposited in the PDB under the accession number 7DNS. The computational design models are presented as Supplementary Data 1. The generated compact and steric-clash-free five-helix (1,899,355) and six-helix (380,869) structures are available at https://github.com/kogalab21/all-alpha_design. The plasmids encoding the designed sequences are available through Addgene under the accession numbers 201825 (H5_fold-0_Elsa), 201826 (H5_fold-0_Chantal), 201827 (H6_fold-C_Rei), 201828 (H6_fold-Z_Gogy), 201829 (H6_fold-U_Nomur) and 201830 (H7_fold-K_Mussoc). Source data are provided with this paper.

Code availability

The code for building and analyzing helical structures has been implemented into Rosetta at https://github.com/RosettaCommons/main/tree/koga/all-alpha_design. The demo for building helical structures is available at https://github.com/kogalab21/all-alpha_design.

References

Kendrew, J. C. et al. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181, 662–666 (1958).
Crick, F. H. C. The packing of α-helices: simple coiled-coils. Acta Crystallogr. 6, 689–697 (1953).
Chothia, C., Levitt, M. & Richardson, D. Structure of proteins: packing of alpha-helices and pleated sheets. Proc. Natl Acad. Sci. USA 74, 4130–4134 (1977).
Kobe, B. & Kajava, A. V. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci. 25, 509–515 (2000).
Doyle, L. et al. Rational design of alpha-helical tandem repeat proteins with closed architectures. Nature 528, 585–588 (2015).
Walsh, S. T., Cheng, H., Bryson, J. W., Roder, H. & DeGrado, W. F. Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc. Natl Acad. Sci. USA 96, 5486–5491 (1999).
Dai, Q. H. et al. Structure of a de novo designed protein model of radical enzymes. J. Am. Chem. Soc. 124, 10952–10953 (2002).
Wei, Y., Kim, S., Fela, D., Baum, J. & Hecht, M. H. Solution structure of a de novo protein from a designed combinatorial library. Proc. Natl Acad. Sci. USA 100, 13270–13273 (2003).
Go, A., Kim, S., Baum, J. & Hecht, M. H. Structure and dynamics of de novo proteins from a designed superfamily of 4-helix bundles. Protein Sci. 17, 821–832 (2008).
Calhoun, J. R. et al. Solution NMR structure of a designed metalloprotein and complementary molecular dynamics refinement. Structure 16, 210–215 (2008).
Huang, P. S. et al. High thermodynamic stability of parametrically designed helical bundles. Science 346, 481–485 (2014).
Murphy, G. S. et al. Computational de novo design of a four-helix bundle protein–DND_4HB. Protein Sci. 24, 434–445 (2015).
Brunette, T. J. et al. Exploring the repeat protein universe through computational protein design. Nature 528, 580–584 (2015).
Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687 (2016).
Jacobs, T. M. et al. Design of structurally distinct proteins using strategies inspired by evolution. Science 352, 687–690 (2016).
Bhardwaj, G. et al. Accurate de novo design of hyperstable constrained peptides. Nature 538, 329–335 (2016).
Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).
Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
Polizzi, N. F. et al. De novo design of a hyperstable non-natural protein–ligand complex with sub-Å accuracy. Nat. Chem. 9, 1157–1164 (2017).
Studer, S. et al. Evolution of a highly active and enantiospecific metalloenzyme from short peptides. Science 362, 1285–1288 (2018).
Koebke, K. J. et al. Clarifying the copper coordination environment in a de novo designed red copper protein. Inorg. Chem. 57, 12291–12302 (2018).
ElGamacy, M., Coles, M. & Lupas, A. Asymmetric protein design from conserved supersecondary structures. J. Struct. Biol. 204, 380–387 (2018).
Chen, Z. et al. Programmable design of orthogonal protein heterodimers. Nature 565, 106–111 (2019).
Silva, D. A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature 565, 186–191 (2019).
Xu, C. et al. Computational design of transmembrane pores. Nature 585, 129–134 (2020).
Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426–431 (2020).
Sesterhenn, F. et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science 368, eaay5051 (2020).
Murzin, A. G. & Finkelstein, A. V. General architecture of the α-helical globule. J. Mol. Biol. 204, 749–769 (1988).
Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
Rodriguez, A. & Laio, A. Machine learning. Clustering by fast search and find of density peaks. Science 344, 1492–1496 (2014).
Richardson, J. S. & Richardson, D. C. Amino acid preferences for specific locations at the ends of α helices. Science 240, 1648–1652 (1988).
Doig, A. J. & Baldwin, R. L. N- and C-capping preferences for all 20 amino acids in α-helical peptides. Protein Sci. 4, 1325–1336 (1995).
Efimov, A. V. A novel super-secondary structure of proteins and the relation between the structure and the amino acid sequence. FEBS Lett. 166, 33–38 (1984).
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Cheng, H. et al. ECOD: an evolutionary classification of protein domains. PLoS Comput. Biol. 10, e1003926 (2014).
Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).
Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011).
Sheffler, W. & Baker, D. RosettaHoles2: a volumetric packing measure for protein structure refinement and validation. Protein Sci. 19, 1991–1995 (2010).
Rohl, C. A., Strauss, C. E., Misura, K. M. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93 (2004).
Kobayashi, N. et al. KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J. Biomol. NMR 39, 31–52 (2007).
Kobayashi, N. et al. Noise peak filtering in multi-dimensional NMR spectra using convolutional neural networks. Bioinformatics 34, 4300–4301 (2018).
Myers, J. K., Pace, C. N. & Scholtz, J. M. Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci. 4, 2138–2148 (1995).
Schiffer, M. & Edmundson, A. B. Use of helical wheels to represent the structures of proteins and to identify segments with helical potential. Biophys. J. 7, 121–135 (1967).
Crick, F. H. C. The Fourier transform of a coiled-coil. Acta Crystallogr. 6, 685–689 (1953).
Lin, Y. R. et al. Control over overall shape and size in de novo designed proteins. Proc. Natl Acad. Sci. USA 112, E5478–E5485 (2015).
Huang, P. S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol. 12, 29–34 (2016).
Marcos, E. et al. Principles for designing proteins with cavities formed by curved beta sheets. Science 355, 201–206 (2017).
Marcos, E. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat. Struct. Mol. Biol. 25, 1028–1034 (2018).
Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).
Klein, J. C. et al. Multiplex pairwise assembly of array-derived DNA oligonucleotides. Nucleic Acids Res. 44, e43 (2016).
Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D. & Kosuri, S. Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Science 359, 343–347 (2018).
Basanta, B. et al. An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proc. Natl Acad. Sci. USA 117, 22135–22145 (2020).
Koradi, R., Billeter, M. & Wüthrich, K. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14, 51–55 (1996).
Kobayashi, N. A robust method for quantitative identification of ordered cores in an ensemble of biomolecular structures by non-linear multi-dimensional scaling using inter-atomic distance variance matrix. J. Biomol. NMR 58, 61–67 (2014).
Krissinel, E. & Henrick, K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D 60, 2256–2268 (2004).
Minami, S., Sawada, K. & Chikenji, G. MICAN: a protein structure alignment algorithm that can handle multiple-chains, inverse alignments, Cα only models, alternative alignments, and non-sequential alignments. BMC Bioinf. 14, 24 (2013).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Orengo, C. A. et al. CATH–a hierarchic classification of protein domain structures. Structure 5, 1093–1108 (1997).
Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
Jansson, M. et al. High-level production of uniformly N-15- and C-13-enriched fusion proteins in Escherichia coli. J. Biomol. NMR 7, 131–141 (1996).
Santoro, M. M. & Bolen, D. W. Unfolding free energy changes determined by the linear extrapolation method. 1. Unfolding of phenylmethanesulfonyl α-chymotrypsin using different denaturants. Biochemistry 27, 8063–8068 (1988).
Hassanieh, H., Mayzel, M., Shi, L., Katabi, D. & Orekhov, V. Y. Fast multi-dimensional NMR acquisition and processing using the sparse FFT. J. Biomol. NMR 63, 9–19 (2015).
Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134, 12817–12829 (2012).
Shen, Y., Delaglio, F., Cornilescu, G. & Bax, A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR 44, 213–223 (2009).
Güntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62, 453–471 (2015).
Zweckstetter, M. & Bax, A. Prediction of sterically induced alignment in a dilute liquid crystalline phase: aid to protein structure determination by NMR. J. Am. Chem. Soc. 122, 3791–3792 (2000).
Farrow, N. A. et al. Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by ¹⁵N NMR relaxation. Biochemistry 33, 5984–6003 (1994).
Hwang, T. L., Mori, S., Shaka, A. J. & van Zijl, P. C. M. Application of phase-modulated CLEAN chemical EXchange spectroscopy (CLEANEX-PM) to detect water–protein proton exchange and intermolecular NOEs. J. Am. Chem. Soc. 119, 6203–6204 (1997).
Hwang, T. L., van Zijl, P. C. & Mori, S. Accurate quantitation of water-amide proton exchange rates using the phase-modulated CLEAN chemical EXchange (CLEANEX-PM) approach with a Fast-HSQC (FHSQC) detection scheme. J. Biomol. NMR 11, 221–226 (1998).
Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010).
Vagin, A. & Teplyakov, A. Molecular replacement with MOLREP. Acta Crystallogr. D 66, 22–25 (2010).
Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004).
Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010).
Lovell, S. C. et al. Structure validation by Cα geometry: ϕ,ψ and Cβ deviation. Proteins 50, 437–450 (2003).

Acknowledgements

We thank RIKEN Yokohama NMR Facility for NMR measurements, the Functional Genomics Facility, NIBB Core Research Facilities, especially Y. Makino, for mass spectrometry analysis, and the Instrument Center, Okazaki, Japan, especially M. Nakano, for HSQC spectra measurements. We also thank M. Yamamoto for experimental assistance, S. Minami for advice on structure similarity comparison and useful discussions, and Y. Ishii and S. Akiyama for valuable comments on the paper. The computations were performed using the Research Center for Computational Science (RCCS), Okazaki, Japan (Project: 21-IMS-C174, 20-IMS-C157, 19-IMS-C175, 18-IMS-C155, 17-IMS-C147, 16-IMS-C129 and 15-IMS-C180). Three-dimensional structure determination was supported by Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS) from AMED under grant numbers JP19am0101072 and JP20am0101083. The synchrotron radiation experiments were performed at Photon Factory (proposals 2016G-048). We also thank the beamline staff at BL1A of Photon Factory (Tsukuba, Japan) for help during data collection. This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grants-in-Aid for Scientific Research 15H05592 to N. Koga, 18H05420 to T.K. and N. Koga, 18H05425 to T.M. and 18K06152 to Naohiro K., the Japan Science and Technology Agency (JST) Precursory Research for Embryonic Science and Technology (PRESTO, grant number JPMJPR13AD to N. Koga) and the JST-Mirai Program (JPMJMI17A2 to Naohiro K.). K. Sakuma was also supported by JSPS KAKENHI Grant-in-Aid for JSPS Research Fellow 15J02427.

Author information

Nobuyasu Koga
Present address: Institute for Protein Research, Osaka University, Suita, Japan
These authors contributed equally: Koya Sakuma, Naohiro Kobayashi.

Authors and Affiliations

Department of Structural Molecular Science, School of Physical Sciences, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Japan
Koya Sakuma, Takahiro Kosugi & Nobuyasu Koga
RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
Naohiro Kobayashi & Toshio Nagashima
Institute for Protein Research, Osaka University, Suita, Japan
Naohiro Kobayashi, Toshihiko Sugiki & Toshimichi Fujiwara
Department of Chemistry, Graduate School of Science, Chiba University, Chiba, Japan
Kano Suzuki & Takeshi Murata
Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Japan
Naoya Kobayashi, Takahiro Kosugi, Rie Tatsumi-Koga & Nobuyasu Koga
Membrane Protein Research Center, Chiba University, Chiba, Japan
Takeshi Murata
Structural Biology Research Center, Institute of Materials Structure Science, High Energy Accelerator Research Organization (KEK), Tsukuba, Japan
Takeshi Murata
Research Center of Integrative Molecular Systems, Institute for Molecular Science, National Institutes of Natural Sciences, Okazaki, Japan
Takahiro Kosugi & Nobuyasu Koga

Contributions

K. Sakuma and N. Koga designed the research. K. Sakuma analyzed natural proteins and performed computational design work. K. Sakuma wrote the program code. K. Sakuma, T.K. and R.T.-K. expressed, purified and characterized the designed proteins by biochemical assay, and prepared protein samples for NMR: H5_fold-0, H6_fold-C and H6_fold-Z, by K. Sakuma, and H6_fold-U and H7_fold-K, by T.K. and R.T.-K. For NMR structure determination, Naohiro K., T.S. and T.N. collected data: H5_fold-0_Chantal, H6_fold-C_Rei and H6_fold-Z_Gogy, by T.S., and H6_fold-U_Nomur and H7_fold-K_Mussoc, by Naohiro K. and T.N. Naohiro K. performed structural analysis. For NMR structural dynamics of H6_fold-U_Nomur, Naohiro K. and T.N. collected and analyzed data. For crystal structure determination, Naoya K. prepared protein samples and K. Suzuki, with advice from T.M., performed crystallization and structural analysis. K. Sakuma, Naohiro K., T.F., K. Suzuki, T.M., T.K., R.T.-K. and N. Koga wrote the paper.

Corresponding author

Correspondence to Nobuyasu Koga.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks John Anderson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Sara Osman, in collaboration with the Nature Structural & Molecular Biology team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Twenty-two de novo designed proteins collected from Protein Data Bank (PDB).

a, Structures and their PDB IDs of the twenty-two de novo designed proteins collected from PDB. b, The helix order (HO) values of the designs were plotted in the HO histograms for naturally occurring all-α proteins (these histograms are identical to those in Fig. 1).

Extended Data Fig. 2 The five features representing the HLH tertiary geometry.

a-d, To represent the tertiary geometries of HLH units, the following angles, θ_N, θ_C, ϕ_NC, ϕ_NL and ϕ_CL, were identified using the V_N, V_C, V_L, V_HN and V_HC vectors (these vectors are calculated using Cα atoms). a, The definitions of θ_N and θ_C. V_N and V_C respectively represent the helix vectors for the N- and C-terminal helices in an HLH geometry, which are calculated using the equations proposed by Krissinel et al.⁵⁵. V_L is the loop vector from the last Cα atom (blue) in the N-terminal helix to the first Cα atom (red) in the C-terminal helix. θ_N was identified as the angle between the V_N and V_L vectors; θ_C was identified as the angle between the V_C and V_L vectors. b, The definition of ϕ_NC. ϕ_NC was identified as the dihedral angle between the plane defined by the V_N and V_L vectors and that defined by the V_C and V_L vectors. c, The definition of ϕ_NL. V_HN is the helix spiral vector at the end of the N-terminal helix, which was identified as the vector pointing to the last Cα atom (blue) in the N-terminal helix from the Cα atom immediately before it (white). ϕ_NL was identified as the dihedral angle between the plane defined by the V_HN and V_L vectors and that defined by the V_HN and V_N vectors. d, The definition of ϕ_CL. V_HC is the helix spiral vector at the beginning of the C-terminal helix, which was identified as the vector from the first Cα atom (red) in the C-terminal helix to the Cα atom immediately after it (yellow). ϕ_CL was identified as the dihedral angle between the plane defined by the V_C and V_HC vectors and that defined by the V_L and V_HC vectors.
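
The five features can be computed from the five vectors with basic vector algebra, as in the minimal sketch below. The function names, the signed-angle convention and the order in which the two planes are passed for each dihedral are assumptions made for illustration; the caption defines the planes but not the sign of the angles.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def angle_deg(a, b):
    """Angle between two vectors in degrees (0 to 180)."""
    c = _dot(a, b) / (_norm(a) * _norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def plane_dihedral_deg(a, shared, b):
    """Signed dihedral between plane (a, shared) and plane (b, shared)."""
    n1 = _cross(a, shared)
    n2 = _cross(b, shared)
    y = _dot(_cross(n1, n2), shared) / _norm(shared)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def hlh_features(v_n, v_c, v_l, v_hn, v_hc):
    """Return the five HLH features as a dict of angles in degrees."""
    return {
        "theta_N": angle_deg(v_n, v_l),
        "theta_C": angle_deg(v_c, v_l),
        "phi_NC": plane_dihedral_deg(v_n, v_l, v_c),
        "phi_NL": plane_dihedral_deg(v_l, v_hn, v_n),
        "phi_CL": plane_dihedral_deg(v_c, v_hc, v_l),
    }

# Hypothetical orthogonal vectors, chosen so the angles are easy to check
features = hlh_features(v_n=(1, 0, 0), v_c=(0, 1, 0), v_l=(0, 0, 1),
                        v_hn=(0, 1, 0), v_hc=(1, 0, 0))
```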

Extended Data Fig. 3 Statistical analysis of HLH motifs of naturally occurring proteins.

HLH structures collected from naturally occurring protein structures were clustered separately for each loop length from one to five residues, based on the pairwise Euclidean distance between the five-dimensional vectors of the features shown in Extended Data Fig. 2, using the density clustering algorithm³⁰. For each loop length, the decision graph used to determine the density peaks of the clusters is shown, in which rho represents the local density of a point in the five-dimensional feature space and delta represents the minimum distance from that point to any other point with higher density; for the point with the highest density, delta is the maximum distance to any other point.
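The rho and delta values plotted in a decision graph can be computed along the following lines. This is a minimal sketch under stated assumptions: rho is taken as the number of neighbours within a cutoff distance `d_c` (a hypothetical parameter; the caption does not specify how the local density was estimated), and ties in rho are not broken, so several points may be treated as the density maximum. The function name `decision_graph` is illustrative.

```python
import numpy as np

def decision_graph(features, d_c):
    """Return (rho, delta) for each row of an (n, 5) feature array.

    rho   : number of other points closer than d_c (assumed cutoff kernel)
    delta : minimum distance to any point with higher rho; for a point with
            the highest rho, the maximum distance to any other point
    """
    x = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances between feature vectors
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    delta = np.empty(len(x))
    for i in range(len(x)):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta
```

Points with both high rho and high delta stand out in the upper right of the decision graph and are the candidates for cluster centres.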

Extended Data Fig. 4 Mapping of 18 representative HLH motifs on the ϕ_NC-ϕ_NL plane.

The loop numbers with their ABEGO torsion patterns correspond to the 18 representative HLH motifs shown in Extended Data Fig. 3.

Extended Data Fig. 5 ABEGO-based loop geometries and amino acid sequence preferences of the cluster that each HLH motif belongs to.

(Left) The 18 representative HLH motifs are shown as in Extended Data Fig. 3. (Middle) ABEGO torsion patterns of the loop. This result suggests that the relative arrangements of adjacent helices strongly limit the torsion patterns of the connecting loop. (Right) Amino acid sequence preferences of each HLH motif. The first residue of the loop is indicated by an arrow.
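For readers unfamiliar with ABEGO patterns, the sketch below assigns one letter per residue from its backbone torsion angles and joins the letters into a loop pattern. The bin boundaries follow a commonly used convention and are an assumption here; they are not stated in this caption. The function names `abego` and `loop_pattern` are hypothetical.

```python
def abego(phi, psi, omega=180.0):
    """ABEGO letter for one residue (angles in degrees).
    Assumed bins: O = cis peptide bond; A/B for phi < 0; G/E for phi >= 0."""
    if abs(omega) < 90.0:
        return "O"  # cis peptide bond
    if phi < 0.0:
        return "A" if -75.0 <= psi < 50.0 else "B"    # helical vs extended
    return "G" if -100.0 <= psi < 100.0 else "E"      # left-handed vs extended

def loop_pattern(torsions):
    """Join per-residue letters; each item is (phi, psi) or (phi, psi, omega)."""
    return "".join(abego(*t) for t in torsions)
```

A two-residue loop with torsions `[(-120, 130), (60, 40)]` is classified residue by residue in this way, which gives a compact string that can be compared across the members of a cluster.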

Extended Data Fig. 6 Examples of compact structures obtained from the enumeration of six-helix structures.

The structures are sorted by their HO values from top left to bottom right. The top left structure has the lowest HO value and irregularly packed α-helices, whereas the bottom right structure has the highest HO value and α-helices aligned in parallel. The five designed topologies are enclosed in squares.

Extended Data Fig. 7 Distribution of the helix lengths for the generated backbone structures with four helices and 70 residues, five helices and 90 residues, and six helices and 110 residues.

Left: The distribution of mean helix lengths for the generated backbone structures. For each backbone structure, the mean helix length was calculated by averaging the lengths of the helices in the structure. Middle: The distribution of the standard deviation of helix lengths for the generated backbone structures. For each backbone structure, the standard deviation of the lengths of the helices in the structure was calculated. Right: The two-dimensional distribution of the mean and standard deviation of helix lengths. The width of the distribution of the standard deviation decreased in the order of the four-helix, five-helix and six-helix structures. This is because the four-helix structures were not subject to the Rg constraint, whereas the five- and six-helix structures were restricted to those with Rg < 14 Å. Since the same Rg threshold was used for both, the distribution for the five-helix structures is slightly wider than that for the six-helix structures. The helix lengths of the designs chosen for experimental characterization lie near the peaks of the distributions.
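The per-structure statistics and the Rg filter described in this caption can be sketched as follows. This is an illustrative example only: the use of the population standard deviation and the choice of atoms passed to the Rg calculation are assumptions, and the function names are hypothetical. Only the 14 Å threshold and the exemption of four-helix structures come from the caption.

```python
import numpy as np

RG_CUTOFF = 14.0  # Å, threshold stated in the caption

def helix_length_stats(helix_lengths):
    """Mean and standard deviation of helix lengths in one backbone structure.
    Assumption: population standard deviation (ddof=0)."""
    lengths = np.asarray(helix_lengths, dtype=float)
    return float(lengths.mean()), float(lengths.std())

def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms from their centroid, in Å."""
    xyz = np.asarray(coords, dtype=float)
    return float(np.sqrt(((xyz - xyz.mean(axis=0)) ** 2).sum(axis=1).mean()))

def passes_rg_filter(coords, n_helices):
    """Four-helix structures are not filtered; others must have Rg < 14 Å."""
    return n_helices == 4 or radius_of_gyration(coords) < RG_CUTOFF
```

Applying `helix_length_stats` to every generated backbone gives the two quantities whose distributions are shown in the left and middle panels.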

Extended Data Fig. 8 Comparison of designed structures and the most similar naturally occurring proteins.

The designed structures (left) and the most similar naturally occurring structures (right) are shown with their PDB IDs and TM-scores.

Extended Data Fig. 9 Comparison of computational models (left) with experimentally determined structures (right).

Hydrophobic core residues are shown as sticks. Bulky hydrophobic side chains from the loops and the neighboring α-helices, which spiked the core and pinned the loops to the target conformations, are shown in yellow.

Extended Data Table 1 Summary of experimental results for designed proteins

Supplementary information

Supplementary Information

Supplementary Text, Figs. 1–12 and Tables 1–9.

Reporting Summary

Peer Review File

Supplementary Data 1

Computational design models of the designed all-α proteins.

Supplementary Data 2

Source data for Supplementary Fig. 11.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Figs. 4 and 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sakuma, K., Kobayashi, N., Sugiki, T. et al. Design of complicated all-α protein structures. Nat Struct Mol Biol 31, 275–282 (2024). https://doi.org/10.1038/s41594-023-01147-9

Received: 10 September 2021
Accepted: 04 October 2023
Published: 04 January 2024
Issue Date: February 2024
DOI: https://doi.org/10.1038/s41594-023-01147-9
