Main

Most proteins fold co-translationally during biosynthesis on the ribosome1. There is increasing evidence of a direct role for the ribosome in regulating folding of the nascent chain2,3,4,5,6,7,8,9,10, with increasing clarity on how it interacts with the elongating nascent chain7,11,12,13, which is thought to contribute to alterations to nascent chain thermodynamic stability6,7,8,9,10 and folding and unfolding rates5,8. Consequently, co-translational folding (coTF) differs from in vitro refolding studies of analogous, isolated counterparts3,5,6,14,15,16, with unique intermediate conformations in coTF3,5,6,14,15,17, folding in the absence of the complete protein sequence4,14, and the ability of the ribosome to mitigate misfolding-prone destabilizing mutations4 among the many discriminating observations whose origins remain poorly understood. This is a crucial gap in our understanding of proteostasis as many proteins reach an active conformation following coTF, whereas post-translational unfolding–refolding in the cell is generally avoided owing to high kinetic stabilities, and when proteins are unfolded (in vitro), they often do not refold spontaneously, but instead misfold and aggregate1,18,19.

In contrast to refolding studies, the unfolded state on the ribosome exists under native conditions7, and is adopted by all proteins during early biosynthesis. The ribosome-bound unfolded state has not been characterized in structural detail owing to technical challenges, yet is likely to be crucial to understanding folding thermodynamics and pathways20,21,22. Here, using paramagnetic relaxation enhancement (PRE) NMR spectroscopy (PRE-NMR) combined with atomistic molecular dynamics simulations, we have determined structural ensembles of the unfolded state and found that the ribosome structurally expands the conformational ensemble. We infer an entropically driven destabilization of the unfolded state on the ribosome relative to in isolation arising primarily from the increased solvation of the more expanded ensemble. Experiments show that this results in the ribosome reducing the entropic penalty of protein folding by up to around 30 kcal mol−1. Despite previous suggestions that interactions between nascent chains and the ribosome surface influence folding kinetics and thermodynamics5,6,7, we show here that these interactions account for a minor fraction of the energetic changes observed between protein folding on and off the ribosome. Instead, we establish that the entropic destabilization of the unfolded state provides the fundamental basis for why protein folding on the ribosome is distinct to refolding in vitro.

Structures of the unfolded state

We investigated the unfolded state of a model immunoglobulin-like domain, FLN52,6,7,11,23,24,25, and determined a set of structural ensembles on and off the ribosome. FLN5 folds reversibly in isolation, thus also facilitating detailed, quantitative comparisons of post-translational folding versus coTF thermodynamics6,7,11,23,24. The variant FLN5 A3A3 (Extended Data Fig. 1a) enables the characterization by NMR of conformational and dynamic preferences of unfolded FLN5 without the complication of folding7,24. For the ribosome–nascent chain complex (RNC), FLN5 A3A3 is tethered to the ribosome peptidyl transferase centre (PTC) via a 31-amino acid linker (FLN5+31 A3A3), comprising the subsequent FLN6 domain and SecM stalling sequence (Extended Data Fig. 1a). This construct has the entire FLN5 sequence emerged from the ribosomal exit tunnel and is the earliest linker length at which some folding is observed in wild-type FLN56,11. PRE-NMR experiments of FLN5 A3A3 showed less broadening for the RNC compared with the isolated proteins (Fig. 1a,b, Extended Data Figs. 1 and 2 and Methods), suggesting that the conformational ensemble is less compact on the ribosome (Supplementary Notes 14). Restraints obtained from these experiments were used to reweight all-atom molecular dynamics simulations with explicit solvent of the unfolded states (Methods and Supplementary Notes 57). Molecular dynamics simulations of the isolated protein were initially used to identify a suitable force field for this protein (Extended Data Fig. 3 and Supplementary Note 5) and were subsequently validated against the radius of hydration (Rh), NMR chemical shifts, residual dipolar couplings (RDCs) and small-angle X-ray scattering (SAXS) data (Extended Data Fig. 4 and Supplementary Note 6). The simulations exhibited good convergence with respect to the overall compactness, secondary structure and long-range contacts in the ensembles (Supplementary Note 7). The reweighted structural ensembles are in good agreement with both the PRE-NMR and validation data on and off the ribosome (Extended Data Figs. 4 and 5).

Fig. 1: The ribosome modulates the conformational ensemble of the unfolded state.
figure 1

a, Exemplar regions of a 1H−15N HMQC NMR spectrum of isolated FLN5 A3A3 spin-labelled at C740 (right) and FLN5+31 A3A3 C740 (left) with the paramagnetic and diamagnetic spectrum overlaid. NMR data were recorded at 800 MHz, 283 K. b, PRE-NMR intensity ratio profiles (intensity in the paragmagnetic, Ipara, and diamagnetic, Idia, spectrum; fitted mean ± root mean square error (RMSE) propagated from spectral noise) for isolated FLN5 A3A3 spin-labelled at C740 (black) and FLN5+31 A3A3 C740 (RNC; blue). Theoretical reference profiles expected for a fully extended polypeptide are shown as dashed lines (Methods). The secondary structure elements (β-strands) of native FLN5 are indicated at the top. The shaded region at the C terminus represents the region of FLN5 that is broadened beyond detection owing to ribosome interactions7 (N730–K746, in the RNC). c, Representative ensembles of unfolded FLN5 on and off the ribosome. The structures shown represent the top 50 cluster centroids (to scale; clustering details in Methods) after reweighting with the PRE-NMR data. d, Top 10 individual structural clusters on and off the ribosome labelled with their respective population (not to scale). e, Distributions of the radius of gyration (Rg) for both ensembles for the residues belonging to the FLN5 domain (M637–G750). The shaded area represents the s.e.m. estimated from block averaging. f,g, Average inter-residue contact maps of isolated (f) and ribosome-bound (g) FLN5 (zoomed to a probability of 0.1 for ease of visualization). The black contours represent contacts formed in the native state of FLN5. h, Average inter-residue long-range (defined here as a separation of at least ten residues) contact probability along the protein sequence (shaded regions represent s.e.m. from block averaging).

Source data

Both structural ensembles of the unfolded state on and off the ribosome display heterogeneity (Fig. 1c,d). An analysis of the main structural clusters reveals that the isolated ensemble samples more compact and spherical states (Fig. 1d and Extended Data Fig. 6i) with the radius of gyration of the nascent chain increasing by approximately 26% on the ribosome from 34.9 ± 1.0 Å to 44.1 ± 1.8 Å (Fig. 1e). This structural expansion (throughout this Article, ‘expansion’ refers to structural expansion) of the ensemble is partly caused by steric exclusion from and tethering to the ribosome, but additional factors also contribute (Supplementary Note 8). Owing to the expansion, the amount of β-strand secondary structure in the RNC ensemble decreases along the entire sequence from 3.2 ± 0.5% to 1.1 ± 0.3% in total (Extended Data Fig. 5f) and fewer contacts are observed compared with the isolated protein (0.4 ± 0.1% and 1.0 ± 0.2% on average, respectively; Fig. 1h). Most of these transient contacts are non-native (Fig. 1f,g) and only 1.4 ± 0.2% and 1.0 ± 0.1% of native contacts are formed off and on the ribosome, respectively. Long-range contacts are particularly reduced at the C terminus (residues N730–G750) of FLN5 A3A3 (Fig. 1h), which in turn is bound to the ribosome surface around 80% of the time7 (Extended Data Fig. 5g,h). These nascent chain–ribosome interactions are driven predominantly by electrostatic effects and mediated via ribosomal RNA and RNA-bound Mg2+ ions (Extended Data Fig. 5i,j), whereas contacts within the unfolded protein itself occur more frequently between hydrophobic amino acids (Extended Data Fig. 5k). This structural analysis demonstrates that the ribosome significantly affects the global structural properties of the unfolded state.

Entropic destabilization on the ribosome

We utilized our structures of the unfolded state on and off the ribosome to estimate their effect on folding energetics from an enthalpic (ΔH) and entropic (ΔS) point of view, both of which determine the folding free energy (ΔG = ΔH – TΔS). Ribosome interactions have been shown to modulate folding thermodynamics and these interactions with the unfolded FLN5+31 A3A3 nascent chain result in a destabilization of the folding free energy7 (ΔΔGN-U,RNC-iso; where N, U, and iso are native state, unfolded state and isolated protein, respectively) by +1 kcal mol−1.

We explored whether the overall entropy of the unfolded state changes on the ribosome compared to off the ribosome (ΔSRNC-iso). Using our molecular dynamics ensembles, we analysed the protein conformational entropy (ΔSconf) and solvation entropy (ΔSsolv) (Methods), which together comprise the total entropy change (ΔS = ΔSconf + ΔSsolv). A residue-specific analysis of the unfolded state shows distinct expanded regions on the ribosome (having an increased local radius of gyration; Fig. 2a). The same nascent chain regions (for example, residues G700–N740) also show a significant reduction in the conformational entropy on the ribosome (Fig. 2b) and more restricted sampling of the Ramachandran map (Extended Data Fig. 6e), which is consistent with a more elongated shape of the nascent chain ensemble26 (Fig. 1b and Extended Data Fig. 6h). We observed this decrease in conformational entropy orthogonally through a cluster analysis, revealing fewer accessible conformational states on the ribosome relative to off the ribosome (Extended Data Fig. 6a,b). Notably, the entropic destabilization is observed even for residues distal to the ribosome (for example, V664–I674), showing that the ribosome exerts a long-range entropic effect that arises from more than ribosome interactions alone (Fig. 2b). The conformational restriction imposed by the ribosome is estimated to globally destabilize the unfolded state relative to the isolated unfolded protein (−TΔSRNC-iso,conf) by at least +2 kcal mol−1 at 298 K (Methods and Extended Data Fig. 6g).

Fig. 2: The unfolded state is entropically destabilized on the ribosome.
figure 2

a, The local radius of gyration along the sequence (21-residue moving average). The shaded regions correspond to residues K646–S650 (left), G660–K680 (middle) and G700–N740 (right). b, Difference (RNC-iso) in the total conformational entropy using 50 bins (Methods). The s.e.m. was estimated from block averaging (using a 7.5 μs sampling block size). Bars are coloured according to the gradient in d. c, Difference in the maximum theoretical SASA of the unfolded state (RNC-iso; Methods). The s.e.m. was estimated from block averaging. The bars are coloured according to the gradient in e. d, Entropy difference between the RNC and isolated ensembles mapped onto representative ensembles of FLN5+31 A3A3. e, Changes in SASA between the RNC and isolated ensembles is mapped onto the representative ensembles of FLN5+31 A3A3. f, Bar chart (mean ± s.e.m.) summarizing the energetic changes between the unfolded state on and off the ribosome (RNC-iso) at 298 K. All quantities are estimated based on the molecular dynamics ensemble averages, except the ‘ribosome binding’ contribution, which was experimentally determined7. Errors were combined from block averaging and empirical parameter uncertainties (Methods). g, 19F NMR spectra of the folding equilibrium of FLN5 labelled with 4-trifluoromethyl-l-phenylalanine (tfmF) at position 655 on and off the ribosome at 288 K and 298 K. Native (N), unfolded (U) and intermediate states on (I1, I2) and off (Iiso) the ribosome are indicated. h, Temperature dependence of the folding equilibrium constant (Keq) of FLN5 on and off the ribosome measured by 19F NMR (mean ± s.e.m.). Data were fit to a modified Gibbs–Helmholtz equation (Methods). l, Thermodynamic parameters (mean ± s.d. from fits, T = 298 K) calculated from the nonlinear fit in h.

Source data

An increase in solvation entropy has long been described as the major driving force of the hydrophobic collapse in protein folding27. The solvation entropy of the unfolded state was thus explored by analysing the solvent-accessible surface area (SASA) of FLN5. The SASA was significantly increased on the ribosome compared to off the ribosome (+6 ± 1 nm2 in total; Extended Data Fig. 6i), particularly in regions where the nascent chain is locally expanded (Fig. 2c). On the basis of the changes in SASA, we estimated the resulting solvent entropy changes (Methods and Supplementary Note 9). These calculations show a reduced solvation entropy which further destabilizes the unfolded state on the ribosome (−TΔSRNC-iso,solv) by +11 ± 4 kcal mol−1 at 298 K. Both the conformational and solvation entropy are thus globally reduced throughout the RNC ensemble (Fig. 2d,e). This results in a combined entropic destabilization of 13 ± 4 kcal mol−1, outcompeting both enthalpic gains in stability due to ribosome interactions and increased solvation (ΔHRNC-iso,solv = −3 ± 4 kcal mol−1). The net increase in free energy of the unfolded state on compared to off the ribosome is therefore expected to be +9 ± 6 kcal mol−1 (Fig. 2f) at this nascent chain length. The strong contribution of solvation entropy effects was also verified using direct entropy calculations, resulting in an estimate of approximately 30 ± 10 kcal mol−1 at 298 K (Supplementary Note 10).

Measurements of folding thermodynamics

To experimentally consider these entropic effects, we sought to determine ΔS and ΔH of folding by investigating the temperature dependence of folding for wild-type FLN5. The FLN5+34 RNC and the C-terminal truncation FLN5Δ6 variant24 as the analogous isolated protein were selected, both of which enable the simultaneous observation of the unfolded and native states by 19F NMR spectroscopy6 within the same temperature range (278 K–303 K). Under these conditions, we also observe two folding intermediates in the FLN5+34 RNC (I1 and I2) and one intermediate in the isolated FLN5Δ6 variant6,24 (Iiso) (Fig. 2g and Extended Data Fig. 7a,b).

The 1D 19F NMR spectra were fitted to determine the population of each species, enabling quantification of thermodynamic parameters from a nonlinear fit of the equilibrium constant as a function of temperature (Fig. 2h and Methods). Both on and off the ribosome, the apparent enthalpy of folding (ΔHN-U) is negative, whereas the apparent entropy of folding (−TΔSN-U) is positive—that is, the folding reaction is enthalpy-driven to compensate for an unfavourable entropic penalty. The heat capacity of folding (ΔCp,N-U) obtained for the isolated protein (−1.7 ± 0.3 kcal mol−1 K−1) is as expected on the basis of protein size28 (−1.5 ± 0.2 kcal mol−1 K−1), but increases on the ribosome (ΔΔCp,N-U,RNC-iso = +0.9 ± 0.4 kcal mol−1 K−1), presumably owing to the increased water ordering and local ion concentration near the ribosome surface29. These experiments also show the temperature dependence of folding of the RNC to be significantly attenuated compared to the corresponding isolated protein (Fig. 2g,h) with the magnitudes of ΔH and −TΔS being strongly reduced on the ribosome (ΔΔHN-U,RNC-iso = +32.9 ± 3.2 kcal mol−1, −TΔΔSN-U,RNC-iso = −34.5 ± 3.2 kcal mol−1 at 298 K; Fig. 2i). Folding on the ribosome is consequently less enthalpically driven but also exhibits a lower entropic penalty (Fig. 2i). The reduction in −TΔS on the ribosome experimentally confirms the predicted entropic destabilization of the unfolded state and is within the range of the estimated solvation entropy change based on a solvation analysis of our molecular dynamics ensembles (30 ± 10 kcal mol−1; Supplementary Note 10). Of note, folding from the intermediate state(s) to the native state is only marginally sensitive to temperature, both on and off the ribosome (Extended Data Fig. 7c,d), corroborating that the entropic differences originate predominantly from modulation of the unfolded state. The less negative ΔH on the ribosome must therefore predominantly result from the destabilization of the native state on the ribosome relative to off the ribosome6 (Supplementary Note 11). Our thermodynamic experiments thus clearly show that the expansion of the unfolded state results in the lowering of the entropic penalty of folding relative to the isolated protein.

Entropy effects are sequence-independent

Given the strong interactions of the unfolded FLN5 nascent chain with the negatively charged ribosome surface as observed in our structures, we next examined its effect on the large folding enthalpy and entropy differences on and off the ribosome. We performed 19F NMR experiments of a polyglutamate mutant (E6) (Extended Data Fig. 7e,f), which has reduced ribosome interactions7 (from 85 ± 5% to 10 ± 2%). Large changes in folding enthalpy and entropy (relative to an analogous isolated protein) are still observed and only marginally reduced relative to wild-type (ΔΔHN-U,RNC-iso = +22.6 ± 5.5 kcal mol−1, −TΔΔSN-U,RNC-iso = −20.6 ± 5.5 kcal mol−1 at 298 K, Extended Data Fig. 7g,h). These results show that ribosome interactions only partially contribute to the large change in coTF energetics. This is consistent with the entropic effects originating predominantly from the increased hydration of the expanded nascent chain (Fig. 2f), suggesting that this phenomenon may be sequence-independent.

Persistence during biosynthesis

We reasoned that the structural expansion of the unfolded state and re-balanced enthalpy–entropy of coTF should decrease in magnitude as the nascent chain elongates and the distance between FLN5 and the ribosome surface increases. To test this, we performed PRE-NMR experiments on the unfolded nascent chain of two longer FLN5 RNCs (FLN5+47 A3A3 and FLN5+67 A3A3; Extended Data Fig. 8a–c). The measured PRE intensity ratios decrease with increasing nascent chain length (Fig. 3a), showing that the expansion decreases as expected (see Supplementary Note 2). However, the intensity ratios remain higher than those of the isolated protein, indicating that the unfolded nascent chain remains more expanded on the ribosome at all RNC lengths tested, highlighting the long-range effect that the ribosome exerts on nascent chain structure. We next measured the enthalpy and entropy of folding of FLN5+67 using our 19F NMR approach (Fig. 3b and Extended Data Fig. 8d–i). Correlating with the decreased structural expansion of the unfolded state at FLN5+67 (relative to FLN5+31), we observed that the change in folding entropy on the ribosome persists but is reduced from −34.5 ± 3.2 kcal mol−1 at FLN5+34 to −10.1 ± 2.8 kcal mol−1. Likewise, the enthalpy of folding becomes more favourable as the nascent chain elongates from FLN5+34 to FLN5+67 and the native state becomes less destabilized further away from the ribosome surface6,9,30,31. We conclude that the thermodynamic effects persist during biosynthesis but progressively decrease in magnitude. These experiments also establish a direct relationship between the structure of the unfolded nascent chain and coTF thermodynamics.

Fig. 3: Entropic destabilization persists at long linker lengths and leads to stable coTF intermediates.
figure 3

a, PRE-NMR intensity ratio profiles are shown for the C740 labelling site (black star) window averaged over three residues for isolated FLN5 A3A3, FLN5+31 A3A3 and FLN5+67 A3A3 as in Fig. 1b. b, Left, temperature dependence of the folding equilibrium constants of isolated FLN5Δ6, FLN5+34 (wild-type) and FLN5+67 measured by 19F NMR (mean ± s.e.m.) fit to a modified Gibbs–Helmholtz equation (Methods). The error bars of individual datapoints are similar in magnitude to the size of the circles. Right, Thermodynamic parameters (mean ± s.d., T = 298 K) calculated from the nonlinear fits and shown as the difference relative to the isolated protein. c, Folding free energies (mean ± s.e.m. propagated from NMR lineshape fits) of the coTF intermediates I1 and I2 at different linker lengths (x axis) (ref. 6 and Extended Data Fig. 8o). The folding free energy of the isolated FLN5Δ6 intermediate is shown as a horizontal line. The contributions to the stabilization of the intermediates on the ribosome due to ribosome binding (ΔΔGbinding = RTln(1 − pB), where pB is fraction bound) and entropy (ΔΔGentropy = ΔΔGI-U,RNC-iso − ΔΔGbinding) are shown as vertical bars. The ribosome-bound population was estimated using a τR,bound (rotational correlation time of the bound state) of 3,003 ns (\({S}_{{\rm{bound}}}^{2}\) = 1.0; order parameter of the bound state). d, Model of the free energy landscape of folding on and off the ribosome. The unfolded (U) state is destabilized on the ribosome relative to in isolation, outcompeted by stabilizing ribosome interactions. I2 is stabilized by less than 0.1 kcal mol−1 on the ribosome owing to interactions (see c) from its folding free energy of at least ∆GI(iso)-U(iso) (that is, the stability of Iiso, owing to the structural similarity6 between Iiso and I2; lower bound estimate). 19F NMR has shown that the native state is destabilized relative to U6.

Source data

We then explored whether the entropic destabilization of the unfolded nascent chain during biosynthesis could rationalize the observed differences in the folding of FLN5 on and off the ribosome, common to other proteins3,5,8,9,10,17,30,31,32,33. Whereas the native state is destabilized on the ribosome relative to the native state in isolation6,7,25 (Extended Data Fig. 7i–k), FLN5 paradoxically populates two coTF intermediates that are significantly more stable than the single intermediate found in isolation (Iiso of FLN5∆6; Fig. 2g) and which are completely undetectable in full-length isolated FLN56,24. The stabilities of the coTF intermediates are modulated by the nascent chain length, such that at FLN5+47, their stabilities are more than 4 kcal mol−1 greater than that of Iiso (relative to their respective unfolded states6) (Fig. 3c). We quantified the contribution of ribosome binding to stabilizing the coTF intermediates by estimating the population of intermediates bound to the ribosome surface based on their measured rotational correlation times—that is, how fast the domain tumbles in solution (Extended Data Fig. 8j–p and Methods). These experiments indicate that such binding can only account for less than 0.1 kcal mol−1 of stabilization on the ribosome at FLN5+47 (Fig. 3c). Therefore, ribosome interactions contribute only weakly to stabilizing coTF intermediates. These measurements are also consistent with the observed persistence of the intermediates within a broad folding transition6 (that is, from approximately FLN5+31 to FLN5+67) and in a range of conditions that disrupt or reduce ribosome–nascent chain interactions, including changes in the distance from the ribosome (Fig. 3c), high concentrations of salt and urea, nascent chain and ribosome surface mutations2,6 (Extended Data Fig. 7e–h), and temperature (Fig. 2g,h).

We next built a model of the free energies of coTF by comparison to the isolated protein. As the most stable intermediate off the ribosome, Iiso, is structurally similar to I2 (ref. 6), we used our measurements of binding energies (Fig. 3c) to link the relative free energies of FLN5 on and off the ribosome (Fig. 3d). From this thermodynamic analysis, we can infer that the unfolded state in the FLN5+42 RNC (the longest linker length at which an unfolded population is observed6) is destabilized by at least 2.7 kcal mol−1 relative to the isolated unfolded protein (ΔΔGU,RNC-iso; Fig. 3d). Together, we conclude that the ribosome persistently destabilizes unfolded and folded FLN5 during biosynthesis to promote the formation of partially folded intermediates.

Thermodynamic effects across proteins

As the entropic effects are at least partly sequence-independent (Extended Data Figs. 5l,m and 7), we examined whether the reduction of the folding entropy penalty on the ribosome and its implications for coTF are also observed for other proteins. We investigated the folding of a structurally homologous domain, titin I27 (the 27th immunoglobulin-like domain of titin; Fig. 4a) and the common oncoprotein HRAS34, a GTPase protein with an α/β-fold35 (Fig. 4b). In isolation, I27 has been shown to fold reversibly36, which, as for FLN5, enables thermodynamic comparisons of folding on and off the ribosome. I27 exhibits two-state folding behaviour in urea (Fig. 4c) but populates one high-energy intermediate in a destabilized mutant (Iiso; Extended Data Fig. 9a). Although a previous study suggested that the ribosome does not affect folding of I2737, our results show two folding intermediates being stabilized on the ribosome (Fig. 4c, I1 and I2). Similarly, 19F NMR spectra of HRAS also show the population of stable coTF intermediates, even before complete translation, that are not populated in isolation (Fig. 4d). The coTF intermediates of both proteins are partially folded, since they are completely destabilized by mutations that disrupt the native hydrophobic core37 (Extended Data Fig. 9c,e,f). Furthermore, as observed for FLN5, the temperature dependence of folding is reduced for I27 on the ribosome (Fig. 4e,f and Extended Data Fig. 9a,b), with a reduced enthalpy of folding (ΔΔHN-U,RNC-iso = +28.8 ± 10.1 kcal mol−1 at 298 K) and a lower entropic penalty of at least 18 kcal mol−1 on the ribosome (−TΔΔSN-U,RNC-iso = −28.5 ± 10.1 kcal mol−1 at 298 K). Folding from the unfolded state to the first HRAS intermediate (I1, HRAS1–81 RNC) is similarly temperature-insensitive (Fig. 4g and Extended Data Fig. 9d) and exhibits an entropic penalty (−TΔSI1-U) of only +5.0 ± 2.0 kcal mol−1 (Fig. 4h). The thermodynamic effects reported in this work and the resulting population of stable coTF intermediates thus appear to be a general phenomenon.

Fig. 4: Co-translational folding intermediates of titin I27 and HRAS are stabilized on the ribosome.
figure 4

a,b, Crystal structures of titin I27 (Protein Data Bank (PDB): 1TIT) (a) and HRAS (PDB: 4Q21) (b). c, 19F NMR spectra of titin I27, tfmF-labelled at position 14, off the ribosome (isolated), isolated I27 in urea, and I27 tethered with a 34-residue linker to the ribosome (I27+34 RNC) recorded at 298 K. d, 19F NMR spectra of the HRAS G-domain (residues 1–166), tfmF-labelled at position 32, off the ribosome (isolated), in urea and HRAS on the ribosome arrested at 3 different lengths and recorded at 298 K. HRAS1–81, HRAS1–189 and HRAS1–189+20GS correspond to residues 1–81 of HRAS, full-length HRAS and full-length HRAS with an additional 20-residue poly(glycine-serine) linker, respectively, each tethered to the ribosome by the arrest-enhanced SecM motif. e,g, Nonlinear fits to a modified Gibbs–Helmholtz equation (Methods) of the equilibrium constants (mean ± s.e.m. propagated from NMR lineshape fits) of titin I27 folding off and on the ribosome (e) and the HRAS1-81 RNC (g) measured by 19F NMR. f, Thermodynamic parameters determined from the nonlinear fit (mean ± s.d., T = 298 K) of the temperature dependence of I27 folding. The heat capacity of folding (ΔCp,N-U) obtained for the isolated and ribosome-bound protein are −2.0 ± 0.1 and +0.6 ± 1.7 kcal mol−1 K−1, respectively. The heat capacity of the isolated protein is similar to the literature value reported for the wild-type variant28 (−1.4 kcal mol−1). h, Thermodynamic parameters determined from the nonlinear fit (mean ± s.d., T = 298 K) of the temperature dependence of folding for HRAS1–81. The heat capacity of folding obtained from the fit is −0.4 ± 0.3 kcal mol−1 K−1.

Source data

Given the differences in folding thermodynamics and pathways on and off the ribosome, we sought to examine how coTF events may determine the post-translational fate of nascent proteins. Whereas our model systems FLN5 and I27 have been shown to fold reversibly to their native state in isolation36,38, many proteins are not able refold off the ribosome1,18,19. Indeed, the proteolytic stability of the KRAS isoform has been found to be modulated by codon usage39, and so we examined whether the acquisition of native HRAS structure is also dependent on its coTF pathway. Consistent with prior observations on KRAS, refolded isolated HRAS showed reduced proteolytic stability compared with control or native HRAS (Extended Data Fig. 9h), which also persisted when refolded in eukaryotic cell lysate (Extended Data Fig. 9i). A residue-specific analysis by 1H,15N NMR shows that refolded HRAS forms a native-like, GDP-bound conformation, consistent with prior biophysical experiments40, but distinct structural regions, including the switch 2 region that is involved in nucleotide exchange41, show increased NMR signal intensities—that is, probably altered backbone dynamics (Extended Data Fig. 9j). Indeed, when assessing HRAS function with a GDP/GTP nucleotide exchange assay, we found that refolded HRAS is completely inactive, whereas HRAS purified from cells and the HRAS+20GS RNC are both active (Extended Data Fig. 9g). Subtle differences in structure and dynamics thus alter the fate of refolded HRAS, which appears to be kinetically trapped in an inactive state. These results show that the thermodynamic modulation by the ribosome and resulting coTF pathway appear to be obligate to the formation of functionally active HRAS.

Mutations are buffered on the ribosome

We hypothesized that the ribosome additionally modulates the effect of destabilizing mutations4. To test this, we designed nine variants of FLN5 that include disruptions to the hydrophobic core, proline isomerization24 and electrostatic charge7. For all mutants, we measured the folding free energy on and off the ribosome using 19F NMR (Fig. 5a,b and Extended Data Fig. 10a,b,h,i). The mutants exhibited a wide range of stabilities (ΔGN-U) from −0.7 to −5.4 kcal mol−1 off the ribosome (equal to a 4.7 ± 0.3 kcal mol−1 range in stabilities). However, on the ribosome, the stabilities of the mutants exhibited a narrower range of 1.5 ± 0.1 kcal mol−1 (Fig. 5a) and were all less destabilizing (ΔΔΔGN-U) by 0.3–3.7 kcal mol−1 (Fig. 5b).

Fig. 5: Destabilizing mutations are buffered by the ribosome.
figure 5

a, Folding free energies of destabilizing mutants off and on the ribosome (FLN5+34 and FLN5+67 depending on the stability of the mutant) determined by 19F NMR from U and N state populations. b, Destabilization (ΔΔGN-U,mut-WT) of all mutants in isolation compared to the RNC. WT, wild type. c, Destabilization of 4 mutants in isolation, on the ribosome (FLN5+67) and on the ribosome in the presence of 2.5 M urea. d, Left, temperature dependence of the folding equilibrium constant involving the N and U state of isolated FLN5Δ6, FLN5+34 and FLN5+34 in 1.5 M urea measured by 19F NMR fit to a modified Gibbs–Helmholtz equation (Methods). The error bars of individual datapoints (propagated from bootstrapped errors of NMR lineshape analyses) are similar in magnitude to the size of the circles. Right, thermodynamic parameters (T = 298 K) from the nonlinear fits (mean ± s.d.) shown as the difference relative to the isolated protein. Unless stated otherwise, all values in the figure represent the mean ± s.e.m. propagated from NMR lineshape fits.

Source data

We speculated that the re-balanced enthalpy–entropy compensation contributes to this buffering effect. Given the large contribution from increased nascent chain solvation (Fig. 2f), we also measured the destabilization of four hydrophobic mutants in the presence of 2.5 M urea (Extended Data Fig. 10c); urea weakens the hydrophobic effect by displacing several water molecules from the protein solvation shell42,43. By effectively reducing the gains in solvation of the unfolded RNC, we find that the mutations are less strongly buffered in urea (Fig. 5c). In agreement with this, the differences in entropy and enthalpy of folding on the ribosome (relative to the isolated protein) are reduced in urea compared with in pure water (Fig. 5d and Extended Data Fig. 10e–g). Thus, the magnitude and extent of mutation buffering correlates with the reduced temperature dependence of protein folding on the ribosome. We conclude that an additional consequence of the destabilized unfolded and folded states is to buffer and therefore mitigate the effect of destabilizing mutations during coTF folding.

Discussion

Here we have determined an atomistic structural ensemble of a nascent, unfolded protein tethered to the ribosome, analysed its differences to the protein in isolation, and studied its implications for coTF. Our structures reveal that the unfolded state on the ribosome is more structurally expanded and samples fewer long-range contacts than off the ribosome. The ribosome thus has a key role in shaping the conformational space of the emerging nascent chain. The expansion and increased solvation of the unfolded state on the ribosome (Fig. 6) result in reduced conformational and water entropies, a finding that constrasts with previous theoretical studies44,45. The entropic destabilization observed in the unfolded nascent chain relative to the isolated protein outcompetes the enthalpic stabilization provided by electrostatic ribosome interactions7 and increased solvation (Fig. 2f). Meanwhile, the native state structure or environment is perturbed (Extended Data Fig. 7i,j) and enthalpically destabilized on the ribosome relative to its isolated form6 (Figs. 2i and 3d), probably owing to the space constraints near the exit vestibule30 and long-range electrostatic effects from the negatively charged ribosome surface9,31. Together, these effects result in a marked re-balancing of the enthalpy–entropy compensation for protein folding that occurs on the ribosome (Fig. 6).

Fig. 6: The ribosome re-balances the enthalpy–entropy compensation of protein folding.
figure 6

The unfolded state is entropically destabilized owing to a lower conformational entropy and increased solvation, whereas the native state is enthalpically destabilized. This results in a reduction of the entropic penalty for folding and a less favourable folding enthalpy. The labels in grey and black indicate weak and strong energetic contributions, respectively.

The re-balanced enthalpy–entropy compensation of coTF folding provides a physical rationale for understanding differences in the folding on and off the ribosome. It enables nascent proteins to fold via distinct partially folded intermediates on the ribosome that are absent or significantly less stable in isolation6,24, as observed for multiple proteins3,6,14,15,17,32,33 including all three model systems in this work (Figs. 3 and 4). Although the biological effect could be reduced for small proteins that unfold and refold post-translationally, the substantial physical, thermodynamic changes on the ribosome are likely to affect the entire folded proteome during biosynthesis. Indeed, our analyses reveal that the expansion and destabilization of the unfolded state is partially caused by the physical effects of steric exclusion and tethering (Extended Data Fig. 5l,m) and is not dependent on sequence-specific ribosome–nascent chain interactions (Extended Data Fig. 7e–h). The ribosome can therefore act as a universal foldase, that in contrast to others is ATP-independent, and can promote the formation of functionally active proteins, many of which do not spontaneously unfold and refold off the ribosome1,18,19, including HRAS (Extended Data Fig. 9g–j).

The distinct thermodynamics of nascent chains may benefit other co-translational processes that are also entropically disfavoured, such as chaperone binding46,47, translocation47 or protein assembly47,48,49. The high stabilities of coTF intermediates across a wide folding transition, as observed for FLN56,24 and HRAS (Fig. 4d), may additionally provide a longer time frame for such processes to occur. Conversely, partially folded intermediates may result in the formation of non-productive states, such as off-pathway or misfolded species4,50,51,52, highlighting that in the cellular environment there is indeed a fine line between folding and misfolding on the ribosome.

Finally, we present quantitative evidence of mutation buffering by the ribosome as an additional consequence of the thermodynamic effects occurring co-translationally. Throughout evolution proteins diversify through mutations, most of which are destabilizing and impose limits on evolvability while maintaining a fold and function53,54. Many destabilizing mutants studied here would be expected to be fully unfolded co-translationally in the absence of buffering, despite being completely natively folded in isolation (Extended Data Fig. 10h), because folded structure is less stable on the ribosome (Fig. 3d). The buffering effect thus minimizes the increased population of unfolded nascent chain resulting from a destabilizing mutation, effectively promoting coTF over post-translational folding and averting potentially harmful consequences for mutant proteins. For example, accumulation of unfolded populations on the ribosome and failure of coTF have been linked to co-translational ubiquitination and degradation of nascent chains55. Additionally, lack of coTF could be detrimental for nascent chains that rely on co-translational complex assembly (more than 20% of the proteome48), chaperone engagement46, or cannot fold into an active conformation post-translationally1,18,19 (Extended Data Fig. 9g–j). Notably, cellular chaperones have also been implicated in mutation buffering and their availability has been linked to the rate at which proteins evolve53,56,57,58,59,60. CoTF may therefore also have a universal role in mutation buffering and evolution during the initial stages of protein folding before transferring nascent chains to chaperones47.

In conclusion, we have demonstrated that the ribosome entropically destabilizes the unfolded state. This provides a general, physical explanation for the fundamental differences in protein folding pathways and energetics observed in vitro versus on the ribosome. Beyond the effects of steric exclusion and tethering, other factors that contribute to the destabilization of the unfolded and native states on the ribosome remain unexplored. Deeper insights may decipher additional physical principles behind what we propose to be a universal phenomenon during de novo protein folding.

Methods

Protein expression and purification

DNA constructs of FLN5 were previously described7,11. Coding sequences for titin I27 and HRAS were introduced into the pLDC-17 vector using standard procedures. Further mutations were introduced using site-directed mutagenesis; for 19F labelling, amber stop codons were introduced6 in position 32 in HRAS, and residue 14 with an additional K87H point mutation in I27. FLN5 variants were expressed as His-tagged proteins and isotopically labelled in Escherichia coli BL21 DE3-Gold cells as previously described6,7; an identical protocol was used to produce purified samples of I27 and HRAS. RNC constructs comprised an arrest-enhanced variant of the SecM stalling sequence, FSTPVWIWWWPRIRGPP, as previously described6. Purification of isolated FLN5 A3A3 was performed by affinity chromatography followed by size-exclusion chromatography in the presence of 6 M urea prior to buffer exchange into Tico buffer (10 mM Hepes, 30 mM NH4Cl, 12 mM MgCl2, 1 mM EDTA). The full protein sequence of the FLN5 A3A3 is deposited together with its chemical shift assignment on the BMRB (entry 51023). For the RDC, pulse-field gradient NMR (PFG-NMR) and PRE-NMR experiments, the additional mutation C747V (referred to FLN5 A3A3 V747) was introduced to yield a cysteine-less construct for site-specific spin labelling. The protein concentration was determined using the BCA assay according to the manufacturer’s instructions. RNCs were expressed, isotopically labelled uniformly with 15N, or site-specifically with 19F, and purified as previously described6,7. For samples for intermolecular PRE-NMR experiments involving ribosome labelling, we generated modified E. coli BL21 strains with cysteine mutations in uL23 and uL24 using CRISPR as previously described2. RNC samples were prepared in Tico buffer for experiments. Western blot analyses were undertaken with an anti-hexahistidine horseradish peroxidase-linked antibody (Invitrogen, 1:5,000 dilution).

Fluorescent and PEG-maleimide labelling of 70S and RNC samples

Ribosomes and RNCs were first reduced using 2 mM TCEP overnight at 277 K, then buffer exchanged into labelling buffer. For fluorescein-5-maleimide and PEG-maleimide, labelling was performed in Tico at pH 7.5. ABD-MTS labelling was performed in labelling buffer (50 mM HEPES, 12 mM MgCl2, 20 mM NH4Cl, 1 mM EDTA, pH 8.0). Samples were labelled using a 10x molar excess of ABD-MTS, or fluorescein-5-maleimide. Cysteine mass-tagging by PEGylation was performed as previously described with 10,000-fold molar excess of PEG over sample7. ABD-MTS and PEGylation reactions were analysed using 12% Bis-Tris SDS–PAGE gels61. The fluorescein-labelled reactions were run on a 20% Tricine SDS–PAGE gel, modified from ref. 62.

NMR spectroscopy

All NMR experiments were recorded with Topspin 3.5pl2. NMR experiments of FLN5 A3A3 were performed in Tico buffer at pH 7.5 and 283 K. Chemical shifts were previously assigned7 and obtained from data recorded on a Bruker Avance III operating at 700 and 800 MHz equipped with TCI cryoprobes. All samples contained 10% (v/v) D2O and 0.001% (w/v) DSS as a reference. Data were processed analysed using NMRPipe63 (v11.7), CCPN64 (v2.4) and MATLAB (R2017a, Mathworks).

Amide 1H and 15N chemical shifts were obtained from two-dimensional 1H–15N SOFAST-HMQC experiments65 using an acquisition time of 50 ms in the direct dimension. The inter-scan delay was 50 ms. Cα chemical shifts were obtained from 3D BEST-HNCA experiments recorded at 800 MHz with acquisition times of ~50 ms and inter-scan delays of 150 ms. C’ chemical shifts were obtained from BEST HNCO experiments recorded at 700 MHz using acquisition times of ~50 ms and inter-scan delays of 200 ms. RNC samples were doped with 20 mM NiDO2A (Ni(ii) 1,4,7,10-tetraazacyclododecane-1,7-bis(acetic acid)) to enhance sensitivity66. Cosine-squared window functions were used in processing the spectra.

For PRE-NMR experiments, we used a cysteine-less construct with the C747V mutation and introduced six and eight labelling sites in the isolated and ribosome-bound protein, respectively. Samples were reduced overnight at 277 K in Tico supplemented with 2 mM TCEP. TCEP was then removed by buffer exchange into labelling buffer (50 mM HEPES, 12 mM MgCl2, 20 mM NH4Cl, 1 mM EDTA, pH 8.0) and subsequently labelled overnight at 277 K with 10× molar excess of MTSL. Following labelling, excess MTSL was removed by buffer exchanging the sample back into Tico buffer for NMR. The same labelling protocol was used for isolated protein and RNC samples. To measure the PREs, we recorded the signal intensities with MTSL in the paramagnetic and diamagnetic state. Direct measurements of relaxation rates proved not feasible for RNC samples due sensitivity limitations. 2D 1H–15N SOFAST-HMQC experiments65 were recorded at 800 MHz and 283 K using ~100 μM of protein or ~10 μM of RNC. Experiments were recorded with an acquisition time of 100 ms and 35 ms, in the direct and indirect dimension, respectively. The inter-scan delay was 450 ms to allow for complete relaxation. To acquire the diamagnetic data, the sample was reduced with 2.5 mM (RNC) or 100× molar excess (isolated) sodium ascorbate. Following complete reduction, the same HMQC experiment was recorded. To extract the PREs, spectral peaks were first fitted to a Lorentzian shape in both the direct and indirect dimension using NMRPipe63. Errors were obtained from the spectral noise (RMSE). From the fitted peaks, intensity ratios of Ipara/Idia were calculated and converted to PRE rates for Bayesian ensemble reweighting by numerically solving equation S34 (see Supplementary Notes 34) for Γ2. Sample integrity was monitored using interleaved 1H,15N SORDID diffusion measurements as previously described7.

PFG-NMR experiments were used to measure the diffusion coefficients and the Rh of FLN5 variants. 1D 1H,15N-XTSE diffusion measurements were recorded at 700 (FLN5, FLN5 Δ6) and 800 MHz (FLN5 A3A3). Eight to sixteen gradient strengths ranging linearly from 5% to 95% of the maximum gradient strength of 0.556 T m−1 were used. By measuring the signal intensity at each gradient strength, diffusion coefficients could be obtained by fitting the data to the Stejskal–Tanner equation67, which were converted to Rh using the Stokes–Einstein equation.

RDCs for isolated FLN5 A3A3 were measured in Tico buffer at 283 K and pH 7.5 in a PEG/octanol mixture68. RDCs are reported as the splitting of the isotropic splitting subtracted from the aligned splitting, corrected for the negative gyromagnetic ratio of 15N. RDCs were measured by preparing a solution containing 4.6% (w/w) pentaethylene glycol monooctyl ether (C8E5), 1-octonal (molar ratio 1-octanol:C8E5 = 0.94) and 110 μM of protein. Alignment was confirmed by measuring the D2O deuterium splitting at 283 K (17.6 Hz). All RDC NMR experiments were acquired on a Bruker Avance III HD 800 MHz spectrometer equipped with a TCI cryoprobe. A set of four different RDCs (1DNH, 1DCαCO, 1DCαHα and 2DHNCO) was measured per sample (isotropic and anisotropic) using the 3D BEST HNCO (JCOH and JCC) or BEST-HNCOCA (JCAHA) experiments69,70,71. The one-bond 1H−15N coupling was determined by recording two 15N-HSQC sub-spectra, in-phase (IP) and anti-phase (AP). For the measurement of the 1H-13CO coupling constants a BEST HNCO-JCOH experiment was used with an introduced DIPSAP filter. Such J-mismatch compensated DIPSAP spin-state filter offers an attractive approach for accurate measurement of small spin–spin coupling constants72. For that, three separate experiments were recorded with different filter lengths (2τ = 1/J) for each anisotropic and isotropic media, where the sub-spectra associated to the separated spin states (two in phase and one anti-phase) are combined using a linear relation k (IP) + (k − 1) (IP) ± (AP) with k = 0.73, the theoretical optimized scaling factor. The spectra were recorded with 144 × 104 × 1,536 complex points in the 13C(t1)/15N (t2)/1H (t3) dimensions, respectively, and with the spectral widths set to 15,244 Hz (1H), 2,070 Hz (15N) and 1,510 Hz (13C) for the HNCO-JCOH. For the HNCO-JCC and HNCOCA-JCAHA 256 × 200 × 1,536 complex points were acquired in the 13C(t1)/15N (t2)/1H (t3) dimensions, with spectral widths of 15,244 Hz (1H), 1,900 Hz (15N) and 1,214 Hz/5050 Hz (13C). The recycle delay was set to 200 ms, the acquisition time to 100 ms with 16 scans per increment, and the data was acquired in the non-uniform sampling format (2246 points for HNCO-JCOH and 7680 for the HNCO-JCC/HNCOCA-JCAHA experiments were sampled using the schedule generator from the web portal nus@HMS (http://gwagner.med.harvard.edu/intranet/hmsIST/). The time domain data was converted into the NMRPipe63 format and reconstructed using the sparse multidimensional iterative lineshape-enhanced method (SMILE)73. Coupling constants were obtained from line splitting in the 13C or 15N dimension obtained with CCPN analysis software64.

19F NMR experiments were recorded on a 500 MHz Bruker Avance III spectrometer equipped with a TCI cryoprobe at 298 K (unless otherwise indicated) using a 350 ms acquisition time and 1.5–3 s recycle delay as previously described6. We used an amber-suppression strategy to incorporate the unnatural amino acid tfmF, as previously described6. Multiple experiments were recorded in succession to monitor sample integrity over time also as previously described6. Data were processed using NMRPipe63. Spectra were baseline corrected, peaks were fit to Lorentzian functions and errors of the linewidths and integrals (that is, populations) were estimated using bootstrapping (200 iterations, calculating the standard error of the mean), or from the spectral noise for states whose resonance was not detectable, in MATLAB6. 19F-translational diffusion experiments were performed as previously described6.

Thermodynamic parameters of folding (ΔH, ΔS and ΔCp) were obtained from a nonlinear fit to a modified Gibbs–Helmholtz equation, assuming ΔCp remains constant across the experimental temperature range:

$${\rm{ln}}\left({K}_{{\rm{eq}},T}\right)=-\left(\frac{\Delta {H}_{{T}_{0}}+\Delta {C}_{{\rm{p}}}(T-{T}_{0})}{R}\right)\left(\frac{1}{T}\right)+\left(\frac{\Delta {S}_{{T}_{0}}+\Delta {C}_{{\rm{p}}}{\rm{ln}}\left(\frac{T}{{T}_{0}}\right)}{R}\right)$$
(S1)

Keq is the equilibrium constant, T is the temperature in Kelvin and T0 is the standard temperature (298 K). We also fitted the data to the linear van’t Hoff equation (assuming ΔCp = 0).

$${\rm{ln}}\left({K}_{{\rm{eq}}}\right)=-\left(\frac{\Delta H}{R}\right)\left(\frac{1}{T}\right)+\frac{\Delta S}{R}$$
(S2)

The Scipy package with optimize.curve_fit function was used to perform the fits74 and errors were estimated as one s.d. from the diagonal elements of the parameter covariance matrix. All parameters (ΔH, ΔS, ΔCp) generally showed strong correlations with each other (r ≥ 0.8), and thus, their uncertainties correlate also. These parameter correlations are expected75. The magnitudes of ΔH and −TΔS are also expected to correlate because we study the temperature dependence of folding in a range where ΔG of folding is close to 0.

Folding free energies were calculated from the experimental populations using ΔG = −RTln(K). The folding free energy of the FLN5+67 wild-type RNC, where no unfolded state is observable, was estimated on the basis of two destabilizing mutants FLN5(V664A/F665A) and FLN5(V707A). The stability of these mutants was measured using 19F NMR on and off the ribosome. The FLN5+67 wild-type folding energy (ΔGN-U) was then calculated as the average from the V664A/F665A and V707A mutants using ΔGWT,+67 = ΔGmut,+67 − ΔΔGmut-WT,iso, where ΔΔGmut-WT,iso is the experimentally measured destabilization in isolation. Given that at FLN5+34, both mutants show a weaker destabilization than in isolation, we reasoned that this estimate of the FLN5+67 wild-type folding free energy is its lower bound (most negative).

19F transverse relaxation rate (R2) measurements were recorded using a Hahn-echo sequence and acquired as pseudo-2D experiments with relaxation delays of 0.1 to 200 ms. Data were processed using NMRPipe and analysed using MATLAB. Data were fit to lineshapes and R2 was obtained by fitting the integrals to single exponential functions. We also orthogonally determined R2 from linewidth measurements of spectra acquired by 1D 19F pulse-acquire experiments, which showed excellent correlations. The lineshape-derived R2 values also showed good correlation with previously determined rotational correlation times25 (τC). We additionally determined the S2τC of FLN5 in 60% glycerol at 278 K by measurements of triple quantum build-up and single quantum relaxation as previously described76,77. Thus, our R2 values can be used to determine rotational correlation times (τC,exp). The obtained τC,exp was used to estimate the bound population as \(\frac{{\tau }_{{\rm{C}},\exp }-{\tau }_{{\rm{C,iso}}}}{{\tau }_{{\rm{C,bound}}}-\,{\tau }_{{\rm{C,iso}}}}\), where τC,iso is the rotational correlation time of the isolated protein25 (7.7 ns at 298 K) and τC,bound is the expected rotational correlation time of the bound state. τC,bound is taken as the rotational correlation time of the ribosome itself (~3,000 ns at 298 K) for a fully rigid bound state. From the bound populations (pB), the resulting change in the folding free energies of the intermediates was calculated as ΔΔGI-U,RNC-iso = RT(ln(1 − pB)). We report the estimate for a fully rigid bound state in the main text (\({S}_{{\rm{bound}}}^{2}=1.0\)) but note that even one order of magnitude more flexibility in the bound state (\({S}_{{\rm{bound}}}^{2}=0.1\)) only accounts for up to 1.1 ± 0.6 and 0.4 ± 0.1 kcal mol−1 of stabilization for I1 and I2 on the ribosome at FLN5+47, respectively. These estimates still cannot account for the >4 kcal mol−1 of intermediate stabilization observed on the ribosome6.

All NMR experiments of RNCs are recorded and continuously interleaved with a series of 1D 1H/19 F spectra and 1H,15N/19F diffusion measurements6,7,61,78. These provide the most sensitive means to assess changes in the sample, and when alterations in signal intensities or linewidths (that is, transverse relaxation rates), chemical shifts or translational diffusion measurements of the nascent chain are observed, data acquisition is halted. Only data corresponding to intact RNCs are summed together and subjected to a final round of analysis. Where signal-to-noise remains low, datasets from multiple samples are compared to ensure identical spectra, before summation together into a single NMR spectrum. Biochemical assays provide an orthogonal means to assess nascent chain attachment to the ribosome. Identical samples incubated in parallel with NMR samples are analysed by SDS–PAGE (under low pH conditions61) and detected with nascent-chain-specific antibodies. Ribosome-bound species migrate with an addition ~17-kDa band-shift relative to released nascent chains due to the presence of the tRNA covalently linked to the nascent chain. Combined with time-resolved NMR measurements, these analyses confirm that the reported NMR resonances originate exclusively from intact RNCs.

Mass spectrometry

FLN5 A3A3 was buffer exchanged into 100 mM (NH4)2CO3 at pH 6.8 (using formic acid for pH adjustment). Analyses were run on the Agilent 6510 QTOF LC–MS system at the UCL Chemistry Mass Spectrometry Facility. Samples contained ~10–20 μM of protein and 10 μl were injected onto a liquid chromatography column (PLRP-S, 1,000 Å, 8 μm, 150 mm × 2.1 mm, maintained at 60 °C). The liquid chromatography was run using water with 0.1% formic acid as mobile phase A and acetonitrile with 0.1% formic acid as phase B with a gradient elution and a flow rate of 0.3 ml min−1. ESI mass spectra were continuously acquired. The data were processed to zero charge mass spectra with the MassHunter software, utilizing the maximum entropy deconvolution algorithm.

Small-angle X-ray scattering

We measured SAXS of an isolated FLN5 A3A3 C747V sample in Tico buffer supplemented with 1% (w/v) glycerol. Data collection was performed at the DIAMOND B21 beamline (UK)79 with a beam wavelength of 0.9408 Å, flux of 4 × 1012 photons s−1 and an EigerX 4 M (Dectris) detector distanced at 3.712 m from the sample. A capillary with a 1.5 mm diameter kept at 283 K was used for data acquisition. We acquired SAXS data at multiple protein concentrations (5.5, 2.75, 1.38, 0.69, 0.34 and 0.17 mg ml−1) to assess whether the sample exhibited signs of aggregation or interparticle interference. At 5 mg ml−1, we observed weak signs of interparticle interference in the low q region of the scattering profile, which is also reflected in the Rg obtained by Guinier analysis (using the autorg tool from ATSAS80; Supplementary Table 1). Data were recorded as a series of frames, non-defective frames were averaged, and buffer subtracted with PRIMUS80. Size-exclusion chromatography–SAXS (SEC–SAXS) experiments were additionally performed in Tico buffer with 1% (w/v) glycerol using a KW402.5 (Shodex) column to confirm the monodispersity of the sample. We chose the 2.75 mg ml−1 dataset as the final dataset to compare with our molecular dynamics simulations. This dataset exhibited the highest signal to noise ratio and did not show signs of interparticle interference, and accordingly, the Rg obtained from the 2.75 mg ml−1 dataset is consistent with the value obtained from lower concentrations and the main SEC–SAXS peak (Supplementary Table 1).

Circular dichroism spectroscopy

The circular dichroism (CD) spectrum of isolated FLN5 A3A3 V747 was acquired in 10 mM Na2HPO4 pH 7.5 at 283 K. A Chirascan-plus CD spectrometer (Applied Photophysics), a protein concentration of 44 μM and a cuvette with a 0.5 cm pathlength were used.

HRAS refolding experiments

HRAS refolding experiments were performed with the HRAS G-domain (residues 1–166). The protein was unfolded overnight at 298 K in Tico buffer with 2 mM β-mercaptoethanol, 8 M urea and protein concentration of 15 μM. The protein was then refolded by rapidly diluting into Tico buffer (supplemented with 2 mM β-mercaptoethanol and 50 μM GDP) to reach final urea and protein concentrations of 0.94 M and 1.76 μM, respectively, and allowed to incubate at 298 K for 24 h. For NMR analyses of refolded samples, we prepared 18 μM of refolded protein with the same urea concentrations and dilutions.

We assayed the functional/activity state of HRAS using GDP/GTP nucleotide exchange (‘activity’) assay81 with fluorescently labelled GTP that exhibits higher fluorescence when bound to HRAS than free in solution (BODIPY FL GTP, ThermoFisher). 0.4 μM of HRAS, 0.01 μM of BODIPY GTP and 1 μM of SOScat (the catalytic domain of Son of sevenless) were incubated at room temperature and the maximum (plateau) fluorescence recorded and normalized by the signal of the buffer (signal/noise ratio). SOScat was produced as previously described82. Fluorescence measurements were performed using the CLARIOstar microplate reader (BMG Labtech) with excitation and emission wavelengths set to 488 and 514 nm, respectively.

The proteolytic stability of HRAS was assayed with thermolysin at a concentration of 0.05 mg ml−1 incubated with HRAS samples over the course of 5 h in vitro and 9 h in rabbit reticulocyte lysate (RRL, TNT coupled reticulocyte lysate, Promega). Reactions were quenched with 23 mM EDTA. Timepoints were analysed by western blot analysis using a pan-RAS polyclonal antibody (ThermoFisher, 1:1,000 dilution), utilizing an anti-rabbit IgG horseradish peroxidase-linked secondary antibody (Cell Signaling Technology, 1:1,000 dilution). Densitometry analyses were performed with ImageJ83. For the RRL experiments, refolding reactions were performed in RRL for 24 h at 298 K and a final HRAS concentration of 1.6 μM followed by pulse proteolysis and we quantified the relative band intensities (refolded/control) for each time point to account for increased background on the western blot during the proteolysis reaction.

Molecular dynamics simulations

We used the FLN5 A3A3 C747V sequence for all simulations. A reliability and reproducibility checklist is provided in Supplementary Table 8. GROMACS (version 2021)84 was used for all all-atom molecular dynamics simulations in explicit solvent. We employed the Charmm36m force field in combination with the CHARMM TIP3P water model (C36m) and the CHARMM TIP3P water model with an increased water hydrogen LJ well-depth (denoted here as C36m+W)85. We also used the a99sb-disp force field together the a99sb-disp TIP4P-D water model86. Default protonation states were used in all cases. Starting from a random extended conformation, for all force field combinations the system was solvated in a dodecahedron box with 151,135 water molecules and 12 mM MgCl2 (resulting in 455,116 atoms and an initial box volume of 4,688 nm3). Systems were then energy minimized using the steepest-decent algorithm. For the following dynamics simulations, we used the LINCS algorithm87 to constrain all bonds connected to hydrogen and a timestep of 2 fs using the leap-frog algorithm for integration. Nonbonded interactions were calculated with a cut-off at 1.2 nm (including a switching function at 1.0 nm for van der Waals interactions) and the particle mesh Ewald (PME) method88 was used for long-range electrostatic calculations. We then equilibrated the systems in two phases. First, we performed a 500 ps equilibration simulation in the NVT ensemble with position restraints on all protein heavy atoms. The temperature was kept at 283 K using the velocity rescaling algorithm89 and a time constant of 0.1 ps. Next, we further equilibrated the systems for 500 ps in the NPT ensemble at 283 K and a pressure of 1 bar with a compressibility of 4.5 × 10−5 bar−1 using the Berendsen barostat90. Following equilibration, we relaxed our initial structure for 100 ns at 283 K without any position restraints using the Parrinello–Rahman algorithm91 and then picked five structures from this simulation for production simulations. We ran a total of 5× 2 μs (with different initial coordinates and velocities) yielding a total of 10 μs of sampling per force field. For the C36m+W combination we ran an additional 5× 2 μs starting from 5 new starting structures yielding 20 μs in total.

We also generated a prior ensemble with a physics-based coarse-grained (C-alpha) model. We generated the C-alpha model template from the FLN5 crystal structure using SMOG 2.392, where all bonded terms have a global energy minimum at the values taken in the crystal structure93. Nonbonded van der Waals interactions were modelled using a 10–12 Lennard-Jones potential with σ and λ parameters described in the M1 parameter determined by Tesei et al.94 (equation (S3)). We used the arithmetic mean of two residues to determine σ and λ. Electrostatic interactions were modelled using the Debye–Hückel theory with parameters described previously7. Interactions between Cα beads separated by less than four residues were excluded. We ran initial simulations at a range of reduced temperatures to determine the effect on the average compactness and ran final simulations at a reduced temperature of 1.247 (150 K in GROMACS) as we did not observe a significant increase in average Rg beyond this temperature. Simulations were run for a total of 3 × 109 steps with GROMACS (v2018.3).

$${u}_{{\rm{LJ}}}=\mathop{\sum }\limits_{i}^{N}\lambda \left[{5\left(\frac{\sigma }{r}\right)}^{12}-6{\left(\frac{\sigma }{r}\right)}^{10}\right]$$
(S3)

After simulations, the coarse-grained ensemble was backmapped to an all-atom structure using PULCHRA (v3.06)95.

RNC simulations were parameterized using the C36m+W force field/water model combination85,96. We modelled the ribosome using the structure PDB 4YBB97 as a template, which we previously refined against a cryo-EM map containing an FLN5 RNC98. As in our previous work, we only retained ribosome atoms around the nascent chain exit tunnel and accessible surface outside the vestibule7. The FLN6 linker and SecM sequence were initially modelled using a cryo-EM map of a FLN5+47 RNC (Mitropoulou et al., manuscript in preparation). The rest of the nascent chain (MHHHHHAS N-terminal tag and FLN5) was then built using PyMol version 2.3 (The PyMol Molecular Graphics System, Schrödinger) and we generated a random initial starting structure with a short simulation using a structure-based force field, SMOG2.392, without native contacts. The FLN5+31 A3A3 RNC (containing the C747V mutation) complex was then centred in a dodecahedral box, solvated using 1,030,527 water molecules and neutralized with 706 Mg2+ ions, resulting in a final system size of 3,163,127 atoms. The initial box volume was 32,117 nm3. The large box size was necessary to accommodate the highly expanded unfolded state. We then used the same cut-offs and simulation methods as for the isolated protein. We initially also ran a 500 ps equilibration simulation in the NVT ensemble using position restraints on all heavy atoms using a force constant of 1,000 kJ mol−1 nm2 in along the x, y and z axes. We used a temperature of 283 K, which was held constant using the velocity rescaling algorithm89 and a time constant of 0.1 ps. Then, we ran a 500 ps equilibration simulation in the NPT ensemble at 283 K using the same position restraints. The pressure was kept at 1 bar with a compressibility of 4.5 × 10−5 bar−1 using the Berendsen barostat90. The position restraints for all nascent chain atoms (except the terminal residue at the PTC in the ribosome) were then removed, while the ribosome atoms kept being position restrained. In this setup, we ran a 1 ns equilibration simulation at 283 K and 1 bar, using the Parrinello–Rahman algorithm91. All production simulations were performed using position restraints for the ribosome atoms and C-terminal nascent chain residue at the PTC. Using the equilibrated configuration, we then ran two simulations of ~100 ns to picked ten starting structures for production simulations. Then, ten production simulations of 1.5 μs each (15 μs) were initiated from these different starting structures using random initial velocities. Before the production simulation, each structure was re-equilibrated at 283 K and 1 bar with a 500 ps NVT and 500 ps NPT simulation.

Lastly, to compare our C36m+W simulations with a model that only considers steric exclusion as a nonbonded interaction, we also ran simulations of a simple all-atom model, based on a structure-based model template92. We used the FLN5 crystal structure to define the energy minima of all bond and dihedral angles and removed all native contacts. Simulations of isolated and ribosome-bound FLN5 A3A3 were run for 1 × 109 steps and 100,000 frames were harvested for analysis. This ensemble was used to compare the expansion of the ensemble, ribosome interactions and conformational entropy with the C36m+W simulations.

Calculation of PREs

The transverse PRE rates of backbone amide groups, Γ2, were back-calculated from the ensembles using the Solomon–Bloembergen equation99,100

$${\varGamma }_{2}=\frac{1}{15}{\left(\frac{{\mu }_{0}}{4\pi }\right)}^{2}{\gamma }_{{\rm{H}}}^{2}\,{g}_{{\rm{e}}}^{2}{\mu }_{{\rm{B}}}^{2}S(S-1)[4J(0)+3J({\omega }_{{\rm{H}}})]$$
(S4)

where μ0 is the permeability of space, γH is the gyromagnetic ratio of the proton, ge is the electron g-factor, γB is the Bohr magneton, S is the proton nuclear spin and J(ω0) is the generalized spectral density function. For flexible spin labels attached via rotatable bonds the spectral density can be expressed as in equation (S5)101.

$$J({\omega }_{{\rm{H}}})=\langle {r}^{-6}\rangle \left[\frac{{S}^{2}{\tau }_{{\rm{c}}}}{1+{({\omega }_{{\rm{H}}}{\tau }_{{\rm{c}}})}^{2}}+\frac{(1-{S}^{2}){\tau }_{{\rm{t}}}}{1+{({\omega }_{{\rm{H}}}{\tau }_{{\rm{t}}})}^{2}}\right]$$
(S5)

where \(\langle {r}^{-6}\rangle \) is the average of the electron–hydrogen distance (r) distribution, S2 is the generalized order parameter for the electron–hydrogen interaction vector, τC is the correlation time defined in terms of the rotational correlation time of the protein (τr) and the electron spin relaxation time (τs):

$${\tau }_{c}={\left({\tau }_{r}^{-1}+{\tau }_{s}^{-1}\right)}^{-1}$$
(S6)

τt is the total correlation time defined as:

$${\tau }_{{\rm{t}}}={\left({\tau }_{r}^{-1}+{\tau }_{s}^{-1}+{\tau }_{{\rm{i}}}^{-1}\right)}^{-1}$$
(S7)

τi is the internal correlation time of the spin label. Since for nitroxide labels electron spin relaxation occurs on a much slower timescale than rotational tumbling101,102, τC can be approximated to τr such that expression for τt simplifies to

$${\tau }_{{\rm{t}}}={\left({\tau }_{{\rm{C}}}^{-1}+{\tau }_{i}^{-1}\right)}^{-1}$$
(S8)

Given that τC is not known a priori, we iteratively scanned τC values in the range of 1 to 15 ns to find a value for which optimal agreement with the experimental data is achieved (as judged by the reduced χ2)94,103. The spin label correlation time104, τi was set to 500 ps, in agreement with molecular dynamics simulations105 and electron spin resonance measurement106.

The generalized order parameter S2 for the electron–hydrogen interaction vector can be decomposed into its radial and angular components107:

$${S}_{{\rm{PRE}}}^{2}\approx {S}_{{\rm{PRE}},{\rm{angular}}}^{2}{S}_{{\rm{PRE}},{\rm{radial}}}^{2}$$
(S9)

where the individual components are defined as

$${S}_{{\rm{PRE,angular}}}^{2}=\frac{4\pi }{5}\mathop{\sum }\limits_{m=-2}^{2}{\left|\langle {Y}_{2}^{m}({\varOmega }^{{\rm{mol}}})\rangle \right|}^{2}$$
(S10)
$${S}_{{\rm{PRE}},{\rm{radial}}}^{2}=\langle {r}^{-6}{\rangle }^{-1}\langle {r}^{-3}{\rangle }^{2}$$
(S11)

and \({Y}_{2}^{m}\) are the second order spherical harmonics and Ωmol are the Euler angles in the frame. A weighted ensemble average of S2 can be calculated by taking a weighted ensemble average of the individual radial and angular components.

A previously published rotamer library containing 216 MTSL rotamers108 was used to explicitly model the flexibility of the spin label, similar to other existing methods109,110. The rotamer library was aligned to all employed labelling sites for each conformer using the backbone atoms of the labelling site and Cys-MTSL moiety. Clashing rotamers were discarded, where a steric clash between the rotamer and the protein was defined using a 2.5 Å cut-off distance. Only backbone and Cβ atoms were considered for the protein, assuming sidechains can rearrange to accommodate the MTSL rotamer111. For MTSL, only the sidechain was included (heavy atoms beyond the Cβ atom). Protein frames for which at least one labelling position cannot sterically allow any MTSL rotamers were discarded. The rotamer library was used to calculate a weighted ensemble-averaged Γ2 over the rotamer ensemble for each protein conformer in the protein ensemble using equations (S3S11). The protein ensemble average can then be calculated by averaging Γ2 over the ensemble.

PRE intensity ratios were then calculated from the ensemble-averaged PRE rate, \(\langle {\varGamma }_{2}\rangle \), using

$$\frac{{{\rm{I}}}_{{\rm{p}}{\rm{a}}{\rm{r}}{\rm{a}}}}{{{\rm{I}}}_{{\rm{d}}{\rm{i}}{\rm{a}}}}=\frac{{R}_{2}{{\rm{e}}}^{-2\Delta \langle {\varGamma }_{2}\rangle }}{{R}_{2}+\langle {\varGamma }_{2}\rangle }\times \frac{{R}_{2,{\rm{M}}{\rm{Q}}}}{{R}_{2,{\rm{M}}{\rm{Q}}}+\langle {\varGamma }_{2}\rangle }$$
(S12)

where R2 is the linewidth in the proton dimension (residue-specific), R2,MQ is the linewidth in the nitrogen dimension (multiple-quantum term) and Δ is the delay time in the HMQC experiment (5.43 ms). See Supplementary Note 3 for additional details.

For RNCs, we considered that that ribosome tethering may increase the correlation time of the electron–amide interaction vector due to restricted molecular tumbling near the exit tunnel. We therefore calculated an order parameter, \({S}_{{\rm{NC}}}^{2}\), which quantifies the motion of the electron-interaction vector over the entire nascent chain conformer ensemble (\({S}_{{\rm{NC}}}^{2}\) is distinct from the order parameter S2 that quantifies the motion of the MTSL rotamer library attached to a labelling site for a specific protein conformer; equation (S9)). S2 is given by

$${S}_{{\rm{NC}}}^{2}\approx {S}_{{\rm{NC,angular}}}^{2}{S}_{{\rm{NC,radial}}}^{2}$$
(S13)

where \({S}_{{\rm{NC}},{\rm{angular}}}^{2}\) and \({S}_{{\rm{NC}},{\rm{radial}}}^{2}\) are given by

$${S}_{{\rm{PRE}},{\rm{angular}}}^{2}=\frac{4\pi }{5}\mathop{\sum }\limits_{m=-2}^{2}{| \langle {Y}_{2}^{m}({\varOmega }^{{\rm{mol}}})\rangle | }^{2}$$
(S14)
$${S}_{{\rm{PRE}},{\rm{radial}}}^{2}=\langle {r}^{-6}{\rangle }^{-1}\langle {r}^{-3}{\rangle }^{2}$$
(S15)

and \({Y}_{2}^{{\rm{m}}}\) are the second order spherical harmonics and Ωmol are the Euler angles in the frame. We approximated the position of the free electron with the Cα atom of the labelling site in this case. A \({S}_{{\rm{NC}}}^{2}\) value of 0 indicates that the vector tumbles completely independent of the ribosome and that the correlation time of the electron–amide vector is the same as for the isolated protein, τC,iso. A \({S}_{{\rm{NC}}}^{2}\) value of 1 means that the vector tumbles with the same rotational correlation time as the ribosome (τr,70S = 3.3 μs per cP, as determined by fluorescence depolarization112, and τr,70S = 4.3 μs at 283 K in H2O). The effective correlation time, τC,eff, of each amide-electron vector is given by

$${\tau }_{{\rm{C,eff}}}\,={S}_{{\rm{NC}}}^{2}{\tau }_{{\rm{r,70S}}}+(1-{S}_{{\rm{NC}}}^{2}){\tau }_{{\rm{C,iso}}}$$
(S16)

We used a value of 3 ns for τC,iso, which was the optimal value determined for isolated FLN5 A3A3. Generally, τC (equation (S6)) is approximated as τC ≈ τr because the electron spin relaxation time, τs, occurs on a much slower timescale. In fact, measurements of the spin relaxation time of nitroxides have been measured to be on a timescale from hundreds of nanoseconds to several microseconds113,114,115. The calculated values of τC,eff are predominantly below 100 ns except for labelling sites C744, uL23 G90C and uL24 N53C, where values of up to ~250 ns are observed (Supplementary Tables 5 and 6). Thus, we still expect τC to be dominated by τr and make use of the τC ≈ τ approximation.

Finally, reference PRE profiles for a fully extended peptide were calculated from a linear polyalanine chain using a τC of 5 ns and R2,H/R2,MQ of 100 Hz.

Bayesian inference reweighting

We performed ensemble refinement by reweighting the molecular dynamics-derived ensembles against the experimentally deduced Γ2 rates using the Bayesian Inference of Ensembles (BioEn) software and method described in the corresponding paper116,117. These calculations were performed using in-house scripts of the software with the modification to incorporate upper and lower bound restraints in addition to regular restraints with gaussian errors. To this end, these inequality restraints were treated as normal gaussian restraints but subjected to a conditional statement. Lower bound restraints (Γ2 > 64.5 s−1 for isolated FLN5 A3A3; Γ2 > 96.0 s−1 for the RNCs) were applied only if the back-calculated Γ2 was below the lower bound value. Similarly, upper bound restraints (Γ2 < 2.2 s−1 for isolated FLN5 A3A3; Γ2 < 3.7 s−1 for the RNCs) were applied only if the back-calculated average was above the upper bound. This effectively allows the back-calculated value to vary freely above the lower bound and below the upper bound but imposes a penalty if the inequality condition is not met. The errors of the lower and upper bound values were taken as the combined relative error of that datapoint (that is, the intensity ratio).

As described by Köfinger et al.117, the reweighting optimization problem can be efficiently solved by minimizing the negative log-posterior function (L).

$$L=\theta {S}_{{\rm{KL}}}+\mathop{\sum }\limits_{i=1}^{M}\frac{{\left({\sum }_{\alpha =1}^{N}{w}_{\alpha }{y}_{i}^{\alpha }-{Y}_{i}\right)}^{2}}{2{\sigma }_{i}^{2}}$$
(S17)

θ expresses the confidence in the initial ensemble, N is the ensemble size, M is the number of experimental restraints, wα is the vector of weights for the conformers in the ensemble, \({y}_{i}^{\alpha }\) is the back-calculated experimental value i, Yi is the experimental restraint i, σi is the uncertainty of experimental restraint i, and SKL is the Kullback–Leibler divergence defined as

$${S}_{{\rm{KL}}}=\mathop{\sum }\limits_{\alpha =1}^{N}{w}_{\alpha }{\rm{ln}}\frac{{w}_{\alpha }}{{w}_{\alpha }^{0}}$$
(S18)

\({w}_{\alpha }^{0}\) is the vector of initial weights (which were uniform). We used the log-weights method to minimize the negative log-posterior117 and performed reweighting calculations for a range of θ values, as the optimal value of θ cannot be known a priori. Therefore, we conduct L-curve analysis117,118 by plotting SKL (entropy) on the x axis and the goodness of fit, quantified by the reduced χ2 value, on the y axis. The reduced χ2 was calculated against the experimental intensity ratios (Ipara/Idia). This is an effective method to prevent overfitting and introducing a minimal amount of bias into the prior ensemble117,119. After reweighting, we also calculated the effective fraction of frames contributing to the ensemble average119 as an indication of the extent of fitting.

$${N}_{{\rm{eff}}}=\exp (-{S}_{{\rm{KL}}})$$
(S19)

For RNCs, we used the same approach with an additional modification. Since the PRE depends on τC,eff and \({S}_{{\rm{NC}}}^{2}\) which are a function of the weights of individual structures in the ensemble, this consequently leads to changes in τC,eff and \({S}_{{\rm{NC}}}^{2}\) when reweighting is performed. Therefore, the conformer-specific PRE values that were used for reweighting are not the same anymore after reweighting. To account for this, we performed 20 iterative rounds of reweighting where each additional round receives input weights and τC,eff from the previous round. We found that this leads to convergence of the weights and conformer-specific PREs.

We found that for the ribosomal labelling sites, uL23 G90C and uL24 N53C, the reweighting results are sensitive to the specific ribosome structure used to fit the MTSL rotamer library to, since small variations in the local structure of the labelling site can lead to different rotamer distributions. We tested two different rotamer distributions for the ribosomal labelling sites (Extended Data Fig. 5a), finding that one of them (referred to as R2) gives better agreement with the intermolecular PRE data after reweighting and fits better into the expected density or MTSL rotamers when rotamers are fitted to ten high-resolution ribosome structures (Extended Data Fig. 5a). The R2 rotamer distribution is more representative of the expected variation from structural changes in the labelling sites and was therefore used for our final reweighting calculations.

Calculation of RDCs, R h and chemical shifts from molecular dynamics simulations

To back-calculate the Rh from static structures we used an approximate relationship between Rg and Rh values120, the latter being calculated from the programme HYDROPRO121. Thus, we calculated the Rg from Cα atoms using MDAnalysis122 and then converted it to Rh using

$${R}_{{\rm{h}}}=\frac{{R}_{{\rm{g}}}}{\frac{{\alpha }_{1}\left({R}_{{\rm{g}}}-{\alpha }_{2}{N}^{0.33}\right)}{{N}^{0.60}-{N}^{0.33}}+{\alpha }_{3}}$$
(S20)

N is the number of amino acids, α1 takes a value of 0.216 Å−1, α2 takes a value of 4.06 Å, and α3 has a value of 0.821. The estimated value of Rh (relative to the HYDROPRO calculation) has an average relative uncertainty120 of 3%. HYDROPRO itself has a relative uncertainty of ±4% with respect to experimental values121. Therefore, we treat the back-calculated ensemble-average Rh with a total relative uncertainty of ±5%. The ensemble average was calculated as previously described by Ahmed et al. for back-calculation of PFG-NMR derived values123 of Rh

$$\langle {R}_{{\rm{h}}}\rangle ={\rm{ln}}{\left(\langle \exp \left(-{R}_{{\rm{h}}}^{-1}\right)\rangle \right)}^{-1}.$$
(S21)

Chemical shifts were calculated using the SHIFTX2 software124 and RDCs were calculated using the global alignment prediction method implemented in PALES125. We then scaled the magnitude (that is, the extent of alignment) of the calculated RDCs by a global factor to optimize the Q-factor for each ensemble.

Calculation of SAXS profiles from molecular dynamics simulations

We used Pepsi-SAXS126 to compute the theoretical scattering profiles of each conformer in the molecular dynamics ensembles. We treated the contrast of the hydration layer p) and the effective atomic radius (r0) as global parameters and used values of 3.34 e nm−3 and 1.025 × rm (rm = average atomic radius of the protein) in line with previous work that showed these parameters to well suited for flexible proteins127. The constant background and scale factor were also fitted globally using least-squares regression103,127. The goodness of fit was assessed using the reduced χ2 metric, where n is the number of datapoints, q is the scattering angle, \({I}_{q}^{{\rm{calc}}}\) and \({I}_{{q}}^{\text{exp}}\) are the calculated and experimental scattering intensities, respectively, and σq is the experimental error:

$${\chi }_{r}^{2}=\frac{1}{n}\mathop{\sum }\limits_{{q}}^{n}\frac{{({I}_{{q}}^{\text{calc}}-{I}_{{q}}^{\text{exp}})}^{2}}{{\sigma }_{{q}}^{2}}$$
(S22)

Structural analysis

The Python package MDAnalysis122 and MDTraj128 were used for general analysis of the ensembles involving atomic coordinates. For native contact analysis, we calculated the fraction of native contacts (relative to the native FLN5 crystal structure) as129

$$Q(X)=\frac{1}{N}\sum _{ij}\frac{1}{1+{{\rm{e}}}^{\left(\beta \left({r}_{i,j}-\lambda {r}_{i,j}^{0}\right)\right)}}$$
(S23)

where ri,j and \({r}_{i,j}^{0}\) are the distances between atoms i and j in frame X and the template structure, respectively, β modulates the smoothness of the switching function (default value 5 Å−1 used) and λ is a factor allowing for fluctuations of the contact distance (default value 1.8 used).

Asphericity was calculated using MDAnalysis122 as defined by Dima and Thirumalai130:

$$\Delta =\frac{3}{2}\frac{{\sum }_{n=1}^{3}{\left({\lambda }_{i}-\bar{\lambda }\right)}^{2}}{tr{T}^{2}}$$
(S24)

\(\bar{\lambda }\) represents the mean eigenvalue obtained from the inertia tensor \(\bar{\lambda }=\,\frac{{\rm{trT}}}{3}\).

For the intrachain contact analysis, we defined contacts between Cα–Cα distances of less than 10 Å. The contact features qualitatively were unchanged when using lower cut-off values or when calculating contacts between all heavy atoms. Secondary structure populations were calculated using DSSP131 implemented in MDTraj. The SASA was calculated using GROMACS84. Clustering was also performed in GROMACS using the GROMOS algorithm132 and Cα RMSD cut-offs in the range of 1.2–1.8 nm.

Error analysis from ensembles

Errors from the molecular dynamics ensembles were estimated using a block analysis of the full concatenated ensembles (composed of multiple statistically independent simulations). We performed block analysis for the concatenated ensembles to verify that the estimate of the standard error of the mean (s.e.m.) plateaus/fluctuates at block sizes larger than blocks corresponding to the individual trajectories. The final block size was chosen either in the plateau region of block analysis plots or corresponding to the blocks of the statistically independent simulation (10 independent simulations were run for the isolated and RNC systems with the C36m+W force field, and thus 10 blocks were chosen for block analysis and error estimation). The error after reweighting with PRE-NMR data was calculated the same way using a weighted standard error, where blocks are weighted according to the weights obtained from reweighting with PRE-NMR data. Exemplar block analysis plots are shown in Supplementary Fig. 8.

Energetic analyses from structural ensembles

The conformational entropy was calculated as defined by Baxa et al.133. Proline, glycine and alanine entropies were calculated from the backbone probability distribution Pi(Φ,Ψ). Residues with a maximum of two sidechain torsion angles, Xn, the entropy was calculated from the probability distribution Pi(Φ,Ψ,X1,X2), while residues with more sidechain torsion angles was calculated from the sum of entropies obtained using the Pi(Φ,Ψ,X1), and Pi(Xn), after subtraction of the entropy obtained from Pi(X1). Entropies were calculated from probability distributions using \(S={-k}_{{\rm{B}}}{\sum }_{i=1}^{n}{P}_{i}{\rm{ln}}({P}_{i})\). We used a block analysis from the pooled ensembles (i.e., all individual trajectories concatenated together) to check that the entropy difference between on and off the ribosome is robust with respect to sampling by calculating entropy changes with increasing amounts of total sampling (from the 15 and 20μs of concatenated sampling for the RNC and isolated protein, respectively). The errors were then also estimated from the same sampling/block sizes up to 7.5 μs of molecular dynamics sampling. This is because the estimate of entropy differences trend increases up to total sampling times of 7.5 μs (Extended Data Fig. 6g).

The energetic contributions due to changes in solvation were estimated based on empirical relationships between changes in the polar and apolar accessible surface area75,134 (ΔASApolar and ΔASAapolar). The apolar and polar surface area of the protein were defined based on the atomic partial charges in the C36m force field85. Atoms with an absolute charge of less than or equal to 0.3 were defined as apolar. The change in heat capacity of hydration is related to these quantities by

$$\Delta C=\Delta {C}_{{\rm{apolar}}}+\Delta {C}_{{\rm{polar}}}=\alpha \times \Delta {{\rm{ASA}}}_{{\rm{apolar}}}+\beta \times \Delta {{\rm{ASA}}}_{{\rm{polar}}}$$
(S25)

where α and β are 0.34 ± 0.11 and −0.12 ± 0.12 cal mol−1 K−1 Å−2, respectively. We obtained these values as an average and standard deviation of parameters previously reported in the literature as summarized in ref. 135 to account for the uncertainty of the parameters in addition to the uncertainty coming from conformational sampling in our simulations. The enthalpy change due to solvation is then obtained from75

$$\Delta {H}_{{\rm{solv}}}\left(333\,{\rm{K}}\right)=\gamma \times \Delta {{\rm{ASA}}}_{apolar}+\delta \times \Delta {{\rm{ASA}}}_{polar}$$
(S26)
$$\Delta {H}_{{\rm{solv}}}\left(T\right)=\Delta {H}_{{\rm{solv}}}\left(333\,{\rm{K}}\right)+\Delta C(T-333\,{\rm{K}})$$
(S27)

T is the temperature and γ and δ constants taking on values of −8.44 and 31.4 cal mol−1 Å−2, respectively. While we are not aware of alternative parameter sets for the solvation enthalpy (equation (S26)) in the literature, we treated these parameters with a relative uncertainty of 50% to show that even with such high levels of uncertainty our conclusions are not affected. Finally, the solvation entropy and change in free energy are then calculated using

$$\Delta {S}_{333{\rm{K}},{\rm{solv}}}=\Delta {C}_{{\rm{apolar}}}\,{\rm{ln}}\left(\frac{T}{{T}_{{\rm{apolar}}}}\right)-\Delta {C}_{{\rm{polar}}}\,{\rm{ln}}\left(\frac{T}{{T}_{{\rm{polar}}}}\right)$$
(S28)
$$\Delta {G}_{{\rm{solv}}}=\Delta {H}_{{\rm{solv}}}-T\Delta {S}_{{\rm{solv}}}$$
(S29)

where Tapolar and Tpolar are the temperatures at which ΔSsolv,apolar and ΔSsolv,polar are 0 (385 K and 335 K, respectively). Our previous work indicated ribosome solvation changes during coTF is not a major factor in coTF thermodynamics (see Supplementary Note 9), we estimated the above quantities using surface areas calculated excluding the ribosome. We regard these absolute quantities as an estimated upper bound for ΔGsolv because it is likely that folding intermediates and the native state also interact with the ribosome6, thus effectively cancelling out any reduction in SASA of the unfolded state due to ribosome interactions. However, in the following section we describe an alternative, more direct approach for the solvation entropy that does not rely on this assumption.

Calculation of solvation entropy changes using the 2PT method

The water and solvation entropy changes were also assessed more directly from molecular dynamics simulations using the two-phase thermodynamic (2PT) method136 implemented in the DoSPT code (https://dospt.org/index.php/DoSPT)137. For these calculations, we chose five snapshots from our isolated FLN5 A3A3 V747 simulations detailed above (that is, with different initial protein conformations and solvent configuration) and use these to initiate short molecular dynamics simulations for entropy calculations. We first re-equilibrated the boxes for 10 ns at the target temperature in the NPT ensemble at 1 bar using the velocity rescaling algorithm89 and the Parrinello–Rahman algorithm91 as detailed above and the velocity Verlet integration algorithm (md-vv in GROMACS84). Production simulations were then run in the NVT ensemble at 283 K and 298 K (to assess the effect of temperature on the water entropy calculations) for 20 ps using the md-vv integrator and saving coordinates and velocities for analysis every 4 fs. Control simulations of pure TIP3P (CHARMM TIP3P) water in a cubic box with a box vector length of 5 nm, resulting in 4,055 water molecules. Five independent simulations were performed by first energy minimizing the system using the steepest-decent algorithm. Then, using a 2 fs timestep and thermostat/barostat settings as for the protein and the md-vv integrator we equilibrated the water box first in the NVT ensemble for 1 ns, followed by 1 ns in the NPT ensemble using the Berendson barostat90. The water box was then further equilibrated in the NVT ensemble for 1 ns prior to the production simulation in the NVT ensemble for 20 ps, saving coordinates and velocities every 4 fs. These production simulations were also performed at 283 K and 298 K and then used to calculate the molar entropies of pure water at these temperatures with DoSPT.

For water entropy calculations in the protein system, we first analysed the radial distribution function water surrounding the protein molecule using our 15 μs and 20 μs molecular dynamics ensembles of the isolated protein and RNC and the GROMACS rdf functionality84 to identify the region of the first two hydration shells that show significantly reduced water dynamics. Using this analysis, we chose a distance cut-off of 3.5 Å between the protein and water centre of mass to define the hydration layer around the protein. With this criterion we then calculated the probability distribution and average number of water molecules in the hydration layer to assess the difference in solvation on and off the ribosome. Water molecules that remain within a defined distance range from the protein during the entire 20 ps production simulation were then selected to calculate the average molar entropy per molecule of water in different environments with DoSPT. The accessible volume for this subsystem was estimated by using the average volume occupied per water molecule in a pure water box under identical conditions multiplied by the number of molecules. To obtain the change in solvation entropy (difference between the RNC and isolated system, ΔSsolv,RNC-iso), we used

$$\Delta {S}_{{\rm{solv,RNC}}-{\rm{iso}}}={N}_{{\rm{diff}}}\Delta {S}_{{\rm{solv,water}}}$$
(S30)

where Ndiff is the average difference in the number of water molecules in the hydration layer (RNC-iso) and ΔSsolv,water is the entropy difference between water molecules in the hydration layer (0–3.5 Å from the protein) and water molecules in bulk solution (defined here as 36–46 Å from the protein).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.