The ribosome lowers the entropic penalty of protein folding

Most proteins fold during biosynthesis on the ribosome 1, and co-translational folding energetics, pathways and outcomes of many proteins have been found to differ considerably from those in refolding studies 2–10. The origin of this folding modulation by the ribosome has remained unknown. Here we have determined atomistic structures of the unfolded state of a model protein on and off the ribosome, which reveal that the ribosome structurally expands the unfolded nascent chain and increases its solvation, resulting in its entropic destabilization relative to the peptide chain in isolation. Quantitative 19F NMR experiments confirm that this destabilization reduces the entropic penalty of folding by up to 30 kcal mol−1 and promotes formation of partially folded intermediates on the ribosome, an observation that extends to other protein domains and is obligate for some proteins to acquire their active conformation. The thermodynamic effects also contribute to the ribosome protecting the nascent chain from mutation-induced unfolding, which suggests a crucial role of the ribosome in supporting protein evolution. By correlating nascent chain structure and dynamics to their folding energetics and post-translational outcomes, our findings establish the physical basis of the distinct thermodynamics of co-translational protein folding.

We recorded paramagnetic relaxation enhancement (PRE) NMR experiments of FLN5 A3A3 on and off the ribosome (Extended Data Fig. 1B-D). PREs provide a sensitive measure of long-range distances (~10-25 Å) within disordered proteins 106,143-145. PREs were measured as the 1H,15N-correlated NMR signal intensities of resonances in the paramagnetic state relative to those recorded under diamagnetic (i.e., reducing) conditions, and were plotted along the protein sequence (Extended Data Fig. 1C-D; see Supplementary Figure 1 for full spectra). Six labelling sites within the FLN5 sequence (in a variant lacking the native cysteine, C747V) were chosen for MTSL labelling, focusing on the C-terminal region (N730-K746) that interacts with the ribosome 61 (Extended Data Fig. 1B).
For the isolated protein, short-range effects (~15-30 residues from the labelling site) are observed, suggesting compaction (Extended Data Fig. 1D). Moreover, long-range effects (sequence-specific broadening) are observed along the polypeptide chain in regions distal to the labelling sites (Extended Data Fig. 1D), including within N-terminal residues A665-K680 (in the C657 dataset) and C-terminal residues D720-I738 (C699 and C706 datasets). Additionally, broadening is observed consistently around G700 (C734, C740 and C744 datasets) (Extended Data Fig. 1D). The same six cysteine variants (and two additional sites, Extended Data Fig. 1D-F) were then spin-labelled as RNCs. Several sites located near the C-terminus act as probes of the ribosome-interacting segment (residues N730-K746) 7,11 of FLN5 (Extended Data Fig. 1D, shaded area), whose resonances are broadened beyond detection even in the absence of the spin label 7.
The PRE profiles of the spin-labelled RNCs generally show broadening of resonances in the vicinity of the labelling sites, with few indications of long-range effects (Extended Data Fig. 1D, Supplementary Figure 1). Interestingly, despite being tethered to the ribosome, which is expected to slow tumbling of the protein 25,146 and therefore to increase the extent of PRE broadening 147, the RNC shows reduced broadening relative to the isolated protein in all datasets (Extended Data Fig. 1D; see Supplementary Note 2 for further details). This observation holds for all labelling sites, with a consistent trend across the protein sequence. The differences between the isolated and RNC samples are particularly apparent for datasets C740 and C744.
Intermolecular PREs between the NC and the ribosome surface were measured to obtain additional distance restraints reporting on the proximity of the NC to the ribosome surface. They were obtained using spin-labelled RNC variants in which single cysteines had been introduced, using CRISPR-Cas9 2, onto the ribosome surface, either in the solvent-exposed uL24 loop (N53C) or in the C-terminal tail of uL23 (G90C) near the exit tunnel (Extended Data Fig. 1E). The greater extent of broadening in the uL23 dataset indicates that the N-terminal region of the NC preferentially interacts with regions near uL23 upon emerging from the exit tunnel (Extended Data Fig. 1F). Collectively, the experimental PRE data show that the unfolded state differs markedly in conformation on the ribosome: it appears to be more expanded, with fewer intramolecular contacts, and exhibits orientational preferences upon emerging from the tunnel.
All samples for PRE experiments were produced with uniform 15N isotopic labelling in E. coli 7,11,61, purified to homogeneity, and spin-labelled with MTSL; spin-labelling and sample integrity were confirmed using polyacrylamide gel electrophoresis, mass spectrometry, and interleaved NMR translational diffusion measurements (see Extended Data Fig. 2A-E,H). Complete spin-labelling of the isolated protein was confirmed by mass spectrometry, which showed >98% labelling for two tested labelling sites (Extended Data Fig. 2A-B). For the RNCs, MTSL labelling of the NC was qualitatively confirmed using a fluorescent MTSL analogue and in-gel fluorescence (see Methods, Extended Data Fig. 2C). In the NMR spectra, labelling sites C657, C699, C706, and C720 exhibited complete local broadening around the labelling site, suggesting that labelling was essentially complete. We also verified, using a cysteine mass-tagging and western blot approach 7, that the proximity of NC cysteines to the ribosome surface does not impede their accessibility for MTSL labelling: cysteines placed at C699 and C740 show indistinguishable reactivity towards a maleimide tag (Extended Data Fig. 2D). Labelling of engineered cysteines on the ribosome surface was confirmed using fluorescein-maleimide and in-gel fluorescence (Extended Data Fig. 2E). As RNC labelling was accompanied by background labelling of several ribosomal proteins (Extended Data Fig. 2C), we measured PREs in a cysteine-free FLN5+31 A3A3 C747V variant and observed no PRE effects in the NC (Extended Data Fig. 2F). This result suggests that non-specifically spin-labelled ribosomal proteins and intermolecular effects between RNC molecules at this concentration (~10 µM) do not contribute any PRE effects, and thus that any PREs observed in the RNCs originate exclusively from the introduced cysteine.

Note 2: Impact of slower dynamics on the measured PRE-NMR intensity ratios
The PRE-NMR data of the isolated and RNC constructs indicate that the structural ensemble on the ribosome is more expanded (i.e., has a larger average Rg). The comparison between the data of the protein on and off the ribosome is, however, complicated by the fact that PRE-NMR intensity ratios depend on both the distance distributions and the dynamics within the protein. Because it is tethered to the large ribosomal particle and interacts with the ribosome surface, the NC experiences slower dynamics, particularly towards the C-terminus (higher τc values, Supplementary Tables 5-6), as indicated by reduced NMR signal intensities 7,12. We therefore explored the impact of slower dynamics by performing additional experiments on isolated FLN5 A3A3 (labelled at position C740) in different concentrations of glycerol. We reasoned that this would slow the dynamics globally (across the sequence) without significantly changing the ensemble properties and distance distributions. Indeed, we measured the diffusion coefficients of the protein under all conditions (0%, 5% and 18% glycerol) and, after accounting for viscosity changes using diffusion measurements of DSS, found that its diffusion and radius of hydration do not change significantly (Extended Data Fig. 2J). The PRE-NMR intensity ratios, however, became lower across the entire sequence with increasing glycerol concentration (Extended Data Fig. 2K). This is in line with the PRE-NMR profiles expected for the same ensemble with slower dynamics (a higher uniform τc value, Extended Data Fig. 2L). On the ribosome, a similar effect would be expected, although with greater magnitude towards the C-terminus, where tethering restricts dynamics more strongly (Extended Data Fig. 2M). These experiments and calculations show that if the ensemble on the ribosome had the same distance distributions (and Rg), its PRE-NMR intensity ratios would be expected to be lower than those of the isolated protein.
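The viscosity correction with DSS can be illustrated with the Stokes-Einstein relation, D = kBT/(6πηRh): for two species diffusing in the same solution, kBT and the viscosity η cancel in the ratio of their diffusion coefficients. The sketch below assumes this simple ratio approach; the function name, the diffusion coefficients and the reference radius `RH_DSS` are hypothetical, illustrative values and not taken from this study.

```python
def rh_from_reference(d_protein: float, d_reference: float, rh_reference: float) -> float:
    """Hydrodynamic radius of the protein from a ratio of diffusion coefficients.

    Stokes-Einstein: D = kB*T / (6*pi*eta*Rh). For two species in the same
    solution, kB*T and the viscosity eta cancel, so
    Rh_protein = Rh_reference * D_reference / D_protein.
    """
    if d_protein <= 0 or d_reference <= 0:
        raise ValueError("diffusion coefficients must be positive")
    return rh_reference * d_reference / d_protein


# Hypothetical numbers (m^2 s^-1 and Angstrom), for illustration only.
RH_DSS = 3.5  # assumed reference radius, not a value from the source
d_dss, d_nc = 7.0e-10, 1.0e-10
rh_water = rh_from_reference(d_nc, d_dss, RH_DSS)
# A more viscous (glycerol-containing) solution slows both species by the
# same factor, so the inferred Rh is unchanged.
rh_glycerol = rh_from_reference(d_nc / 1.6, d_dss / 1.6, RH_DSS)
print(round(rh_water, 2), round(rh_glycerol, 2))
```

Using an internal reference in this way means an unchanged Rh across glycerol concentrations can be read as an unchanged ensemble size, which is the premise of the control experiment.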
However, we observe for all labelling sites that the intensity ratios are higher on the ribosome, and this increase cannot be explained by dynamics (as slower dynamics would have the opposite effect). The higher PRE-NMR intensity ratios on the ribosome compared with those in isolation thus support the notion that the chain is structurally expanded on the ribosome. Similarly, the higher PRE-NMR intensity ratios at a short RNC length (FLN5+31) compared with longer lengths (FLN5+47 and FLN5+67) support the notion that the NC becomes less expanded as the ribosome elongates the polypeptide.
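Why slower tumbling lowers the intensity ratios can be sketched with a deliberately simplified model: a single, fixed spin label-amide distance, the Solomon-Bloembergen expression for Γ2 with one correlation time τc, and the HSQC-style ratio Ipara/Idia = R2 exp(−Γ2 t)/(R2 + Γ2). The constant `K_SB`, the 600 MHz field, the 20 Å distance and the fixed diamagnetic R2 are illustrative assumptions; the analysis in this work used full ensembles and the SOFAST-HMQC treatment of Note 3.

```python
import math

# Assumed Solomon-Bloembergen constant for a nitroxide electron and an amide
# proton (cm^6 s^-2), a value commonly used in PRE analyses.
K_SB = 1.23e-32


def gamma2(r_angstrom: float, tau_c: float, larmor_mhz: float = 600.0) -> float:
    """Simplified PRE rate (s^-1) for one fixed distance and one correlation time (s)."""
    r_cm = r_angstrom * 1e-8
    omega_h = 2.0 * math.pi * larmor_mhz * 1e6  # proton Larmor frequency, rad/s
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)
    return K_SB / r_cm**6 * spectral


def intensity_ratio(g2: float, r2_dia: float, delay: float = 10.9e-3) -> float:
    """HSQC-style ratio I_para/I_dia = R2 * exp(-Gamma2 * t) / (R2 + Gamma2)."""
    return r2_dia * math.exp(-g2 * delay) / (r2_dia + g2)


r2_dia = 20.0  # hypothetical diamagnetic 1H R2 (s^-1), held fixed for clarity
for tau_c in (2e-9, 5e-9, 10e-9):
    g2 = gamma2(20.0, tau_c)
    print(f"tau_c = {tau_c * 1e9:4.1f} ns  Gamma2 = {g2:5.2f} s^-1  "
          f"ratio = {intensity_ratio(g2, r2_dia):.3f}")
```

In this toy model the ratio falls as τc grows, which is the direction seen in the glycerol series; larger average distances push the ratio the other way, which is why higher ratios on the ribosome point to expansion rather than to altered dynamics.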

Note 3: Calculation of PRE rates for Bayesian reweighting
To quantify the transverse PRE relaxation rates (Γ2) from intensity ratios, the approach outlined for 1H-15N HSQC experiments 104,148 was adjusted to account for the effect of 1H-1H couplings active during the multiple-quantum evolution period (t1) in the 1H-15N SOFAST-HMQC experiment, as given in Eq. S31, where Ipara and Idia are the peak intensities (heights) in the paramagnetic and diamagnetic states, respectively, the total time during which 1H relaxation occurs is 10.9 ms, and R2dia and R2para are the transverse relaxation rates of the diamagnetic and paramagnetic amide protons, respectively. The R2,H and R2,MQ components of R2dia and R2para were calculated by multiplying the linewidths (LW) from the first and second dimensions of the 1H-15N SOFAST-HMQC spectra by π:

$$R_{2,\mathrm{H}} = \pi\,\mathrm{LW}_{^{1}\mathrm{H}}, \qquad R_{2,\mathrm{MQ}} = \pi\,\mathrm{LW}_{^{15}\mathrm{N}} \qquad \text{(Eq. S32)}$$

The total electron spin-enhanced relaxation rate R2para is the sum of the intrinsic R2dia and the spin-label contribution Γ2 149:

$$R_2^{\mathrm{para}} = R_2^{\mathrm{dia}} + \Gamma_2 \qquad \text{(Eq. S33)}$$

After substituting Eq. S33 into Eq. S31, the intensity ratio for a particular amide proton is given by Eq. S34. The transverse PRE relaxation rates (Γ2) can be obtained by numerically solving Eq. S34 (performed in MATLAB).
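A minimal sketch of the numerical inversion step, written in Python rather than MATLAB. Because Eq. S34 is not reproduced in this note, the sketch assumes the simpler HSQC-type relation Ipara/Idia = R2 exp(−Γ2 t)/(R2 + Γ2), with R2 = π·LW; replacing `ratio_model` with the SOFAST-HMQC expression would leave the bisection unchanged. The function names, the 12 Hz linewidth and Γ2 = 15 s−1 are hypothetical.

```python
import math


def ratio_model(gamma2: float, r2_dia: float, delay: float) -> float:
    """Assumed HSQC-type forward model for I_para/I_dia (not Eq. S34 itself)."""
    return r2_dia * math.exp(-gamma2 * delay) / (r2_dia + gamma2)


def solve_gamma2(ratio: float, linewidth_hz: float, delay: float = 10.9e-3,
                 upper: float = 1.0e4, tol: float = 1e-10) -> float:
    """Invert the forward model for Gamma2 (s^-1) by bisection.

    The model decreases monotonically from 1 (Gamma2 = 0) towards 0, so any
    ratio strictly between 0 and 1 has a unique root in [0, upper].
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("intensity ratio must lie strictly between 0 and 1")
    r2_dia = math.pi * linewidth_hz  # R2 from the peak linewidth, as in Eq. S32
    lo, hi = 0.0, upper
    if ratio_model(hi, r2_dia, delay) > ratio:
        raise ValueError("upper bound too small to bracket the root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio_model(mid, r2_dia, delay) > ratio:
            lo = mid  # model still above the target: root lies at larger Gamma2
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip with hypothetical numbers: a 12 Hz linewidth and Gamma2 = 15 s^-1.
observed = ratio_model(15.0, math.pi * 12.0, 10.9e-3)
print(round(solve_gamma2(observed, 12.0), 6))
```

Bisection is chosen because the forward model is monotonic in Γ2, so convergence is guaranteed without derivatives.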

Note 4: Classification of PRE restraints for Bayesian reweighting
Four classes of PRE transverse relaxation rates (Γ2) were used for Bayesian reweighting. Peaks with intensity ratios between 0.2 and 0.8 were restrained to the calculated Γ2, with upper and lower bound relative errors propagated from the standard deviation of the spectral noise. Peaks with intensity ratios between 0.1-0.2 and 0.8-0.9 were restrained in a similar manner, with an additional 4% (isolated FLN5 A3A3) and 5% (FLN5 A3A3 RNC) added to the propagated lower and upper bound errors. Peaks with an intensity ratio >0.9 were restrained only with an upper bound of 2.19 s−1 (isolated FLN5 A3A3) and 3.65 s−1 (FLN5 A3A3 RNC); peaks with intensity ratios <0.1 were restrained only with a lower bound of 64.48 s−1 (isolated FLN5 A3A3) and 96.02 s−1 (FLN5 A3A3 RNC).
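The four restraint classes can be encoded as a small lookup, sketched below. It assumes that the 4% and 5% terms are extra relative errors added on top of the noise-derived error, and it assigns the exact boundary values (0.1, 0.2, 0.8, 0.9) to the inflated-error class; the text specifies neither choice, and the names `Restraint`, `PARAMS` and `classify_restraint` are hypothetical.

```python
from typing import NamedTuple


class Restraint(NamedTuple):
    kind: str     # "value", "value_inflated_error", "upper_bound_only" or "lower_bound_only"
    extra: float  # extra relative error (fraction) or a bound on Gamma2 (s^-1)


# Construct-specific parameters quoted in Note 4.
PARAMS = {
    "isolated": {"extra_err": 0.04, "upper": 2.19, "lower": 64.48},
    "rnc": {"extra_err": 0.05, "upper": 3.65, "lower": 96.02},
}


def classify_restraint(ratio: float, construct: str) -> Restraint:
    p = PARAMS[construct]
    if ratio > 0.9:
        # Barely broadened peak: only an upper bound on Gamma2 is imposed.
        return Restraint("upper_bound_only", p["upper"])
    if ratio < 0.1:
        # Nearly vanished peak: only a lower bound on Gamma2 is imposed.
        return Restraint("lower_bound_only", p["lower"])
    if 0.2 <= ratio <= 0.8:
        # Well-determined ratios: Gamma2 with noise-propagated errors only.
        return Restraint("value", 0.0)
    # 0.1-0.2 and 0.8-0.9: Gamma2 with an additional relative error.
    return Restraint("value_inflated_error", p["extra_err"])


for r in (0.05, 0.15, 0.5, 0.85, 0.95):
    print(r, classify_restraint(r, "rnc"))
```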

Note 5: Comparison of force fields for isolated FLN5 A3A3
To obtain a high-resolution structural ensemble describing the unfolded FLN5 A3A3 variant, we generated four different initial ensembles using different force fields. Three ensembles were generated by all-atom MD simulations in explicit water, and one ensemble was generated using a coarse-grained model (referred to as "CA", see Methods). For the all-atom simulations, two ensembles were simulated using the recently developed CHARMM36m (C36m) force field 86, which was optimised to yield better descriptions of both folded and unfolded proteins. As its developers pointed out, for some IDPs the C36m force field still yields ensembles that are more compact than expected from experiments 86, and they proposed using a modified water model in which the Lennard-Jones well depth of the water hydrogen is increased, yielding more favourable protein-water dispersion interactions. Here, we tested this option for our system by running simulations with both the original CHARMM TIP3P water model (C36m) and this modified water model (referred to as C36m+W in this work). The third all-atom ensemble was obtained using the a99sb-disp force field, which was also recently developed to provide state-of-the-art descriptions of both folded and disordered proteins 87. We ran a total of 10 µs of unbiased MD simulations for each all-atom force field and water model combination.
All ensembles, except the C36m ensemble, have similar radius of gyration (Rg) distributions (Extended Data Fig. 3A). Consistent with previous reports 86,150, we observe a large increase in Rg for C36m+W compared with C36m. C36m generated the most compact ensemble (Rg 19.6 ± 0.4 Å), while C36m+W yielded the most expanded ensemble (Rg 30.3 ± 1.3 Å). The a99sb-disp and CA ensembles are slightly more compact than C36m+W, with Rg values of 28.1 ± 0.9 Å and 28.0 ± 0.1 Å, respectively. We also analysed the fraction of native contacts relative to natively folded FLN5 (i.e., based on the crystal structure 94, PDB ID 1QFH). We find that all ensembles sample very few native contacts, with on average 2-4% of native contacts present (Extended Data Fig. 3B-C). The ensembles differ substantially, however, in their secondary structure propensities (Extended Data Fig. 3C-D). While the CA model predominantly samples coil structures as expected (99.1%), the all-atom MD ensembles have residual secondary structure to various degrees. The a99sb-disp ensemble samples the most helical structure (6.7 ± 0.7%), particularly at the N-terminus, in the D- and F-strand regions, and in the E-F loop (Figure 1C-D). C36m, on the other hand, samples the most β-strand structure (9.5 ± 1.3%), predominantly at the N-terminus and in strands A', C, C', D, F, and G. These secondary structure preferences match the native secondary structure surprisingly well.
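For reference, Rg of a single conformation and an ensemble average over frames can be computed as below. This is a generic sketch that uses unit masses by default and optional frame weights (for example posterior weights after reweighting); it is not the analysis code used for the reported values, and the two-bead frames are toy data.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord],
                       masses: Optional[Sequence[float]] = None) -> float:
    """Rg = sqrt(sum_i m_i |r_i - r_com|^2 / sum_i m_i)."""
    if masses is None:
        masses = [1.0] * len(coords)
    m_tot = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / m_tot for k in range(3)]
    sq = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / m_tot)


def weighted_mean(values: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Ensemble average with optional (e.g. reweighted) frame weights."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Two toy "frames" of a two-bead chain.
frames = [[(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]]
rgs = [radius_of_gyration(f) for f in frames]
print(rgs, weighted_mean(rgs), weighted_mean(rgs, [0.25, 0.75]))
```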
The C36m+W ensemble has preferences for β-strand structure in similar regions, but to a lesser extent overall (total 4.1 ± 0.5%), in agreement with its lower compactness compared with C36m. Finally, we analysed the inter-residue contact probabilities of the different ensembles. This analysis shows that C36m samples the most inter-residue contacts (Extended Data Fig. 3E), likely due to its higher compactness, whereas C36m+W and a99sb-disp have weaker contact preferences. Overall, the all-atom MD ensembles appear to sample substantial numbers of non-native contacts (i.e., outside the black contours, which represent the native contact map). However, some native-like contacts are also present. The C36m ensemble samples contacts between strands D&E, C&F and F&G, while the C36m+W ensemble contains contacts between strands B&E and C&F, but with weaker propensity.
The a99sb-disp ensemble only samples native-like contacts between strands D&E and appears to be the most non-native. The CA ensemble, in contrast to the others, samples barely any long-range contacts, with most contacts near the diagonal, mainly at the C-terminus (Extended Data Fig. 3E).
We then proceeded to refine the isolated ensembles using a Bayesian inference (reweighting) approach 118. As expected, the agreement with the experimental data increases dramatically in all cases (Extended Data Fig. 3F-I). However, the C36m ensemble shows significant local discrepancies, particularly near the labelling sites, resulting in a lower fraction of frames contributing to the ensemble average after reweighting (Neff, 0.11) and worse overall agreement with the PRE data (χ², 25.05) compared with the other ensembles (Extended Data Fig. 3F-I). This suggests that this prior ensemble contained few structural states that can explain the experimental data, so significant modification of the prior is necessary. The local disagreement near the labelling sites is indicative of a rigidified backbone in these regions, potentially due to sampling overly compact structures. Indeed, the C36m ensemble is by far the most compact (average Rg of 19.6 ± 0.4 Å). The C36m+W and a99sb-disp ensembles are modified to a similar extent, with Neff values of 0.50 and 0.52, respectively. Their agreement with the PRE data is considerably better than that of the C36m ensemble, with χ² values of 3.85 and 2.35 for C36m+W and a99sb-disp, respectively. This is broadly consistent with these ensembles sampling fewer compact states than C36m. Lastly, the CA ensemble achieved the highest Neff and the lowest χ² value (0.83 and 1.61, respectively). Overall, the C36m+W, a99sb-disp and CA ensembles reach reasonable agreement with the PRE data without excessive fitting, while the C36m ensemble appears to be a poor prior ensemble. In all cases, the balance between the force field and the experimental data is chosen using an L-curve analysis (Extended Data Fig. 3I).
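The two statistics quoted above can be made concrete as follows. As an assumption, the sketch uses definitions common in maximum-entropy/Bayesian reweighting: the fraction of effective frames is the exponential of the relative entropy between posterior and prior weights, and χ² is the reduced chi-square over restraints with uncertainties σ. We have not confirmed that ref. 118 uses exactly these forms, and bound-only PRE restraints (Note 4) would contribute only when violated, which the sketch omits.

```python
import math
from typing import Optional, Sequence


def fraction_effective_frames(weights: Sequence[float],
                              prior: Optional[Sequence[float]] = None) -> float:
    """phi_eff = exp(S_rel), with S_rel = -sum_i w_i ln(w_i / w0_i).

    Weights are assumed normalised. The result equals 1 when the posterior
    equals the prior and is 1/N when a single frame carries all the weight
    (uniform prior over N frames).
    """
    n = len(weights)
    if prior is None:
        prior = [1.0 / n] * n
    s_rel = -sum(w * math.log(w / w0) for w, w0 in zip(weights, prior) if w > 0.0)
    return math.exp(s_rel)


def reduced_chi2(calc: Sequence[float], measured: Sequence[float],
                 sigma: Sequence[float]) -> float:
    """Mean squared, error-normalised deviation between calculated and measured data."""
    return sum(((c - m) / s) ** 2 for c, m, s in zip(calc, measured, sigma)) / len(measured)


print(fraction_effective_frames([0.25, 0.25, 0.25, 0.25]))  # posterior equals prior
print(fraction_effective_frames([1.0, 0.0, 0.0, 0.0]))      # one frame dominates
print(reduced_chi2([1.0, 2.0], [1.0, 3.0], [1.0, 1.0]))
```

In an L-curve analysis, χ² and this fraction are recorded for a range of confidence parameters, and the value at the corner of the resulting curve is taken as the compromise between fitting the data and staying close to the force field.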

Note 6: Comparison of isolated FLN5 A3A3 ensembles with validation data
We first assessed the agreement with the experimental radius of hydration (Rh) obtained from pulsed-field gradient (PFG) NMR, which provides a global measure of the compactness of the unfolded state. Compared with the experimental Rh, the C36m ensemble shows the largest discrepancy, with an average Rh of 23.3 ± 1.2 Å and 25.2 ± 1.3 Å before and after reweighting, respectively (Extended Data Fig. 4A-B). Thus, reweighting with the PRE data improved the agreement with Rh, but not sufficiently, likely because this ensemble predominantly samples compact states (Extended Data Fig. 3A). The C36m+W ensemble, on the other hand, has the best agreement with the experimental Rh before and after reweighting, with ensemble averages of 28.7 ± 1.4 Å and 30.6 ± 1.5 Å, respectively (Extended Data Fig. 4A-B). Reweighting with PRE data therefore also improved the agreement of this ensemble, to almost within uncertainty (which comes predominantly from the forward model used to calculate Rh, see Methods). For the a99sb-disp and CA ensembles, the agreement also improves slightly after reweighting; however, the change in Rh is smaller, yielding ensemble averages (28.7 ± 1.4 Å and 28.1 ± 1.4 Å, respectively) that are less consistent with experiment than that of C36m+W. Overall, the posterior C36m+W ensemble therefore appears to best capture the average compactness (Extended Data Fig. 4B).
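The Rh forward model used here is described in the Methods and is not reproduced in this note. Purely as an illustration of how Rh can be computed from a conformation, the classic Kirkwood-Riseman approximation, 1/Rh = (1/N²) Σi≠j 1/rij over beads i and j, is sketched below; simple approximations of this kind can deviate systematically from measured values, so this should not be read as the forward model behind the reported numbers.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def kirkwood_rh(coords: Sequence[Coord]) -> float:
    """Kirkwood-Riseman estimate for one conformation: Rh = N^2 / sum_{i != j} 1/r_ij."""
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two beads")
    inv_pairs = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inv_pairs += 1.0 / math.dist(coords[i], coords[j])
    # Each unordered pair appears twice in the i != j sum.
    return n * n / (2.0 * inv_pairs)


print(kirkwood_rh([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
print(kirkwood_rh([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]))
```

Because Rh is defined through the average of inverse distances, an ensemble estimate would average 1/Rh over (weighted) frames before inverting, rather than averaging Rh directly.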
The same conclusion is reached when the ensembles are compared with small-angle X-ray scattering (SAXS) data. The SAXS data show that FLN5 A3A3 in isolation has an average Rg of 34.4 ± 0.6 Å by Guinier analysis (Extended Data Fig. 4F-H), and this estimate is robust to the choice of method (we obtained 34.8 ± 0.1 Å by molecular form factor analysis 140, Extended Data Fig. 4H). The C36m ensemble again shows the largest discrepancy, with an average Rg of 19.6 ± 0.4 Å and 23.1 ± 0.7 Å before and after reweighting, respectively. The a99sb-disp and CA ensembles agree better with the SAXS-derived Rg prior to reweighting, with average Rg values of 28.1 ± 0.9 Å and 28.0 ± 0.1 Å, respectively. Their agreement improves slightly further after reweighting with PRE-NMR data, resulting in average Rg values of 29.5 ± 1.1 Å and 28.3 ± 0.1 Å, respectively. Consistent with the Rh analysis, however, the C36m+W ensemble agrees best with the SAXS-derived Rg, with an average of 30.3 ± 1.3 Å and 34.7 ± 1.1 Å before and after reweighting, respectively. We also compared the back-calculated SAXS profiles of the MD ensembles with the experimental profile (Extended Data Fig. 4I). The same trend is observed in this analysis, with the C36m+W ensemble showing the lowest reduced χ² before and after reweighting. This analysis also validates that reweighting with the PRE-NMR restraints improves the quality of the ensembles, as for all force fields we observe improved agreement with the SAXS profile after reweighting (Extended Data Fig. 4I). In particular, the C36m+W ensemble exhibits remarkable agreement with the experimental profile after reweighting (χ² = 0.7), supporting the accuracy of this force field when refined with PRE-NMR data.
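The Guinier estimate rests on ln I(q) ≈ ln I(0) − q²Rg²/3 at low q. The sketch below fits this line by ordinary least squares over a user-selected low-q range and reports qmax·Rg so the validity of the range can be checked. The synthetic data and q range are hypothetical, and an upper limit of roughly qmax·Rg ≈ 1 is an assumption often used for flexible proteins, not necessarily the criterion applied in this work.

```python
import math
from typing import Sequence, Tuple


def guinier_fit(q: Sequence[float], intensity: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Rg, I0, q_max * Rg) from a linear least-squares fit of ln I versus q^2."""
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; check the q range or aggregation")
    rg = math.sqrt(-3.0 * slope)
    return rg, math.exp(intercept), max(q) * rg


# Synthetic, noise-free curve for a chain with Rg = 34.4 Angstrom (q in 1/Angstrom).
q_vals = [0.005 + 0.001 * k for k in range(25)]  # 0.005 to 0.029
i_vals = [100.0 * math.exp(-(qv * 34.4) ** 2 / 3.0) for qv in q_vals]
rg, i0, qrg = guinier_fit(q_vals, i_vals)
print(f"Rg = {rg:.2f} A, I0 = {i0:.1f}, qmax*Rg = {qrg:.2f}")
```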
When we compare the experimental chemical shifts with our MD ensembles, we find, as in the case of Rh and SAXS, that the C36m+W ensemble is in the best agreement with all chemical shifts on average (Extended Data Fig. 4D). Indeed, the average RMSD for each nucleus is within the uncertainty of the forward model, suggesting that the ensemble captures the local and secondary structure preferences very well. This is the case both before and after reweighting, meaning that reweighting does not substantially affect the ensemble-averaged chemical shifts. This is not surprising, as chemical shifts report on very different phenomena from PRE experiments. However, for the C36m ensemble, which did not agree well with PREs and Rh after reweighting, we find a modest but consistent worsening in the agreement with experimental shifts for all nuclei (Extended Data Fig. 4D), suggesting that comparison with chemical shifts can in some cases still detect poor prior ensembles and overfitting. The a99sb-disp ensemble is in worse agreement with the HN, N and C chemical shifts than the C36m and C36m+W ensembles, but in better agreement with the C' chemical shifts (Extended Data Fig. 4D). Finally, the CA ensemble is in the worst agreement with chemical shifts overall (Extended Data Fig. 4D). It is not unexpected that this ensemble agrees poorly with chemical shifts, given the simple force field and the approximations made during all-atom back-mapping 96, but this also shows that an ensemble without any secondary structure preferences agrees less well with chemical shifts. This ultimately supports that an ensemble with small populations of residual β-strand structure (as predicted by C36m+W) is the most consistent with the experimental NMR chemical shifts. We also independently verified, using circular dichroism (CD) spectroscopy, that FLN5 A3A3 does not sample significant amounts of α-helix or β-sheet structure. The CD spectrum exhibits a strong minimum at 198 nm and low amplitude between 210-230 nm, characteristic of a fully disordered protein (Extended Data Fig. 4J). Moreover, we estimated the secondary structure populations from the experimental NMR chemical shifts using 2C 141. These predictions showed that FLN5 A3A3 is predominantly coil and samples only very small amounts of residual α-helical or β-sheet structure, with a small preference for β-sheet structure, as observed with the C36m and C36m+W ensembles (Extended Data Fig. 4K).
Lastly, we compared our MD ensembles with the experimental backbone RDCs measured in PEG/octanol. Strikingly, we again find that the C36m+W ensemble is in better agreement than the other three ensembles (Extended Data Fig. 4E), consistent with the trend observed in the Rh, SAXS and chemical shift analyses. The C36m+W ensemble has a Q-factor of 0.55 after reweighting with PREs, whereas the other ensembles have Q-factors of 0.65-0.75. This level of agreement is in the range expected for disordered proteins 87. Qualitatively, the C36m+W ensemble captures the trend of the experimental profiles well, particularly for the 1DNH and 1DCH RDCs (Extended Data Fig. 4E). The effect of reweighting on the RDC profiles is minor.
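The Q-factor compares measured and back-calculated RDCs. One common definition, assumed in the sketch below, is Q = sqrt(Σ(Dexp − Dcalc)² / ΣDexp²). Because ensemble-averaged RDCs of disordered proteins are often predicted only up to an overall scale, the sketch optionally fits a least-squares scaling factor first; this option is an assumption and not necessarily the procedure used here. The RDC values are hypothetical.

```python
import math
from typing import Sequence


def q_factor(d_exp: Sequence[float], d_calc: Sequence[float], fit_scale: bool = False) -> float:
    """Q = sqrt(sum (D_exp - lam * D_calc)^2 / sum D_exp^2), with lam = 1 unless fitted."""
    lam = 1.0
    if fit_scale:
        # Least-squares scale factor minimising sum (D_exp - lam * D_calc)^2.
        lam = sum(e * c for e, c in zip(d_exp, d_calc)) / sum(c * c for c in d_calc)
    num = sum((e - lam * c) ** 2 for e, c in zip(d_exp, d_calc))
    den = sum(e * e for e in d_exp)
    return math.sqrt(num / den)


d_measured = [2.0, 4.0, -1.0]   # hypothetical RDCs (Hz)
d_predicted = [1.0, 2.0, -0.5]  # same profile, half the amplitude
print(q_factor(d_measured, d_predicted))
print(q_factor(d_measured, d_predicted, fit_scale=True))
```

A lower Q means closer agreement, and a profile that is correct in shape but wrong in amplitude is penalised only when no scale factor is fitted.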
In summary, comparison with our validation data has consistently revealed that the ensemble obtained with the C36m+W force field is the most consistent with other experimental data before and after reweighting with the PRE data. While a99sb-disp and CA also achieve good agreement with the PRE data before and after reweighting, their agreement with orthogonal data (particularly SAXS and RDCs) is worse than that of C36m+W. This is, at least in part, due to the balance chosen between force field and experimental data (Extended Data Fig. 3I), and we have also not explored using SAXS or RDCs as reweighting restraints.
Nevertheless, our analysis suggests that combining C36m+W with PRE-NMR restraints results in the most accurate structural ensemble for this system.

Note 7: Convergence of isolated FLN5 A3A3 simulations using C36m+W
Given that the C36m+W ensemble appeared to agree best with the experimental Rh, chemical shifts and RDCs, we extended the C36m+W simulations by an additional 5 × 2 µs from new starting structures, resulting in an ensemble with 20 µs of total sampling. We then assessed whether the two 10 µs ensembles and the full 20 µs ensemble showed consistent behaviour, including agreement with experimental data. Convergence of MD simulations of disordered proteins is highly challenging 151, but we are interested in whether the main conclusions and structural properties remain qualitatively consistent.
Prior to reweighting, all ensembles showed similar global structural properties, including Rg, Q and secondary structure (Supplementary Figure 5A-E). We then reweighted the first and second 10 µs ensembles and the full 20 µs ensemble using the PRE data (Supplementary Figure 5J-L). The PRE profiles were similar before reweighting and, as expected, even more similar after reweighting (Supplementary Figure 5K-L). The agreement with the validation data also remained highly similar (Supplementary Figure 5M-O), with all C36m+W ensembles agreeing better with all validation data than the other force fields. This also suggests that the structural properties captured by the validation data are reasonably converged. Upon inspection of the ensemble structural properties, we find that after reweighting, the ensembles from 10 µs and 20 µs of sampling are more similar than before reweighting with respect to global properties (Rg and Q), secondary structure and long-range contacts (Supplementary Figure 5F-I). Therefore, we consider our results qualitatively robust with respect to sampling, particularly after reweighting with PRE-NMR data.

Note 8: Expansion of the unfolded state on the ribosome
We further considered whether the large conformational expansion observed on the ribosome (26% increase in Rg) could be a consequence of steric exclusion and tethering alone.Therefore, we additionally compared our atomistic ensemble with a simpler MD model where only steric exclusion is considered (see methods).Based on this model, we find an increase in Rg of ~9% from 34.6 ± 0.2 Å to 37.6 ± 0.2 Å relative to the isolated protein (Extended Data Fig. 5L) and, also, that the probability of interaction between the C-terminus of FLN5 and the ribosome is decreased as expected (Extended Data Fig. 5M).This suggests that the large expansion of the structural ensemble we observed with the RNC is partially (but not fully) caused by tethering and steric exclusion from the ribosome.Other proteins may consequently experience the same effect.
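The ~9% figure follows directly from the two ensemble averages. Assuming, for this check only, that the two ±0.2 Å uncertainties are independent and combining them to first order, the relative change is:

$$
\frac{\Delta R_g}{R_g^{\mathrm{iso}}} = \frac{37.6 - 34.6}{34.6} \approx 0.087,
\qquad
\sigma \approx \frac{37.6}{34.6}\sqrt{\left(\frac{0.2}{37.6}\right)^{2} + \left(\frac{0.2}{34.6}\right)^{2}} \approx 0.009,
$$

that is, an increase of about 8.7 ± 0.9%.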
We also note that the structural expansion of the unfolded state on the ribosome (relative to the isolated protein) is already observed prior to reweighting the ensembles with PRE-NMR data. Thus, our all-atom MD simulations predict the expansion (increase in Rg and solvent-accessible surface area) independently of the experiments (Extended Data Fig. 5N-O). Moreover, removing the C-terminal datasets (C734, C740, C744), which show the most obvious differences between the isolated and RNC PRE-NMR profiles, does not significantly affect the structural expansion after reweighting (Extended Data Fig. 5N-O, Supplementary Tables 2-4).

Note 9: Assumptions and uncertainties involved in estimating solvation entropy and enthalpy changes from the solvent-accessible surface area/MD ensembles
The analysis of SASA/NC hydration to estimate thermodynamic parameters and consequences for folding involves the following main uncertainties and assumptions:
1. The uncertainty in the SASA values must be considered; it is calculated from the MD ensembles (as standard errors of the mean) and reflects the uncertainty due to finite sampling.
2. The empirical parameters used to convert SASA changes to thermodynamic parameters (using equations S25-S28) are themselves uncertain. To account for these parameter errors, we took the average and standard deviation of values reported in the literature 136 and used standard error propagation to include them in the final results (see Methods).
3. Assessing the potential changes in ribosome surface solvation (and ion binding, see point below) during coTF is currently not possible.We thus assume that the main contributing factor to coTF energetics from solvation arises from changes in NC solvation, without any significant contribution from changes in ribosome surface solvation.Ribosome surface solvation may be influenced by NC-ribosome interactions.This assumption is supported by earlier experiments.
Ribosome-NC interactions (and thus ribosome surface solvation) can be effectively modulated by varying the ionic strength (e.g., with arginine/glutamate salt mixtures). We previously found that increasing the ionic strength leads to a slight increase in coTF intermediates and native state on the ribosome by weakening ribosome interactions and destabilising the unfolded state by up to ~1 kcal mol−1 6,7. However, these energetic changes are significantly smaller than the difference we observe between folding on and off the ribosome for FLN5 6, where unique intermediates are formed and stabilised by more than 4 kcal mol−1 on the ribosome. Thus, ribosome interactions and the resulting changes in ribosome surface solvation are not sufficient to rationalise the large difference in folding thermodynamics on and off the ribosome. Our data on the FLN5 E6 mutant also show that strongly reducing the ribosome interactions with the NC is not a major factor in driving the large enthalpy/entropy changes and formation of coTF intermediates (Extended Data Fig. 7E-H).
4. We also assume that potential changes in ion (e.g., Mg2+) binding to the ribosome surface during coTF do not have a major effect. This is supported by earlier experiments showing that the folding free energies on the ribosome are minimally affected by varying the Mg2+ concentration 6.
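Point 2 amounts to first-order error propagation through a conversion from SASA changes to thermodynamic parameters. Equations S25-S28 are not reproduced in this note, so the sketch assumes a generic linear form Y = a·ΔSASA with a single empirical coefficient a and independent errors; the function names, the literature coefficients and the SASA change below are hypothetical placeholders.

```python
import math
import statistics
from typing import Sequence, Tuple


def coefficient_from_literature(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of reported coefficients."""
    return statistics.mean(values), statistics.stdev(values)


def propagate_linear(delta_sasa: float, sem_sasa: float,
                     coeff: float, sd_coeff: float) -> Tuple[float, float]:
    """Y = a * dSASA with sigma_Y^2 = (a * sigma_dSASA)^2 + (dSASA * sigma_a)^2."""
    value = coeff * delta_sasa
    error = math.sqrt((coeff * sem_sasa) ** 2 + (delta_sasa * sd_coeff) ** 2)
    return value, error


# Hypothetical coefficients (units of Y per square Angstrom) and SASA change.
a_mean, a_sd = coefficient_from_literature([0.010, 0.012, 0.011])
y, sigma_y = propagate_linear(delta_sasa=500.0, sem_sasa=50.0, coeff=a_mean, sd_coeff=a_sd)
print(f"{y:.2f} +/- {sigma_y:.2f}")
```

The two error terms correspond to points 1 and 2 above: sampling uncertainty in ΔSASA and spread in the empirical coefficient.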

Note 10: Estimating the solvation entropy effects from direct water entropy calculations
To verify the predicted strong solvation entropy effects, we additionally calculated this quantity using an orthogonal approach. We directly analysed solvent molecules around the unfolded protein and found a ~4% increase in the number of water molecules hydrating the unfolded state on (versus off) the ribosome (Extended Data Fig. 6J-L), which was converted into a change in solvation entropy using the two-phase thermodynamic method 137 (see Methods). These analyses estimate an entropic destabilisation of the unfolded NC relative to the isolated protein of ~30 ± 10 kcal mol−1 at 298 K (Extended Data Fig. 6M-P), indicating that the solvent entropy change due to the structural expansion of the unfolded NC is indeed a strong determinant of its altered energetics.

Note 11: Enthalpic destabilisation of the native state on the ribosome
An analysis of NMR 1H,13C-methyl and 19F chemical shifts of FLN5 RNCs reveals small but widespread chemical shift perturbations (relative to the isolated protein) throughout the protein structure, including in side chains buried in the core (Extended Data Fig. 7I-J). Similarly, the ribosome perturbs the residue-specific dynamics of methyl side chains within FLN5 (Extended Data Fig. 7K). These data show that the structure and/or the environment of the folded state is altered on the ribosome, despite its very low extent (<1%) of ribosome interactions 6,25, which may account for the less negative ΔH of folding on the ribosome.
Indeed, factors such as steric exclusion in the vestibule and long-range electrostatic effects between the ribosome surface and negatively charged NCs have previously been suggested to play a role in destabilising native structure on the ribosome 6,9,30,31.

Table S2: PRE-NMR reweighting statistics (reduced χ² and fraction of effective frames, Neff) using different datasets for the FLN5+31 A3A3 RNC. The numbers in parentheses show the reduced χ² prior to reweighting. "Global" refers to the agreement with all datasets, "work" to the agreement with the datasets used for reweighting, and "free" to the dataset that was excluded during reweighting. The following lists describe the labelling sites that were included in the individual reweighting datasets:

Fig. S3: Raw western blot images of FLN5+31 A3A3 V747 RNCs labelled with PEG-maleimide at different cysteine positions. The rectangular boxes show the cropped area of the gel presented in the Extended Data.

Fig. S5: Convergence analysis of the C36m+W ensemble by comparing the first (#1) and second (#2) 10 µs sets and the full 20 µs ensemble. MD-calculated properties are shown as the mean ± SEM obtained from block averaging. (A) Rg probability distributions of the ensembles before reweighting. (B) Native contact fraction (Q) probability distributions of the ensembles before reweighting. (C) Secondary structure profiles, with the native strands annotated, before reweighting. (D) Contact maps overlaid with the native contact map (black contours) before reweighting. (E) Average properties of the ensembles before reweighting, including the chemical shift score, which quantifies the agreement with chemical shifts (see Methods). (F) Comparison of the C36m+W ensembles
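The native contact fraction Q in panel (B) can be computed per frame along the following lines. The exact definition used for Fig. S5 is not given here, so this sketch assumes the soft-cutoff form of Best, Hummer and Eaton (PNAS 2013): native contacts are heavy-atom pairs within 4.5 Å in the native structure from residues more than 3 apart, scored with β = 5 Å−1 and λ = 1.8. The toy coordinates are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the fraction of native contacts Q (soft cutoff,
# assumed parameters beta = 5 / Å and lambda = 1.8).


def native_contacts(native_xyz, res_ids, cutoff=4.5, min_sep=3):
    """Return (pairs, r0): atom-index pairs in contact and their native distances (Å)."""
    native_xyz = np.asarray(native_xyz, float)
    res_ids = np.asarray(res_ids)
    i, j = np.triu_indices(len(native_xyz), k=1)  # all unique atom pairs
    r = np.linalg.norm(native_xyz[i] - native_xyz[j], axis=1)
    keep = (r < cutoff) & (np.abs(res_ids[i] - res_ids[j]) > min_sep)
    return np.stack([i[keep], j[keep]], axis=1), r[keep]


def fraction_native_contacts(xyz, pairs, r0, beta=5.0, lam=1.8):
    """Q in [0, 1] for one frame with coordinates xyz of shape (n_atoms, 3) in Å."""
    xyz = np.asarray(xyz, float)
    r = np.linalg.norm(xyz[pairs[:, 0]] - xyz[pairs[:, 1]], axis=1)
    return float(np.mean(1.0 / (1.0 + np.exp(beta * (r - lam * r0)))))


# Toy example: four atoms forming two native contacts.
native = np.array([[0, 0, 0], [4, 0, 0], [100, 0, 0], [104, 0, 0]], float)
res = [0, 10, 20, 30]
pairs, r0 = native_contacts(native, res)
unfolded = native.copy()
unfolded[1] = [50, 0, 0]  # break one of the two native contacts
q_native = fraction_native_contacts(native, pairs, r0)
q_unfolded = fraction_native_contacts(unfolded, pairs, r0)
print(q_native, q_unfolded)
```

Histogramming Q over the frames of each 10 µs set and of the full ensemble gives distributions that can be compared for convergence, as in panel (B).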

Fig. S6: Raw western blot images of HRAS refolding experiments in vitro. Refolded (R) and control (C) samples were subjected to pulse proteolysis (proteolysis times annotated). The rectangular boxes show the cropped area of the gel presented in the Extended Data.

Fig. S7: Raw western blot images of HRAS refolding experiments in rabbit reticulocyte lysate (RRL). Refolded (R) and control (labelled with 'GDP') samples were subjected to pulse proteolysis (proteolysis times annotated). The arrows point to HRAS on the images and the rectangular boxes show the cropped area of the gel presented in the Extended Data.

Fig. S8: (A-C) Block analysis to estimate the standard error of the mean (SEM) for the isolated and RNC ensemble properties, including Rg (A), SASA (B) and the number of water molecules in the hydration layer (C), before (prior) and after (posterior) reweighting with PRE-NMR data. The block size is shown in number of frames. Each ensemble contains 100,000 frames in total, except for the analysis of the number of water molecules in the hydration layer, for which the RNC and isolated ensembles contain 15,000 and 20,000 frames, respectively, because water coordinates were saved only every 1 ns. The red point shows the block size chosen to calculate the final error.
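Block averaging, as used in Fig. S8, estimates the SEM of a time-correlated MD observable from the scatter of non-overlapping block means; the estimate grows with block size and plateaus once blocks are effectively independent. The sketch below is a generic implementation with an illustrative synthetic series (an AR(1) process standing in for a per-frame property such as Rg); the block sizes are arbitrary, and for posterior (reweighted) ensembles the block means would presumably need to be weighted, which is not shown here.

```python
import numpy as np

# Generic block-averaging sketch; the synthetic data and block sizes are
# illustrative assumptions, not the settings used for Fig. S8.


def block_sem(x, block_size):
    """SEM from non-overlapping block means; trailing frames that do not fill a block are dropped."""
    x = np.asarray(x, float)
    n_blocks = len(x) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    means = x[: n_blocks * block_size].reshape(n_blocks, block_size).mean(axis=1)
    return float(means.std(ddof=1) / np.sqrt(n_blocks))


def sem_vs_block_size(x, block_sizes):
    """SEM for each block size; pick a size on the plateau as the final error."""
    return {b: block_sem(x, b) for b in block_sizes}


# Synthetic correlated series (AR(1), correlation time ~100 frames).
rng = np.random.default_rng(1)
x = np.empty(100_000)
x[0] = 0.0
for t in range(1, x.size):
    x[t] = 0.99 * x[t - 1] + rng.normal()
for b, sem in sem_vs_block_size(x, [1, 10, 100, 1000, 5000]).items():
    print(b, round(sem, 4))
```

With a block size of 1 the estimate reduces to the naive SEM, which underestimates the true uncertainty for correlated frames; this is why the block size is chosen on the plateau.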

Table S1.
Radii of gyration obtained from individual SAXS datasets at different concentrations (mg/ml) and from

Table S3.
PRE-NMR reweighting statistics (reduced χ² and fraction of effective frames, Neff) using different datasets for isolated FLN5 A3A3. The numbers in parentheses show the reduced χ² prior to reweighting. Global refers to the agreement with all datasets, work refers to the agreement with the dataset used for reweighting, and free refers to the dataset that was excluded during reweighting. The following lists describe the labelling sites that

Table S4.
Ensemble-averaged Rg and total SASA for isolated FLN5 A3A3 and the FLN5+31 A3A3 RNC, shown before reweighting (prior) and for the different reweighting datasets as outlined in Tables S2 and S3.
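Prior and posterior ensemble averages such as those in Table S4 differ only in the frame weights. The sketch below computes a per-frame radius of gyration and averages it with uniform (prior) or reweighting (posterior) weights; the function names and toy frames are hypothetical, and how the reported Rg values were actually averaged is an assumption here.

```python
import numpy as np

# Hypothetical sketch: per-frame Rg and prior vs posterior ensemble averages.


def radius_of_gyration(xyz, masses=None):
    """Rg (same length unit as xyz) of one frame with coordinates of shape (n_atoms, 3)."""
    xyz = np.asarray(xyz, float)
    m = np.ones(len(xyz)) if masses is None else np.asarray(masses, float)
    com = (m[:, None] * xyz).sum(axis=0) / m.sum()  # (mass-weighted) centre
    return float(np.sqrt((m * ((xyz - com) ** 2).sum(axis=1)).sum() / m.sum()))


def ensemble_average(values, weights=None):
    """Weighted mean of a per-frame property; weights=None means uniform (prior)."""
    values = np.asarray(values, float)
    if weights is None:
        return float(values.mean())
    w = np.asarray(weights, float)
    return float(w @ values / w.sum())


# Two toy frames with Rg = 1 and Rg = 3; posterior weights favour the compact frame.
frames = [np.array([[-1.0, 0, 0], [1.0, 0, 0]]), np.array([[-3.0, 0, 0], [3.0, 0, 0]])]
rg = [radius_of_gyration(f) for f in frames]
print(ensemble_average(rg), ensemble_average(rg, weights=[3.0, 1.0]))  # → 2.0 1.5
```

When comparing with SAXS-derived values (Table S1), averaging Rg² before taking the square root is often preferred over averaging Rg directly; the same `ensemble_average` call applies to either quantity.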

Table S5.
Prior RNC ensemble residue- and MTSL-labelling-site-specific effective C values for all non-proline residues in the NMR-visible region of the FLN5 A3A3 sequence, calculated using equation S16 (see Methods).

Table S6.
Posterior RNC ensemble (reweighted with all ten labelling sites) residue- and MTSL-labelling-site-specific effective C values for all non-proline residues in the NMR-visible region of the FLN5 A3A3 sequence, calculated using equation S16 (see Methods).