In situ analysis of osmolyte mechanisms of proteome thermal stabilization

Organisms use organic molecules called osmolytes to adapt to environmental conditions. In vitro studies indicate that osmolytes thermally stabilize proteins, but mechanisms are controversial, and systematic studies within the cellular milieu are lacking. We analyzed Escherichia coli and human protein thermal stabilization by osmolytes in situ and across the proteome. Using structural proteomics, we probed osmolyte effects on protein thermal stability, structure and aggregation, revealing common mechanisms but also osmolyte- and protein-specific effects. All tested osmolytes (trimethylamine N-oxide, betaine, glycerol, proline, trehalose and glucose) stabilized many proteins, predominantly via a preferential exclusion mechanism, and caused an upward shift in temperatures at which most proteins aggregated. Thermal profiling of the human proteome provided evidence for intrinsic disorder in situ but also identified potential structure in predicted disordered regions. Our analysis provides mechanistic insight into osmolyte function within a complex biological matrix and sheds light on the in situ prevalence of intrinsically disordered regions.

(B) Linear regression between DSF-calculated ∆Tm and protein stabilisation score for combined data from Figure 3D. Shaded area represents the confidence interval of the linear fit (dashed line). (C) The plot shows the difference in enolase melting temperature (∆Tm) between osmolyte and control conditions, at different concentrations of enolase. ∆Tm is calculated relative to the mean melting temperature of the control condition at each protein concentration. Results are based on a single experiment (n = 4 technical replicates). Points represent the mean +/- standard deviation. (D) The heatmap shows the significantly different biophysical features between proteins with high vs low correlation of the stabilisation score with that of the global proteome (first column, Cor) or between proteins significantly stabilised vs not stabilised by individual osmolytes. The colour intensity indicates significance levels. The colour indicates whether the feature is higher (red) or lower (blue) in proteins with high correlation (first column) and in stabilised proteins (other columns) compared to the rest. Feature significance was determined by a two-sided t-test followed by correction for multiple hypothesis testing with the Benjamini-Hochberg method. See Methods for how individual features were measured or predicted.

Additionally, peptide intensities are min-max scaled, separately for each peptide and condition, according to the following formula:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Finally, Spectronaut_to_python() splits the dataframe into two according to peptide trypticity (fully vs half tryptic). Both files will contain the following columns:

Learning peptide intensity profiles along temperature
In this step, we apply Gaussian processes (GPs) to learn the temperature profile of each peptide in the different conditions. The approach was applied to the filtered and scaled data from the step above. In detail, we used gpytorch version 1.4.2 with an ExactGP model, choosing a constant mean function, a squared exponential kernel and a Gaussian likelihood. For each peptide, separate GP models were defined for the peptide intensities in the absence (control condition) and presence (osmolyte condition) of an osmolyte, as well as a joint model. Model hyperparameters were found by maximizing the sum of the marginal log-likelihoods across all models, using the Adam optimizer with a learning rate of 0.1 for 1000 iterations. From the resulting posterior of the fit, predicted mean abundance profiles and confidence intervals (2 standard deviations around the mean) were obtained for each peptide and condition. To assess the goodness of fit, the residual sum of squares between the observed and predicted peptide intensities is calculated for each peptide and condition.
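A minimal NumPy sketch of this step is given below, with the min-max scaling from the previous step included. It is a stand-in, not the gpytorch code described above: it performs exact GP regression with the same ingredients (constant mean, squared exponential kernel, Gaussian likelihood), but keeps the hyperparameters fixed instead of optimising them with Adam, and estimates the constant mean as the sample mean. The function names (`min_max_scale`, `sq_exp_kernel`, `gp_fit`), the hyperparameter values, the evenly spaced temperature grid and the simulated sigmoidal profiles are all hypothetical. The sketch also computes the summed log marginal likelihood of the control, osmolyte and joint models, which is the objective the described pipeline maximises over the hyperparameters.

```python
import numpy as np


def min_max_scale(values):
    """Scale one peptide's intensities in one condition to [0, 1]."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    # Assumption: a flat profile (zero span) is mapped to all zeros.
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def sq_exp_kernel(a, b, length_scale, signal_var):
    """Squared exponential (RBF) kernel between two 1-D temperature arrays."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_fit(temps, y, grid, length_scale=8.0, signal_var=0.1, noise_var=0.01):
    """Exact GP regression: constant mean, RBF kernel, Gaussian likelihood.

    Hyperparameters are fixed here (an assumption of this sketch).
    Returns (mean, lower, upper, rss, log_marginal_likelihood).
    """
    x = np.asarray(temps, dtype=float)
    y = np.asarray(y, dtype=float)
    m = y.mean()  # constant mean estimated as the sample mean (assumption)
    K = sq_exp_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - m))
    # Log marginal likelihood of the observations under this GP.
    lml = (-0.5 * (y - m) @ alpha - np.sum(np.log(np.diag(L)))
           - 0.5 * len(x) * np.log(2 * np.pi))

    def predict(xs):
        Ks = sq_exp_kernel(x, xs, length_scale, signal_var)
        v = np.linalg.solve(L, Ks)
        var = signal_var - np.sum(v ** 2, axis=0)  # latent-function variance
        return m + Ks.T @ alpha, np.sqrt(np.clip(var, 0.0, None))

    mu, sd = predict(np.asarray(grid, dtype=float))
    rss = float(np.sum((y - predict(x)[0]) ** 2))
    # Confidence band: posterior mean +/- 2 standard deviations.
    return mu, mu - 2 * sd, mu + 2 * sd, rss, float(lml)


# Simulated example: 10 temperatures x 2 replicates, osmolyte melts later.
temps = np.repeat(np.linspace(37, 76, 10), 2)
control = min_max_scale(1 / (1 + np.exp((temps - 52) / 3)))
osmolyte = min_max_scale(1 / (1 + np.exp((temps - 58) / 3)))
grid = np.linspace(37, 76, 100)
datasets = {
    "control": (temps, control),
    "osmolyte": (temps, osmolyte),
    "joint": (np.concatenate([temps, temps]), np.concatenate([control, osmolyte])),
}
fits = {name: gp_fit(t, y, grid) for name, (t, y) in datasets.items()}
total_lml = sum(fit[4] for fit in fits.values())  # objective summed over models
```

In the real pipeline the hyperparameters of each model would be tuned to maximise `total_lml`; fixing them keeps the sketch short and dependency-light.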
For both fully and half tryptic peptides, we obtain two .csv files. One contains the newly fitted points for each peptide (solution_Small_molecule_FT.csv) and the second contains information about the goodness of fit for each peptide (MLL_Small_molecule_FT.csv).
To highlight the different steps, we take the peptide QAVTNPQNTLFAIK as an example. Panel A below shows the raw MS data. After running the function Spectronaut_to_python(), we obtain the scaled intensities shown in panel B. Finally, after GP fitting, we obtain fitted curves for each condition, as well as for all data points combined, as shown in panel C. Additionally, we compute confidence intervals for the fits (panel D), which are later used to calculate the area between the curves.
Example output for peptide QAVTNPQNTLFAIK. A) Raw MS data. B) Scaled MS data. C) GP-fitted data. D) GP-fitted data with confidence intervals. The line represents the GP fit and the shaded area represents the confidence intervals.

Python to list
Since the GP modeling was done in Python, we merge its output back with the metadata about the run and the peptides, such as trypticity and the protein from which each peptide originated. Furthermore, half and fully tryptic peptides are now merged back into a single dataframe.

Calculate area: Quantify and classify the significant differences
After fitting the GP models, distances between the learnt control and osmolyte curves were calculated for the temperature regions where their confidence intervals do not overlap. Where the confidence intervals overlap, the distance was set to 0. To specifically study protein stabilisation, temperature intervals with intensity changes were classified as binding (changes at the start of the temperature gradient), stabilisation (changes in the middle of the temperature gradient) or aggregation (changes at the end of the gradient). The distances between the curves within each temperature interval were summed as a proxy for the area between the curves. For each region, an example is shown in the plots below.
Example peptides showing binding, (de-)stabilisation and aggregation effects.The line represents the fit and the shaded area represents confidence intervals of the fit.The respective areas between the curves, representing binding, (de-)stabilisation and aggregation are highlighted in red.
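The distance and area calculation described above can be sketched in plain Python. Two details are assumptions, because the text does not pin them down: the signed distance is taken as the difference between the posterior means (osmolyte minus control) at temperatures where the intervals do not overlap, and each contiguous stretch of same-sign, non-zero distances is summed into one area. The function names and toy values are hypothetical.

```python
def signed_distances(ctrl_mean, ctrl_lo, ctrl_hi, osm_mean, osm_lo, osm_hi):
    """Osmolyte-minus-control distance at each temperature on the prediction grid.

    The distance is set to 0 wherever the two confidence intervals overlap.
    """
    distances = []
    for cm, cl, ch, om, ol, oh in zip(ctrl_mean, ctrl_lo, ctrl_hi,
                                      osm_mean, osm_lo, osm_hi):
        no_overlap = ol > ch or oh < cl  # osmolyte band entirely above or below
        distances.append(om - cm if no_overlap else 0.0)
    return distances


def areas_between_curves(distances):
    """Sum each contiguous run of same-sign, non-zero distances.

    Returns a list of (first_index, last_index, summed_distance) tuples.
    """
    areas = []
    start, total = None, 0.0
    for i, d in enumerate(list(distances) + [0.0]):  # trailing 0 closes the last run
        if start is not None and d != 0.0 and (d > 0) == (total > 0):
            total += d
            continue
        if start is not None:
            areas.append((start, i - 1, total))
            start, total = None, 0.0
        if d != 0.0:
            start, total = i, d
    return areas


# Toy example: the osmolyte curve lies above the control curve in the middle
# of the gradient, i.e. the peptide melts later in the presence of osmolyte.
ctrl = [1.0, 0.8, 0.5, 0.2, 0.0]
osm = [1.0, 1.0, 0.9, 0.6, 0.0]
half_width = 0.05  # hypothetical confidence-interval half-width
dist = signed_distances(ctrl, [c - half_width for c in ctrl],
                        [c + half_width for c in ctrl],
                        osm, [o - half_width for o in osm],
                        [o + half_width for o in osm])
areas = areas_between_curves(dist)
print(areas)
```

Splitting runs on sign changes keeps the positive and negative areas of a shifted, non-monotonic curve separate, so they can be handled individually later.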
To better illustrate the calculation of the stabilisation score, we take the peptides QAVTNPQNTLFAIK, SLEGDEPEPLPQVR and QGDLICLDGK as examples; these are shown in the plots below. As shown in panel A, we observe a large non-overlapping region in the middle of the temperature range. To approximate this area, we first calculate the distances between the small-molecule and control curves at the temperatures where the confidence intervals do not overlap. This is checked by conditional statements comparing the upper and lower bounds of the respective confidence intervals. The distances can then be plotted against temperature (panel B), and the apex of the peak (X) is found to correctly classify the area as binding, aggregation or stabilisation. In the example, the area is classified as stabilisation. Everything that contributes to the approximated area is coloured in red, as shown in panel C.
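Classifying an area by the position of its apex, as described above, can be sketched as follows. The text does not state where the start, middle and end of the gradient begin and end, so the cut-offs `binding_max` and `aggregation_min` (in °C) are hypothetical placeholders, as are the function name and the toy values.

```python
def classify_area(temperatures, distances, binding_max=45.0, aggregation_min=70.0):
    """Classify one area by the temperature at which |distance| peaks (the apex).

    binding_max and aggregation_min are hypothetical cut-offs, not values
    taken from the described pipeline.
    """
    apex = max(range(len(distances)), key=lambda i: abs(distances[i]))
    t_apex = temperatures[apex]
    if t_apex < binding_max:
        return "binding"        # change at the start of the gradient
    if t_apex > aggregation_min:
        return "aggregation"    # change at the end of the gradient
    return "stabilisation"      # change in the middle of the gradient


temps = [37.0, 41.0, 45.0, 49.0, 53.0, 57.0, 61.0, 65.0, 69.0, 73.0, 76.0]
middle_peak = [0, 0, 0, 0.1, 0.3, 0.2, 0, 0, 0, 0, 0]
print(classify_area(temps, middle_peak))  # stabilisation
```

Using the apex rather than the full extent of the area means a broad area that starts early but peaks mid-gradient is still counted as stabilisation.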
However, there are some cases where the approach needs slight adaptation. Given how the area is approximated, a stabilised peptide with an increasing profile will lead to a negative stabilisation score, as shown in panels D and E, even though the peptide is stabilised by the small molecule. To account for these cases, we first need to determine the profile shape of individual peptides.
Another case that needs to be addressed is peptides that show a non-monotonic curve shape and are stabilised by the small molecule. Often, the fitted curve upon addition of the small molecule is then shifted towards higher temperatures, as shown in panel G. This results in two areas, one with a positive and one with a negative sum (panel H). Again, we use the profile shapes to discriminate these cases. The stabilisation score in such cases is then the absolute value of the largest area, as shown in panel I. The sections below go into more detail on how this is achieved.

Median cluster profiles after clustering
Using the mean cluster shapes shown above, we define certain characteristics, based on which we then classify the curves into the different profile shapes. The following characteristics are calculated from the median profile of each cluster at each temperature:
• intensity difference = |Intensity(T = 37 °C) − Intensity(T = 76 °C)|
• maximal intensity difference = (max(Intensity) + 2) − (min(Intensity) + 2)
• monotonic curve shape = |diff| < max(|diff|)/1.65
Median cluster profiles are then assigned to the previously defined profile shapes using the flow chart shown below.

Clustering Overview
Going back to the examples of area calculation shown earlier, we then use the assigned clusters to determine the correct sign of the area. Because SLEGDEPEPLPQVR has an increasing profile, its area initially has a negative sign due to how the area is calculated; in the next step, this is corrected using the profile-shape information, and the score for this peptide is correctly classified as "stabilised".

Find significant peptides
As described above, we use the clustering information together with the calculated area between the curves to assign the four different effects (binding, stabilisation, destabilisation and aggregation). Mainly, this function corrects the stabilisation effect for peptides with an increasing profile, which would otherwise have been assigned destabilisation because of the negative area. Furthermore, we correct for shifted non-monotonic curves being described as a change in aggregation instead of (de-)stabilisation (as shown in panel I above).
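The two corrections described above can be sketched as follows. The shape labels (`"decreasing"`, `"increasing"`, `"non-monotonic"`) stand in for the cluster-derived profile shapes and, like the function name, are hypothetical; the input is assumed to be the signed areas (osmolyte minus control) that fall into the stabilisation region for one peptide.

```python
def corrected_stabilisation_score(areas, profile_shape):
    """Return a sign-corrected stabilisation score for one peptide.

    areas: signed areas (osmolyte minus control) in the stabilisation region.
    profile_shape: hypothetical label derived from the peptide's cluster.
    """
    if not areas:
        return 0.0
    if profile_shape == "non-monotonic" and len(areas) > 1:
        # A shifted peak yields one positive and one negative area; the score
        # is the absolute value of the largest one.
        return max(abs(a) for a in areas)
    total = sum(areas)
    if profile_shape == "increasing":
        # For a rising profile, stabilisation lowers the osmolyte curve, so the
        # raw area is negative and its sign is flipped.
        return -total
    return total


print(corrected_stabilisation_score([-0.4], "increasing"))         # 0.4
print(corrected_stabilisation_score([0.3, -0.5], "non-monotonic"))  # 0.5
```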

Peptide fasta matching
Before aggregating scores from peptide level to protein level, we match peptides to their positions on the protein. Any peptide that matches two different proteins, or two different positions on the same protein, is discarded in this step.
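The unique-matching rule above can be sketched with plain string search. The protein identifiers and sequences are toy values (hypothetical, not real database entries), the function name is illustrative, and reporting positions as 1-based and inclusive is an assumption.

```python
def match_peptides(peptides, proteins):
    """Map each peptide to its unique (protein_id, start, end) location.

    proteins: dict of protein_id -> amino acid sequence.
    Positions are 1-based and inclusive (an assumption for this sketch).
    Peptides found in more than one protein, or at more than one position
    within the same protein, are discarded.
    """
    matched = {}
    for peptide in peptides:
        hits = []
        for protein_id, sequence in proteins.items():
            start = sequence.find(peptide)
            while start != -1:  # collect every occurrence, including overlaps
                hits.append((protein_id, start + 1, start + len(peptide)))
                start = sequence.find(peptide, start + 1)
        if len(hits) == 1:
            matched[peptide] = hits[0]
    return matched


# Toy sequences (hypothetical): the first peptide is unique, the second occurs
# twice in P2, and the third occurs in both P3 and P4.
proteins = {
    "P1": "MKQAVTNPQNTLFAIKR",
    "P2": "MSLEGDEPEPLPQVRGGSLEGDEPEPLPQVR",
    "P3": "AAQGDLICLDGKAA",
    "P4": "QGDLICLDGK",
}
peptides = ["QAVTNPQNTLFAIK", "SLEGDEPEPLPQVR", "QGDLICLDGK"]
matched = match_peptides(peptides, proteins)
print(matched)  # {'QAVTNPQNTLFAIK': ('P1', 3, 16)}
```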

Aggregate scores
Finally, we calculate protein-level stabilisation scores by summarising the peptide-level stabilisation scores. This is achieved in two steps. First, we aggregate the peptide-level scores of overlapping peptides into an amino-acid-level score. Second, we aggregate the amino-acid-level scores into a protein-level score. In both steps, we first calculate the mean of all scores to determine whether the position/protein is overall stabilised (mean > 0) or destabilised (mean < 0). We then aggregate the scores using a 75% weighted quantile (weighted by the goodness of fit from the model) for stabilised positions and a 25% weighted quantile for destabilised positions. Next, we again use a 75% weighted quantile to aggregate the amino acid scores into a single protein-level score for stabilised proteins, and a 25% weighted quantile for destabilised proteins, as shown in the two figures below.

aggregated_scores <- score_aggregation(significant_all, small_molecule, matched_peptides)

Biophysical features of osmolyte-stabilised proteins. (A) The number of proteins stabilised by different numbers of osmolytes is plotted. Zero indicates proteins not stabilised by any tested osmolyte.
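The two-step aggregation described under Aggregate scores can be sketched as below. Several definitions of a weighted quantile exist and the text does not say which one the pipeline uses; this sketch takes the smallest value whose normalised cumulative weight reaches the quantile. It also assumes that a mean of exactly 0 counts as stabilised, that each peptide carries a non-negative weight derived from its goodness of fit (how that weight is computed is not specified here), and that amino-acid-level scores are weighted equally in the second step. Function names and inputs are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose normalised cumulative weight reaches q.

    This is one of several weighted-quantile definitions.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative / total >= q:
            return value
    return pairs[-1][0]  # guards against floating-point shortfall at q = 1


def summarise(scores, weights):
    """75% weighted quantile if the mean is >= 0 (stabilised), else 25%."""
    mean = sum(scores) / len(scores)
    return weighted_quantile(scores, weights, 0.75 if mean >= 0 else 0.25)


def protein_score(peptides):
    """peptides: (start, end, score, weight) tuples; 1-based inclusive positions.

    weight stands in for the goodness of fit of the peptide's GP model.
    """
    per_residue = {}
    for start, end, score, weight in peptides:
        for position in range(start, end + 1):
            per_residue.setdefault(position, []).append((score, weight))
    # Step 1: peptide-level scores -> one score per amino acid position.
    residue_scores = [summarise([s for s, _ in sw], [w for _, w in sw])
                      for sw in per_residue.values()]
    # Step 2: amino-acid-level scores -> one protein-level score
    # (equal weights here; an assumption of this sketch).
    return summarise(residue_scores, [1.0] * len(residue_scores))


print(protein_score([(1, 3, 0.4, 1.0), (2, 4, 0.8, 1.0)]))  # 0.8
```

Using the upper quantile for stabilised and the lower quantile for destabilised positions lets a protein-level score reflect the strongly affected regions rather than being diluted by unaffected peptides.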
Direct binding of osmolytes and effects on native protein structure. (A) Distributions of scaled peptide stability scores for proteins stabilised by proline, betaine or glucose, plotted for known binders of each of these osmolytes (dark grey) and for all stabilised proteins (grey). Scores were derived from a single experiment, at 10 temperatures (2 LiP replicates each; n = 94 peptides from 13 proteins (binding); 12114 peptides from 1069 proteins (not binding)). Horizontal lines define the median and boxes the 25th and 75th percentiles; whiskers represent the maximum and minimum values. Significance is determined using a two-sided Wilcoxon test (**** p-value < 0.0001). (B) As in A but plotted for all stabilised proteins (grey) and for known binders of mismatched osmolytes (red) (e.g., known betaine binders from the proline and glucose datasets). Plots and statistical analysis as in A. (C) Distributions of peptide-level stabilisation scores for proteins stabilised by glucose, plotted for known binders of glucose (dark grey), of monosaccharides (grey), and for all stabilised proteins (light grey, right). Plots and statistical analysis as in A (n = 48 peptides from 9 proteins (glucose), 63 peptides from 13 proteins (monosaccharides), 4619 peptides from 854 proteins (not binding)). (D) Linear regression between the stabilisation score for Glucokinase (Glk), Fructokinase (Mak) and Ribokinase (RbsK) in the presence of different osmolytes, and the mean stabilisation score across all detected proteins. Each point represents one osmolyte condition. Shaded area represents the confidence interval of the linear fit (dashed line). (E) DSF-measured melting temperature of Ribokinase (RbsK) at varying ribose concentrations in control (grey line) and upon addition of 1 M glucose (red), 1 M glycerol (purple) or 1 M TMAO (blue). Error bars show mean +/- standard deviation (n = 5 replicates). Shaded area represents the confidence interval of the fit. (F) The fraction of stabilised proteins that also show a change in native structure for each osmolyte, as determined by LiP-MS.

Osmolyte effects on protein aggregation in native lysates. (A) TPP-measured fraction of stabilised proteins out of all detected proteins in the indicated osmolyte condition (right) and distribution of stabilisation scores for the corresponding stabilised proteins (left). Horizontal lines define the median and boxes the 25th and 75th percentiles; whiskers represent the maximum and minimum values. Significance is determined using a two-sided Wilcoxon test (** p-value < 0.01, **** p-value < 0.0001). Scores were derived from a single experiment at 10 temperatures (2 technical replicates). Each plot represents all stabilised proteins per condition (n = 549 proteins (TMAO), 626 proteins (glucose), 353 proteins (proline)). (B) Differential analysis of protein abundance in the soluble fraction between control and 1 M TMAO conditions, at 68.2 °C, 72.5 °C and 76 °C. Dashed lines indicate the significance cut-off (adj. p-value < 0.05, |log2FC| > 1). Each dot represents an individual protein. Proteins with significantly promoted (red) or decreased (blue) aggregation are indicated. (C) Distribution of LiP-MS-calculated stabilisation score for proteins with decreased aggregation in TMAO (TRUE, 8 proteins with > 1 peptide per protein) compared to the rest (FALSE, 686 proteins with > 1 peptide per protein). Example TPP (protein) and LiP-MS (peptide) profiles for selected proteins under control (grey) and TMAO (blue) conditions are shown (right). Shaded area represents the confidence interval of the fit. Plots and statistics as in A. (D) Distribution of fold changes (FC) between 76 °C and 37 °C in the absence of added osmolytes, plotted for proteins where TMAO promotes aggregation (TRUE, 158 proteins) vs the rest (FALSE, 1094 proteins). Higher FC means the protein mostly remains in the soluble fraction even at higher temperatures. Plots and statistics as in A, C. (E) GO enrichment analysis for proteins where TMAO promotes aggregation. BP, biological processes; MF, molecular functions; CC, cellular compartments. Significantly enriched terms (p-value < 0.01) for all three analyses are shown. (F) Relative concentration of soluble ribosomal recycling factor (Frr) after heating (70 °C) of the purified protein under the indicated conditions, scaled for initial protein concentration (n = 6 replicates). (G) Circular dichroism-measured thermal denaturation profile of Frr in the indicated conditions.

Osmolytes have a global effect on protein stability. (A) The plots (right) show peptide profile changes upon osmolyte addition. The plot (left) shows the fractions of peptides that correspond to these changes, coloured as on the right. (B) Peptide-level false discovery rate (FDR; see Methods) is shown for the indicated analyses. (C) Protein-level FDR calculated for different quantiles used in summarising peptide-level scores into a protein-level score. The x-axis shows the quantiles used for summarising stabilised (Stab.) or destabilised (Destab.) proteins. FDR is calculated by combining significantly stabilised and destabilised proteins. The red line represents the peptide-level FDR and the blue line an FDR of 0.05, which corresponds roughly to a 0.75:0.25 quantile split for stabilised:destabilised proteins (green line). Shaded area represents the confidence interval of the fit.