A comprehensive computational study of amino acid interactions in membrane proteins

Mbaye, Mame Ndew; Hou, Qingzhen; Basu, Sankar; Teheux, Fabian; Pucci, Fabrizio; Rooman, Marianne

doi:10.1038/s41598-019-48541-2

Article
Open access
Published: 19 August 2019

Mame Ndew Mbaye¹˒², Qingzhen Hou¹, Sankar Basu¹ (ORCID: orcid.org/0000-0003-1393-1982), Fabian Teheux¹, Fabrizio Pucci¹˒³ & … Marianne Rooman¹

Scientific Reports volume 9, Article number: 12043 (2019)

Abstract

Transmembrane proteins play a fundamental role in a wide range of biological processes. Despite their importance, they are less studied than globular proteins, essentially because their embedding in lipid membranes hampers their experimental characterization. In this paper, we improved our understanding of their structural stability by developing new knowledge-based energy functions that describe the amino acid pair interactions prevailing in the transmembrane and extramembrane regions of membrane proteins. Comparing these potentials with those derived from globular proteins yields an objective view of the relative strength of amino acid interactions in the different protein environments, and of their role in protein stabilization. Separate potentials were also derived from α-helical and β-barrel transmembrane regions to investigate possible dissimilarities. We found that, in extramembrane regions, hydrophobic residues are less frequent, but interactions between aromatic and aliphatic amino acids, as well as aromatic-sulfur interactions, contribute more to stability. In transmembrane regions, polar residues are less abundant. However, interactions between residues of equal or opposite charge, interactions between non-charged polar residues, and anion-π interactions appear stronger. This indirectly reflects the preference of water and lipid molecules to interact with polar and hydrophobic residues, respectively. We applied these new energy functions to predict whether a residue is located in the trans- or extramembrane region, and obtained an AUC score of 83% in cross validation, which demonstrates their accuracy. As their application is, moreover, extremely fast, they are optimal instruments for membrane protein design and large-scale investigations of membrane protein stability.

Introduction

Biological membranes form selectively permeable barriers between the interior of cells and the external environment. They are composed of phospholipid bilayers, which form a fluid medium distinct from the surrounding aqueous solution. Many proteins are embedded in, attached to, or span these membranes. We focus here on integral membrane proteins, which cross the membrane and thus have a transmembrane domain and one or two extramembrane domains.

Membrane proteins are a very important class of proteins. They play key roles in the localization and organization of the cell. They are also central to cellular function, transferring specific molecules, ions and other types of signals from the cell exterior to the interior and vice versa. They constitute about 30% of the entire human proteome¹ and are the focus of much pharmaceutical research, as they correspond to about 60% of current drug targets².

In spite of their importance, membrane proteins have been much less studied than globular proteins. They are indeed very difficult to analyze, because they reach their folded, native, stable and active state only within the lipid bilayer, which complicates their structure determination by X-ray crystallography. Their generally large size also makes them difficult to study by nuclear magnetic resonance spectroscopy. For these reasons, transmembrane protein structures represent only about 2% of the structures deposited in the Protein Data Bank (PDB)³. The analysis and modeling of the 3-dimensional (3D) structure of membrane proteins are thus key objectives for rationally guiding protein design and engineering experiments.

Due to the difference between the aqueous and lipid environments, the structure and composition of transmembrane regions substantially differ from those of the extramembrane domains and from globular proteins⁴. This implies that interactions that are favorable in globular regions are not necessarily so in transmembrane regions, and vice versa. However, the relative strength of the different types of interactions in the two environments is not easy to evaluate.

To tackle this issue, semi-empirical, physics-based energy functions adapted to membrane proteins have been designed and used for computational modeling and design (see^5,6 for reviews). Such potentials have also been used to orient proteins in membranes, using coarse-grained molecular dynamics simulations^7,8 or simplified potentials that include anisotropic solvent models of lipid bilayers⁹. Another approach consists of deriving statistical potentials from sets of known membrane protein structures. Such potentials have been applied to evaluate structural models of membrane proteins^10,11,12,13 and to position proteins in lipid membranes^10,14.

Some authors analyzed α-helical and β-barrel proteins separately^15,16. Indeed, Gram-negative bacteria have two membranes. The inner membrane is a phospholipid bilayer. The outer membrane is an asymmetric bilayer, with phospholipids in the inner leaflet and lipopolysaccharides in the outer leaflet. As a consequence, membrane proteins differ according to whether they are inserted in the inner or outer membrane. In particular, α-helical transmembrane proteins are mostly found in the cytoplasmic membranes of prokaryotic and eukaryotic cells and rarely in outer membranes. In contrast, β-barrel proteins have so far only been found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts^17,18.

In this paper, we chose to apply the statistical potential formalism to derive distance potentials from trans- and extramembrane protein regions, as this yields an objective way to compare the residue-residue interactions that prevail in lipid and aqueous environments. We also derived potentials separately from α-helical and β-barrel transmembrane regions to investigate whether their interaction strengths differ. In principle, we should also distinguish between extramembrane residues located in the cytoplasmic, periplasmic or extracellular regions. For example, positive charges in α-helical domains have been shown to be more often situated in the cytoplasmic domain, where they interact with lipid molecules^19,20. Charged residues in β-proteins have been shown to be more frequently located on the extracellular side^21,22. However, we chose to group these regions into a single category called extramembrane. We occasionally separate this category into two subcategories. The first contains intracellular regions, situated on the cellular side, which are either cytoplasmic or periplasmic. The second contains extracellular regions, which can be periplasmic or truly extracellular. Indeed, the number of membrane proteins with an experimental structure is currently too limited to yield reliable statistics if too many subregions are defined.

Methods

Membrane and globular protein datasets

To set up our membrane protein dataset, we used the OPM database⁹, which contains experimental structures of integral membrane proteins. From these, we selected the proteins whose structures were determined by X-ray crystallography at a resolution of 2.5 Å or better. In a second step, we limited the pairwise sequence identity to at most 30% with the help of the protein culling server PISCES²³. Our final dataset ${\mathscr{D}}$ contains 165 membrane protein structures. Of these, 108 are α-helical and 52 are β-barrel polytopic integral proteins, and 5 are α-helical monotopic integral proteins that do not completely span the lipid bilayer. They are listed in Supplementary Material Table S1.
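The two selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used. PISCES applies its own culling algorithm, whereas a simple greedy pass is assumed here that keeps the best-resolved structures first. The `Entry` record, the `select_entries` function and the precomputed `identity` table of pairwise sequence identities are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    pdb_id: str
    method: str        # experimental method label, e.g. "X-RAY DIFFRACTION"
    resolution: float  # Å; ignored for non-X-ray entries

def select_entries(entries, identity, max_resolution=2.5, max_identity=0.30):
    """Keep X-ray entries with resolution <= max_resolution, then greedily drop any
    entry whose sequence identity to an already kept entry exceeds max_identity.

    identity: hypothetical dict mapping frozenset({id_a, id_b}) -> identity in [0, 1];
    missing pairs are treated as unrelated.
    """
    candidates = [e for e in entries
                  if e.method == "X-RAY DIFFRACTION" and e.resolution <= max_resolution]
    candidates.sort(key=lambda e: e.resolution)  # prefer the best-resolved structures
    kept = []
    for e in candidates:
        if all(identity.get(frozenset((e.pdb_id, k.pdb_id)), 0.0) <= max_identity
               for k in kept):
            kept.append(e)
    return kept

# Toy input with placeholder identifiers (not real PDB entries).
entries = [
    Entry("AAAA", "X-RAY DIFFRACTION", 2.0),
    Entry("BBBB", "X-RAY DIFFRACTION", 1.8),
    Entry("CCCC", "SOLUTION NMR", 0.0),
    Entry("DDDD", "X-RAY DIFFRACTION", 3.1),
]
identity = {frozenset(("AAAA", "BBBB")): 0.85}
print([e.pdb_id for e in select_entries(entries, identity)])  # ['BBBB']
```

The NMR entry and the 3.1 Å entry fail the first filter. "AAAA" is then culled because it is 85% identical to the better-resolved "BBBB".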

The proteins from this dataset were divided into their transmembrane and extramembrane regions using the OPM annotations. Note that most of these annotations are predictions; the impact of this is discussed in the Conclusion section. This yielded two datasets. The ${{\mathscr{D}}}^{{\rm{TM}}}$ set contains all transmembrane protein segments. The ${{\mathscr{D}}}^{{\rm{EM}}}$ set contains the extramembrane protein regions and thus mixes extracellular, periplasmic and cytoplasmic segments. We occasionally split the ${{\mathscr{D}}}^{{\rm{TM}}}$ dataset into transmembrane regions with α-helical and β-barrel conformations. The protein segments that make up these datasets are specified in Table S2.

To the best of our knowledge, the membrane protein structure dataset constructed in this paper is currently the largest non-redundant dataset used to derive effective potentials^10,11,12.

For comparison, we also considered the ${{\mathscr{D}}}^{{\rm{GL}}}$ dataset set up previously²⁴. It contains 3,823 X-ray structures of globular proteins with a resolution of 2.5 Å or better and a pairwise sequence identity of at most 20%.

Statistical potentials

Statistical potentials are coarse-grained energy functions derived, through the inverse Boltzmann law, from the observed frequencies of associations between sequence and structure elements in a dataset of protein structures^25,26. In particular, we considered here the potentials:

$$\begin{array}{lllll}{\rm{\Delta }}W(s,d) & = & -{k}_{B}T\,\mathrm{ln}\,\frac{F(s,d)}{F(s)F(d)} & = & -{k}_{B}T\,\mathrm{ln}\,\frac{n(s,d)\,n}{n(s)\,n(d)}\\ {\rm{\Delta }}W({s}_{1},{s}_{2},d) & = & -{k}_{B}T\,\mathrm{ln}\,\frac{F({s}_{1},{s}_{2},d)}{F({s}_{1},{s}_{2})F(d)} & = & -{k}_{B}T\,\mathrm{ln}\,\frac{n({s}_{1},{s}_{2},d)\,n}{n({s}_{1},{s}_{2})\,n(d)}\end{array}$$

(1)

where k_B is the Boltzmann constant, T the absolute temperature (conventionally taken to be room temperature), s, s₁ and s₂ amino acid types, n numbers of occurrences and F relative frequencies. d is the spatial distance between the side-chain geometric centers of two residues separated by at least one residue along the chain. The type of one of these residues (s) or of both residues (s₁ and s₂) is specified. Distances between 3 and 9.9 Å are divided into discrete bins of 0.3 Å width, and the last bin contains all distances above 9.9 Å. Details about the computation of the potentials can be found in^24,26,27.
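As an illustration of the distance binning just described, the following sketch computes side-chain geometric centers and maps their distances onto the 23 bins of 0.3 Å between 3 and 9.9 Å, plus one overflow bin. The function names are hypothetical. The text does not state how distances below 3 Å are handled, so returning `None` for them is an assumption.

```python
D_MIN, D_MAX, WIDTH = 3.0, 9.9, 0.3
N_REGULAR = round((D_MAX - D_MIN) / WIDTH)  # 23 bins of 0.3 Å covering [3.0, 9.9)
N_BINS = N_REGULAR + 1                       # plus one overflow bin for d >= 9.9 Å

def centroid(coords):
    """Geometric center of a side chain given as a list of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def distance_bin(d):
    """Bin index of a centroid-centroid distance d (Å), or None if d < 3 Å (assumption)."""
    if d < D_MIN:
        return None
    if d >= D_MAX:
        return N_REGULAR  # overflow bin: all distances above 9.9 Å
    # Note: values falling exactly on a bin edge are subject to floating-point rounding.
    return int((d - D_MIN) / WIDTH)

print(N_BINS, distance_bin(5.0), distance_bin(12.0))  # 24 6 23
```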

The potentials depend on the protein structure dataset from which the relative frequencies F are computed. This dependence has previously been exploited to analyze the relative strength of the interactions as a function of temperature²⁸ and of solubility²⁹. Here, we extended this approach to membrane proteins, using the three datasets ${{\mathscr{D}}}^{{\rm{TM}}}$, ${{\mathscr{D}}}^{{\rm{EM}}}$ and ${{\mathscr{D}}}^{{\rm{GL}}}$. From these, we derived the transmembrane potential ΔW^TM, the extramembrane potential ΔW^EM and the globular protein potential ΔW^GL, which describe the interactions in the respective protein regions.

Amino acids that share similar properties can be considered together when computing the potentials; the resulting potentials are referred to as group potentials. When summing the numbers of occurrences of different amino acid types belonging to the same group, their sizes have to be taken into account. In practice, we shifted the inter-residue distances d involving larger amino acids towards smaller distances. The shift equals the difference in radius between each such amino acid and the smallest amino acid in its group. We analyzed here group potentials involving positively charged residues (Lys, Arg), negatively charged residues (Glu, Asp), aromatic residues (Phe, Tyr, Trp), aliphatic residues (Ile, Val, Leu), non-charged polar residues (Gln, Asn, Ser, Thr), small residues (Gly, Ala), and sulfur-containing residues (Cys, Met).
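A minimal sketch of this distance shift is shown below, using two of the groups listed above. Two parts of it are assumptions. First, the radii are made-up illustrative values, not those used in the paper. Second, the sketch subtracts the excess radius of each residue of the pair, whereas the text does not spell out how the two residues' shifts combine.

```python
# Two of the seven groups listed above; the other five would follow the same pattern.
GROUPS = {
    "positive": ("LYS", "ARG"),
    "negative": ("GLU", "ASP"),
}
# Hypothetical side-chain radii in Å, for illustration only (not values from the paper).
RADII = {"LYS": 2.6, "ARG": 3.0, "GLU": 2.4, "ASP": 2.0}

def radius_excess(res):
    """Radius of `res` minus the radius of the smallest member of its group."""
    members = next(m for m in GROUPS.values() if res in m)
    return RADII[res] - min(RADII[x] for x in members)

def group_distance(d, res_a, res_b):
    """Shift the distance d (Å) between res_a and res_b towards smaller values by the
    excess radius of each residue over the smallest residue of its own group."""
    return d - radius_excess(res_a) - radius_excess(res_b)

print(group_distance(6.0, "LYS", "ASP"))            # 6.0: both are the smallest in their groups
print(round(group_distance(6.0, "ARG", "GLU"), 2))  # 5.2
```

The shifted distances would then be binned and their occurrences pooled across the group members before applying the inverse Boltzmann law.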

Coping with finite-size dataset effect

Using frequencies of observation in a protein structure dataset to estimate free energy contributions through Eq. (1) implicitly assumes that the number of structures in the set is large enough to provide statistically significant values. This is, in general, a reasonable hypothesis for standard statistical potentials derived from thousands of globular structures. However, in the case of membrane proteins, the number of experimental structures is rather small and they are moreover divided into their trans- and extramembrane parts.

To cope with the finite-size effect and obtain smooth and statistically significant potentials, we introduced two additional layers of computation. The first layer consists of dropping the potential values computed from distance bins d that do not contain a sufficient number of occurrences. We set the threshold on n(s, d) and n(s₁, s₂, d) to 10; when it is not reached, the potential is set to zero. For ΔW(s₁, s₂, d), Eq. (1) thus becomes:

$$\begin{array}{llll}{\rm{\Delta }}W({s}_{1},{s}_{2},d) & = & -{k}_{B}T\,\mathrm{ln}\,\frac{n({s}_{1},{s}_{2},d)\,n}{n({s}_{1},{s}_{2})\,n(d)} & {\rm{if}}\,n({s}_{1},{s}_{2},d)\ge 10\\ {\rm{\Delta }}W({s}_{1},{s}_{2},d) & = & 0 & {\rm{otherwise}}\end{array}$$

(2)

and similarly for the potential ΔW(s, d).
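Equations (1) and (2) can be combined into one routine that turns per-bin pair counts into a thresholded potential. The following is a sketch only. It assumes k_B T ≈ 0.593 kcal/mol (about 298 K) as the value for "room temperature". The `pair_potential` function and its dictionary-of-lists input layout are hypothetical. The smoothing of the second layer is not included.

```python
import math

KT = 0.593      # kcal/mol; k_B*T at about 298 K, an assumed value for "room temperature"
MIN_COUNT = 10  # occurrence threshold of Eq. (2)

def pair_potential(counts):
    """Eq. (2) from per-bin counts.

    counts: dict mapping (s1, s2) -> list of n(s1, s2, d), one entry per distance bin.
    Returns a dict with the same keys mapping to lists of dW(s1, s2, d) in kcal/mol.
    """
    n_bins = len(next(iter(counts.values())))
    n_pair = {pair: sum(row) for pair, row in counts.items()}                  # n(s1, s2)
    n_dist = [sum(row[b] for row in counts.values()) for b in range(n_bins)]  # n(d)
    n_total = sum(n_pair.values())                                              # n
    return {
        pair: [
            -KT * math.log(c * n_total / (n_pair[pair] * n_dist[b])) if c >= MIN_COUNT else 0.0
            for b, c in enumerate(row)
        ]
        for pair, row in counts.items()
    }

# Toy counts for two residue pairs over two distance bins.
toy = {("A", "B"): [20, 10], ("A", "A"): [10, 20]}
dw = pair_potential(toy)
print([round(x, 3) for x in dw[("A", "B")]])  # [-0.171, 0.24]
```

Pairs that are over-represented in a bin relative to the product of their marginal frequencies get a negative (stabilizing) value. Bins below the threshold are simply left at zero.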

The second layer consists of smoothing the potential curves by replacing the number of occurrences in each bin with a weighted sum over that bin and its β neighboring bins on each side:

$$\hat{n}({s}_{1},{s}_{2},d)=\mathop{\sum }\limits_{i=1}^{\beta }\,\frac{1}{{\alpha }^{i}}n({s}_{1},{s}_{2},d-i)+n({s}_{1},{s}_{2},d)+\mathop{\sum }\limits_{i=1}^{\beta }\,\frac{1}{{\alpha }^{i}}n({s}_{1},{s}_{2},d+i)$$

(3)

where d here denotes a discrete distance bin rather than a continuous distance value, and where we chose β = 4 and α = 4/3. The numbers of occurrences $\hat{n}({s}_{1},{s}_{2})$, $\hat{n}(d)$ and $\hat{n}$ are obtained from $\hat{n}({s}_{1},{s}_{2},d)$ by summing over all distance bins and/or all amino acid types. ${\rm{\Delta }}W(s,d)$ is smoothed in the same way.
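The smoothing of Eq. (3), applied to one row of per-bin counts, can be sketched as below. Three details are assumptions that the text does not state. Bins beyond either end of the distance range are treated as empty. The overflow bin is smoothed like the others. The sketch is also agnostic about whether smoothing precedes or follows the occurrence threshold.

```python
def smooth_counts(row, beta=4, alpha=4 / 3):
    """Eq. (3): n_hat(d) = n(d) + sum_{i=1..beta} alpha**(-i) * (n(d - i) + n(d + i)).

    row: per-bin occurrence counts n(s1, s2, d) for one residue pair.
    Bins outside the range are treated as empty (an assumption).
    """
    n = len(row)
    smoothed = []
    for d in range(n):
        total = float(row[d])
        for i in range(1, beta + 1):
            weight = alpha ** (-i)
            if d - i >= 0:
                total += weight * row[d - i]
            if d + i < n:
                total += weight * row[d + i]
        smoothed.append(total)
    return smoothed

# A single populated bin spreads into its neighbours with weights 3/4, 9/16, 27/64, 81/256.
print([round(x, 4) for x in smooth_counts([0, 0, 0, 0, 16, 0, 0, 0, 0])])
# [5.0625, 6.75, 9.0, 12.0, 16.0, 12.0, 9.0, 6.75, 5.0625]
```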

Trans- and extramembrane folding free energy

The folding free energy of a protein with sequence S and 3D conformation C was computed from the potentials derived from the ${{\mathscr{D}}}^{{\rm{TM}}}$, ${{\mathscr{D}}}^{{\rm{EM}}}$ and ${{\mathscr{D}}}^{{\rm{GL}}}$ datasets as:

$$\begin{array}{rcl}{\rm{\Delta }}{W}_{sd}^{\mu }(S,C) & = & \frac{1}{2}\mathop{\sum }\limits_{i,j=1,|i-j| > 1}^{N}\,{\rm{\Delta }}{W}^{\mu }({s}_{i},{d}_{ij})\\ {\rm{\Delta }}{W}_{sds}^{\mu }(S,C) & = & \frac{1}{2}\mathop{\sum }\limits_{i,j=1,|i-j| > 1}^{N}\,{\rm{\Delta }}{W}^{\mu }({s}_{i},{s}_{j},{d}_{ij})\end{array}$$

(4)

where i and j denote positions along the amino acid sequence, N is the sequence length, and μ equals TM, EM or GL. To avoid overfitting, the folding free energies were computed with a leave-one-out cross validation strategy. In this strategy, the target protein $(\bar{S},\bar{C})$ is removed from the dataset ${{\mathscr{D}}}^{\mu }$ when computing its folding free energy ${\rm{\Delta }}{W}^{\mu }\,(\bar{S},\bar{C})$. Note that this cross validation procedure is very strict, since the datasets contain, by construction, no proteins with more than 30% pairwise sequence identity.
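The potentials are built from occurrence counts that add up over proteins. One way to implement the leave-one-out step is therefore to subtract the target protein's own counts from the dataset totals before recomputing the potential. This is a sketch of that idea with a hypothetical function name and data layout. The text only states that the target is removed from the dataset.

```python
def leave_one_out_counts(dataset_counts, target_counts):
    """Per-bin counts of the dataset with one protein's contribution removed.

    Both arguments map (s1, s2) -> list of per-bin counts n(s1, s2, d);
    pairs absent from target_counts are left unchanged.
    """
    loo = {}
    for pair, row in dataset_counts.items():
        own = target_counts.get(pair, [0] * len(row))
        loo[pair] = [total - mine for total, mine in zip(row, own)]
        if any(c < 0 for c in loo[pair]):
            raise ValueError(f"target counts exceed dataset counts for {pair}")
    return loo

dataset = {("A", "B"): [25, 40], ("A", "A"): [12, 30]}
target = {("A", "B"): [3, 1]}
print(leave_one_out_counts(dataset, target))
# {('A', 'B'): [22, 39], ('A', 'A'): [12, 30]}
```

The reduced counts would then go through the same thresholding and smoothing as the full dataset before the target's energy is evaluated.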

Per-residue folding energies

To test the accuracy and applicability of our potentials, we used them to predict whether residues are located in the trans- or extramembrane regions. For that purpose, we estimated the per-residue contributions to the folding free energy³⁰. For residue i, we have:

$$\begin{array}{ccc}{\rm{\Delta }}{G}_{sd}^{i,\mu } & = & \frac{1}{2}\mathop{\sum }\limits_{j=1,|i-j| > 1}^{N}\,{\rm{\Delta }}{W}^{\mu }({s}_{i},{d}_{ij})\\ {\rm{\Delta }}{G}_{sds}^{i,\mu } & = & \frac{1}{2}\mathop{\sum }\limits_{j=1,|i-j| > 1}^{N}\,{\rm{\Delta }}{W}^{\mu }({s}_{i},{s}_{j},{d}_{ij})\end{array}$$

(5)

Summing Eq. (5) over all residues i recovers the global folding free energies of Eq. (4).
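A direct transcription of Eqs. (4) and (5) makes this bookkeeping explicit. The example below uses the residue-residue potential ΔW(s_i, s_j, d_ij) on a toy chain and checks that the per-residue terms add up to the global energy. The function names, data layout and toy numbers are hypothetical.

```python
def per_residue_energies(seq, dist_bin, potential):
    """Eq. (5): dG_i = 1/2 * sum over j with |i - j| > 1 of dW(s_i, s_j, d_ij).

    seq: residue types; dist_bin[i][j]: distance-bin index of the pair (i, j);
    potential[(s1, s2)][b]: dW in kcal/mol (both orderings of each pair present).
    """
    n = len(seq)
    return [
        0.5 * sum(potential[(seq[i], seq[j])][dist_bin[i][j]]
                  for j in range(n) if abs(i - j) > 1)
        for i in range(n)
    ]

def total_energy(seq, dist_bin, potential):
    """Eq. (4): dW = 1/2 * sum over i, j with |i - j| > 1 of dW(s_i, s_j, d_ij)."""
    n = len(seq)
    return 0.5 * sum(potential[(seq[i], seq[j])][dist_bin[i][j]]
                     for i in range(n) for j in range(n) if abs(i - j) > 1)

# Toy symmetric potential with two distance bins and a 4-residue chain.
potential = {("A", "A"): [-1.0, 0.2], ("A", "B"): [-0.5, 0.1],
             ("B", "A"): [-0.5, 0.1], ("B", "B"): [0.0, 0.0]}
seq = "ABAB"
dist_bin = [[0, 0, 0, 1],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [1, 0, 0, 0]]
per_res = per_residue_energies(seq, dist_bin, potential)
print(round(sum(per_res), 6), round(total_energy(seq, dist_bin, potential), 6))  # -0.9 -0.9
```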

Results and Discussion

Amino acid frequencies

The relative frequencies of the twenty amino acids differ between the trans- and extramembrane datasets ${{\mathscr{D}}}^{{\rm{TM}}}$ and ${{\mathscr{D}}}^{{\rm{EM}}}$, as seen in Figs 1a and S1. Notably, the ${{\mathscr{D}}}^{{\rm{EM}}}$ frequencies are quite similar to the ${{\mathscr{D}}}^{{\rm{GL}}}$ frequencies. This is not surprising, since most of each extramembrane region experiences an aqueous environment similar to that of globular proteins. The exceptions are the parts that interact with the membrane or adjoin the transmembrane region. We also analyzed the frequency of different types of residues as a function of the distance to the intra- and extracellular water-membrane interfaces, as shown in Fig. 1b.
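The relative frequencies compared here are normalized residue counts per region. Below is a minimal sketch with hypothetical function names, in which toy segments stand in for the transmembrane and extramembrane sequences. The log2 ratio is only one possible, assumed way to express enrichment.

```python
import math
from collections import Counter

def relative_frequencies(segments):
    """Relative frequency of each residue type (one-letter code) over sequence segments."""
    counts = Counter()
    for seg in segments:
        counts.update(seg)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def log2_enrichment(freq_a, freq_b):
    """log2(F_a / F_b) for residue types observed in both sets."""
    return {aa: math.log2(freq_a[aa] / freq_b[aa]) for aa in freq_a if aa in freq_b}

tm = relative_frequencies(["LLVIL", "ALLF"])  # toy transmembrane segments
em = relative_frequencies(["KDELL", "SKTA"])  # toy extramembrane segments
print(round(tm["L"], 3), round(em["L"], 3))   # 0.556 0.222
```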

The clearest difference between transmembrane and extramembrane regions is observed for the aliphatic residues Val, Ile and Leu: they are much more numerous in the former than in the latter. In extramembrane regions, they tend to be located in the protein interior to avoid contact with water molecules. In transmembrane regions, they are almost uniformly distributed, and their frequency starts to decrease only near the interface. Note that Leu is more frequent than Val and Ile in transmembrane regions. This is probably because Leu is favored in α-helices and Val and Ile in β-strands³¹, and our dataset contains more α- than β-transmembrane domains.

Aromatic amino acids were also found more frequently in the transmembrane than in the extramembrane regions. They are preferentially located near the water-membrane and the protein-membrane interfaces. This observation is consistent with the finding that aromatic residues are very important in anchoring the protein into the membrane where they tend to form cation-π interactions with some positively charged lipid head groups^32,33,34.

In contrast, charged amino acids are much more frequently observed in extra- than in transmembrane regions. This results from the large energetic cost of transferring a charged amino acid from an aqueous environment with a high dielectric constant (ε_water = 80) to the membrane, which has a low dielectric constant (ε_membrane = 2 to 4)³⁵. Moreover, we found differences in the distribution of positively charged residues in proteins whose transmembrane domain is α-helical. Indeed, as seen in Fig. S2, their frequency is higher in the regions oriented towards the cell interior than in those oriented towards the cell exterior. This is consistent with the “positive-inside rule”³⁶. The rule states that positive residues are more abundant in the cytoplasmic regions than in the periplasmic regions for α-helical transmembrane domains inserted in bacterial inner membranes. In eukaryotic membranes, they are likewise more abundant in the cytoplasmic regions than in the extracellular regions. In cytochrome P450, the insertion or deletion of positively charged residues in some loop regions has been shown to modify the protein orientation with respect to the membrane and the translocation of protein segments across it^37,38. The general explanation of this rule is electrostatic. The positively charged residues of the intracellular domain interact with the negatively charged lipids of the cytosolic membrane surface, which retains them on the cytoplasmic face of the membrane^39,40,41. Note that the positive-inside rule has been used to predict the transmembrane orientation of α-helical membrane proteins⁴².
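The size of this cost can be gauged with the Born model of ion solvation. The model treats an ion as a charged sphere of radius a in a dielectric continuum: ΔG = (332 q²/2a)(1/ε_to − 1/ε_from), in kcal/mol, with a in Å and q in units of e. This continuum estimate is only an order-of-magnitude illustration and is not part of the statistical potentials. The 2 Å ionic radius used below is an assumed, representative value.

```python
COULOMB = 332.06  # e^2 / (4*pi*eps0) in kcal*Å/mol, for charges in units of e

def born_transfer_energy(charge, radius, eps_from, eps_to):
    """Born-model free energy (kcal/mol) of moving an ion of the given charge (e) and
    radius (Å) from a medium with dielectric eps_from to one with dielectric eps_to."""
    return COULOMB * charge ** 2 / (2.0 * radius) * (1.0 / eps_to - 1.0 / eps_from)

for eps_membrane in (2.0, 4.0):
    dg = born_transfer_energy(1, 2.0, eps_from=80.0, eps_to=eps_membrane)
    print(f"eps_membrane = {eps_membrane}: {dg:.1f} kcal/mol")
# eps_membrane = 2.0: 40.5 kcal/mol
# eps_membrane = 4.0: 19.7 kcal/mol
```

Across the dielectric range quoted above, the estimated cost stays in the tens of kcal/mol for a singly charged group.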

In β-barrel membrane proteins inserted into outer bacterial membranes, no significant differences are visible in Fig. S2 between charged residue frequencies in the intra- and extracellular regions. Yet, a compositional asymmetry has been described before, with a larger frequency of both positively and negatively charged residues in the extracellular regions^22,43, where lipopolysaccharides are generally attached to the membrane. This “charge-outside” rule is not observed in our dataset.

Like the charged residues, the uncharged polar residues are preferentially located in the extramembrane regions rather than inside the membrane. Their frequency is almost identical on both sides of the membrane.

Preferred interactions in transmembrane regions

Statistical distance potentials were derived separately from the datasets ${{\mathscr{D}}}^{{\rm{TM}}}$, ${{\mathscr{D}}}^{{\rm{EM}}}$ and ${{\mathscr{D}}}^{{\rm{GL}}}$, as described in Methods. Their comparison yields an objective evaluation of which residue-residue interactions are more favorable in transmembrane regions than in extramembrane regions or in globular proteins. The resulting potentials are depicted in Fig. S3, and those computed separately for α-helical and β-barrel transmembrane regions are shown in Fig. S5.

As expected, potentials obtained from residues with similar properties have similar shapes, up to a global distance shift due to differences in residue size. We thus defined group potentials that combine residue pairs with similar distance potentials, following the procedure detailed in Methods. The subset of these potentials discussed below is shown in Figs 2–8 and in Figs S4 and S6. Group potentials have a double advantage. They limit the small-dataset effect that affects the statistical significance of the potentials. They also simplify the interpretation of the results by grouping the residue pairs that form the same types of interactions.

Salt bridge interactions

Salt bridges are electrostatic interactions between positively (Lys, Arg) and negatively (Glu, Asp) charged residues. They play an important role in the stabilization of globular proteins, and especially in their thermostabilization⁴⁴. Here we studied the energetic contributions of this kind of interaction in the different regions of membrane proteins as a function of the distance between the residues’ side-chain geometric centers. As shown in Fig. 2a, both the extra- and transmembrane potentials have a characteristic minimum at a distance of about 4 Å (see Footnote 1). However, the transmembrane potential is shifted downwards by about 0.6 kcal/mol over the whole distance range. Salt bridges thus appear much more stabilizing in the transmembrane than in the extramembrane region.
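One simple way to quantify such a global shift between two potential curves is the mean difference over the distance bins in which both are populated. This is an assumed convention for illustration, since the text does not specify how the shift was measured. The numbers below are toy values, not the potentials of Fig. 2a.

```python
def mean_offset(potential_a, potential_b):
    """Mean of (a - b) over distance bins where neither potential was zeroed by the
    occurrence threshold of Eq. (2); both inputs are per-bin lists in kcal/mol.
    Caveat: a bin whose value happens to be exactly 0.0 is treated as unpopulated."""
    diffs = [a - b for a, b in zip(potential_a, potential_b) if a != 0.0 and b != 0.0]
    if not diffs:
        raise ValueError("no distance bin is populated in both potentials")
    return sum(diffs) / len(diffs)

toy_tm = [-1.1, -0.9, -0.5, 0.0]  # toy values; last bin zeroed by the threshold
toy_em = [-0.5, -0.3, 0.1, 0.2]
print(round(mean_offset(toy_tm, toy_em), 2))  # -0.6
```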

Two energy contributions play a role in the formation of salt bridges in globular proteins. The first is the desolvation penalty upon burying an ion inside the protein. The second is the electrostatic gain upon bringing the two opposite charges together, which usually counterbalances the first. In transmembrane protein regions, the situation is substantially different because the protein interior is more hydrophilic than the surface in contact with lipid molecules. The dielectric constant of the lipid bilayer is ε_membrane ≈ 1–2, whereas ε_interior ranges from 2 to 6, and reaches 80 in the hydrophilic channels of β-barrel porins or α-helical aquaporins⁴⁵. Here, burying an ion thus constitutes an energy gain, which adds to the stabilizing electrostatic interaction between the two charged residues. We also observe that, in transmembrane regions, Lys-containing salt bridges tend to be less stabilizing than Arg-containing ones (Fig. 2b). In Arg, the positive charge is delocalized over the guanidinium group.

Salt bridge geometries vary with the type of protein. For example, stabilizing salt bridges are recurrently found across transmembrane helices in “charge zipper” conformations, defined as extended salt bridge ladders along transmembrane helical segments⁴⁶, as illustrated in Fig. 2c. In other membrane proteins, such as porin-like β-barrel structures, a large network of salt bridges is observed in the hydrophilic pore, as shown in Fig. 2d.

Salt bridges sometimes also play pivotal functional roles. For example, they are responsible for G protein-coupled receptor (GPCR) activation and trafficking⁴⁷, and for ion channel gating⁴⁸.

Interactions between amino acids of equal charge

Here we focused on electrostatic interactions between two positively or two negatively charged residues, which are commonly known to be unfavorable. As seen in Fig. 3a,b, this is indeed the case when these interactions are established between residues in globular proteins or extramembrane domains. In contrast, when two amino acids of equal charge are both in the transmembrane domain, the interaction becomes stabilizing. This can be explained by the solvation gain obtained by burying the charged residues in the more hydrophilic core or by locating them inside hydrophilic channels, which tends to dominate the repulsive electrostatic force between the two electric charges.

Surprisingly, these effective potentials become even more favorable at short distances, in spite of the electrostatic repulsion. As seen in Fig. 3c, this counterintuitive effect is actually driven by β-barrel proteins, while in α-helical proteins +/+ and −/− interactions are very rare. We found such interactions mostly in the hydrophilic channel interior of transmembrane β-barrel structures. This can be explained by the earlier observation⁴⁹ of favorable clusters of positively or negatively charged residues interacting with water molecules. Note that this stabilizing effect is amplified for residues in which the charge is delocalized. In Arg, where the charge is delocalized over the guanidinium group, the dispersion forces between stacked guanidinium groups reduce the electrostatic repulsion. An example of an Arg cluster is given in Fig. 3d.

Other polar-polar interactions

Not only the interactions between two charged residues, but also those between two non-charged polar residues, or between one charged and one non-charged polar residue, were found to be much more favorable in the transmembrane than in the extramembrane regions, and even more so than in globular proteins (Fig. 4). The shift between the potentials is, however, smaller than for charge-charge interactions: about 0.4 kcal/mol at small distances. Note that the stabilization effect is slightly larger in β-barrel than in α-helical transmembrane proteins. The reason is that the former are often channel-like structures filled with water, with which the polar moieties make favorable interactions.

Buried polar residues have previously been described as contributing significantly to the stability of membrane protein structures⁵⁰. They have also been described as especially important in helix-helix interactions and homo-oligomerization processes^51,52. An example of a polar cluster is shown in Fig. 4b.

Anion - π interactions

Since aromatic rings have non-vanishing quadrupole moments, they can establish edgewise interactions with the carboxylate groups of Asp and Glu side chains. Only recently has this kind of interaction received special attention for its contribution to protein stabilization^53,54,55. Some analyses suggest that its contribution is slightly destabilizing. Even so, its high occurrence frequency in biomolecular structures may signal cooperative phenomena involving other charged or aromatic residues. In such phenomena, stability compensations could occur through more complex geometries such as anion-π-cation or anion-π-π systems^54,55.

Figure 5 confirms that the effective energy contributions of anion-π interactions are destabilizing in both extramembrane regions and globular proteins. In the transmembrane region, by contrast, the minimum of the potential becomes approximately neutral. Note that in the center of β-barrel membrane proteins, anion-π interactions occur mainly in complex geometries. One such geometry is depicted in Fig. 5b; it involves two anions, two cations and two aromatic residues interacting with the aqueous solvent. In helical transmembrane regions, aromatic residues sometimes establish anion-π interactions with phospholipid anions, mainly at the lipid-water interface⁵³.

Cation-π interactions

Cation-π interactions are established when the cationic side chain of Lys or Arg is located above or below the aromatic ring of Phe, Trp or Tyr. They play an important role in the stabilization of the structures of both membrane and globular proteins, and in protein-protein, protein-DNA and protein-ligand complexes^56,57,58,59,60.

The distance-dependent energy profile of this kind of interaction is depicted in Fig. 6a. The potentials extracted from transmembrane, extramembrane and globular regions are similar. The globular curve is slightly more negative at short distances (below 4 Å). Below 6 Å, transmembrane regions are slightly favored over extramembrane ones.

It has been suggested that cation-π interactions influence β-barrel transmembrane proteins more strongly than α-helical ones⁶¹. To examine this difference, we plotted cation-π energy profiles extracted from these two protein classes (Fig. 6b). Our results differ from these previous findings⁶¹: the energy profile at short distances (below 5 Å) is negative in α-helical and slightly positive in β-barrel proteins. This indicates that cation-π interactions contribute more to stability in α-helical transmembrane regions.

In cation-π interactions involving Arg, the planar guanidinium group and the aromatic moieties can make favorable stacking interactions, which add to the electrostatic interactions. We analyzed the geometry of these interactions through the distribution of the angle between the aromatic and guanidinium planes. As shown in Fig. 6c,d, this angle is preferentially around 20° in β-barrel transmembrane regions, so that the two planes are almost parallel, in stacked conformations. In extramembrane regions, a preference for stacked conformations is also visible, whereas in α-helical transmembrane regions basically all angle values are observed.
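The interplanar angle used here can be computed by fitting a plane to each group of atoms and comparing the plane normals. The sketch below assumes NumPy. Its inputs would be the coordinates of the ring atoms (e.g. CG, CD1, CD2, CE1, CE2, CZ of Phe) and of the guanidinium atoms (NE, CZ, NH1, NH2 of Arg). This atom selection is an assumption, since the text does not specify it. The function names are hypothetical.

```python
import numpy as np

def plane_normal(points):
    """Unit normal of the least-squares plane through three or more atoms ((n, 3) coordinates)."""
    pts = np.asarray(points, dtype=float)
    # The right singular vector with the smallest singular value is normal to the plane.
    _, _, vt = np.linalg.svd(pts - pts.mean(axis=0))
    return vt[-1]

def interplanar_angle(points_a, points_b):
    """Angle in degrees (0 to 90) between two fitted planes; 0 means parallel (stacked)."""
    cos = abs(np.dot(plane_normal(points_a), plane_normal(points_b)))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

theta = np.radians(20.0)
ring = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]  # toy planar group in z = 0
# Same group rotated by 20 degrees about the x axis.
tilted = [(x, y * np.cos(theta), y * np.sin(theta)) for x, y, _ in ring]
print(round(interplanar_angle(ring, tilted), 1))  # 20.0
```

Taking the absolute value of the dot product folds the angle into 0–90°, since the sign of a fitted plane normal is arbitrary.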

Cation-π interactions are known to be important not only for stability but also for function, for example in substrate and ligand binding^60,62. When they are established between the aromatic residues of the protein and the positively charged moieties of phospholipid head groups, they are fundamental for anchoring the protein to the membrane^32,33,34. The importance of aromatic rings in membrane anchoring is not easy to show with the statistical potential formalism, as the resulting effective potentials account for the environment only implicitly. Indications of this anchoring effect can, however, be seen in the aromatic amino acid frequencies of Fig. 1.

Preferred interactions in extramembrane regions

We now have a closer look at the residue-residue interactions that are more favorable in the extramembrane than in the transmembrane regions, as measured by the distance potentials.

Sulfur-aromatic interactions

Sulfur-containing amino acids (Cys and Met) are highly polarizable and can establish nonbonded interactions with aromatic moieties. These interactions have been shown to play important roles in the stabilization of protein structures^63,64,65,66. They also play roles in protein function^66,67, for example in protecting Met against oxidation to methionine sulfoxide.

The potentials in Fig. 7a show the stabilizing contribution of sulfur-aromatic interactions, which is much stronger in the extramembrane than in the transmembrane regions. Indeed, for the latter, the entire energy profile is shifted by about +0.2 kcal/mol on average over all distances. Interestingly, sulfur-π interactions in transmembrane regions occur almost exclusively in α-helical proteins. There, interhelical interactions frequently involve a methionine surrounded by a cage of aromatic residues. In the extramembrane region, they frequently involve partially exposed residues and more sulfur than aromatic residues (Fig. 7c).

We compared the strength of sulfur-π interactions with that of aromatic-aromatic and sulfur-sulfur interactions in the transmembrane regions, but did not find a clear difference between their minimum energy values (Fig. 7b). This contrasts with earlier results obtained by a combination of structural bioinformatics and ab initio quantum chemistry calculations, which suggested that sulfur-aromatic interactions in membrane proteins are more stabilizing than aromatic-aromatic or sulfur-sulfur interactions⁶⁵.

Regarding the geometry of the sulfur-π interactions, we did not see any substantial difference between the trans- and extramembrane regions. In both regions, we observed a slight preference for conformations in which the vector from the aromatic ring center to the sulfur atom makes an angle of about 40–45° with the normal to the ring plane, in agreement with earlier findings⁶³.
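The sulfur-π angle can be computed in the same spirit. The sketch below assumes that the angle is measured between the normal of the best-fit ring plane and the vector from the ring centroid to the sulfur atom (Cys SG or Met SD), a common convention in the literature. Fitting the plane by singular value decomposition is a choice made for this sketch, not necessarily the procedure used in the paper.

```python
import numpy as np


def ring_centroid_and_normal(ring_atoms):
    """Centroid and unit normal of the least-squares plane through ring atoms."""
    coords = np.asarray(ring_atoms, dtype=float)
    centroid = coords.mean(axis=0)
    # The right singular vector with the smallest singular value is
    # perpendicular to the best-fit plane of the centred coordinates.
    _, _, vh = np.linalg.svd(coords - centroid)
    return centroid, vh[-1]


def sulfur_pi_angle(sulfur, ring_atoms):
    """Angle (degrees) between the ring normal and the centroid-to-sulfur vector.

    Folded into [0, 90] because the sign of the normal is arbitrary;
    0 degrees places the sulfur directly above the ring face.
    """
    centroid, normal = ring_centroid_and_normal(ring_atoms)
    v = np.asarray(sulfur, dtype=float) - centroid
    cos_angle = abs(float(np.dot(v, normal))) / float(np.linalg.norm(v))
    return float(np.degrees(np.arccos(min(cos_angle, 1.0))))


if __name__ == "__main__":
    # Idealized six-membered ring (C-C = 1.39 Å) in the xy plane, centred at the origin.
    angles = np.radians(np.arange(0, 360, 60))
    ring = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)])
    print(sulfur_pi_angle((4.0, 0.0, 4.0), ring))
```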

Aromatic interactions

Due to their hydrophobic nature, which is especially marked for Phe, aromatic amino acids prefer to be located in transmembrane regions or in the core of extramembrane regions (Fig. 1). On the basis of their energy profiles (Fig. 8), we observed that interactions between pairs of aromatic residues are more favorable in extramembrane than in transmembrane regions. Moreover, they have almost the same strength in α-helical and β-barrel proteins, with a slight preference for the former (Fig. 8b), in agreement with earlier studies⁶⁸. Note that in β-barrel proteins, the aromatic residues are usually lipid-facing, whereas in α-helical proteins they are in the protein interior. This difference arises because β-barrel transmembrane regions have almost no core.

The geometries of the aromatic-aromatic interactions are similar in transmembrane and extramembrane regions (data not shown). They occur preferentially in a T-shaped conformation. Note that π-π stacking plays a role not only in the tertiary structure stabilization but also in the oligomerization of the membrane protein subunits⁶⁸.
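One simple way to label aromatic pairs as stacked or T-shaped is to threshold the angle between the two ring planes. In this minimal sketch, the cut-offs (30° and 60°) are illustrative assumptions rather than values taken from the paper. A fuller analysis would also take into account the centroid-centroid distance and the lateral offset between the rings.

```python
import numpy as np


def ring_normal(ring_atoms):
    """Unit normal of the least-squares plane through the ring atoms."""
    coords = np.asarray(ring_atoms, dtype=float)
    _, _, vh = np.linalg.svd(coords - coords.mean(axis=0))
    return vh[-1]


def classify_aromatic_pair(ring_a, ring_b, stacked_max=30.0, t_shaped_min=60.0):
    """Return (interplanar angle in degrees, geometry label) for two rings.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    cos_angle = abs(float(np.dot(ring_normal(ring_a), ring_normal(ring_b))))
    angle = float(np.degrees(np.arccos(min(cos_angle, 1.0))))
    if angle <= stacked_max:
        label = "stacked"
    elif angle >= t_shaped_min:
        label = "T-shaped"
    else:
        label = "intermediate"
    return angle, label


if __name__ == "__main__":
    theta = np.radians(np.arange(0, 360, 60))
    hexagon = 1.39 * np.column_stack([np.cos(theta), np.sin(theta)])
    ring_xy = np.column_stack([hexagon, np.zeros(6)])  # ring in the xy plane
    ring_parallel = ring_xy + (0.0, 0.0, 3.7)  # parallel copy 3.7 Å above
    ring_xz = np.column_stack([hexagon[:, 0], np.zeros(6), hexagon[:, 1] + 5.0])
    print(classify_aromatic_pair(ring_xy, ring_parallel))
    print(classify_aromatic_pair(ring_xy, ring_xz))
```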

When aromatic amino acids are positioned close to the lipid interface, they are known to play important roles in anchoring and positioning the protein inside the lipid medium through lipid-aromatic interactions (see^69,70). The interactions between amino acids and lipid molecules are, however, not captured by our statistical potentials, which consider both lipids and water as the protein environment.

Aliphatic interactions

While hydrophobic forces play a dominant role in the folding and stability of globular proteins, they contribute less to the stability of transmembrane proteins⁷¹. This is exactly what we observe in the energy profiles of Fig. 8c. When the interactions are established in extramembrane regions, the potentials are clearly stabilizing, with an energy minimum at about 6 Å, as in globular proteins. In transmembrane regions, the minimum is still present but about 0.4 kcal/mol higher, which indicates that these interactions are only marginally stabilizing.
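The position and depth of the energy minimum can be read directly from a tabulated profile. The difference between the minima of two profiles then quantifies how much less stabilizing an interaction is in one environment than in the other. The sketch below uses invented profiles, chosen only to mimic the qualitative picture described above (a minimum near 6 Å that is shallower in the transmembrane case).

```python
import numpy as np


def profile_minimum(distances, energies):
    """Return (distance, energy) at the lowest point of a tabulated profile."""
    d = np.asarray(distances, dtype=float)
    e = np.asarray(energies, dtype=float)
    i = int(np.argmin(e))
    return float(d[i]), float(e[i])


if __name__ == "__main__":
    # Invented aliphatic-aliphatic profiles (kcal/mol), for illustration only.
    distances = [4.0, 5.0, 6.0, 7.0, 8.0]
    em = [0.30, -0.20, -0.60, -0.30, -0.10]
    tm = [0.50, 0.10, -0.20, -0.10, 0.00]
    d_em, e_em = profile_minimum(distances, em)
    d_tm, e_tm = profile_minimum(distances, tm)
    print(f"EM minimum at {d_em} Å, TM minimum at {d_tm} Å")
    print(f"TM minimum is {e_tm - e_em:.2f} kcal/mol higher")
```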

However, even though hydrophobic forces are less important for folding, they contribute to the positioning and anchoring of the protein in the lipid membrane, especially in peripheral membrane proteins⁷¹ but also in integral membrane proteins. Hydrophobic interactions can indeed be established between exposed non-polar residues and hydrophobic lipid moieties of the membrane, and these interactions determine the insertion and position of the proteins⁷². There is also growing evidence for protein-membrane hydrophobic matching, in which the hydrophobic part of the transmembrane domain has to match the hydrophobic thickness of the membrane bilayer in which it is embedded; moreover, this matching condition appears to strongly influence protein function⁷². Since our statistical potentials take the membrane bilayer into account only implicitly, these effects are observed only indirectly.

Application of the membrane potentials to predict residue localization

The newly developed membrane statistical potentials were used to perform a binary classification of residues into those that belong to the transmembrane region and those that belong to the extramembrane region. For that purpose, we computed the per-residue contributions to the folding free energy using the potentials derived from the extramembrane and transmembrane datasets ${{\mathscr{D}}}^{{\rm{EM}}}$ and ${{\mathscr{D}}}^{{\rm{TM}}}$ defined in Eq. (5). In general, we expect that if the per-residue contribution computed with the transmembrane potentials is lower than that computed with the extramembrane potentials, the residue is situated inside the membrane, and vice versa. There are, however, deviations from this rule: some residues correspond to stability weaknesses, which means that they contribute unfavorably to the overall folding free energy^30,73.

To predict the localization of a residue, we considered linear combinations of the per-residue folding free energies computed with the potentials “sd” and “sds” from the two datasets ${{\mathscr{D}}}^{{\rm{TM}}}$ and ${{\mathscr{D}}}^{{\rm{EM}}}$:

$${I}^{i}={\alpha }_{1}{\rm{\Delta }}{G}_{sd}^{i,{\rm{TM}}}+{\alpha }_{2}{\rm{\Delta }}{G}_{sds}^{i,{\rm{TM}}}+{\alpha }_{3}{\rm{\Delta }}{G}_{sd}^{i,{\rm{EM}}}+{\alpha }_{4}{\rm{\Delta }}{G}_{sds}^{i,{\rm{EM}}}+{\alpha }_{5}\,\mathrm{log}\,N+{\alpha }_{6}$$

(6)

where the coefficients ${\alpha }_{1},\ldots ,{\alpha }_{6}$ are free parameters and $N$ is the protein length (number of residues). In addition to the four free-energy terms, this localization index contains a constant term and a term proportional to the logarithm of the protein length; the latter corrects for the possible length dependence of amino acid and distance frequencies⁷⁴. We also defined a smoothed version of this localization index by averaging it over a window of five successive residues along the chain, centered on the target residue:

$${I}_{{\rm{sm}}}^{i}=\frac{1}{5}({I}^{i-2}+{I}^{i-1}+{I}^{i}+{I}^{i+1}+{I}^{i+2})$$

(7)

This index was used to classify the residues into two groups: residues i with ${I}_{{\rm{sm}}}^{i}\le {\alpha }_{0}$ were assigned to the transmembrane region and those with ${I}_{{\rm{sm}}}^{i} > {\alpha }_{0}$ to the extramembrane region. The seven parameters ${\alpha }_{j}$ (with j = 0, …, 6) were optimized to maximize the balanced accuracy (BACC); the area under the receiver operating characteristic curve (AUC) was also computed.
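Equations (6) and (7) and the threshold rule can be combined as in the following minimal sketch. It assumes that the four per-residue folding free energies are already available as arrays, that log denotes the natural logarithm, and that the smoothing window is simply truncated at the chain termini; the source does not specify these last two points. The function names are hypothetical.

```python
import numpy as np


def localization_index(dg_sd_tm, dg_sds_tm, dg_sd_em, dg_sds_em, n_residues, alpha):
    """Per-residue index I^i of Eq. (6); alpha holds (alpha_1, ..., alpha_6).

    The dg_* arguments are per-residue folding free energy contributions
    computed with the "sd" and "sds" potentials from the TM and EM datasets.
    The natural logarithm is assumed for log N.
    """
    a1, a2, a3, a4, a5, a6 = alpha
    return (a1 * np.asarray(dg_sd_tm, dtype=float)
            + a2 * np.asarray(dg_sds_tm, dtype=float)
            + a3 * np.asarray(dg_sd_em, dtype=float)
            + a4 * np.asarray(dg_sds_em, dtype=float)
            + a5 * np.log(n_residues) + a6)


def smooth_index(index, half_window=2):
    """Eq. (7): average over the 2*half_window + 1 residues centred on each one.

    At the chain termini the window is truncated to the available residues
    (an assumption of this sketch).
    """
    values = np.asarray(index, dtype=float)
    n = len(values)
    smoothed = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed[i] = values[lo:hi].mean()
    return smoothed


def predict_transmembrane(smoothed, alpha0):
    """True for residues assigned to the transmembrane region (I_sm <= alpha_0)."""
    return np.asarray(smoothed, dtype=float) <= alpha0


if __name__ == "__main__":
    # Invented per-residue energies and coefficients, for illustration only.
    rng = np.random.default_rng(0)
    n = 12
    dg = [rng.normal(size=n) for _ in range(4)]
    alpha = (1.0, 0.5, -1.0, -0.5, 0.0, 0.0)
    raw = localization_index(*dg, n_residues=n, alpha=alpha)
    print(predict_transmembrane(smooth_index(raw), alpha0=0.0))
```

In the paper, the seven coefficients are fitted to maximize BACC; the values above are placeholders.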

Performance was assessed with a strict leave-one-out cross-validation procedure, in which the target protein, whose residues are to be classified, is removed at all stages of the computation, from the derivation of the statistical potentials to the optimization of the parameters. As the pairwise sequence identity within the datasets is low (<30%), this cross validation is strict and in principle free from biases due to sequence similarity.
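The essential property of this protocol is that nothing derived from the target protein leaks into the model used to classify it. The skeleton below illustrates this property. `train` and `predict` are hypothetical placeholders that stand for the full pipeline (potential derivation plus parameter fitting) and for the residue classification, respectively.

```python
from typing import Callable, List, Sequence, TypeVar

Protein = TypeVar("Protein")
Model = TypeVar("Model")
Result = TypeVar("Result")


def leave_one_out(proteins: Sequence[Protein],
                  train: Callable[[List[Protein]], Model],
                  predict: Callable[[Model, Protein], Result]) -> List[Result]:
    """Strict leave-one-out cross validation over whole proteins.

    For each target, `train` (a hypothetical stand-in for deriving the
    statistical potentials and fitting alpha_0..alpha_6) only ever sees the
    other proteins, so no information from the target reaches the model.
    """
    results: List[Result] = []
    for k, target in enumerate(proteins):
        training_set = [p for j, p in enumerate(proteins) if j != k]
        model = train(training_set)
        results.append(predict(model, target))
    return results


if __name__ == "__main__":
    # Toy check: the "model" is the sum of the training items, so each
    # prediction reveals exactly which proteins were used for training.
    print(leave_one_out([1, 2, 3, 4], train=sum, predict=lambda m, p: (m, p)))
```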

As shown in Table 1 and Fig. 9a, we obtained a BACC of 0.75 and an AUC of 0.83 on the whole set of membrane proteins. These results indicate that our potentials describe the stability properties of membrane proteins quite well in the two very different environments of water and lipids, and thus that they can be used to localize residues inside or outside the membrane.
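For reference, the two performance measures can be computed as follows. This is a minimal sketch: BACC is the mean of sensitivity and specificity, and AUC is computed through its rank interpretation (the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counting one half). Because a low ${I}_{{\rm{sm}}}^{i}$ signals a transmembrane residue, the score passed to the AUC function should be $-{I}_{{\rm{sm}}}^{i}$ when transmembrane residues form the positive class.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """BACC = (sensitivity + specificity) / 2; True marks the positive class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = np.mean(y_pred[y_true])      # positives predicted positive
    specificity = np.mean(~y_pred[~y_true])    # negatives predicted negative
    return float((sensitivity + specificity) / 2)


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y_true], s[~y_true]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


if __name__ == "__main__":
    # Invented example: True = transmembrane; a low I_sm means transmembrane,
    # so -I_sm is used as the AUC score.
    is_tm = [True, True, False, False]
    i_sm = np.array([-1.2, 0.3, 0.1, 0.9])
    print(balanced_accuracy(is_tm, i_sm <= 0.0), roc_auc(is_tm, -i_sm))
```

For large residue sets, `sklearn.metrics.roc_auc_score` computes the same quantity without building the full pairwise comparison matrix.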

Table 1 BACC and AUC values of the prediction of residue localization (inside or outside the membrane) obtained from the index ${I}_{{\rm{sm}}}$.


Our predictor works better for α-helical proteins (AUC = 0.86) than for β-barrels (AUC = 0.74). This difference can be attributed in part to the composition of our dataset, about two thirds of which consists of α-helical proteins; it is thus expected that this type of protein is better predicted than β-barrels. Moreover, the β-barrel subset consists of channel and porin structures, in which the transmembrane region has an internal hydrophilic region in contact with water, which makes this set substantially more difficult to predict using distance potentials only.

An example of localization prediction is shown in Fig. 9b for the Archaeoglobus fulgidus prenyltransferase, an α-helical integral membrane protein, whose residues are colored according to the predicted values of the localization index ${I}_{{\rm{sm}}}^{i}$. Our potentials clearly discriminate between the extra- and transmembrane regions. Note that some of the incorrectly localized residues are close to the membrane-water interface, where our potentials are the least accurate (see Conclusion). Others could correspond to stability weaknesses, which means that mutating them could improve the global protein stability.

Conclusion

In this paper, we developed new transmembrane and extramembrane residue-residue potentials with the aim of identifying the amino acid interactions that contribute more strongly to the stabilization of either the transmembrane or the extramembrane region, and we compared their interaction strengths with those in globular proteins. First of all, we observed that the potentials derived from globular proteins are much more similar to those derived from extramembrane than from transmembrane regions.

Despite their low occurrence in transmembrane regions, interactions involving polar residues tend to contribute more to the stability of these regions than to that of the extramembrane regions. In particular, salt bridges are over 0.5 kcal/mol more stable, and interactions between residues of equal charge, which are usually destabilizing, become stabilizing when located inside the membrane. This effect can be explained by the fact that, in the lipid environment, bringing charged residues into contact is not associated with the desolvation penalty that it incurs in an aqueous environment. Note that clusters of positively or negatively charged residues inside β-barrel porin channels may play not only a structural but also a functional role in the flux of targeted molecules through the membrane. Non-charged polar-polar and anion-π interactions also appear more favorable in the transmembrane region, as do cation-π interactions, but to a much smaller extent.

Opposite trends are observed in the extramembrane regions. Hydrophobic residues, despite their preferential location in transmembrane regions, establish stronger effective interactions in extramembrane regions, owing to their pronounced tendency to avoid contact with water molecules, a tendency they do not show towards lipids. In particular, aromatic-aromatic, aliphatic-aliphatic and aromatic-sulfur interactions appear to contribute more to stability in extramembrane regions.

Note that these results have to be understood in the context of statistical mean-force potentials, in which the water and lipid molecules are not considered explicitly. The weaker effective interactions between polar residues in extramembrane regions are counterbalanced by interactions between polar residues and water molecules. Similarly, the weaker effective interactions between hydrophobic residues in transmembrane regions are counterbalanced by interactions between hydrophobic residues and lipid molecules.

Moreover, the class of transmembrane proteins strongly influences the effective strength of some residue-residue interactions: we observed marked differences between some potentials derived from α-helical and β-barrel transmembrane domains. This is related to the fact that the latter are all channel-like structures filled with water, in which the residues pointing towards the channel interior are mostly hydrophilic, whereas only a small fraction of the α-helical transmembrane proteins have such a structure. In fact, β-barrel transmembrane regions have no real core. Another difference between these two protein classes is that β-barrel membrane proteins tend to be located in the outer membrane, whose characteristics differ from those of the inner membrane, where α-helical proteins are almost exclusively located. These two different environments naturally influence the shape of our membrane-protein statistical potentials.

To check the validity of our statistical potentials, we used them to predict whether a residue is located in the transmembrane or in the extramembrane region. The high BACC and AUC values obtained in cross validation, together with the fact that the potentials are extremely fast to apply, make them an invaluable asset for investigations in membrane protein design and for large-scale studies of membrane positioning.

Despite the good results obtained, our potentials can still be improved. First of all, we used the OPM annotations to identify transmembrane regions in membrane proteins, and most of these annotations are predictions. This is unavoidable, as the number of proteins with experimentally characterized membrane positioning and thickness is extremely limited. These annotations may therefore contain inaccuracies, which could in turn lead to some inaccuracies in the potentials, even though their construction is robust against OPM misclassification errors.

Furthermore, when larger datasets of membrane proteins become available, our statistical potentials should yield a more accurate description of the stabilizing contributions, and it will become possible to divide the dataset into several subclasses of transmembrane proteins with specific characteristics, such as ion channels, (aqua)porins, α-helical or β-barrel topology, or insertion into different membrane types, which are likely to influence the effective interactions. Moreover, potentials that involve structural descriptors other than the interresidue distance, such as backbone torsion angle domains or solvent accessibility, could further improve the prediction of residue localization presented here. This will be the subject of a forthcoming paper.

Finally, the interactions that prevail at the water-lipid or protein-lipid interface are crucial for anchoring transmembrane proteins in the membrane and are not well described by our statistical potentials. These are by definition effective potentials, and thus the interactions with the lipid or aqueous environment are only considered indirectly. Combining the present analysis with explicit solvent models could help unravel this important aspect of membrane proteins.

Notes

Note that this distance is rescaled towards the smallest amino acid as explained in Methods.

References

1. Fagerberg, L., Jonasson, K., von Heijne, G., Uhlén, M. & Berglund, L. Prediction of the human membrane proteome. Proteomics 10, 1141–1149 (2010).
2. Bakheet, T. M. & Doig, A. J. Properties and identification of human protein drug targets. Bioinformatics 25, 451–457 (2009).
3. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
4. Lee, A. G. Lipid–protein interactions in biological membranes: a structural perspective. Biochim. Biophys. Acta 1612, 1–40 (2003).
5. Lluis, M. W., Godfroy, J. I. III & Yin, H. Protein engineering methods applied to membrane protein targets. Protein Eng. Des. Sel. 26, 91–100 (2012).
6. Senes, A. Computational design of membrane proteins. Curr. Opin. Struct. Biol. 21, 460–466 (2011).
7. Sansom, M. S., Scott, K. A. & Bond, P. J. Coarse-grained simulation: a high-throughput computational approach to membrane proteins. Biochem. Soc. Trans. 38, 27–32 (2008).
8. Bond, P. J., Holyoake, J., Ivetac, A., Khalid, S. & Sansom, M. S. Coarse-grained molecular dynamics simulations of membrane proteins and peptides. J. Struct. Biol. 157, 593–605 (2007).
9. Lomize, M. A., Pogozheva, I. D., Joo, H., Mosberg, H. I. & Lomize, A. L. OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res. 40, D370–D376 (2011).
10. Nugent, T. & Jones, D. T. Membrane protein orientation and refinement using a knowledge-based statistical potential. BMC Bioinformatics 14, 276 (2013).
11. Studer, G., Biasini, M. & Schwede, T. Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane). Bioinformatics 30, i505–i511 (2014).
12. Postic, G., Hamelryck, T., Chomilier, J. & Stratmann, D. MyPMFs: a simple tool for creating statistical potentials to assess protein structural models. Biochimie 151, 37–41 (2018).
13. Postic, G., Ghouzam, Y. & Gelly, J.-C. An empirical energy function for structural assessment of protein transmembrane domains. Biochimie 115, 155–161 (2015).
14. Postic, G., Ghouzam, Y. & Gelly, J.-C. OREMPRO web server: orientation and assessment of atomistic and coarse-grained structures of membrane proteins. Bioinformatics 32, 2548–2550 (2016).
15. Hsieh, D., Davis, A. & Nanda, V. A knowledge-based potential highlights unique features of membrane α-helical and β-barrel protein insertion and folding. Protein Sci. 21, 50–62 (2012).
16. Leman, J. K., Bonneau, R. & Ulmschneider, M. B. Statistically derived asymmetric membrane potentials from α-helical and β-barrel membrane proteins. Sci. Rep. 8, 4446 (2018).
17. Koebnik, R., Locher, K. P. & Van Gelder, P. Structure and function of bacterial outer membrane proteins: barrels in a nutshell. Mol. Microbiol. 37, 239–253 (2000).
18. Walther, D. M., Rapaport, D. & Tommassen, J. Biogenesis of β-barrel membrane proteins in bacteria and eukaryotes: evolutionary conservation and divergence. Cell. Mol. Life Sci. 66, 2789–2804 (2009).
19. von Heijne, G. & Gavel, Y. Topogenic signals in integral membrane proteins. Eur. J. Biochem. 174, 671–678 (1988).
20. De Marothy, M. T. & Elofsson, A. Marginally hydrophobic transmembrane α-helices shaping membrane protein folding. Protein Sci. 24, 1057–1074 (2015).
21. Jackups, R. Jr. & Liang, J. Interstrand pairing patterns in β-barrel membrane proteins: the positive-outside rule, aromatic rescue, and strand registration prediction. J. Mol. Biol. 354, 979–993 (2005).
22. Slusky, J. S. & Dunbrack, R. L. Jr. Charge asymmetry in the proteins of the outer membrane. Bioinformatics 29, 2122–2128 (2013).
23. Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
24. Pucci, F., Bourgeas, R. & Rooman, M. Predicting protein thermal stability changes upon point mutations using statistical potentials: introducing HoTMuSiC. Sci. Rep. 6, 23257 (2016).
25. Sippl, M. J. Calculation of conformational ensembles from potentials of mean force: an approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213, 859–883 (1990).
26. Kocher, J.-P. A., Rooman, M. J. & Wodak, S. J. Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. J. Mol. Biol. 235, 1598–1613 (1994).
27. Rooman, M. J., Kocher, J.-P. A. & Wodak, S. J. Prediction of protein backbone conformation based on seven structure assignments: influence of local interactions. J. Mol. Biol. 221, 961–979 (1991).
28. Pucci, F., Dhanani, M., Dehouck, Y. & Rooman, M. Protein thermostability prediction within homologous families using temperature-dependent statistical potentials. PLoS One 9, e91659 (2014).
29. Hou, Q., Bourgeas, R., Pucci, F. & Rooman, M. Computational analysis of the amino acid interactions that promote or decrease protein solubility. Sci. Rep. 8, 14661 (2018).
30. De Laet, M., Gilis, D. & Rooman, M. Stability strengths and weaknesses in protein structures detected by statistical potentials: application to bovine seminal ribonuclease. Proteins 84, 143–158 (2016).
31. Ulmschneider, M. B. & Sansom, M. S. Amino acid distributions in integral membrane protein structures. Biochim. Biophys. Acta 1512, 1–14 (2001).
32. Petersen, F. N., Jensen, M. Ø. & Nielsen, C. H. Interfacial tryptophan residues: a role for the cation-π effect? Biophys. J. 89, 3985–3996 (2005).
33. Sanderson, J. M. & Whelan, E. J. Characterisation of the interactions of aromatic amino acids with diacetyl phosphatidylcholine. Phys. Chem. Chem. Phys. 6, 1012–1017 (2004).
34. Grauffel, C. et al. Cation-π interactions as lipid-specific anchors for phosphatidylinositol-specific phospholipase C. J. Am. Chem. Soc. 135, 5740–5750 (2013).
35. Honig, B. & Yang, A.-S. Free energy balance in protein folding. In Advances in Protein Chemistry, vol. 46, 27–58 (Elsevier, 1995).
36. von Heijne, G. The distribution of positively charged residues in bacterial inner membrane proteins correlates with the trans-membrane topology. EMBO J. 5, 3021–3027 (1986).
37. Monier, S., Van Luc, P., Kreibich, G., Sabatini, D. & Adesnik, M. Signals for the incorporation and orientation of cytochrome P450 in the endoplasmic reticulum membrane. J. Cell Biol. 107, 457–470 (1988).
38. Szczesna-Skorupa, E. & Kemper, B. NH2-terminal substitutions of basic amino acids induce translocation across the microsomal membrane and glycosylation of rabbit cytochrome P450IIC2. J. Cell Biol. 108, 1237–1243 (1989).
39. Goder, V. & Spiess, M. Topogenesis of membrane proteins: determinants and dynamics. FEBS Lett. 504, 87–93 (2001).
40. Gallusser, A. & Kuhn, A. Initial steps in protein membrane insertion. Bacteriophage M13 procoat protein binds to the membrane surface by electrostatic interaction. EMBO J. 9, 2723–2729 (1990).
41. van Klompenburg, W., Nilsson, I., von Heijne, G. & de Kruijff, B. Anionic phospholipids are determinants of membrane protein topology. EMBO J. 16, 4261–4266 (1997).
42. von Heijne, G. Membrane protein structure prediction: hydrophobicity analysis and the positive-inside rule. J. Mol. Biol. 225, 487–494 (1992).
43. Chamberlain, A. K. & Bowie, J. U. Asymmetric amino acid compositions of transmembrane β-strands. Protein Sci. 13, 2270–2274 (2004).
44. Pucci, F. & Rooman, M. Physical and molecular bases of protein thermal stability and cold adaptation. Curr. Opin. Struct. Biol. 42, 117–128 (2017).
45. Baştuğ, T. & Kuyucak, S. Role of the dielectric constants of membrane proteins and channel water in ion permeation. Biophys. J. 84, 2871–2882 (2003).
46. Walther, T. H. & Ulrich, A. S. Transmembrane helix assembly and the role of salt bridges. Curr. Opin. Struct. Biol. 27, 63–68 (2014).
47. Janovick, J. A. & Conn, P. M. Salt bridge integrates GPCR activation with protein trafficking. Proc. Natl. Acad. Sci. 107, 4454–4458 (2010).
48. Craven, K. B. & Zagotta, W. N. Salt bridges and gating in the COOH-terminal region of HCN2 and CNGA1 channels. J. Gen. Physiol. 124, 663–677 (2004).
49. Magalhaes, A., Maigret, B., Hoflack, J., Gomes, J. & Scheraga, H. Contribution of unusual arginine-arginine short-range interactions to stabilization and recognition in proteins. J. Protein Chem. 13, 195–215 (1994).
50. Lear, J. D., Gratkowski, H., Adamian, L., Liang, J. & DeGrado, W. F. Position-dependence of stabilizing polar interactions of asparagine in transmembrane helical bundles. Biochemistry 42, 6400–6407 (2003).
51. Choma, C., Gratkowski, H., Lear, J. D. & DeGrado, W. F. Asparagine-mediated self-association of a model transmembrane helix. Nat. Struct. Mol. Biol. 7, 161 (2000).
52. Gratkowski, H., Lear, J. D. & DeGrado, W. F. Polar side chains drive the association of model transmembrane peptides. Proc. Natl. Acad. Sci. 98, 880–885 (2001).
53. Chakravarty, S., Ung, A. R., Moore, B., Shore, J. & Alshamrani, M. A comprehensive analysis of anion–quadrupole interactions in protein structures. Biochemistry 57, 1852–1867 (2018).
54. Lucas, X., Bauzá, A., Frontera, A. & Quiñonero, D. A thorough anion–π interaction study in biomolecules: on the importance of cooperativity effects. Chem. Sci. 7, 1038–1050 (2016).
55. Pucci, F. & Rooman, M. Improved insights into protein thermal stability: from the molecular to the structurome scale. Philos. Trans. R. Soc. A 374, 20160141 (2016).
56. Dougherty, D. A. Cation-π interactions in chemistry and biology: a new view of benzene, Phe, Tyr, and Trp. Science 271, 163–168 (1996).
57. Gallivan, J. P. & Dougherty, D. A. Cation-π interactions in structural biology. Proc. Natl. Acad. Sci. 96, 9459–9464 (1999).
58. Ma, J. C. & Dougherty, D. A. The cation-π interaction. Chem. Rev. 97, 1303–1324 (1997).
59. Rooman, M., Liévin, J., Buisine, E. & Wintjens, R. Cation–π/H-bond stair motifs at protein–DNA interfaces. J. Mol. Biol. 319, 67–76 (2002).
60. Biot, C., Buisine, E., Kwasigroch, J.-M., Wintjens, R. & Rooman, M. Probing the energetic and structural role of amino acid/nucleobase cation-π interactions in protein-ligand complexes. J. Biol. Chem. 277, 40816–40822 (2002).
61. Gromiha, M. M. Influence of cation–π interactions in different folding types of membrane proteins. Biophys. Chem. 103, 251–258 (2003).
62. Roderick, S. L. et al. Structure of human phosphatidylcholine transfer protein in complex with its ligand. Nat. Struct. Mol. Biol. 9, 507 (2002).
63. Valley, C. C. et al. The methionine-aromatic motif plays a unique role in stabilizing protein structure. J. Biol. Chem. 287, 34979–34991 (2012).
64. Ringer, A. L., Senenko, A. & Sherrill, C. D. Models of S/π interactions in protein structures: comparison of the H2S–benzene complex with PDB data. Protein Sci. 16, 2216–2223 (2007).
65. Gómez-Tamayo, J. C. et al. Analysis of the interactions of sulfur-containing amino acids in membrane proteins. Protein Sci. 25, 1517–1524 (2016).
66. Aledo, J. C., Cantón, F. R. & Veredas, F. J. Sulphur atoms from methionines interacting with aromatic residues are less prone to oxidation. Sci. Rep. 5, 16955 (2015).
67. Daeffler, K. N.-M., Lester, H. A. & Dougherty, D. A. Functionally important aromatic–aromatic and sulfur-π interactions in the D2 dopamine receptor. J. Am. Chem. Soc. 134, 14890–14896 (2012).
68. Hong, H., Park, S., Flores Jiménez, R. H., Rinehart, D. & Tamm, L. K. Role of aromatic side chains in the folding and thermodynamic stability of integral membrane proteins. J. Am. Chem. Soc. 129, 8320–8327 (2007).
69. Schlebach, J. P. & Sanders, C. R. The safety dance: biophysics of membrane protein folding and misfolding in a cellular context. Q. Rev. Biophys. 48, 1–34 (2015).
70. Makwana, K. M. & Mahalakshmi, R. Implications of aromatic–aromatic interactions: from protein structures to peptide models. Protein Sci. 24, 1920–1933 (2015).
71. Lomize, A. L., Pogozheva, I. D., Lomize, M. A. & Mosberg, H. I. The role of hydrophobic interactions in positioning of peripheral proteins in membranes. BMC Struct. Biol. 7, 44 (2007).
72. Jensen, M. Ø. & Mouritsen, O. G. Lipids do influence protein function—the hydrophobic matching hypothesis revisited. Biochim. Biophys. Acta 1666, 205–226 (2004).
73. Dehouck, Y., Biot, C., Gilis, D., Kwasigroch, J. M. & Rooman, M. Sequence-structure signals of 3D domain swapping in proteins. J. Mol. Biol. 330, 1215–1225 (2003).
74. Dehouck, Y., Gilis, D. & Rooman, M. Database-derived potentials dependent on protein size for in silico folding and design. Biophys. J. 87, 171–181 (2004).


Acknowledgements

This work is supported by the FNRS Fund for Scientific Research through a PDR grant. MNM has a PhD grant from the Belgian Commission for Cooperation and Development (ARES-CCD); QH, SB are Postdoctoral Researchers and MR is Research Director at the FNRS. FP has been supported by the FNRS and by the John von Neumann Institute for Computing (NIC).

Author information

Mame Ndew Mbaye, Qingzhen Hou, Fabrizio Pucci and Marianne Rooman contributed equally.

Authors and Affiliations

Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
Mame Ndew Mbaye, Qingzhen Hou, Sankar Basu, Fabian Teheux, Fabrizio Pucci & Marianne Rooman
Department of Mathematics and Informatics, Cheikh Anta Diop University, Dakar-Fann, Senegal
Mame Ndew Mbaye
John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich, Germany
Fabrizio Pucci


Contributions

M.N.M., S.B. collected and refined the dataset. M.N.M., Q.H., F.T. and F.P. performed the experiment. M.N.M., Q.H., F.P. and M.R. analyzed the data. M.N.M., F.P. and M.R. wrote the manuscript. All the authors have read, contributed and approved the final version of the manuscript.

Corresponding author

Correspondence to Marianne Rooman.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Files

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Mbaye, M.N., Hou, Q., Basu, S. et al. A comprehensive computational study of amino acid interactions in membrane proteins. Sci Rep 9, 12043 (2019). https://doi.org/10.1038/s41598-019-48541-2


Received: 08 May 2019
Accepted: 07 August 2019
Published: 19 August 2019
DOI: https://doi.org/10.1038/s41598-019-48541-2

