Introduction

G protein-coupled receptors (GPCRs) are a large and diverse superfamily of eukaryotic cell membrane proteins. They are receptors for a wide variety of extracellular signals, including light, pressure, chemical ligands, neurotransmitters and metabolites, among others1,2,3,4, and play an important physiological role as transmitters of extracellular signals into the cell5.

Due to their participation in a wide range of activation pathways and important biological processes, and because of their high-affinity binding to drugs, GPCRs have become a prime research focus in pharmacology and a major target for drug discovery6. In fact, approximately 34% of all drugs approved by the US Food and Drug Administration5 target GPCRs, with the aim of either activating (agonist) or deactivating (antagonist) the receptor7.

The functionality of proteins is determined by their 3D structural configuration, which varies according to the binding of orthosteric and allosteric ligands, the lipid environment and post-translational modifications8. These sources of variability elicit dynamic changes in the GPCR that result in the generation of specific signals.

Understanding these signal transmission mechanisms in the receptor would provide a key to drug development and testing. The GPCR structure includes seven transmembrane (TM) helices, linked by intracellular loops (ICL) and extracellular loops (ECL). All these regions play a role in the activation process9, but the TM regions are of particular importance10, as they have to undergo a conformational change to transmit the signal through the cell membrane.

Molecular dynamics (MD) simulations provide a wealth of data about the structure, dynamics, and physiological function of biological macromolecules by modelling the interactions between their atomic constituents. The computer-assisted analysis of MD simulation data should allow the study of the receptors' dynamic behavior, particularly in their interaction with drugs. Machine Learning (ML) tools can be particularly efficient in such endeavours.

This study investigates the ability of a recurrent neural network (RNN) model, namely Long Short-Term Memory (LSTM)11, to predict the dynamics of two GPCR states and three specific simulations of each one, through their activation path. Most importantly, the relative relevance of different regions of the receptor (TM, ECL, and ICL) for this prediction is also estimated as part of the analysis. A unidirectional LSTM (ULSTM) and a bidirectional LSTM (BLSTM) are used to predict the path trajectories for three types of simulations in the active and inactive states of the \({\upbeta }\)-2 adrenergic receptor (\({\upbeta }\) 2AR) GPCR. More specifically, these simulations are analysed for the 2RH1 (inactive state) and 3P0G (active state) structures, both with APO, Full Agonist (BI-167107) and Partial Inverse Agonist (carazolol). In addition, the LSTM variants are compared to other architectures, such as Random Forest (RF) and chains of Artificial Neural Networks (ANN), namely Convolutional Neural Networks (CNN) with LSTM (CNN-LSTM).

Advances in biotechnology, X-ray crystallography, and cryo-electron microscopy (cryo-EM) in recent years have generated an exponential increase in available GPCR simulation data, easing GPCR analysis, visualisation, and data-driven experimental design5. Machine Learning can be used as a tool to extract knowledge from complex data, and different ML models have been successfully applied to many areas of proteomics, including the MD domain. They have been applied, for instance, to the study of protein pocket dynamics12, to enhance sampling13,14, and to generate new digital structures15,16. Some studies have used ML methods to identify different biological function states from MD conformations in order to explain allosteric mechanisms. For example, Fleetwood et al. (2020)17 used ML and statistical approaches (Principal Component Analysis, Random Forest, Autoencoder, Restricted Boltzmann Machine, and Multilayer Perceptron) to analyse conformational changes within soluble proteins and ligand binding to a GPCR. Zhou et al. (2018)18, in turn, used Decision Trees and ANNs to classify ligand-unbound and ligand-bound states from MD trajectories of the PDZ2 protein, achieving 75% and 80% predictive accuracy, respectively.

Most recent ML-based approaches concern the use of different variants of Deep Learning (DL) methods. Jumper et al. (2022)19 and Baek et al. (2021)20 developed, respectively, the very successful AlphaFold and RoseTTAFold models for predicting protein structure from sequence. Other authors have applied CNN methods to the prediction of interactions between proteins21, protein-ligand interactions22, protein folding, protein phosphorylation23, and protein structure classification24. Notice, though, that the input data for a CNN are images described as spatial arrays, whereas the characteristics that describe GPCR conformations are not structured in this way. Hayatshahi et al. (2019)25 distinguished otherwise similar allosteric states of proteins by adopting conventional ML and DL approaches on extensive MD simulations. Plante et al. (2019)26 combined a densely connected ANN with a pixel representation to identify ligand-bound and unbound states.

Recurrent Neural Networks (RNNs) are ANNs in which connections between nodes can create a cycle, which enables them to exhibit temporal dynamic behavior. They have been particularly successful in applications to human language modelling27. LSTM models address a limitation of the RNN architecture, namely its inability to learn information from the distant past, by allowing the network to dynamically learn to forget old aspects of information. They have been used to mimic trajectories produced by simulations28,29, achieving accurate short-term predictions. They have also shown great potential for sequence processing30, resulting in a large body of literature studying the trajectories of simulation systems31.

In Tsai et al. (2020)29, LSTMs were used to predict the temporal evolution of chemical/biophysical trajectories. Mohamma et al. (2019)32 applied these models to find temporal correlations between atoms. Kadupitiya et al. (2020)33 used an LSTM as the numerical integrator that solves Newton's equations in MD simulations. Other authors have applied LSTMs to low-dimensional molecular simulations to detect rare events in the sequential data34. Liang et al.35 applied them to forecasting molecular step-positions of the S-protein in SARS-CoV-2 dynamics. Ludwig et al. (2022) evaluated the performance of BLSTMs in the task of increasing the 3D spatial resolution of MD trajectories as a data post-processing step.

We carried out a preliminary study using LSTM36, in which the best representation of amino acids in 3D space for predicting the MD trajectories of a GPCR receptor as a whole was identified. However, to the best of our knowledge, there is no reported work on MD forecasting by GPCR region, which is the main concern of the current study.

Materials

GPCR MD simulations

The MD simulations used in this study were created on the Google Exacycle cloud computing platform37 as a way to improve understanding of drug efficacy at GPCR receptors. These simulations could be incorporated into a valid and functional structure-based drug discovery approach through pathway analysis. The simulations under study were created by Kohlhoff et al.38 by intensively computing short MD trajectories in parallel on the cloud platform, and are publicly available as source data at the SimTK repository (https://simtk.org/projects/natchemgpcrdata/). The authors further analysed these short simulations by assembling larger trajectories using extensive sampling with Markov state modelling. We briefly describe these larger simulations next, following the description given by their authors.

The membrane-embedded crystal structures for PDB id:2RH1 (inactive) and id:3P0G (active) were created from the OPM database39. The inactive (2RH1) and active (3P0G) structures were simulated without ligand (APO), as well as with the receptor bound to the full agonist and the partial inverse agonist (2RH1 with BI-167107, and 3P0G with carazolol)40,41.

The structures were embedded in a bilayer of POPC lipid molecules with cholesterol in an orthorhombic box of size 10.0 \(\times \) 10.0 \(\times \) 8.5 nm. The system was solvated in TIP3P water molecules interspersed with Na+ and Cl- ions for molecular stabilisation, with a final ion concentration of 0.15 M.

Protein, water, and ions were parameterized with the AMBER03 force field42 and lipids with the Berger unified atom force field. Carazolol and BI-167107 ligands were extracted from the PDB entries 2RH1 and 3P0G, respectively, and parameterized for the general Amber force field (GAFF)43 with acpype44 and antechamber45. For simulations in which the agonist and the partial inverse agonist were switched, the ligand positions were changed after superimposing the two crystal structures using all protein residues with atoms within 6Å of either ligand. The sizes of the resulting molecular dynamics systems range from 58,406 to 59,044 atoms.

The N- and C-termini of the receptor structures were not fully resolved during crystallography. In 2RH1, the structure involves residues 30 to 342, and in 3P0G, residues 23 to 344. In intracellular loop 3 (ICL3), between helices 5 and 6, the missing residues are substituted in 2RH1 and 3P0G with T4 lysozyme and a nanobody, respectively. These residues are 231–262 in 2RH1 and 228–264 in 3P0G. \({\upbeta }\)2AR remains functional even in the absence of ICL3.

Hydrogen bonds and hydrogen bond networks enable intramolecular water to act as a facilitator of biomolecule dynamics. During the equilibrium and production experiments, water molecules were able to move freely within the simulation system and enter and exit the receptor during the simulations.

Considering ionic lock formation, a salt bridge between intracellular residues E268 and R131 is a feature of the receptor’s inactive state and disruption of this ionic lock is involved in receptor activation46. It has been demonstrated that the inactive state shows a mixture of ionic locks formed and broken at equilibrium47.

In the extracellular region, the helical movements extend around the mean helical position. The crystal structures of the active and inactive conformations on the extracellular side are almost identical in the positions of helices 2 and 3 (with a difference of 1% or less), while the other five helices are shifted by 0.379 Å (helix 4) to 0.773 Å (helix 1) relative to each other. The active structure has a more compact helical arrangement than the inactive structure. During the simulation, helices 6 and 7 were compressed in all systems, while helices 4 and 5 moved slightly outward. Helix 1 showed the greatest relative movement within the simulations, particularly in the inactive structure.

The central region of the transmembrane helices shows the greatest stability compared to the intra- and extracellular sides. This region is usually quite condensed. The most significant distinctions between the active and inactive structures are observed in helices 1, 6 and 7. During the simulation, helix 6 appears to converge towards the inactive state when the simulation is initiated from the active conformation. Helix 1, on the other hand, moves outward in all systems.

The most notable structural differences are seen in the movement of the transmembrane helices in the intracellular region. Helices 6 and 7 are particularly distinct between the active and inactive structures, with a displacement of 6.951 Å and 3.47 Å respectively. Helices 1, 2, and 4 are further away from the center in the active state, while helix 3 is much closer (with a range of offsets from 1.4 to 2.277 Å).

The helices and their limits by residue id are defined as follows (with residue numbers in brackets): Helix 1 (29–60), Helix 2 (67–96), Helix 3 (103–136), Helix 4 (147–171), Helix 5 (197–229), Helix 6 (267–298), Helix 7 (305–328)48.
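For the per-region analyses below, these boundaries can be encoded as a simple lookup table. The following is an illustrative sketch only (not the authors' code); only the helix ranges listed above are included, since the loop boundaries are not given here.

```python
# Helix boundaries by residue id, as listed above (inclusive ranges).
# Illustrative sketch only; loop (ICL/ECL) boundaries are not listed here.
HELIX_RANGES = {
    "TM1": (29, 60),
    "TM2": (67, 96),
    "TM3": (103, 136),
    "TM4": (147, 171),
    "TM5": (197, 229),
    "TM6": (267, 298),
    "TM7": (305, 328),
}

def helix_of(residue_id):
    """Return the helix a residue belongs to, or 'loop' otherwise."""
    for name, (start, end) in HELIX_RANGES.items():
        if start <= residue_id <= end:
            return name
    return "loop"
```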

Although Kohlhoff et al.38 assembled larger simulations for the different crystal structures from the short simulations using extensive sampling with Markov state modelling, this study is based on the short trajectories released by the authors at the source repository. More precisely, the simulations in the study comprise 2,000 trajectories for each of six types of crystal structure of \({\upbeta }\)2AR: APO for the simulations of 2RH1-icl3 and 3P0G-a, Full Agonist (FA) for the simulations of 2RH1-b and 3P0G-b, and Partial Inverse Agonist (PIA) for 2RH1-c and 3P0G-c. The receptor consists of 282 amino acids for the inactive state and 344 for the active state. Each trajectory describes the 3D position of the receptor along 28 consecutive time-steps (the trajectory length), hereafter referred to as frames. The time elapsed between consecutive frames is 500 picoseconds. Activation and deactivation proceed through multiple pathways and typically visit metastable intermediate states. The simulation data under study, as in Gutiérrez-Mondragón et al.49, primarily comprise intermediate states of each receptor.
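Under this description, the data for one simulation type can be organised as an array of 2,000 trajectories \(\times \) 28 frames \(\times \) number of residues \(\times \) 3 coordinates. The sketch below shows only these shapes and makes no assumption about the repository's file format.

```python
import numpy as np

# Shapes implied by the description above (illustrative only):
# 2,000 trajectories per simulation type, 28 frames per trajectory,
# one 3D position per residue, 500 ps between consecutive frames.
n_trajectories, n_frames, frame_spacing_ps = 2000, 28, 500
n_residues = {"2RH1": 282, "3P0G": 344}   # inactive / active receptor

inactive = np.zeros((n_trajectories, n_frames, n_residues["2RH1"], 3))
active = np.zeros((n_trajectories, n_frames, n_residues["3P0G"], 3))
```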

Structural sequence domains

GPCRs have three main structural regions, namely a seven-helix TM domain, an extracellular domain built by the N-terminus and three ECLs, and the intracellular domain, including the C-terminus and two ICLs50.

Table 1 provides a detailed description of each region of the \({\upbeta }\)2AR-GPCR receptor under study. The 2RH1 and 3P0G structures contain residues 30-344. Both have gaps in the sequence, where ICL3 between TM5 and TM6 is replaced in 2RH1 and 3P0G with T4-lysozyme and a nanobody, respectively. These residues are 231-262 for 2RH1, and 228-264 for 3P0G. \({\upbeta }\)2AR remains functional even in the absence of these regions.

Figure 1 represents the common structure of a \({\upbeta }\)2 adrenergic GPCR. In it, the 7 TM, 2 ICL and 3 ECL regions are shown. In addition, BI-167107 ligand binding with the protein is displayed in an image inset.

Figure 1

Schematic representation of the \({\upbeta }\)2 adrenergic GPCR, including TM, ECL and ICL regions, in the Full Agonist (BI-167107) trajectory.

Table 1 \({\upbeta }\)2AR-GPCRs amino acid distribution by regions for inactive (2rh1) and active (3p0g) states.

Methods

The long short-term memory model

LSTM11 is a neural network of the RNN family, designed for the analysis of temporal data. A schematic explanation of how an LSTM works is shown in Fig. 2. In summary, an LSTM has an input gate (i), a forget gate (f), a memory cell (c) and an output gate (o). The input gate decides whether to let the incoming signal go through to the memory cell or block it. The output gate can allow a new output signal from the memory cell or suppress it. The forget gate is responsible for remembering or forgetting the previous state of the memory cell. The memory cell state is updated by feeding the previous output back into the unit through recurrent connections between two consecutive time steps. Reading from and writing to the memory cell is controlled by a group of sigmoid gates (x). At a given time, the LSTM receives inputs from different sources: the current amino-acid positions \(\textit{X}_\textit{xyz}\) as the input, the previous hidden state of all LSTM units (h), as well as the previous memory cell state \(\textit{c}_{(t -1)}\). Then, the output gate returns the estimated next 3D amino-acid positions for the sequences (Px, Py, Pz).

In short, LSTM overcomes one of the limitations of the RNN architecture, namely the inability to learn information from the far past. LSTMs are thus able to accumulate information over long periods of time by allowing the network to dynamically learn and forget old aspects of information. In this work, an LSTM model was trained with sequences of 3D positions to predict the next movement.
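For reference, the gate updates summarised above correspond to the standard LSTM formulation11 (notation may differ slightly from Fig. 2), where \(\sigma \) denotes the sigmoid function and \(\odot \) the element-wise product:

$$\begin{aligned}
i_t &= \sigma (W_i x_t + U_i h_{t-1} + b_i),\\
f_t &= \sigma (W_f x_t + U_f h_{t-1} + b_f),\\
o_t &= \sigma (W_o x_t + U_o h_{t-1} + b_o),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh (W_c x_t + U_c h_{t-1} + b_c),\\
h_t &= o_t \odot \tanh (c_t).
\end{aligned}$$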

In this paper, ULSTM, BLSTM and CNN-LSTM (a chain of ANNs) are investigated. ULSTM processes data in the forward direction, while BLSTM processes sequence data in both forward and backward directions with two separate hidden layers51. In addition, the LSTM variants are evaluated against other ML approaches, namely RF and CNN, in order to compare the results obtained with model architectures of different complexities.

Figure 2

Illustration of an LSTM unit. X represents the input data for the 3D positions of the amino acids, and P their 3D predicted positions.

Convolutional neural network

The CNN is a feedforward neural network proposed by Lecun et al.52 that has been shown to perform exceedingly well in image and natural language processing tasks53. It can also be used effectively to predict time series. The local perception and weight sharing of the CNN model can dramatically reduce its number of parameters, increasing the effectiveness of its learning process54. The CNN architecture consists mostly of convolution layers and pooling layers. Each convolution layer contains several convolution kernels. Following the convolution operation, the data features are extracted, but their dimension is very high; therefore, to reduce the computational cost of training, a pooling layer is added after the convolution layer to reduce the feature dimension55. In our experiments, a combined CNN-LSTM shallow neural-network-based forecasting model has also been applied. Figure 3 shows a schematic representation of this model architecture, and Table 2 provides a brief quantitative summary of its elements.
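The following Keras sketch illustrates this kind of chained CNN-LSTM architecture. The layer widths and optimiser shown here are placeholders rather than the values of Table 2, and the flattening of the xyz coordinates into the feature dimension is an assumption.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, LSTM, MaxPooling1D

n_steps_in = 5            # frames per input sequence
n_features = 282 * 3      # residues x xyz coordinates (inactive state; assumed flattening)

# Minimal CNN-LSTM sketch: a 1D convolution and pooling over the time axis,
# followed by an LSTM layer and a dense output predicting the next frame.
cnn_lstm = Sequential([
    Conv1D(filters=64, kernel_size=2, activation="relu",
           input_shape=(n_steps_in, n_features)),
    MaxPooling1D(pool_size=2),
    LSTM(50, activation="relu"),
    Dense(n_features),
])
cnn_lstm.compile(optimizer="adam", loss="mse")
```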

Figure 3

Illustration of the CNN-LSTM chain of algorithms. n-residues represents the consecutive positions of the amino acids in the protein.

Table 2 Deep neural networks architecture.

Experimental methodology

Data underwent linear max-min normalisation56 and were later returned to the original range of values in Angstrom (Å) units. The 3D positions of the amino acids were extracted for each frame. Note, though, that the original database includes the positions of atoms rather than amino acids. Therefore, the amino acid mass centres were calculated and used as the 3D positions that represent them. Data preprocessing was performed on the 3D positions (x, y, z dimensions) of each residue. Five time frames were used to train the model, and the next frame was predicted from these. An overlap of 4 frames was used to select the following sequence in the training set; this means that, in the end, each frame was predicted given the 5 previous frames of the simulation. Then, the average error of all predictions was calculated and reported.
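The windowing scheme just described (inputs of 5 frames, overlap of 4, target equal to the following frame) can be sketched as follows. This is an illustrative reconstruction in Python, not the authors' original preprocessing code, and the choice of a global max-min normalisation is an assumption.

```python
import numpy as np

def minmax_scale(x):
    """Linear max-min normalisation to [0, 1]; min/max are kept so that
    predictions can be mapped back to Angstrom units."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def make_windows(trajectory, n_steps_in=5):
    """Split one trajectory of shape (n_frames, n_residues * 3) into
    overlapping 5-frame inputs and the following frame as the target."""
    X, y = [], []
    for start in range(len(trajectory) - n_steps_in):
        X.append(trajectory[start:start + n_steps_in])
        y.append(trajectory[start + n_steps_in])
    return np.array(X), np.array(y)

# Example: a 28-frame trajectory yields 23 (input, target) pairs.
traj, _ = minmax_scale(np.random.rand(28, 282 * 3))
X, y = make_windows(traj)   # X: (23, 5, 846), y: (23, 846)
```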

Two thousand trajectories per simulation were used. Simulations are available for two states, \({\upbeta }\)2AR-2RH1 (simulations started in the inactive state) and \({\upbeta }\)2AR-3P0G (simulations started in the active state), each for APO, Full Agonist (BI-167107) and Partial Inverse Agonist, yielding 6 types of simulation. We refer to trajectories as nClones.

The LSTM model training was carried out using the 3D amino acid positions (xyz) per frame. We refer to the length of an input sequence as nSteps-in (which was 5 in our experiments). The centre of mass of each amino acid was employed as the representative 3D position of the residue, since this was previously found to be the best representation of the amino acid in 3D space under LSTM forecasting36. All experiments were carried out in this way.
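A minimal sketch of the centre-of-mass computation for a single residue, assuming its atomic coordinates and masses are already available as arrays (the original data handling is not reproduced here):

```python
import numpy as np

def residue_center_of_mass(atom_xyz, atom_masses):
    """Mass-weighted average of the atomic positions of one residue.

    atom_xyz    : (n_atoms, 3) array of atomic coordinates in Angstrom
    atom_masses : (n_atoms,)   array of atomic masses
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_masses = np.asarray(atom_masses, dtype=float)
    return (atom_masses[:, None] * atom_xyz).sum(axis=0) / atom_masses.sum()
```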

The training configuration of the LSTM was: epochs = 100, verbose = 0, activation = relu, input shape = (nSteps-in, length of the amino acid chain). All remaining parameters of the Keras57 framework were kept at their default values.
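With that configuration, the ULSTM and BLSTM models can be defined in Keras roughly as follows. The hidden-layer width, the optimiser and the flattening of the xyz coordinates into the feature dimension are assumptions, since they are not specified above.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, Dense, LSTM

n_steps_in = 5
n_features = 282 * 3   # length of the amino acid chain x 3 coordinates (assumed)

# ULSTM: processes the 5-frame input forward in time only.
ulstm = Sequential([
    LSTM(50, activation="relu", input_shape=(n_steps_in, n_features)),
    Dense(n_features),          # predicted xyz positions of the next frame
])
ulstm.compile(optimizer="adam", loss="mse")
# ulstm.fit(X, y, epochs=100, verbose=0)   # X, y as in the windowing sketch above

# BLSTM: the same recurrent layer wrapped to read the sequence in both directions.
blstm = Sequential([
    Bidirectional(LSTM(50, activation="relu"),
                  input_shape=(n_steps_in, n_features)),
    Dense(n_features),
])
blstm.compile(optimizer="adam", loss="mse")
```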

The RF58 algorithm, used for comparison, is a more conventional ML approach that can behave as a regression model. The Sklearn library59 implementation was used with default parameters, with the exception of max_depth = number of residues \(\times \) 3 (xyz positions) and random_state = 0.
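The corresponding scikit-learn setup is sketched below, reading "number of residues \(\times \) 3" as the xyz positions of all residues (our interpretation). Since RF has no notion of time, each 5-frame input window would be flattened into a single feature vector.

```python
from sklearn.ensemble import RandomForestRegressor

n_residues = 282                       # inactive-state receptor
rf = RandomForestRegressor(
    max_depth=n_residues * 3,          # number of residues x 3 (xyz positions)
    random_state=0,
)
# Each 5-frame window is flattened per sample before fitting, e.g.:
# rf.fit(X.reshape(len(X), -1), y)     # X, y as in the windowing sketch above
```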

For each type of simulation, 10,000 trajectories are available from the database reported in Kohlhoff et al.38. Two thousand of these were randomly selected and split into 5 folds of 400 trajectories each; four of them were used for cross-validation, and the remaining hold-out fold was used for testing. Results do not improve when using a larger number of simulations. Ten iterations of this procedure were performed to allow statistical significance testing.

The experiments were thus repeated ten times, each time randomly choosing 2,000 trajectories from the original set of 10,000. This procedure generates 10 models and yields 10 RMSD values per experiment (for each ML approach and each of the 6 types of simulation). Student's t-tests (p-value \(< 0.001\) and p-value \(< 0.005\))60 were carried out to find statistically significant differences between experiments. Standard deviations and p-values are shown in the tables of results.
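The repetition and testing protocol can be sketched as follows; the model-fitting step is abstracted away and the helper names are ours, not the authors'.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_repetition(n_available, train_and_score):
    """One repetition: draw 2,000 trajectories at random from the available
    set, split them into 5 folds of 400, and hold one fold out for testing.
    `train_and_score` is a placeholder that returns the test RMSD."""
    idx = rng.choice(n_available, size=2000, replace=False)
    folds = np.array_split(idx, 5)
    train_idx, test_idx = np.concatenate(folds[:-1]), folds[-1]
    return train_and_score(train_idx, test_idx)

# Ten repetitions per (model, simulation type) give 10 RMSD values, which are
# then compared between experiments with Student's t-test, e.g.:
# rmsd_a = [one_repetition(10_000, score_model_a) for _ in range(10)]
# rmsd_b = [one_repetition(10_000, score_model_b) for _ in range(10)]
# t_stat, p_value = stats.ttest_ind(rmsd_a, rmsd_b)
```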

Given the potentially higher flexibility of the loops (ECL and ICL), a specific analysis was performed in which only the amino acids belonging to the TM regions were used for training and prediction. These results can then be compared with models obtained by training with the full protein sequences.

The quality of the test predictions was assessed through the Root Mean Square Deviation (RMSD), commonly used to assess the similarity between simulated and predicted atomic coordinates and therefore straightforwardly generalizable to the centres of mass of the amino acids, as we have done in this study.
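For clarity, the RMSD between predicted and simulated centre-of-mass coordinates is computed in the usual way; no structural superposition is mentioned in the text, so none is applied in this sketch.

```python
import numpy as np

def rmsd(predicted, reference):
    """Root Mean Square Deviation, in Angstrom, between two (n_residues, 3)
    arrays of centre-of-mass coordinates (no structural superposition)."""
    diff = np.asarray(predicted) - np.asarray(reference)
    return float(np.sqrt((diff ** 2).sum(axis=-1).mean()))
```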

Experimental setup

Three experiments were performed for both GPCR states. The first experiment (E1) evaluates the capability of the RF, ULSTM, BLSTM and CNN-LSTM models to predict steps of the GPCR trajectories, discriminating by TM, ECL and ICL regions for the active and inactive states.

For that, the prediction error of each of the amino acid positions was calculated and compared between models, region by region. Similar evaluations were carried out in the second experiment (E2), this time focused on the seven TM, comparing the prediction error between the 2RH1 and 3P0G for each TM (TM1 to TM6).

Focusing now on the large dynamics of the ICL and ECL of the GPCR, Experiment E3 evaluates the prediction capability of the models on the ICL (ICL1, ICL2) and ECL (ECL1, ECL2, ECL3) regions.

Results and discussion

Our study investigates the capability of LSTM models to predict GPCR MD trajectories for its different states and constituent regions. Different GPCR regions may play different roles in the MD associated with each state. The investigation of regions individually is therefore relevant.

As mentioned, prediction errors are reported using the RMSD61, with original range values in Angstrom (Å) units. The standard deviation (std)62 across experiment repetitions is also calculated for the RMSD metric. Table 3 shows the RMSD and std in Å for the 2RH1 inactive and 3P0G active states, comparing the prediction errors of RF, ULSTM, BLSTM and CNN-LSTM. The results discriminated by APO, FA and PIA simulations are also shown in Table 3.

Regarding experiment E1, three questions can be answered:

  • Which model is the best one? The TM, ECL and ICL regions show similar prediction error values for the different models. Furthermore, these errors are not uniform when comparing the APO, FA and PIA simulations (see italics in Table 3). Despite the increasing complexity of the models evaluated (RF, ULSTM, BLSTM and CNN-LSTM), no clear differences were found between them, with the exception of RF, which performed significantly worse than the rest. Using an Occam's razor criterion, ULSTM, being the simplest model in terms of architecture complexity and computational resources required, should be selected.

  • Which GPCR region yields the best results, and in what analysis? The TM regions are shown to be the best predicted regions of the GPCRs (see bold values in Table 3), with significant differences with respect to ICL and ECL (\(p<0.005\) for \({\upbeta }\)2AR-2RH1, the inactive state in APO, and \(p<0.001\) for the remaining simulations). Comparing now the ICL and ECL regions, ICL achieves the minimum error for the APO, FA and PIA simulations, with the exception of the active state of the APO simulation, where the minimum error was obtained for the ECL region; however, no significant differences are observed between these regions.

  • Are there any substantial differences between active and inactive states? Regarding the inactive (2RH1) and active (3P0G) states, no strong differences were observed between their regions. Two exceptions are found in this analysis: the ECL region in the APO simulation and the TM region in the FA simulation show significant differences (p-value \(< 0.05\); see the RMSD values marked with a single asterisk in Table 3). Only in these cases does the active state show clear differences from the inactive one.

The experimental results show that the LSTM models performed best in predicting the dynamics of the TM regions and that, overall, the ICL regions yielded the highest prediction error. However, the dynamics of the GPCR differ by region, with some regions being more prone to conformational changes. The flexibility of the molecule was assessed by region by examining the prediction error between regions, as shown in Tables 3 and 6.

The transmembrane regions are less flexible than the ICL and ECL regions, which are more likely to experience changes in their 3D structure. This may be one of the reasons why lower errors were obtained for the transmembrane regions.

Regarding E2 and focusing on the six TM regions, the prediction errors are shown in Tables 4 and 7. The former includes results obtained by training the models with the whole receptor, while the latter includes results obtained by training the models only with the TM regions. No statistically significant differences (\(p>0.05\)) are observed between the errors obtained for models trained with the loops and those trained without them. In general terms, TM2 was the best predicted region for the active state. These results coincide for all the APO, FA and PIA simulations. However, for the inactive state, the best values were obtained for TM2 or TM3, depending on the simulation, showing substantial differences between both regions. TM2 and TM3 do not show statistically significant differences between them (\(\textit{p}>0.05\)). However, TM2 shows significant differences with respect to TM1, TM4, TM5, TM6 and TM7 in almost all simulations and models (except in the active state for the PIA simulation with BLSTM, when compared with TM4); see the values marked \(^{*1}\) in Table 4.

A more detailed analysis of the experimental results provided further information on the MD of specific receptor regions. While TM2 coincides across all simulations as the best predicted region in the active and inactive states, TM3 is instead the best for the inactive state in the PIA simulations, although it does not show significant differences (p-value \(> 0.05\)) with respect to TM2.

These proteins share a highly conserved motif of seven transmembrane helices connected by three extracellular and three intracellular loops. Movements of transmembrane regions III and IV are responsible for the activation of G protein-coupled receptors63. The conformational changes of the receptor transmembrane regions are closely related to the \({\upbeta }\)2-adrenergic receptor (\({\upbeta }\)2AR) activation pathway64. It is known that an outward displacement of TM6 from the centre of the helices, together with displacements of TM5 and TM7, is part of the activation mechanism of the receptor10. However, the details of the mechanisms of interaction between residues that trigger the activation are still unclear. Helices 6 and 7 of the original simulation show a strong difference between the inactive and active structures, with relative displacements of 6.951 and 3.47 Å, respectively. Helices 1, 2, and 4 are shifted away from the centre in the active state, while helix 3 is noticeably nearer (the range of relative displacements is from 1.4 to 2.277 Å). During the simulation, helix 6 moves inwards in the active-state simulations, while helix 7 moves outwards. When Kohlhoff et al.38 compared simulations started from the active state with those started from the inactive state, the relative displacements of helices 1 through 4 between active and inactive remained almost constant, indicating the importance of their rearrangement as a distinguishing element of receptor activation. Therefore, it is essential to examine each domain of the transmembrane region of the receptor in detail. It could easily be argued that the region that is structurally incomplete is the most inaccurately modelled.

The original paper describing the MD simulations used in this study did not attempt to model the missing sections, which is a limitation of our reported results. Regarding the missing sections in ICL3 (residues 231-262 in 2RH1 and 228-264 in 3P0G, between helices 5 and 6), it is difficult to draw solid conclusions about the differences between ICL and ECL, since this part of the protein is not accurately modelled. Nevertheless, \({\upbeta }\)2AR remains functional even in the absence of ICL3.

The prediction errors for experiment E3 are shown in Tables 5 (ICL) and 6 (ECL). For the ICL regions, ICL1 was identified as the region with the lowest prediction error in both states and all simulations. Interestingly, the accurate prediction of the MD of the ICL1 region contrasts with the results for ICL2, with statistically significant differences (p-value \(< 0.01\)) and RMSD differences greater than 0.2 Å.

In the case of the ECL regions, both simulations showed that ECL1 had the lowest prediction error, which was significantly different from ECL2 (p-value \(< 0.05\)) and from ECL3 (p-value \(< 0.01\)). These results coincide for the APO and FA simulations (see the RMSD values in Table 6). Regarding the PIA simulations, significant differences (p-value \(< 0.05\)) were found only between ECL1 and ECL2 (Table 6).

Beyond that, our experiments were carried out with three DL models of increasing structural complexity in terms of their network architecture. The results show inconclusive differences between the DL models, with minor differences depending on the simulation and state. Therefore, the simplest of these models, namely ULSTM, would be the preferred choice for further investigation. In the comparison of the DL models (ULSTM, BLSTM, CNN-LSTM) with a conventional ML model (RF), the DL models have shown strong and significant (p-value \(< 0.01\)) differences with respect to RF.

The existing literature has reported the use of the RMSD error in static 3D prediction for different proteins: for instance, Chen & Brooks65 used it as a metric to ascertain whether MD simulations provide high-resolution refinement of protein structures. Lee et al.66 predicted 3D structure using molecular mechanics based on surface-area free energies for two small proteins (HP-36 and S15). The RMSD error values obtained were 0.77 Å for the HP-36 protein and 0.83 Å for the S15 protein. More specifically for GPCRs, Kaczor et al.67 analysed different methods for protein-protein docking and evaluated the generation of new digital protein-protein complexes in the transmembrane environment. The best method achieved an overall RMSD lower than 0.7 Å in 8 out of 12 simulations. Even if not directly comparable to this study, we have reported RMSD errors lower than 0.13 Å in simulations of dynamics that are a major challenge for models.

The approach proposed in this paper allows the prediction of 3D residue positions from MD time series. It could be used in prospective experiments by setting threshold error targets for the discrimination between states and exploring whether the method can achieve them and how well they compare with those obtained with alternative methods. Furthermore, such an investigation could be assisted by visualising results by residue, as in Figure 4, which maps prediction errors using coloured ribbons. This would allow a visual, interactive and intuitive evaluation, assessing which are the best or worst modelled residues and distinguishing those that are exposed to the solvent or to the ligand, to name a few possibilities.

Table 3 Prediction error of the RF, ULSTM, BLSTM and CNN-LSTM models, as measured by RMSD and std in Å units, for the TM, ICL and ECL regions of the APO, FA and PIA simulations for the 2rh1 and 3p0g states.
Table 4 Prediction error of the RF, ULSTM, BLSTM and CNN-LSTM models, as measured by RMSD in Å units, for the six transmembrane (TM) regions of the APO, FA and PIA simulations for the 2rh1 and 3p0g states.
Table 5 Prediction error of the RF, ULSTM, BLSTM and CNN-LSTM models, as measured by RMSD and std in Å units, for the intracellular loops (ICL) of the APO, FA and PIA simulations for the 2rh1 and 3p0g states.
Table 6 Prediction error of the RF, ULSTM, BLSTM and CNN-LSTM models, as measured by RMSD in Å units, for the three extracellular loops (ECL) of the APO, FA and PIA simulations for the 2rh1 and 3p0g states.
Table 7 Prediction error of the RF, ULSTM, BLSTM and CNN-LSTM models trained only with the TM regions (and, therefore, to be compared with the results in Table 4), as measured by RMSD in Å units, for the six transmembrane (TM) regions of the APO, FA and PIA simulations for the 2rh1 and 3p0g states.
Figure 4

Colour-coded mapping of the average prediction errors (as measured by RMSD in Å) of the LSTM models for the \({\upbeta }\)2 adrenergic GPCR, for the TM, ECL and ICL regions (columns) and for APO, Full Agonist (FA) and Partial Inverse Agonist (PIA) (rows). The blue colour represents the ligand bound to the protein.

Conclusions

LSTM Neural Networks have in the past shown promise in problems of GPCR dynamics forecasting. The current study has provided evidence that LSTM models, in three different architectures, are capable of predicting the dynamic trajectories of GPCRs in six types of simulation with reasonable efficacy, and far better than more standard ML models such as RF.

The TM helices are a key GPCR region due to their physiological role in signal transmission. Our LSTM models have been able to predict the dynamics of TM2 and TM3 the best. Nevertheless, the details of the mechanism of interaction between amino acids that triggers the activation remain unclear.

Although ULSTM is the shallowest of the investigated DL architectures, it has yielded competitive performance when compared to more complex models such as BLSTM and the combination CNN-LSTM.

LSTM models, though, suffer from some limitations when used to process long MD trajectories. For this reason, as a next step, we plan to investigate the capabilities of generative models (which have successfully been used for the modelling of protein MDs16) such as Transformers68 or Autoencoders69, for the prediction of long trajectories. We also plan to evaluate alternative representations of GPCR data, including graph representations.

No significant differences were observed between models trained with the loops and those trained only with the TM regions. This could be due to the fact that the data representation used by the models is a frame-by-frame relationship. We suggest that further research should be conducted on the representation of molecular dynamics through graphs that explicitly consider the connections between neighbouring residues.