Abstract
The increasing interest in modeling the dynamics of ever larger proteins has revealed a fundamental problem with models that describe the molecular system as being in a global configuration state. This notion limits our ability to gather sufficient statistics of state probabilities or state-to-state transitions because, for large molecular systems, the number of metastable states grows exponentially with size. In this manuscript, we approach this challenge by introducing a method that combines our recent progress on independent Markov decomposition (IMD) with VAMPnets, a deep learning approach to Markov modeling. We establish a training objective that quantifies how well a given decomposition of the molecular system into independent subdomains with Markovian dynamics approximates the overall dynamics. By constructing an end-to-end learning framework, the decomposition into such subdomains and their individual Markov state models are simultaneously learned, providing a data-efficient and easily interpretable summary of the complex system dynamics. While learning the dynamical coupling between Markovian subdomains is still an open issue, the present results are a significant step towards learning Ising models of large molecular complexes from simulation data.
Introduction
The understanding of protein function is often interlinked with understanding protein dynamics. Molecular dynamics (MD) simulations are a valuable tool to study these dynamics on an atomistic level^{1,2,3,4,5,6}. However, further methods are necessary to extract the statistically relevant information and to help overcome the discrepancy between feasible simulation lengths and the timescales of relevant processes. A common approach to enhance the sampling of a specific process of interest is to bias the simulation along a reaction coordinate aligned with the process^{7,8,9,10,11,12,13}. In comparison, the Markov modeling approach^{14,15,16,17,18,19,20} extracts kinetic information and tackles the sampling problem without requiring the definition of a few predefined reaction coordinates, by combining arbitrary numbers of short, unbiased, distributed simulations to model the long-timescale behavior of target systems. Consequently, multiple software packages^{21,22} have been developed over the last decade providing assistance in estimating these models. They often include a pipeline for feature selection^{21,22,23,24}, dimension reduction^{25,26,27,28,29,30,31}, clustering^{32,33,34,35}, transition matrix estimation^{15,19,36,37}, and coarse-graining^{38,39,40,41,42,43,44}. Markov state models (MSMs) have been applied to a wide range of molecular biology problems such as protein aggregation^{45,46,47} or ligand binding^{48,49,50} and can be a valuable tool to understand experimental data on the atomistic scale^{51,52}.
The necessity to assess a model's performance and thereby rank its quality encouraged the development of variational methods^{53,54}, in particular the variational approach for Markov processes (VAMP)^{55}. This variational formulation has allowed us to replace the aforementioned pipeline with an end-to-end deep learning framework called VAMPnet^{56}, which simultaneously learns a dimension reduction of the molecular system to the collective variables best describing the rare-event processes and an MSM on these variables. The framework can be used to further drive MD simulations along these learned collective variables^{57,58}. We can also use this framework to estimate statistically reversible MSMs and incorporate constraints from experimental observables^{59,60,61}.
Despite these developments, there is a fundamental scaling problem in describing MD in terms of transitions between global system states: While the assignment of MD configurations to discrete global states representing the metastable groups of structures is an excellent model for small cooperative molecular systems, such as small to medium-sized proteins, larger molecular systems (e.g., proteins with hundreds of amino acids) have an increasing number of subsystems whose dynamics are (nearly) independent^{62} (Fig. 1). Consider, for example, a solution of N proteins that undergo transitions between their open and closed states independently while dissociated, and whose transitions only (partially) couple when the proteins are associated with each other. The number of global system states is 2^{N}, i.e., it grows exponentially with the number of subsystems N^{63,64}. This means that any form of simulation or analysis that explicitly distinguishes global system states will not scale to large molecular systems.
At the same time, the (approximate) independence between subsystems is also key to the solution of the problem. A scalable solution needs to address two separate issues: (a) dividing the protein system into approximately Markovian subsystems and (b) learning the coupling between them. Olsson & Noé^{63} made a first attempt at (b) by learning a dynamic graphical model between predefined subsystems. This approach leads to a graphical model, or Markov random field, resembling Ising or Potts models in physics, with the key difference that both the definition of the individual subsystems or spins and their transition dynamics need to be learned. In contrast, Hempel et al.^{64} proposed a solution for (a) by approximating the global system dynamics as a set of independent (uncoupled) Markov models (termed independent Markov decomposition, IMD). They furthermore proposed a pairwise independence score of features, which allows the detection of nearly uncoupled regions in which independent Markov state models can subsequently be estimated.
In this manuscript, we present a joint IMD and VAMP approach (termed independent VAMPnet, or iVAMPnet for short) that significantly advances our ability to identify approximately independent Markovian subsystems (issue (a)) by generalizing IMD to neural network basis functions. iVAMPnets are an integrated end-to-end learning approach that decomposes the macromolecular structure into subsystems that are dynamically weakly coupled and estimates a VAMPnet for each of these subsystems to promote a comprehensible analysis of the subsystem dynamics (Fig. 1). In comparison to previous implementations of IMD, our approach learns an optimal decomposition into independent subsystems and can find collective variables that are nonlinear combinations of the input features.
Results
Markov state models and Koopman models
Markovian dynamics can be modeled by the transition density:
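In standard notation (a reconstruction consistent with the description that follows), the transition density reads:

```latex
p_\tau(\mathbf{x}, \mathbf{y}) = \mathbb{P}\left[\mathbf{x}_{t+\tau} = \mathbf{y} \,\middle|\, \mathbf{x}_t = \mathbf{x}\right],
```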
which is the probability density to observe configuration y at time t + τ given that the system was in configuration x at time t. Based on the transition density we can characterize the time evolution of a probability density χ as:
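Written out, this propagation is the standard Chapman-Kolmogorov integral (notation as above):

```latex
\chi_{t+\tau}(\mathbf{y}) = \int p_\tau(\mathbf{x}, \mathbf{y})\, \chi_t(\mathbf{x})\, \mathrm{d}\mathbf{x}.
```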
By discretizing the molecular state space in a suitable way and defining a transition matrix T between discrete states, we can linearize this equation as:
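A standard matrix form, consistent with the element-wise description below, treats χ_t as a row vector of state probabilities:

```latex
\boldsymbol{\chi}_{t+\tau}^{\top} = \boldsymbol{\chi}_{t}^{\top}\, \mathbf{T}_{\tau}.
```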
This is the equation of a Markov state model, where element i of the vector χ_{t+τ}(y) is the probability to be in discrete state i at time t + τ. Furthermore, the transition matrix elements \((\mathbf{T}_\tau)_{ij}\) describe the probability of jumping to state j given state i within a time τ. In the case of fuzzy state assignments, e.g., as with VAMPnets, Eq. (3) describes the more general Koopman model^{65} and T_{τ} becomes the Koopman matrix. This means that probability densities are still propagated but the matrix elements cannot be interpreted as transition probabilities.
The lag time τ is common to all Markovian models and is usually chosen with the aid of an implied timescales test^{66}. If τ is chosen too small, the resulting model is not a valid Markov model (yielding errors in the predicted quantities); if it is chosen too large, the model unnecessarily discards kinetic information. We therefore usually choose the smallest lag time above which the implied timescales are approximately constant.
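As an illustration of the implied timescales test, the following sketch (not the publication's code) computes t_i = −τ/ln|λ_i| for a hypothetical two-state model; for exactly Markovian dynamics the timescales are independent of the lag time:

```python
import numpy as np

def implied_timescales(T, tau):
    """Implied timescales t_i = -tau / ln|lambda_i| from the non-trivial
    eigenvalues of a transition matrix T estimated at lag time tau."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -tau / np.log(eigvals[1:])  # skip the stationary eigenvalue 1

# Toy 2-state transition matrices estimated at lag times 1 and 2; since
# T2 = T1 @ T1, the process is Markovian and the timescale stays constant.
T1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
T2 = T1 @ T1
its1 = implied_timescales(T1, tau=1)
its2 = implied_timescales(T2, tau=2)
assert np.allclose(its1, its2)  # constant implied timescale => Markovian
```

In practice one plots the implied timescales against τ and picks the smallest lag time where the curves level off.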
We now seek to find a state assignment χ and model matrix T that satisfy Eq. (3) and also succeed in predicting the long-time behavior, i.e., for multiples of the lag time τ. Formally, χ are (initially unknown) basis functions, i.e., we assume that the relevant dynamic features can be expressed as a linear combination of them. VAMP^{55} tells us that an optimal solution is reached when χ can span the left singular functions \((\psi_1, \ldots, \psi_k)^T\) and right singular functions \((\phi_1, \ldots, \phi_k)^T\) of the transition operator. They can be found by maximizing the singular values of a matrix that can be estimated from simulation data (see Eqs. (9)–(13) in "Methods"). In the case of a VAMPnet^{56}, deep neural networks are trained by maximizing the VAMP score so as to represent optimal fuzzy state assignments. In equilibrium, the singular functions correspond to the eigenfunctions of the Markov state model and the singular values to its eigenvalues. As the Koopman model still propagates densities, it is instructive to inspect the eigenfunctions and implied timescales of T, since they describe the slow dynamics of a given system.
iVAMPnets and iVAMPscore
To implement iVAMPnets, we need to bridge the gap between the deep neural networks of VAMPnets and the spatial decomposition of independent Markov models. The general idea is to set up multiple parallel VAMPnets, each modeling the Markovian dynamics of a separate, independent subsystem of the molecule, together with an attention mechanism that identifies these subsystems. Thus, each independent VAMPnet should only receive the time-dependent molecular geometry features representing its specific subsystem. For example, such an attention mechanism could separate different protein domains and channel the data of individual domains to separate VAMPnets. We, therefore, develop an architecture that combines a meaningful attention mechanism and parallel VAMPnets and trains them with a loss function that simultaneously promotes dynamic independence between the subsystems and slow kinetics within each subsystem (Fig. 2). iVAMPnets are designed to optimize both these objectives simultaneously.
In practice, we extract all time-lagged data pairs x_{t}, x_{t+τ} containing all molecular geometry features (e.g., distances, contacts, torsions) of our simulation data and pass them through the architecture presented in Fig. 2. The data is fed through an attention mechanism (represented by the matrix G) that yields subsystem-specific vectors \(\mathbf{Y}_t^i\), each of which attends to the features relevant for subsystem i. These vectors then serve as inputs to N parallel feature transformations η^{1}, …, η^{N} (parallel VAMPnets) that transform them into output features χ^{1}, …, χ^{N} (with \(\boldsymbol{\chi}^i(\mathbf{x}_t) = \boldsymbol{\eta}^i(\mathbf{Y}_t^i(\mathbf{x}_t))\)), which represent slow collective coordinates or, directly, fuzzy assignments to metastable Markov states of each molecular subsystem. Equipped with the state assignments, we can compute correlation matrices (Eq. (9)) and derive a Koopman model matrix from them (Eq. (10)). As in VAMPnets, the feature transformations η^{1}, …, η^{N} are represented by deep neural networks. In the present study, we use multilayer perceptrons with a SoftMax output layer representing fuzzy state assignments. However, other architectures could be chosen, e.g., graph convolutional networks when parameter sharing is desired^{67,68}, and a linear output layer could be used if the aim is to represent slow collective variables rather than discrete states^{57,58}. The parameters of the feature transformations η and of the attention matrix are learned end-to-end via backpropagation.
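The data flow just described can be sketched in a few lines of numpy (a toy forward pass only; the matrix G, the weights W, and all dimensions are hypothetical placeholders, whereas the actual model consists of trainable PyTorch modules):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable SoftMax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n_features, n_subsystems, n_states = 4, 2, 3
# Attention matrix G: each input feature distributes its weight over subsystems.
G = softmax(rng.normal(size=(n_features, n_subsystems)), axis=1)
# Toy per-subsystem "VAMPnet": a single linear layer + SoftMax state assignment.
W = [rng.normal(size=(n_features, n_states)) for _ in range(n_subsystems)]

def forward(x_t):
    """Weight the input by the subsystem attention, then map each weighted
    input Y^i to fuzzy state assignments chi^i via SoftMax."""
    chi = []
    for i in range(n_subsystems):
        Y_i = x_t * G[:, i]              # subsystem-specific input vector
        chi.append(softmax(Y_i @ W[i]))  # fuzzy state assignment chi^i
    return chi

chi = forward(rng.normal(size=n_features))
assert all(np.isclose(c.sum(), 1.0) for c in chi)  # assignments sum to one
```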
In more detail, given N individual subsystem models, the global system state can be given by the Kronecker product of all subsystem states:
and by computing the global correlation matrices \(\mathbf{C}_{00}^G, \mathbf{C}_{0\tau}^G, \mathbf{C}_{\tau\tau}^G\) from Eq. (9) using χ^{G}. We note that this step does not require independent Markovian models; it is simply a formalism to express global states in terms of combinations of local states.
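For example, the Kronecker construction of a global fuzzy state vector from two subsystem assignments (hypothetical numbers) looks as follows:

```python
import numpy as np

# Fuzzy state vectors of two subsystems (2 and 3 states, each summing to 1).
chi1 = np.array([0.7, 0.3])
chi2 = np.array([0.2, 0.5, 0.3])

# Global fuzzy state vector as the Kronecker product of subsystem states.
chi_G = np.kron(chi1, chi2)
assert chi_G.shape == (6,)              # 2 x 3 = 6 global states
assert np.isclose(chi_G.sum(), 1.0)     # still a valid fuzzy assignment
```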
Furthermore, we construct a candidate for the global Koopman model from the subsystem models by combining the individual singular values and vectors with a Kronecker product^{64}:
The matrices \(\hat{\mathbf{U}}^G\) and \(\hat{\mathbf{V}}^G\) map the global state assignments onto the constructed singular functions and are computed from the local matrices as defined in Eqs. (11) and (12). The diagonal matrix \(\hat{\mathbf{K}}^G\) encodes the singular values and is computed from the subsystem singular value matrices via Eq. (10).
In order to evaluate how well the constructed model predicts the dynamics in the global state space, the VAMP-E validation score^{55} can be used,
The VAMP-E score measures the difference between the estimated Koopman model and the true dynamics. Here, it is evaluated for the global state assignments ⨂_{i} χ^{i} (as encoded in \(\mathbf{C}_{00}^G, \mathbf{C}_{0\tau}^G, \mathbf{C}_{\tau\tau}^G\)) mapped onto the constructed singular functions (as encoded in \(\hat{\mathbf{U}}^G, \hat{\mathbf{V}}^G\)). If the subsystems are independent, the constructed singular functions are optimal and the singular values of the global system are indeed the products of the singular values of the subsystems (as formalized in Conditions for independent systems; also see Supplementary Note 1). In this case, the global VAMP-E score Eq. (6) has a product form
that poses a necessary condition for subsystem independence.
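The VAMP-E score in the trace form given in ref. 55, R_E[U, S, V] = tr[2 S U^T C_{0τ} V − S U^T C_{00} U S V^T C_{ττ} V], can be sketched as follows (a minimal implementation, not the publication's code):

```python
import numpy as np

def vampe_score(U, S, V, C00, C0t, Ctt):
    """VAMP-E score tr[2 S U^T C0t V - S U^T C00 U S V^T Ctt V] for a model
    with singular-function coefficients U, V and singular value matrix S."""
    correlation_term = S @ U.T @ C0t @ V
    normalization_term = S @ U.T @ C00 @ U @ S @ V.T @ Ctt @ V
    return np.trace(2.0 * correlation_term - normalization_term)

# With whitened data (C00 = Ctt = I) and an exact model (U = V = I,
# S = diagonal matrix of singular values), the score equals the sum of
# squared singular values.
S = np.diag([0.9, 0.5])
score = vampe_score(np.eye(2), S, np.eye(2), np.eye(2), S, np.eye(2))
assert np.isclose(score, 0.9**2 + 0.5**2)
```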
To finally train the model, we develop a loss function that (i) maximizes the global VAMP-E score, assuming that the subsystems describe independent dynamics (Eqs. (4)–(6)), and (ii) minimizes a term that penalizes statistical dependence between these subsystems (Eq. (7)), scaled by a weighting factor ξ.
We evaluate the scores only pairwise, to escape the growth of the global state space, and sum over all possible pairs i, j:
Here, \(\mathcal{R}_E^{ij}\) measures the quality of the constructed Koopman model of subsystems i and j and is computed using Eq. (6). The weighting factor ξ is a hyperparameter that should be chosen large enough to find decoupled systems and small enough not to interfere with the subsystem dynamics. Even though the choice of an appropriate ξ depends on the nature of the dynamics and the coupling, it is directly related to the training procedure as it, briefly, balances the optimizer's focus between kinetics and decoupling. Further conditions (Eq. (18)), which evaluate the independence of the singular functions and values, can be used as post-training validation metrics for adjusting ξ and for testing to which degree dynamically independent subsystems were found.
Benchmark model with two independent subsystems
The iVAMPnet architecture, which is implemented using PyTorch^{69}, is depicted in Fig. 2. Generally, various neural network architectures are possible; here we choose fully connected feed-forward neural networks with up to 5 hidden layers of 100 nodes each. The scripts to reproduce the results, including the details of the training routine, the choice of hyperparameters, and the network architecture, can be found in our GitHub repository. We note that an implementation of VAMPnets is available in the current version of DeepTime^{70}.
We first demonstrate that iVAMPnets are capable of decomposing a dynamical system into its independent Markovian subsystems based on observed trajectory data using an exactly decomposable benchmark model (Fig. 3).
Akin to the protein illustrated in Fig. 1, we define a system that consists of two independent subsystems with two and three states, respectively. It is modeled by two transition matrices with the corresponding numbers of states. We sample a discrete trajectory with each matrix (100k steps)^{70}. The global state is defined as a combination of these discrete states. The discrete subsystem states are now interpreted as the hidden states of hidden Markov models^{71} that emit to separate, subsystem-specific dimensions of a 2D space. The output of each subsystem is modeled with Gaussian noise \(N(\mu_i, \tilde{\sigma}) \in \mathbb{R}\) that is specific to the state the system is in, specified by the mean μ_{i} and a constant width \(\tilde{\sigma}\). The two-state subsystem therefore describes a jump process between Gaussian basins along the x-axis, and the three-state subsystem one along the y-axis (Fig. 3a). These variables compare to collective variables of the green (x) and blue (y) systems depicted in Fig. 1. Please note that while in this benchmark system the relevant slow collective variables are known, iVAMPnets are generally capable of finding them (cf. 10D hypercube benchmark model and Synaptotagmin-C2A).
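The construction of such a benchmark can be sketched as follows (the transition matrices, means, and noise width are hypothetical values rather than the paper's, and the trajectory is shortened):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_chain(T, n_steps):
    """Sample a discrete Markov chain with row-stochastic transition matrix T."""
    cdf = np.cumsum(T, axis=1)
    u = rng.random(n_steps)
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        states[t] = np.searchsorted(cdf[states[t - 1]], u[t])
    return states

# Hypothetical 2-state and 3-state transition matrices.
T2 = np.array([[0.95, 0.05],
               [0.05, 0.95]])
T3 = np.array([[0.90, 0.05, 0.05],
               [0.05, 0.90, 0.05],
               [0.05, 0.05, 0.90]])

n_steps = 20_000
s_x, s_y = sample_chain(T2, n_steps), sample_chain(T3, n_steps)

# Hidden states emit into separate dimensions with state-dependent means
# and a constant Gaussian width (cf. Fig. 3a).
mu_x, mu_y, sigma = np.array([-1.0, 1.0]), np.array([-1.0, 0.0, 1.0]), 0.2
data = np.column_stack([mu_x[s_x], mu_y[s_y]]) + sigma * rng.normal(size=(n_steps, 2))
```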
Since the generative benchmark model consists of perfectly independent subsystems and the pair already describes the global system, our method can simply be optimized for the global VAMP-E score (Eq. (6)) without the need for any further constraints. We train a model with a two-state and a three-state subsystem at a lag time of τ = 1 step.
Once trained, the iVAMPnet yields a model of the dynamics in each of the identified subsystems. As expected, we find that the estimated transition matrices for both subsystems closely agree with the ground truth (Fig. 3c). To additionally assess the slow subsystem dynamics in more detail, we borrow concepts from MSM analysis and conduct an eigenvalue decomposition of the iVAMPnet models (cf. VAMPnets). The analysis of the eigenfunctions demonstrates that, by construction, the system exhibits one independent process along the x-axis (λ_{1} = 0.90) and two along the y-axis (λ_{2} = 0.89 and λ_{4} = 0.66) (Fig. 3d). In contrast, we note that in the picture of global states, two additional processes would appear as a result of mixing the independent processes (cf. Supplementary Note 2), which makes the combined dynamical model more challenging to analyze, whereas the iVAMPnet analysis remains straightforward and simple.
Besides the dynamical models, our iVAMPnet yields assignments between input features and subsystems. We find that the method correctly identifies the x-axis feature with the two-state system and the y-axis feature with the three-state system (Fig. 3b).
10D hypercube benchmark model
In a next step, we test the iVAMPnet approach with ten 2-state subsystems, which corresponds to 1024 global states (Fig. 4a, b). As before, the dynamics is generated by ten independent hidden Markov state models with unique timescales. The system is split into five pairs of subsystems, and the two coordinates governing the transition dynamics of each pair are rotated to make them more difficult to separate (Fig. 4a). Additionally, we make the learning problem harder by adding ten noise dimensions, such that the global system lives on a 10-dimensional hypercube embedded in a 20-dimensional space.
Although the subsystems are perfectly independent, we estimate an iVAMPnet with the VAMP-E score in a pairwise fashion, thereby avoiding the estimation of prohibitively large correlation matrices in \(\mathbb{R}^{1024\times 1024}\). As this is only justified if all systems are independent, we additionally enforce Eq. (7) during training by minimizing Eq. (8), thereby ruling out that any two subsystems approximate the same process.
The iVAMPnet estimation yields subsystem models which, as common in MSM analysis, can be validated by testing whether their implied relaxation timescales are converged in the model lag time τ. We find that the implied timescales learned by the iVAMPnet are indeed converged and accurately reproduce the ground truth (Fig. 4d). We note that in addition to the timescales of the individual subsystems that are identified by the iVAMPnet, a global model would also contain all timescales that result from products of eigenvalues, resulting in a total of 1024 timescales. Thus, the iVAMPnet analysis provides a much simpler and more concise model than a global MSM or VAMPnet would.
Furthermore, the subsystem assignment mask indicates that the method correctly assigns high importance weights to two input features for each model (Fig. 4c). Therefore, the method proves its capability of decomposing a noisy, high-dimensional global system into its independent subprocesses in a data-efficient way.
We have generalized the 10-cube system to a variable number of subsystems (N-cube) to conduct a performance benchmark, finding that iVAMPnets outperform VAMPnets for this particular system. We note, however, that this result may not generalize to arbitrary systems, as the N-cube features truly independent 2-state subsystems (see Supplementary Note 6 for details).
SynaptotagminC2A
Finally, we test iVAMPnets on an all-atom protein system. In comparison to our benchmark examples, we expect the underlying global dynamics to be only approximately decomposable into independent subsystems. Our test data consists of 92 MD trajectories of 2 μs length each (184 μs aggregate) of the C2A domain of synaptotagmin (Supplementary Note 7) that was described previously^{72}; synaptotagmin plays a crucial role in the regulation of neurotransmitter release^{73}. It was shown to consist of approximately uncoupled subsystems containing the calcium binding region (CBR) and the C78 loop, respectively^{64}.
First, we attempted to model the protein with a global model, i.e., with a single (regular) VAMPnet. Indeed, this approach failed because there were not enough simulation statistics to estimate a reversibly connected transition model between all global metastable states, resulting in diverging implied timescales (Supplementary Note 3 and Supplementary Fig. 2). This is exactly the scenario where iVAMPnets should provide an advantage, by only relying on locally rather than globally converged transition statistics.
Next, we train an iVAMPnet seeking two subsystems with twelve and six states, respectively, at a lag time of τ = 10 ns, where we enforce constraint Eq. (7) to find uncoupled subsystems.
The trained iVAMPnet identifies one subsystem comprising all three CBR loops (CBR1, CBR2, CBR3; Fig. 5a). The second subsystem consists not only of the aforementioned C78 loop but also of the loop connecting beta sheets 3 and 4^{74} (termed C34 henceforth). When mapping the residue positions on the protein structure it becomes obvious that the two subsystems are physically well separated (Fig. 5a), supporting the conclusion that both regions are only weakly coupled^{64}.
The implied timescales of both systems are approximately constant in the model lag time τ. Most timescales are in the range of 1–10 μs, with the exception of one much slower process with a 100 μs relaxation time found in the first subsystem (Fig. 5b), which has not been found previously. Analysis of the structural changes governing this process reveals that it involves an orchestrated transition of all CBR loops (Fig. 5c). Such a process could however not be resolved by the previous study^{72} where the CBR was modeled as individual loops. The process of the second system involves a simultaneous movement of the C78 and C34 loops (Fig. 5c).
iVAMPnets find metastable structures in the local features that are comparable to the ones described in our previous work^{72}. Specifically, α-helices in two distinct positions and a state burying a methionine residue (Met173) can be found in CBR1. In the adjacent CBR2 site, both tightly bound and loose configurations are identified, and the C78 site features all three previously described valine residue conformations (Val250, Val255). In addition to the features modeled in our preceding study^{72}, iVAMPnets identify dynamics in a lysine-rich cluster (Lys189-192) that was previously reported as important for membrane interaction^{75}. See Supplementary Note 4 for a detailed view of the metastable states and exchange kinetics. In contrast to our previous work, the kinetic models in the local subsystems are more complex and incorporate a larger number of dynamic processes, providing a more comprehensive picture without the need to define a partitioning manually. In fact, conducting domain decomposition and local kinetic modeling simultaneously has enabled the identification of very subtle dynamical features, as long as they contribute significantly to the local VAMP scores.
Although estimating a global VAMPnet model for synaptotagmin was not feasible given the sparse data sample, iVAMPnets use the same data efficiently and estimate a statistically valid dynamical model. This result is especially striking because the iVAMPnet approach also simplifies the subsequent task of interpreting models by separating dynamically independent protein domains.
Counterexample: folding of the villin miniprotein
Finally, we conducted an experiment on a villin protein folding trajectory of 125 μs length^{76} as a negative example (Supplementary Note 7). Small proteins such as villin are typically cooperative, i.e., the slowest processes related to folding involve all residues (Supplementary Note 5). Thus, these processes cannot be resolved when decomposing the system into several subsystems. Indeed, we find that a splitting into two subsystems with two states each results in timescales that are not converged, and whose relaxation processes approximate a partial folding on disjoint areas (cf. Supplementary Fig. 6).
Testing statistical independence of the learned dynamical subsystems
As constraint Eq. (7) was used as a penalty during training (as independence score Eq. (19)), we assess the validity of an estimated subsystem assignment by evaluating the constraints that were not enforced during training (Eq. (17)) as post-training independence scores M_{U}, M_{V}, and M_{UV} (defined in Eq. (18)). Low values of M_{U} and M_{V} imply that the constructed left and right singular functions are indeed valid candidates for singular functions in the global state space. A small value of M_{UV} indicates that the kinetics in the global state space is well predicted by the Kronecker product of subsystem models. We find that the three metrics are well suited to indicate independence of the learned subsystems (Table 1). Of the tested systems, only villin cannot be split into independent parts (all scores > 0.1). In comparison, the benchmark models and synaptotagmin can be decomposed into statistically uncoupled subsystems (all scores < 0.01). The slightly increased M_{R} value for synaptotagmin suggests that its subsystems might be weakly coupled.
Discussion
We have proposed an unsupervised deep learning framework that, using only molecular dynamics simulation data, learns to decompose a complex molecular system into subsystems which behave as approximately independent Markov models. Thereby, iVAMPnet is an end-to-end learning framework that points a way out of the exponentially growing demand for simulation data required to sample increasingly large biomolecular complexes. Specifically, we have developed and demonstrated iVAMPnets for molecular dynamics, but the approach is, in principle, also applicable to different application areas, such as fluid dynamics. The specific implementation, such as the representation of the input vectors x_{t} and the neural network architecture of the χ-functions, depends on the application and can be adapted as needed.
We now have a hierarchy of increasingly powerful models ranging from MSMs over VAMPnets to iVAMPnets. MSMs always consist of (1) a state space decomposition and (2) a Markovian transition matrix governing the dynamics between these states. VAMPnets provide a deep learning framework for MSMs, and thereby (3) learn the collective coordinates in which the state space discretization (1) is best made. iVAMPnets additionally learn (4) a physical separation of the molecular system into subsystems, each of which has its own slow coordinates, Markov states, and transition matrix.
We have demonstrated that iVAMPnets are a powerful multiscale learning method that succeeds in finding and modeling molecular subsystems when these subsystems indeed evolve statistically independently. Additionally, iVAMPnets are capable of learning from high-dimensional MD data. To prove that point, we have demonstrated that the synaptotagmin C2A domain is decomposable into two almost independent Markov state models. Importantly, we have shown that this dynamical decomposition of synaptotagmin C2A succeeds while an attempt to model the system with a global Markov state model fails due to poor sampling. This is a direct demonstration that iVAMPnets are statistically more efficient than VAMPnets, MSMs, or other global-state models and may indeed scale to much larger systems.
We note, however, that iVAMPnets do not learn how the subsystems are coupled, and are, therefore, in their current form, only applicable to molecular systems that consist of uncoupled or weakly coupled subsystems. Although most biomolecular complexes are known to be cooperative, there are examples that have been modeled very successfully using independent subsystems, such as the Hodgkin-Huxley model of voltage-gated channel proteins^{77,78}. For other systems, the degree of coupling is a matter of debate, for example, the C2 tandem (C2A and C2B domains) in synaptotagmins^{79,80}. Since isolated domains are known to conduct function by themselves in many cases, we believe that discarding couplings is a first-order modeling assumption that is suitable to identify these domains and their relevant metastable states.
Following up on ref. 63 by introducing coupling parameters that describe how the learned MSMs are coupled is the subject of ongoing research. Furthermore, the weak-coupling assumption is made for the timescale of the investigated molecular processes and may not generalize to arbitrary times. For example, the degree of coupling between domains found in an MD simulation of a folded protein may be very different in its unfolded state, which will eventually be encountered for a long enough simulation time.
Besides the usual hyperparameter choices in deep learning approaches, iVAMPnets require the specification of the number of sought subsystems. This choice can be guided by training an iVAMPnet for different numbers of subsystems and then interrogating the independence scores (Eqs. (19) and (18)) to choose a decomposition where statistical independence is optimal. We suggest decomposing the system into two subsystems as a starting point and increasing this number subsequently. Non-optimal choices may manifest, e.g., in non-converged implied timescales (possibly an incarnation of the sampling problem, which may be mitigated by increasing the number of subsystems) or in high independence scores (the system cannot be split because too many subsystems, or a non-optimal number, were chosen). Furthermore, the choice of the number of subsystems can be guided by the number of structural domains in a protein (complex) or by using the network-based approach presented in ref. 64. Finally, the number of states in each subsystem needs to balance (a) the quality of the singular function approximation (higher for few states) and (b) model resolution (higher for more states). Ultimately, different choices may yield converged validation measures, in which case the number of states may be chosen to give the desired model resolution.
iVAMPnets can be improved and further developed in multiple ways, e.g., by employing more advanced network architectures such as graph neural networks, where parameters could be shared across subsystems. This might result in higher-quality models and a greater robustness against hyperparameter choices. Very recently, graph neural networks were indeed successfully combined with VAMPnets, showing that the resulting method (GraphVAMPnets) is applicable to MD data and that the estimated models are of high quality^{81}.
In summary, iVAMPnets pave a possible path toward modeling the kinetics of large biological systems in a data-efficient and interpretable manner.
Methods
VAMPnets
Since an iVAMPnet implements multiple parallel VAMPnets representing the kinetics of separate independent subsystems, we first introduce VAMPnets^{56}. VAMPnets are multilayer perceptrons that represent feature functions χ (we omit the subsystem superscript i here for clarity). Their last layer is often chosen to be a SoftMax function, i.e., the non-negative outputs sum to one. Therefore, the output of a VAMPnet can be interpreted as a fuzzy assignment to a metastable state. Taking the linear combination of states with equal weights results in the constant singular function with singular value 1, which is reflected in the singular values of the Koopman matrix (Eq. (10) with the normalized correlation matrix). Given the feature functions χ, we can compute the following correlation matrices:
where L is the number of collected data pairs in the simulations.
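As a minimal numpy sketch (function and variable names are illustrative, not taken from the accompanying code), the correlation matrices of Eq. (9) can be estimated from the stacked SoftMax outputs at times t and t + τ:

```python
import numpy as np

def correlation_matrices(chi_t, chi_tau):
    """Estimate C00, C01, C11 from L pairs of fuzzy state assignments
    chi(x_t) and chi(x_{t+tau}), stacked as (L, n) arrays."""
    L = chi_t.shape[0]
    C00 = chi_t.T @ chi_t / L      # instantaneous correlations at time t
    C01 = chi_t.T @ chi_tau / L    # time-lagged cross-correlations
    C11 = chi_tau.T @ chi_tau / L  # instantaneous correlations at t + tau
    return C00, C01, C11

# toy data: random fuzzy assignments over 3 states (rows sum to one)
rng = np.random.default_rng(0)
raw = rng.random((100, 3))
chi = raw / raw.sum(axis=1, keepdims=True)
C00, C01, C11 = correlation_matrices(chi[:-1], chi[1:])
```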
Training VAMPnets or iVAMPnets involves computing covariance matrices over minibatches. We therefore need to choose the batch size to balance the large estimator variance obtained for small batches against the high memory requirements of large batches. Instead of the trivial covariance estimator (Eq. (9)), which is asymptotically unbiased^{55} but has high variance, one can employ a shrinkage estimator^{82,83}, which reduces the overall estimator error by trading larger bias for lower variance. For the current study, we assume that our benchmark and MD data have been sufficiently sampled to yield adequate approximations with the estimator given in Eq. (9).
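A fixed-intensity shrinkage toward a scaled identity illustrates this bias-variance trade-off. Note that the estimators of refs. 82,83 additionally optimize the shrinkage intensity from the data, which is omitted in this sketch:

```python
import numpy as np

def shrinkage_covariance(X, alpha):
    """Blend the trivial covariance estimator with a scaled-identity target.
    alpha = 0 recovers the high-variance empirical estimator (cf. Eq. (9)),
    alpha = 1 the maximally biased target; intermediate values trade
    larger bias for lower variance."""
    L, n = X.shape
    S = X.T @ X / L                       # trivial (empirical) estimator
    target = np.trace(S) / n * np.eye(n)  # shrinkage target: scaled identity
    return (1.0 - alpha) * S + alpha * target

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
S_shrunk = shrinkage_covariance(X, 0.2)
```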
The singular functions and values can be approximated via the singular value decomposition (SVD) of the following matrix \(\bar{\mathbf{K}}\):
K is the diagonal matrix of approximated singular values corresponding to the left and right singular functions:
The matrices U and V construct the left and right singular functions from the individual state assignments. The optimal state assignments can be found by maximizing the VAMP-E score:
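Assuming the covariance matrices above, the SVD-based estimate can be sketched as follows; at the optimum, the sum of squared singular values of \(\bar{\mathbf{K}}\) equals the maximal VAMP-2 score, which the VAMP-E score also attains (function names are illustrative):

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse matrix square root of a symmetric PSD matrix via
    eigendecomposition, with small eigenvalues floored for stability."""
    w, Q = np.linalg.eigh(C)
    w = np.maximum(w, eps)
    return Q @ np.diag(w ** -0.5) @ Q.T

def vamp_singular_values(C00, C01, C11):
    """Singular values of Kbar = C00^{-1/2} C01 C11^{-1/2}."""
    Kbar = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)
    return np.linalg.svd(Kbar, compute_uv=False)

# sanity check: with whitened features the singular values are read off directly
C00 = C11 = np.eye(2)
C01 = np.diag([1.0, 0.8])
s = vamp_singular_values(C00, C01, C11)
score = (s ** 2).sum()  # maximal VAMP-2 / VAMP-E score for this toy model
```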
Given trained state assignments χ(x_{t}) and the correlation matrices of Eq. (9), the Koopman matrix T can be evaluated as:
Furthermore, we can estimate the eigenfunctions φ and timescales t_{i} via its eigendecomposition T = QΛQ^{−1}:
Please note that this operation is only possible if the eigendecomposition is (approximately) real-valued, a condition that is met for the application cases presented here.
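The Koopman matrix and implied timescales can be sketched as follows; for illustration, C00 is taken as the identity so that T coincides with a row-stochastic toy transition matrix (the helper name is hypothetical):

```python
import numpy as np

def koopman_and_timescales(C00, C01, lagtime):
    """Koopman matrix T = C00^{-1} C01 and implied timescales
    t_i = -tau / ln|lambda_i| from the eigendecomposition T = Q Lambda Q^{-1}."""
    T = np.linalg.solve(C00, C01)
    lam, Q = np.linalg.eig(T)
    lam = lam[np.argsort(-np.abs(lam))]              # sort by magnitude
    timescales = -lagtime / np.log(np.abs(lam[1:]))  # skip stationary eigenvalue ~1
    return T, lam, timescales

# two-state toy model: with C00 = I, T equals the transition matrix P
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
T, lam, ts = koopman_and_timescales(np.eye(2), P, lagtime=1.0)
```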
Conditions for independent systems
For Markov independent systems, the singular values and functions that are constructed by the Kronecker product match the true global ones,
where the first two equations guarantee the orthonormality of the constructed singular functions. The latter verifies that the left and right singular functions correlate as predicted by the Kronecker product of the singular values. These conditions can be translated to the following scores:
Furthermore, using the identities Eq. (17) and the definition of the VAMPE score Eq. (13) yields
The norms denote simple means. The last score, M_{R}, is enforced during training in a pairwise fashion (cf. Eq. (8)).
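As a toy proxy for these conditions (not the exact scores of Eqs. (18) and (19)), one can check how far a global Koopman model deviates from the Kronecker product of two subsystem models; the deviation vanishes exactly for independent subsystems. The helper below is hypothetical:

```python
import numpy as np

def kronecker_deviation(K_global, K1, K2):
    """Normalized deviation of a global Koopman matrix from the Kronecker
    product of two subsystem Koopman matrices; zero for exactly
    independent Markovian subsystems."""
    K_pred = np.kron(K1, K2)
    return np.linalg.norm(K_global - K_pred) / np.linalg.norm(K_pred)

K1 = np.array([[0.9, 0.1], [0.2, 0.8]])
K2 = np.array([[0.7, 0.3], [0.4, 0.6]])
dev = kronecker_deviation(np.kron(K1, K2), K1, K2)  # exactly independent pair
```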
Network architecture
Given a global system that we want to decompose into N subsystems, and a time series of input features \(\{\mathbf{x}_t\}_{t=1,\ldots,T}\), \(\mathbf{x}_t \in \mathbb{R}^{D\times 1}\), we pass the features through a mask \(\mathbf{G} \in \mathbb{R}^{D\times N}\), which weights each input differently for each subsystem, before the result is transformed individually by the N independent state assignment functions η^{i}. It should be mentioned that the mask is introduced merely for interpretability and is not essential for finding independent subsystems. If the mask were omitted, the extraction of the relevant features would simply be transferred to the downstream neural networks, remaining hidden to the practitioner.
The weighted input is computed by an element-wise multiplication \(\bar{\mathbf{Y}}_t = \mathbf{G} \odot \mathbf{x}_t\). In order to prevent the neural networks from reversing the weighting of the mask in their subsequent layers, we draw, for each input feature i and subsystem j, an independent, normally distributed random variable \(\epsilon_{ij} \sim \mathcal{N}(0, \sigma(1 - G_{ij}))\). This noise is added to the weighted features:
Thereby, the attention weight linearly interpolates between the input feature and Gaussian noise: if the attention weight G_{ij} = 1, Y_{ij} carries exclusively the input feature x_{i}; if G_{ij} = 0, Y_{ij} is simply Gaussian noise. By tuning the noise scaling σ, a harder assignment by G can be enforced. This hyperparameter should be adjusted such that the resulting mask yields clear subsystem assignments without being binary. Subsequently, the transformed feature vector is split for each individual subsystem, \(\mathbf{Y}_t = [\mathbf{Y}_t^1, \ldots, \mathbf{Y}_t^N]\), and passed through the subsystem-specific neural network η^{i}, resulting in the feature transformations \(\boldsymbol{\chi}^i(\mathbf{x}_t) = \boldsymbol{\eta}^i(\mathbf{Y}_t^i)\). These features are then used to estimate the Koopman models.
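The masking step can be sketched in numpy (the PyTorch implementation in the repository may differ in details; here σ scales the noise standard deviation, and the function name is illustrative):

```python
import numpy as np

def mask_inputs(x, G, sigma, rng):
    """Weight each input feature per subsystem and add Gaussian noise whose
    scale grows as the attention weight shrinks: for G_ij = 1 the feature
    passes through unchanged, for G_ij = 0 only noise remains.
    x: (batch, D) features, G: (D, N) mask -> (batch, D, N) masked inputs."""
    Y = x[:, :, None] * G                                  # Y_t = G ⊙ x_t per subsystem
    eps = rng.standard_normal(Y.shape) * sigma * (1.0 - G) # eps_ij ~ N(0, sigma(1 - G_ij))
    return Y + eps

rng = np.random.default_rng(2)
x = np.ones((5, 4))
G = np.ones((4, 2))          # full attention: no noise should be injected
Y = mask_inputs(x, G, 0.5, rng)
```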
The training framework and neural network architecture were implemented in the Python 3 programming language using NumPy^{84} and PyTorch^{69}; benchmark system data were generated using deeptime^{70}; data visualization was performed using matplotlib^{85} and VMD^{24}.
Constructing the mask
To train an interpretable mask, we use the following three premises:

1. A single subsystem should not focus on all input features.
2. Different subsystems compete for high weights for the same feature.
3. All weights should be in the range [0, 1] and the matrix should be sparse.
Therefore, the mask is constructed from trainable weights \(\mathbf{g} \in \mathbb{R}^{D\times N}\), which are first processed by a softmax function that normalizes along the input feature axis, \(\mathbf{g}_1 = \mathrm{softmax}(\mathbf{g}, \dim=0)\). Thereby, if a subsystem focuses on one part of the features, lower weights for the other parts are expected, in line with the first premise.
In the next step, weights lower than a threshold θ are clipped to zero, g_{2} = relu(g_{1} − θ), to guarantee sparsity. The threshold θ is a hyperparameter that can be optimized by starting with a comparably small value (i.e., very little cutoff) and subsequently increasing it without further training; a reasonable cutoff does not alter the results, as the downstream neural networks still obtain all relevant information.
Since some input features could be negligible for all subsystems, a dummy system is added which has a constant value \(\mathbf{c} \in \mathbb{R}^{D\times 1}\) for all features, g_{3} = [g_{2}, c]. Subsequently, the weights of all subsystems and the dummy system are normalized for each feature, \(\mathbf{g}_4 = \mathbf{g}_3 / \mathrm{sum}(\mathbf{g}_3, \dim=1)\), which together with the clipping fulfills premises two and three.
Finally, the mask \(\mathbf{G}\) is obtained by truncating the dummy system, \(\mathbf{g}_4 = [\mathbf{G}, \bar{\mathbf{c}}]\). Note that only g_{4} is normalized along the system axis.
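The full construction can be summarized in a short numpy sketch (`build_mask` and the dummy constant `c_val` are illustrative names, not taken from the repository):

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def build_mask(g, theta, c_val=0.05):
    """Raw trainable weights g (D, N) -> sparse mask G (D, N) in [0, 1]:
    softmax along the feature axis, clip below theta, append a constant
    dummy system, renormalize across systems, drop the dummy column."""
    g1 = softmax(g, axis=0)                  # premise 1: features compete within a subsystem
    g2 = np.maximum(g1 - theta, 0.0)         # clipping enforces sparsity
    c = np.full((g.shape[0], 1), c_val)      # dummy system for negligible features
    g3 = np.concatenate([g2, c], axis=1)
    g4 = g3 / g3.sum(axis=1, keepdims=True)  # premises 2+3: systems compete per feature
    return g4[:, :-1]                        # truncate the dummy system

rng = np.random.default_rng(3)
G = build_mask(rng.standard_normal((6, 2)), theta=0.05)
```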
Application to protein dynamics
Since for proteins the final model is often expected to be invariant with respect to rotations and translations, internal coordinates are employed as input features. For Markov state modeling, the minimal heavy-atom distance d_{ij} between residues i and j has proven to be a good descriptor^{56,86}. However, for interpretability, mask weights for each residue are preferable. Therefore, the mask is of size \(\mathbf{G} \in \mathbb{R}^{R\times N}\), with R the number of residues. The input features are then scaled as \(x_{ij} = G_i G_j \exp(-d_{ij})\).
Furthermore, a smoothing routine is implemented such that neighboring residues along the chain have similar importance weights. W windows of size B are placed along the chain with step size s. Each window has a trainable weight \(\mathbf{g} \in \mathbb{R}^{W\times N}\). Consequently, the softmax function is taken along the window axis, \(\bar{\mathbf{g}} = \mathrm{softmax}(\mathbf{g}, \dim=0)\). However, before applying the clipping as before, the weight for each residue, \(\mathbf{g}_1 \in \mathbb{R}^{R\times N}\), is calculated as the product of the weights of all windows the residue is part of (Fig. 6).
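Both the residue-wise feature scaling and the window-based smoothing can be sketched as follows. The sign in exp(−d_{ij}) and the window bookkeeping reflect our reading of the description above; the function names are hypothetical:

```python
import numpy as np

def scaled_distance_features(d, G_i):
    """Scale residue-residue minimal-distance features by per-residue mask
    weights of one subsystem: x_ij = G_i * G_j * exp(-d_ij)
    (contact-map-style transform, assumed from the text)."""
    return np.outer(G_i, G_i) * np.exp(-d)

def window_weights(g, R, B, s):
    """Per-residue weight as the product of the weights of all windows
    covering that residue; g: (W,) normalized window weights for one
    subsystem, windows of size B placed along the chain with step size s."""
    w = np.ones(R)
    for k, gk in enumerate(g):
        w[k * s : k * s + B] *= gk
    return w

w = window_weights(np.full(4, 0.5), R=5, B=2, s=1)  # residue 1 lies in two windows
x = scaled_distance_features(np.zeros((3, 3)), np.ones(3))
```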
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The benchmark data can be generated from the Jupyter notebooks that have been deposited on GitHub under https://github.com/markovmodel/ivampnets^{87}. The molecular dynamics data set of synaptotagmin C2A has been deposited in Zenodo under https://zenodo.org/record/6908073^{88}. The crystal structure of synaptotagmin C2A is available under PDB ID 2R83 [https://doi.org/10.2210/pdb2R83/pdb]. The villin headpiece folding data are available under restricted access and were used under license for this study as courtesy of D. E. Shaw Research^{76}; access can be obtained from the authors upon request.
Code availability
The code that implements the presented models and reproduces the presented results has been deposited on GitHub under https://github.com/markovmodel/ivampnets^{87}.
References
Phillips, J. C. et al. Scalable molecular dynamics on CPU and GPU architectures with NAMD. J. Chem. Phys. 153, 044130 (2020).
Vant, J. W. et al. Protein Structure Prediction 301–315 (Springer, 2020).
Buch, I., Harvey, M. J., Giorgino, T., Anderson, D. P. & De Fabritiis, G. High-throughput all-atom molecular dynamics simulations using distributed computing. J. Chem. Inform. Modeling 50, 397–403 (2010).
Eastman, P. et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 13, e1005659 (2017).
Salomon-Ferrer, R., Götz, A. W., Poole, D., Le Grand, S. & Walker, R. C. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theory Comput. 9, 3878–3888 (2013).
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1, 19–25 (2015).
Bussi, G., Laio, A. & Tiwary, P. Metadynamics: A Unified Framework for Accelerating Rare Events and Sampling Thermodynamics and Kinetics. In Handbook of Materials Modeling (eds Andreoni, W. & Yip, S.) 565–595 (Springer International Publishing, 2020).
Tsai, S.-T., Smith, Z. & Tiwary, P. SGOOP-d: Estimating kinetic distances and reaction coordinate dimensionality for rare event systems from biased/unbiased simulations. J. Chem. Theory Comput. 17, 6757–6765 (2021).
Liu, C., Brini, E., Perez, A. & Dill, K. A. Computing ligands bound to proteins using MELD-accelerated MD. J. Chem. Theory Comput. 16, 6377–6382 (2020).
MacCallum, J. L., Perez, A. & Dill, K. A. Determining protein structures by combining semi-reliable data with atomistic physical models by Bayesian inference. Proc. Natl. Acad. Sci. USA 112, 6985–6990 (2015).
Perez, A., MacCallum, J. L. & Dill, K. A. Accelerating molecular simulations of proteins using Bayesian inference on weak information. Proc. Natl Acad. Sci. USA 112, 11846–11851 (2015).
Ge, Y. & Voelz, V. A. Estimation of binding rates and affinities from multiensemble Markov models and ligand decoupling. J. Chem. Phys. 156, 134115 (2022).
Ribeiro, J. M. L., Bravo, P., Wang, Y. & Tiwary, P. Reweighted autoencoded variational Bayes for enhanced sampling (RAVE). J. Chem. Phys. 149, 072301 (2018).
Schütte, C., Fischer, A., Huisinga, W. & Deuflhard, P. A direct approach to conformational dynamics based on hybrid Monte Carlo. J. Comput. Phys. 151, 146–168 (1999).
Prinz, J.-H. et al. Markov models of molecular kinetics: Generation and validation. J. Chem. Phys. 134, 174105 (2011).
Swope, W. C., Pitera, J. W. & Suits, F. Describing protein folding kinetics by molecular dynamics simulations: 1. Theory. J. Phys. Chem. B 108, 6571–6581 (2004).
Noé, F., Horenko, I., Schütte, C. & Smith, J. C. Hierarchical analysis of conformational dynamics in biomolecules: Transition networks of metastable states. J. Chem. Phys. 126, 155102 (2007).
Chodera, J. D., Singhal, N., Pande, V. S., Dill, K. A. & Swope, W. C. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. J. Chem. Phys. 126, 155101 (2007).
Buchete, N. V. & Hummer, G. Coarse master equations for peptide folding dynamics. J. Phys. Chem. B 112, 6057–6069 (2008).
Wan, H. & Voelz, V. A. Adaptive Markov state model estimation using short reseeding trajectories. J. Chem. Phys. 152, 024103 (2020).
Scherer, M. K. et al. PyEMMA 2: A software package for estimation, validation, and analysis of Markov models. J. Chem. Theory Comput. 11, 5525–5542 (2015).
Harrigan, M. P. et al. Msmbuilder: Statistical models for biomolecular dynamics. Biophys J. 112, 10–15 (2017).
McGibbon, R. T. et al. Mdtraj: A modern open library for the analysis of molecular dynamics trajectories. Biophys J. 109, 1528–1532 (2015).
Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynamics. J. Molec. Graphics 14, 33–38 (1996).
Pérez-Hernández, G., Paul, F., Giorgino, T., De Fabritiis, G. & Noé, F. Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 139, 015102 (2013).
Ziehe, A. & Müller, K.-R. TDSEP—an efficient algorithm for blind separation using time structure. In ICANN 98, 675–680 (Springer Science and Business Media, 1998).
Mezić, I. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynam. 41, 309–325 (2005).
Schmid, P. J. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 656, 5–28 (2010).
Tu, J. H., Rowley, C. W., Luchtenburg, D. M., Brunton, S. L. & Kutz, J. N. On dynamic mode decomposition: Theory and applications. J. Comput. Dyn. 1, 391–421 (2014).
Noé, F. & Clementi, C. Collective variables for the study of longtime kinetics from molecular trajectories: theory and methods. Curr. Opin. Struc. Biol. 43, 141–147 (2017).
Klus, S. et al. Datadriven model reduction and transfer operator approximation. J. Nonlinear Sci. 28, 985–1010 (2018).
Bowman, G. R., Pande, V. S. & Noé, F. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation (Springer, 2014).
Husic, B. E. & Pande, V. S. Ward clustering improves crossvalidated Markov state models of protein folding. J. Chem. Theo. Comp. 13, 963–967 (2017).
Sheong, F. K., Silva, D.-A., Meng, L., Zhao, Y. & Huang, X. Automatic state partitioning for multibody systems (APM): An efficient algorithm for constructing Markov state models to elucidate conformational dynamics of multibody systems. J. Chem. Theory Comput. 11, 17–27 (2015).
Weber, M., Fackeldey, K. & Schütte, C. Setfree Markov state model building. J. Chem. Phys. 146, 124133 (2017).
Bowman, G. R., Beauchamp, K. A., Boxer, G. & Pande, V. S. Progress and challenges in the automated construction of Markov state models for full protein systems. J. Chem. Phys. 131, 124101 (2009).
TrendelkampSchroer, B., Wu, H., Paul, F. & Noé, F. Estimation and uncertainty of reversible Markov models. J. Chem. Phys. 143, 174101 (2015).
Kube, S. & Weber, M. A coarse graining method for the identification of transition rates between molecular conformations. J. Chem. Phys. 126, 024103 (2007).
Yao, Y. et al. Hierarchical Nyström methods for constructing Markov state models for conformational dynamics. J. Chem. Phys. 138, 174106 (2013).
Fackeldey, K. & Weber, M. GenPCCA – Markov state models for non-equilibrium steady states. WIAS Report 29, 70–80 (2017).
Gerber, S. & Horenko, I. Toward a direct and scalable identification of reduced models for categorical processes. Proc. Natl. Acad. Sci. USA 114, 4863–4868 (2017).
Hummer, G. & Szabo, A. Optimal dimensionality reduction of multistate kinetic and Markovstate models. J. Phys. Chem. B 119, 9029–9037 (2015).
Orioli, S. & Faccioli, P. Dimensional reduction of Markov state models from renormalization group theory. J. Chem. Phys. 145, 124120 (2016).
Noé, F., Wu, H., Prinz, J.-H. & Plattner, N. Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. J. Chem. Phys. 139, 184114 (2013).
Sengupta, U., Carballo-Pacheco, M. & Strodel, B. Automated Markov state models for molecular dynamics simulations of aggregation and self-assembly. J. Chem. Phys. 150, 115101 (2019).
Carballo-Pacheco, M. & Strodel, B. Advances in the simulation of protein aggregation at the atomistic scale. J. Phys. Chem. B 120, 2991–2999 (2016).
Qiao, Q., Bowman, G. R. & Huang, X. Dynamics of an intrinsically disordered protein reveal metastable conformations that potentially seed aggregation. J. Am. Chem. Soc. 135, 16092–16101 (2013).
Silva, D.-A., Bowman, G. R., Sosa-Peinado, A. & Huang, X. A role for both conformational selection and induced fit in ligand binding by the LAO protein. PLoS Comput. Biol. 7, e1002054 (2011).
Sengupta, U. & Strodel, B. Markov models for the elucidation of allosteric regulation. Philos. Trans. R. Soc. B: Biol. Sci. 373, 20170178 (2018).
Plattner, N. & Noé, F. Protein conformational plasticity and complex ligandbinding kinetics explored by atomistic simulations and Markov models. Nat. Commun. 6, 7653 (2015).
Baiz, C. R. et al. A molecular interpretation of 2D IR protein folding experiments with Markov state models. Biophys. J. 106, 1359–1370 (2014).
Olsson, S., Wu, H., Paul, F., Clementi, C. & Noé, F. Combining experimental and simulation data of molecular processes via augmented Markov models. Proc. Natl Acad. Sci. USA 114, 8265–8270 (2017).
Noé, F. & Nüske, F. A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Model. Simul. 11, 635–655 (2013).
McGibbon, R. T. & Pande, V. S. Variational crossvalidation of slow dynamical modes in molecular kinetics. J. Chem. Phys. 142, 124105 (2015).
Wu, H. & Noé, F. Variational approach for learning Markov processes from time series data. J Nonlinear Sci. 30, 23–66 (2020).
Mardt, A., Pasquali, L., Wu, H. & Noé, F. VAMPnets: Deep learning of molecular kinetics. Nat. Commun. 9, 5 (2018).
Chen, W., Sidky, H. & Ferguson, A. L. Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets. J. Chem. Phys. 150, 214114 (2019).
Bonati, L., Piccini, G. & Parrinello, M. Deep learning the slow modes for rare events sampling. Proc. Natl Acad. Sci. USA 118, e2113533118 (2021).
Mardt, A., Pasquali, L., Noé, F. & Wu, H. Deep learning Markov and Koopman models with physical constraints. In Mathematical and Scientific Machine Learning 451–475 (PMLR, 2020).
Wu, H., Mardt, A., Pasquali, L. & Noé, F. Deep generative Markov state models. In Advances in Neural Information Processing Systems 3975–3984 (2018).
Mardt, A. & Noé, F. Progress in deep Markov state modeling: Coarse graining and experimental data restraints. J. Chem. Phys. 155, 214106 (2021).
Konovalov, K. A., Unarta, I. C., Cao, S., Goonetilleke, E. C. & Huang, X. Markov state models to study the functional dynamics of proteins in the wake of machine learning. JACS Au 1, 1330–1341 (2021).
Olsson, S. & Noé, F. Dynamic graphical models of molecular kinetics. Proc. Natl Acad. Sci. 116, 15001–15006 (2019).
Hempel, T. et al. Independent Markov decomposition: Toward modeling kinetics of biomolecular complexes. Proc. Natl Acad. Sci. USA 118, e2105230118 (2021).
Koopman, B. O. Hamiltonian systems and transformations in Hilbert space. Proc. Natl. Acad. Sci. USA 17, 315–318 (1931).
Wehmeyer, C. et al. Introduction to Markov state modeling with the PyEMMA software [Article v1.0]. LiveCoMS 1, 5965 (2018).
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet – A deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research (Meila, M. & Zhang, T.) 9377–9388 (PMLR, 2021).
Paszke, A. et al. Pytorch: An imperative style, highperformance deep learning library. In Advances in Neural Information Processing Systems 8026–8037 (2019).
Hoffmann, M. et al. Deeptime: A Python library for machine learning dynamical models from time series data. Mach. Learn.: Sci. Technol. 3, 015009 (2022).
Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989).
Hempel, T., Plattner, N. & Noé, F. Coupling of conformational switches in calcium sensor unraveled with local Markov models and transfer entropy. J. Chem. Theory Comput. 16, 2584–2593 (2020).
Südhof, T. C. Neurotransmitter release: The last millisecond in the life of a synaptic vesicle. Neuron 80, 675–690 (2013).
Jiménez, J. L. et al. Functional recycling of C2 domains throughout evolution: A comparative study of synaptotagmin, protein kinase C and phospholipase C by sequence, structural and modelling approaches. J. Mol. Biol. 333, 621–639 (2003).
Guillén, J. et al. Structural insights into the Ca2+ and PI(4,5)P2 binding modes of the C2 domains of rabphilin 3A and synaptotagmin 1. Proc. Natl Acad. Sci. USA 110, 20503–20508 (2013).
Lindorff-Larsen, K., Piana, S., Dror, R. O. & Shaw, D. E. How fast-folding proteins fold. Science 334, 517–520 (2011).
Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
Rudy, Y. & Silva, J. R. Computational biology in the study of cardiac ion channels and cell electrophysiology. Q. Rev. Biophys. 39, 57–116 (2006).
Bykhovskaia, M. Calcium binding promotes conformational flexibility of the neuronal Ca2+ sensor synaptotagmin. Biophys. J. 108, 2507–2520 (2015).
Tran, H. T., Anderson, L. H. & Knight, J. D. Membranebinding cooperativity and coinsertion by C2AB tandem domains of synaptotagmins 1 and 7. Biophys. J. 116, 1025–1036 (2019).
Ghorbani, M., Prasad, S., Klauda, J. B. & Brooks, B. R. GraphVAMPNet, using graph neural networks and variational approach to Markov processes for dynamical modeling of biomolecules. J. Chem. Phys. 156, 184103 (2022).
Ledoit, O. & Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88, 365–411 (2004).
Chen, Y., Wiesel, A., Eldar, Y. C. & Hero, A. O. Shrinkage algorithms for MMSE covariance estimation. IEEE Trans. Signal Process. 58, 5016–5029 (2010).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Scherer, M. K. et al. Variational selection of features for molecular kinetics. J. Chem. Phys. 150, 194108 (2019).
Mardt, A., Hempel, T., Clementi, C. & Noé, F. Deep learning to decompose macromolecules into independent Markovian domains. Zenodo, https://github.com/markovmodel/ivampnets, https://doi.org/10.5281/ZENODO.7215890 (2022).
Hempel, T., Plattner, N. & Noé, F. Molecular dynamics dataset of Synaptotagmin-1. Zenodo, https://doi.org/10.5281/ZENODO.6908073 (2022).
Wolfram Research, Inc. Mathematica, Version 11.2.0, https://www.wolfram.com/mathematica (2017).
Hagberg, A. A., Schult, D. A., & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference 11–15 Pasadena, CA, USA (2008).
Acknowledgements
We acknowledge financial support from Deutsche Forschungsgemeinschaft DFG (SFB/TRR 186, project A12 to T.H., F.N., C.C.; SFB 958, project A04 to A.M., F.N.; SFB 1114, projects C03 to F.N., A04 to F.N., C.C., B03 to C.C.; SFB 1078, project C7 to C.C.; and RTG 2433 to F.N., C.C.), the European Commission (ERC CoG 772230 “ScaleCell” to F.N.), the Berlin Mathematics center MATH+ (AA16 and AA110 to F.N., C.C.), the BMBF (Research center BIFOLD to F.N.), the National Science Foundation (CHE1900374, and PHY2019745 to C.C.), the Welch Foundation (C1570 to C.C.), and the Einstein Foundation Berlin (project 0420815101 to C.C.). We further thank Manuel Dibak and Moritz Hoffmann (FU Berlin) for fruitful discussions.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
A.M. and T.H. performed research (A.M. derived loss functions and implemented deep learning framework; T.H. designed method and developed test systems); A.M. and T.H. analyzed data; A.M., T.H., C.C., and F.N. designed research; A.M., T.H., C.C., and F.N. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Birgit Strodel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mardt, A., Hempel, T., Clementi, C. et al. Deep learning to decompose macromolecules into independent Markovian domains. Nat. Commun. 13, 7101 (2022). https://doi.org/10.1038/s41467-022-34603-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-022-34603-z