Abstract
The increasing interest in modeling the dynamics of ever larger proteins has revealed a fundamental problem with models that describe the molecular system as being in a global configuration state. This notion limits our ability to gather sufficient statistics of state probabilities or state-to-state transitions because, for large molecular systems, the number of metastable states grows exponentially with size. In this manuscript, we approach this challenge by introducing a method that combines our recent progress on independent Markov decomposition (IMD) with VAMPnets, a deep learning approach to Markov modeling. We establish a training objective that quantifies how well a given decomposition of the molecular system into independent subdomains with Markovian dynamics approximates the overall dynamics. By constructing an end-to-end learning framework, the decomposition into such subdomains and their individual Markov state models are simultaneously learned, providing a data-efficient and easily interpretable summary of the complex system dynamics. While learning the dynamical coupling between Markovian subdomains is still an open issue, the present results are a significant step towards learning Ising models of large molecular complexes from simulation data.
Introduction
The understanding of protein function is often interlinked with understanding protein dynamics. Molecular dynamics (MD) simulations are a valuable tool to study these dynamics on an atomistic level^{1,2,3,4,5,6}. However, further methods are necessary to extract the statistically relevant information and to help overcome the discrepancy between feasible simulation lengths and the timescales of relevant processes. A common approach to enhance the sampling of a specific process of interest is to bias the simulation along a reaction coordinate aligned with the process^{7,8,9,10,11,12,13}. In comparison, the Markov modeling approach^{14,15,16,17,18,19,20} extracts kinetic information and tackles the sampling problem without requiring the definition of a few predefined reaction coordinates, by combining arbitrary numbers of short, unbiased, distributed simulations to model the long-timescale behavior of target systems. Consequently, multiple software packages^{21,22} have been developed over the last decade providing assistance in estimating these models. They often include a pipeline for feature selection^{21,22,23,24}, dimension reduction^{25,26,27,28,29,30,31}, clustering^{32,33,34,35}, transition matrix estimation^{15,19,36,37}, and coarse-graining^{38,39,40,41,42,43,44}. Markov state models (MSMs) have been applied to a wide range of molecular biology problems such as protein aggregation^{45,46,47} or ligand binding^{48,49,50} and can be a valuable tool to understand experimental data on the atomistic scale^{51,52}.
The necessity to assess a model's performance and thereby rank its quality encouraged the development of variational methods^{53,54}, in particular the variational approach for Markov processes (VAMP)^{55}. This variational formulation has allowed us to replace the aforementioned pipeline with an end-to-end deep learning framework called VAMPnet^{56}, which simultaneously learns a dimension reduction of the molecular system to the collective variables best describing the rare-event processes and an MSM on these variables. The framework can be used to further drive MD simulations along these learned collective variables^{57,58}. We can also use this framework to estimate statistically reversible MSMs and incorporate constraints from experimental observables^{59,60,61}.
Despite these developments, there is a fundamental scaling problem in describing MD in terms of transitions between global system states: While the assignment of MD configurations to discrete global states representing the metastable groups of structures is an excellent model for small cooperative molecular systems, such as small to medium-sized proteins, larger molecular systems (e.g., proteins with hundreds of amino acids) have an increasing number of subsystems whose dynamics are (nearly) independent^{62} (Fig. 1). Consider, for example, a solution of N proteins that undergo transitions between their open and closed states independently while dissociated, and whose transitions only (partially) couple when the proteins are associated with each other. The number of global system states is 2^{N}, i.e., it grows exponentially with the number of subsystems N^{63,64}. This means that any form of simulation or analysis that explicitly distinguishes global system states will not scale to large molecular systems.
At the same time, the (approximate) independence between subsystems is also key to the solution of the problem. A scalable solution needs to address two separate issues: (a) dividing the protein system into approximately Markovian subsystems and (b) learning the coupling between them. Olsson & Noé^{63} made a first attempt at (b) by learning a dynamic graphical model between predefined subsystems. This approach leads to a graphical model, or Markov random field, resembling Ising or Potts models in physics, with the key difference that both the definition of the individual subsystems or spins and their transition dynamics need to be learned. In contrast, Hempel et al.^{64} proposed a solution for (a) by approximating the global system dynamics as a set of independent (uncoupled) Markov models (termed independent Markov decomposition, IMD). They furthermore proposed a pairwise independence score of features, which allows the detection of nearly uncoupled regions in which independent Markov state models can subsequently be estimated.
In this manuscript, we present a joint IMD and VAMP approach (termed independent VAMPnet, or iVAMPnet for short) that significantly advances our ability to identify approximately independent Markovian subsystems (issue (a)) by generalizing IMD to neural network basis functions. iVAMPnets are an integrated end-to-end learning approach that decomposes the macromolecular structure into subsystems that are dynamically weakly coupled and estimates a VAMPnet for each of these subsystems to promote a comprehensible analysis of the subsystem dynamics (Fig. 1). In comparison to previous implementations of IMD, our approach learns an optimal decomposition into independent subsystems and can find collective variables that are nonlinear combinations of the input features.
Results
Markov state models and Koopman models
Markovian dynamics can be modeled by the transition density:
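In standard notation (a reconstruction consistent with the description that follows), the transition density reads:

```latex
p_\tau(\mathbf{x}, \mathbf{y}) = \mathbb{P}\left[\mathbf{x}_{t+\tau} = \mathbf{y} \,\middle|\, \mathbf{x}_t = \mathbf{x}\right],
```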
which is the probability density to observe configuration y at time t + τ given that the system was in configuration x at time t. Based on the transition density we can characterize the time evolution of a probability density χ as:
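Written out, this propagation is the standard Chapman-Kolmogorov integral (notation as above):

```latex
\chi_{t+\tau}(\mathbf{y}) = \int p_\tau(\mathbf{x}, \mathbf{y})\, \chi_t(\mathbf{x})\, \mathrm{d}\mathbf{x}.
```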
By discretizing the molecular state space in a suitable way and defining a transition matrix T between discrete states, we can linearize this equation as:
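A standard matrix form, consistent with the element-wise description below, treats χ_t as a row vector of state probabilities:

```latex
\boldsymbol{\chi}_{t+\tau}^{\top} = \boldsymbol{\chi}_{t}^{\top}\, \mathbf{T}_{\tau}.
```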
This is the equation of a Markov state model, where element i of the vector χ_{t+τ}(y) is the probability to be in discrete state i at time t + τ. Furthermore, the transition matrix elements \((\mathbf{T}_\tau)_{ij}\) describe the probability of jumping to state j given state i within a time τ. In the case of fuzzy state assignments, e.g., as with VAMPnets, Eq. (3) describes the more general Koopman model^{65} and T_{τ} becomes the Koopman matrix. This means that probability densities are still propagated but the matrix elements cannot be interpreted as transition probabilities.
The lag time τ is common to all Markovian models and is usually chosen with the aid of an implied timescales test^{66}. If τ is chosen too small, the resulting model is not a valid Markov model (yielding errors in the predicted quantities); if it is chosen too large, the model unnecessarily discards kinetic information. We therefore usually choose the smallest lag time above which the implied timescales are approximately constant.
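As an illustration of the implied timescales test, the following sketch (not the publication's code) computes t_i = −τ/ln|λ_i| for a hypothetical two-state model; for exactly Markovian dynamics the timescales are independent of the lag time:

```python
import numpy as np

def implied_timescales(T, tau):
    """Implied timescales t_i = -tau / ln|lambda_i| from the non-trivial
    eigenvalues of a transition matrix T estimated at lag time tau."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -tau / np.log(eigvals[1:])  # skip the stationary eigenvalue 1

# Toy 2-state transition matrices estimated at lag times 1 and 2; since
# T2 = T1 @ T1, the process is Markovian and the timescale stays constant.
T1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
T2 = T1 @ T1
its1 = implied_timescales(T1, tau=1)
its2 = implied_timescales(T2, tau=2)
assert np.allclose(its1, its2)  # constant implied timescale => Markovian
```

In practice one plots the implied timescales against τ and picks the smallest lag time where the curves level off.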
We now seek to find a state assignment χ and model matrix T that satisfy Eq. (3) and also succeed in predicting the long-time behavior, i.e., for multiples of the lag time τ. Formally, χ are (initially unknown) basis functions, i.e., we assume that the relevant dynamic features can be expressed as a linear combination of them. VAMP^{55} tells us that an optimal solution is reached when χ can span the left singular functions \((\psi_1, \ldots, \psi_k)^T\) and right singular functions \((\phi_1, \ldots, \phi_k)^T\) of the transition operator. They can be found by maximizing the singular values of a matrix that can be estimated from simulation data (see Eqs. (9)–(13) in "Methods"). In the case of a VAMPnet^{56}, deep neural networks are trained by maximizing the VAMP score so as to represent optimal fuzzy state assignments. In equilibrium, the singular functions correspond to the eigenfunctions of the Markov state model and the singular values to its eigenvalues. As the Koopman model still propagates densities, it is instructive to inspect the eigenfunctions and implied timescales of T, since they describe the slow dynamics of a given system.
iVAMPnets and iVAMPscore
To implement iVAMPnets, we need to bridge the gap between the deep neural networks of VAMPnets and the spatial decomposition of independent Markov models. The general idea is to set up multiple parallel VAMPnets, each modeling the Markovian dynamics of a separate, independent subsystem of the molecule, together with an attention mechanism that identifies these subsystems. Thus, each independent VAMPnet should only receive the time-dependent molecular geometry features representing its specific subsystem. For example, such an attention mechanism could separate different protein domains and channel the data of individual domains to separate VAMPnets. We, therefore, develop an architecture that combines a meaningful attention mechanism and parallel VAMPnets and trains them with a loss function that simultaneously promotes dynamic independence between the subsystems and slow kinetics within each subsystem (Fig. 2). iVAMPnets are designed to optimize both these objectives simultaneously.
In practice, we extract all time-lagged data pairs x_{t}, x_{t+τ} containing all molecular geometry features (e.g., distances, contacts, torsions) of our simulation data and pass them through the architecture presented in Fig. 2. The data is fed through an attention mechanism (represented by the matrix G) that yields subsystem-specific vectors \(\mathbf{Y}_t^i\), each of which attends to the features relevant for subsystem i. These vectors then serve as inputs to N parallel feature transformations η^{1}, …, η^{N} (parallel VAMPnets) that transform them into output features χ^{1}, …, χ^{N} (with \(\boldsymbol{\chi}^i(\mathbf{x}_t) = \boldsymbol{\eta}^i(\mathbf{Y}_t^i(\mathbf{x}_t))\)), which represent slow collective coordinates or, directly, fuzzy assignments to metastable Markov states of each molecular subsystem. Equipped with the state assignments, we can compute correlation matrices (Eq. (9)) and derive a Koopman model matrix from them (Eq. (10)). As in VAMPnets, the feature transformations η^{1}, …, η^{N} are represented by deep neural networks. In the present study, we use multilayer perceptrons with a SoftMax output layer representing fuzzy state assignments. However, other architectures could be chosen, e.g., graph convolutional networks when parameter sharing is desired^{67,68}, and a linear output layer could be used if the aim is to represent slow collective variables rather than discrete states^{57,58}. The parameters of the feature transformations η and of the attention matrix are learned end-to-end via backpropagation.
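The data flow just described can be sketched in a few lines of numpy (a toy forward pass only; the matrix G, the weights W, and all dimensions are hypothetical placeholders, whereas the actual model consists of trainable PyTorch modules):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable SoftMax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n_features, n_subsystems, n_states = 4, 2, 3
# Attention matrix G: each input feature distributes its weight over subsystems.
G = softmax(rng.normal(size=(n_features, n_subsystems)), axis=1)
# Toy per-subsystem "VAMPnet": a single linear layer + SoftMax state assignment.
W = [rng.normal(size=(n_features, n_states)) for _ in range(n_subsystems)]

def forward(x_t):
    """Weight the input by the subsystem attention, then map each weighted
    input Y^i to fuzzy state assignments chi^i via SoftMax."""
    chi = []
    for i in range(n_subsystems):
        Y_i = x_t * G[:, i]              # subsystem-specific input vector
        chi.append(softmax(Y_i @ W[i]))  # fuzzy state assignment chi^i
    return chi

chi = forward(rng.normal(size=n_features))
assert all(np.isclose(c.sum(), 1.0) for c in chi)  # assignments sum to one
```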
In more detail, given N individual subsystem models, the global system state can be given by the Kronecker product of all subsystem states:
and by computing the global correlation matrices \(\mathbf{C}_{00}^G, \mathbf{C}_{0\tau}^G, \mathbf{C}_{\tau\tau}^G\) from Eq. (9) using χ^{G}. We note that this step does not require independent Markovian models; it is simply a formalism to express global states in terms of combinations of local states.
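For example, the Kronecker construction of a global fuzzy state vector from two subsystem assignments (hypothetical numbers) looks as follows:

```python
import numpy as np

# Fuzzy state vectors of two subsystems (2 and 3 states, each summing to 1).
chi1 = np.array([0.7, 0.3])
chi2 = np.array([0.2, 0.5, 0.3])

# Global fuzzy state vector as the Kronecker product of subsystem states.
chi_G = np.kron(chi1, chi2)
assert chi_G.shape == (6,)              # 2 x 3 = 6 global states
assert np.isclose(chi_G.sum(), 1.0)     # still a valid fuzzy assignment
```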
Furthermore, we construct a candidate for the global Koopman model from the subsystem models by combining the individual singular values and vectors with a Kronecker product^{64}:
The matrices \(\hat{\mathbf{U}}^G\) and \(\hat{\mathbf{V}}^G\) map the global state assignments onto the constructed singular functions and are computed from the local matrices as defined in Eqs. (11) and (12). The diagonal matrix \(\hat{\mathbf{K}}^G\) encodes the singular values and is computed from the subsystem singular value matrices via Eq. (10).
In order to evaluate how well the constructed model predicts the dynamics in the global state space, the VAMP-E validation score^{55} can be used,
The VAMP-E score measures the difference between the estimated Koopman model and the true dynamics. Here, it is evaluated for the global state assignments ⨂_{i} χ^{i} (as encoded in \(\mathbf{C}_{00}^G, \mathbf{C}_{0\tau}^G, \mathbf{C}_{\tau\tau}^G\)) mapped onto the constructed singular functions (as encoded in \(\hat{\mathbf{U}}^G, \hat{\mathbf{V}}^G\)). If the subsystems are independent, the constructed singular functions are optimal and the singular values of the global system are indeed the products of the singular values of the subsystems (as formalized in Conditions for independent systems; also see Supplementary Note 1). In this case, the global VAMP-E score Eq. (6) has a product form
that poses a necessary condition for subsystem independence.
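The VAMP-E score in the trace form given in ref. 55, R_E[U, S, V] = tr[2 S U^T C_{0τ} V − S U^T C_{00} U S V^T C_{ττ} V], can be sketched as follows (a minimal implementation, not the publication's code):

```python
import numpy as np

def vampe_score(U, S, V, C00, C0t, Ctt):
    """VAMP-E score tr[2 S U^T C0t V - S U^T C00 U S V^T Ctt V] for a model
    with singular-function coefficients U, V and singular value matrix S."""
    correlation_term = S @ U.T @ C0t @ V
    normalization_term = S @ U.T @ C00 @ U @ S @ V.T @ Ctt @ V
    return np.trace(2.0 * correlation_term - normalization_term)

# With whitened data (C00 = Ctt = I) and an exact model (U = V = I,
# S = diagonal matrix of singular values), the score equals the sum of
# squared singular values.
S = np.diag([0.9, 0.5])
score = vampe_score(np.eye(2), S, np.eye(2), np.eye(2), S, np.eye(2))
assert np.isclose(score, 0.9**2 + 0.5**2)
```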
To finally train the model, we develop a loss function that (i) maximizes the global VAMP-E score, assuming that the subsystems describe independent dynamics (Eqs. (4)–(6)), and (ii) minimizes a term that penalizes statistical dependence between these subsystems (Eq. (7)), scaled by a weighting factor ξ.
We evaluate the scores only pairwise, to escape the growth of the global state space, and sum over all possible pairs i, j:
Here, \(\mathcal{R}_E^{ij}\) measures the quality of the constructed Koopman model of subsystems i and j and is computed using Eq. (6). The weighting factor ξ is a hyperparameter that should be chosen large enough to find decoupled systems and small enough not to interfere with the subsystem dynamics. Even though the choice of an appropriate ξ depends on the nature of the dynamics and the coupling, it is directly related to the training procedure as it, briefly, balances the optimizer's focus between kinetics and decoupling. Further conditions (Eq. (18)), which evaluate the independence of the singular functions and values, can be used as post-training validation metrics for adjusting ξ and for testing to which degree dynamically independent subsystems were found.
Benchmark model with two independent subsystems
The iVAMPnet architecture, which is implemented using PyTorch^{69}, is depicted in Fig. 2. Generally, various neural network architectures are possible; here we choose fully connected feed-forward neural networks with up to 5 hidden layers of 100 nodes each. The scripts to reproduce the results, including the details of the training routine, the choice of hyperparameters, and the network architecture, can be found in our GitHub repository. We note that an implementation of VAMPnets is available in the current version of DeepTime^{70}.
We first demonstrate that iVAMPnets are capable of decomposing a dynamical system into its independent Markovian subsystems based on observed trajectory data using an exactly decomposable benchmark model (Fig. 3).
Akin to the protein illustrated in Fig. 1, we define a system that consists of two independent subsystems with two and three states, respectively. It is modeled by two transition matrices with the corresponding numbers of states. We sample a discrete trajectory with each matrix (100k steps)^{70}. The global state is defined as a combination of these discrete states. The discrete subsystem states are now interpreted as the hidden states of hidden Markov models^{71} that emit to separate, subsystem-specific dimensions of a 2D space. The output of each subsystem is modeled with Gaussian noise \(N(\mu_i, \tilde{\sigma}) \in \mathbb{R}\) that is specific to the state the system is in, specified by the mean μ_{i} and a constant width \(\tilde{\sigma}\). The two-state subsystem therefore describes a jump process between Gaussian basins along the x-axis, and the three-state subsystem one along the y-axis (Fig. 3a). These variables compare to collective variables of the green (x) and blue (y) systems depicted in Fig. 1. Please note that while in this benchmark system the relevant slow collective variables are known, iVAMPnets are generally capable of finding them (cf. 10D hypercube benchmark model and Synaptotagmin-C2A).
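The construction of such a benchmark can be sketched as follows (the transition matrices, means, and noise width are hypothetical values rather than the paper's, and the trajectory is shortened):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_chain(T, n_steps):
    """Sample a discrete Markov chain with row-stochastic transition matrix T."""
    cdf = np.cumsum(T, axis=1)
    u = rng.random(n_steps)
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        states[t] = np.searchsorted(cdf[states[t - 1]], u[t])
    return states

# Hypothetical 2-state and 3-state transition matrices.
T2 = np.array([[0.95, 0.05],
               [0.05, 0.95]])
T3 = np.array([[0.90, 0.05, 0.05],
               [0.05, 0.90, 0.05],
               [0.05, 0.05, 0.90]])

n_steps = 20_000
s_x, s_y = sample_chain(T2, n_steps), sample_chain(T3, n_steps)

# Hidden states emit into separate dimensions with state-dependent means
# and a constant Gaussian width (cf. Fig. 3a).
mu_x, mu_y, sigma = np.array([-1.0, 1.0]), np.array([-1.0, 0.0, 1.0]), 0.2
data = np.column_stack([mu_x[s_x], mu_y[s_y]]) + sigma * rng.normal(size=(n_steps, 2))
```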
Since the generative benchmark model consists of perfectly independent subsystems and the pair already describes the global system, our method can simply be optimized for the global VAMP-E score (Eq. (6)) without the need for any further constraints. We train a model with a two-state and a three-state subsystem at a lag time of τ = 1 step.
Once trained, the iVAMPnet yields a model of the dynamics in each of the identified subsystems. As expected, we find that the estimated transition matrices for both subsystems closely agree with the ground truth (Fig. 3c). To additionally assess the slow subsystem dynamics in more detail, we borrow concepts from MSM analysis and conduct an eigenvalue decomposition of the iVAMPnet models (cf. VAMPnets). The analysis of the eigenfunctions demonstrates that, by construction, the system exhibits one independent process along the x-axis (λ_{1} = 0.90) and two along the y-axis (λ_{2} = 0.89 and λ_{4} = 0.66) (Fig. 3d). In contrast, we note that in the picture of global states, two additional processes would appear as a result of mixing the independent processes (cf. Supplementary Note 2), which makes the combined dynamical model more challenging to analyze, whereas the iVAMPnet analysis remains straightforward and simple.
Besides the dynamical models, our iVAMPnet yields assignments between input features and subsystems. We find that the method correctly identifies the x-axis feature with the two-state system and the y-axis feature with the three-state system (Fig. 3b).
10D hypercube benchmark model
In a next step, we test the iVAMPnet approach with ten 2-state subsystems, which corresponds to 1024 global states (Fig. 4a, b). As before, the dynamics is generated by ten independent hidden Markov state models with unique timescales. The system is split into five pairs of subsystems, and the two coordinates governing the transition dynamics of each pair are rotated to make them more difficult to separate (Fig. 4a). Additionally, we make the learning problem harder by adding ten noise dimensions, such that the global system lives on a 10-dimensional hypercube embedded in a 20-dimensional space.
Although the subsystems are perfectly independent, we estimate an iVAMPnet with the VAMP-E score in a pairwise fashion, thereby avoiding the estimation of prohibitively large correlation matrices in \(\mathbb{R}^{1024\times 1024}\). As this is only justified if all systems are independent, we additionally enforce Eq. (7) during training by minimizing Eq. (8), thereby ruling out that any two subsystems approximate the same process.
The iVAMPnet estimation yields subsystem models which, as common in MSM analysis, can be validated by testing whether their implied relaxation timescales are converged in the model lag time τ. We find that the implied timescales learned by the iVAMPnet are indeed converged and accurately reproduce the ground truth (Fig. 4d). We note that in addition to the timescales of the individual subsystems that are identified by the iVAMPnet, a global model would also contain all timescales that result from products of eigenvalues, resulting in a total of 1024 timescales. Thus, the iVAMPnet analysis provides a much simpler and more concise model than a global MSM or VAMPnet would.
Furthermore, the subsystem assignment mask indicates that the method correctly assigns high importance weights to two input features for each model (Fig. 4c). Therefore, the method proves its capability of decomposing a noisy, high-dimensional global system into its independent subprocesses in a data-efficient way.
We have generalized the 10-cube system to a variable number of subsystems (N-cube) to conduct a performance benchmark, finding that iVAMPnets outperform VAMPnets for this particular system. We note, however, that this result may not generalize to arbitrary systems, as the N-cube features truly independent 2-state subsystems (see Supplementary Note 6 for details).
SynaptotagminC2A
Finally, we test iVAMPnets on an all-atom protein system. In comparison to our benchmark examples, we expect the underlying global dynamics to be only approximately decomposable into independent subsystems. Our test data consists of 92 MD trajectories of 2 μs length each (184 μs aggregate) of the C2A domain of synaptotagmin (Supplementary Note 7) that was described previously^{72}; synaptotagmin plays a crucial role in the regulation of neurotransmitter release^{73}. It was shown to consist of approximately uncoupled subsystems containing the calcium binding region (CBR) and the C78 loop, respectively^{64}.
First, we attempted to model the protein with a global model, i.e., with a single (regular) VAMPnet. Indeed, this approach failed because there were not enough simulation statistics to estimate a reversibly connected transition model between all global metastable states, resulting in diverging implied timescales (Supplementary Note 3 and Supplementary Fig. 2). This is exactly the scenario where iVAMPnets should provide an advantage, by only relying on locally rather than globally converged transition statistics.
Next, we train an iVAMPnet seeking two subsystems with twelve and six states, respectively, at a lag time of τ = 10 ns, where we enforce constraint Eq. (7) to find uncoupled subsystems.
The trained iVAMPnet identifies one subsystem comprising all three CBR loops (CBR1, CBR2, CBR3; Fig. 5a). The second subsystem consists not only of the aforementioned C78 loop but also of the loop connecting beta sheets 3 and 4^{74} (termed C34 henceforth). When mapping the residue positions on the protein structure it becomes obvious that the two subsystems are physically well separated (Fig. 5a), supporting the conclusion that both regions are only weakly coupled^{64}.
The implied timescales of both systems are approximately constant in the model lag time τ. Most timescales are in the range of 1–10 μs, with the exception of one much slower process with a 100 μs relaxation time found in the first subsystem (Fig. 5b), which has not been found previously. Analysis of the structural changes governing this process reveals that it involves an orchestrated transition of all CBR loops (Fig. 5c). Such a process could however not be resolved by the previous study^{72} where the CBR was modeled as individual loops. The process of the second system involves a simultaneous movement of the C78 and C34 loops (Fig. 5c).
iVAMPnets find metastable structures in the local features that are comparable to the ones described in our previous work^{72}. Specifically, α-helices in two distinct positions and a state burying a methionine residue (Met173) can be found in CBR1. In the adjacent CBR2 site, both tightly bound and loose configurations are identified, and the C78 site features all three previously described valine residue conformations (Val250, Val255). In addition to the features modeled in our preceding study^{72}, iVAMPnets identify dynamics in a lysine-rich cluster (Lys189-192) that was previously reported as important for membrane interaction^{75}. See Supplementary Note 4 for a detailed view of the metastable states and exchange kinetics. In contrast to our previous work, the kinetic models in the local subsystems are more complex and incorporate a larger number of dynamic processes, providing a more comprehensive picture without the need to define a partitioning manually. In fact, conducting domain decomposition and local kinetic modeling simultaneously has enabled the identification of very subtle dynamical features, as long as they contribute significantly to the local VAMP scores.
Although estimating a global VAMPnet model for synaptotagmin was not feasible given the sparse data sample, iVAMPnets use the same data efficiently and estimate a statistically valid dynamical model. This result is especially striking because the iVAMPnet approach also simplifies the subsequent task of interpreting models by separating dynamically independent protein domains.
Counterexample: folding of the villin miniprotein
Finally, we conducted an experiment on a villin protein folding trajectory of 125 μs length^{76} as a negative example (Supplementary Note 7). Small proteins such as villin are typically cooperative, i.e., the slowest processes related to folding involve all residues (Supplementary Note 5). Thus, these processes cannot be resolved when decomposing the system into several subsystems. Indeed, we find that a splitting into two subsystems with two states each results in timescales that are not converged, and whose relaxation processes approximate a partial folding on disjoint areas (cf. Supplementary Fig. 6).
Testing statistical independence of the learned dynamical subsystems
As constraint Eq. (7) was used as a penalty during training (as independence score Eq. (19)), we assess the validity of an estimated subsystem assignment by evaluating the constraints that were not enforced during training (Eq. (17)) as post-training independence scores M_{U}, M_{V}, and M_{UV} (defined in Eq. (18)). Low values of M_{U} and M_{V} imply that the constructed left and right singular functions are indeed valid candidates for singular functions in the global state space. A small value of M_{UV} indicates that the kinetics in the global state space is well predicted by the Kronecker product of subsystem models. We find that the three metrics are well suited to indicate independence of the learned subsystems (Table 1). Of the tested systems, only villin cannot be split into independent parts (all scores > 0.1). In comparison, the benchmark models and synaptotagmin can be decomposed into statistically uncoupled subsystems (all scores < 0.01). The slightly increased M_{R} value for synaptotagmin suggests that its subsystems might be weakly coupled.
Discussion
We have proposed an unsupervised deep learning framework that, using only molecular dynamics simulation data, learns to decompose a complex molecular system into subsystems which behave as approximately independent Markov models. Thereby, iVAMPnet is an end-to-end learning framework that points a way out of the exponentially growing demand for simulation data required to sample increasingly large biomolecular complexes. Specifically, we have developed and demonstrated iVAMPnets for molecular dynamics, but the approach is, in principle, also applicable to different application areas, such as fluid dynamics. The specific implementation, such as the representation of the input vectors x_{t} and the neural network architecture of the χ-functions, depends on the application and can be adapted as needed.
We now have a hierarchy of increasingly powerful models ranging from MSMs over VAMPnets to iVAMPnets. MSMs always consist of (1) a state space decomposition and (2) a Markovian transition matrix governing the dynamics between these states. VAMPnets provide a deep learning framework for MSMs, and thereby (3) learn the collective coordinates in which the state space discretization (1) is best made. iVAMPnets additionally learn (4) a physical separation of the molecular system into subsystems, each of which has its own slow coordinates, Markov states, and transition matrix.
We have demonstrated that iVAMPnets are a powerful multiscale learning method that succeeds in finding and modeling molecular subsystems when these subsystems indeed evolve statistically independently. Additionally, iVAMPnets are capable of learning from high-dimensional MD data. To prove that point, we have demonstrated that the synaptotagmin C2A domain is decomposable into two almost independent Markov state models. Importantly, we have shown that this dynamical decomposition of synaptotagmin C2A succeeds while an attempt to model the system with a global Markov state model fails due to poor sampling. This is a direct demonstration that iVAMPnets are statistically more efficient than VAMPnets, MSMs, or other global-state models and may indeed scale to much larger systems.
We note, however, that iVAMPnets do not learn how the subsystems are coupled, and are, therefore, in their current form, only applicable to molecular systems that consist of uncoupled or weakly coupled subsystems. Although most biomolecular complexes are known to be cooperative, there are examples that have been modeled very successfully using independent subsystems, such as the Hodgkin-Huxley model of voltage-gated channel proteins^{77,78}. For other systems, the degree of coupling is a matter of debate, for example, the C2 tandem (C2A and C2B domains) in synaptotagmins^{79,80}. Since isolated domains are known to conduct function by themselves in many cases, we believe that discarding couplings is a first-order modeling assumption that is suitable to identify these domains and their relevant metastable states.
Following up on ref. 63 by introducing coupling parameters that describe how the learned MSMs are coupled is the subject of ongoing research. Furthermore, the weak-coupling assumption is made for the timescale of the investigated molecular processes and may not generalize to arbitrary times. For example, the degree of coupling between domains found in an MD simulation of a folded protein may be very different in its unfolded state, which will eventually be encountered for a long enough simulation time.
Besides the usual hyperparameter choices in deep learning approaches, iVAMPnets require the specification of the number of sought subsystems. This choice can be guided by training an iVAMPnet for different numbers of subsystems and then interrogating the independence scores (Eqs. (19) and (18)) to choose a decomposition where statistical independence is optimal. We suggest decomposing the system into two subsystems as a starting point and increasing this number subsequently. Non-optimal choices may manifest, e.g., in non-converged implied timescales (possibly an incarnation of the sampling problem, which may be mitigated by increasing the number of subsystems) or in high independence scores (the system cannot be split because too many subsystems, or a non-optimal number, were chosen). Furthermore, the choice of the number of subsystems can be guided by the number of structural domains in a protein (complex) or by using the network-based approach presented in ref. 64. Finally, the number of states in each subsystem needs to balance (a) the quality of the singular function approximation (higher for few states) and (b) model resolution (higher for more states). Ultimately, different choices may yield converged validation measures, in which case the number of states may be chosen to give the desired model resolution.
iVAMPnets can be improved and further developed in multiple ways, e.g., by employing more advanced network architectures such as graph neural networks, where parameters could be shared across subsystems. This might result in higher-quality models and a greater robustness against hyperparameter choices. Very recently, graph neural networks were indeed successfully combined with VAMPnets, showing that the resulting method (GraphVAMPnets) is applicable to MD data and that the estimated models are of high quality^{81}.
In summary, iVAMPnets pave a possible path toward modeling the kinetics of large biological systems in a data-efficient and interpretable manner.
Methods
VAMPnets
Since an iVAMPnet implements multiple parallel VAMPnets representing the kinetics of separate independent subsystems, we first introduce VAMPnets^{56}. VAMPnets are multilayer perceptrons that represent feature functions χ (we omit the subsystem superscript i here for clarity). Their last layer is often chosen to be a SoftMax function, i.e., the non-negative outputs sum to one. Therefore, the output of a VAMPnet can be interpreted as a fuzzy assignment to a metastable state. Taking the linear combination of states with equal weights results in the constant singular function with singular value 1, which is reflected in the singular values of the Koopman matrix (Eq. (10) with the normalized correlation matrix). Given the feature functions χ, we can compute the following correlation matrices:
where L is the number of collected data pairs in the simulations.
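As a minimal numpy sketch (function and variable names are illustrative, not taken from the accompanying code), the correlation matrices of Eq. (9) can be estimated from the stacked SoftMax outputs at times t and t + τ:

```python
import numpy as np

def correlation_matrices(chi_t, chi_tau):
    """Estimate C00, C01, C11 from L pairs of fuzzy state assignments
    chi(x_t) and chi(x_{t+tau}), stacked as (L, n) arrays."""
    L = chi_t.shape[0]
    C00 = chi_t.T @ chi_t / L      # instantaneous correlations at time t
    C01 = chi_t.T @ chi_tau / L    # time-lagged cross-correlations
    C11 = chi_tau.T @ chi_tau / L  # instantaneous correlations at t + tau
    return C00, C01, C11

# toy data: random fuzzy assignments over 3 states (rows sum to one)
rng = np.random.default_rng(0)
raw = rng.random((100, 3))
chi = raw / raw.sum(axis=1, keepdims=True)
C00, C01, C11 = correlation_matrices(chi[:-1], chi[1:])
```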
Training VAMPnets or iVAMPnets involves computing covariance matrices over minibatches. We therefore need to choose the batch size to balance the large estimator variance obtained for small batches against the high memory requirements of large batches. Instead of the trivial covariance estimator (Eq. (9)), which is asymptotically unbiased^{55} but has high variance, one can employ a shrinkage estimator^{82,83}, which reduces the overall estimator error by trading larger bias for lower variance. For the current study, we assume that our benchmark and MD data have been sufficiently sampled to yield adequate approximations with the estimator given in Eq. (9).
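A fixed-intensity shrinkage toward a scaled identity illustrates this bias-variance trade-off. Note that the estimators of refs. 82,83 additionally optimize the shrinkage intensity from the data, which is omitted in this sketch:

```python
import numpy as np

def shrinkage_covariance(X, alpha):
    """Blend the trivial covariance estimator with a scaled-identity target.
    alpha = 0 recovers the high-variance empirical estimator (cf. Eq. (9)),
    alpha = 1 the maximally biased target; intermediate values trade
    larger bias for lower variance."""
    L, n = X.shape
    S = X.T @ X / L                       # trivial (empirical) estimator
    target = np.trace(S) / n * np.eye(n)  # shrinkage target: scaled identity
    return (1.0 - alpha) * S + alpha * target

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
S_shrunk = shrinkage_covariance(X, 0.2)
```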
The singular functions and values can be approximated via the singular value decomposition (SVD) of the following matrix \(\bar{\mathbf{K}}\):
K is the diagonal matrix of approximated singular values corresponding to the left and right singular functions:
The matrices U and V construct the left and right singular functions from the individual state assignments. The optimal state assignments can be found by maximizing the VAMP-E score:
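Assuming the covariance matrices above, the SVD-based estimate can be sketched as follows; at the optimum, the sum of squared singular values of \(\bar{\mathbf{K}}\) equals the maximal VAMP-2 score, which the VAMP-E score also attains (function names are illustrative):

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse matrix square root of a symmetric PSD matrix via
    eigendecomposition, with small eigenvalues floored for stability."""
    w, Q = np.linalg.eigh(C)
    w = np.maximum(w, eps)
    return Q @ np.diag(w ** -0.5) @ Q.T

def vamp_singular_values(C00, C01, C11):
    """Singular values of Kbar = C00^{-1/2} C01 C11^{-1/2}."""
    Kbar = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)
    return np.linalg.svd(Kbar, compute_uv=False)

# sanity check: with whitened features the singular values are read off directly
C00 = C11 = np.eye(2)
C01 = np.diag([1.0, 0.8])
s = vamp_singular_values(C00, C01, C11)
score = (s ** 2).sum()  # maximal VAMP-2 / VAMP-E score for this toy model
```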
Given trained state assignments χ(x_{t}) and the correlation matrices of Eq. (9), the Koopman matrix T can be evaluated as:
Furthermore, we can estimate the eigenfunctions φ and timescales t_{i} via its eigendecomposition T = QΛQ^{−1}:
Please note that this operation is only possible if the eigendecomposition is (approximately) real-valued, a condition that is met for the application cases presented here.
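The Koopman matrix and implied timescales can be sketched as follows; for illustration, C00 is taken as the identity so that T coincides with a row-stochastic toy transition matrix (the helper name is hypothetical):

```python
import numpy as np

def koopman_and_timescales(C00, C01, lagtime):
    """Koopman matrix T = C00^{-1} C01 and implied timescales
    t_i = -tau / ln|lambda_i| from the eigendecomposition T = Q Lambda Q^{-1}."""
    T = np.linalg.solve(C00, C01)
    lam, Q = np.linalg.eig(T)
    lam = lam[np.argsort(-np.abs(lam))]              # sort by magnitude
    timescales = -lagtime / np.log(np.abs(lam[1:]))  # skip stationary eigenvalue ~1
    return T, lam, timescales

# two-state toy model: with C00 = I, T equals the transition matrix P
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
T, lam, ts = koopman_and_timescales(np.eye(2), P, lagtime=1.0)
```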
Conditions for independent systems
For Markov independent systems, the singular values and functions that are constructed by the Kronecker product match the true global ones,
where the first two equations guarantee the orthonormality of the constructed singular functions. The latter verifies that the left and right singular functions correlate as predicted by the Kronecker product of the singular values. These conditions can be translated to the following scores:
Furthermore, using the identities Eq. (17) and the definition of the VAMPE score Eq. (13) yields
The norms denote simple means. The last score, M_{R}, is enforced during training in a pairwise fashion (cf. Eq. (8)).
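As a toy proxy for these conditions (not the exact scores of Eqs. (18) and (19)), one can check how far a global Koopman model deviates from the Kronecker product of two subsystem models; the deviation vanishes exactly for independent subsystems. The helper below is hypothetical:

```python
import numpy as np

def kronecker_deviation(K_global, K1, K2):
    """Normalized deviation of a global Koopman matrix from the Kronecker
    product of two subsystem Koopman matrices; zero for exactly
    independent Markovian subsystems."""
    K_pred = np.kron(K1, K2)
    return np.linalg.norm(K_global - K_pred) / np.linalg.norm(K_pred)

K1 = np.array([[0.9, 0.1], [0.2, 0.8]])
K2 = np.array([[0.7, 0.3], [0.4, 0.6]])
dev = kronecker_deviation(np.kron(K1, K2), K1, K2)  # exactly independent pair
```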
Network architecture
Given a global system that we want to decompose into N subsystems, and a time series of input features \(\{\mathbf{x}_t\}_{t=1,\ldots,T}\), \(\mathbf{x}_t \in \mathbb{R}^{D\times 1}\), we pass the features through a mask \(\mathbf{G} \in \mathbb{R}^{D\times N}\), which weights each input differently for each subsystem, before the result is transformed individually by the N independent state assignment functions η^{i}. It should be mentioned that the mask is introduced merely for interpretability and is not essential for finding independent subsystems. If the mask were omitted, the extraction of the relevant features would simply be transferred to the downstream neural networks, remaining hidden to the practitioner.
The weighted input is computed by an element-wise multiplication \(\bar{\mathbf{Y}}_t = \mathbf{G} \odot \mathbf{x}_t\). In order to prevent the neural networks from reversing the weighting of the mask in their subsequent layers, we draw, for each input feature i and subsystem j, an independent, normally distributed random variable \(\epsilon_{ij} \sim \mathcal{N}(0, \sigma(1 - G_{ij}))\). This noise is added to the weighted features:
Thereby, the attention weight linearly interpolates between the input feature and Gaussian noise: if the attention weight G_{ij} = 1, Y_{ij} carries exclusively the input feature x_{i}; if G_{ij} = 0, Y_{ij} is simply Gaussian noise. By tuning the noise scaling σ, a harder assignment by G can be enforced. This hyperparameter should be adjusted such that the resulting mask yields clear subsystem assignments without being binary. Subsequently, the transformed feature vector is split for each individual subsystem, \(\mathbf{Y}_t = [\mathbf{Y}_t^1, \ldots, \mathbf{Y}_t^N]\), and passed through the subsystem-specific neural network η^{i}, resulting in the feature transformations \(\boldsymbol{\chi}^i(\mathbf{x}_t) = \boldsymbol{\eta}^i(\mathbf{Y}_t^i)\). These features are then used to estimate the Koopman models.
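The masking step can be sketched in numpy (the PyTorch implementation in the repository may differ in details; here σ scales the noise standard deviation, and the function name is illustrative):

```python
import numpy as np

def mask_inputs(x, G, sigma, rng):
    """Weight each input feature per subsystem and add Gaussian noise whose
    scale grows as the attention weight shrinks: for G_ij = 1 the feature
    passes through unchanged, for G_ij = 0 only noise remains.
    x: (batch, D) features, G: (D, N) mask -> (batch, D, N) masked inputs."""
    Y = x[:, :, None] * G                                  # Y_t = G ⊙ x_t per subsystem
    eps = rng.standard_normal(Y.shape) * sigma * (1.0 - G) # eps_ij ~ N(0, sigma(1 - G_ij))
    return Y + eps

rng = np.random.default_rng(2)
x = np.ones((5, 4))
G = np.ones((4, 2))          # full attention: no noise should be injected
Y = mask_inputs(x, G, 0.5, rng)
```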
The training framework and neural network architecture were implemented in the Python 3 programming language using NumPy^{84} and PyTorch^{69}; benchmark system data were generated using deeptime^{70}; data visualization was performed using matplotlib^{85} and VMD^{24}.
Constructing the mask
To train an interpretable mask, we use the following three premises:

1. A single subsystem should not focus on all input features.
2. Different subsystems compete for high weights for the same feature.
3. All weights should be in the range [0, 1] and the matrix should be sparse.
Therefore, the mask is constructed from trainable weights \(\mathbf{g} \in \mathbb{R}^{D\times N}\), which are first processed by a softmax function that normalizes along the input feature axis, \(\mathbf{g}_1 = \mathrm{softmax}(\mathbf{g}, \dim=0)\). Thereby, if a subsystem focuses on one part of the features, lower weights for the other parts are expected, in line with the first premise.
In the next step, weights lower than a threshold θ are clipped to zero, g_{2} = relu(g_{1} − θ), to guarantee sparsity. The threshold θ is a hyperparameter that can be optimized by starting with a comparably small value (i.e., very little cutoff) and subsequently increasing it without further training; a reasonable cutoff does not alter the results, as the downstream neural networks still obtain all relevant information.
Since some input features could be negligible for all subsystems, a dummy system is added which has a constant value \(\mathbf{c} \in \mathbb{R}^{D\times 1}\) for all features, g_{3} = [g_{2}, c]. Subsequently, the weights of all subsystems and the dummy system are normalized for each feature, \(\mathbf{g}_4 = \mathbf{g}_3 / \mathrm{sum}(\mathbf{g}_3, \dim=1)\), which together with the clipping fulfills premises two and three.
Finally, the mask \(\mathbf{G}\) is obtained by truncating the dummy system, \(\mathbf{g}_4 = [\mathbf{G}, \bar{\mathbf{c}}]\). Note that only g_{4} is normalized along the system axis.
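The full construction can be summarized in a short numpy sketch (`build_mask` and the dummy constant `c_val` are illustrative names, not taken from the repository):

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def build_mask(g, theta, c_val=0.05):
    """Raw trainable weights g (D, N) -> sparse mask G (D, N) in [0, 1]:
    softmax along the feature axis, clip below theta, append a constant
    dummy system, renormalize across systems, drop the dummy column."""
    g1 = softmax(g, axis=0)                  # premise 1: features compete within a subsystem
    g2 = np.maximum(g1 - theta, 0.0)         # clipping enforces sparsity
    c = np.full((g.shape[0], 1), c_val)      # dummy system for negligible features
    g3 = np.concatenate([g2, c], axis=1)
    g4 = g3 / g3.sum(axis=1, keepdims=True)  # premises 2+3: systems compete per feature
    return g4[:, :-1]                        # truncate the dummy system

rng = np.random.default_rng(3)
G = build_mask(rng.standard_normal((6, 2)), theta=0.05)
```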
Application to protein dynamics
Since for proteins the final model is often expected to be invariant with respect to rotations and translations, internal coordinates are employed as input features. For Markov state modeling, the minimal heavy-atom distance d_{ij} between residues i and j has proven to be a good descriptor^{56,86}. However, for interpretability, mask weights for each residue are preferable. Therefore, the mask is of size \(\mathbf{G} \in \mathbb{R}^{R\times N}\), with R the number of residues. The input features are then scaled as \(x_{ij} = G_i G_j \exp(-d_{ij})\).
Furthermore, a smoothing routine is implemented such that neighboring residues along the chain have similar importance weights. W windows of size B are placed along the chain with step size s. Each window has a trainable weight \(\mathbf{g} \in \mathbb{R}^{W\times N}\). Consequently, the softmax function is taken along the window axis, \(\bar{\mathbf{g}} = \mathrm{softmax}(\mathbf{g}, \dim=0)\). However, before applying the clipping as before, the weight for each residue, \(\mathbf{g}_1 \in \mathbb{R}^{R\times N}\), is calculated as the product of the weights of all windows the residue is part of (Fig. 6).
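Both the residue-wise feature scaling and the window-based smoothing can be sketched as follows. The sign in exp(−d_{ij}) and the window bookkeeping reflect our reading of the description above; the function names are hypothetical:

```python
import numpy as np

def scaled_distance_features(d, G_i):
    """Scale residue-residue minimal-distance features by per-residue mask
    weights of one subsystem: x_ij = G_i * G_j * exp(-d_ij)
    (contact-map-style transform, assumed from the text)."""
    return np.outer(G_i, G_i) * np.exp(-d)

def window_weights(g, R, B, s):
    """Per-residue weight as the product of the weights of all windows
    covering that residue; g: (W,) normalized window weights for one
    subsystem, windows of size B placed along the chain with step size s."""
    w = np.ones(R)
    for k, gk in enumerate(g):
        w[k * s : k * s + B] *= gk
    return w

w = window_weights(np.full(4, 0.5), R=5, B=2, s=1)  # residue 1 lies in two windows
x = scaled_distance_features(np.zeros((3, 3)), np.ones(3))
```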
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The benchmark data can be generated from the Jupyter notebooks that have been deposited on GitHub under https://github.com/markovmodel/ivampnets^{87}. The molecular dynamics data set of synaptotagmin C2A has been deposited in Zenodo under https://zenodo.org/record/6908073^{88}. The crystal structure of synaptotagmin C2A is available under PDB ID 2R83 [https://doi.org/10.2210/pdb2R83/pdb]. The villin headpiece folding data are available under restricted access and were used under license for this study as courtesy of D. E. Shaw Research^{76}; access can be obtained from the authors upon request.
Code availability
The code that implements the presented models and reproduces the presented results has been deposited on GitHub under https://github.com/markovmodel/ivampnets^{87}.
References
Phillips, J. C. et al. Scalable molecular dynamics on CPU and GPU architectures with NAMD. J. Chem. Phys. 153, 044130 (2020).
Vant, J. W. et al. Protein Structure Prediction 301–315 (Springer, 2020).
Buch, I., Harvey, M. J., Giorgino, T., Anderson, D. P. & De Fabritiis, G. High-throughput all-atom molecular dynamics simulations using distributed computing. J. Chem. Inform. Modeling 50, 397–403 (2010).
Eastman, P. et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 13, e1005659 (2017).
Salomon-Ferrer, R., Götz, A. W., Poole, D., Le Grand, S. & Walker, R. C. Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J. Chem. Theory Comput. 9, 3878–3888 (2013).
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1, 19–25 (2015).
Bussi, G., Laio, A. & Tiwary, P. Metadynamics: A Unified Framework for Accelerating Rare Events and Sampling Thermodynamics and Kinetics. In Handbook of Materials Modeling (eds Andreoni, W. & Yip, S.) 565–595 (Springer International Publishing, 2020).
Tsai, S.-T., Smith, Z. & Tiwary, P. SGOOP-d: Estimating kinetic distances and reaction coordinate dimensionality for rare event systems from biased/unbiased simulations. J. Chem. Theory Comput. 17, 6757–6765 (2021).
Liu, C., Brini, E., Perez, A. & Dill, K. A. Computing ligands bound to proteins using MELD-accelerated MD. J. Chem. Theory Comput. 16, 6377–6382 (2020).
MacCallum, J. L., Perez, A. & Dill, K. A. Determining protein structures by combining semi-reliable data with atomistic physical models by Bayesian inference. Proc. Natl. Acad. Sci. USA 112, 6985–6990 (2015).
Perez, A., MacCallum, J. L. & Dill, K. A. Accelerating molecular simulations of proteins using Bayesian inference on weak information. Proc. Natl Acad. Sci. USA 112, 11846–11851 (2015).
Ge, Y. & Voelz, V. A. Estimation of binding rates and affinities from multiensemble Markov models and ligand decoupling. J. Chem. Phys. 156, 134115 (2022).
Ribeiro, J. M. L., Bravo, P., Wang, Y. & Tiwary, P. Reweighted autoencoded variational Bayes for enhanced sampling (RAVE). J. Chem. Phys. 149, 072301 (2018).
Schütte, C., Fischer, A., Huisinga, W. & Deuflhard, P. A direct approach to conformational dynamics based on hybrid Monte Carlo. J. Comput. Phys. 151, 146–168 (1999).
Prinz, J.-H. et al. Markov models of molecular kinetics: Generation and validation. J. Chem. Phys. 134, 174105 (2011).
Swope, W. C., Pitera, J. W. & Suits, F. Describing protein folding kinetics by molecular dynamics simulations: 1. Theory. J. Phys. Chem. B 108, 6571–6581 (2004).
Noé, F., Horenko, I., Schütte, C. & Smith, J. C. Hierarchical analysis of conformational dynamics in biomolecules: Transition networks of metastable states. J. Chem. Phys. 126, 155102 (2007).
Chodera, J. D., Singhal, N., Pande, V. S., Dill, K. A. & Swope, W. C. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. J. Chem. Phys. 126, 155101 (2007).
Buchete, N. V. & Hummer, G. Coarse master equations for peptide folding dynamics. J. Phys. Chem. B 112, 6057–6069 (2008).
Wan, H. & Voelz, V. A. Adaptive Markov state model estimation using short reseeding trajectories. J. Chem. Phys. 152, 024103 (2020).
Scherer, M. K. et al. PyEMMA 2: A software package for estimation, validation, and analysis of Markov models. J. Chem. Theory Comput. 11, 5525–5542 (2015).
Harrigan, M. P. et al. Msmbuilder: Statistical models for biomolecular dynamics. Biophys J. 112, 10–15 (2017).
McGibbon, R. T. et al. Mdtraj: A modern open library for the analysis of molecular dynamics trajectories. Biophys J. 109, 1528–1532 (2015).
Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynamics. J. Molec. Graphics 14, 33–38 (1996).
Pérez-Hernández, G., Paul, F., Giorgino, T., De Fabritiis, G. & Noé, F. Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 139, 015102 (2013).
Ziehe, A. & Müller, K.-R. TDSEP—an efficient algorithm for blind separation using time structure. In ICANN 98, 675–680 (Springer Science and Business Media, 1998).
Mezić, I. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynam. 41, 309–325 (2005).
Schmid, P. J. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 656, 5–28 (2010).
Tu, J. H., Rowley, C. W., Luchtenburg, D. M., Brunton, S. L. & Kutz, J. N. On dynamic mode decomposition: Theory and applications. J. Comput. Dyn. 1, 391–421 (2014).
Noé, F. & Clementi, C. Collective variables for the study of longtime kinetics from molecular trajectories: theory and methods. Curr. Opin. Struc. Biol. 43, 141–147 (2017).
Klus, S. et al. Datadriven model reduction and transfer operator approximation. J. Nonlinear Sci. 28, 985–1010 (2018).
Bowman, G. R., Pande, V. S. & Noé, F. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation (Springer, 2014).
Husic, B. E. & Pande, V. S. Ward clustering improves crossvalidated Markov state models of protein folding. J. Chem. Theo. Comp. 13, 963–967 (2017).
Sheong, F. K., Silva, D.-A., Meng, L., Zhao, Y. & Huang, X. Automatic state partitioning for multibody systems (APM): An efficient algorithm for constructing Markov state models to elucidate conformational dynamics of multibody systems. J. Chem. Theory Comput. 11, 17–27 (2015).
Weber, M., Fackeldey, K. & Schütte, C. Setfree Markov state model building. J. Chem. Phys. 146, 124133 (2017).
Bowman, G. R., Beauchamp, K. A., Boxer, G. & Pande, V. S. Progress and challenges in the automated construction of Markov state models for full protein systems. J. Chem. Phys. 131, 124101 (2009).
TrendelkampSchroer, B., Wu, H., Paul, F. & Noé, F. Estimation and uncertainty of reversible Markov models. J. Chem. Phys. 143, 174101 (2015).
Kube, S. & Weber, M. A coarse graining method for the identification of transition rates between molecular conformations. J. Chem. Phys. 126, 024103 (2007).
Yao, Y. et al. Hierarchical Nyström methods for constructing Markov state models for conformational dynamics. J. Chem. Phys. 138, 174106 (2013).
Fackeldey, K. & Weber, M. GenPCCA – Markov state models for non-equilibrium steady states. WIAS Report 29, 70–80 (2017).
Gerber, S. & Horenko, I. Toward a direct and scalable identification of reduced models for categorical processes. Proc. Natl. Acad. Sci. USA 114, 4863–4868 (2017).
Hummer, G. & Szabo, A. Optimal dimensionality reduction of multistate kinetic and Markovstate models. J. Phys. Chem. B 119, 9029–9037 (2015).
Orioli, S. & Faccioli, P. Dimensional reduction of Markov state models from renormalization group theory. J. Chem. Phys. 145, 124120 (2016).
Noé, F., Wu, H., Prinz, J.-H. & Plattner, N. Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. J. Chem. Phys. 139, 184114 (2013).
Sengupta, U., Carballo-Pacheco, M. & Strodel, B. Automated Markov state models for molecular dynamics simulations of aggregation and self-assembly. J. Chem. Phys. 150, 115101 (2019).
Carballo-Pacheco, M. & Strodel, B. Advances in the simulation of protein aggregation at the atomistic scale. J. Phys. Chem. B 120, 2991–2999 (2016).
Qiao, Q., Bowman, G. R. & Huang, X. Dynamics of an intrinsically disordered protein reveal metastable conformations that potentially seed aggregation. J. Am. Chem. Soc. 135, 16092–16101 (2013).
Silva, D.-A., Bowman, G. R., Sosa-Peinado, A. & Huang, X. A role for both conformational selection and induced fit in ligand binding by the LAO protein. PLoS Comput. Biol. 7, e1002054 (2011).
Sengupta, U. & Strodel, B. Markov models for the elucidation of allosteric regulation. Philos. Trans. R. Soc. B: Biol. Sci. 373, 20170178 (2018).
Plattner, N. & Noé, F. Protein conformational plasticity and complex ligandbinding kinetics explored by atomistic simulations and Markov models. Nat. Commun. 6, 7653 (2015).
Baiz, C. R. et al. A molecular interpretation of 2D IR protein folding experiments with Markov state models. Biophys. J. 106, 1359–1370 (2014).
Olsson, S., Wu, H., Paul, F., Clementi, C. & Noé, F. Combining experimental and simulation data of molecular processes via augmented Markov models. Proc. Natl Acad. Sci. USA 114, 8265–8270 (2017).
Noé, F. & Nüske, F. A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Model. Simul. 11, 635–655 (2013).
McGibbon, R. T. & Pande, V. S. Variational crossvalidation of slow dynamical modes in molecular kinetics. J. Chem. Phys. 142, 124105 (2015).
Wu, H. & Noé, F. Variational approach for learning Markov processes from time series data. J Nonlinear Sci. 30, 23–66 (2020).
Mardt, A., Pasquali, L., Wu, H. & Noé, F. VAMPnets: Deep learning of molecular kinetics. Nat. Commun. 9, 5 (2018).
Chen, W., Sidky, H. & Ferguson, A. L. Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets. J. Chem. Phys. 150, 214114 (2019).
Bonati, L., Piccini, G. & Parrinello, M. Deep learning the slow modes for rare events sampling. Proc. Natl Acad. Sci. USA 118, e2113533118 (2021).
Mardt, A., Pasquali, L., Noé, F. & Wu, H. Deep learning Markov and Koopman models with physical constraints. In Mathematical and Scientific Machine Learning 451–475 (PMLR, 2020).
Wu, H., Mardt, A., Pasquali, L. & Noé, F. Deep generative Markov state models. In Advances in Neural Information Processing Systems 3975–3984 (2018).
Mardt, A. & Noé, F. Progress in deep Markov state modeling: Coarse graining and experimental data restraints. J. Chem. Phys. 155, 214106 (2021).
Konovalov, K. A., Unarta, I. C., Cao, S., Goonetilleke, E. C. & Huang, X. Markov state models to study the functional dynamics of proteins in the wake of machine learning. JACS Au 1, 1330–1341 (2021).
Olsson, S. & Noé, F. Dynamic graphical models of molecular kinetics. Proc. Natl Acad. Sci. 116, 15001–15006 (2019).
Hempel, T. et al. Independent Markov decomposition: Toward modeling kinetics of biomolecular complexes. Proc. Natl Acad. Sci. USA 118, e2105230118 (2021).
Koopman, B. O. Hamiltonian systems and transformations in Hilbert space. Proc. Natl. Acad. Sci. USA 17, 315–318 (1931).
Wehmeyer, C. et al. Introduction to Markov state modeling with the PyEMMA software [Article v1.0]. LiveCoMS 1, 5965 (2018).
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet – A deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research (Meila, M. & Zhang, T.) 9377–9388 (PMLR, 2021).
Paszke, A. et al. Pytorch: An imperative style, highperformance deep learning library. In Advances in Neural Information Processing Systems 8026–8037 (2019).
Hoffmann, M. et al. Deeptime: A Python library for machine learning dynamical models from time series data. Mach. Learn.: Sci. Technol. 3, 015009 (2022).
Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989).
Hempel, T., Plattner, N. & Noé, F. Coupling of conformational switches in calcium sensor unraveled with local Markov models and transfer entropy. J. Chem. Theory Comput. 16, 2584–2593 (2020).
Südhof, T. C. Neurotransmitter release: The last millisecond in the life of a synaptic vesicle. Neuron 80, 675–690 (2013).
Jiménez, J. L. et al. Functional recycling of C2 domains throughout evolution: A comparative study of synaptotagmin, protein kinase C and phospholipase C by sequence, structural and modelling approaches. J. Mol. Biol. 333, 621–639 (2003).
Guillén, J. et al. Structural insights into the Ca2+ and PI(4,5)P2 binding modes of the C2 domains of rabphilin 3A and synaptotagmin 1. Proc. Natl Acad. Sci. USA 110, 20503–20508 (2013).
Lindorff-Larsen, K., Piana, S., Dror, R. O. & Shaw, D. E. How fast-folding proteins fold. Science 334, 517–520 (2011).
Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
Rudy, Y. & Silva, J. R. Computational biology in the study of cardiac ion channels and cell electrophysiology. Q. Rev. Biophys. 39, 57–116 (2006).
Bykhovskaia, M. Calcium binding promotes conformational flexibility of the neuronal Ca2+ sensor synaptotagmin. Biophys. J. 108, 2507–2520 (2015).
Tran, H. T., Anderson, L. H. & Knight, J. D. Membranebinding cooperativity and coinsertion by C2AB tandem domains of synaptotagmins 1 and 7. Biophys. J. 116, 1025–1036 (2019).
Ghorbani, M., Prasad, S., Klauda, J. B. & Brooks, B. R. GraphVAMPNet, using graph neural networks and variational approach to Markov processes for dynamical modeling of biomolecules. J. Chem. Phys. 156, 184103 (2022).
Ledoit, O. & Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88, 365–411 (2004).
Chen, Y., Wiesel, A., Eldar, Y. C. & Hero, A. O. Shrinkage algorithms for MMSE covariance estimation. IEEE Trans. Signal Process. 58, 5016–5029 (2010).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Scherer, M. K. et al. Variational selection of features for molecular kinetics. J. Chem. Phys. 150, 194108 (2019).
Mardt, A., Hempel, T., Clementi, C. & Noé, F. Deep learning to decompose macromolecules into independent Markovian domains. Zenodo, https://github.com/markovmodel/ivampnets, https://doi.org/10.5281/ZENODO.7215890 (2022).
Hempel, T., Plattner, N. & Noé, F. Molecular dynamics dataset of Synaptotagmin-1. Zenodo, https://doi.org/10.5281/ZENODO.6908073 (2022).
Wolfram Research, Inc. Mathematica, Version 11.2.0, https://www.wolfram.com/mathematica (2017).
Hagberg, A. A., Schult, D. A., & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference 11–15 Pasadena, CA, USA (2008).
Acknowledgements
We acknowledge financial support from Deutsche Forschungsgemeinschaft DFG (SFB/TRR 186, project A12 to T.H., F.N., C.C.; SFB 958, project A04 to A.M., F.N.; SFB 1114, projects C03 to F.N., A04 to F.N., C.C., B03 to C.C.; SFB 1078, project C7 to C.C.; and RTG 2433 to F.N., C.C.), the European Commission (ERC CoG 772230 “ScaleCell” to F.N.), the Berlin Mathematics center MATH+ (AA16 and AA110 to F.N., C.C.), the BMBF (Research center BIFOLD to F.N.), the National Science Foundation (CHE1900374, and PHY2019745 to C.C.), the Welch Foundation (C1570 to C.C.), and the Einstein Foundation Berlin (project 0420815101 to C.C.). We further thank Manuel Dibak and Moritz Hoffmann (FU Berlin) for fruitful discussions.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
A.M. and T.H. performed research (A.M. derived loss functions and implemented deep learning framework; T.H. designed method and developed test systems); A.M. and T.H. analyzed data; A.M., T.H., C.C., and F.N. designed research; A.M., T.H., C.C., and F.N. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Birgit Strodel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mardt, A., Hempel, T., Clementi, C. et al. Deep learning to decompose macromolecules into independent Markovian domains. Nat. Commun. 13, 7101 (2022). https://doi.org/10.1038/s41467-022-34603-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-022-34603-z