Programmable Potentials: Approximate N-body potentials from coarse-level logic

Thakur, Gunjan S.; Mohr, Ryan; Mezić, Igor

doi:10.1038/srep33415


Article
Open access
Published: 27 September 2016

Gunjan S. Thakur¹,
Ryan Mohr² &
Igor Mezić²

Scientific Reports volume 6, Article number: 33415 (2016)

Abstract

This paper gives a systematic method for constructing an N-body potential, approximating the true potential, that accurately captures meso-scale behavior of the chemical or biological system using pairwise potentials coming from experimental data or ab initio methods. The meso-scale behavior is translated into logic rules for the dynamics. Each pairwise potential has an associated logic function that is constructed using the logic rules, a class of elementary logic functions, and AND, OR, and NOT gates. The effect of each logic function is to turn its associated potential on and off. The N-body potential is constructed as linear combination of the pairwise potentials, where the “coefficients” of the potentials are smoothed versions of the associated logic functions. These potentials allow a potentially low-dimensional description of complex processes while still accurately capturing the relevant physics at the meso-scale. We present the proposed formalism to construct coarse-grained potential models for three examples: an inhibitor molecular system, bond breaking in chemical reactions, and DNA transcription from biology. The method can potentially be used in reverse for design of molecular processes by specifying properties of molecules that can carry them out.

Introduction

Multi-body interactions are ubiquitous in nature and happen at all scales from atomic (quantum description) to molecular (classical approach) to macro scales. A systematic analysis of these interactions may reveal the fundamental principles governing a given system. For example, understanding the biophysics of protein folding gives insight into disease pathologies¹. This understanding can be leveraged to develop new vaccines and drug therapies. Engineering these new products requires accurate and computationally tractable models.

Systems having multibody interactions, in fundamental physics, are often formulated as an “N-body potential” problem. In order to fully understand these systems, a large number of experiments are needed. Conducting experiments may be expensive and at times even impossible. Another approach is to analyze the N-body potential governing the system dynamics. However, at the quantum level, it may be difficult to determine these potentials from first principles due to the complexity of the system. The computational complexity for ab initio methods can scale exponentially in the number of electrons, limiting the practical size of the system to a few thousand atoms^2,3,4. Even if the detailed potential is determined, it may not be immediately useful. Such is the case when the properties or behaviors of interest are at a coarser level than that of the detailed potential and simulating the detailed dynamics is too expensive. Very coarse approaches such as those based on the master equation⁵ lack predictive power on molecular spatial and time scales due to the assumptions under which they are derived. A potential that models the system is required if one is to make predictions about the system.

It is profitable to restrict one’s efforts to considering approximate potentials that respect known behavior. Such coarse-level descriptions may be determined from experimental observation and may correspond to trajectories in some transformed (reaction) coordinate system. For example, consider a signal transduction mechanism^6,7,8,9,10, hierarchical self-assembly^{11,12,13,14,15,16,17,18,19,20}, Kinesin motor protein translocation on a microtubule^21,22, or hydrogen combustion H₂/O₂^23,24. These systems transition from one stable configuration to another on the occurrence of some trigger event, which may consist of an external stimulus or the system reaching a special configuration. An external stimulus could be an input of energy that initiates hydrogen combustion, leading to a larger release of energy by the reaction itself. A special configuration could be a signaling molecule binding to an active receptor site. These stable configurations can be considered as fixed points in a transformed (reaction) coordinate system. The fixed points, the events, and their associated transitions are the coarse-level descriptions that are to be captured in the approximate N-body potential. However, it is still a challenge to construct an N-body Hamiltonian potential in a systematic manner that encodes the known coarse-level behaviors into a mathematical formulation and successfully predicts intermediate-scale transition events.

This article introduces a method of encoding coarse-level dynamical behavior into logic functions that are used to “stitch” together pairwise interaction potentials into an N-body potential. In this method, the practitioner uses experimentally observed coarse-level behavior to derive logic tables that capture various rules of interaction in the system. The qualitative logic tables are turned into a collection of quantitative logic functions associated with pairwise interaction potentials. The logic functions are then turned into smooth encoding functions via a replacement procedure, and the encoding functions are in turn used to modify the pairwise potentials. The effect of an encoding function multiplying a pairwise potential is to smoothly turn the potential on or off when a precise set of conditions is met. The combination of the modified potentials gives an N-body potential that approximates the true potential governing the system.

The method generates a potential that respects what is currently known about the system; it is not claimed that this method results in the unique potential governing the real system. The method does this by leveraging the existing experimental data and the coarse-level behavior that can be derived from it. If more experimental data becomes available, the same procedure can be used to generate a new potential that better models the system. This is equivalent to a refinement of the logic functions and ultimately a refinement of the generated potential. The resulting potential can have a much smaller dimension than the true potential and still accurately capture the relevant physics.

This article begins with a motivating example which is used as an impetus for our modeling framework. In the Methodology section, we define the major components of the framework — logic functions, permissible logical operations, and the translation to the associated encoding functions — and specify how they combine with the pairwise potentials to define the approximate potential. The procedure is depicted in Fig. 1.

The procedure is applied to three examples of increasing complexity to showcase the modeling framework. The first is a simple model of an inhibitor molecule mechanism. It shows how one would go from known coarse-level behavior to an approximate global potential that captures that behavior by explicitly constructing the logic tables, the associated logic functions, and the smooth encoding functions. The inhibitor molecule mechanism has more complicated logic than the motivating example and more effectively demonstrates the modeling procedure.

The second example shows how to model a simple, kinetically controlled, bond breaking chemical reaction using this framework. It shows that bond breaking events, and more generally chemical reactions requiring activation energy, can be naturally modeled in the framework. The general procedure for modeling a bond breaking event and how to account for the activation energy is shown. Furthermore, the derived potential is used with LAMMPS²⁵ to numerically simulate the chemical reaction. By changing the relative dissociation energies, the reaction can be biased in a particular direction.

In contrast to our method, many force fields have trouble capturing bond breaking events²⁶. An exception is the ReaxFF potential², which was developed to model reactions of hydrocarbons. The derivation of ReaxFF is based on using interatomic distances to compute the bond order between two atoms and then using the bond order to obtain the bond energy. Corrections to the bond order depend on the valency of an atom and on the deviation of its uncorrected bond order from that valency. Corrections to the bond energy, in the form of energy penalties (e.g. for over-/under-coordination), are added to get the system energy. This contrasts with our method, where bond weakening and breaking are due to the encoding function, which is derived from coarse-level observed behavior.

The final example is a simple model of DNA transcription. It is shown that after the binding of RNA polymerase to the promoter region we can sequentially add the complementary base nucleotides to the DNA strand that is to be transcribed. DNA transcription is a complex process involving the interaction of many different molecules^27,28. This example shows that we can model such a complex process with a relatively low-dimensional potential that captures the observed mesoscale behavior. To the authors’ knowledge, there is no other current potential accomplishing this task.

Motivating Example

There are a number of examples in biology where chemical reactions occurring within a cell are initiated by some signal or stimulus, followed by an ordered sequence of biochemical reactions. Often the term signal transduction is used to refer to such processes. One such example is epidermal growth factor (EGF) signaling^9,10. Motivated by this example, we construct a hypothetical system to demonstrate how the proposed formulation can be used to construct a Hamiltonian potential for it. Assume a system of three species, A, B and C, has an evolution dictated by the chemical equations

$$A + B \longrightarrow AB, \qquad AB + C \longrightarrow A + B + C.$$

The sequence in which these reactions happen defines logical “interaction rules” used to design the potential. Specifically, these rules are (1) when molecules A and B are close, and C is far, then A and B bond; and (2) if C approaches the AB complex, then A and B dissociate. This mechanism is visualized in Fig. 2.

Each of the species in this system can be modeled as a rod having two sites of interaction at the end points; atoms {1, 2} on A, atoms {3, 4} on B, and atoms {5, 6} on C (Fig. 2). Let us write the force field energy for this system. In general, it is composed of the bonded energies formed from the stretch, bending, and torsional terms, the non-bonded van der Waals and electrostatic terms, and the coupling terms²⁶. We can split the potential as

$$U = \sum_{i<j} \Phi_{(i,j)} + (\text{higher-order terms}),$$

where Φ_(i,j) denotes a pairwise interaction potential between two atoms i and j. These 2-atom terms encompass the stretching, torsional, van der Waals, and electrostatic terms and the higher-order terms include the bending energies and all the k-atom (k ≥ 3) coupling terms.

For this system, the bonded energy terms are composed of the stretch energies between the atom-atom pairs (1, 2), (3, 4) and (5, 6), which we can group into a term U_b. Assume that the only non-negligible non-bonded energy terms are the two van der Waals interactions between atoms 2 and 3 and atoms 2 and 5, and the coupling term between atoms 2, 3, and 5. Denoting these three terms by Φ_(2,3), Φ_(2,5), and Φ_(2,3,5), respectively, we get the force field energy of the system as

$$U = U_b + \Phi_{(2,3)} + \Phi_{(2,5)} + \Phi_{(2,3,5)}.$$

The inclusion of the 3-atom potential Φ_(2,3,5) is required in order to capture the transition of the pair (2, 3) from being in a stable (bound) configuration when atom 5 is not present to being in an unstable (free) configuration in the presence of the signaling atom 5.

While in general it may be hard to get the correct forms for the coupling term and other higher-order terms in the expansion, and thus the full potential, we know from the above observations that the effect of the potentials Φ_(2,5) and Φ_(2,3,5) is essentially to turn off Φ_(2,3) when 5 is close to 2. Rewriting the potential as

$$U = U_b + \Phi_{(2,3)} \left( 1 + \frac{\Phi_{(2,5)} + \Phi_{(2,3,5)}}{\Phi_{(2,3)}} \right),$$

this means that the term in parentheses is approximately 1 whenever atoms 2 and 5 are far and approximately 0 whenever atoms 2 and 5 are close. Instead of attempting to find the exact functional forms of Φ_(2,5) and Φ_(2,3,5), we approximate the potential as

$$U \approx U_b + S_{(2,3)}\, \Phi_{(2,3)},$$

where S_(2,3) is an encoding function that acts as a switch turning Φ_(2,3) on and off. In this example, the encoding function is only a function of the distance between atoms 2 and 5. The encoding function takes values between 0 and 1; it is approximately 0 when atoms 2 and 5 are close, and approximately 1 when atoms 2 and 5 are far. Thus it encodes the logic of the coarse-level observed behavior of the system. It is an approximation of the other terms:

$$S_{(2,3)} \approx 1 + \frac{\Phi_{(2,5)} + \Phi_{(2,3,5)}}{\Phi_{(2,3)}}. \tag{1}$$

In the rest of the article, we make this approximation idea (Eq. (1)) precise and derive approximate N-body potentials from simple pairwise interactions that respect observed coarse-level behavior. We give a systematic procedure to construct the encoding functions, which allows us to handle systems with more complex interaction rules. We demonstrate the procedure with three examples. We also run molecular dynamics simulations with the derived potentials, which show that the potentials accurately capture the relevant physics.

There are a few items to keep in mind as motivation for the abstract concepts to follow. The basic building blocks for the N-body potential are pairwise interaction potentials (denoted by Φ_(2,3) and Φ_(2,5) for the above example). The explicit form of these potentials can be inferred from the experimental data or ab initio calculations. We approximate the effect of the un-modeled potentials by modifying the relevant pairwise potentials with an encoding function. The encoding function only turns the corresponding potential on and off. The functional form of the potential does not change; it is only scaled between 0 and 1. The logic contained in the encoding functions is obtained from experimental observations or ab initio simulations and the logic only depends on pairwise distances between particles, except the pairwise distance used in the associated potential function; e.g. the logic corresponding to Φ_(2,3) cannot depend on the distance between atoms 2 and 3.

Methodology

It is assumed that there are M interacting entities in a domain 𝒟, where 𝒟 ⊂ ℝ^d, for d = 1, 2, or 3. Each entity is modeled by a finite number of particles with constraint forces between the particles; the totality of these particles over all the entities is labeled from 1 to N. This allows us to treat point particles as well as rigid and non-rigid bodies. The configuration space is 𝒟^N. A particular system configuration, x ∈ 𝒟^N, takes the form x = (x_1, …, x_N), where x_j ∈ 𝒟 describes the position of particle j in the domain 𝒟.

The dynamics of the system is driven by a potential gradient and external forces. Specifically, the functional form of the dynamics is

$$m_i \ddot{x}_i = -\nabla_i U(x) + F_i, \qquad i = 1, \dots, N, \tag{2}$$

where m_i is the mass of atom i; ∇_i is the gradient operator in the configuration space with respect to the position x_i of atom i; F_i collects the external forces on particle i such as external electric or magnetic fields as well as stochastic effects or boundary constraints; and the approximate potential U is defined as

$$U(x) = \sum_{p \in \mathcal{P}} \sum_{j=1}^{\mu(p)} S_{p,j}(x)\, \Phi_{p,j}(x).$$

The notation in this equation is as follows.

𝒫, set of interacting pairs of atoms

𝒫 defines which pairs of atoms interact. For example, if p = (2, 3) is in 𝒫, then there exists a pairwise potential between atoms 2 and 3. Since not every pair of atoms in the system has to interact, 𝒫 can be a strict subset of {1, …, N} × {1, …, N}.

μ, multiplicity function

The multiplicity function μ(p) determines how many potentials the atom pair p = (p₁, p₂) interacts through. Often μ(p) will be 1 for every atom pair p. However, non-unit values become important when a pair interacts through multiple different potentials, each with its own encoding function. For example, a non-unit multiplicity is useful when modeling bond-breaking chemical reactions. Initially, two atoms interact through their bond potential; when this bond is broken, another potential is required to model the electron-electron repulsion between the atoms.

Φ_p,j, pairwise interaction potential

Φ_p,j is the j^th interaction potential for the pair of atoms p = (p₁, p₂). The index j runs from 1 to μ(p). For x ∈ 𝒟^N, it takes the form

$$\Phi_{p,j}(x) = \phi_{p,j}\big( \lVert \pi_{p_1}(x) - \pi_{p_2}(x) \rVert \big),$$

where for every i ∈ {1, …, N}, the coordinate map π_i extracts the position x_i of atom i from the configuration vector x; the norm ‖·‖ denotes the standard Euclidean norm; and φ_p,j is the j^th 1D pairwise interaction potential through which the pair p interacts. This potential could be, for example, a Lennard-Jones or Morse potential; it can also be different for different interaction pairs. The form given for the potential shows that it is only a function of the distance between x_p₁ and x_p₂. When μ(p) = 1, we drop the j index from Φ_p,j and write it as Φ_p.

S_p,j, encoding function

S_p,j is the encoding function associated with the potential Φ_p,j. It encodes the coarse-level interaction rules and it is a function of pairwise distances between particles, except for the particle pair p to which it corresponds. That is, for the interaction pair p = (p₁, p₂), the encoding function S_p,j is not a function of the distance ‖x_p₁ − x_p₂‖. The effect of S_p,j is to smoothly turn its associated potential function on and off based on the configuration of the system. Since the encoding functions and potentials are functions of relative distances only, Equation (2) defines a Hamiltonian system²⁹ when we neglect the forces F_i. When μ(p) = 1, we drop the j index from S_p,j and write it as S_p.
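The way the pieces above combine can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `pair_distance`, `approximate_potential`, and the `terms` dictionary are hypothetical, and the toy harmonic term is chosen only to make the arithmetic easy to follow. Each pair maps to a list of (encoding function, 1D potential) tuples, and the length of that list plays the role of the multiplicity of the pair.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[int, int]                # atom labels (p1, p2), 1-based as in the text
Config = Sequence[Sequence[float]]    # positions x_1..x_N, stored 0-based
# One term per (pair, j): (encoding function S_{p,j}, 1D potential phi_{p,j})
Term = Tuple[Callable[[Config], float], Callable[[float], float]]


def pair_distance(x: Config, p: Pair) -> float:
    """Euclidean distance between the two atoms of pair p."""
    return math.dist(x[p[0] - 1], x[p[1] - 1])


def approximate_potential(x: Config, terms: Dict[Pair, List[Term]]) -> float:
    """Sum S_{p,j}(x) * phi_{p,j}(|x_p1 - x_p2|) over all pairs p and indices j."""
    total = 0.0
    for p, pair_terms in terms.items():   # len(pair_terms) acts as the multiplicity
        r = pair_distance(x, p)
        for encoding, phi in pair_terms:
            total += encoding(x) * phi(r)
    return total


# Toy usage (hypothetical values): two atoms on a line, one always-on harmonic term.
x = [[0.0], [2.0]]
terms = {(1, 2): [(lambda cfg: 1.0, lambda r: 0.5 * (r - 1.0) ** 2)]}
energy = approximate_potential(x, terms)
```

A pair with multiplicity 2, such as a bond term plus a repulsion term, is represented by simply listing two tuples for the same key.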

The majority of the rest of the paper develops the encoding functions and their properties and shows, using a few examples, how one would go from coarse-level interaction rules to encoding functions. It is assumed that the coarse-level interaction rules and the interaction potentials are known. These come from analyzing experimental data or ab initio simulations and are thus application specific and beyond the scope of this article. Ultimately, the encoding function S_p,j will be a smoothed version (made precise later) of a logic function L_p,j, which assigns 0 or 1 to each configuration x ∈ 𝒟^N. The logic function L_p,j will be constructed from a finite number of logical operations applied to elementary logic functions from a Boolean algebra. More precisely, the logic function will be an element of the Boolean sub-algebra generated by elementary logic functions. Thus, to define the logic functions, it is required to know the specific definitions of the logical operators AND, OR, and NOT (symbolically denoted ∧, ∨, ¬) and what Boolean functions are used as the elementary logic functions.

A function f: 𝒟^N → {0, 1}, which assigns either 0 or 1 to each configuration vector x ∈ 𝒟^N, is called a Boolean function on 𝒟^N, and the set of all such functions on 𝒟^N is denoted as ℬ(𝒟^N). It is easy to see that the functions that are identically 1 and 0 on 𝒟^N are Boolean functions. On ℬ(𝒟^N), define for all f, g ∈ ℬ(𝒟^N) the two binary logical operations AND (∧) and OR (∨) and the unary logical operation NOT (¬) by

$$(f \wedge g)(x) = f(x)\,g(x), \qquad (f \vee g)(x) = f(x) + g(x) - f(x)\,g(x), \qquad (\neg f)(x) = 1 - f(x).$$

These three logical operations will be applied to specific elements of the set of all Boolean functions on 𝒟^N to generate a Boolean sub-algebra. The logic functions L_p,j will be elements of this sub-algebra.
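As a small illustration of logical operations acting on Boolean functions, the sketch below assumes the usual arithmetic encoding (product for AND, `f + g - f*g` for OR, `1 - f` for NOT), which is consistent with the later remark that logic functions are built from pairwise products and sums. The helper names `AND`, `OR`, `NOT`, and `const` are hypothetical.

```python
from itertools import product


def AND(f, g):
    """Pointwise AND of two {0,1}-valued functions (assumed product form)."""
    return lambda x: f(x) * g(x)


def OR(f, g):
    """Pointwise OR of two {0,1}-valued functions (assumed form f + g - f*g)."""
    return lambda x: f(x) + g(x) - f(x) * g(x)


def NOT(f):
    """Pointwise NOT of a {0,1}-valued function."""
    return lambda x: 1 - f(x)


def const(v):
    """Boolean function that is identically v on every configuration."""
    return lambda x: v


# Truth table over constant functions: rows are (a, b, a AND b, a OR b, NOT a).
truth_table = [
    (a, b, AND(const(a), const(b))(None), OR(const(a), const(b))(None), NOT(const(a))(None))
    for a, b in product((0, 1), repeat=2)
]
```

Because the operations return new functions, they can be nested freely, e.g. `NOT(OR(f, g))`, which is how composite logic functions are assembled.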

Proximity functions are used to define the elementary logic functions. A proximity function has the form P_R(r) = χ_[0,R)(r), for some R satisfying 0 ≤ R ≤ ∞. The function χ_[0,R) is the indicator function for the semi-open interval [0, R), which takes the value 1 if the argument satisfies 0 ≤ r < R and 0 otherwise. Note that the functions that are identically 1 or 0 are proximity functions. The elementary logic functions are defined as compositions of a proximity function with the coordinate functions π_i from above. Specifically, the elementary logic function for atom pair q = (q₁, q₂) and parameter 0 ≤ R ≤ ∞, written here as ℓ_q,R, has the form

$$\ell_{q,R}(x) = \chi_{[0,R)}\big( \lVert \pi_{q_1}(x) - \pi_{q_2}(x) \rVert \big). \tag{5}$$

This function is 1 when x_q₁ and x_q₂ are closer than distance R and 0 when not.

A logic function L_p,j is generated by applying finitely many of the logical operations ∧, ∨, and ¬ to the elementary logic functions (5) for any finite set of q’s, none of which are equal to p; that is, an elementary logic function for the pair p itself cannot be part of the definition of L_p,j. Each logic function L_p,j is continuous almost everywhere in 𝒟^N. This follows since it is composed from pairwise products and sums of elementary logic functions, each of which is locally constant, and hence continuous, almost everywhere.
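The proximity and elementary logic functions can be sketched as follows. The function names (`proximity`, `elementary`, `check_pairs_allowed`) are hypothetical, and atoms are labeled from 1 as in the text while positions are stored in a 0-based list. The last helper illustrates the rule that the logic for a pair p may not use the distance of p itself.

```python
import math


def proximity(R):
    """Indicator of the half-open interval [0, R); R may be 0 or math.inf."""
    def P(r):
        return 1 if 0 <= r < R else 0
    return P


def elementary(q, R):
    """Elementary logic function for atom pair q: 1 if the atoms are closer than R."""
    P = proximity(R)

    def ell(x):
        return P(math.dist(x[q[0] - 1], x[q[1] - 1]))
    return ell


def check_pairs_allowed(p, qs):
    """Raise if any pair in qs coincides with p (order-insensitive)."""
    if any(set(q) == set(p) for q in qs):
        raise ValueError(f"logic for pair {p} must not depend on its own distance")


# Toy usage: three atoms on a line (hypothetical positions).
x = [[0.0], [1.0], [5.0]]
close_12 = elementary((1, 2), 2.0)(x)   # atoms 1 and 2 are 1.0 apart
close_13 = elementary((1, 3), 2.0)(x)   # atoms 1 and 3 are 5.0 apart
```

With `R = math.inf` the proximity function is identically 1 and with `R = 0` it is identically 0, matching the two degenerate cases mentioned above.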

Once the logic function is specified, it must be translated into a smooth encoding function. Ideally, this would be accomplished via a convolution in the (dN-dimensional) configuration space with a smooth, nonnegative summability kernel (see Katznelson³⁰ for a definition). Analytically, this is intractable, and computationally, this is very expensive. Instead, we individually smooth each of the 1D elementary logic functions in the expression for L_p,j. This is done by replacing the proximity function of each elementary logic function with a smoothed version. Again, this could be done via the convolution (now 1-dimensional) of each proximity function with a smooth, 1D summability kernel or, alternatively, by the replacement of each indicator function with a specific functional form. We choose the latter approach and replace each proximity function χ_[0,R)(r) with a function of the form

$$h_{\alpha,n}(r) = \frac{1}{1 + (r/\alpha)^{n}}, \tag{6}$$

and we define h_0,n(r) = 0 and h_∞,n(r) = 1. For example, if the logic function has the expression

then the corresponding encoding function would be

for some choices of parameters α₁, α₂, n₁, and n₂. The parameters α and n control how well h_α,n approximates a proximity function (see Fig. 3). In particular, h_α,n(0) = 1 for any 0 < α < ∞ and positive n. Furthermore, h_α,n(r) → 0 as r → ∞ and it is strictly monotonically decreasing. On the other hand, for a fixed 0 < α < ∞, the transition from 1 to 0 becomes sharper as n increases (Fig. 3(b)). To match a specific indicator function χ_[0,R), choose α = R. With this choice of α, the function satisfies h_R,n(R) = 1/2 for all n ≥ 1.
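The replacement procedure can be illustrated with a short sketch. It assumes the Hill-type form h(r) = 1/(1 + (r/α)^n), which satisfies every property listed above (value 1 at r = 0, value 1/2 at r = α, monotone decay, sharper transition for larger n); the example logic function, "pair 1 is close AND pair 2 is NOT close", is hypothetical and chosen only to show one proximity function and one negated proximity function being replaced.

```python
import math


def h(alpha, n):
    """Smooth stand-in for the indicator of [0, alpha); assumed Hill-type form."""
    if alpha == 0:
        return lambda r: 0.0
    if math.isinf(alpha):
        return lambda r: 1.0
    return lambda r: 1.0 / (1.0 + (r / alpha) ** n)


def chi(R):
    """Sharp proximity function: indicator of [0, R)."""
    return lambda r: 1 if 0 <= r < R else 0


def logic_example(r1, r2, R1, R2):
    """Hypothetical logic function: (pair 1 close) AND NOT (pair 2 close)."""
    return chi(R1)(r1) * (1 - chi(R2)(r2))


def encoding_example(r1, r2, a1, n1, a2, n2):
    """Same expression with each indicator replaced by its smooth version."""
    return h(a1, n1)(r1) * (1.0 - h(a2, n2)(r2))


# Away from the thresholds the smooth version tracks the logic value closely.
sharp = logic_example(0.5, 6.0, 2.0, 2.0)
smooth = encoding_example(0.5, 6.0, 2.0, 12, 2.0, 12)
```

Choosing a larger exponent n makes the encoding function follow the logic function more tightly, at the cost of steeper gradients near r = α.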

Examples

To show the entire process, starting from coarse interaction rules and ending with the encoding functions, we apply the method to three examples in increasing order of complexity. The first example is a model for an inhibitor molecule system and is used to exhibit the core methodology of the modeling framework. This system can be considered as an extension of the signaling molecule example above (Fig. 2). The second example is a model for a simple bond breaking chemical reaction and makes use of the multiplicity function from the framework. It is shown that the bond dissociation energy is accurately captured in this framework. Numerical simulations show that (i) the system exhibits the same coarse-level behavior that was used to derive the potential and (ii) biased chemical reactions are easily handled. The final example is a simple model for DNA transcription and is the most complicated of the three. This example shows that the logic, and hence the potential, of real systems can be captured in the modeling framework in a straightforward manner.

Simple inhibitor molecule mechanism

This example can be thought of as a simple model for the action of an inhibitor molecule in a plane. Consider the three interacting molecules in Fig. 4. The configuration space for this example is (ℝ²)⁶, where a configuration x is written as x = (x_1, …, x_6). The set of interacting atom pairs is {(2, 3), (2, 5), (3, 6)}. For this example, the multiplicity function is identically 1. Thus we have the pairwise potentials Φ_(2,3), Φ_(2,5), and Φ_(3,6). It is assumed that these potentials are formed using a Morse potential (see (12)). Molecule C is an inhibitor molecule and prevents the formation of the AB complex. Without C, we have A + B → AB. With C present, there are two possibilities: (i) A + B + C → AC + B or (ii) A + B + C → A + BC.

This behavior is captured in the logic functions L_(2,3), L_(2,5) and L_(3,6). The logic function L_(2,3) is 0, i.e., the potential Φ_(2,3) is turned off, when either atom 5 is close to atom 2 or atom 6 is close to atom 3. This is different from the motivating example, which only turned off the potential if 2 and 5 were close and in which the bonds between 2 and 5 or 3 and 6 never formed. Additionally, if AC has formed (atoms 2 and 5 close), then BC cannot form, i.e., L_(3,6) = 0 and Φ_(3,6) is off. Similarly, if BC has formed (atoms 3 and 6 close), then AC cannot form. Table 1 captures this logic. As a general rule when determining the logic, the default state for all the potentials should be set to “on” except when encoding a specific mechanism. In this example, this corresponds to the first row of Table 1, which says that the values of all the logic functions are 1 when all of the atoms are far apart. This means that the associated potentials are turned on. This is exactly the behavior we want since the inhibitor mechanism is inherently a short-range phenomenon and thus we do not want the mechanism to be active when all the particles are far apart. However, since the atoms are all far apart, the long-range behavior of the potential is dominant. For a Lennard-Jones or a Morse potential, this means there is a weak attractive force between the pairs of atoms.

Table 1 Bond logic for the inhibitor molecule mechanism.

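We need to specify what is meant by “close”. We assume that “close” in this case is determined by experiments to mean being within the distances R_(2,5) and R_(3,6) for the pairs (2, 5) and (3, 6), respectively. Thus, atoms 2 and 5 are close when the elementary logic function ℓ_(2,5),R_(2,5) (the proximity function χ_[0,R_(2,5)) applied to the distance between atoms 2 and 5) evaluates to 1 and not close when it evaluates to 0. Using the table, the logic function L_(2,3) corresponding to the interaction potential Φ_(2,3) can be written as

$$L_{(2,3)} = \neg\, \ell_{(2,5),R_{(2,5)}} \wedge \neg\, \ell_{(3,6),R_{(3,6)}}. \tag{7}$$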

The other logic functions are

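To turn the logic functions into encoding functions, replace each of the proximity functions, χ_[0,R), in (7–9) with its smooth version, h_R,n (Eq. (6)). The encoding function S_(2,3) corresponding to L_(2,3) is

$$S_{(2,3)}(x) = \Big[ 1 - h_{R_{(2,5)},\, n_{(2,5)}}\big( \lVert x_2 - x_5 \rVert \big) \Big] \Big[ 1 - h_{R_{(3,6)},\, n_{(3,6)}}\big( \lVert x_3 - x_6 \rVert \big) \Big]. \tag{10}$$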

The approximate potential for this system is

$$U(x) = S_{(2,3)}(x)\, \Phi_{(2,3)}(x) + S_{(2,5)}(x)\, \Phi_{(2,5)}(x) + S_{(3,6)}(x)\, \Phi_{(3,6)}(x). \tag{11}$$

The original configuration space was 12-dimensional. However, (11) is 8-dimensional since it only depends on four atoms (the four unique atoms 2, 3, 5, and 6 that make up the interacting pairs). Thus we were able to reduce the dimension of the configuration space and still capture the relevant physics.
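A minimal sketch of the encoding function for the A–B potential follows. It implements the rule stated above (Φ_(2,3) is off when atom 5 is close to atom 2 or atom 6 is close to atom 3) with the smooth replacement, assuming the Hill-type form for h. The threshold distances, the exponent, and the atom positions are hypothetical placeholders, not the values from the Supplementary Information.

```python
import math


def h(r, alpha, n):
    """Assumed smooth proximity function 1 / (1 + (r/alpha)^n)."""
    return 1.0 / (1.0 + (r / alpha) ** n)


def S_23(x, R_25=3.0, R_36=3.0, n=12):
    """Smooth switch for the (2,3) potential; x holds atoms 1..6 at indices 0..5."""
    d25 = math.dist(x[1], x[4])   # distance between atoms 2 and 5
    d36 = math.dist(x[2], x[5])   # distance between atoms 3 and 6
    return (1.0 - h(d25, R_25, n)) * (1.0 - h(d36, R_36, n))


def L_23(x, R_25=3.0, R_36=3.0):
    """Sharp logic counterpart: 1 only if neither inhibitor site is close."""
    d25 = math.dist(x[1], x[4])
    d36 = math.dist(x[2], x[5])
    return int(not (d25 < R_25) and not (d36 < R_36))


# Planar toy configurations (hypothetical): C far away, then atom 5 next to atom 2.
far = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0), (2.0, 20.0), (4.0, -20.0)]
near = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0), (2.0, 1.0), (4.0, -20.0)]
switch_far, switch_near = S_23(far), S_23(near)
```

Only the positions of atoms 2, 3, 5, and 6 enter the switch, which reflects the dimension reduction noted above.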

Here, we are only interested in demonstrating the methodology qualitatively, so we make the approximation that the derivative of each encoding function is 0 almost everywhere (this would be the case if the logic functions were used in place of the encoding functions in (11)). With this approximation the force only consists of terms of the form −S_p ∇Φ_p. One realization of the inhibitor molecule system (11) simulated in LAMMPS²⁵ with this approximation is shown in Fig. 5. The potentials Φ_p are formed from Morse potentials

$$\phi(r) = D \left[ e^{-2a(r - r^{eq})} - 2\, e^{-a(r - r^{eq})} \right], \tag{12}$$

where D is the dissociation energy, r^eq is the equilibrium distance of the bond, and a is a parameter controlling the width of the potential well. Simulations are performed by solving the Langevin equations at constant temperature (i.e. sampling the canonical NVT ensemble). The parameters used in the computations are given in Supplementary Table I. Initially, the AB complex is formed. Around 30 fs, C comes close enough to turn off the AB bond; BC then forms and can diffuse away from A. This remains the case until around 450 fs, when A approaches BC, the BC bond turns off, and the AC bond turns on.
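The force under the zero-derivative approximation can be sketched as below. The Morse form used here, D[exp(−2a(r − r_eq)) − 2 exp(−a(r − r_eq))], is an assumption: it is the variant whose energy tends to zero at large separation, which fits a potential that is scaled on and off. The function names and the numerical values are hypothetical.

```python
import math


def morse(r, D, a, r_eq):
    """Assumed Morse form with minimum -D at r_eq and zero energy at infinity."""
    e = math.exp(-a * (r - r_eq))
    return D * (e * e - 2.0 * e)


def morse_derivative(r, D, a, r_eq):
    """d(morse)/dr for the form above."""
    e = math.exp(-a * (r - r_eq))
    return -2.0 * a * D * (e * e - e)


def switched_force_on_first(xi, xj, switch, D, a, r_eq):
    """Force on atom i from switch * morse(|xi - xj|), treating the switch as
    constant (the approximation that the encoding-function gradient is zero)."""
    r = math.dist(xi, xj)
    scale = -switch * morse_derivative(r, D, a, r_eq) / r
    return [scale * (ci - cj) for ci, cj in zip(xi, xj)]


# Toy usage: stretched bond (r = 3 > r_eq = 2), switch fully on and then off.
force_on = switched_force_on_first([0.0, 0.0], [3.0, 0.0], 1.0, D=1.0, a=1.5, r_eq=2.0)
force_off = switched_force_on_first([0.0, 0.0], [3.0, 0.0], 0.0, D=1.0, a=1.5, r_eq=2.0)
```

With the switch on, the stretched bond pulls atom i toward atom j; with the switch at zero the pair exerts no force, which is the on/off behavior the logic is meant to produce.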

Supplementary Movie 1 shows a simulation of the inhibitor molecule mechanism.

Modeling a bond breaking chemical reaction

We model a reversible, bond breaking, chemical reaction. In particular, we will model the reaction

$$AB + C \rightleftharpoons AC + B.$$

Modeling such chemical reactions is difficult with traditional force field methods since they cannot describe changes in the electronic structure and, thus, are unable to describe bond-breaking, bond-forming, charge transfer, etc., of the system undergoing a reaction^4,31. Rather than solving the quantum mechanical equations, we take a coarse-level approach and approximate the bond breaking mechanism with logic functions.

This example makes use of the multiplicity function in (2) in order to model the electron-electron repulsion during the transition state. It also shows that the use of a smooth encoding function accurately accounts for the bond dissociation energy. Let x be the configuration vector for this system (see Fig. 6).

Table 2 lists all the potentials involved in modeling the system. The potential Φ_(1,3),1 is the potential energy of the bond between A and B when they form the stable molecule AB, for example a Lennard-Jones or Morse potential. Similarly, Φ_(2,5),1 is the bond potential energy between A and C when they form the stable molecule AC. They have the associated logic functions L_(1,3),1 and L_(2,5),1, respectively. The potentials Φ_(1,3),2 and Φ_(2,5),2 are used to model electron-electron repulsion during the transition state when the reactant bonds have broken and the product bonds have not yet formed. The secondary potentials are usually taken to be the repulsive part of the associated bond potential.

Table 2 Potentials in chemical reaction model.

Table 3 lists the logic rules for this system. Consider the forward reaction AB + C → AC + B. We model the bond breaking mechanism by turning off the stable bond, Φ_(1,3),1, when C gets “close enough” to A. When C moves within the distance to A, the AB bond (Φ_(1,3),1) turns off. Similarly, for the backwards reaction, the AC bond (Φ_(2,5),1) turns off when B gets within a distance of A. The logic functions are logical NOT’s of the x₂–x₅ and x₁–x₃ proximity functions:

Table 3 Bond-breaking logic rules.

We assume that and . With this assumption, the AC bond turns off before B reaches its equilibrium bond length with A.

The use of the smooth encoding function in the potential (as opposed to the logic function) allows the transfer of the correct amount of energy from C to AB in order to break the bond; C must transfer an amount of energy equivalent to the bond dissociation energy D_AB of the AB bond in order to turn off the Φ_(1,3),1 potential. We refer the reader to Sec. III.A in the Supplementary Information for the derivation.
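The energy bookkeeping behind this statement can be illustrated numerically. The sketch below is not the derivation from the Supplementary Information; it only evaluates the switched AB bond term at the AB equilibrium length as C approaches A. It assumes the Hill-type smooth proximity function and a Morse bond with minimum −D_AB; the parameter names (`R_off`, `n`) and values are hypothetical.

```python
import math


def h(r, alpha, n):
    """Assumed smooth proximity function 1 / (1 + (r/alpha)^n)."""
    return 1.0 / (1.0 + (r / alpha) ** n)


def morse(r, D, a, r_eq):
    """Assumed Morse form with minimum -D at r_eq and zero energy at infinity."""
    e = math.exp(-a * (r - r_eq))
    return D * (e * e - 2.0 * e)


def ab_bond_term(r_AB, r_AC, D_AB, a, r_eq, R_off, n):
    """Switched AB bond energy: NOT(C close to A), smoothed, times the bond potential."""
    switch = 1.0 - h(r_AC, R_off, n)
    return switch * morse(r_AB, D_AB, a, r_eq)


# Hypothetical parameters; AB is held at its equilibrium length.
params = dict(D_AB=1.0, a=1.5, r_eq=2.0, R_off=3.0, n=12)
energy_far = ab_bond_term(2.0, 30.0, **params)    # C far from A: bond fully on
energy_close = ab_bond_term(2.0, 1.0, **params)   # C close to A: bond switched off
energy_needed = energy_close - energy_far
```

Because the switch is smooth, the AB term rises continuously from about −D_AB to about 0 as C approaches, so the approach of C has to supply roughly D_AB of energy; with a sharp logic function the term would jump discontinuously instead.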

Consider the situation occurring directly after a successful collision of C with AB. In this case, A and B are close to their equilibrium distance and A and C are closer than the AB-bond dissociation distance. In this state, the bonds are weak and neither AB nor AC is stable; the system is at its transition state. In this transition state the forces experienced by the molecules due to the bond potentials Φ_(1,3),1 and Φ_(2,5),1 are small since the encoding functions and their partial derivatives are small, and thus the bond potentials are approximately “off”. The dynamics are dominated by noise and the residual momentum of the molecules.

In this transition state, the electron-electron repulsion should be directly accounted for via a short-range repulsion potential between the molecules; usually this is the repulsive part of the associated bond potential. The logic functions are defined such that these repulsion forces are only “on” when the system is in its transition state. This is easily accomplished. Denote the short-range repulsion potential between A and C by Φ_(2,5),2. This force is defined such that , for . This force is turned on when . The logic function for the A-C repulsion is

Similarly, repulsions between A-B and B-C can be defined with logic functions similar to the above one.

There are three possible outcomes when the system exits its transition state: (i) AC forms a stable molecule, (ii) AB re-forms, or (iii) no bonds are formed and all the molecules are free. The outcome depends on the equilibrium distances of the bonds, the dissociation distances, the incoming momentum of C, and the repulsion forces. Figure 7 shows the two most probable outcomes for a single AB + C event.

For our simulations, the AB and AC bonds (Φ_(1,3),1 and Φ_(2,5),1, respectively) are given by Morse potentials, (12). In simulations, only the short-range A-B and A-C electron-electron repulsions are modeled, and they are only active during the transition state. The form of these repulsions is chosen as the repulsive part of a Morse potential with the same parameters as the full potentials used for the AB and AC bonds. The logic function for the A-C repulsion potential, Φ_(2,5),2, is given by (14) with obvious modifications for Φ_(1,3),2. The associated encoding functions are given by the normal replacement procedure. The full potential used during the numerical experiments is given in (15).
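As a minimal sketch of the choice described above, the Python snippet below splits a Morse potential into its repulsive and attractive parts. The functional form, the function names, and the values of D and a are assumptions for illustration (Eq. (12) and the parameters in Supplementary Information Table II are not reproduced here); the 2 Å equilibrium length matches the AB bond length quoted for the simulations in this section.

```python
import math

def morse(r, D, a, r0):
    """Morse potential in the form that is 0 at infinite separation
    and -D at the equilibrium distance r0 (assumed form)."""
    x = math.exp(-a * (r - r0))
    return D * (x * x - 2.0 * x)

def morse_repulsive(r, D, a, r0):
    """Repulsive part of the Morse potential above: always positive
    and decaying with distance, so it only matters at short range."""
    x = math.exp(-a * (r - r0))
    return D * x * x

def morse_attractive(r, D, a, r0):
    """Attractive part; repulsive + attractive recovers the full potential."""
    x = math.exp(-a * (r - r0))
    return -2.0 * D * x

# Hypothetical parameters: D and a are illustrative, r0 = 2 Angstrom.
D, a, r0 = 1.0, 1.5, 2.0
for r in (1.5, 2.0, 3.0, 5.0):
    print(r, morse(r, D, a, r0), morse_repulsive(r, D, a, r0))
```

In this form the well depth equals D, which is the energy that must be supplied to dissociate the bond, and the repulsive part reuses the same three parameters as the full bond potential.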

The force derived from (15) is used in LAMMPS²⁵ to simulate the system for an unbiased and a biased potential (parameters in Supplementary Information Table II). The parameters of the first simulation are chosen so that the AB and AC bonds are symmetric (D_AC/D_AB = 1). In this case, the chemical reaction is unbiased and, averaged over all realizations of the noise, the amount of time AB is formed is expected to equal the amount of time AC is formed. Figure 8 shows the potential energy for this simulation (Fig. 8(a)), the corresponding level sets (Fig. 8(b)), and a typical realization of the simulation (Fig. 8(c)). In the energy surface plot and the level set plot, the symmetry of the potential is evident. The realization shown in Fig. 8(c) starts with AB near its equilibrium length (2 Å) with C far from A. The realization shows approximately equal times that AB and AC are formed. The deviation is due to this being a single realization of finite duration rather than an average over an ensemble of realizations.

The parameters of the second simulation are chosen so that the reaction is biased in favor of AC. With the chosen parameters (D_AC/D_AB = 2), the AC bond is twice as stable as AB. Figure 9 shows the potential energy for this simulation (Fig. 9(a)), its corresponding level sets (Fig. 9(b)), and a realization of the simulation (Fig. 9(c)). In the energy surface plot and the level set plot, the asymmetry of the potential is evident. The realization shown in Fig. 9(c) starts with AB near its equilibrium length (2 Å) with C far from A. In this particular realization AC forms very quickly. Figure 9(c) shows the bias towards the more stable AC. The system spends most of its time with a stable AC molecule with a relatively small amount of time with a stable AB molecule. Thus, biased reactions can be captured in the framework. A movie of a part of the unbiased reaction simulation can be found in Supplementary Movie 2.

DNA transcription model

The final example is inspired by DNA transcription²⁷,²⁸. The model consists of a promoter region (sites 1 and 2) to which RNA polymerase (RNA pol) binds (sites 3 and 4), and a four-nucleotide DNA strand, ACTG, to be transcribed (Fig. 10). As a first approximation of the transcription process, the movement of the RNA polymerase down the DNA chain and the unwinding/rewinding of the DNA have not been explicitly modeled.

In the absence of the RNA pol, the free nucleotides cannot bind to their complementary nucleotides in the four-nucleotide DNA strand (ACTG = (5, 6, 7, 8)). Once RNA pol binds to the promoter, the first nucleotide (A, atom 5) in the DNA strand can bind to the free version of its complementary nucleotide (U, atom 9). Before this binding happens, the remaining nucleotides in the strand (CTG, atoms 6, 7, 8) cannot bind with their (free) complementary nucleotides (atoms 12, 15, 18). Once A has bound to a free U nucleotide, the next nucleotide in the strand (C, atom 6) can bind with a free G nucleotide (atom 12), while the remaining two nucleotides (TG) still cannot bind with their complementary nucleotides. Once the free G has bound with C, the sugar and phosphate groups (atoms 11 and 13) on T and G can bind to start forming the backbone of the complementary RNA strand. This sequential process continues until each nucleotide in the original DNA strand ACTG has bound with its complementary nucleotide, resulting in the complementary RNA strand UGAC. At this point, the complementary strand and the RNA pol unbind from the original strand and promoter region, respectively.

Supplementary Table III lists the reaction potentials for each of the interacting pairs. The nucleotide base pairs interact via a hydrogen bond ϕ_H, whereas the sugar and phosphate groups covalently bond through ϕ_SP. The interaction potentials for the system can be easily read from this table (see Supplementary Information Sec. IV.A).

Let us step through the logic in the order the reaction occurs:

1. The bonds between the RNA pol and the promoter region (Φ_(1,3) and Φ_(2,4)) are “on” except when the complementary chain has formed and is still attached to the original base strand.
2. The A-U bond (Φ_(5,9)) is “on” when the RNA pol has bonded with the promoter and the complementary chain’s backbone has not fully formed. This second condition prevents the complementary strand from reattaching to the original DNA strand once it has been formed. It is “off” otherwise.
3. The C-G bond (Φ_(6,12)) is “on” when all of the following conditions are true: (1) RNA pol has bonded with the promoter, (2) the A-U bond has formed, and (3) the complementary backbone has not formed. It is “off” otherwise.
4. The sugar-phosphate group bond Φ_(11,13) turns “on” when (1) RNA pol has bonded with the promoter and (2) both the A-U and C-G bonds have formed. It remains “on” once the complementary backbone has formed. It is “off” otherwise.
5. The T-A bond (Φ_(7,15)) is “on” when all of the following conditions are true: (1) RNA pol has bonded with the promoter, (2) the A-U and C-G bonds have formed, (3) the (11, 13) sugar-phosphate bond has formed, and (4) the complementary backbone has not formed. It is “off” otherwise.
6. The sugar-phosphate group bond Φ_(14,16) turns “on” when (1) RNA pol has bonded with the promoter, (2) the A-U, C-G, and T-A bonds have formed, and (3) the (11, 13) sugar-phosphate bond has formed. It remains “on” once the complementary backbone has formed. It is “off” otherwise.
7. The G-C bond (Φ_(8,18)) is “on” when all of the following conditions are true: (1) the conditions in (10) are true, (2) the T-A bond has formed, and (3) the (14, 16) sugar-phosphate bond has formed. It is “off” otherwise.
8. The sugar-phosphate group bond Φ_(17,19) turns “on” when (1) RNA pol has bonded with the promoter, (2) the A-U, C-G, T-A, and G-C bonds have formed, and (3) the (11, 13) and (14, 16) sugar-phosphate bonds have formed. It remains “on” once the complementary backbone has formed. It is “off” otherwise.
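The eight rules above can be sketched as plain Boolean functions of the bond state. The Python snippet below is an illustrative reading of the rules, not the logic functions from the Supplementary Information. The names `State` and `active_potentials` are hypothetical, and three interpretations are assumptions: the backbone counts as "formed" when all three sugar-phosphate bonds exist, the chain counts as "attached" while any base-pair bond exists, and the ambiguous reference "(10)" in rule 7 is read as the conditions of rule 5.

```python
from dataclasses import dataclass

@dataclass
class State:
    pol: bool = False   # RNA pol bonded with the promoter
    au: bool = False    # A-U bond (5,9) formed
    cg: bool = False    # C-G bond (6,12) formed
    ta: bool = False    # T-A bond (7,15) formed
    gc: bool = False    # G-C bond (8,18) formed
    sp1: bool = False   # sugar-phosphate bond (11,13) formed
    sp2: bool = False   # sugar-phosphate bond (14,16) formed
    sp3: bool = False   # sugar-phosphate bond (17,19) formed

def active_potentials(s: State) -> dict:
    """Return which pairwise potentials are 'on' for a given bond state."""
    backbone = s.sp1 and s.sp2 and s.sp3           # assumption: all three SP bonds
    attached = s.au or s.cg or s.ta or s.gc        # assumption: any base pair left
    rule5 = s.pol and s.au and s.cg and s.sp1 and not backbone
    return {
        "pol-promoter": not (backbone and attached),                      # rule 1
        "A-U": s.pol and not backbone,                                    # rule 2
        "C-G": s.pol and s.au and not backbone,                           # rule 3
        "SP(11,13)": (s.pol and s.au and s.cg) or backbone,               # rule 4
        "T-A": rule5,                                                     # rule 5
        "SP(14,16)": (s.pol and s.au and s.cg and s.ta and s.sp1)
                     or backbone,                                         # rule 6
        "G-C": rule5 and s.ta and s.sp2,                                  # rule 7
        "SP(17,19)": (s.pol and s.au and s.cg and s.ta and s.gc
                      and s.sp1 and s.sp2) or backbone,                   # rule 8
    }

# Step through the first stages of the designed sequence.
for state in (State(), State(pol=True), State(pol=True, au=True)):
    on = [name for name, flag in active_potentials(state).items() if flag]
    print(state, "->", on)
```

Under these assumptions, a fully formed and still attached chain switches off both the base-pair bonds and the RNA pol bonds, while the sugar-phosphate bonds stay on, which is the release behavior the rules are designed to produce.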

A global potential derived from the above logic rules is given in Eq. (16). The exact form of the logic functions and the associated smooth encoding functions comprising the potential are given in the Supplementary Information. The derivation of the potential is not difficult, but lengthy. We refer the reader to the Supplementary Information Sec. 4 for the details.

Figure 11 shows a trace of the pairwise distances between atoms for a typical simulation using this potential in LAMMPS (parameters in Supplementary Information Table IV). We use the same qualitative approximation of the force as was used in the inhibitor molecule example. For simplicity, all the potentials are taken to be Morse potentials. The red trace (TF) corresponds to the distance between the RNA pol and the promoter region. The variables r_5,9 (cyan), r_6,12 (gold), r_7,15 (black), and r_8,18 (blue) correspond to the sites on the complementary A-U, C-G, T-A, and G-C pairs from the base strand and the free nucleotides. At the start, the RNA pol and the free nucleotides diffuse around in space. Around 900 ps, the RNA pol binds to the promoter region (TF trace ≈ 0). The free nucleotides then bind in the order of the designed logic. U binds to A (r_5,9 ≈ 0) around 1100 ps; G binds with C (r_6,12 ≈ 0) between 1800 and 1900 ps; A binds to T (r_7,15 ≈ 0) around 2400 ps; and finally C binds to G (r_8,18 ≈ 0) around 2700 ps. Once this final free nucleotide has bound to its complement, the complementary chain has finished forming and unbinds, as does the RNA pol. The RNA pol can rebind to the promoter region, but the complementary RNA strand cannot rebind to the original DNA strand. This is exactly the behavior designed into the potential. Supplementary Movie 3 in the Supplementary Information shows one simulation of the DNA transcription.

Conclusions

We have developed and demonstrated a methodology and mathematical framework for obtaining an approximate interaction potential for a system which respects known coarse-level behavior. This methodology develops a semi-empirical model for the system by encoding the known coarse-level physics into logic functions that then modify simple pairwise potentials. Each logic function’s only role is to turn its associated pairwise potential on or off. A smooth multi-body interaction potential is obtained by replacing each logic function with a smoothed variant. The reader may wish to think of the resulting approximate potential as a linear combination of pairwise potentials where instead of the coefficients taking scalar values, they are encoding functions capturing the coarse-level logic.
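To make the "linear combination with encoding-function coefficients" picture concrete, the Python sketch below gates one pairwise potential with a smoothed logic function of another pair's distance. It is a toy under stated assumptions: the logistic smoothing, the product/complement forms used for AND, OR, and NOT, the cutoff `r_cut`, the width `eps`, and all Morse parameters are illustrative choices, not the encoding functions or parameters defined in the paper or its Supplementary Information.

```python
import math

def smooth_step(x, eps=0.1):
    """Smooth stand-in for the unit step (logistic written via tanh,
    which avoids overflow); approaches a hard step as eps -> 0."""
    return 0.5 * (1.0 + math.tanh(x / (2.0 * eps)))

# Smooth counterparts of the logic gates, acting on values in [0, 1].
def s_not(a):
    return 1.0 - a

def s_and(a, b):
    return a * b

def s_or(a, b):
    return a + b - a * b

def morse(r, D=1.0, a=1.5, r0=2.0):
    """Morse pair potential with well depth D at r0 (hypothetical parameters)."""
    x = math.exp(-a * (r - r0))
    return D * (x * x - 2.0 * x)

def total_potential(r_p, r_q, r_cut=3.0):
    """Toy two-pair potential: pair q is switched 'on' only while pair p
    is bonded (r_p below the hypothetical cutoff r_cut)."""
    gate_q = smooth_step(r_cut - r_p)      # smoothed version of 'p is bonded'
    return morse(r_p) + gate_q * morse(r_q)

print(total_potential(10.0, 2.0))  # p unbound: q contributes almost nothing
print(total_potential(2.0, 2.0))   # p bound: both wells contribute
```

Because the gate is differentiable, the force on pair p picks up an extra term from the derivative of the gate, which is how a smoothed coefficient couples the pairs into a genuinely multi-body potential.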

Three relatively simple examples demonstrated our methodology: a simple inhibitor molecule mechanism, a chemical reaction with bond breaking, and a model inspired by DNA transcription. While these examples were simple and inspired by biophysical and chemistry applications, we stress that the methodology is quite general and not restricted to these application domains or only simple problems. Any system that is driven by a potential can utilize this methodology to its benefit.

The result of our procedure is the approximation of a complicated, high-dimensional potential with a lower-dimensional representation that still respects the relevant physics. A significant reduction in the dimensionality of the system is possible; instead of accounting for every interaction between a large number of components, we now only need as many variables as are needed to correctly model the coarse-level logic. In the bond breaking example, the potential capturing the logic was 8-dimensional, whereas the dimension of the configuration space was 12. The same system modeled at the quantum level is much more complicated. Since the bond breaking event is the relevant physics, the reduced order model is accurate enough for this purpose.

With this dimensional reduction, it becomes feasible to accurately simulate large, complicated systems within a computational design framework. The resultant models can be wrapped in an optimization loop as part of exploratory computational experiments, such as for the development of new drug therapies, or as part of an engineering design loop. This in turn allows for the faster and cheaper development of new technologies and products.

We note that the developed framework can potentially be used in reverse: not to approximate a given physical process whose coarse-grained logic is known, but to design molecular processes whose logic is prescribed by a designer. This is achieved by providing the designer with the specifications of molecules that can carry out the logic.

Additional Information

How to cite this article: Thakur, G. S. et al. Programmable Potentials: Approximate N-body potentials from coarse-level logic. Sci. Rep. 6, 33415; doi: 10.1038/srep33415 (2016).

References

1. Valastyan, J. S. & Lindquist, S. Mechanisms of protein-folding diseases at a glance. Disease Models & Mechanisms 7, 9–14, doi: 10.1242/dmm.013474 (2014).
2. van Duin, A. C. T., Dasgupta, S., Lorant, F. & Goddard III, W. A. ReaxFF: a reactive force field for hydrocarbons. J. Phys. Chem. A 105, 9396–9409, doi: 10.1021/jp004368u (2001).
3. Friesner, R. A. Ab initio quantum chemistry: Methodology and applications. PNAS 102, 6648–6653, doi: 10.1073/pnas.0408036102 (2005).
4. Aktulga, H. M., Pandit, S. A., van Duin, A. C. T. & Grama, A. Y. Reactive Molecular Dynamics: Numerical Methods and Algorithmic Techniques. SIAM J. Sci. Comput. 34, C1–C23, doi: 10.1137/100808599 (2012).
5. Gillespie, D. T. A rigorous derivation of the chemical master equation. Physica A: Statistical Mechanics and its Applications 188, 404–425, doi: 10.1016/0378-4371(92)90283-V (1992).
6. Kiel, C., Yus, E. & Serrano, L. Engineering signal transduction pathways. Cell 140, 33–47, doi: 10.1016/j.cell.2009.12.028 (2010).
7. Laub, M. & Goulian, M. Specificity in two-component signal transduction pathways. Annual Review of Genetics 41, 121–145, doi: 10.1146/annurev.genet.41.042007.170548 (2007).
8. Inui, M., Martello, G. & Piccolo, S. Microrna control of signal transduction. Nat Rev Mol Cell Biol 11, 252–263, doi: 10.1038/nrm2868 (2010).
9. Yarden, Y. & Sliwkowski, M. X. Untangling the erbb signalling network. Nature Reviews Molecular Cell Biology 2, 127–137, doi: 10.1038/35052073 (2001).
10. Sako, Y., Minoghchi, S. & Yanagida, T. Single-molecule imaging of egfr signalling on the surface of living cells. Nat Cell Biol 2, 168–172, doi: 10.1038/35004044 (2000).
11. Berger, B., Shor, P. W., Tucker-Kellogg, L. & King, J. Local rule-based theory of virus shell assembly. Proceedings of the National Academy of Sciences 91, 7732–7736 (1994).
12. Berger, B., King, J., Schwartz, R. & Shor, P. Local rule mechanism for selecting icosahedral shell geometry. Discrete Applied Mathematics 104, 97–111, doi: 10.1016/S0166-218X(00)00187-6 (2000).
13. Schwartz, R., Shor, P. W., Prevelige, P. E. & Berger, B. Local rules simulation of the kinetics of virus capsid self-assembly. Biophysical Journal 75, 2626–2636, doi: 10.1016/S0006-3495(98)77708-2 (1998).
14. Klavins, E., Ghrist, R. & Lipsky, D. A grammatical approach to self-organizing robotic systems. Automatic Control, IEEE Transactions on 51, 949–962, doi: 10.1109/TAC.2006.876950 (2006).
15. Klavins, E. Programmable self-assembly. Control Systems, IEEE 27, 43–56, doi: 10.1109/MCS.2007.384126 (2007).
16. Whitesides, G. M. & Grzybowski, B. Self-assembly at all scales. Science 295, 2418–2421, doi: 10.1126/science.1070821 (2002).
17. Whitesides, G. M. & Boncheva, M. Beyond molecules: Self-assembly of mesoscopic and macroscopic components. PNAS 99, 4769–4774, doi: 10.1073/pnas.082065899 (2002).
18. Vlasov, Y. A., Bo, X.-Z., Sturm, J. C. & Norris, D. J. On-chip natural assembly of silicon photonic bandgap crystals. Nature 414, 289–293, doi: 10.1038/35104529 (2001).
19. Gracias, D. H., Tien, J., Breen, T. L., Hsu, C. & Whitesides, G. M. Forming electrical networks in three dimensions by self-assembly. Science 289, 1170–1172, doi: 10.1126/science.289.5482.1170 (2000).
20. Licata, N. A. & Tkachenko, A. V. Errorproof programmable self-assembly of dna-nanoparticle clusters. Phys. Rev. E 74, 041406, doi: 10.1103/PhysRevE.74.041406 (2006).
21. Valentine, M. T. & Gilbert, S. P. To step or not to step? how biochemistry and mechanics influence processivity in kinesin and eg5. Current Opinion in Cell Biology 19, 75–81, doi: 10.1016/j.ceb.2006.12.011 (2007).
22. Thakur, G. S. Encoding Information in Coarse Grain Models for Self-Assembling Systems. Ph.D. thesis, University of California, Santa Barbara (2011).
23. Konnov, A. A. Remaining uncertainties in the kinetic mechanism of hydrogen combustion. Combustion and Flame 152, 507–528, doi: 10.1016/j.combustflame.2007.10.024 (2008).
24. Hong, Z., Davidson, D. F. & Hanson, R. K. An improved H2/O2 mechanism based on recent shock tube/laser absorption measurements. Combustion and Flame 158, 633–644, doi: 10.1016/j.combustflame.2010.10.002 (2011).
25. Plimpton, S. Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117, 1–19, doi: 10.1006/jcph.1995.1039 (1995).
26. Jensen, F. Introduction to Computational Chemistry. 2nd ed. (John Wiley & Sons, 2007).
27. Cramer, P. et al. Architecture of RNA Polymerase II and Implications for the Transcription Mechanism. Science 288, 640–649, doi: 10.1126/science.288.5466.640 (2000).
28. Hahn, S. Structure and mechanism of the RNA polymerase II transcription machinery. Nat Struct Mol Biol 11, 394–403, doi: 10.1038/nsmb763 (2004).
29. Arnold, V. Mathematical Methods of Classical Mechanics. 2nd ed., vol. 60 of Graduate Texts in Mathematics (Springer-Verlag, 1989).
30. Katznelson, Y. An Introduction To Harmonic Analysis. 3rd ed., Cambridge Mathematical Library (Cambridge University Press, 2002).
31. Lin, H. & Truhlar, D. G. QM/MM: what have we learned, where are we, and where do we go from here? Theoretical Chemistry Accounts 117, 185–199, doi: 10.1007/s00214-006-0143-z (2007).

Acknowledgements

Ryan Mohr and Igor Mezić received funding from Army Research Office grant ARO-MURI W911NF-14-1-0359.

Author information

Authors and Affiliations

Harvard University, John A. Paulson School of Engineering and Applied Sciences, Cambridge, 02138, MA, USA
Gunjan S. Thakur
Department of Mechanical Engineering, University of California Santa Barbara, Santa Barbara, 93106, CA, USA
Ryan Mohr & Igor Mezić

Contributions

The original conceptualization of the idea is due to G.S.T. and I.M. The precise formulation of the concept is due to G.S.T., I.M. and R.M. The details of the mathematical formulation are due to R.M., and the numerical simulations were done by G.S.T. The majority of the writing of the manuscript was done by R.M. with G.S.T. contributing. All authors discussed the results and commented on the manuscript at all stages.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Supplementary Movie 1

Supplementary Movie 2

Supplementary Movie 3

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Thakur, G., Mohr, R. & Mezić, I. Programmable Potentials: Approximate N-body potentials from coarse-level logic. Sci Rep 6, 33415 (2016). https://doi.org/10.1038/srep33415

Received: 26 April 2016
Accepted: 22 August 2016
Published: 27 September 2016
DOI: https://doi.org/10.1038/srep33415
