Introduction

Minimal models that capture the essence of complex phenomena have a rich history in the natural sciences. In condensed matter physics, insights into many phenomena have emerged from analytic theories of models, which use effective many-body Hamiltonians that succinctly capture the essence of the problems1. Examples include phase transitions, superfluidity and superconductivity. However, complex problems such as spin glasses2, structural glasses3,4 and a host of problems in biology, such as protein and RNA folding and the functions of macromolecules, have resisted solution by purely theoretical methods. These and other problems in materials science, in which a wide range of time, energy and length scales are intertwined, require well-designed computer simulations that capture the essential features of the systems. Although the temptation to use detailed atomic simulations in protein folding and more complicated problems is hard to resist, such an approach has given us only limited insights. In contrast, since the first classical molecular dynamics simulation, which reported a phase transition in hard-sphere systems5, it has been clear that coarse-grained (CG) models are often the only way to describe phenomena that involve an interplay of multiple energy and time scales. Nowhere is the need for CG models greater than in biology, where the self-assembly of macromolecules and their functions, which involve multiple partners, occur on time and length scales that span many orders of magnitude. In the context of protein and RNA folding, simulations using CG models, guided by theoretical concepts6,7,8,9,10, have unearthed the principles of self-assembly. 
More recently, models that were introduced to describe the folding of isolated proteins11,12,13,14,15 and RNA16 have been adopted and extended in novel ways to predict the functions of large complexes, such as ribosomes17 and molecular chaperones18, and processes such as enzyme catalysis19, protein insertion into membranes20 and the action of a number of motors21,22,23,24,25,26,27,28. These developments have resulted in a quiet revolution that has provided molecular insights into a variety of biological processes.

In the last two decades, fundamental breakthroughs in understanding the structural organization and dynamics of proteins, RNA and DNA have been achieved using theoretical concepts from polymer physics29 and CG simulations. Here we describe how simulations using a variety of CG models have successfully described dynamical processes in biology spanning a wide range of length scales. These achievements have been further extended to probe folding under cellular conditions30,31,32 and, more recently, to describe the functional dynamics of biological nanomachines18,22,23,24,26. The use of CG models and simple theoretical ideas has also found fruitful applications in many other areas of biology, such as gene networks, systems biology and the analysis of complex metabolic pathways.

Length scales determine extent of coarse-graining

Describing reality using models requires a level of abstraction that depends on the phenomenon of interest. For example, near a critical point, the exponents that describe the vanishing of the order parameter or the divergence of the correlation length are universal: they depend only on general features such as the dimensionality (d) and the symmetry of the order parameter, and are impervious to atomic details. These findings, which are rooted in the concepts of universality and the renormalization group33, also apply to the properties of polymers29. For example, the size of a long homopolymer and the distribution of its end-to-end distance depend only on the solvent quality, the degree of polymerization and d, but not on the details of the monomer structure29. However, on length scales of the order of a few nm, one has to contend with the chemical properties of the monomers.

In the absence of rigorous theoretical underpinnings, intuitive arguments and phenomenology are used in modelling complex biological processes (Box 1). Here too, the level of description depends on the length scale. In nucleic acids, at short length scales (l ≲ 5 Å), the detailed chemical environment determines the basic forces (hydrogen bonds and dispersion forces) between two nucleotides. On the scale l ≈ (1–3) nm, interactions between two bases, base stacks and the grooves of the nucleic acids become relevant. Understanding how RNA folds (l ≈ (1–3) nm) requires energy functions that provide at least a CG description of the nucleotides and of the interactions between them in the native state and in the excitations around the folded structure. On the persistence length scale, lp ≈ 150 bp ≈ 50 nm (ref. 34), and beyond, it suffices to treat dsDNA as a stiff elastic filament without explicitly modelling the base pairs. On scales l ≳ 1 μm, dsDNA behaves like a self-avoiding polymer35. On the scale of chromosomes (l ∼ mm), a much coarser description suffices. Thus, models for DNA, RNA and proteins vary because the scale of structural organization changes from nearly a mm in chromatin to several nm in the folded states of RNA and proteins.

Polymer models for dsDNA and chromosome structure

The length, L, of double-stranded DNA (dsDNA) often exceeds a few μm, whereas its persistence length is lp ≈ 50 nm. On these scales, global properties of dsDNA, such as the end-to-end distance and the dependence of lp on salt concentration, are not greatly affected by fluctuations of individual base pairs. Consequently, dsDNA can be treated as a fluctuating elastic material, for which the worm-like chain (WLC) is a suitable polymer model. On much longer scales (L ≳ 1 mm), which are relevant to chromosomes, the genomic material can be described as a flexible polymer. Using these scale-dependent models, a number of predictions for DNA organization and dynamics can be made.
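The crossover from a stiff rod to a flexible coil described above can be made concrete with the standard WLC result for the mean-square end-to-end distance, $\langle R^2\rangle = 2 l_p L\,[1 - (l_p/L)(1 - e^{-L/l_p})]$, which tends to $L^2$ for $L \ll l_p$ and to $2 l_p L$ for $L \gg l_p$. The Python sketch below evaluates it for dsDNA, assuming lp = 50 nm and a rise of 0.34 nm per base pair; the chain lengths are arbitrary examples.

```python
import math

LP_NM = 50.0       # persistence length of dsDNA, ~50 nm (from the text)
NM_PER_BP = 0.34   # assumed rise per base pair of B-DNA

def wlc_mean_square_end_to_end(L_nm: float, lp_nm: float) -> float:
    """Mean-square end-to-end distance <R^2> (nm^2) of a worm-like chain
    with contour length L_nm and persistence length lp_nm."""
    x = L_nm / lp_nm
    return 2.0 * lp_nm * L_nm * (1.0 - (1.0 - math.exp(-x)) / x)

for L_bp in (30, 150, 1500, 150_000):
    L_nm = L_bp * NM_PER_BP
    R = math.sqrt(wlc_mean_square_end_to_end(L_nm, LP_NM))
    # Compare with the rigid-rod limit (R = L) and the flexible-coil limit (R^2 = 2 lp L)
    coil = math.sqrt(2.0 * LP_NM * L_nm)
    print(f"L = {L_bp:>7} bp  R = {R:9.1f} nm  rod = {L_nm:9.1f} nm  coil = {coil:9.1f} nm")
```

Short fragments track the rod limit and long ones the coil limit, which is why a single WLC parameterization covers both regimes.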

Looping dynamics

Loop formation in biopolymers is an elementary process in the self-assembly of DNA, RNA and proteins. However, understanding cyclization kinetics is complicated because multiple length scales and internal chain modes are intertwined in bringing distant parts of the chain into proximity. For a short chain, the cyclization time scales as $\tau_c \propto L^{3/2}$, whereas $\tau_c \propto L^{2}$ as L increases36,37. The problem of cyclization becomes more challenging in the looping dynamics of dsDNA, an elementary process that is relevant to controlling gene expression and DNA condensation. In the CG model, a single pitch of the double helix, formed by 10.5 base pairs, is represented by one interaction centre (Fig. 1a). Thus, lp encompasses (14–15) CG interaction centres (lp ≈ 150 bp). The parameters of the bond and bending potentials along the chain are selected to reproduce the persistence length of dsDNA38,39, which allows various dynamical processes of dsDNA, such as stretching, looping and supercoiling, to be studied from the perspective of polymer physics. The ease of loop formation and the associated kinetics are characterized by L/lp. For L/lp < 1, the energy required to bend dsDNA makes cyclization difficult for short chains. In contrast, when L/lp ≫ 1, cyclization of the two ends becomes harder because of the loss of chain entropy. Theory and simulations using the CG model showed that τc is shortest when L/lp ≈ 2–3 (refs 40,41,42) (Fig. 1a). Interestingly, in the looping of dsDNA responsible for gene regulation in prokaryotes, L ≈ 100 bp (L/lp ≈ 0.7). For such short dsDNA, sequence effects are also relevant43,44,45.
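How a bending potential can be tuned to reproduce lp is sketched below under assumptions that the text does not specify: a bending energy κ(1 − cos θ) (in units of k_BT) between consecutive CG bonds, each spanning 10.5 bp, with freely rotating azimuths, so that ⟨cos θ⟩ = coth κ − 1/κ and lp = −b/ln⟨cos θ⟩. Bisection then finds the κ that gives lp = 150 bp. The function names are hypothetical, and the potentials used in refs 38,39 may differ.

```python
import math

BP_PER_BEAD = 10.5   # one CG centre per helical pitch (from the text)
LP_BP = 150.0        # target persistence length of dsDNA in bp (from the text)

def mean_cos(kappa: float) -> float:
    """<cos(theta)> between consecutive bonds for a bending energy
    kappa * (1 - cos(theta)) in units of k_B T (3D, isotropic azimuth).
    This is the Langevin function coth(kappa) - 1/kappa."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

def persistence_length_bp(kappa: float, bond_bp: float = BP_PER_BEAD) -> float:
    """Persistence length implied by <t_i . t_{i+n}> = <cos>^n."""
    return -bond_bp / math.log(mean_cos(kappa))

def stiffness_for_lp(lp_bp: float, bond_bp: float = BP_PER_BEAD) -> float:
    """Bisect for the stiffness kappa that reproduces lp_bp (lp grows with kappa)."""
    lo, hi = 1e-3, 1e3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if persistence_length_bp(mid, bond_bp) < lp_bp:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

kappa = stiffness_for_lp(LP_BP)
print(f"beads per lp = {LP_BP / BP_PER_BEAD:.1f}, kappa = {kappa:.2f} k_BT")
print(f"L/lp for a 100 bp loop = {100 / LP_BP:.2f}")
```

Under these assumptions κ comes out close to 15 k_BT, roughly lp/b, as expected for a stiff discrete chain.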

Figure 1: DNA applications.

(a) Loop formation times between two regions of dsDNA separated by s along the contour, from simulations using a CG model that represents a single pitch of the DNA helix as a monomer unit. Lines are theoretical results. (b) Extension as a function of mechanical force for 97 kb λ-DNA. Symbols are experimental results and the dashed line is the fit using the WLC model. (c) Model of bacterial chromosome segregation from simulations of a tightly confined polymer chain. The newly synthesized DNA (blue and red) is extruded to the periphery of the unreplicated nucleoid (grey), and the two strings of blobs drift apart and segregate owing to excluded-volume interactions and conformational entropy. (d) The top panel shows the scaling law $P(s) \sim s^{-1.08}$, where P(s) is the contact probability for a given genomic distance s, measured by Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. This exponent is distinct from the $s^{-1.5}$ decay expected for an equilibrium globule (bottom left), whereas the $s^{-1.08}$ scaling (dashed lines from CG simulations in the top panel) is explained by a fractal globule (bottom right), a knot-free polymer conformation that enables reversible folding and unfolding at any genomic locus. Panels a–d were adapted from refs 42 (reproduced with permission, © American Institute of Physics), 34 (reproduced with permission from AAAS), 48 and 51 (reproduced with permission from AAAS), respectively.

Stretching dsDNA

Fluctuations of dsDNA on scales comparable to lp can be described using the WLC model, which parameterizes dsDNA as a polymer that resists bending on the scale lp. Smith et al. measured the response of a 97 kbp dsDNA (contour length L ≈ 33.0 μm) from λ-phage to a stretching force, f (refs 34,46) (Fig. 1b). In the absence of f, λ-DNA conformations are determined by thermal fluctuations, whereas the loss in chain entropy must be overcome to stretch dsDNA (f ≠ 0). The free energy of stretching a semiflexible chain under tension is equivalent to the quantum mechanical problem of a dipolar rotor with moment of inertia lp in an electric field f. An interpolation formula is used to accurately describe the measured force as a function of extension (Fig. 1b). Fits to the experimental data yield L = (32.80 ± 0.10) μm and lp ≈ (53.4 ± 2.3) nm for λ-DNA, thus confirming most directly that dsDNA is a semiflexible chain.
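The interpolation formula most commonly used for such fits is that of Marko and Siggia, $f l_p/k_BT = x/L + 1/[4(1 - x/L)^2] - 1/4$; we assume it is the formula meant here. The sketch below evaluates it with the fitted L and lp quoted above and an assumed k_BT = 4.11 pN nm (about 298 K), and inverts it numerically to obtain the extension at a given force.

```python
KT_PN_NM = 4.11   # thermal energy at ~298 K in pN*nm (assumed)
LP_NM = 53.4      # persistence length from the fit quoted in the text
L_NM = 32_800.0   # contour length of the lambda-DNA construct, 32.8 um

def marko_siggia_force(x_nm, L_nm=L_NM, lp_nm=LP_NM, kT=KT_PN_NM):
    """Force (pN) needed to hold a worm-like chain at extension x_nm."""
    z = x_nm / L_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must satisfy 0 <= x < L")
    return (kT / lp_nm) * (z + 1.0 / (4.0 * (1.0 - z) ** 2) - 0.25)

def extension_at_force(f_pn, L_nm=L_NM, lp_nm=LP_NM, kT=KT_PN_NM):
    """Invert the interpolation formula by bisection (force increases with x)."""
    lo, hi = 0.0, L_nm * (1.0 - 1e-12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marko_siggia_force(mid, L_nm, lp_nm, kT) < f_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for f in (0.1, 1.0, 10.0):
    x = extension_at_force(f)
    print(f"f = {f:5.1f} pN -> x/L = {x / L_NM:.3f}")
```

The formula is exact in both the low-force (linear) and high-force (1/√f approach to L) limits and only approximate in between; it ignores the stretch elasticity of the backbone that matters at tens of pN.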

Confined polymers and bacterial chromosome segregation

Replication and passage of genetic information to daughter cells are major events in cell reproduction. These complex events are remarkably accurate even in simple organisms. Although chromosome segregation is likely to be complex and well orchestrated, it has recently been proposed that confinement-induced entropic forces, arising from restrictions in cellular space, are sufficient to drive chromosome segregation in bacteria47,48 (Fig. 1c). This proposal was formulated using molecular simulations of tightly confined self-avoiding polymer chains in a cylindrical space, which show that the chains segregate and become spatially organized in a manner reminiscent of that observed in bacteria. In such highly confined spaces, polymer conformations are determined by ξ, the size of a renormalized structural unit; the Flory radius RF in the absence of confinement; and the length (P) and diameter (D) of the cylinder. In Escherichia coli, ξ = 87 nm, RF = 3.3 μm, and D and P are 0.24 μm and 1.3 μm, respectively. Armed with these results for confined polymers, a concentric-shell model for the bacterial chromosome was proposed47,48, in which the nucleoid is modelled as an inner and an outer cylinder. The unreplicated 'mother' strand, a self-avoiding chain, is restricted to the inner compartment, whereas the 'daughter' chain (generated in the Monte Carlo simulations by adding monomers at set times) is free to explore the entire nucleoid volume. The simulations show that the newly added (or replicated) chain segregates to the periphery of the nucleoid, driven by a gain in entropy, and becomes spatially organized as it is synthesized (Fig. 1c). CG modelling combined with polymer theory thus led to the discovery that entropic forces alone are sufficient to drive chromosome segregation in bacteria, with proteins perhaps playing a secondary role in poising the chromosome for the entropy-driven mechanism.
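A rough scaling estimate shows why the E. coli chromosome counts as tightly confined. In de Gennes' blob picture, a self-avoiding chain of Flory radius RF inside a cylinder of diameter D ≪ RF breaks up into about $(R_F/D)^{5/3}$ blobs of size D strung along the axis, so its unconstrained axial extent is about $D (R_F/D)^{5/3}$. The sketch drops all numerical prefactors and ignores ξ; it is an order-of-magnitude illustration, not the analysis of refs 47,48.

```python
def blobs_in_tube(R_F: float, D: float) -> float:
    """Number of de Gennes blobs of size D for a self-avoiding chain of
    Flory radius R_F squeezed into a tube of diameter D (prefactors dropped)."""
    return (R_F / D) ** (5.0 / 3.0)

def tube_extension(R_F: float, D: float) -> float:
    """Length of the blob string, R_par ~ D * (R_F / D)**(5/3)."""
    return D * blobs_in_tube(R_F, D)

R_F, D, P = 3.3, 0.24, 1.3   # micrometres; E. coli values quoted in the text
n_blobs = blobs_in_tube(R_F, D)
R_par = tube_extension(R_F, D)
print(f"{n_blobs:.0f} blobs, free extension ~{R_par:.1f} um vs cell length {P} um")
```

With these values the blob string would be roughly 19 μm long, more than ten times the cylinder length P = 1.3 μm, so the chain is also strongly compressed along the axis.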

Chromosome folding

In eukaryotic cells, chromosomes fold into globules that occupy well-defined volumes known as chromosome territories49. In this process, widely separated gene-rich regions are brought into close proximity. Knowledge of the spatial arrangement of chromosomes is important for describing gene activity and the state of the cell. Polymer physics concepts, together with constraints derived from experiments, have been used to describe the structures of folded chromosomes. These calculations have provided considerable insight into their compartmentalization in the nucleus50. A number of models, such as the random walk model and models that connect megabase-sized domains by chromatin loops, have been used to describe higher-order structures of chromatin. The experimental resolution is roughly 1 Mb (≈340 μm of contour length), and consequently coarse-graining in this context must be on length scales of the order of a μm. Recently, folding principles for the human genome were proposed using data on long-range contacts between distinct loci as constraints51. Experiments showed that the contact probability, I(s), between loci in a chromosome that are separated by a genomic distance s (measured in bp) exhibits a power-law decay in the range 500 kb to 7 Mb. The observed dependence, $I(s) \sim s^{-1}$, can be rationalized using polymer models (Fig. 1d) introduced a number of years ago to describe the collapse of homopolymers52. If the chromosome folded into an equilibrium globule (a polymer in a poor solvent), then $I(s) \sim s^{-1.5}$, which cannot account for the experimental observations. An alternative model suggests that interphase DNA can organize itself into a fractal globule, which is compact but not entangled as an equilibrium globule would be. Monte Carlo simulations of a polymer with 4,000 beads (1 bead = 1,200 bp ≈ 0.4 μm) were used to generate conformations of fractal and equilibrium globules. The power-law decay of I(s) with exponent −1 in the fractal globule is consistent with the measurements (Fig. 1d). 
More importantly, in the unknotted fractal globule, loci that are close in genomic sequence are also close in three-dimensional space, which is clearly relevant for gene activity.
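Distinguishing a fractal from an equilibrium globule amounts to estimating the exponent of the power-law decay of I(s). A minimal sketch fits a straight line to log I versus log s over the 500 kb to 7 Mb window; the input here is synthetic, noise-free data generated from the two scaling laws, not Hi-C measurements, and the function name is hypothetical.

```python
import numpy as np

def fit_power_law(s: np.ndarray, I: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of log I = log A + alpha * log s; returns (alpha, A)."""
    alpha, log_A = np.polyfit(np.log(s), np.log(I), 1)
    return float(alpha), float(np.exp(log_A))

# Synthetic contact probabilities (illustration only, not experimental data)
s = np.logspace(np.log10(5e5), np.log10(7e6), 20)   # genomic distance, bp
I_fractal = 1e3 * s ** -1.08
I_equilibrium = 1e6 * s ** -1.5

for name, I in [("fractal", I_fractal), ("equilibrium", I_equilibrium)]:
    alpha, _ = fit_power_law(s, I)
    print(f"{name:12s} alpha = {alpha:.2f}")
```

Real contact maps are noisy, so in practice contacts are often averaged in logarithmically spaced bins of s before such a fit.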

RNA folding

Since the discovery that RNA molecules can act as enzymes, there has been great impetus to describe their folding in quantitative terms. The RNA folding landscape is rugged because of the interplay of several competing factors. First, the phosphate groups are negatively charged, so polyelectrolyte effects oppose folding. The valence, size and shape of the counterions, which are necessary to induce compaction and folding53, can dramatically alter the thermodynamics and kinetics of RNA folding. Second, the purine and pyrimidine bases of the nucleotides have different sizes but are chemically similar. Third, only 54% of bases form canonical Watson–Crick base pairs, whereas the remaining nucleotides are in non-pairing regions54. Fourth, the lack of chemical diversity in the bases makes it easy for RNA to adopt alternative misfolded conformations, which means that the stability gap between the folded and misfolded structures is not large. Thus, the nearly homopolymeric nature of the RNA monomers, the critical role of counterions in shaping the folding landscape, and the presence of low-energy excitations around the folded state make RNA folding a challenging problem6.

Polyelectrolyte effects

To fold, RNA must overcome the large electrostatic repulsion between the negatively charged phosphate groups. Polyelectrolyte (PE)-based theory shows that multivalent cations (Z > 1) are more efficient in neutralizing the backbone charges than monovalent ions, a prediction that is borne out by experiments. The midpoint of the folding transition, Cm, the ion concentration at which the populations of the folded and unfolded states are equal, is $3\times10^{6}$-fold greater for the Tetrahymena ribozyme in Na+ than in cobalt hexammine (Z = 3)! The nature of the compact structures depends on Z, with the radius of gyration scaling as $R_G \sim 1/Z^2$, which implies that compact intermediates have larger free energy as Z increases. Thus, folding rates should decrease as Z increases, which also accords well with experiments55. Polyelectrolyte theory also shows that the counterion charge density ζ = Ze/V should control RNA stability: as ζ increases, RNA stability should increase, a prediction that was validated using a combination of PE-based simulations and experiments. The changes in stability of the Tetrahymena ribozyme in various group II metal ions (Mg2+, Ca2+, Ba2+ and Sr2+) showed a remarkable linear variation with ζ (ref. 56). The stabilization is largest for ions with the largest ζ (smallest V). Brownian dynamics simulations showed that this effect could be captured solely by nonspecific ion–RNA interactions56. These findings, and similar variations of stability with differently sized diamines, show that (i) the bulk of the stability arises from nonspecific association of ions with RNA, and (ii) stability can be greatly altered by the valence, shape and size of the counterions.
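The charge density ζ = Ze/V can be estimated by treating each ion as a sphere. The sketch below uses Shannon ionic radii for six-fold coordination, an assumption made for illustration (ref. 56 may have used a different measure of ion size), and ranks the group II ions by ζ; if stability increases with ζ, the predicted order of stabilization is Mg2+ > Ca2+ > Sr2+ > Ba2+.

```python
import math

# Assumed Shannon ionic radii (angstrom, six-fold coordination); illustrative only
IONIC_RADIUS_A = {"Mg2+": 0.72, "Ca2+": 1.00, "Sr2+": 1.18, "Ba2+": 1.35}

def charge_density(Z: int, radius_a: float) -> float:
    """zeta = Z e / V for a sphere of the given radius, in e per cubic angstrom."""
    volume = 4.0 / 3.0 * math.pi * radius_a ** 3
    return Z / volume

zeta = {ion: charge_density(2, r) for ion, r in IONIC_RADIUS_A.items()}
ranked = sorted(zeta, key=zeta.get, reverse=True)
for ion in ranked:
    print(f"{ion}: zeta = {zeta[ion]:.3f} e/A^3")
print("predicted stability order (largest zeta first):", " > ".join(ranked))
```

Because ζ scales as r^−3 at fixed valence, modest differences in radius translate into large differences in charge density.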

Structures of RNA intermediates

A complete characterization of counterion-mediated RNA folding requires a structural description of the unfolded (U), intermediate (I) and folded states. Structures of the folded states can be obtained using crystallography or NMR. However, it is difficult to characterize the ensembles of structures populated at low (U) and moderate (I) ion concentrations (C). To obtain the ensemble of I structures from time-resolved small-angle X-ray scattering (SAXS) data, a CG model of the Tetrahymena group I intron was constructed by representing (5–6) nucleotide pairs by a single sphere (Fig. 2a)57. The salient findings are: (i) at times before global collapse, the domains of the ribozyme are extended because PE effects dominate; (ii) on time scales much shorter than the overall folding time, there is a drastic reduction in the size of the RNA. The folding intermediates are fluid-like and must be a mixture of species that contain specifically collapsed structures (a large degree of native-like order) and nonspecifically collapsed conformations (a low degree of native-like order).
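Comparing a CG model with SAXS data requires computing a scattering profile from the bead coordinates. A standard route, assumed here for illustration, is the Debye formula for identical point scatterers, $I(q) = \sum_i \sum_j \sin(q r_{ij})/(q r_{ij})$, whose low-q (Guinier) behaviour, $\ln I \approx \ln I_0 - q^2 R_g^2/3$, yields the radius of gyration. The sketch ignores bead form factors and solvent contrast, and the coordinates are random placeholders rather than a ribozyme model; the fitting procedure of ref. 57 may differ.

```python
import numpy as np

def radius_of_gyration(xyz: np.ndarray) -> float:
    """Rg of equally weighted interaction centres (rows of xyz)."""
    centred = xyz - xyz.mean(axis=0)
    return float(np.sqrt((centred ** 2).sum(axis=1).mean()))

def debye_intensity(xyz: np.ndarray, q: np.ndarray) -> np.ndarray:
    """I(q) = sum_ij sin(q r_ij)/(q r_ij) for identical point scatterers.
    np.sinc(x) = sin(pi x)/(pi x), so sinc(q r / pi) = sin(q r)/(q r)."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))   # pair distances; r_ii = 0 gives sinc = 1
    return np.array([np.sinc(qi * r / np.pi).sum() for qi in q])

def guinier_rg(q: np.ndarray, I: np.ndarray) -> float:
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 at small q and return Rg."""
    slope, _ = np.polyfit(q ** 2, np.log(I), 1)
    return float(np.sqrt(-3.0 * slope))

rng = np.random.default_rng(0)
beads = rng.normal(scale=30.0, size=(200, 3))   # hypothetical CG coordinates, angstrom
q = np.linspace(0.001, 0.005, 10)               # inverse angstrom; keeps q * Rg < 0.3
print(f"geometric Rg = {radius_of_gyration(beads):.1f} A, "
      f"Guinier Rg = {guinier_rg(q, debye_intensity(beads, q)):.1f} A")
```

Agreement between the two estimates at small q is a useful sanity check before comparing full model profiles with measured time-resolved curves.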

Figure 2: Ribozyme to RNA hairpin folding.

(a) Left: secondary structure map of the Tetrahymena group I intron, where the circles show the 5–6 base pairs represented by one interaction centre. Simulations of the CG model are used to obtain the best agreement with time-dependent small-angle X-ray scattering signals as the ribozyme folds. Representative structures that produce the best agreement with experiments are shown57. (b) Refolding pathways of an RNA hairpin on quenching the force from a high to a low value (left) and on temperature quench (right), obtained using the SOP model. On force quench, folding commences from an extended (E) state by forming the turn, which nucleates hairpin formation (left).

Complexity of hairpin formation

When viewed on length scales that span several base pairs, the folding of a small RNA (or DNA) hairpin is remarkably simple. However, when probed on short time scales (the ns–μs range), the formation of a small hairpin, involving turn formation and base stacking, is remarkably complex. Recent experiments show that the kinetics of hairpin formation in RNA (or ssDNA) deviate from classical two-state kinetics and are best described as a multi-step process58. Additional facets of hairpin formation have been revealed in single-molecule experiments that use mechanical force (f). These experiments prompted simulations that vary both T and f. The equilibrium phase diagram showed two basins of attraction (folded and unfolded) that coexist along the locus of transition midpoints (Tm, fm). At Tm and fm, the probabilities of being unfolded and folded are equal. The free energy surface obtained from simulations explained the sharp bimodal transition between the folded and unfolded states when the RNA hairpin is subject to f (refs 16,59). Thus, from thermodynamic considerations, hairpin formation can be described as a two-state system.
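A two-state description of the hairpin in the (T, f) plane can be sketched by writing the folding free energy as ΔG(T, f) = ΔH(1 − T/Tm) − fΔx, where Δx is the change in extension on unfolding. Setting ΔG = 0 gives the locus of midpoints fm(T) = ΔH(1 − T/Tm)/Δx, along which the folded and unfolded states are equally populated. The values of ΔH, Tm and Δx below are hypothetical, and the heat-capacity change is neglected.

```python
import math

KB_KCAL = 0.0019872      # Boltzmann constant, kcal/(mol K)
PN_NM_TO_KCAL = 0.1439   # 1 pN*nm per molecule = 0.1439 kcal/mol

def unfolded_probability(T, f, dH=40.0, Tm=340.0, dx=5.0):
    """Two-state hairpin with Delta G(T, f) = dH (1 - T/Tm) - f dx.
    dH (kcal/mol), Tm (K) and dx (nm) are hypothetical values; f in pN."""
    dG = dH * (1.0 - T / Tm) - f * dx * PN_NM_TO_KCAL
    return 1.0 / (1.0 + math.exp(dG / (KB_KCAL * T)))

def midpoint_force(T, dH=40.0, Tm=340.0, dx=5.0):
    """Locus f_m(T) on which folded and unfolded states are equally populated."""
    return dH * (1.0 - T / Tm) / (dx * PN_NM_TO_KCAL)

for T in (300.0, 320.0, 340.0):
    print(f"T = {T:.0f} K  f_m = {midpoint_force(T):.2f} pN")
```

In this picture the sharp bimodal response under force follows from the large free energy change relative to k_BT, which makes P_U switch over a narrow range of f around fm.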

On temperature quench, hairpin formation occurs via multiple steps59, as observed in the recent kinetic experiments. The folding pathways following T-quench and f-quench differ (Fig. 2b). The initial conformations generated by forced unfolding are fully extended and therefore structurally homogeneous. The first event in folding on f-quench is loop formation, which is a slow nucleation process (Fig. 2b). Zipping of the remaining base pairs then leads to rapid hairpin formation. In contrast, refolding on T-quench commences from a structurally broad ensemble of unfolded conformations, so nucleation can originate from many regions of the molecule (Fig. 2b). The simulations showed that the complexity of the folding landscape observed in ribozyme experiments is already reflected in the formation of a simple RNA hairpin16,60, just as β-hairpin formation captures much of the complexity of protein folding61.

Protein folding

The impetus to understand the mechanisms of protein folding comes from a number of sources. First, there is an increasing need for models that can predict folding thermodynamics and kinetics under the conditions used in experiments. Second, there is an urgent need to describe the biophysical basis of misfolding and its link to neurodegenerative diseases. Third, as we move towards a systems-level description of cellular processes, it is important to develop theoretical models for folding in crowded solutions, as well as for the folding of proteins as they are synthesized by the ribosome.

Molecular transfer model

The validity of models can only be assessed by comparing simulation results, obtained under the conditions used in experiments, with experiments. The majority of computational studies use temperature to trigger folding and unfolding, whereas a substantial number of experiments use denaturants for the same purpose, which makes it difficult to validate the models. This difficulty has been overcome with the introduction of the phenomenological molecular transfer model (MTM)62,63. In the MTM, conformations sampled in simulations performed under condition A (for example, fixed temperature T1 and zero denaturant concentration) are assigned appropriate Boltzmann weights, so that the behaviour under solution condition B (for example, temperature T and non-zero denaturant concentration) can be accurately predicted without running further simulations. The MTM theory shows that this procedure is exact provided the conformations of the protein are exhaustively sampled under condition A, and is limited only by the accuracy of the force field. Applications of the MTM require the free energy cost of transferring a given protein conformation from A to B, which is taken from experimentally measured transfer free energies of the peptide backbone and each amino acid. MTM simulations of protein L (an α/β protein) and the nearly all-β cold shock protein quantitatively reproduced the measured dependence of the population of the folded state on denaturant concentration (Fig. 3a). Surprisingly, MTM-based simulations also accurately predicted denaturant-dependent measurements in single-molecule experiments.
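The reweighting step at the heart of the MTM can be illustrated with a deliberately simplified sketch. It assumes a single simulation temperature, a toy ensemble of 'native' and 'unfolded' snapshots sampled at zero denaturant, and a hypothetical transfer free energy that is linear in denaturant concentration C with a made-up slope. The actual MTM derives each conformation's transfer free energy from its solvent exposure and measured group transfer free energies and can combine simulations at several temperatures, none of which is reproduced here.

```python
import numpy as np

KT = 0.593  # kcal/mol at ~298 K (assumed)

def reweight_fraction_native(is_native, dG_tr, kT=KT):
    """Reweight conformations sampled at C = 0 to denaturant concentration C.

    is_native : bool array, one entry per sampled conformation
    dG_tr     : transfer free energy of each conformation from 0 to C (kcal/mol)
    Returns the fraction native at C (exact only if sampling at C = 0 is exhaustive).
    """
    w = np.exp(-(dG_tr - dG_tr.min()) / kT)   # shift by the minimum for numerical stability
    return float(np.sum(w * is_native) / np.sum(w))

# Hypothetical toy ensemble: 500 native and 500 unfolded snapshots at C = 0.
# Unfolded snapshots are given a made-up transfer free energy of -m * C
# (m = 1.5 kcal/mol/M); native snapshots are assigned zero.
is_native = np.r_[np.ones(500, dtype=bool), np.zeros(500, dtype=bool)]
m = 1.5
for C in (0.0, 0.5, 1.0, 2.0):
    dG_tr = np.where(is_native, 0.0, -m * C)
    print(f"C = {C:.1f} M  f_N = {reweight_fraction_native(is_native, dG_tr):.3f}")
```

No new simulation is run for C > 0: only the weights of the existing snapshots change, which is why exhaustive sampling under condition A is the key requirement.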

Figure 3: Protein folding.

(a) Dependence of the fraction of molecules in the native basin of attraction on guanidinium chloride concentration for protein L (blue) and cold shock protein (red), where the symbols are data from experiments and the lines are results from molecular transfer model simulations. (b) Unfolding of GFP, a 250-residue, 11-stranded β-barrel protein (left). Forced unfolding obtained from CG simulations (right) shows pathway bifurcation. The structures from the simulations at various stages of unfolding are also shown. (c) The volume of the exit tunnel (left) and a helix (right) are shown in the ribosome structure. Co-translational folding of a protein occurs as it is synthesized by the ribosome.

Mechanical force in protein folding

A number of single-molecule experiments, which use mechanical force (f) in various modes (force ramp, force quench and constant force) to initiate folding from arbitrary regions of the energy landscape, have given a new perspective on protein (and RNA) folding64,65. These experiments, which monitor time-dependent changes in the extension, x(t), of the protein of interest, showed that folding occurs in multiple stages on force quench. The power of these experiments is fully realized only by combining them with theory66,67,68,69 and simulations. Such an approach was used to construct the folding landscape of the nearly 250-residue green fluorescent protein (GFP), which has a barrel-shaped structure consisting of 11 β-strands with one α-helix at the carboxy terminus. Simulations using the self-organized polymer (SOP) representation70 of GFP, at the loading rate used in experiments, predicted a rich and complex folding landscape (Fig. 3b). Unfolding of the native state (N) began with rupture of the α-helix, leading to the [GFPΔα] intermediate. Subsequently, the unfolding pathways bifurcated. In most cases, the route to the unfolded state (U) involved the population of two additional intermediates, [GFPΔαΔβ1] (Δβ1 represents forced rupture of the amino-terminal β-strand) and [GFPΔαΔβ1Δβ2β3]. The most striking prediction of the simulations was that the minor pathway had only one intermediate, [GFPΔαΔβ11], besides [GFPΔα]. The predictions of the SOP simulations of GFP were quantitatively validated by single-molecule experiments71.
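For readers unfamiliar with the SOP representation, the sketch below evaluates an energy function of the SOP type for a Cα chain: FENE bonds centred on native bond lengths, a 12-6 attraction with its minimum at the native distance for pairs in contact in the reference structure, and a soft $(\sigma/r)^6$ repulsion for i,i+2 pairs and all non-native pairs. The parameter values, the 8 Å contact cutoff and the helical reference structure are illustrative assumptions, not the settings used for GFP in ref. 70, and no dynamics is included.

```python
import numpy as np

# Illustrative parameters (assumptions); energies in kcal/mol, distances in angstrom
K_FENE, R0_FENE = 20.0, 2.0   # FENE spring constant and maximum bond deviation
EPS_H, EPS_L = 1.5, 1.0       # native attraction and repulsion strengths
SIGMA = 3.8                   # repulsive radius
R_CUT = 8.0                   # cutoff defining native contacts in the reference structure

def pairwise_distances(xyz: np.ndarray) -> np.ndarray:
    return np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)

def sop_energy_terms(xyz: np.ndarray, native_xyz: np.ndarray) -> dict:
    """SOP-style energy components of a C-alpha chain relative to its native structure.
    The FENE log diverges if any bond deviates from its native length by >= R0_FENE."""
    r, r0 = pairwise_distances(xyz), pairwise_distances(native_xyz)
    i, j = np.triu_indices(len(xyz), k=1)

    # FENE bonds between consecutive beads, centred on native bond lengths
    b = j == i + 1
    dr = r[i[b], j[b]] - r0[i[b], j[b]]
    e_fene = -0.5 * K_FENE * R0_FENE**2 * np.sum(np.log(1.0 - (dr / R0_FENE) ** 2))

    # i, i+2 pairs are purely repulsive
    nn = j == i + 2
    e_rep = np.sum(EPS_L * (SIGMA / r[i[nn], j[nn]]) ** 6)

    # |i - j| > 2: native pairs attract (12-6, minimum at r0); others repel
    far = j > i + 2
    r_far, r0_far = r[i[far], j[far]], r0[i[far], j[far]]
    native = r0_far < R_CUT
    x = r0_far[native] / r_far[native]
    e_att = np.sum(EPS_H * (x**12 - 2.0 * x**6))
    e_rep += np.sum(EPS_L * (SIGMA / r_far[~native]) ** 6)
    return {"fene": float(e_fene), "attractive": float(e_att),
            "repulsive": float(e_rep), "n_native": int(native.sum())}

# Hypothetical reference structure: an ideal C-alpha helix
# (radius 2.3 A, rise 1.5 A, 100 degrees per residue)
t = np.deg2rad(100.0) * np.arange(20)
helix = np.c_[2.3 * np.cos(t), 2.3 * np.sin(t), 1.5 * np.arange(20)]
print(sop_energy_terms(helix, helix))
```

At the reference structure the FENE term vanishes and each native contact contributes −ε_h, so the attractive term equals −ε_h times the number of native contacts; any distortion raises both terms.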

Co-translational folding

With the determination of ribosome structures72,73, there has been great interest in the folding of proteins as they are synthesized. During synthesis, which occurs at a rate of about 20 amino acids per second in E. coli, the polypeptide chain traverses a roughly cylindrical tunnel whose lining changes from the peptidyl transfer centre to the exit, which is about 10 nm from the peptidyl transfer centre (Fig. 3c). Experiments have shown that certain regions of the tunnel are likely to accommodate α-helices, depending on the sequence, which is of particular interest for transmembrane helices that can be directly inserted into the membrane by the translocon. Inspired by these experiments, theory and simulations were used to show that the extent of helix formation depends on the sequence74, the diameter of the tunnel, and the interactions of the polypeptide chain with the nucleotides and residues that line the tunnel.
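Simple arithmetic with the numbers above indicates how much chain the tunnel holds. Assuming a rise of about 0.34 nm per residue for an extended chain and 0.15 nm per residue for an α-helix (standard textbook values, not taken from the text), a 10 nm tunnel holds roughly 30 extended or 67 helical residues, and at 20 residues per second a given residue spends on the order of seconds inside. The helper names are hypothetical.

```python
TUNNEL_NM = 10.0          # PTC-to-exit distance quoted in the text
RATE_AA_PER_S = 20.0      # E. coli elongation rate quoted in the text
RISE_NM = {"extended": 0.34, "alpha-helix": 0.15}   # assumed rise per residue

def residues_spanning(length_nm: float, rise_nm: float) -> float:
    """Number of residues needed to span length_nm at the given rise per residue."""
    return length_nm / rise_nm

def residence_time_s(n_residues: float, rate: float = RATE_AA_PER_S) -> float:
    """Time to add n_residues more residues, i.e. how long a residue takes to move
    from the PTC to the exit if the chain advances one rise per residue added."""
    return n_residues / rate

for conformation, rise in RISE_NM.items():
    n_res = residues_spanning(TUNNEL_NM, rise)
    print(f"{conformation:12s}: ~{n_res:.0f} residues span the tunnel, "
          f"~{residence_time_s(n_res):.1f} s inside")
```

Compaction into a helix roughly doubles the number of residues held in the tunnel and their residence time, which is one reason helix formation inside the tunnel matters for co-translational events.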

More recently, several experiments have probed the possibility of tertiary structure formation, especially in the vestibule near the tunnel exit, whose volume is large enough for tertiary structure formation of the N-terminal region of the protein. CG simulations, which represent the protein with either a Cα model75 or a Cα side-chain model (Cα-SCM)76 and the ribosomal RNA with an all-atom representation, the three-interaction-site (TIS) model or a four-site model, have been used to interrogate coupled synthesis and folding (Fig. 3c). Several general results emerged from these simulations. First, polypeptide synthesis and folding are not coupled for single-domain proteins, which require synthesis of the complete protein for folding to commence. Second, co-translational folding is prevalent in multi-domain proteins, in which the N-terminal region is likely to fold as it exits the tunnel; in this case, the in vivo folding pathway is expected to differ from that in vitro. Third, simulations suggest that interactions with the ribosome surface decrease folding-pathway diversity and result in a more compact transition-state structure76.

Towards folding under cellular conditions

The cellular interior is replete with a host of macromolecules, which can alter processes ranging from transcription to folding. In E. coli, for example, ribosomes (≈20.8 nm in diameter), polymerases and other protein complexes occupy about 22% of the total volume, and smaller complexes constitute about 8% of the total volume. Thus, unlike in vitro experiments, in which folding is studied in aqueous solution corresponding to infinite dilution, crowding effects have to be taken into account when describing folding in vivo. A simple calculation shows that the average spacing between cytoplasmic proteins is ≈4 nm (ref. 77) (corresponding to ρ ≈ (50–400) mg ml−1 (ref. 78)). Given that the diameter of a typical protein (≈300 aa) is ≈4 nm, the cell is an extremely crowded place, which severely restricts conformational fluctuations that are easily realized in typical in vitro experiments.
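The crowding numbers can be checked with a back-of-the-envelope estimate. Assuming identical 300-residue proteins with an average residue mass of 110 Da and a partial specific volume of 0.73 ml g−1 (typical values, not from the text), the centre-to-centre spacing follows from the number density n as $n^{-1/3}$, and the occupied volume fraction is the mass concentration times the partial specific volume. At 300 mg ml−1 this gives a spacing of about 5.7 nm and a volume fraction of about 0.22. The spacing is comparable to the ≈4 nm protein diameter, consistent with the conclusion that the cytoplasm is extremely crowded, although this crude estimate differs in detail from that of ref. 77.

```python
N_AVOGADRO = 6.02214076e23
DA_PER_RESIDUE = 110.0   # assumed average residue mass, Da
V_BAR_ML_PER_G = 0.73    # assumed partial specific volume of proteins, ml/g

def centre_spacing_nm(conc_mg_per_ml: float, n_residues: int = 300) -> float:
    """Mean centre-to-centre spacing (nm) of identical proteins at a given
    mass concentration, estimated as n**(-1/3) from the number density n."""
    molar_mass = n_residues * DA_PER_RESIDUE                # g/mol
    per_litre = conc_mg_per_ml / molar_mass * N_AVOGADRO    # mg/ml equals g/L
    per_nm3 = per_litre / 1e24                              # 1 L = 1e24 nm^3
    return per_nm3 ** (-1.0 / 3.0)

def volume_fraction(conc_mg_per_ml: float) -> float:
    """Fraction of volume occupied by protein: (g/ml) * (ml/g)."""
    return conc_mg_per_ml / 1000.0 * V_BAR_ML_PER_G

for c in (50, 300, 400):
    print(f"{c:3d} mg/ml: spacing ~{centre_spacing_nm(c):.1f} nm, phi ~{volume_fraction(c):.2f}")
```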

An approximate mimic of the cellular environment can be realized by adding high concentrations of natural or synthetic macromolecules. Consider the simplest case of a crowding agent with radius Rc (Ficoll 70, for example) that is inert towards the protein or RNA. The volume fraction, ϕc = ρv, where $v = 4\pi R_c^3/3$ is the volume of a crowding particle, can be altered by changing Rc even with ρ fixed. The first crowding simulation used a Cα-SCM of a β-sheet protein in the presence of spherical crowding particles with 0 ≤ ϕc ≤ 0.25 (ref. 32). The CG simulations showed that when excluded-volume interactions dominate, the stabilities of globular proteins relative to ϕc = 0 (ref. 32) are enhanced (Fig. 4a). The extent of the stability change, measured using ΔTm = Tm(ϕc) − Tm(ϕc = 0), showed that $\Delta T_m(\phi_c) \propto \phi_c^{\alpha}$ with $\alpha = 1/(3\nu)$, where ν (= 3/5) is the Flory exponent that characterizes the size of the unfolded states of proteins. The scaling of ΔTm(ϕc) with ϕc has been confirmed in recent experiments79. Simulations also showed that the folding rate, kF(ϕc), is affected when ϕc ≠ 0: kF(ϕc) increases monotonically up to an optimum value of ϕc and decreases thereafter (Fig. 4b). Interestingly, qualitatively similar behaviour was observed in the dependence of the relaxation rate of phosphoglycerate kinase (PGK) on Ficoll concentration. The simulation results on crowding-induced effects on the smaller β-sheet WW domain explain several aspects of the folding of PGK (Fig. 4c).
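The two scaling ingredients can be illustrated numerically under stated assumptions. Taking $v = 4\pi R_c^3/3$, doubling Rc at fixed ρ raises ϕc eightfold. Assuming $\Delta T_m \propto \phi_c^{\alpha}$ with α = 1/(3ν), which is the exponent obtained if crowders confine an unfolded chain of N segments of size a to voids of size δ ≈ Rc ϕc^{−1/3} at a free energy cost scaling as $N (a/\delta)^{1/\nu}$, ν = 3/5 gives α = 5/9 ≈ 0.56. The crowder number density and volume fractions below are hypothetical.

```python
import math

NU = 3.0 / 5.0             # Flory exponent of the unfolded state (from the text)
ALPHA = 1.0 / (3.0 * NU)   # assumed scaling exponent for Delta T_m versus phi_c

def crowder_volume_fraction(rho_per_nm3: float, R_c_nm: float) -> float:
    """phi_c = rho * v with v = 4 pi R_c^3 / 3."""
    return rho_per_nm3 * 4.0 / 3.0 * math.pi * R_c_nm ** 3

def stabilization_ratio(phi_1: float, phi_2: float, alpha: float = ALPHA) -> float:
    """Delta T_m(phi_2) / Delta T_m(phi_1) if Delta T_m scales as phi_c**alpha at fixed R_c."""
    return (phi_2 / phi_1) ** alpha

rho = 1e-5   # hypothetical crowder number density, nm^-3
print(f"phi_c at R_c = 10 nm: {crowder_volume_fraction(rho, 10.0):.3f}; "
      f"at R_c = 20 nm: {crowder_volume_fraction(rho, 20.0):.3f}")
print(f"alpha = {ALPHA:.3f}; Delta T_m(0.25) / Delta T_m(0.10) = "
      f"{stabilization_ratio(0.10, 0.25):.2f}")
```

Because α < 1, the stabilization grows sublinearly with ϕc: going from ϕc = 0.10 to 0.25 increases ΔTm by well under the factor of 2.5 in crowder volume.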

Figure 4: Crowding effects on folding.

(a) Crowding-induced entropic stabilization of the folded states of proteins. Restriction of the extended denatured state ensemble by the volume occupied by crowders raises its free energy more than that of the folded state. The structures are from Cα-SCM simulations of a three-stranded β-sheet protein, the WW domain. (b) Folding time as a function of the concentration of the crowding agent (black lines) for PGK. The red curve shows folding of the WW domain as a function of φc. The comparison is meant to illustrate that CG simulations qualitatively explain the measurements. (c) Structure of PGK in the absence of crowding agent (left) and at φc = 0.25 (right). The distance between the N- and C-terminal lobes is dramatically reduced by crowding. The functional implications are given in the text and in ref. 79. Reproduced with permission from refs 32, 79 (© National Academy of Sciences, USA).

In an insightful application of simulations, it was recently shown that crowding can alter the catalytic activity of a kinase (Fig. 4c)79. As in many kinases, PGK, which transfers a phosphate group from diphosphoglycerate to ADP, has a catalytic site between its N- and C-lobes, which are connected by a flexible hinge. To perform its kinase activity, PGK must undergo a large-scale structural movement that reduces the distance between the N- and C-lobes. PGK activity was found to increase more than 15-fold in 200 mg ml−1 (ϕc ≈ 0.2) Ficoll 70. The enhancement in activity was attributed to a crowding-induced shape change that brings the N- and C-lobes into proximity.

Biological nanomachines

Biological machines are typically multi-subunit constructs that carry out myriad functions by interacting with a range of proteins and RNA. Examples include molecular motors (kinesin, myosin and dynein), the E. coli chaperonin GroEL, F0F1-ATPase, ribosomes and helicases. A common theme in the function of these systems is that they consume energy and, in the process, undergo a reaction cycle that dictates their function. Free energy transduction from chemical energy to mechanical work via a series of conformational switches is the hallmark of biological nanomachines80.

Chaperonin GroEL

Most proteins in cells fold spontaneously. However, molecular chaperones have evolved to rescue a small fraction of proteins that do not reach their native states easily and hence are destined to aggregate. In E. coli, it is estimated that only about 5–10% of proteins81 require assistance from the chaperonin GroEL, which has been extensively characterized using experiments and simulations82.

GroEL has two heptameric rings stacked back-to-back83, with each subunit consisting of apical (A), intermediate (I) and equatorial (E) domains. During the reaction cycle, GroEL undergoes a series of structural (allosteric) transitions upon binding of the substrate protein (SP), ATP and the co-chaperonin GroES (Fig. 5). In the T state, the hydrophobic patches in the A domain recognize the exposed hydrophobic residues of misfolded SPs. ATP binding triggers dramatic domain movements in GroEL, resulting in the catalytic sites moving apart, which in turn imparts a stretching force that partially unfolds the captured SP. This step is followed by GroES binding, which encapsulates the SP in the central cavity. The extent of the structural changes at the molecular level in each subunit of GroEL (each ring has 3,850 residues) during the reaction cycle (T→R→R″→T) was revealed only through CG simulations18.

Figure 5: Reaction cycle and GroEL function.

The hemicycle of the GroEL reaction cycle. A misfolded SP is captured by GroEL (the E. coli chaperonin) in the T state. This step is followed by a reversible ATP-driven transition to the R state, to which the co-chaperonin GroES can bind to form the R′ complex; this also encapsulates the SP in the GroEL cavity. The SP can fold by the kinetic partitioning mechanism (KPM). Hydrolysis of ATP, which results in R″ formation, is followed by an allosteric signal from the bottom ring that leads to release of ADP, GroES and the SP (folded or not), thus resetting the top ring to the T state.
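The hemicycle in Fig. 5 can be read as a small state machine. The sketch below encodes only the transitions named in the caption; the event names (bind_ATP, bind_GroES and so on), the RingState type and the helper functions are hypothetical labels introduced for illustration. It ignores SP folding by the KPM and reduces the inter-ring coupling to a single triggering event.

```python
from enum import Enum


class RingState(Enum):
    """Allosteric states of one GroEL ring (labels from Fig. 5)."""
    T = "T"           # SP-accepting state with exposed hydrophobic apical patches
    R = "R"           # ATP-bound state
    R_PRIME = "R'"    # GroES-bound state; SP encapsulated in the cavity
    R_DPRIME = "R''"  # state formed after ATP hydrolysis


# (current state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    (RingState.T, "bind_ATP"): RingState.R,
    (RingState.R, "release_ATP"): RingState.T,  # the T <-> R step is reversible
    (RingState.R, "bind_GroES"): RingState.R_PRIME,
    (RingState.R_PRIME, "hydrolyse_ATP"): RingState.R_DPRIME,
    # Allosteric signal from the opposite ring releases ADP, GroES and the SP.
    (RingState.R_DPRIME, "signal_from_other_ring"): RingState.T,
}


def step(state: RingState, event: str) -> RingState:
    """Return the next state, or raise if the event is not allowed in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}") from None


def run(events, state: RingState = RingState.T) -> list:
    """Apply a sequence of events and return the list of visited states."""
    path = [state]
    for event in events:
        state = step(state, event)
        path.append(state)
    return path


# One full hemicycle of the top ring.
hemicycle = ["bind_ATP", "bind_GroES", "hydrolyse_ATP", "signal_from_other_ring"]
print(" -> ".join(s.value for s in run(hemicycle)))
```

Writing the cycle as an explicit transition table makes the forbidden moves visible: for example, GroES cannot bind a ring that is still in the T state.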

Simulations using the SOP model of the entire heptameric GroEL particle vividly illustrated the conformational changes of GroEL triggered by ATP binding (T→R) and ATP hydrolysis (R→R″). Multiple simulation trajectories provided an unprecedented view of the key interactions that drive the allosteric transitions18: (i) The A domains rotate counterclockwise in the T→R transition and clockwise in the R→R″ transition. (ii) The global T→R and R→R″ transitions follow two-state kinetics, whereas the kinetics of formation and disruption of individual residue pairs span a broad range of time scales; an underlying kinetic hierarchy of internal dynamics governs the global transitions. (iii) For both the T→R and R→R″ transitions, the disruption and formation of salt bridges are coordinated at multiple sites, which mediate the communication between neighbouring subunits and synchronize the dynamics of the heptameric ring. (iv) There is a spectacular outside–in movement of two helices, accompanied by interdomain salt-bridge formation, and both helices are solvent exposed in the R state. As a result, the microenvironment of the SP, which is predominantly hydrophobic in the T state, becomes progressively hydrophilic as the reaction cycle proceeds.
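Point (ii) rests on a standard check. If a global transition is two-state, the fraction of trajectories that have not yet made the transition decays as a single exponential, exp(−kt), and the maximum-likelihood estimate of k from N first-passage times τi is N/Σiτi. The sketch below illustrates that check on synthetic first-passage times drawn from an exponential distribution with an assumed rate. It is not the analysis of ref. 18; in practice the τi would come from many independent simulation trajectories, and the function names are hypothetical.

```python
import math
import random


def estimate_rate(first_passage_times):
    """Maximum-likelihood rate for single-exponential kinetics: k = N / sum(tau_i)."""
    if not first_passage_times:
        raise ValueError("need at least one first-passage time")
    return len(first_passage_times) / sum(first_passage_times)


def survival(first_passage_times, t):
    """Empirical fraction of trajectories that have not yet transitioned by time t."""
    return sum(1 for tau in first_passage_times if tau > t) / len(first_passage_times)


# Synthetic first-passage times (arbitrary time units) from an assumed rate k_true.
rng = random.Random(0)
k_true = 0.5
taus = [rng.expovariate(k_true) for _ in range(5000)]

k_hat = estimate_rate(taus)
# Two-state check: the empirical survival curve should track exp(-k_hat * t).
for t in (1.0, 2.0, 4.0):
    print(f"t={t}: empirical {survival(taus, t):.3f}, "
          f"single exponential {math.exp(-k_hat * t):.3f}")
```

A systematic mismatch between the two columns, such as a lag at short times or a long tail, would suggest intermediates rather than a single two-state barrier.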

These large-scale conformational changes are linked to function. Misfolded proteins, which typically have exposed hydrophobic regions, are captured as long as they are presented to GroEL. In the T→R transition, and even more dramatically in the R→R″ transition, the structural changes in the GroEL particle turn the interactions between the SP and GroEL from favourable in the T state to unfavourable in the R″ state. These changes in the microenvironment place the SP in a different part of the folding landscape, from which it can fold with some probability during the lifetime of the R and R″ states (Fig. 5). If the cycle is iterated multiple times, a sufficient yield of the SP can be obtained, as anticipated by the iterative annealing mechanism84,85. It is amusing to note that the mechanism of GroEL function is hauntingly similar to the simulated annealing protocol86 used in the context of NP-hard problems. Not surprisingly, nature apparently stumbled on it millions of years earlier.
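The iterative annealing argument can be made quantitative under a simplifying assumption. If in each round of the cycle a fraction Ψ of the remaining SP reaches the native state, independently of earlier rounds, then the yield after n rounds is 1 − (1 − Ψ)^n. The sketch below evaluates this expression and the number of rounds needed to reach a target yield. The value Ψ = 0.05 is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math


def iam_yield(psi: float, n_rounds: int) -> float:
    """Fraction of SP folded after n_rounds, assuming each round folds a fraction
    psi of the remaining SP independently of earlier rounds (idealization)."""
    if not 0.0 < psi < 1.0:
        raise ValueError("psi must lie strictly between 0 and 1")
    return 1.0 - (1.0 - psi) ** n_rounds


def rounds_for_yield(psi: float, target: float) -> int:
    """Smallest n with iam_yield(psi, n) >= target, for 0 <= target < 1."""
    if not 0.0 < psi < 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < psi < 1 and 0 <= target < 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - psi))


psi = 0.05  # illustrative per-round folding probability (assumed)
for n in (1, 10, 45):
    print(n, round(iam_yield(psi, n), 3))
print("rounds for a 90% yield:", rounds_for_yield(psi, 0.9))
```

With Ψ = 0.05, roughly 45 rounds are needed for a 90% yield, which shows how a low per-round folding probability can still give a substantial yield when the cycle is iterated.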

Kinesins

Kinesins are motors that transport cellular organelles along the network of cytoskeletal filaments80,87,88. Made of two identical motor domains linked by a coiled-coil stalk, kinesins exploit the free energy generated by the binding and hydrolysis of ATP to produce their characteristic hand-over-hand stepping motion. A number of SM experiments show that kinesin takes roughly 8-nm steps along the polar microtubule (MT) track as it strides towards the (+) end, consuming one ATP per step.
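The tight coupling described above (one ATP and one 8-nm step per turnover, with the two heads exchanging leading and trailing positions) can be sketched as a kinetic Monte Carlo simulation. For illustration only, the sketch assumes that steps occur as a Poisson process with a single lumped rate constant. The actual mechanochemical cycle passes through several intermediate states (Fig. 6), and the rate value, the Walk type and the simulate function are hypothetical.

```python
import random
from dataclasses import dataclass, field

STEP_NM = 8.0  # step size towards the (+) end of the MT, from the text


@dataclass
class Walk:
    times_s: list = field(default_factory=lambda: [0.0])
    positions_nm: list = field(default_factory=lambda: [0.0])
    atp_used: int = 0
    leading_head: str = "A"  # the two identical heads are labelled A and B


def simulate(rate_per_s: float, duration_s: float, seed: int = 0) -> Walk:
    """Kinetic Monte Carlo with one lumped, hypothetical rate for the whole cycle."""
    rng = random.Random(seed)
    walk = Walk()
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_s)  # exponentially distributed dwell time
        if t > duration_s:
            break
        walk.times_s.append(t)
        walk.positions_nm.append(walk.positions_nm[-1] + STEP_NM)
        walk.atp_used += 1  # tight coupling: one ATP hydrolysed per step
        # Hand-over-hand: the trailing head passes the bound head and becomes the leader.
        walk.leading_head = "B" if walk.leading_head == "A" else "A"
    return walk


walk = simulate(rate_per_s=100.0, duration_s=1.0, seed=1)
print(f"{walk.atp_used} ATP, {walk.positions_nm[-1]:.0f} nm, "
      f"mean velocity {walk.positions_nm[-1] / 1.0:.0f} nm/s")
```

Because each step is tied to one ATP, the distance travelled is always 8 nm times the number of ATP molecules consumed, whatever rate is chosen.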

Because of fundamental limitations in experimental resolution, it is difficult to provide molecular explanations of many intriguing observations related to kinesin motility, such as free energy transduction, the out-of-phase coordination of the processes occurring at the two motor domains, and the role of kinesin–MT interactions. When both heads are bound to MT binding sites, internal tension (8–15 pN) exerted through the neck-linker deforms the catalytic site of the leading head from its native-like configuration23, thus inhibiting premature binding of ATP to the nucleotide-free leading head. The ATP-inhibited state is maintained as long as both heads remain bound. The deformed catalytic site of the leading head is restored only after the inorganic phosphate (Pi) is released, which changes the trailing head from a strong to a weak binding state. Thus, the processivity of kinesin is regulated by strain in the leading head, which can be linked to the topology of the kinesin–MT complex. Simplified molecular simulations combined with theoretical ideas have also shed light on the vexing question of whether kinesin takes substeps (Fig. 6)24.
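The strain-gating logic in the preceding paragraph can be written as a simple rule. In the sketch below, a head's nucleotide state uses the labels of Fig. 6 (T, DP, D and φ, the last written as "phi"). It assumes that the ATP, ADP·Pi and nucleotide-free states bind the MT strongly while the ADP state binds weakly, and it reduces strain to a Boolean rather than a force. That mapping, the Head type and the function names are assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Nucleotide labels follow Fig. 6: "T" (ATP), "DP" (ADP.Pi), "D" (ADP), "phi" (empty).
# Assumed mapping: ATP, ADP.Pi and empty heads bind the MT strongly; ADP heads bind weakly.
STRONG_BINDING = {"T", "DP", "phi"}


@dataclass
class Head:
    nucleotide: str  # one of "T", "DP", "D", "phi"
    bound: bool      # attached to the MT


def leading_site_deformed(trailing: Head, leading: Head) -> bool:
    """Neck-linker tension deforms the leading catalytic site while both heads are
    bound and the trailing head is still strongly bound (strain as a Boolean)."""
    return trailing.bound and leading.bound and trailing.nucleotide in STRONG_BINDING


def leading_head_can_bind_atp(trailing: Head, leading: Head) -> bool:
    """ATP can bind only to an empty leading head whose site is not deformed."""
    return leading.nucleotide == "phi" and not leading_site_deformed(trailing, leading)


# Both heads bound, trailing head holds ADP.Pi, leading head empty: ATP binding is gated.
trailing, leading = Head("DP", True), Head("phi", True)
print("before Pi release:", leading_head_can_bind_atp(trailing, leading))
# Pi release switches the trailing head to the weak-binding ADP state: the gate opens.
trailing.nucleotide = "D"
print("after Pi release:", leading_head_can_bind_atp(trailing, leading))
```

The rule captures the out-of-phase coordination in its simplest form: the leading head cannot start its chemical cycle until the trailing head has advanced to the weak-binding state.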

Figure 6: Mechanochemical cycle of the conventional kinesin.

The diagram depicts the enzymatic cycle of a dimeric kinesin that generates a single 8-nm step on the MT track. Head-to-head regulation via the neck-linker results in the out-of-phase coordination of the catalytic cycles. T, DP, D and φ denote the ATP, ADP·Pi, ADP and nucleotide-free states of the catalytic site. The yellow arrow represents the ordered state of the neck-linker.

Transcription initiation by bacterial RNA polymerase

The synthesis of RNA, carried out by DNA-dependent RNA polymerase (RNAP) in a process referred to as transcription, involves several stages. The highly regulated transcription process in eukaryotes is extraordinarily complicated, involving a whole zoo of transcription factors that interact with the polymerase as it reads the template strand of DNA to make RNA. Transcription in bacteria also involves a number of steps (Fig. 7). The DNA-dependent RNAP, whose sequence, structure and global function are universally conserved from bacteria to man, is the key enzyme in the transcription of genetic information in all organisms. The transcription cycle consists of three major stages: (i) Initiation, during which the initiation-specific σ factor binds to the catalytically competent core RNAP to form the holoenzyme. This step is followed by recognition of the promoter DNA to form the closed (R·Pc) complex and its subsequent transition to the open (R·Po) complex. (ii) Elongation of the transcript by nucleotide addition. (iii) Termination, involving cessation of transcription and disassembly of the RNAP elongation complex.

Figure 7: Promoter melting induced by bacterial RNA polymerase.

(a) Schematic of the base pairing between the template and non-template strands of the promoter. Nucleotide positions are numbered relative to the transcription start site, +1. DNA segments that interact with RNAP, the −35 and −10 elements, are shaded red. (b) Structural models of R·Pc (left) and R·Po (right). The transcription bubble structure is shown on the right (DNA non-template strand (yellow) and template strand (green)). (c) Sequence of events in transcription bubble formation (melting, scrunching and bending, from top to bottom) extracted from simulations. Blue lines show the changes in the conformation of the template strand in the three major stages of transcription bubble formation. Yellow circles embedded within the polymerase mark the position of Mg2+. Red and green circles mark the positions of the nucleotides and highlight the processes of melting, scrunching and bending.

Recently, the dynamics of the structural transitions that occur during the R·Pc→R·Po transition, which melts 12 base pairs in the promoter region to form the transcription bubble (Fig. 7b), were probed using CG simulations89. These simulations used a CG model of the 3,122-residue RNAP–DNA complex (15 nm long and 11 nm wide) identical to those used to describe GroEL and kinesin dynamics. For DNA, each nucleotide of each strand was represented by a single site located at the centre of the nucleotide. The transcription bubble forms in three steps (Fig. 7c): (i) Melting of the −10 element in the promoter region. (ii) Scrunching of the promoter DNA into the RNAP active channel, followed by formation of the bubble; accommodating the dsDNA in the channel involves internal RNAP dynamics, namely a transient expansion of key structural motifs in the β subunit. (iii) Bending of the downstream DNA after the dsDNA unwinds. The simulations revealed that internal RNAP dynamics, resulting in a transient buildup of strain, are needed to fully accommodate the dsDNA so that it gains access to the active site. The simulations (for an animation, see http://www.youtube.com/watch?v=Q6QoyDl3TCw) also make several testable predictions for probing the relationship between RNAP motion and transcription bubble formation.
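To show what such a model looks like, the sketch below evaluates an energy function of the form commonly written for the self-organized polymer (SOP) model used for GroEL above. It assumes that the RNAP–DNA model is of this type and includes three terms: finite-extensible (FENE) bonds between consecutive beads, an attractive Lennard-Jones-type term for pairs in contact in the native structure, and a soft repulsion for all other pairs. One bead stands for one amino acid residue or, as for the DNA strands here, one nucleotide. All parameter values (bond constants, contact cutoff, interaction strengths) and the function name are assumptions for illustration and are not taken from ref. 89.

```python
import math

# Assumed parameter values (energies in kcal/mol, lengths in angstrom).
K_FENE = 20.0   # FENE spring constant
R0_FENE = 2.0   # maximum allowed deviation of a bond from its native length
EPS_H = 1.5     # strength of native contacts
EPS_L = 1.0     # strength of the soft repulsion
SIGMA = 3.8     # repulsion length scale
R_C = 8.0       # native-contact cutoff


def sop_energy(coords, native):
    """SOP-type energy of a bead chain (one bead per residue or nucleotide),
    measured relative to its native structure."""
    n = len(coords)
    if n != len(native):
        raise ValueError("coords and native must have the same length")
    e = 0.0
    # FENE bonds between consecutive beads, centred at the native bond lengths.
    for i in range(n - 1):
        dr = math.dist(coords[i], coords[i + 1]) - math.dist(native[i], native[i + 1])
        x = (dr / R0_FENE) ** 2
        if x >= 1.0:
            return math.inf  # bond stretched beyond the FENE limit
        e += -0.5 * K_FENE * R0_FENE ** 2 * math.log(1.0 - x)
    # Soft repulsion between beads i and i+2.
    for i in range(n - 2):
        e += EPS_L * (SIGMA / math.dist(coords[i], coords[i + 2])) ** 6
    # Non-bonded pairs with |i - j| >= 3.
    for i in range(n - 3):
        for j in range(i + 3, n):
            r = math.dist(coords[i], coords[j])
            r0 = math.dist(native[i], native[j])
            if r0 < R_C:  # native contact: Lennard-Jones well with its minimum at r0
                q = (r0 / r) ** 6
                e += EPS_H * (q * q - 2.0 * q)
            else:         # non-native pair: soft repulsion only
                e += EPS_L * (SIGMA / r) ** 6
    return e


# A four-bead square has one native contact (beads 0 and 3).
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(sop_energy(square, square))
```

Because the attractive term depends only on the native contact map, the same few lines of code apply unchanged to GroEL, kinesin or an RNAP–DNA complex; only the bead coordinates of the native structure differ.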

Future perspectives

Given that biological problems are complex, it is inevitable that CG models will have a key role in informing experiments. Although not reviewed here, there are a number of areas, such as protein aggregation, membrane structure and dynamics90, and lipid–membrane interactions, where such simulations have already been profitable91. Experimental constraints and theory have been the guiding factors in constructing length-scale-dependent CG models, as the few examples here illustrate. Current methods can provide insights into a number of important biological problems. On the few-nm length scale corresponding to folding problems, there is a need to study proteins in excess of 200 residues. Counterion effects in RNA folding can be modelled by integrating theoretical ideas from polymer physics with suitable CG models. Although electrostatic interactions have been approximately modelled in simulations of biological machines18, further refinements might be needed for more accurate simulations92. On longer length scales, a number of problems could profit from CG simulations; the motor-driven polymerization and depolymerization kinetics of microtubules and the protein-induced polymerization of actin are two examples.

The demand for CG models will continue to grow because there is an appetite to understand the workings of a cell. The increasing attention paid to obtaining real-time measurements of how the workers (enzymes, ribozymes, ribosomes, genomes, lipids, membranes and so on) cooperate to meet the demands of the cell is sure to spur interest in models and theories. From a modelling perspective, it is neither possible nor desirable to devise microscopic models when considering events on long length and time scales. In constructing whole-cell models, it may be sufficient to model the various workers as quasiparticles that interact with each other through connected networks, which change dynamically depending on the cell status and external stimuli. Such a viewpoint is already being used in systems biology. The lesson from theoretical approaches to problems in condensed matter and materials science is that phenomena at different length and time scales require different levels of description. This perspective, which also applies to biological problems, will surely spur us on to develop suitable CG models and theories that capture the essence of the problem at hand without being encumbered by unnecessary details.

Additional information

How to cite this article: Hyeon, C. & Thirumalai, D. Capturing the essence of folding and functions of biomolecules using coarse-grained models. Nat. Commun. 2:487 doi: 10.1038/ncomms1481 (2011).