Biomolecular modeling thrives in the age of technology

The biomolecular modeling field has flourished since its early days in the 1970s due to the rapid adaptation and tailoring of state-of-the-art technology. The resulting dramatic increase in size and timespan of biomolecular simulations has outpaced Moore’s law. Here, we discuss the role of knowledge-based versus physics-based methods and hardware versus software advances in propelling the field forward. This rapid adaptation and outreach suggests a bright future for modeling, where theory, experimentation and simulation define three pillars needed to address future scientific and biomedical challenges. The field of biomolecular modeling has thrived by exploiting state-of-the-art technological advances. In this Perspective, the role of software and hardware advances, and the disparity and synergy between knowledge-based and physics-based methods are discussed and explored.

collaborations between experimentalists and modelers, and the impact of community initiatives and exercises. Here, we focus on two aspects of technology advances that are relevant to numerous areas of computational science at large: knowledge-based methods versus physics-based methods, and the role of hardware versus software in driving the field.

Knowledge-based versus physics-based approaches
Physics-type models based on molecular mechanics principles 1 have been successfully applied to molecular systems since the 1960s, providing insights into structures and mechanisms involved in biomolecular rearrangements, flexibility, pathways and function. In these methods, energy functions that treat molecules as physical systems, similar to balls connected by springs, are used to express biomolecules in terms of fundamental vibrations, rotations and non-bonded interactions. Target data taken from experiments on relevant molecular entities are used to parametrize these functions, which are then applied to larger systems composed of the same basic chemical subgroups. Thus, experimental data are used in constructing these general functions but in a fundamental way, such as the nature of C-O bonds or the rotational flexibility around alpha carbons.
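The "balls connected by springs" picture can be made concrete with a minimal sketch of the individual energy terms. The functional forms below (harmonic bonds and angles, a cosine torsion, Lennard-Jones and Coulomb non-bonded terms) are the textbook ones and are an assumption here; the text does not specify a particular force field, and conventions such as the factor of 1/2 in the harmonic terms differ between packages. All function and parameter names are hypothetical.

```python
import math

# Illustrative molecular-mechanics energy terms (assumed textbook forms,
# not the functional form of any specific force field).

def bond_energy(r, r0, k):
    """Harmonic bond stretch around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic angle bend around the equilibrium angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2

def torsion_energy(phi, barrier, periodicity, phase):
    """Periodic torsion term for rotation about a bond (radians)."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term for van der Waals interactions."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, qi, qj, ke=332.0637):
    """Point-charge electrostatics; ke in kcal*Angstrom/(mol*e^2)."""
    return ke * qi * qj / r

def total_energy(bonds, angles, torsions, pairs):
    """Sum all terms; each argument is a list of parameter tuples."""
    energy = sum(bond_energy(*b) for b in bonds)
    energy += sum(angle_energy(*a) for a in angles)
    energy += sum(torsion_energy(*t) for t in torsions)
    for r, epsilon, sigma, qi, qj in pairs:
        energy += lennard_jones(r, epsilon, sigma) + coulomb(r, qi, qj)
    return energy
```

In this picture, parametrization amounts to choosing values such as `r0`, `k`, `epsilon` and `sigma` for each chemical subgroup from target data, and then reusing them in larger systems.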
Knowledge-based methods, in contrast, lack a fundamental energy framework. Instead, various structural, energetic or functional data are used to train a computer program to discover trends in related systems from known chemical and biophysical information on specific molecular systems. Thus, such approaches use available data to make extrapolative predictions regarding related biological and chemical systems.
Physics-based methods. Physics-based methods offer us a conceptual understanding of biological processes. Indeed, the development of improved all-atom force fields for biomolecular simulations [17][18][19] in both functional form and parameters has been crucial to the increasing accuracy of modeling many biological processes of large systems. Force fields for proteins, nucleic acids, membranes and small organic molecules have been applied to study problems such as protein folding, enzymatic mechanisms, ligand binding/unbinding, membrane insertion mechanisms and many others 20,21. Current fourth-generation force fields have introduced polarization effects [22][23][24], important for processes or systems with induced electronic polarization, such as intrinsically disordered proteins, metal/protein interactions and membrane permeation mechanisms 25,26. Despite 50 years of developments, force fields are far from perfect 27, and further refinement, expansion, standardization and validation can be expected in the future 28. Transferability to a wide range of biomolecular systems and 'convergence' among different force fields will continue to be issues. In parallel to all-atom force fields, numerous coarse-grained potentials have been developed for many systems 29,30, but these are far less unified compared with all-atom force fields. Much development can be expected in the near future in this area as the complexity of biomolecular problems of interest increases.
In the area of protein structure prediction, for example, the physics-based coarse-grained united-residue (UNRES) force field developed by the lab of the late Harold Scheraga demonstrated exceptional results in predicting the orientations of domains in the tenth Critical Assessment of Protein Structure Prediction (CASP10) 31. Such predictions were free of biases from structural databases and relied on energetically favorable residue/residue interactions ('first principles').
Physics-based methods are essential for studying protein dynamics and folding pathways. For example, UNRES 32 and many all-atom force fields [17][18][19] have been successful in the study of folding pathways of several proteins 33,34, including a small protein inside its chaperonin 35. Besides folding mechanisms, kinetic and thermodynamic parameters can be determined 33.
Many other areas of application demonstrate how molecular mechanics and dynamics simulations provide insights on structures and mechanisms. These include structures of viruses 7,36 (including SARS-CoV-2 37), pathways in DNA repair 38 or folding of chromatin fibers [39][40][41]. In drug discovery, molecular docking has been shown to be successful for high-throughput screening. For instance, restrained-temperature multiple-copy molecular dynamics (MD) replica exchange combined with molecular docking suggested molecules that bind to the spike protein of the SARS-CoV-2 virus 42.
The most common concerns in such molecular mechanics approaches involve insufficient conformational sampling and limited simulation length compared with biological timeframes. Other drawbacks are approximations due to force-field imperfections and other model simplifications, absence of adequate statistical information and the lack of general applicability to all molecular systems 43. Increases in computer power, advances in enhanced simulation techniques and fourth-generation force fields with incorporated polarizabilities [22][23][24] are helping overcome these limitations. For example, the Frontera petascale computing system allowed multiple microsecond all-atom MD simulations of the SARS-CoV-2 spike glycoprotein embedded in the viral membrane 44. Enhanced sampling simulations combining parallel tempering with well-tempered ensemble metadynamics revealed how phosphorylation of intrinsically disordered proteins regulates their binding to their interacting partners 45. The polarizable force field AMOEBA has allowed the determination of the phosphate binding mode to the phosphate binding protein 46, which has remained controversial for a long time.
Fig. 2 | Performance of landmark simulations compared with the world's fastest supercomputers and Moore's law trend. Plot of the computational system ranked first (blue) and the highest ranked academic computer (orange) as reported in Rmax according to the LINPACK benchmark as assembled in the Top500 supercomputer lists (www.top500.org). Rmax is the measure used to define computer performance, reported in TFLOPS (trillion floating point operations per second). Landmark simulations (green diamonds) are dated assuming calculations were performed about a year before publication, except for the publications in 1998, which we assumed were performed in 1996. These include, from 1996 to the present, 25-bp DNA using National Center for Supercomputing Applications (NCSA) Silicon Graphics Inc. (SGI) machines 144; villin protein 145 using the Cray T3E900; bc1 membrane complex 146 using the Cray T3E900; 12-bp DNA 147 using MareNostrum/Barcelona; Fip35 protein 148 using NCSA Abe clusters; nuclear core complex 151 using Blue Waters; influenza A virus 152 using the Jade Supercomputer; CypA/CA complex 154 using Blue Waters; HIV-1 capsid 7 using Titan Cray XK7; GATA4 gene 39 using Trinity Phase 2; and influenza A virus H1N1 36 using Blue Waters. As Blue Waters has opted out of the Top500, we use estimates of sustained system performance/sustained petascale performance (SSP/SPP) from 2012 and 2020. For system size and simulation time of each landmark simulation, see Fig. 1.
Knowledge-based methods. Knowledge-based methods are less conceptually demanding than physics-based models and can in principle overcome the approximations of physics-based methods.
In the field of protein folding and structure prediction, knowledge-based methods, such as homology 47, threading 48 and minithreading modeling 49, have been shown to be more effective than physics-based methods in some cases 50. Other successful algorithms use information on evolutionarily coupled residues, namely, residues involved in compensatory mutations 51. Such information can be detected from multiple sequence alignments and used to predict protein structures de novo with high accuracy, as observed in CASP11 52.
In particular, the artificial intelligence approach by Google AlphaFold, a co-evolution-based method, upstaged the CASP13 exercise held in 2018, outperforming other methods for protein structure prediction 53 (Fig. 3a). More recently, analyses of CASP14 (2020) results with the updated AlphaFold2 revealed unprecedented levels of accuracy across all targets 54.
The increasing amount of high-resolution structural data for protein/ligand binding, as deposited in the Protein Data Bank, has accelerated the use of knowledge-based methods in drug discovery, a key application of biomolecular modeling. For example, the crystal structure of the main protease of the SARS-CoV-2 virus was solved unliganded and in complex with a peptide mimetic inhibitor 55, providing the basis for the development of improved inhibitors using knowledge-based methods 56. Also related to COVID-19, artificial intelligence tools have been used to identify potential drugs against SARS-CoV-2 57.
Of course, the accuracy of knowledge-based methods depends on the quality and size of the database available, similarity between the underlying database and the systems studied, and the analysis methods applied. Even in large databases, some systems are underrepresented. These include, for example, RNAs with higher-order junctions 58, where few experimental data exist, and intrinsically disordered proteins, which are difficult to solve by conventional X-ray or NMR techniques. Such problems may be alleviated in principle as more data become available. Nonetheless, unbalanced databases can produce erroneous results. For example, models trained with databases of ligand-protein complexes where ligands that bind weakly are underrepresented 59,60 can overestimate binding affinities.
For some applications, such as deriving force fields by machine learning protocols, access to a large and diverse high-quality training dataset obtained by quantum mechanics calculations is essential to obtain reliable results for general applications 61. However, there are no known criteria of sufficiency. How many molecular descriptors are required to satisfactorily explain ligand binding or chemical reactivity? How many large non-coding RNAs are diverse enough to represent the universe of RNA folds for these systems?
Combined knowledge and physics-based methods. Fortunately, combinations of knowledge and physics-based approaches can merge the strengths of each technique, integrating specific molecular information with learned patterns. For instance, maximum entropy 62 and Bayesian 63 approaches integrate simulations with experimental data. They generate structural ensembles for the systems using MD or Monte Carlo simulations and refine them by imposing restraints so that the ensembles reproduce the experimental data.
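As a minimal sketch of the maximum-entropy idea, consider the simplest case: a single observable computed for each frame of a simulated ensemble, and one experimental average that the reweighted ensemble should reproduce. Under these assumptions the minimally perturbed weights take the form w_i ∝ exp(-λ f_i), and λ is found by root finding. The function names, the bisection bracket and the example data are hypothetical, and published methods additionally account for experimental error and multiple observables.

```python
import math

def maxent_weights(values, lam):
    """Weights w_i proportional to exp(-lam * f_i), normalized to 1."""
    exponents = [-lam * v for v in values]
    shift = max(exponents)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(e - shift) for e in exponents]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))

def fit_lambda(values, target, lo=-50.0, hi=50.0, tol=1e-10, max_iter=200):
    """Bisection on lam; the weighted mean decreases monotonically with lam.

    The bracket [lo, hi] is an assumption suited to observables of order 1.
    """
    if not (min(values) < target < max(values)):
        raise ValueError("target must lie strictly inside the sampled range")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        mean = weighted_mean(values, maxent_weights(values, mid))
        if abs(mean - target) < tol:
            return mid
        if mean > target:
            lo = mid  # a larger lam is needed to lower the mean
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical per-frame distances (in angstroms) and an experimental average.
frame_values = [4.0, 5.0, 6.0, 7.0]
lam = fit_lambda(frame_values, target=5.0)
weights = maxent_weights(frame_values, lam)
```

The reweighted ensemble keeps every simulated frame but shifts population toward those frames that bring the average into agreement with experiment.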
Protein-folding approaches can be improved by the use of hybrid energy functions that combine physics-based with knowledge-based components. For example, physics-based functions can be modified with structural restraints from NMR experiments 49, torsion angle correction terms for the backbone or side chains of residues 64 or hydrogen-bonding potentials based on high-resolution protein crystal structures 65.
In protein structure refinement, combinations of physics-based and knowledge-based approaches have been shown to be particularly successful. For example, in the CASP10 exercise, MD simulations from the Shaw 66 and Zhang 67 groups showed that experimental constraints were crucial for refining predicted structures. Pure physics-based methods were unsuccessful at correcting non-native conformations toward native states. Recently, it was reported that refinements with MD simulations of models obtained with AlphaFold substantially improve the predicted structures 68.
In computer-aided drug design, quantitative structure/property relationship (QSPR) models combine experimental and quantum mechanical descriptors to improve the prediction of Gibbs free energies of solvation 69. MD simulations combined with machine learning algorithms can help create improved quantitative structure-activity relationship (QSAR) models 70.
In the long run, inferring mechanisms is critical for understanding and addressing complex problems in biophysics. Force fields will not likely disappear any time soon, despite the growing success of knowledge-based methods. As shown in public citizen projects, such as Foldit for protein folding 71, combinations of both physics-based and knowledge-based methods will probably work best. Importantly, human intuition and insight are needed to fully merge both approaches and properly interpret the computational findings.

The role of algorithms versus hardware
Rigorous and efficient algorithms are essential for the success of any biomolecular modeling or simulation. New algorithms are required to address problems as they emerge, as well as to utilize new technologies and hardware developments. Classic examples of algorithms that enhanced the reliability and efficiency of biomolecular simulations include the particle-mesh Ewald method for treatment of electrostatics 72 and symplectic and resonance-free methods for long-time integration 1,[73][74][75] in MD simulations. Hardware advances, in addition, are essential for expanding system size and simulation timeframes. The continuing increase in computer power, in combination with parallel computing, has been crucial in the development of the field of biomolecular simulations. Both hardware and software will be essential to the continued success of the field.
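To illustrate what a symplectic integrator looks like in practice, the sketch below implements velocity Verlet for a single particle in one dimension. Velocity Verlet is chosen here only as a standard example of a symplectic scheme; the text does not name a specific integrator, and the harmonic test force, the unit mass and the timestep are assumptions made for illustration.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in 1D; returns a list of (x, v) after each step."""
    trajectory = []
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

def harmonic_force(x, k=1.0):
    """Restoring force of a spring with force constant k (illustrative)."""
    return -k * x

def harmonic_total_energy(x, v, mass=1.0, k=1.0):
    return 0.5 * mass * v * v + 0.5 * k * x * x

# Assumed test system: unit mass and spring constant, started at x = 1, v = 0.
traj = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=10000)
energies = [harmonic_total_energy(x, v) for x, v in traj]
max_energy_error = max(abs(e - 0.5) for e in energies)
```

For this test system the total energy oscillates within a small bound instead of drifting, which is the property that makes such schemes attractive for long-time integration.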
Algorithms and software advances. Outstanding progress has been reported in developing software to enhance sampling, reduce computational cost and integrate information from machine learning and artificial intelligence methods to solve biological problems. Algorithms that utilize novel hardware such as graphics processing units (GPUs) and coupled processors have also been impactful. Enhanced sampling methods and particle-based methods such as Ewald summations have revolutionized how molecular simulations are performed and how conformational transitions can be captured, for example, to connect experimental endpoints 76. MD algorithms such as multiple timestep approaches, in contrast, have achieved far less impact than hardware innovations, due to a relatively small net computational gain. However, their framework may be useful in combination with other improvements such as enhanced sampling algorithms 77 or optimized particle-mesh Ewald algorithms 78. The complexity and size of the biological systems of interest increase every year and thus continued algorithm development is crucial to obtain reliable methods that balance accuracy and performance.
Density functional theory (DFT) 79,80, used in quantum mechanical (QM) applications since the 1990s, has become one of the most popular QM methods to study biomolecules. DFT has a computational cost similar to semi-empirical methods but higher accuracy. New DFT functionals are continuously being developed to improve the description of dispersion and for special applications 81. The high efficiency of DFT implies that larger and more complex systems can be studied, expanding the applications and predictive power of electronic structure theory, and promoting collaborations between modelers and experimentalists 82. This high efficiency has also been exploited by MD methods; DFT-based MD simulation methods, such as Car-Parrinello MD 83 and ab initio MD 84, are widely applied to study electronic processes in biological systems, such as chemical reactions 85.
Nature Computational Science | www.nature.com/natcomputsci
Because of the computational cost of QM methods and the large size of most biological systems, the development of combined quantum mechanics/molecular mechanics (QM/MM) methods was fundamental to advance electronic structure calculations of biological systems 86. In particular, computational enzymology has driven the development of these methods since the pioneering work of Warshel and Levitt on the reaction mechanism of lysozyme 87. By partitioning the system into an electronically active region and the rest, which is treated at a molecular mechanical level, computational effort is centered in the part of the system where it is needed, and the overall cost is substantially reduced. Nowadays, several QM/MM methods differing in the scheme used to compute QM/MM energies, the treatment of the boundary region and the QM to MM interactions are applied to study many enzymatic mechanisms, metal-protein interactions, photochemical processes and redox processes, among others 86,88,89. Adaptive QM/MM methods that reassign the QM and MM regions on the fly have also been developed 90. These methods are particularly important to study ions in solution or in biomolecules, and chemical reactions in explicit solvent.
Fig. 3 | … 155. The distributions of distances and angles are obtained and then the scores are optimized with gradient descent to improve the designs. b, RNA mesoscale modeling. Target residues for drugs or gene editing in the SARS-CoV-2 frameshifting element (FSE) identified by graph theory combined with all-atom microsecond MD simulations made possible by a new four-petaflop 'Greene' supercomputer at New York University. c, Chromatin mesoscale modeling. HOXC mesoscale model at nucleosome resolution constructed with experimental information on nucleosome positioning, histone tail acetylation and linker histone binding 130. Shown are the unfolded gene (left) and the folded structure of the gene (right). d, Cloud computing. Targeted protein sites related to SARS-CoV-2 with bound ligands from the billion compound database screened with the VirtualFlow platform that makes use of cloud computing and cloud storage 156. Figure reproduced with permission from: a, ref. 157, Springer Nature America, Inc. (structure image); b, ref. 119, Cell Press; c, ref. 130, PNAS; d, https://vf4covid19.hms.harvard.edu, Harvard Univ. See original works for full names and abbreviations.
Recent QM/MM methods employ machine learning (ML) potentials in place of MM calculations 91. Such QM/ML schemes can avoid problems associated with force fields as well as boundary issues between the QM and MM regions. Other recent developments use neural networks coupled with QM/MM algorithms; the neural networks are used to predict potential energy surfaces at an ab initio/MM level from semi-empirical/MM calculations 92.
Many of the biological processes of interest occur on timescales that are not easily accessible by conventional MD simulations. Thus, a variety of enhanced sampling algorithms have been developed 93,94. These methods improve the sampling efficiency by reducing energy barriers and allowing the systems to escape local minima in the potential energy surface. Speedups compared with conventional MD can be around one order of magnitude or more 95. Methods based on collective variables such as umbrella sampling 96, metadynamics 97 and steered MD 98 have advanced the field with applications to ligand binding/unbinding, conformational changes of proteins and nucleic acids, free energy profiles along enzymatic reactions and ligand unbinding, and protein folding. Methods that do not require definition of specific collective variables or reaction coordinates, such as replica exchange MD 99 and accelerated MD 100, have been shown to be particularly successful when defining a collective variable is difficult, for example, when exploring transition pathways and intermediate states. Markov state models (MSMs) can help describe pathways between different relevant metastable states identified by experiments or MD. For example, when studying the folding of a dimeric protein 101, an MSM of the metastable states on the free-energy surface has identified the states that describe the folding process, as well as the specific inter-residue interactions that can lead to kinetic traps. Physics-based protein folding has benefited from the application of MSMs that combine many short independent trajectories 102,103. Related thermodynamic integration 104 and free-energy perturbation 105 methods, which calculate free-energy differences between initial and final states, have also helped determine protein/ligand binding constants, membrane/water partition coefficients, pKa values and folding free energies 106,107 to connect simulations to experimental measurements.
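The core of an MSM can be sketched in a few lines: count transitions between discrete states at a chosen lag time, normalize each row to obtain transition probabilities and extract the stationary populations. The version below is a naive, non-reversible estimator written for illustration; production MSM software adds state discretization, reversible estimation and lag-time validation. The function names and the toy state sequence are hypothetical.

```python
def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts from a discrete state trajectory."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for i in range(len(dtraj) - lag):
        counts[dtraj[i]][dtraj[i + lag]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0.0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix

def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Power iteration for the left eigenvector with eigenvalue 1."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Toy two-state sequence (for example, 0 = unfolded, 1 = folded); illustrative only.
toy_dtraj = [0, 0, 1, 0, 0, 1, 1, 0]
toy_matrix = estimate_transition_matrix(toy_dtraj, n_states=2)
toy_pi = stationary_distribution(toy_matrix)
```

Because only transition counts are needed, counts from many short independent trajectories can simply be summed before normalization, which is what makes the approach suited to distributed simulations.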
Enhanced sampling techniques are now being combined with machine learning to improve the selection of collective variables 108 and to develop new methods 109,110. Clearly, artificial intelligence and ML algorithms are changing the way we do molecular modeling. Coupled with the growth of data, GPU-accelerated scientific computing and physics-based techniques, these algorithms are revolutionizing the field. Since the pioneering work of Behler and Parrinello on the use of neural networks to represent DFT potential energy surfaces and thus to describe chemical processes 111, ML has been applied to design all-atom and coarse-grained force fields, analyze MD simulations, develop enhanced sampling techniques and construct MSMs, among others 112. As discussed above, Google's AlphaFold performance in CASP13 and CASP14 showed how impactful these kinds of algorithms can be for predicting protein structure 53,54. Artificial intelligence platforms for drug discovery have also led to clinical trials for COVID-19 treatments in record times 57.

Multiscale models.
A special case of algorithms that has potential to revolutionize the field involves multiscale models. Crucial for bridging the gap between experimental and computational timeframes, such models increase spatial and temporal resolution by use of coarse graining, interpolation and other ways to connect all the information on different levels.
The 2013 Nobel Prize in Chemistry that recognized Karplus, Levitt and Warshel for their work on developing multiscale models has underscored the importance of these models. In the 1970s, bridging molecular mechanics with quantum mechanics indeed defined a new way of simulating molecular systems 113. The first hybrid model of this type by Warshel and Karplus 113 was initially intended to study chemical properties and reactions of planar molecules but was later extended to study enzymatic reactions 87. Today's models are numerous and varied. While useful in practice, they are generally tailored to specific systems and lack a rigorous theoretical framework.
For example, numerous coarse-grained protein models have been developed and applied to protein dynamics, folding and flexibility, protein structure prediction, protein interactions and membrane proteins, as recently reviewed 114 .
Coarse-grained models have also been developed to study nucleic acids. Possibly due to small volumes of structural data, high charge density and wide structural diversity, they have progressed somewhat more slowly than those for proteins, especially in the case of RNA.
DNA coarse-grained models allow us to study, in reasonable time, large DNA systems that could not be approached by all-atom models. The reduction in the degrees of freedom achieved by coarse-grained models has allowed the study of systems of thousands of base pairs on timescales of microseconds to milliseconds. Crucial studies include self-assemblies of large DNA molecules, the denaturation process, the hybridization process important for many biological functions, the topology of DNA mini-circles and the sequence-dependence of single-stranded DNA structures 29,115,116.
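The reduction in degrees of freedom can be illustrated with the mapping step alone: groups of atoms are replaced by single beads, here placed at each group's center of mass. This is a sketch under assumed conventions; the bead names, atom groupings, coordinates and masses are hypothetical, and a usable coarse-grained model also requires effective interaction potentials between beads, which are not shown.

```python
def center_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms (3D tuples)."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )

def coarse_grain(coords, masses, bead_map):
    """Map atoms to beads; bead_map is {bead name: list of atom indices}."""
    return {
        name: center_of_mass(
            [coords[i] for i in indices], [masses[i] for i in indices]
        )
        for name, indices in bead_map.items()
    }

# Hypothetical four-atom fragment mapped onto two beads.
atom_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
atom_masses = [1.0, 1.0, 1.0, 3.0]
beads = coarse_grain(atom_coords, atom_masses, {"A": [0, 1], "B": [2, 3]})

# Cartesian degrees of freedom before and after mapping.
dof_all_atom = 3 * len(atom_coords)
dof_coarse = 3 * len(beads)
```

In this toy case the twelve Cartesian degrees of freedom of the atoms reduce to six for the beads; the same bookkeeping applied to a nucleotide or a nucleosome gives the much larger savings that make long simulations of large systems feasible.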
The flexibility of RNAs and the huge spectrum of possible conformations make their modeling challenging, and numerous coarse-grained models that differ in the number of beads per nucleotide and interactions included in the model and their treatment have been developed 117,118. A different coarse-grained approach using two-dimensional and three-dimensional graphs to represent RNA structure has also proven useful to analyze and design novel RNAs, including the SARS-CoV-2 frameshifting element 119 (Fig. 3b).
Coarse-grained models have also been applied to biomembranes, systems of thousands of lipids that undergo large-scale transitions in the microsecond-to-millisecond regime [120][121][122]. Membrane protein dynamics, virion capsid assembly, lipid recognition by proteins and many remodeling processes have been successfully captured in such coarse-grained applications 121.
Finally, to study DNA complexed with proteins, such as in the context of chromatin fibers, multiscale approaches are essential, as recently reviewed 123,124. These approaches derive the chromatin model from the atomistic DNA, nucleosomes and linker histones. Successful models by the groups of the late Langowski 41, Wedemann 125, Nordenskiöld 126, Olson 127, Spakowitz 128, de Pablo 129 and ours 40,123 have been applied to understand the mechanisms that regulate chromatin compaction and function. For example, our recent three-dimensional folding of the HOXC gene cluster (~55 kbp) at nucleosome resolution by mesoscale modeling 130 revealed how epigenetic factors act together to regulate chromatin folding (Fig. 3c). The next challenge for these types of model is to merge the kilobase to megabase levels of understanding chromatin while retaining a basic dependency on the physical parameters that dictate fiber conformations.
Multiscale models are as much art as science, as they require subjective decisions on what parts to approximate and what parts to resolve. Yet much information guides these models, and important biological problems serve as motivators. Overall, innovative advances in both algorithms and hardware, especially in multiscale modeling, will be pivotal for the progress of the biological sciences in the coming years.
Hardware advances. The computational biology and chemistry communities have utilized hardware exceptionally well. This is evident from the expectation curve in Fig. 1 and from the computer technology plot in Fig. 2. We see that hardware innovations have propelled the field of biomolecular simulations forward by around six orders of magnitude over three decades as reflected by simulation length and size of biomolecular systems.
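A short calculation puts the growth figure in context. The sketch below converts a doubling time into orders of magnitude gained over a period and, conversely, computes the doubling time implied by a given gain. The two-year doubling time used for the Moore's law comparison is an assumption (a commonly quoted form that is not stated in the text), and the function names are hypothetical.

```python
import math

def orders_of_magnitude(years, doubling_time_years):
    """Orders of magnitude gained when a quantity doubles every doubling time."""
    return (years / doubling_time_years) * math.log10(2.0)

def implied_doubling_time(years, orders):
    """Doubling time that yields the given gain over the given period."""
    return years * math.log10(2.0) / orders

# Assumed Moore's law form: doubling every two years.
moore_gain = orders_of_magnitude(30.0, 2.0)         # about 4.5 orders
simulation_doubling = implied_doubling_time(30.0, 6.0)  # about 1.5 years
```

Under this assumption, doubling every two years accounts for about 4.5 orders of magnitude over three decades, whereas a gain of six orders corresponds to doubling roughly every 1.5 years.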
In the first decade of the twenty-first century, hardware innovations such as the supercomputers Anton and Blue Waters propelled the field by expanding the limits of both system size and simulation time that are possible. Today, nanosecond simulations of a 160-million-atom influenza virus 36 or the 1-billion-atom GATA4 gene 39 have become possible.
At the same time, the introduction of GPUs for biomolecular simulations by NVIDIA broke new ground. GPUs are specialized electronic circuits designed to rapidly manipulate and alter memory to accelerate computations. Such GPUs contain hundreds of arithmetic units and possess a high degree of parallelism, allowing performance levels tens or hundreds of times higher than a single central processing unit (CPU) core with tailored software 131.
The acceleration of MD simulations by GPU computing and supercomputers substantially reduced the gap between experimental and theoretical scales. For example, as mentioned above, the world's second-fastest supercomputer, Summit at the Oak Ridge National Laboratory (with more than 27,000 NVIDIA GPUs and 9,000 IBM Power9 CPUs), was used to explore SARS-CoV-2 virus inhibitors among more than 8,000 compounds 42. Such simulations were conducted in just a few days, with 77 compound candidates found. GPU-based algorithms for free-energy calculations can achieve a 200-fold speedup compared with CPU-based methods 132. QM/MM GPU-based methods have also accelerated calculations focused on enzymatic mechanisms. For example, GPU-based DFT in the framework of hybrid QM/MM calculations such as ONIOM 133 or additive QM/MM 134 realizes speedup factors of 20 to 30 compared with CPU-based calculations. The Folding@home distributed computing project, dedicated to understanding the role of protein folding in several diseases, is conducting most calculations on GPUs by using simulation packages adapted to this architecture 135. Recently, over a million citizen scientists helped solve COVID-19 challenges; they combined ~280,000 GPUs, reaching the exascale and generating more than 0.1 s of simulation 136. These simulations helped understand how the SARS-CoV-2 virus spike surface protein attaches to the receptors in human cells. MD software adapted to GPU-accelerated architectures is also being used to perform enormous cell-scale simulations 137, important to mimic realistic cellular environments and to study viral and bacterial infections.
Fig. 4 | … The image depicts the structure of the small protein BPTI that was simulated for 8.8 ps without hydrogen atoms and with four water molecules 158. 1980s: simulations that considered solvent effects became possible, and algorithms such as SHAKE, to constrain covalent bonds involving hydrogen atoms, allowed the study of systems with explicit hydrogens. The image depicts the 125 ps simulation of the 12 bp DNA in complex with the lac repressor protein 159 in aqueous solution using the simple three-point charge water model. 1990s: QM/MM methods can perform geometry optimizations, MD and Monte Carlo simulations 160. The image depicts the acetyl-CoA enolization mechanism by the citrate synthase enzyme studied with AM1/CHARMM. 2000s: GPU-based MD simulations, specialized supercomputers such as Anton, shared resources such as Folding@home, enhanced sampling algorithms and Markov state models (MSMs) all contributed to advance protein folding 161. The image depicts the long 100 μs simulation of the Fip35 folding conducted on Anton (red) compared with the X-ray structure (blue). 2010s: all-atom and coarse-grained MD simulations of viruses performed on supercomputers such as Blue Waters became common 162. The image depicts the MD simulation of the all-atom HIV capsid using the MDFF method that uses cryo-electron microscopy data to guide simulations. 2020s: physical whole-cell models are being developed to fully understand how biomolecules behave inside cells and to study interactions between them, for example, within viruses and cells. The image depicts the model of the interaction between the spike protein on the SARS-CoV-2 surface and the ACE2 receptor on a human cell surface being developed in the Amaro Lab. Images adapted with permission from (left to right): second image, ref. 159, Wiley; third image, ref. 163, Wiley; fourth image, ref. 150, AAAS; fifth image, ref. 164, Springer Nature Ltd; sixth image, https://amarolab.ucsd.edu/news.php, Amaro Lab.
Cloud-based computing is surging as a viable alternative to supercomputers, providing researchers with remote high-performance computing platforms for large-scale simulations, analysis and visualization. Acquisition and maintenance of such hardware is not affordable for individual research groups, but feasible for institutions and companies. For example, Google's Exacycle has been used to conduct millisecond simulations of the G-protein-coupled receptor β2AR that revealed its activation pathway, important for the design of drugs to treat heart diseases 138. Recently, in an unprecedented study, the Google Cloud Platform and Google Cloud Storage were combined to screen around 1 billion compounds against 15 SARS-CoV-2 proteins and 2 human proteins involved in the infection 139 (Fig. 3d). A high-performance version of the popular visualization program VMD has been implemented on the Amazon cloud 140, as well as the MD toolkit QwikMD 141 and the molecular dynamics flexible fitting (MDFF) method for structure refinement from cryo-electron microscopy densities 142. These efforts allow scientists worldwide to access powerful computational equipment and software packages in a cost-effective way.
Overall, tailored computers for molecular simulations, such as Anton, can accelerate the calculation of computationally expensive interactions with specialized software 143, while general-purpose supercomputers or cloud computing that parallelize MD calculations across multiple processors with thousands of GPUs or CPUs can accelerate performance (for example, trillions of calculations per second) for large systems 36,39.
Although hardware advances have overshadowed software advances, both are clearly needed for optimal performance. Hardware bottlenecks will inevitably emerge as computer storage limits are reached. Yet, whether or not Moore's law continues to hold 11, software advances will always be important. Certainly, engineers and mathematicians will not be out of jobs.
Figure 4 summarizes key software, hardware and algorithm developments that enabled breakthrough studies.

Conclusions and outlook
Technology has driven many advances that affect our everyday life, from cellphones and personal medical devices to solar energy and ways of coping with physical isolation during the current COVID-19 pandemic. Biomolecular modelers have consistently leveraged technology to solve important practical problems efficiently and will undoubtedly continue to do so. Machine learning and other data science approaches are now offering new tools for discovery in numerous fields. These tools for predicting structures, dynamics and functions of biomolecules can be combined with physics-based approaches not only to find solutions but also to understand the associated mechanisms. Algorithms such as MSMs, neural networks, multiscale modeling, enhanced conformational sampling and comparative modeling can be leveraged as never before, especially in combination with these data science approaches.
We expect that force-field-based methods will remain essential for understanding the mechanisms of biomolecular systems, but knowledge-based methods will certainly gain momentum. Although the recent breathtaking results from AlphaFold2 54 might tempt us to believe that the physics-based era is over, the range of complex problems beyond protein folding is unlikely to be easily solved by knowledge-based methods alone. Novel computing platforms will also play an important role in the future of biomolecular simulations. As quantum computing, neuromorphic computing and other architectures enter the arena, we can be sure that they will be exploited avidly by the biomolecular community. Despite the extraordinary technical impact of computers on our field and the incredible potential of artificial intelligence techniques to address many scientific problems, human intuition and intelligence will continue to be instrumental for developing ideas and pursuing new research avenues. After all, such human talent is responsible for artificial intelligence design and implementation in the first place and will probably continue to be.
Finally, gone are the days when modelers worked in isolation. Whether on Zoom or sharing a bench, teams of multidisciplinary scientists (and likely automated machines too in the future) are collaborating to address essential problems of life, from energy to vaccines. Despite some bumps, exponential growth appears to be a reality for the near future, and the field of biomolecular modeling and simulation will undoubtedly continue to incorporate, innovate and digitize the inner workings of biological systems to unlock the secrets of life and to develop solutions for treating human disease, improving global health and enhancing our environment.

Fig. 1 | Expectation curve for the field of biomolecular modeling and simulation. The field started with comprehensive molecular mechanics efforts, and it took off with the increasing availability of fast workstations and later supercomputers. In the molecular mechanics illustration (top left panel), symbols b, θ and τ represent bond, angle and dihedral angle motions, respectively, and non-bonded interactions are also indicated. The torsion potential (E) contains two-fold (dashed black curve) and three-fold (solid violet curve) terms. Following unrealistically high short-term expectations and disappointments concerning the limited medical impact of modeling and genomic research on human disease treatment, better collaborations between theory and experiment have ushered the field to its productive stage. Challenges faced in the decade 2000-2010 include force-field imperfections, conformational sampling limitations, some pharmacogenomics hurdles and limited medical impact of genomics-based therapeutics for human diseases. Technological innovations that have helped drive the field include distributed computations and the advent of the use of GPUs for biomolecular computations. The molecular-dynamics-specialized supercomputer Anton made it possible in 2009 to reach the millisecond timescale for explicit-solvent all-atom simulations. The 2013 Nobel Prize in Chemistry awarded to Levitt, Karplus and Warshel helped validate a field that lagged behind experiment and propel its trajectory. Along the timeline, we depict landmark simulations: 25-bp DNA (5 ns and ∼21,000 atoms) 144; villin protein (1 μs and 12,000 atoms) 145; bc1 membrane complex (1 ns and ∼91,000 atoms) 146; 12-bp DNA (1.2 μs and ∼16,000 atoms) 147; Fip35 protein (10 μs and ∼30,000 atoms) 148 (image from ref. 149); Fip35 and bovine pancreatic trypsin inhibitor (BPTI) proteins (100 μs for Fip35 and 1 ms for BPTI, and ∼13,000 atoms) 150; nuclear pore complex (1 μs and 15.5 million atoms) 151; influenza A virus (1 μs and >1 million atoms) 152; N-methyl-D-aspartate (NMDA) receptor in membrane (60 μs and ∼507,000 atoms) 153; tubular cyclophilin A/capsid protein (CypA/CA) complexes (100 ns and 25.6 million atoms) 154; HIV-1 fully solvated empty capsid (1 μs and 64 million atoms) 7; GATA4 gene (1 ns and 1 billion atoms) 39; and influenza A virus H1N1 (121 ns and ∼160 million atoms) 36. Figure adapted with permission from ref. 1, Cambridge Univ. Press.
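The two-fold and three-fold torsion terms mentioned in the Fig. 1 caption can be illustrated numerically. The sketch below assumes the widely used cosine form E_n(τ) = (V_n/2)[1 + cos(nτ)] with zero phase shift; the caption itself does not state the functional form, and the barrier heights `v2` and `v3` as well as the function name `torsion_energy` are hypothetical, chosen only for illustration.

```python
import math


def torsion_energy(tau, v2=1.0, v3=1.0):
    """Return (two-fold term, three-fold term, total) at dihedral angle tau (radians).

    Assumed form: E_n(tau) = (V_n / 2) * (1 + cos(n * tau)), zero phase shift.
    v2 and v3 are hypothetical barrier heights in arbitrary energy units.
    """
    two_fold = 0.5 * v2 * (1.0 + math.cos(2.0 * tau))
    three_fold = 0.5 * v3 * (1.0 + math.cos(3.0 * tau))
    return two_fold, three_fold, two_fold + three_fold


if __name__ == "__main__":
    # Tabulate both terms over one full rotation in 30-degree steps.
    for deg in range(0, 361, 30):
        e2, e3, total = torsion_energy(math.radians(deg))
        print(f"{deg:3d}  {e2:6.3f}  {e3:6.3f}  {total:6.3f}")
```

Under these assumptions, the two-fold term has minima at 90° and 270°, whereas the three-fold term has minima at 60°, 180° and 300°; summing them gives the kind of composite rotational profile that force fields fit to target data.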

Fig. 3 | Applications of biomolecular modeling made possible by technology advances. a, AlphaFold workflow. Deep neural networks are trained with known structures deposited in the Protein Data Bank to predict protein structures de novo 155. The distributions of distances and angles are obtained and then the scores are optimized with gradient descent to improve the designs. b, RNA mesoscale modeling. Target residues for drugs or gene editing in the SARS-CoV-2 frameshifting element (FSE), identified by graph theory combined with all-atom microsecond MD simulations made possible by a new four-petaflop 'Greene' supercomputer at New York University. c, Chromatin mesoscale modeling. HOXC mesoscale model at nucleosome resolution constructed with experimental information on nucleosome positioning, histone tail acetylation and linker histone binding 130. Shown are the unfolded gene (left) and the folded structure of the gene (right). d, Cloud computing. Targeted protein sites related to SARS-CoV-2 with bound ligands from the billion-compound database screened with the VirtualFlow platform, which makes use of cloud computing and cloud storage 156. Figure reproduced with permission from: a, ref. 157, Springer Nature America, Inc. (structure image); b, ref. 119, Cell Press; c, ref. 130, PNAS; d, https://vf4covid19.hms.harvard.edu, Harvard Univ. See original works for full names and abbreviations.

Fig. 4 | Key developments in algorithms, software and hardware that advanced the field. 1970s: simulations of <1,000-atom systems for a few picoseconds in vacuo became possible owing to the development of digital computers and algorithms to treat long-range Coulomb interactions. The image depicts the structure of the small protein BPTI that was simulated for 8.8 ps without hydrogen atoms and with four water molecules 158. 1980s: simulations that considered solvent effects became possible, and algorithms such as SHAKE, which constrains covalent bonds involving hydrogen atoms, allowed the study of systems with explicit hydrogens. The image depicts the 125 ps simulation of the 12-bp DNA in complex with the lac repressor protein 159 in aqueous solution using the simple three-point charge water model. 1990s: QM/MM methods can perform geometry optimizations, MD and Monte Carlo simulations 160. The image depicts the acetyl-CoA enolization mechanism by the citrate synthase enzyme studied with AM1/CHARMM. 2000s: GPU-based MD simulations, specialized supercomputers such as Anton, shared resources such as Folding@home, enhanced sampling algorithms and Markov state models (MSMs) all contributed to advancing protein-folding studies 161. The image depicts the long 100 μs simulation of Fip35 folding conducted on Anton (red) compared with the X-ray structure (blue). 2010s: all-atom and coarse-grained MD simulations of viruses performed on supercomputers such as Blue Waters became common 162. The image depicts the MD simulation of the all-atom HIV capsid using the MDFF method, which uses cryo-electron microscopy data to guide simulations. 2020s: physical whole-cell models are being developed to fully understand how biomolecules behave inside cells and to study interactions between them, for example, within viruses and cells. The image depicts the model of the interaction between the spike protein on the SARS-CoV-2 surface and the ACE2 receptor on a human cell surface being developed in the Amaro Lab. Images adapted with permission from (left to right): second image, ref. 159, Wiley; third image, ref. 163, Wiley; fourth image, ref. 150, AAAS; fifth image, ref. 164, Springer Nature Ltd; sixth image, https://amarolab.ucsd.edu/news.php, Amaro Lab.