Introduction

Biomolecular simulations are essential tools for drug design and development, and for our understanding of the molecular basis of disease1,2. Simulations provide a computational microscope to reveal biological mechanisms in atomic detail3. They can reveal cryptic drug binding sites4 and predict important biological properties, such as drug resistance5. Molecular dynamics (MD) is among the most widely used methods in biomolecular simulation: it applies empirical molecular mechanics (MM) force fields and can now be used to explore time-dependent phenomena in atomic detail at the scale of viral capsids6 over microseconds7, given sufficient computational power. MM methods are increasingly applied in structure-based drug design, for example, for free-energy calculations to predict binding affinities of pharmaceutical leads to their targets, accelerating drug development8. Their importance and algorithmic efficiency (based on years of development by many pioneers) have made atomistic MD simulations one of the largest scientific consumers of computing time globally. These methods (and Monte Carlo simulations) can be applied in rigorous free-energy calculations of relative binding affinities of small molecules to protein targets. However, computational demands in terms of system size and timescale limit the use of MD methods; at the same time, the fairly simple potential functions used to achieve computational efficiency somewhat limit their range of application and accuracy. Different types of simulation methods are therefore required for different types of problems. Each of these different simulation methods has strengths, weaknesses and practical limitations in terms of the system size that can be simulated, the simulation length that can be achieved and the type of phenomena that can be modelled.
For example, various types of coarse-grained methods allow simulations of phenomena on large spatiotemporal scales, including protein–protein interactions, protein orientation in membranes and packaging of nucleic acids. Simple molecular docking approaches offer a limited level of detail about molecular interactions, conformational flexibility and solvation in favour of increased computational efficiency for the rapid identification of potential leads from large databases. At the other extreme of computational molecular science, quantum chemical methods can be used to model chemical reactions (for example, the mechanisms of enzyme catalysis) and calculate spectra. First-principles (ab initio) electronic structure techniques for the structural optimization of proteins9 and atomistic simulations to investigate the dynamics of systems of appreciable size and complexity over nanosecond to microsecond10,11 and even millisecond timescales12 (for smaller systems) exemplify the upper limits of the current capability of atomic and molecular methods13.

Multiscale modelling approaches are emerging in drug design with a potentially enormous impact on human health. Drugs act at the molecular scale but obviously have macroscopic effects, so we must consider multiple lengthscales to understand how drugs exert their effects. The dynamic nature of drug targets and the breathtaking complexity of biological systems challenge scientific understanding from the level of molecular structure all the way up to the level of cellular organization and beyond. Each level provides challenges and fascination in its own right, but a holistic approach requires an understanding of how changes at different levels are linked together and influence each other. No single simulation method can address all of the many questions involved nor explain how phenomena at various spatiotemporal scales are coupled and linked. Multiscale simulation methods aim to model and analyse the connections across scales to determine, for example, how changes at one scale lead to changes at another. An obvious challenge is the integration of data and simulations across lengthscales and timescales. Current multiscale approaches are potentially capable of overcoming these limits by directly combining different levels of description, bringing a new perspective to drug discovery.

The awarding of the 2013 Nobel Prize in Chemistry to Karplus, Levitt and Warshel for their seminal contributions in developing multiscale methods for modelling complex biochemical systems recognized the essential role of theoretical and computational methods as a direct and necessary complement to experiment, as well as the birth of multiscale molecular modelling in biochemistry14. Today, nearly half a century after the advent of these methods, multiscale simulations offer the tantalizing possibility of understanding biology in intricate and exquisite detail. For example, multiscale modelling offers an understanding of how a chemical reaction occurring at an enzyme active site can affect other proteins and extend through the hierarchy of biological complexity — from subcellular neighbourhoods to cells and tissues. The power of multiscale methods lies in the possibility to create a fluid knowledge landscape that can synthesize disparate modelling and experimental data at different spatiotemporal scales. Practical insight will depend on combining diverse approaches, linking chemistry at the atomic scale with biological function at the cellular and higher levels to elucidate the mechanisms of emergent phenomena, and doing so in a way that circles back to drive drug design and development.

Unleashing the potential of emerging experimental data sources. Technological innovations for the in situ acquisition of biological structural data, including advances in direct detector and phase plate technologies15 for X-ray beamlines and electron microscopes (EMs), give access to new and detailed information across a range of previously inaccessible scales and, in some instances, with time resolution (Fig. 1). Multiscale computational approaches are needed to fill in and connect data sets, which include data obtained from serial block wide-field EM illumination of tissue and cellular ultrastructure capable of achieving isotropic resolutions within tens of nanometres for biologically representative (endogenous, not cultured) samples16; cryoelectron tomography (cryoET) to localize supramolecular complexes and yield glimpses into cells with molecular resolution (2–4 nm resolution in individual tomograms)17–19; soft X-ray tomography to image whole hydrated (not stained or frozen) cells in their near-native state20; near-atomic cryoelectron microscopy (cryoEM)19; small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS); X-ray crystallography, diffuse scattering for an ensemble-based view of X-ray structures21; X-ray free-electron lasers22; time-resolved X-ray diffraction23; and neutron diffraction24. In parallel, ongoing innovations in biophysical techniques, such as NMR spectroscopy (for example, used for internal protein segmentation25) and hydrogen–deuterium exchange mass spectrometry (HDX-MS)26, continue to enrich our understanding of the dynamics and interactions of molecular and macromolecular ensembles. Interpretation, refinement and understanding of the high-resolution data from all these techniques challenge current modelling approaches. In addition, all these new data call for the development of models and tests for their validation.

Figure 1: Multiscale structure-based and physics-based methods bridging from atoms to cells.

Emerging multiscale computational methods coupled with increasingly accurate structural data on biological and chemical systems enable the development of highly detailed and predictive models of drug action across spatial scales ranging from angstroms to microns, and temporal scales ranging from femtoseconds to minutes. Such approaches can be gainfully used to address a number of outstanding challenges in drug discovery and design (Table 1). FIB, focused ion beam; SAXS, small angle X-ray scattering; SANS, small angle neutron scattering; XFEL, X-ray free-electron laser.


The coming fusion of simulation and data science. The convergence of improved and increased biophysical data, together with impressive algorithmic advances, occurs against a backdrop of an ever-expanding, increasingly diverse and more capable computing landscape. Porting simulation methods to the growing range of novel hardware architectures (for example, graphical processing units (GPUs), advanced reduced instruction set computing (RISC) machine (ARM)-based high-performance computing (HPC), cloud computing, petascale HPC and the emerging horizon of exascale computing27) is extending the scope and range of simulations. The rapid growth of data science also offers transformative possibilities, not only in the manipulation of simulation data and linking across spatiotemporal scales but also in its seamless integration with experimental data. Examples include the systematic development of Jupyter notebooks28 and automated workflows29,30, and improved data sharing31,32, integration33 and analytics34. These developments are driving cultural shifts towards improved reproducibility, openness, sharing, robustness and, ultimately, predictive ability of the computational approaches discussed in this Perspective.

Potential applications of multiscale methods in drug discovery. Multiscale methods span two or more spatial or temporal domains, or combine different types of treatment, aiming to give insight across scales. We note that the concept of multiscale modelling has been developed in several disciplines. Here, we focus on multiscale simulation methods that are based on fundamental physics and tackle processes at the molecular level potentially relevant to drug discovery. Multiscale techniques can be constructed in different ways, depending on how different levels of description are combined or coupled and how information is passed among the different levels35. Multiscale methods can combine different levels of theory or resolution, for example, combining MD with Brownian dynamics (BD) to access long (seconds) timescales and large (micron) lengthscales (MD–BD36) or combining quantum mechanics (QM) with MM (QM/MM37) to study electronic properties in a single simulation. Another class of multiscale methods comprises the hierarchical integration of sets of approaches carried out at different scales, which leads to one ultimate, cohesive model. In this case, the final result is obtained through the interchange of key parameters across model scales38–41, even though the simulation platform itself may not directly interface two distinct physical regimes. A related form of multiscale modelling is based on the connection (and synthesis) of different types of biological, chemical, structural and biophysical data, a particularly exciting approach given technological advances across a spectrum of experimental techniques. The development of increasingly accurate integrative models as an initial framework on which multiscale simulation methods subsequently operate presents a powerful emerging paradigm for drug discovery.

The study of multiscale methods for drug design is a wide and rapidly growing field; it is rich in potential but has yet to realize its enormous promise. The diversity of approaches and applications means that we can cover only a few relevant examples in this Perspective to indicate the vast potential of this field. Here, we focus on some recent and exciting methodological advances, as well as on challenges for development and applications that highlight the promise of simulations to bridge scales from atoms to cells.

Cellular to subcellular

The ongoing surge of high-resolution structural data is providing detailed views into many previously inaccessible biological compartments (such as the cell nucleus) and enabling the development of correspondingly realistic molecular models of cells and subcellular organelles. New tools such as cellPACK12, which inter-operates with cellVIEW42 and LipidWrapper13, can be used to model complex biomolecular systems at the mesoscale, reaching atomic resolution. Combining different data sets across scales of resolution enables direct multiscaling from the standpoint of data integration (Box 1). The accessible spatial range now reaches the micron scale, with essentially no limits on the complexity of the constituents of the system under investigation. A key advance is the ability to develop many ensembles of such models compatible with multiple sources of experimental data (for example, proteomics and structural data from X-ray through cell tomography). This allows researchers to modify the construction of a system (for example, changing the expression level of a particular protein, introducing a structural perturbation to the membrane or organelle shape, varying the molecular composition of viral strains (reassortment) or adding post-translational modifications) in order to test consistency against different types of data and predict the effects on the system induced by these changes.

Once structural models are assembled, researchers can explore biological heterogeneity at the molecular level and statistical distributions of biological and chemical components within these complex environments. Monte Carlo-based methods, such as MCell43, and continuum-based methods, such as SubCell44, enable researchers to investigate biological phenomena without explicitly accounting for molecular (particle) collisions and interactions. Reaction–diffusion master equation (RDME) methods combine network-based and particle-based approaches on discretized grids and/or lattice sites; in doing so, these approaches allow the development of whole-cell-based models of drugs and their dynamical interactions with receptors. Two exemplary GPU-accelerated RDME programs are Lattice Microbe45–47, which splits the reaction and diffusion operators to allow efficient models of in vivo crowding during particle diffusion, and ReaDDyMM48, which combines reactions at lattice sites with particle–particle interactions at off-lattice sites, as determined by MD. Another promising multi-resolution method, lattice Boltzmann MD (LBMD), employs a mixed approach in which (dynamic) proteins are represented as coarse-grained particles, and the solution through which the proteins diffuse is represented probabilistically, such that multiple physical elements (including hydrodynamic and thermodynamic forces) can be included49.

Particle-based approaches range from coarse-grained (each particle represents a group of atoms, from part of an amino acid residue to a whole protein) to fully atomistic (every atom represented individually) representations of the molecular constituents. Although coarse-grained techniques offer the possibility of simulating the behaviour of larger systems on longer timescales compared with fully atomistic approaches, the choice of particle representation (that is, how atoms are grouped) and the related force field development still present challenges50. Atomic details are neglected entirely in coarser models, such as fluctuating finite element analysis (FFEA). In this approach, the macromolecules are essentially treated as density maps, subject to thermal fluctuations within a continuum medium that encodes the material properties, such as shear and elastic stress. Lower-resolution data, such as data obtained from SAXS or cryoEM, can be directly linked to FFEA, effectively bypassing the requirement of starting with a detailed molecular model51. Such approaches can be essential, for example, when higher-resolution structural data for system components are not available or when not all of the molecular components in a particular system are known.

Particle-based simulations in which the molecular components are represented with atomic detail (rigid-body BD and fully flexible atomistic MD simulations) are proving ever more capable6,52–54. Rigid-body BD uses fully atomistic representations of molecules but neglects the molecular internal degrees of freedom. In this representation, rigid molecules (for example, proteins) are free to tumble and rotate as they diffuse relative to each other in a viscous medium (with water as a representative continuum solvent) and are subject to random motion according to the fluctuation–dissipation theorem. The intermolecular forces that govern the interaction and collisions of the particles are electrostatic in nature and can be represented with a modified form of the Poisson–Boltzmann equation. By contrast, internal motions and conformational changes are considered in fully flexible MD simulations. BD and MD can be used to explore the dynamics of molecules in crowded biological milieus, improving our understanding of detailed biological and chemical interactions10,52. They can be combined in multiscale approaches such as SEEKR (Box 2). Recent efforts linking particle-based simulations to higher-level systems biology or network-based models exemplify how hand-offs between such methods can be gainfully achieved. For example, association rates determined with BD, MD and SEEKR in network-based Markov chain of states models have been combined in order to define the mechanisms underlying the cooperative nature of cAMP activation of protein kinase A38. Systems biology models and MD simulations have been combined, for example, to generate predictions of the effects of enzyme–substrate binding affinity changes due to genetic variation in human erythrocytes39. In both cases, linking molecular and cell-based models allowed mechanistic insights and predictions at the atomic level to inform network-based whole-cell models of disease.
Collectively, multiscale approaches spanning cellular to subcellular scales will help to address the challenge of better predictive models for off-target effects (such as binding of drugs to targets other than that desired). Such multiscale approaches also enable a more complete understanding of chemical mechanisms of action and their effects at larger scales, especially for complex signalling pathways that require a broad view of molecular complexity. Hence, particle-based multiscale approaches at this scale may be particularly useful for understanding drug action.
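The Brownian dynamics propagation described earlier in this section can be illustrated with a minimal sketch. It assumes the simplest overdamped (Ermak–McCammon-type) update for point particles with translational motion only; the rotational diffusion, hydrodynamic interactions and Poisson–Boltzmann forces used in production rigid-body BD codes are omitted. The function names, parameter values and reduced units are illustrative assumptions, not taken from any particular package.

```python
import numpy as np

def brownian_step(x, force, D, kT, dt, rng):
    """One overdamped BD step: deterministic drift plus random displacement.

    The noise variance 2*D*dt is tied to the same diffusion coefficient D
    that scales the drift, which is how the fluctuation-dissipation
    relation enters the update.
    """
    drift = D * force / kT * dt
    noise = np.sqrt(2.0 * D * dt) * rng.standard_normal(x.shape)
    return x + drift + noise

def simulate_free_diffusion(n_particles=2000, n_steps=500, D=1.0, kT=1.0,
                            dt=0.01, seed=0):
    """Propagate non-interacting particles from the origin (zero force)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_particles, 3))
    zero_force = np.zeros_like(x)
    for _ in range(n_steps):
        x = brownian_step(x, zero_force, D, kT, dt, rng)
    return x

positions = simulate_free_diffusion()
# Mean squared displacement; free diffusion in 3D should give about 6*D*t.
msd = float(np.mean(np.sum(positions**2, axis=1)))
expected_msd = 6.0 * 1.0 * (500 * 0.01)
print(f"MSD = {msd:.2f}, expected about {expected_msd:.2f}")
```

Replacing the zero force with an intermolecular force field would turn this free-diffusion check into a (still highly simplified) association simulation.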

Challenges at the cellular and subcellular scales will continue to include the experimental determination of the constituents of cellular compartments (protein counts, mRNA expression levels, and so on) with enough detail (spatial and temporal resolution) to enable the development of biologically accurate structural models. New tools for segmentation and refinement of tomographic data are needed, particularly when the data are highly complex. For example, recent ultrastructural 3D mapping of the cell nucleus shows how new imaging techniques, such as ChromEMT, can directly reveal details of the higher-order structure of chromatin; at the same time, it underscores the need for new tools that are able to segment and refine the structural data at the level of individual chromatin fibres55. These data will transform our understanding of the relationship between chromatin structure and dynamics and the regulation of gene expression, which is an inherently multiscale challenge56. Additionally, once these structures are refined, new tools for developing numerically computable meshes from these reconstructions will be required to create models that can be extended and interrogated with physics-based simulations. Innovations in simulation approaches are required to effectively handle hydrodynamic interactions57, as well as different concentration and diffusion regimes within a complex, crowded cell scene58. Finally, a major set of challenges relates to the ability of researchers to set up and execute complex models and simulations. In addition, the substantial complexity of highly detailed large-scale models can make interpretation of the simulated phenomena difficult. These challenges are intimately linked with data analysis and visualization and will benefit from technological advances such as machine learning59 and virtual reality60.

Subcellular to molecular and atomistic

The action and metabolism of many drugs are fundamentally based on the changes in and interactions among individual molecules. Multiscale methods are needed to connect molecular changes to changes induced at subcellular levels and beyond. Increasingly informed integrative models of macromolecular complexes61,62 are providing atomically detailed views of complex drug targets and drug–target interactions63, owing to advances in cryoEM in particular. The 2017 Nobel Prize in Chemistry recognized these developments64. However, moving beyond the traditional paradigm of studying single drug targets in isolation to tackle the dynamics of macromolecular complexes and, for example, their interactions with the genome63 poses a challenge to atomic-scale modelling. The ability to model protein–protein complexes more accurately (for example, allowing for changes in conformation driven by intermolecular interactions and chemical changes) will improve protein and antibody design for vaccine development65–68. Indeed, the resolution revolution taking place in cryoEM and cryoET will drive the development of methods for multiscale simulation across the subcellular and molecular scales. Multiscale simulations will offer an expanded perspective of drug targets, elucidating their detailed and often crucial interactions with realistically complex membranes and other proteins. The new structural understanding provided by diverse approaches has already helped industrial research teams reconcile seemingly divergent or otherwise inexplicable experimental assay results69.

There has been considerable recent progress in the development of methods enabling the exploration and characterization of the dynamics of molecular-scale systems. An example particularly relevant to drug discovery is provided by simulation-based approaches for the identification of so-called cryptic (hidden) pockets or sites of allosteric activation, which are not evident in X-ray crystallographic structures and can be novel drug-target sites4,70,71. Another promising class of techniques is based on Markov state models (MSMs), which enable the extension of temporal scales achievable in ensemble-based approaches through the extraction of long-timescale dynamics from many short-timescale simulations. By using MSMs, one takes a more statistics-oriented approach to trajectory analysis: individual states are defined or identified, and the dynamics between the interconnected states, assumed to be Markovian, are modelled as a transition probability matrix, populated from independent simulations. Dynamic information relating the states is typically obtained through many short-timescale MD simulations that are integrated into one cohesive framework72–74. MSMs have been used to predict the thermodynamic and kinetic landscapes for the activation of multiple kinases75,76 and protein–protein association pathways77, characterizing biological processes that occur over timescales from microseconds to hours. Notably, MSMs make use of the statistical sampling necessary for the analysis of larger-lengthscale (for example, subcellular-scale or cell-scale) simulations, which by their nature contain many independent copies of one particular drug target.
One can use relatively short-timescale (for example, tens of nanoseconds) simulations of large and/or multicomponent systems, potentially comprising hundreds of millions of atoms, in combination with MSMs to extract long-timescale information (for example, kinetics on the order of milliseconds) for the individual molecular components of a biological scene. These methods will facilitate an increasingly accurate understanding of how drugs act at a particular site and subsequently alter the dynamical landscape of their receptors. These include, for example, the highly dynamic G protein-coupled receptors78, one of the most important classes of pharmaceutical targets. MD-based methods also stand to benefit from and connect to advances in experimental structural characterization methods. These methods include diffuse X-ray scattering, which gives improved characterization of protein flexibility and ligand binding, and the detailed picture of biomolecular heterogeneity emerging from high-resolution cryoEM, both of which can feed back into the development of more accurate MD force fields21.
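To make the MSM idea concrete, the following minimal sketch estimates a transition probability matrix from many short discrete-state trajectories and then derives the stationary distribution and implied timescales from it. Everything here is a toy assumption: the three-state "true" matrix, the trajectory lengths and the simple row-normalized estimator are chosen for illustration, and real MSM construction additionally involves clustering of MD frames, lag-time selection and validation, typically with dedicated software.

```python
import numpy as np

def sample_trajectory(T, start, n_steps, rng):
    """Generate one short discrete-state trajectory from a known matrix (toy data)."""
    traj = [start]
    for _ in range(n_steps):
        traj.append(rng.choice(len(T), p=T[traj[-1]]))
    return np.array(traj)

def count_matrix(trajs, n_states, lag=1):
    """Count observed transitions i -> j at the given lag over all trajectories."""
    C = np.zeros((n_states, n_states))
    for traj in trajs:
        np.add.at(C, (traj[:-lag], traj[lag:]), 1.0)
    return C

def transition_matrix(C):
    """Row-normalize counts (assumes every state was visited at least once)."""
    return C / C.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.abs(evecs[:, np.argmax(evals.real)].real)
    return pi / pi.sum()

def implied_timescales(T, lag=1):
    """Relaxation timescales -lag / ln(lambda) from the non-unit eigenvalues."""
    evals = np.sort(np.linalg.eigvals(T).real)[::-1][1:]
    evals = evals[(evals > 0.0) & (evals < 1.0)]
    return -lag / np.log(evals)

# Hypothetical three-state model standing in for conformational states.
T_true = np.array([[0.90, 0.10, 0.00],
                   [0.05, 0.90, 0.05],
                   [0.00, 0.10, 0.90]])
rng = np.random.default_rng(1)
# Many independent short trajectories started from random states.
trajs = [sample_trajectory(T_true, int(rng.integers(3)), 20, rng)
         for _ in range(2000)]

T_est = transition_matrix(count_matrix(trajs, n_states=3))
pi_est = stationary_distribution(T_est)
timescales = implied_timescales(T_est)
print(np.round(T_est, 3), np.round(pi_est, 3), np.round(timescales, 2))
```

Although no single trajectory starts from equilibrium, pooling the transition counts recovers the equilibrium populations and relaxation times of the underlying model, which is the property that lets MSMs stitch short simulations into one kinetic picture.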

The development of effective drugs is becoming increasingly reliant on our understanding and ability to quantify and optimize the kinetics of drug binding (associated with a rate constant kon) and unbinding (koff). Drugs must bind to their target quickly enough to avoid being cleared from the body before they can act, and must also remain bound long enough to exert an effect. These considerations are more important than the thermodynamic binding affinity in many cases. Drugs with slow rates of dissociation have longer interaction times (also referred to as residence times) with their targets and therefore are often found to be more efficacious79,80. The ability to quantify and predict binding and unbinding rates and residence times therefore represents a major and growing requirement in drug discovery and development programmes. As such, the past few years have seen a dramatic increase in simulation and associated modelling methods to predict such quantities and analyse the molecular and dynamical features that determine them80. These methods include direct quantification through improved sampling techniques81, such as MSMs75,82, smoothed potential MD83,84 and metadynamics85 (for example, with path collective variables and parallel tempering to calculate free-energy profiles and with transition state–partial path transition interface sampling86 for kinetics), or through multiscale simulation methods, such as SEEKR. SEEKR is a novel method that uses milestoning theory to combine atomic-scale rigid-body BD (when the two interacting molecular species are sufficiently far apart) with fully flexible MD simulations (when the particles are in close proximity)36,87,88 (Box 2). The combined use of different dynamical propagators enables the efficient computation of accurate binding kinetics and free energies of binding using a directly multiscale approach.
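The distinction between affinity and kinetics can be shown with a short worked example. It assumes simple one-step reversible binding, for which the dissociation constant is K_d = koff/kon and the mean residence time is 1/koff; the two compounds and their rate constants are hypothetical numbers chosen only so that both have the same affinity.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def residence_time(k_off):
    """Mean lifetime of the drug-target complex, tau = 1 / k_off (seconds)."""
    return 1.0 / k_off

def dissociation_constant(k_on, k_off):
    """Equilibrium K_d = k_off / k_on (molar) for one-step binding."""
    return k_off / k_on

def binding_free_energy(k_d, temperature=298.15):
    """Standard binding free energy RT ln(K_d / 1 M), in kJ/mol."""
    return R_KJ_PER_MOL_K * temperature * math.log(k_d)

# Hypothetical compounds: k_on in 1/(M s), k_off in 1/s.
compounds = {
    "fast": {"k_on": 1.0e8, "k_off": 1.0e-1},
    "slow": {"k_on": 1.0e6, "k_off": 1.0e-3},
}

summary = {}
for name, rates in compounds.items():
    k_d = dissociation_constant(rates["k_on"], rates["k_off"])
    summary[name] = {
        "k_d": k_d,
        "dG": binding_free_energy(k_d),
        "tau": residence_time(rates["k_off"]),
    }
    print(name, summary[name])
```

Both compounds have a nanomolar K_d and therefore the same binding free energy, yet their residence times differ by two orders of magnitude, which is why affinity alone cannot rank them on this property.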

Predicting membrane permeability is another area of intense interest to drug discovery programmes, for which multiscale physics-based simulations promise considerable advantages over phenomenological (or descriptor-based) models. Both the potential of mean force for a small molecule crossing the membrane and its position-dependent diffusion constant can be simulated in several ways and combined to predict the passive permeation of drugs through membranes (reviewed in Refs 89,90). The recent resurgence of interest in this area has included multiscale efforts to tackle system complexity, for example, for notoriously challenging systems such as Gram-negative bacterial membrane transport91–93, multiresolution methods that mix coarse-grained models of membranes with all-atom models of antibacterial compounds94 and methods that enable multiscaling in time through the integrated use of MSMs95 or milestoning96,97.
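One widely used way to combine a potential of mean force W(z) with a position-dependent diffusion coefficient D(z) is the inhomogeneous solubility–diffusion model, in which the membrane resistance 1/P is the integral of exp(W(z)/kT)/D(z) across the bilayer. The sketch below assumes that model; the Gaussian barrier, constant diffusion coefficient, grid and units are hypothetical stand-ins for profiles that would normally come from simulation.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def permeability(z, pmf, diffusion, kT):
    """Inhomogeneous solubility-diffusion estimate.

    1/P = integral over z of exp(W(z)/kT) / D(z), so P has units of
    D divided by length (here nm/ns).
    """
    resistance = trapezoid(np.exp(pmf / kT) / diffusion, z)
    return 1.0 / resistance

z = np.linspace(-2.0, 2.0, 401)       # position along membrane normal, nm
kT = 2.479                            # kJ/mol, roughly 298 K
D = np.full_like(z, 1.0)              # assumed constant D, nm^2/ns

flat_pmf = np.zeros_like(z)                           # no barrier
barrier_pmf = 20.0 * np.exp(-z**2 / (2 * 0.5**2))     # 20 kJ/mol Gaussian barrier

p_flat = permeability(z, flat_pmf, D, kT)
p_barrier = permeability(z, barrier_pmf, D, kT)
print(f"P (no barrier) = {p_flat:.4f} nm/ns")
print(f"P (barrier)    = {p_barrier:.3e} nm/ns")
```

With no barrier the result reduces to D divided by the membrane thickness, which is a convenient sanity check; the barrier case shows how strongly the exponential weighting of the free-energy maximum dominates the estimate.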

Despite the developments described above, the ability of present methods to enable the crossing of subcellular to molecular scales while ensuring biological and chemical realism is not without technical and intellectual challenges. Sampling of slow conformational dynamics in all of these systems remains a key problem. Development of new approaches to accelerate otherwise slow dynamics (for example, multi-ensemble MSMs98) will be required. Innovative solutions to the challenges posed by slow dynamics are likely to require continued development of approaches for hierarchical coarse graining56,99–103. As simulations over increasingly longer timescales become routine, research efforts must continue to shift from manual human-driven data curation (which remains the current de facto standard) to machine learning-based methods that will enable the detection of new patterns, correlations and more within the vast amount of data being generated104.

Atomistic to electronic

Molecular structure, dynamics and reactivity arise fundamentally from quantum mechanics. Calculations of the electronic structure of molecules are essential for the study of certain properties. In principle, as Dirac stated long ago, quantum mechanics provides the theoretical route to calculating all molecular properties, with the ‘only’ challenge being computational tractability. The utility of quantum mechanics has been amply demonstrated for small molecules (tens of atoms), for which reaction barriers and spectra can be calculated from ab initio electronic structure methods with accuracy often at least as good as experimental methods. The electronic structure of larger systems (hundreds of atoms) can be calculated with density functional theory (DFT) methods. These are in general somewhat less accurate and are not systematically improvable but have nevertheless revolutionized the role of computation in chemistry by providing useful insight (for example, into reaction mechanisms) at a manageable computational cost. More approximate methods based on semi-empirical molecular orbital theory or approximate DFT allow calculations on even larger systems (thousands of atoms). Algorithmic developments (such as implementation on GPUs9) and methodological developments continue to extend the reach of quantum chemical calculations, both in terms of system size (for example, the thousands of atoms for modelling transition states in enzymes and properties of ion channels) and scope (for example, in MD and Monte Carlo simulations), bringing electronic structure calculations into new biological regimes105. All of these techniques address molecular electronic structure, which is inherently quantum mechanical, and are usually applied with a classical description of molecular dynamics and/or nuclear motion. 
Quantum dynamical effects such as quantum tunnelling (which is important in determining the rate of hydrogen transfer) can also be investigated by methods that include the effects of quantum dynamics for nuclei.

In principle, quantum mechanical methods offer higher accuracy than empirical force fields, but in practice, it may often be more feasible, or indeed preferable, to apply hybrid approaches, combining a quantum mechanical description of a small region (for example, enzyme active site) with an empirical (MM) treatment of most of the system (protein, solvent, membrane, and so on). Such QM/MM methods were a focus of the researchers awarded the 2013 Nobel Prize in Chemistry and now provide an attractive combination of practicality and versatility for a range of problems. Applications in drug development include informing inhibitor design from the knowledge of interactions of transition states and intermediates106, understanding the reactivity and specificity of (and resistance to) covalent inhibitors107, analysing the coupling of chemical and conformational changes in biomolecules, predicting NMR and electronic spectra (for example, to identify binding modes), developing in situ structure–activity relationships and predicting drug metabolism108–110 (Fig. 2). It is possible to extend beyond DFT to highly accurate ab initio methods, for example, by applying projector-based embedding techniques, which can be applied in a QM/MM framework to treat large systems, such as proteins111–113. QM/MM MD simulations are possible with lower-level QM treatments and are increasingly practical with DFT methods. A combination of such methods can predict chemically accurate barriers for enzyme-catalysed reactions36,112. QM/MM methods can also now be applied in areas that were previously the domain of empirical force fields, such as in calculations of binding affinities and solvation energies using multilevel sampling approaches8,114–118. QM/MM methods can be applied in multiscale schemes coupling sampling at different levels to combine the accuracy of the higher-level method with a more computationally efficient lower-level treatment (Box 3).
Potentially, this can overcome some limitations of empirical (MM) atomistic force fields (such as oversimplified descriptions of electrostatics and lack of electronic polarization), which may be particularly important for some target classes, such as metalloproteins119. For example, the binding affinity of water molecules to proteins is affected by changes in the polarization of water molecules120. This effect is larger for larger drug-like molecules and may affect predictions of drug binding and unbinding kinetics owing to the notable changes in solvation involved in these processes. Just as QM methods can generate data to inform the development of atomistic force fields121, which are increasingly driven by machine learning approaches and integrated with experimental data, multiscale QM/MM schemes also offer the potential of testing and developing lower-level methods, for example, in on-the-fly (re)parameterization122.
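
One common way to couple sampling at two levels is free-energy perturbation: configurations are sampled with the cheaper method, and the energy gap to the more accurate method is evaluated only on those snapshots. The minimal sketch below applies the Zwanzig exponential-averaging formula to such gaps. It is an illustration under stated assumptions, not the specific scheme of the cited studies; the function name `fep_correction` is hypothetical, and the energy gaps are synthetic Gaussian numbers standing in for real QM/MM minus MM energies.

```python
import math
import random

KT = 0.593  # kcal/mol at roughly 298 K


def fep_correction(delta_u, kt=KT):
    """Zwanzig estimate of A_high - A_low.

    delta_u holds U_high - U_low (kcal/mol) for configurations that were
    sampled at the low level of theory.
    """
    # Shift by the smallest gap so the exponentials stay well conditioned.
    shift = min(delta_u)
    mean_boltz = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(mean_boltz)


# Synthetic stand-in for energy gaps on 20,000 low-level snapshots.
rng = random.Random(0)
gaps = [rng.gauss(2.0, 0.3) for _ in range(20000)]
print("free-energy correction (kcal/mol):", fep_correction(gaps))
```

For Gaussian-distributed gaps the exact answer is the mean minus the variance divided by 2kT, which gives a convenient check. The estimate degrades quickly when the gap distribution is broad, that is, when the low-level ensemble overlaps poorly with the high-level one.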

Figure 2: Multiscale simulation methods to predict drug metabolism by cytochrome P450 enzymes.

a | Cytochrome P450 enzymes (CYPs) play a central role in metabolizing most drugs. Understanding their reactivity and selectivity is a central goal in predicting drug metabolism and provides an example of a drug development challenge requiring multiscale simulation approaches. b | The flowchart shows a practical workflow for multiscale modelling of metabolic reactions of pharmaceuticals in CYPs108. c | Mammalian CYPs are membrane-bound enzymes but typically only the structures of the soluble portions are determined experimentally, lacking the membrane-anchoring helix and the membrane. The intact CYP, in situ, can be modelled by adding the transmembrane helix and assembling the membrane around the protein, which occurs spontaneously in coarse-grained (CG) molecular dynamics (MD) simulations. CG methods allow MD simulations with timescales of microseconds to milliseconds, showing how the protein is oriented in the membrane and how drug molecules, such as warfarin, move through the membrane and associate with the protein. Understanding how the drug accesses and binds within the active site requires more detailed, fully flexible MD simulations with an atomistic (AT) molecular mechanics (MM) representation in which every atom in the simulation is represented explicitly, in contrast to the representation of amino acids by a small number of beads that group atoms together at the CG level. The CG model is converted into an AT model, which can be used for MD simulations of drug binding. Typical MM methods cannot be used to model chemical reactivity; therefore, for potentially reactive poses of the enzyme–drug complex, the system is converted into a quantum mechanics–molecular mechanics (QM/MM) model, in which the reactive compound and the drug are included in the QM region for modelling chemical reactions of the drug in the enzyme.

Methods capable of modelling and predicting chemical reactivity in molecular detail offer opportunities in a number of emerging and challenging areas in drug discovery. A particularly important area of application is in the study of covalent inhibitors. There is renewed and growing interest in developing compounds that covalently bind and/or inhibit enzymes and other targets, offering stronger affinity than non-covalent agents (for example, effectively irreversible binding), and different pharmacokinetics123. There is a need to understand what governs the reactivity of covalent modifiers in vivo to maximize specificity and minimize off-target effects, and to understand resistance to covalent drugs124. The ideal covalent modifier is only activated at the specific target site. Just as for the prediction of drug metabolism, approaches based on the ligands alone cannot capture all the factors relevant to reactivity. Challenges include prediction of the pKa values of target residues for modification (for cysteines, in particular), treatment of conformational effects and identification of unusual mechanisms. For modelling of reactivity, empirical valence bond (EVB) methods are a highly efficient alternative to QM or QM/MM calculations, being much less computationally demanding than electronic structure methods, but they require substantial effort in parameterization125. QM and QM/MM methods (and/or experimental data) can be used to parameterize EVB models126. A practical challenge in applying QM/MM methods127 is choosing the size of the QM region for optimal efficiency and accuracy128. Adaptive schemes, in which the QM region changes during the simulation, are potentially useful for some applications, such as long-range electron transfer129. Consistency between the particular level of QM treatment and the MM force field is also important.
Also, QM/MM simulations typically apply relatively low levels of QM theory, and connecting to higher levels of theory (for example, via embedding or perturbation approaches) can be important for achieving a high degree of accuracy. For large systems, models combining QM/MM and coarse-grained methods will be useful130. Generating reaction pathways and reactive configurations requires effective and efficient simulation methods and improved sampling techniques. A long-cherished goal is to use knowledge of transition states to design enzyme inhibitors based on Pauling's proposal that transition state analogues should be high-affinity ligands. While it is naive to think that this is a universal approach to enzyme inhibition, there is real potential in using structurally detailed knowledge of the interactions of transition state structures in some enzymes, and reaction intermediates in others, to design and optimize binding interactions of drug leads.
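
To illustrate why EVB calculations are so much cheaper than electronic structure methods, the sketch below builds a two-state EVB model: the reactant and product are each described by a simple force-field-like (diabatic) energy, and the reactive surface is the lowest eigenvalue of the resulting 2×2 Hamiltonian. This is a one-dimensional toy model; the harmonic diabatic states, the coupling `h12` and the reaction offset are hypothetical parameters, whereas a real EVB model would be parameterized against QM, QM/MM or experimental data as described above.

```python
import math


def evb_ground_state(h11, h22, h12):
    """Lowest eigenvalue of the EVB Hamiltonian [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    return mean - math.sqrt(half_gap * half_gap + h12 * h12)


def diabatic(x, x0, force_const, offset=0.0):
    """Harmonic diabatic state centred at x0 (hypothetical, kcal/mol units)."""
    return 0.5 * force_const * (x - x0) ** 2 + offset


def evb_profile(xs, h12=10.0, reaction_offset=-5.0):
    """Adiabatic EVB energy along a one-dimensional reaction coordinate."""
    profile = []
    for x in xs:
        h11 = diabatic(x, x0=-1.0, force_const=100.0)  # reactant state
        h22 = diabatic(x, x0=1.0, force_const=100.0, offset=reaction_offset)  # product state
        profile.append(evb_ground_state(h11, h22, h12))
    return profile


xs = [i / 10 for i in range(-15, 16)]
profile = evb_profile(xs)
barrier = max(profile) - min(profile[: len(xs) // 2])
print("approximate barrier from the reactant well (kcal/mol):", barrier)
```

Each energy evaluation costs only a few force-field terms and a square root, which is what makes extensive sampling affordable; the price is the effort of fitting the diabatic states and the coupling.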

Methods for understanding and predicting chemical reactivity in large biological systems also bring into view new types of application that offer novel and exciting routes for drug design and development. For example, simulations of biochemical reactions will be important for understanding the modulation and control of reactivity by conformational effects and allosteric regulation. A fundamentally important theme is understanding how chemical and conformational changes are coupled in biomolecular systems. This is essentially a multiscale problem in itself: it requires knowledge of the role of macromolecular conformational changes in catalytic cycles and of the reaction mechanisms (such as the hydrolysis of ATP) that drive biological motors and other biomolecular ‘machines’. Linking predictions of biomolecular reactivity to larger scales will help in the manipulation of metabolic cycles and signalling cascades. More speculative practical challenges are also becoming apparent: the control and manipulation of reactivity within biological systems promises entirely new types of therapy. Enzyme inhibitors are obviously important as pharmaceuticals, and in a few cases, enzymes are used as drugs, such as thrombolytics, and in enzyme replacement and enhancement therapies to correct genetic deficiencies, usually in rare conditions. The activation of prodrugs often depends on enzymes, and therefore, improved understanding of prodrug–enzyme reactivity will help in the design and development of all types of directed enzyme prodrug therapy131,132. Potentially, engineered or evolved enzymes, catalytic antibodies or hybrid biological–chemical catalysts could be used to control selective prodrug reactivity in cells. Photodynamic therapy is another area in which electronic structure calculations can potentially aid drug design and the understanding and prediction of photoactivation, contributing to improved selectivity133.
More radical is the use of designed catalysts to remodel or destroy biological targets in situ134. One example is the possibility of gene editing offered by systems such as CRISPR–Cas9. Application of catalysts in human patients will be accelerated by techniques for understanding and designing determinants of specificity and reactivity, and their interactions in vivo. Multiscale simulation methods capable of modelling reactions and predicting their effects in complex biological systems will contribute to such developments.

The changing role of simulation

An expanding range of chemical, biological, biophysical, spectroscopic and structural techniques is increasingly integral to drug discovery and design programmes. The scale and complexity of the data that they generate demand the concerted development of data-centric computational models to interpret and connect them. The combination of a range of data with multiscale models will provide detailed knowledge of drug targets, including their time dependence and dynamics, transforming our ability to understand and predict drug action. An exciting prospect is the development of interconnected multiscale models spanning the full range of complexity, from chemical action at a target binding site through to complex cellular interactions and beyond. Multiscale models of this scope will also help to identify and analyse adverse drug reactions arising from drug–drug interactions and off-target effects. Developing reliable, integrated multiscale methods poses considerable challenges, but it promises significant payoffs in the drug discovery arena. Such methods will drive the generation of experimentally testable hypotheses and assist in experimental design. Understanding of biology will undoubtedly advance faster and more reliably through the effective combination of experimental and multiscale computational science.

The role of biomolecular simulation in drug design and development (and in analysing mechanisms relevant to health and disease) is evolving rapidly. Whereas previously simulation provided simple models to help develop or illustrate hypotheses, simulations are increasingly used as another form of experiment: a computational experiment or assay. As such, simulators must apply similar standards of statistical rigour in assessing the significance of their findings. Computational assays can be used to assess and predict biological properties, such as drug resistance in mutant systems. Simulations can also be used to explore and analyse processes that are otherwise inaccessible or unachievable through experimentation. Ongoing increases in computational power, together with improvements in the reliability and scope of simulation methods (with detailed validation against experimental results), mean that computational assays will become ever more important in drug development, offering speed and affordability, and complementing and managing the growing deluge of experimental data.
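
One simple way to bring statistical rigour to a computational assay is to run independent replicas and report a confidence interval rather than a single number. The sketch below computes a percentile bootstrap interval for the mean of replica results. The function name `bootstrap_ci` is hypothetical, and the binding free energies in the usage lines are invented values for illustration only, not results from any study.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the replicas with replacement and record each resampled mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented binding free energies (kcal/mol) from eight independent replicas.
replicas = [-8.1, -7.6, -8.4, -7.9, -8.8, -7.4, -8.0, -8.3]
low, high = bootstrap_ci(replicas)
print("mean:", statistics.fmean(replicas), "95% interval:", (low, high))
```

Two computed affinities whose intervals overlap substantially should not be reported as different, in the same way that replicate measurements would be treated in a laboratory assay.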

No single simulation technique can address all the many challenges (Table 1) and levels of understanding required for modelling biological systems from the molecular to the cellular level; this is the essential driving motivation for the development of multiscale methods. As the examples outlined above highlight, the potential of multiscale modelling and simulation, and of its close integration with experiment, is only just starting to be realized in drug discovery. We expect that, within the next decade, multiscale methods will be central in drug discovery and development programmes. They will form the basis of, and inform, cohesive data-rich models for drug–target systems. Together, they will rationalize and synthesize experimental data, accelerate drug development and help discover effective therapeutics with novel mechanisms of action.

Table 1 Drug discovery challenges and multiscale computational methods that address them

How to cite this article

Amaro, R. E. & Mulholland, A. J. Multiscale methods in drug design bridge chemical and biological complexity in the search for cures. Nat. Rev. Chem. 2, 0148 (2018).