# Quantum chemical accuracy from density functional approximations via machine learning

## Abstract

Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal mol−1 with presently available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal mol−1) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.

## Introduction

The recent rise in the popularity of machine-learning (ML) methods has engendered many advances in the molecular sciences. These include the prediction of properties of atomistic systems across chemical space1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26, the construction of accurate force fields27,28,29,30,31,32,33,34,35,36,37,38,39 for ML-based molecular dynamics (MD) simulations, the representation of the (high-dimensional) statistical distribution of molecular conformers40,41,42, or the prediction of the kinetics of structural transformation of materials43. In many applications, a key task for an ML model is to predict the outcome of an electronic structure calculation without the calculation’s having to be explicitly performed. This could be done at any desired level of electronic structure theory from density functional theory (DFT) to the current gold standard, namely, coupled-cluster with single, double, and perturbative triple excitations (CCSD(T)). While the latter is generally preferable, its putative $$N^{7}$$ computational scaling with system size makes it prohibitive for large molecular systems or even for small systems if many energy and energy gradient calculations are needed, as would be the case in MD simulations or geometry optimizations. Therefore, Kohn-Sham (KS) DFT, with its putative $$N^{3}$$ scaling, is often employed as an acceptable compromise between computational efficiency and accuracy. Unfortunately, the wavefunction and DFT formalisms are so distinct that there is no known way to combine the accuracy of the former with the speed of the latter. Thus, an important advance could be achieved if the power of ML could be leveraged to allow large numbers of CCSD(T) calculations to be performed at a cost equal to or even less than that of the same number of DFT calculations for a given system.

An ML scheme capable of realizing the aforementioned objective should satisfy several important criteria: First, the ML framework should be able to deliver basic molecular properties, such as total energies, geometries, and, in principle, electronic properties, all at CCSD(T) accuracy. Beyond this, however, it should also allow geometry optimization and long time-scale MD to be performed with energies and forces at the CCSD(T) accuracy level. The construction of such an ML approach requires a molecular descriptor flexible enough to accomplish both types of tasks, and for this, it seems natural to employ the electron density. It is worth noting that as molecular descriptors have evolved from objects such as SMILES strings44,45, molecular graphs46,47, and molecular graphs with feature vectors24,25,48, there has been a progression toward descriptors that attempt to capture key features of the electron density in a simple manner15,48,49,50,51. Admittedly, employing the full electron density carries with it a considerable computational cost; nevertheless, it is useful to develop such frameworks, considering that more optimal algorithms could follow. Previously, we had shown that the electron density could be used in a self-consistent manner to train a system-specific density functional (akin to a system-specific force field52) using a mapping from the external potential to the electron density and a second map of the density to the total energy53. Rather than delivering a solution to the KS equations, the first map (denoted the ML-HK map) bypasses the KS equations in a manner that is akin to solving the original Hohenberg-Kohn functional differential equation54. The second map from density to energy predicts the result of plugging that solution back into the Hohenberg-Kohn functional to obtain the ground-state energy. 
While other machine-learning methods for the prediction of electron densities or density functionals have appeared recently50,51,55,56,57,58,59,60,61,62, the ML-HK map facilitates the use of both machine-learned densities, from which electronic properties could be computed, and density functionals for obtaining total energies and gradients for geometry optimization and MD simulation.

In this paper, we describe an approach for generating an ML framework that satisfies the criteria outlined above. The ML model employed in this work is kernel ridge regression (KRR), the basic principles of which in the construction of density functionals have been developed over several years63,64,65,66,67,68,69. In order to advance our ML framework53 to the prediction of coupled-cluster (CC) energies, as opposed to DFT energies, one need only recognize that the basic ML construction procedure is independent of the source of inputs. Therefore, one could readily imagine training the aforementioned maps on a set of CC densities and energies. In practice, however, few quantum chemistry packages yield the CC electron density, as it is not something that is needed to find a CC energy. Therefore, in order to avoid the need to compute a CC electron density, we show that the density-energy map can be constructed by considering the CC energy as a functional of a DFT density obtained within a standard approximation such as PBE, i.e., we regress the CC energy from the PBE density. The density is used as the aforementioned descriptor for a given potential and can additionally serve as an input for learning other properties as well. The ML algorithm then learns to predict the CC energy as a functional of the approximate ML-predicted (descriptor) density. Importantly, we find that it is roughly as easy to train a model that returns the CC energy from the DFT density as it is to train for the self-consistent DFT energy itself. We additionally find that the use of a crudely approximated density results in a reduction in accuracy (even for DFT energies), showing the importance of using accurate densities. Drawing on existing ML experience70, we further show that it is possible to learn the difference between a DFT and a CC energy as a functional of the input DFT densities. Importantly, this can be done with greater efficiency than learning either DFT or CC energies separately. 
Referring to this approach as Δ-DFT, we show that the error in the training curve for Δ-DFT drops far faster than that for learning either the DFT or the CC energies themselves, indicating that the error in DFT is much more amenable to learning than the DFT energy itself. Moreover, by exploiting molecular point group symmetries, we drastically reduce the amount of training data needed to achieve quantum chemical accuracy (~1 kcal mol−1), allowing us to extract CC energies from standard DFT calculations, with essentially no additional cost (beyond the initial generation of training data). That is, we create a system-specific ML model capable of yielding CCSD(T) accuracy at the cost of a standard DFT calculation. A single water molecule (see Fig. 1a) is used as the first benchmark of the new scheme. We use the same PBE density as a functional of the potential as in ref. 53 but now with various ML maps of the energy as a functional of the density. While the DFT calculation loses accuracy rapidly when the molecule is either compressed or extended, Δ-DFT corrects these errors. We then consider the examples of ethanol, benzene, and resorcinol, all of which contain greater internal flexibility. We discuss the issue of sampling input geometries using finite-temperature MD simulations, arguing that care must be taken when these configurations do not reflect the target CCSD(T) energy surface (see Fig. 1b as an illustration for water). Resorcinol is further used as an example of using the ML scheme to generate an ab initio MD trajectory on the predicted underlying CCSD(T) energy surface. Obtaining such a trajectory typically requires hundreds to thousands or tens of thousands of energy and force calculations, which would be prohibitive using explicit CCSD(T) calculations but is routine using the ML model. This example reveals the importance of having CCSD(T) accuracy to describe a conformational change for which DFT produces quantitatively incorrect barriers.
Finally, we take a step toward creating a more general model capable of predicting CCSD(T) energies of a small set of similar, but not identical, molecules. Resorcinol, phenol, and benzene are used together to create an ML functional capable of describing multiple molecules. Here, molecular point group symmetries are exploited to expand the training dataset, thereby reducing the number of explicit CCSD(T) calculations needed to obtain chemical accuracy.

## Results

### Theory

A central difficulty in quantum chemistry is the fundamental incompatibility of the formalisms of DFT and wave-function based ab initio methods such as CCSD(T). Both aim to deliver the ground-state energy of a molecule as a function of its nuclear coordinates. Ab initio methods directly solve the electronic Schrödinger equation, albeit in an approximate yet systematic and controllable fashion. KS-DFT, by contrast, buries all the quantum complexity into an unknown functional of the density, i.e., the exchange-correlation (XC) energy, which must be approximated71,72. A myriad of different forms for such KS-DFT approximations exist. Unfortunately, there is currently no practical route for converting an approximation in one formalism to an approximation in the other, as there is no simple mathematical route to coupling the two formalisms.

In this work, we leverage ML to bypass this difficulty, by correcting DFT energies to CCSD(T) energies. Routine DFT calculations use some approximate XC functional and solve the Kohn-Sham equations self-consistently. However, an alternative approach has long been considered (e.g., ref. 73), in which the exact energy, E, is found by correcting an approximate self-consistent DFT calculation:

$$E={E}^{{\mathrm{DFT}}}[{n}^{{\mathrm{DFT}}}]+\Delta E[{n}^{{\mathrm{DFT}}}],\tag{1}$$

where DFT denotes the approximate DFT calculation, and ΔE, evaluated on the approximate density, is defined, formally, such that E is the exact energy. This is not the functional of standard KS-DFT, but it still yields exact energies and can be a more practical alternative in which one solves the KS equations within that approximation but corrects the final energy by ΔE. If $${n}^{{\mathrm{DFT}}}$$ is a highly accurate approximation, then ΔE should not differ much from the intrinsic error of the DFT XC approximation. Recently, several classes of DFT calculations have been improved by using densities that are not self-consistent74,75. Thus, regression of DFT densities to find CC energies can be considered a system-specific construction of $$\Delta E[{n}^{{\mathrm{DFT}}}]$$ of the same kind as the system-specific construction of the HK map53. This differs from a general purpose, explicit XC functional approximation in that (i) it might only be accurate for the systems for which it has been trained, (ii) it has no simple closed form, and (iii) its functional minimum yields only an approximate density. However, using the results from the Supplementary Discussion 2.1, one can, in principle, construct the exact density from a sequence of such calculations. To avoid confusion, we note that Δ-DFT has nothing in common with, e.g., Δ-SCF, a useful alternative to TDDFT for extracting excited state energies in DFT76.
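Rearranging Eq. (1) makes the regression target explicit. The following is a sketch in the notation above; the geometry index $$i$$ and the use of the CC energy as a practical stand-in for the exact energy are assumptions that anticipate the models trained in the later sections:

$$\Delta E[{n}^{{\mathrm{DFT}}}]=E-{E}^{{\mathrm{DFT}}}[{n}^{{\mathrm{DFT}}}],\qquad \Delta {E}_{i}\approx {E}_{i}^{{\mathrm{CC}}}-{E}^{{\mathrm{DFT}}}[{n}_{i}^{{\mathrm{DFT}}}].$$

Each training label is thus a difference of two energies evaluated at the same geometry, with the DFT density serving as the input.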

### Coupled cluster accuracy from ML DFT

Details of our approach are found in the “Methods” section. In brief, the approach constitutes a realization of the part of the Hohenberg-Kohn theorem that establishes a one-to-one mapping between external potentials v(r) and ground-state densities n(r) for a specified number of electrons. This map is expressed through the functional relationship n[v](r). In practice, we expand the density in an orthonormal basis $${\phi }_{l}({\bf{r}})$$ as $${n}_{{\mathrm{ML}}}[v]=\mathop{\sum }\nolimits_{l = 1}^{L}{u}_{\mathrm{ML}}^{(l)}[v]{\phi }_{l}({\bf{r}})$$ and learn the set of density expansion coefficients $$\{{{\bf{u}}}_{{\mathrm{ML}}}[v]\}$$ (ref. 53) in order to construct a learned DFT density $${n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}({\bf{r}})$$. As previously noted, KRR is employed here as the ML model. A second KRR model is then used to predict energies from a higher level of theory, in this case CC energies:

$${E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]=\mathop{\sum }\limits_{i=1}^{M}{{{\alpha }}}_{i}k({{\bf{u}}}_{{\mathrm{ML}}}[v],{{\bf{u}}}_{{\mathrm{ML}}}[{v}_{i}]),\tag{2}$$

where $$k({{\bf{u}}}_{{\mathrm{ML}}}[v],{{\bf{u}}}_{{\mathrm{ML}}}[{v}_{i}])$$ is the kernel, and $$\{{\alpha }_{i}\}$$ are the coefficients learned in the second KRR model. This allows us to create $${E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$, the chemically accurate CC energy, as a functional of the learned DFT density. (This corresponds to learning $${E}^{{\mathrm{DFT}}}+\Delta E$$ in Eq. (1).)
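The structure of Eq. (2) can be illustrated with a minimal NumPy sketch of kernel ridge regression. This is not the authors' implementation: the Gaussian kernel, the width `sigma`, the regularization strength `lam`, and the random vectors standing in for the density coefficients are all assumptions made purely for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel k(u, u') = exp(-|u - u'|^2 / (2 sigma^2)) for all row pairs."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_krr(U_train, E_train, sigma, lam):
    """Solve (K + lam * I) alpha = E for the weights alpha_i of Eq. (2)."""
    K = gaussian_kernel(U_train, U_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(U_train)), E_train)

def predict_krr(U_new, U_train, alpha, sigma):
    """Evaluate E_ML[u] = sum_i alpha_i k(u, u_i) for each row of U_new."""
    return gaussian_kernel(U_new, U_train, sigma) @ alpha

# Hypothetical stand-ins: rows of U play the role of density coefficients
# u_ML[v_i]; E_cc is a synthetic smooth "energy", not real CCSD(T) data.
rng = np.random.default_rng(0)
U = rng.normal(size=(20, 5))
E_cc = np.sin(U[:, 0]) + 0.5 * (U**2).sum(axis=1)

alpha = fit_krr(U, E_cc, sigma=1.0, lam=1e-8)
E_fit = predict_krr(U, U, alpha, sigma=1.0)
```

Because the prediction is a fixed-size kernel sum, its evaluation cost depends on the number of training points and not on which level of theory supplied the labels.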

In order to demonstrate the methodology behind the map in Eq. (1), we begin by describing the process of learning the CC energy directly via Eq. (2) based on a set of 102 random water geometries (Fig. 1b and Supplementary Fig. 1). Note that the mean absolute error (MAE) of DFT energies relative to the CC energies (each taken relative to the lowest energy conformer in the training set) is 1.86 kcal mol−1, with maximum errors of more than 6 kcal mol−1. The performance of the $${E}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$ and $${E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$ models was evaluated for training subsets containing 10, 15, 20, 30, 40 or 50 geometries, while the test set consisted of 52 geometries (Fig. 1c). Due to the small size of the dataset, we used cross-validation to obtain more stable estimates for the prediction accuracy of the models69. Details of the evaluation procedure are provided in the “Methods” section. As expected, the accuracy of each model improves with increasing training set size, but the benefit of predicting CC energies compared to DFT energies is immediately obvious. For this dataset, the MAE of $${E}^{{\mathrm{DFT}}}$$ relative to $${E}^{{\mathrm{CC}}}$$ (used here as the ground truth) is reached by $${E}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$ with 40 training geometries. Quantum chemical accuracy of 1 kcal mol−1 is obtained using slightly fewer (30) samples for the energy functional $${E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$, and the MAE improves to 0.24 kcal mol−1 with 50 training samples. Once constructed, the time to evaluate $${E}_{{\mathrm{ML}}}[n]$$ is the same regardless of the energy on which it is trained (for a fixed amount of training data). There is a clear benefit of training the model on the more accurate CC energies as long as a good performance can be achieved with a small number of samples from the more computationally expensive method.

Standard semilocal density functionals such as PBE typically yield highly accurate densities near equilibrium, and errors in atomization energies are dominated by errors in the energy rather than the self-consistent density77. However, far from equilibrium, these self-consistent densities can differ substantially from the exact density. In such density-sensitive cases, the energy error can be substantially increased by the error in the self-consistent density, leading to many failures of standard functionals78. The need to find accurate densities is bypassed by the ML-CC energy map, as it learns accurate energies even as a functional of an inaccurate density, as in Eq. (1).

### Reducing the CC cost with Δ-DFT

Inspired by the concept of delta learning79, we also propose a machine-learning framework that is able to leverage densities and energies from lower-level theories (e.g., DFT) to predict CC level energies. This is achieved by correcting DFT energies using delta learning, which we denote as Δ-DFT. Rather than predicting the CC energies directly using our machine-learning model, we train a new map $$\Delta {E}_{{\mathrm{ML}}}^{{\mathrm{CC}}-{\mathrm{DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$ that yields the error in a DFT calculation (relative to CC) for each geometry (i.e., the second term in Eq. (1)). We define the corresponding total energy as

$${E}_{\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]={E}^{{\mathrm{DFT}}}[{n}^{{\mathrm{DFT}}}]+\Delta {E}_{{\mathrm{ML}}}^{{\mathrm{CC}}-{\mathrm{DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}].\tag{3}$$

Correcting the DFT energies in this way leads to a dramatic improvement in the model performance, as seen in Fig. 1c. Remarkably, with only 10 training samples, the MAE of this $${E}_{\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\ [{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$ model is already lower than the error of $${E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$ trained with 50 samples; using 50 training samples reduces the MAE of the Δ-DFT  model to only 0.013 kcal mol−1. The Δ-DFT  correction is easier to learn than the energies themselves, as illustrated in Fig. 1d for symmetric water geometries that were not included in the previous dataset. Although the optimized geometry differs slightly between DFT and CC, the Δ-DFT  approach provides a smooth map between the two types of electronic structure calculations as a functional of the density. For the most extreme geometries, the model errors for Δ-DFT  are smaller than for the direct models (see Supplementary Fig. 3) and depend differently on the geometry, indicating that there is information contained in the density beyond that of the external nuclear potential. We note in passing that Δ-DFT  links a particular DFT calculation to a particular CC level of theory, rendering comparisons between models trained on different calculations invalid (see Supplementary Discussion 2.2). The comparison between the Δ-DFT  and total energy ML models is further explored with larger molecules in the subsequent sections.
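The wiring of Eq. (3) can be sketched as follows: the regression is trained on energy differences only, and the explicit baseline energy is added back at prediction time. Everything here is a hypothetical toy: the functions `e_dft` and `e_cc`, the random descriptors, and the Gaussian-kernel hyperparameters are assumptions chosen for illustration, not data or settings from this work.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_krr(U, y, sigma=1.0, lam=1e-8):
    """Kernel ridge regression weights for targets y."""
    return np.linalg.solve(gaussian_kernel(U, U, sigma) + lam * np.eye(len(U)), y)

def predict_krr(U_new, U, alpha, sigma=1.0):
    return gaussian_kernel(U_new, U, sigma) @ alpha

# Synthetic data (all hypothetical): descriptors, a "DFT" baseline, and a
# "CC" energy that differs from the baseline by a small, smooth correction.
rng = np.random.default_rng(1)
U_train, U_test = rng.normal(size=(40, 4)), rng.normal(size=(10, 4))

def e_dft(U):
    return 10.0 * (U**2).sum(axis=1)

def e_cc(U):
    return e_dft(U) + 0.3 * np.tanh(U[:, 0])

# Train only on the difference E_CC - E_DFT (the second term of Eq. (3)).
alpha_delta = fit_krr(U_train, e_cc(U_train) - e_dft(U_train))

def e_delta_dft(U_new):
    """Eq. (3): explicit baseline energy plus the learned correction."""
    return e_dft(U_new) + predict_krr(U_new, U_train, alpha_delta)

E_pred_train = e_delta_dft(U_train)
E_pred_test = e_delta_dft(U_test)
mae_test = np.mean(np.abs(E_pred_test - e_cc(U_test)))
```

In this design the baseline calculation is still required for every new geometry, which is why the corrected energy costs as much as a standard DFT calculation plus a kernel evaluation.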

### Δ-DFT with molecular symmetries

The next molecule chosen to evaluate our ML model is ethanol using geometries and energies from the MD17 dataset32,33. This molecule has two types of geometric minima, for which the alcohol OH is either an anti or doubly degenerate gauche position; the freely rotating CH3 group introduces additional variability into these possible geometries. Supplementary Fig. 4 shows the atomic distributions of the ethanol dataset after alignment based on heavy atom positions. The fact that ethanol possesses internal flexibility and a larger number of degrees of freedom than water naturally renders the learning problem more difficult. Hence, we expect that a greater number of training samples is needed to achieve chemical accuracy for the range of thermally accessible geometries. The dataset contains 1000 training and 1000 test samples with both DFT and CC energies (see Supplementary Fig. 5). The ML-HK map automatically incorporates equivalence for each chemical element, but we can also exploit the mirror symmetry of the molecule by reflecting H atoms through the plane defined by the three heavy atoms, effectively doubling the size of the training set, as outlined in the “Methods” section. To differentiate the models trained on datasets augmented by these symmetries, we add an s in front of the machine-learning model (e.g., sML). Table 1 shows the prediction accuracies of the various sML models for ethanol compared to some other state-of-the-art ML methods for the same dataset. The prediction error for DFT and CC energies is roughly equal to that of other ML models trained only on energies.
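The mirror-symmetry augmentation described above can be sketched with a few lines of NumPy. The coordinates below are a hypothetical ethanol-like fragment chosen for readability, and the helper name `reflect_through_plane` is an assumption; the actual procedure is the one given in the “Methods” section.

```python
import numpy as np

def reflect_through_plane(coords, p0, p1, p2):
    """Mirror every row of coords through the plane containing p0, p1, p2."""
    normal = np.cross(p1 - p0, p2 - p0)
    normal = normal / np.linalg.norm(normal)
    dist = (coords - p0) @ normal          # signed distance of each atom to the plane
    return coords - 2.0 * np.outer(dist, normal)

# Hypothetical ethanol-like geometry (angstrom): rows 0-2 are the heavy atoms
# C, C, O; the remaining rows are H atoms. Values are illustrative only.
geom = np.array([
    [0.00, 0.00, 0.00],   # C
    [1.50, 0.00, 0.00],   # C
    [2.00, 1.30, 0.00],   # O
    [-0.50, 0.50, 0.80],  # H
    [2.90, 1.20, 0.30],   # H
])
mirrored = reflect_through_plane(geom, geom[0], geom[1], geom[2])

# The mirrored geometry has the same energy, so it can be appended to the
# training set with the label of the original geometry.
augmented = np.stack([geom, mirrored])
```

The heavy atoms lie in the mirror plane and are left unchanged, so only the H positions differ between a geometry and its mirror image.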

It is also important to note that using the $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ functional to correct low-cost DFT energies achieves an MAE for CC energies comparable to those of the most accurate force-based models, without incurring the cost of evaluating CC forces for each training point. We note that Δ-learning does not improve the energy prediction over a direct force-based sGDML model for CC energies (see Supplementary Table 1). The $${E}_{\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]$$ functional based only on the original 1000 training geometries has an MAE of 0.15 kcal mol−1 (see Supplementary Table 2); hence, using the ethanol symmetry reduces the MAE of the ML model by half while requiring the same number of CC calculations.

### Molecule optimization using ML functionals

Neither the training nor test configurations from the MD17 dataset32,33 include the minimum energy conformers of ethanol. Using the ML models, we predicted the energy of the anti and gauche conformers optimized using MP2/6-31G* and the electronic structure methods used to generate the energies for each model. Note that MP2 and PBE have gauche as the global minimum, but the CCSD(T) global minimum is anti. Although all training geometries have energies more than 4.5 kcal mol−1 higher than the global minimum, the ML models are able to predict the energies of the minima with errors below chemical accuracy (see Table 2).

In addition, the machine-learned energy function is sufficiently smooth to optimize ethanol using energy gradients computed from the ML model itself. Calculations for each conformer start from geometries optimized using MP2/6-31G*, which are slightly different from both DFT- and CC-optimized geometries. Figure 2b shows that despite the sparsity of training data near the minimum energy configurations, the ML models trained with different energies can differentiate between the DFT and CC minima with remarkable fidelity.
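As a hedged illustration of optimizing on a learned energy surface, the sketch below differentiates a Gaussian-kernel ridge model analytically and follows the gradient downhill. It is a simplification in several respects, all of them assumptions: the coordinates double as the descriptor (the models in this work take the density as input), the 2D quadratic surface is synthetic, and the backtracking descent in `descend` is a generic optimizer rather than the one used for ethanol.

```python
import numpy as np

def kernel_row(u, U, sigma):
    """Gaussian kernel values k(u, u_i) between one point u and all rows of U."""
    return np.exp(-((U - u) ** 2).sum(axis=1) / (2.0 * sigma**2))

def energy(u, U, alpha, sigma):
    """E_ML(u) = sum_i alpha_i k(u, u_i)."""
    return float(kernel_row(u, U, sigma) @ alpha)

def gradient(u, U, alpha, sigma):
    """Analytic dE/du = sum_i alpha_i k(u, u_i) (u_i - u) / sigma^2."""
    w = alpha * kernel_row(u, U, sigma)
    return (w[:, None] * (U - u)).sum(axis=0) / sigma**2

def descend(u0, U, alpha, sigma, step=0.5, n_iter=200):
    """Gradient descent with backtracking; a step is kept only if E decreases."""
    u, e = u0.copy(), energy(u0, U, alpha, sigma)
    for _ in range(n_iter):
        g = gradient(u, U, alpha, sigma)
        t = step
        while t > 1e-10:
            trial = u - t * g
            e_trial = energy(trial, U, alpha, sigma)
            if e_trial < e:
                u, e = trial, e_trial
                break
            t *= 0.5
        else:
            break  # no improving step found: treat as converged
    return u, e

# Hypothetical 2D toy surface sampled on a 7 x 7 grid and fitted with KRR.
ax = np.linspace(-1.5, 1.5, 7)
U = np.array([[x, y] for x in ax for y in ax])
E = (U[:, 0] - 0.3) ** 2 + 2.0 * (U[:, 1] + 0.2) ** 2
sigma, lam = 0.5, 1e-8
K = np.exp(-((U[:, None] - U[None]) ** 2).sum(axis=-1) / (2.0 * sigma**2))
alpha = np.linalg.solve(K + lam * np.eye(len(U)), E)

u_start = np.array([1.0, 1.0])
e_start = energy(u_start, U, alpha, sigma)
u_min, e_min = descend(u_start, U, alpha, sigma)
```

The analytic gradient is what makes the learned surface usable for optimization and dynamics; where the optimizer ends up depends on how well the fitted surface reproduces the underlying one near the minimum.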

### ML model sensitivity to density inputs

Our results show that we can use ML models to map learned electron densities to several types of energy targets. This naturally raises the question of how sensitive our results are to the input density. If one does not need accurate self-consistent densities, why bother with the density at all? Why not, instead, simply learn the energy directly from the nuclear potential? To answer this, consider benzene and the 1500 geometries in the MD17 dataset34 (see Supplementary Figs. 7, 8). Due to benzene’s 24 point group (D6h) symmetries, applying our symmetrization approach on 1000 CC training points produces an effective dataset size of 24,000 geometries.

We first investigate the difference between $${E}_{{\mathrm{sML}}}$$ models trained using the self-consistent DFT densities ($${n}^{{\mathrm{DFT}}}$$) and those created by the ML-HK density map ($${n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}$$). Just as for ethanol, these models have accuracies comparable to other approaches that require CC forces for training (see Supplementary Table 3). Table 3 shows that for any of our energy functionals ($${E}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}$$, $${E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}$$, or $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}$$), model performance differs negligibly when trained using these two electron density representations because the density-driven errors of the ML-HK maps are small53. Relevant dimensionality estimation (RDE)80 quantifies the effective complexity that the ML models require for predicting, e.g., a particular set of energies given a set of densities (see Supplementary Tables 4, 5, 6). The direct $${E}_{{\mathrm{sML}}}$$ models for benzene using the ground-state densities are all of similar complexity, with a comparable number of relevant data dimensions required to obtain similar accuracy. $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}$$ achieves higher accuracy with fewer relevant data dimensions than either direct model because the energy difference landscape is smoother and easier to learn.

Next, we consider model performance when the molecular electron density is approximated by a superposition of atomic densities (SAD), which are conceptually similar to the pseudodensities used in other ML models9,15 and effectively translate the nuclear potential into electron densities, albeit without a proper description of the chemical bonds. While such densities (denoted as $${n}^{{\mathrm{SAD}}}$$) cost little to generate, Table 3 shows that ML models trained on these inputs have errors that are at least twice those of models using more accurate densities. The RDE analysis shows that models based on $${n}^{{\mathrm{SAD}}}$$ have comparable dimensionality for direct energy models but significantly lower signal-to-noise ratios (defined for the RDE analysis in Supplementary Eq. 2), thus rendering the energy models less accurate. Nonetheless, given the ever-present trade-off between accuracy and computational cost, SAD densities may be useful to avoid self-consistent optimization of the electron density for each geometry. In the case of SAD inputs, energy labels for the ML models would reflect the DFT functional evaluated on the approximate density (e.g., $${E}^{{\mathrm{SAD}}}$$). For benzene, results are poorer for both the direct ML energy model ($${E}_{{\mathrm{sML}}}^{{\mathrm{SAD}}}$$) and Δ-DFT ($${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{SAD}}}^{{\mathrm{CC}}}$$), although they are still within chemical accuracy. We understand the larger errors to be due to the increased variance of $${E}^{{\mathrm{SAD}}}$$ labels (seven times that of the self-consistent dataset; see Supplementary Fig. 9) as well as their overall lower signal-to-noise ratio, as evidenced by the RDE analysis (see Supplementary Table 6).
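The idea of a superposition of atomic densities can be made concrete with a toy construction in which each atom contributes a normalized spherical Gaussian. This is only a schematic: real SAD guesses use precomputed atomic densities, whereas the Gaussian exponents in `widths`, the water-like positions, and the grid below are illustrative assumptions.

```python
import numpy as np

def sad_density(grid, positions, n_electrons, widths):
    """Sum of normalized spherical Gaussians, one per atom (no bonding effects)."""
    rho = np.zeros(len(grid))
    for R, N, a in zip(positions, n_electrons, widths):
        r2 = ((grid - R) ** 2).sum(axis=1)
        rho += N * (a / np.pi) ** 1.5 * np.exp(-a * r2)
    return rho

# Uniform cubic grid (hypothetical units and extent).
axis = np.linspace(-6.0, 6.0, 49)
h = axis[1] - axis[0]
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

# Water-like toy: one O and two H atoms. The exponents are placeholders,
# not fitted atomic densities.
positions = np.array([[0.0, 0.0, 0.0], [0.76, 0.59, 0.0], [-0.76, 0.59, 0.0]])
n_electrons = [8.0, 1.0, 1.0]
widths = [1.5, 1.0, 1.0]

rho_sad = sad_density(grid, positions, n_electrons, widths)
total = rho_sad.sum() * h**3      # Riemann-sum estimate of the electron count
```

Such a density depends on the geometry only through the atomic positions, which is why it carries no information about bonding beyond what the nuclear potential already provides.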

The results presented thus far demonstrate that reasonably accurate ML models can be created using approximate densities that are inconsistent with the energy targets. Such ML models can be generated for applications where speed is more important than accuracy, for example, in the first few cycles of an active learning scheme17, where a cheap approximate density provides sufficient information to train models that ultimately would return CC energies with chemical accuracy. Finally, using accurate self-consistent densities as input significantly improves model performance for the same training and test geometries. These findings provide clear evidence that the electron density contains highly useful machine-learnable information about the molecular system beyond that contained in atomic positions alone.

### MD using CC energies

The final molecular example of 1,3-benzenediol (resorcinol) illustrates the utility of learning multiple ML functionals for the same system. Combining the $${E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ with the more expensive and accurate $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\ [{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ method, we demonstrate how to run self-consistent MD simulations that can be used to explore the configurational phase space based on CC energies.

Resorcinol has two rotatable OH groups, two molecular symmetry operations, and more degrees of freedom than water, ethanol, or benzene, making this a more stringent test of the ML functionals. The initial datasets are generated from 1 ns classical MD simulations at 500 K and 300 K for the training and test sets, respectively (details are found in the “Methods” section). For the density representation, the 1000 conformer training set is augmented with the two symmetries, resulting in an effective training set size of 4000 samples (see Supplementary Fig. 10). The molecular geometries in the MD-generated training set have energies between 7 and 50 kcal mol−1 above the equilibrium conformer (as shown in Supplementary Fig. 11); the four local minima are also included in the dataset using geometries from MP2/6-31G* optimizations, leading to 1004 unique training geometries and a total effective training set size of 4004 samples. These local minima, which differ in the orientation of the two alcohol groups, are separated by a rotational barrier of ~4 kcal mol−1 (see Supplementary Fig. 12). The maximum relative energy errors between the DFT and the (ground truth) CC energies are 6.1 and 6.7 kcal mol−1, respectively, for geometries included in the training and test sets.

As with the other examples, ML model performance improves with increasing training set size (see Supplementary Fig. 13). When trained on 1004 unique training geometries (4004 training points), the MAE of predicted energies is around 1.3 kcal mol−1 for both $${E}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ and $${E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$, and the error, when using $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\ [{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$, is only 0.11 kcal mol−1. The Δ-DFT  accuracy is insensitive to the use of the ML-HK map for the density input, as shown in Supplementary Table 7, and is sufficient to run an MD simulation based on CC energies without the need of CC forces.

Although DFT energies may be sufficient for some molecules, the ability to use CC energies to determine the equilibrium geometries and thermal fluctuations is a promising advance. For resorcinol, the relative DFT energies can differ significantly from the CC energies, particularly near the OH rotational barrier that separates conformers (see Supplementary Fig. 12). Conformational changes are also rare events in the MD trajectories, making it crucial to describe the transitions accurately. For example, the exploration of the OH dihedral angles over a 10 ps MD trajectory from a DFT-based constant-temperature simulation at 350 K is shown in Supplementary Fig. 14. In this simulation, only one conformational change is observed, despite several excursions away from the local minima.

Using the $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}$$ approach, we could easily correct energies after running a conventional DFT-MD simulation. However, as shown in Supplementary Fig. 15, for snapshots along a 1.5 ps constant-energy simulation starting from a point near a conformer change, the MAE of DFT energies compared to CC energies for each snapshot is 1.0 kcal mol−1, with a maximum of just under 4.5 kcal mol−1. Therefore, a more promising use of the ML functionals is to run MD simulations using the CC energy function directly. An example $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\ [{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ trajectory starting from a random training point is shown in Supplementary Fig. 16, with an MAE of 0.2 kcal mol−1.

Starting from a different point in the DFT-generated trajectory serves to illustrate the importance of generating MD trajectories directly on the CC energy surface. As seen in Fig. 3, for constant-energy simulations starting from the same initial condition, a DFT-based trajectory does not have sufficient kinetic energy to traverse the rotational barrier, while the conformer switch does occur for the CC-based trajectory. Astonishingly, the $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\ [{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ trajectory has a MAE of only 0.18 kcal mol−1 relative to the true CC energies over a range of more than 15 kcal mol−1.

Because the Δ-DFT method requires a DFT calculation at each step of the trajectory, its cost remains tied to that of DFT; we can reduce this cost by combining the ML models. The middle panel of Fig. 3b shows the CC trajectory using a reversible reference-system based multi-time-step integrator81 to evaluate energies and forces primarily with the $${E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ model as a reference and with periodic force corrections based on the more accurate $${E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\ [{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]$$ every three steps (see Supplementary Note 1.4 and Supplementary Fig. 17 for more details). The resulting trajectory has an MAE of 3.8 kcal mol−1 relative to the true CC energies, with the largest errors in regions that are sparsely represented in the training set. This self-consistent exploration of the configurational space with the combined ML models provides an opportunity to improve the sampling in a cost-effective manner.
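The idea of propagating on a cheap reference surface with periodic corrections from a more accurate one can be illustrated with a toy one-dimensional, RESPA-style integrator. This is a minimal sketch and not the scheme of Supplementary Note 1.4: the function name `respa_trajectory`, the harmonic forces standing in for the two ML models, and all numerical values are assumptions chosen for illustration.

```python
def respa_trajectory(x, v, mass, dt, n_inner, n_outer, force_ref, force_acc):
    """Reversible multiple-time-step integration in one dimension.

    force_ref: cheap reference force, evaluated at every inner step.
    force_acc: more accurate force, evaluated once per outer step.
    """
    big_dt = n_inner * dt
    corr = force_acc(x) - force_ref(x)        # slowly varying correction force
    traj = [(x, v)]
    for _ in range(n_outer):
        v += 0.5 * big_dt * corr / mass       # half kick with the correction
        f = force_ref(x)
        for _ in range(n_inner):              # velocity Verlet on the reference
            v += 0.5 * dt * f / mass
            x += dt * v
            f = force_ref(x)
            v += 0.5 * dt * f / mass
        corr = force_acc(x) - force_ref(x)
        v += 0.5 * big_dt * corr / mass       # closing half kick
        traj.append((x, v))
    return traj


# Hypothetical stand-ins: a slightly too soft reference and an "accurate" force.
def force_ref(x):
    return -0.9 * x


def force_acc(x):
    return -1.0 * x


traj = respa_trajectory(1.0, 0.0, 1.0, 0.01, 3, 1000, force_ref, force_acc)
# Total energy measured on the accurate surface (unit mass, unit force constant).
energies = [0.5 * v * v + 0.5 * x * x for x, v in traj]
max_drift = max(abs(e - energies[0]) for e in energies)
```

With `n_inner = 3`, the accurate force is evaluated once per three reference steps, which mirrors the correction frequency quoted above; the energy on the accurate surface should stay close to its initial value in this toy case.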

### Combining densities for improved sampling

The electron density provides some advantages as a descriptor of a chemical system over inputs that rely solely on local atomic environments or connectivity11,12,82. For a given periodic cell and number of basis functions, the same density input structure is able to describe systems with different numbers, types, and orders of atoms. In contrast, models that rely on an atomistic decomposition of the energy must have representations for the environment of each separate element (for example, see refs. 6,26). To improve the sampling represented in the training set for resorcinol, we can leverage overlap with configurational spaces sampled by similar, yet smaller and less costly, molecules. For example, adding data for phenol can provide better sampling of the rotation of an OH group, while the dynamics of benzene contains extensive sampling of C–C bonds.

To demonstrate this feature of density-based ML models, we use 1001 geometries for each of these two molecules as input configurations (see Supplementary Figs. 8, 18), along with the 1004 resorcinol configurations. We trained a set of density-to-energy maps, combining the symmetrized datasets, pairwise and as a complete set, and then we used the resorcinol test set to evaluate the performance of these models. In each case, the density-to-energy map was learned by combining the densities of the different molecules into a single dataset. The models using combinations of true or independently learned densities, displayed in Tables 4 and 5 and Supplementary Tables 8 and 9, show significant improvements in performance, with the prediction error being reduced by 30–60%. The results for models trained on DFT energies are similar to those for CC energies and can be found in Supplementary Table 10.

In addition, we can analogously train an ML-HK map by combining the artificial potentials of the different molecules into one dataset in order to produce a combined map ($${n}_{{\mathrm{sML}}-{\mathrm{c}}}^{{\mathrm{DFT}}}$$). Using the combination of symmetrized phenol and resorcinol data to train the ML-HK map improves the performance of the direct ML energy models, although the Δ-DFT approach is again less sensitive to the density representation. We note that, unlike the models with independently learned densities, simply adding more training data by including benzene in the ML-HK map does not significantly change the results. Molecular similarity clearly affects the combination of ML-HK maps (see Supplementary Table 9 for resorcinol and benzene), but the ML density functionals are less sensitive and show improvement for all molecular combinations. We view this as a stepping stone toward learning a truly transferable model capable of predicting both densities and energies for a wide range of configurations and molecules.

## Discussion

DFT is used in at least 30,000 scientific papers each year83, and because of its low cost relative to wave function based ab initio methods, it can be used to compute energies of large molecules. Moreover, if geometry optimizations or MD simulations are desired, these would be beyond the reach of CCSD(T) level calculations owing to the high computational cost. However, if CCSD(T) is affordable for a small number of carefully chosen configurations, then our methodology provides one possible bridge between the DFT and CCSD(T) levels of theory.

There are two distinct modes in which our results can be applied. With Δ-DFT, the cost of a gas-phase MD simulation is essentially that of the DFT-based MD with a given approximate functional, plus the cost of evaluating a few dozen CCSD(T) energies. While the optimal selection of training points is an open question in the field of machine learning, the Δ-DFT  approach presented here may help to reduce the number of points necessary by learning an inherently smoother energy correction map. We stress that no forces are needed for training, making training set generation cheaper than other methods with similar performance. Compared to other machine-learning models, Δ-DFT  is well behaved and stable outside of the training set, since the zero-mean prior allows it to fall back on DFT results when far from the training set. The combination of Δ-DFT  with the ML models for DFT energies of ref. 53 yields both the efficiency from bypassing the KS equations and the accuracy of CCSD(T). While this yields accurate energy functions within the training manifold, it occasionally yields inaccurate energy gradients or forces in an MD simulation, which can be corrected with the Δ-DFT  forces using the appropriate integrators, as shown above.
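The fallback behaviour of a zero-mean prior can be seen in a toy one-dimensional example. The following is a minimal sketch under stated assumptions: `e_dft` and `e_cc` are made-up analytic functions standing in for the two levels of theory, the descriptor is a single coordinate rather than a density, and the kernel width and regularization are arbitrary illustrative values.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_krr(x_train, y_train, sigma, lam):
    """Kernel ridge regression weights for a zero-mean model."""
    K = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)


def predict_krr(x_new, x_train, alpha, sigma):
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


# Hypothetical low-level and high-level energy functions.
def e_dft(x):
    return x ** 2


def e_cc(x):
    return x ** 2 + 0.3 * np.cos(2.0 * x)


x_train = np.linspace(-1.0, 1.0, 11)
sigma, lam = 0.3, 1e-6
# Learn only the correction between the two levels.
alpha = fit_krr(x_train, e_cc(x_train) - e_dft(x_train), sigma, lam)


def e_delta(x):
    """Low-level energy plus the learned correction."""
    return e_dft(x) + predict_krr(x, x_train, alpha, sigma)


x_near = np.array([0.1])   # inside the sampled region
x_far = np.array([50.0])   # far outside the sampled region
```

Inside the sampled region, `e_delta` should track `e_cc`; far away, every kernel value vanishes, so the learned correction is zero and `e_delta` reduces to `e_dft`.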

Clearly, our methodology can be applied to any gas-phase MD simulation or geometry optimization for which CCSD(T) calculations can be performed for a reasonable number of carefully selected configurations. Gas-phase MD, for example, has many applications. Earlier studies focused on comparing equilibrium properties from simulations excluding or including (via the Feynman path integral) nuclear quantum effects84,85,86,87,88. More recent studies have focused on accurate spectroscopy and exploration of reactivity in small complexes and clusters89,90,91,92. For geometry optimization at the CCSD(T) level or testing of DFT energetics against CCSD(T) energies, DFT geometries often must be used due to the prohibitive cost of finding an optimum CCSD(T) geometry. For molecules with many soft modes, finding the geometry can require hundreds of evaluations of energies and forces. Here, we have shown how relatively few energies are needed in Δ-DFT  to produce an accurate energy functional, suggesting the possibility of using Δ-DFT  to speed up such searches, producing CC geometries for molecules that were previously prohibitive. For larger molecules and/or molecules interacting with an environment, recent schemes that embed an ab initio core within a larger DFT calculation93 could also be treated by this method, especially if Δ-DFT  need only be applied to the ab initio portion of the calculation. With suitable training sets, the ML approaches presented here have the potential to enable MD simulations for each of these systems.

Standard electronic structure methods require users to choose between accuracy and computational cost for each application. The success of our new ML approach connecting DFT densities to CC energies provides a new framework and strategy for linking formerly inconsistent calculations to reduce the penalty of this tradeoff. We have also demonstrated that the densities from a simpler molecule can be combined with a more complex system to improve the coverage of critical degrees of freedom. This promising result indicates that the smart use of combined densities from smaller molecular fragments could yield more accurate energies at even lower cost. Given that the CC-DFT energy difference landscape does not resemble the intrinsic energy landscape of either of the underlying electronic structure methods, we hope future work will further explore this dissimilarity as a function of training set size and composition for Δ-DFT models.

ML represents an entirely new approach to extracting energies from DFT calculations, avoiding some of the biases built into human-designed functionals, while also bypassing the need for strict self-consistency between the electron density and the resulting energy when an approximate result is sufficient. As shown here, ML provides a natural framework for incorporating results from more accurate electronic structure methods, thus bridging the gap between the CC and the DFT worlds while maintaining the versatility of DFT to describe electronic properties beyond energy and forces such as the dipole moment, molecular polarizability, NMR chemical shifts, etc. Along with these insights, the long and successful history of KS-DFT suggests that using the density as a descriptor may thus prove to be an excellent strategy for improved simulations in the future.

## Methods

### Machine-learning model

To predict the total energy of a system given only the positions of the $${N}^{\text{a}}$$ atoms of a molecule, using the electron density as a key descriptor, we use the ML-HK map introduced in ref. 53; the entire procedure is illustrated in Fig. 1a. We first characterize the Hamiltonian by the external nuclear potential v(r), which we approximate using a sum of Gaussians as94

$$v({\bf{r}})=\mathop{\sum }\limits_{\alpha =1}^{{N}^{\text{a}}}{Z}_{\alpha }\exp \left(\frac{-| | {\bf{r}}-{{\bf{R}}}_{\alpha }| {| }^{2}}{2{\gamma }^{2}}\right),$$
(4)

where r denotes the coordinates of a point on a spatial grid, Rα is the position vector of atom α, and Zα is the nuclear charge of atom α. Finally, γ is a width hyperparameter. This Gaussian potential is then evaluated on a 3D grid around the molecule and used as a descriptor for the ML-HK model. For each molecule, cross-validation is used to determine the width parameter, γ, and the grid spacing for discretization of the associated Gaussian potential.
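Equation (4) can be evaluated on a regular grid in a few lines. The sketch below is illustrative only: the function names, the water-like geometry, the box length, the grid density, and the value of γ are assumptions, whereas in this work γ and the grid spacing are chosen by cross-validation.

```python
import numpy as np


def gaussian_potential(positions, charges, grid_points, gamma):
    """v(r) = sum_a Z_a exp(-|r - R_a|^2 / (2 gamma^2)) at each grid point."""
    diff = grid_points[:, None, :] - positions[None, :, :]   # (G, Na, 3)
    dist2 = np.sum(diff ** 2, axis=-1)                        # (G, Na)
    return np.exp(-dist2 / (2.0 * gamma ** 2)) @ charges      # (G,)


def cubic_grid(box_length, points_per_dim):
    """Regular grid of points in a cubic box, flattened to shape (G, 3)."""
    axis = np.linspace(0.0, box_length, points_per_dim, endpoint=False)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)


# Hypothetical water-like geometry (arbitrary length units), O first.
positions = np.array([[5.00, 5.00, 5.00],
                      [5.76, 5.59, 5.00],
                      [4.24, 5.59, 5.00]])
charges = np.array([8.0, 1.0, 1.0])

grid = cubic_grid(box_length=10.0, points_per_dim=20)
v = gaussian_potential(positions, charges, grid, gamma=0.5)
```

The flattened vector `v` is the kind of fixed-length descriptor that a kernel model can compare between geometries, independent of the number or ordering of atoms.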

After obtaining the Gaussian potential, we use a KRR model to learn the approximate DFT valence electron density. In order to simplify the learning problem and avoid representing the density on a 3D grid, we expand the density map in an orthonormal basis set, and consequently learn the basis coefficients instead of the density grid points:

$${n}_{{\mathrm{ML}}}[v]({\bf{r}})=\mathop{\sum }\nolimits_{l = 1}^{L}{u}_{\mathrm{ML}}^{(l)}[v]{\phi }_{l}({\bf{r}}),$$
(5)

where ϕl(r) is a basis function. In this work, a Fourier basis is employed. In the applications presented in this work, 12,500 basis functions (25 per dimension) proved sufficient for good performance. Use of KRR to learn these basis coefficients makes the problem more tractable for 3D densities, and more importantly, the orthogonality of the basis functions allows us to learn the individual coefficients independently:

$${u}_{{\mathrm{ML}}}^{(l)}[v]=\mathop{\sum }\nolimits_{i = 1}^{M}{\beta }_{i}^{(l)}k[v,{v}_{i}],$$
(6)

where β(l) are the KRR coefficients and k is a kernel functional.

The independent and direct prediction of the basis coefficients makes the ML-HK map more efficient and easier to scale to larger molecules, since the complexity only depends on the number of basis functions. In addition, we can use the predicted basis coefficients to reconstruct the continuous density at any point in space, making the predicted density independent of a fixed grid and enabling computations such as numerical integrals to be performed at an arbitrary accuracy.
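The basis expansion of Eq. (5), and the grid independence of the reconstructed density, can be illustrated in one dimension. This is a simplified sketch: the orthonormal real Fourier basis, the periodic toy "density", and the names `fourier_basis`, `project`, and `reconstruct` are assumptions for illustration, and the work described here uses a 3D basis.

```python
import numpy as np


def fourier_basis(x, length, n_freq):
    """Orthonormal real Fourier basis on [0, length); shape (len(x), 2*n_freq + 1)."""
    cols = [np.full_like(x, 1.0 / np.sqrt(length))]
    for k in range(1, n_freq + 1):
        arg = 2.0 * np.pi * k * x / length
        cols.append(np.sqrt(2.0 / length) * np.cos(arg))
        cols.append(np.sqrt(2.0 / length) * np.sin(arg))
    return np.stack(cols, axis=1)


def project(density_on_grid, grid, length, n_freq):
    """Basis coefficients u_l = integral of phi_l(x) n(x) dx (rectangle rule)."""
    dx = length / len(grid)
    return fourier_basis(grid, length, n_freq).T @ density_on_grid * dx


def reconstruct(coeffs, x, length):
    """Evaluate n(x) = sum_l u_l phi_l(x) at arbitrary points x."""
    n_freq = (len(coeffs) - 1) // 2
    return fourier_basis(x, length, n_freq) @ coeffs


def toy_density(x, length):
    """Hypothetical periodic density used only for this example."""
    return (1.0 + 0.5 * np.cos(2.0 * np.pi * x / length)
            + 0.2 * np.sin(6.0 * np.pi * x / length))


length = 10.0
grid = np.linspace(0.0, length, 64, endpoint=False)
coeffs = project(toy_density(grid, length), grid, length, n_freq=5)

x_new = np.array([0.123, 4.56, 9.99])      # points that are not on the grid
recon = reconstruct(coeffs, x_new, length)
```

Because each coefficient is obtained from its own projection, a separate regression model can be trained per coefficient, and the density can then be evaluated wherever it is needed.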

As a final step, another KRR model is used to learn the total energy from the density basis coefficients:

$${E}_{{\mathrm{ML}}}[{n}_{{\mathrm{ML}}}]=\mathop{\sum }\limits_{i=1}^{M}{{{\alpha }}}_{i}k({{\bf{u}}}_{{\mathrm{ML}}}[v],{{\bf{u}}}_{{\mathrm{ML}}}[{v}_{i}]),$$
(7)

where k is the Gaussian kernel.

### Exploiting point group symmetries

Training datasets for our machine-learning model can be easily enriched using the point group symmetries. To extract the point group symmetries and the corresponding transformation matrices we used the SYVA software package95. Consequently, we can multiply the size of the training set by the number of point group symmetries without performing any additional quantum chemical calculations simply by applying the point group transformations on our existing data.
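A minimal sketch of this kind of augmentation is shown below for geometries and energies only. The C2v operation matrices are written out by hand for a molecule centred on the origin, whereas this work obtains the transformation matrices from SYVA; the geometry, the placeholder energies, and the function name `augment` are illustrative assumptions, and a full pipeline would transform the densities in the same way.

```python
import numpy as np

# C2v operations, with the C2 axis along z (hand-written for illustration).
C2V_OPS = [
    np.diag([1.0, 1.0, 1.0]),     # identity
    np.diag([-1.0, -1.0, 1.0]),   # C2 rotation about z
    np.diag([1.0, -1.0, 1.0]),    # reflection through the xz plane
    np.diag([-1.0, 1.0, 1.0]),    # reflection through the yz plane
]


def augment(geometries, energies, operations):
    """Apply each operation to every geometry; energies are unchanged.

    geometries: array of shape (M, Na, 3), centred on the symmetry origin.
    """
    aug_geoms = np.concatenate([geometries @ op.T for op in operations], axis=0)
    aug_energies = np.tile(energies, len(operations))
    return aug_geoms, aug_energies


# Hypothetical distorted water-like geometries and placeholder energies.
rng = np.random.default_rng(0)
base = np.array([[0.0, 0.00, 0.12],
                 [0.0, 0.76, -0.47],
                 [0.0, -0.76, -0.47]])
geoms = base[None, :, :] + 0.05 * rng.normal(size=(5, 3, 3))
energies = rng.normal(size=5)

aug_geoms, aug_energies = augment(geoms, energies, C2V_OPS)
```

Five input geometries become twenty training points here, at no additional quantum chemical cost.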

### Cross-validation and hyperparameter optimization

Due to the small number of training and test samples, when evaluating the models on the water dataset, the data were shuffled 40 times, and for each shuffle a subset of 50 geometries was selected as the training set, with the remaining 52 being used as the out-of-sample test set. For the smaller training sets, a subset of the 50 training geometries was selected using k-means sampling.
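The repeated-shuffle protocol can be sketched as follows. The total of 102 geometries is inferred from the 50/52 split quoted above; the function name and the seed are assumptions, and the k-means subsampling used for the smaller training sets is not shown.

```python
import random


def shuffled_splits(n_samples, n_train, n_repeats, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per shuffle."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    splits = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train = sorted(indices[:n_train])
        test = sorted(indices[n_train:])
        splits.append((train, test))
    return splits


splits = shuffled_splits(n_samples=102, n_train=50, n_repeats=40)
```

Errors would then be averaged over the 40 splits, which reduces the dependence of the reported accuracy on any single choice of training set.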

The hyperparameters for all models were tuned using fivefold cross-validation on the training set. For the ML-HK map from potentials to densities, the following three hyperparameters were optimized individually for each dataset: the width parameter of the Gaussian potential γ, the spacing of the grid on which the Gaussian potential is evaluated, and the width parameter σ of the Gaussian kernel k[v, vi]. For each subsequent density-to-energy map $${E}_{{\mathrm{ML}}}^{* }[n]$$, only the width parameter of the Gaussian kernel k(uML[v], uML[vi]) needs to be chosen using cross-validation. Specific values are reported in Supplementary Tables 11–15.
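Selecting a kernel width by fivefold cross-validation can be sketched as a simple grid search. Everything specific below is an assumption for illustration: the toy data, the candidate widths, the fixed regularization `lam`, and the use of the mean absolute error as the selection criterion.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def cv_error(X, y, sigma, lam, n_folds=5, seed=0):
    """Mean absolute validation error of KRR, averaged over the folds."""
    order = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(order, n_folds)
    errors = []
    for i in range(n_folds):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        K = gaussian_kernel(X[train], X[train], sigma)
        alpha = np.linalg.solve(K + lam * np.eye(len(train)), y[train])
        pred = gaussian_kernel(X[val], X[train], sigma) @ alpha
        errors.append(np.mean(np.abs(pred - y[val])))
    return float(np.mean(errors))


def select_sigma(X, y, sigmas, lam=1e-6):
    """Return the candidate width with the lowest cross-validation error."""
    scores = [cv_error(X, y, s, lam) for s in sigmas]
    return sigmas[int(np.argmin(scores))], scores


# Hypothetical one-dimensional regression problem.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(60, 1))
y = np.sin(3.0 * X[:, 0])

sigmas = [0.01, 0.1, 0.5, 5.0]
best_sigma, scores = select_sigma(X, y, sigmas)
```

A width that is far too small makes the kernel nearly diagonal, so the model cannot generalize to held-out points; cross-validation exposes this because the error is measured only on data excluded from the fit.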

### Classical molecular dynamics

Training and test set geometries for resorcinol (1,3-benzenediol) and phenol were selected from a 1 ns trajectory generated via classical MD using the GAFF force field96. The local minima were optimized using MP2/6-31g* in Gaussian0997. Symmetric atomic charge assignments were determined from a RESP fit98 to the HF/6-31g* calculations, using the three distinct geometries with Boltzmann weights determined by the relative MP2 energies for resorcinol. All other standard GAFF parameters96 for the MD simulations were assigned using the AmberTools package99. To generate resorcinol and phenol conformers, classical MD simulations in a canonical ensemble were run at 300 K and 500 K using the PINY_MD package100 with massive Nosé-Hoover chain (NHC) thermostats101 for atomic degrees of freedom (length = 4, τ = 20 fs, Suzuki-Yoshida order = 7, multiple time step = 4) and a time step of 1 fs.

For the resorcinol and phenol training sets, we selected 1000 conformers closest to k-means centers from the 1 ns classical MD trajectory run at 500 K. The test sets comprise 1000 randomly selected snapshots from the 1 ns 300 K classical MD simulations. Datasets are aligned by minimizing the root mean square deviation (RMSD) of carbon atoms to the global minimum energy conformer.
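One common way to perform such an alignment is the Kabsch algorithm; the text above does not state which algorithm was used, so the following is a sketch under that assumption. The random coordinates and the choice of the first six atoms as the "carbon" subset are hypothetical.

```python
import numpy as np


def kabsch_align(mobile, reference, fit_idx):
    """Rigidly move `mobile` onto `reference`, minimizing RMSD over atoms fit_idx."""
    mob_c = mobile[fit_idx].mean(axis=0)
    ref_c = reference[fit_idx].mean(axis=0)
    P = mobile[fit_idx] - mob_c
    Q = reference[fit_idx] - ref_c
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against improper rotations (reflections).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (mobile - mob_c) @ R.T + ref_c


# Hypothetical reference structure with 10 atoms; the first six act as carbons.
rng = np.random.default_rng(1)
reference = rng.normal(size=(10, 3))
carbon_idx = np.arange(6)

# Rotate and translate a copy, then align it back onto the reference.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
mobile = reference @ rot.T + np.array([1.0, -2.0, 0.5])

aligned = kabsch_align(mobile, reference, carbon_idx)
rmsd = np.sqrt(np.mean(np.sum((aligned - reference) ** 2, axis=1)))
```

Aligning every snapshot to a common frame matters for a grid-based descriptor, since the Gaussian potential is not invariant to rigid rotations or translations of the molecule.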

### DFT molecular dynamics

Born-Oppenheimer MD simulations of a resorcinol molecule in the gas phase were run using DFT in the QUICKSTEP package102 of CP2K v. 2.6.2103. The PBE XC functional104 was used to approximate exchange and correlation, and a mixed Gaussian/plane wave (GPW) basis-set scheme105 was employed with DZVP-MOLOPT-GTH (m-DZVP) basis sets106 paired with appropriate dual-space GTH pseudopotentials107,108. Wave functions were converged to 1E-7 Hartree using the orbital transformation method109 on a multiple grid (n = 5) with a cutoff of 900 Ry for the system in a cubic box (L = 20 bohr). For the constant-temperature simulation, a temperature of 350 K was maintained using massive NHC thermostats101 (length = 4, τ = 10 fs, Suzuki-Yoshida order = 7, multiple time step = 4) and a time step of 0.5 fs.

### ML molecular dynamics

We used the atomistic simulation environment110 with a 0.5 fs time step to run MD with ML energies. For the constant-temperature simulation, a temperature of 350 K was maintained via a Langevin thermostat with a friction value of 0.01 atomic units (0.413 fs−1). Atomic forces were calculated using the finite difference method with ϵ = 0.001 Å.
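Finite-difference forces from an energy-only model can be sketched as follows. The text does not specify the difference scheme, so central differences are assumed here; the harmonic bond energy is a hypothetical stand-in for an ML energy model. The last line reproduces the unit conversion of the friction value quoted above.

```python
import numpy as np

AU_TIME_FS = 0.024188843  # atomic unit of time in femtoseconds


def finite_difference_forces(energy_fn, positions, eps=1e-3):
    """Central-difference estimate of F = -dE/dR for positions of shape (Na, 3)."""
    forces = np.zeros_like(positions)
    for a in range(positions.shape[0]):
        for c in range(3):
            plus = positions.copy()
            minus = positions.copy()
            plus[a, c] += eps
            minus[a, c] -= eps
            forces[a, c] = -(energy_fn(plus) - energy_fn(minus)) / (2.0 * eps)
    return forces


def bond_energy(pos, k=2.0, r0=1.0):
    """Hypothetical harmonic bond between atoms 0 and 1."""
    d = np.linalg.norm(pos[1] - pos[0])
    return 0.5 * k * (d - r0) ** 2


pos = np.array([[0.0, 0.0, 0.0],
                [1.2, 0.1, 0.0]])
forces = finite_difference_forces(bond_energy, pos)

# 0.01 inverse atomic time units expressed in fs^-1.
friction_per_fs = 0.01 / AU_TIME_FS
```

With central differences, each force evaluation costs six energy evaluations per atom, which is affordable only because the ML energy models are cheap to evaluate.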

### Electronic structure calculations

Optimizations for ethanol conformers were run using MP2/6-31g* in Gaussian0997. DFT calculations for the ML models were run using the Quantum ESPRESSO code111 with the PBE XC functional104 and the projector-augmented wave approach112,113 with Troullier-Martins pseudopotentials replacing explicit ionic core electrons114. Molecules were simulated in a cubic box (L = 20 bohr) with a wave function cutoff of 90 Ry. The valence electron densities were evaluated on a grid with 125 points in each dimension. All CC calculations were run using Orca115 with CCSD(T)/aug-cc-pVTZ116 for water or CCSD(T)/cc-pVDZ116 for resorcinol and phenol.

## Data availability

The data generated and used in this study are available at quantum-machine.org/datasets.

## Code availability

The code generated and used for this study is available at https://github.com/MihailBogojeski/ml-dft.

## References

1. 1.

Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).

2. 2.

Montavon, G. et al. Learning invariant representations of molecules for atomization energy prediction. Adv. Neural. Inf. Process. Syst. 25, 440–448 (2012).

3. 3.

Montavon, G. et al. Machine learning of molecular electronic properties in chemical compound space. N. J. Phys. 15, 095003 (2013).

4. 4.

Botu, V. & Ramprasad, R. Learning scheme to predict atomic forces and accelerate materials simulations. Phys. Rev. B 92, 094306 (2015).

5. 5.

Hansen, K. et al. Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett. 6, 2326–2331 (2015).

6. 6.

Bartók, A. P. & Csányi, G. Gaussian approximation potentials: a brief tutorial introduction. Int. J. Quantum Chem. 115, 1051–1057 (2015).

7. 7.

Rupp, M., Ramakrishnan, R. & von Lilienfeld, O. A. Machine learning for quantum mechanical properties of atoms in molecules. J. Phys. Chem. Lett. 6, 3309–3313 (2015).

8. 8.

Bereau, T., Andrienko, D. & von Lilienfeld, O. A. Transferable atomic multipole machine learning models for small organic molecules. J. Chem. Theory Comput. 11, 3225–3233 (2015).

9. 9.

De, S., Bartók, A. P., Csányi, G. & Ceriotti, M. Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 18, 13754–13769 (2016).

10. 10.

Podryabinkin, E. V. & Shapeev, A. V. Active learning of linearly parametrized interatomic potentials. Comput. Mater. Sci. 140, 171–180 (2017).

11. 11.

Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K.-R. & Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 8, 13890 (2017).

12. 12.

Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet–a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).

13. 13.

Faber, F. A. et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 13, 5255–5264 (2017).

14. 14.

Yao, K., Herr, J. E. & Parkhill, J. The many-body expansion combined with neural networks. J. Chem. Phys. 146, 014106 (2017).

15. 15.

Eickenberg, M., Exarchakis, G., Hirn, M., Mallat, S. & Thiry, L. Solid harmonic wavelet scattering for predictions of molecule properties. J. Chem. Phys. 148, 241732 (2018).

16. 16.

Ryczko, K., Mills, K., Luchak, I., Homenick, C. & Tamblyn, I. Convolutional neural networks for atomistic systems. Comput. Mater. Sci. 149, 134–142 (2018).

17. 17.

Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).

18. 18.

Grisafi, A., Wilkins, D. M., Csányi, G. & Ceriotti, M. Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 120, 036002 (2018).

19. 19.

Pronobis, W., Tkatchenko, A. & Müller, K.-R. Many-body descriptors for predicting molecular properties with machine learning: analysis of pairwise and three-body interactions in molecules. J. Chem. Theory Comput. 14, 2991–3003 (2018).

20. 20.

Faber, F. A., Christensen, A. S., Huang, B. & von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 148, 241717 (2018).

21. 21.

Thomas, N. et al.Tensor field networks: rotation-and translation-equivariant neural networks for 3D point clouds. Preprint at http://arXiv.org/abs/1802.08219 (2018).

22. 22.

Hy, T. S., Trivedi, S., Pan, H., Anderson, B. M. & Kondor, R. Predicting molecular properties with covariant compositional networks. J. Chem. Phys. 148, 241745 (2018).

23. 23.

Schütt, K. T., Gastegger, M., Tkatchenko, A., Müller, K.-R. & Maurer, R. J. Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nat. Comm. 10, 1–10 (2019).

24. 24.

von Lilienfeld, O. A., Müller, K.-R. & Tkatchenko, A. Exploring chemical compound space with quantum-based machine learning. Nat. Rev. Chem. 4, 347–358 (2020).

25. 25.

Noé, F., Tkatchenko, A., Müller, K.-R. & Clementi, C. Machine learning for molecular simulation. Annu. Rev. Phys. Chem. 71, 361–390 (2020).

26. 26.

Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 10, 1–8 (2019).

27. 27.

Li, Z., Kermode, J. R. & De Vita, A. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Phys. Rev. Lett. 114, 096405 (2015).

28. 28.

Gastegger, M., Behler, J. & Marquetand, P. Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 8, 6924–6935 (2017).

29. 29.

Schütt, K. et al. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 30, 991–1001 (2017).

30. 30.

John, S. T. & Csányi, G. Many-body coarse-grained interactions using gaussian approximation potentials. J. Phys. Chem. B 121, 10934–10949 (2017).

31. 31.

Huan, T. D. et al. A universal strategy for the creation of machine learning-based atomistic force fields. J. Comput. Mater. 3, 37 (2017).

32. 32.

Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).

33. 33.

Chmiela, S., Sauceda, H. E., Müller, K.-R. & Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 9, 3887 (2018).

34. 34.

Chmiela, S., Sauceda, H. E., Poltavsky, I., Müller, K.-R. & Tkatchenko, A. sGDML: Constructing accurate and data efficient molecular force fields using machine learning. Comput. Phys. Commun. 240, 38–45 (2019).

35. 35.

Kanamori, K. et al. Exploring a potential energy surface by machine learning for characterizing atomic transport. Phys. Rev. B 97, 125124 (2018).

36. 36.

Zhang, L., Han, J., Wang, H., Car, R. & E, W. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).

37. 37.

Glielmo, A., Zeni, C. & De Vita, A. Efficient nonparametric n-body force fields from machine learning. Phys. Rev. B 97, 184307 (2018).

38. 38.

Christensen, A. S., Faber, F. A. & von Lilienfeld, O. A. Operators in quantum machine learning: response properties in chemical space. J. Phys. Chem. 150, 064105 (2019).

39. 39.

Sauceda, H. E., Chmiela, S., Poltavsky, I., Müller, K.-R. & Tkatchenko, A. Molecular force fields with gradient-domain machine learning: Construction and application to dynamics of small molecules with coupled cluster forces. J. Chem. Phys. 150, 114102 (2019).

40. 40.

Schneider, E., Dai, L., Topper, R. Q., Drechsel-Grau, C. & Tuckerman, M. E. Stochastic neural network approach for learning high-dimensional free energy surfaces. Phys. Rev. Lett. 119, 150601 (2017).

41. 41.

Mardt, A., Pasquali, L., Wu, H. & Noé, F. VAMPnets for deep learning of molecular kinetics. Nat. Commun. 9, 5 (2018).

42. 42.

Noé, F., Olsson, S., Köhler, J. & Wu, H. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science 365, eaaw1147 (2019).

43. 43.

Rogal, J., Schneider, E. & Tuckerman, M. E. Neural-network-based path collective variables for enhanced sampling of phase transformations. Phys. Rev. Lett. 123, 245701 (2019).

44. 44.

Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Info Model. 58, 1194 (2018).

45. 45.

Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

46. 46.

Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: Moving beyond fingerprints. J. Computer-Aided Molec. Des. 30, 595 (2016).

47. 47.

Schütt, K. T. et al. Machine Learning Meets Quantum Physics, volume 968 (Springer Lecture Notes in Physics, 2020).

48. 48.

Faber, F. A. et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theor. Comput. 13, 5255 (2017).

49. 49.

Fabrizio, A., Grisafi, A., Meyer, B., Ceriotti, M. & Corminboeuf, C. Electron density learning of non-covalent systems. Chem. Sci. 10, 9424–9432 (2019).

50. 50.

Nagai, R., Akashi, R. & Sugino, O. Completing density functional theory by machine learning hidden messages from molecules. npj Comput. Mater. 6, 1–8 (2020).

51. 51.

Sebastian, D. & Fernandez-Serra, M. Machine learning accurate exchange and correlation functionals of the electronic density. Nat. Commun. 11, 1–10 (2020).

52. 52.

Steffen, J. & Hartke, B. Cheap but accurate calculation of chemical reaction rate constants from ab initio data via system-specific black-box force fields. J. Chem. Phys. 147, 161701 (2017).

53. 53.

Brockherde, F. et al. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 8, 872 (2017).

54. 54.

Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).

55. 55.

Welborn, M., Cheng, L. & Miller, T. F. Transferability in machine learning for electronic structure via the molecular orbital basis. J. Chem. Theory Comput. 14, 4772–4779 (2018).

56. 56.

Seino, J., Kageyama, R., Fujinami, M., Ikabata, Y. & Nakai, H. Semi-local machine-learned kinetic energy density functional with third-order gradients of electron density. J. Chem. Phys. 148, 241705 (2018).

57. 57.

Ryczko, K., Strubbe, D. & Tamblyn, I. Deep learning and density functional theory. Phys. Rev. A 100, 022512 (2019).

58. 58.

Sinitskiy, A. V. & Pande, V. S. Deep neural network computes electron densities and energies of a large set of organic molecules faster than density functional theory (DFT). Preprint at http://arXiv.org/abs/1809.02723 (2018).

59. 59.

Grisafi, A. et al. A transferable machine-learning model of the electron density. ACS Cent. Sci. 5, 57–64 (2019).

60. 60.

Chandrasekaran, A. et al. Solving the electronic structure problem with machine learning. npj Comput. Mater. 5, 22 (2019).

61. 61.

Cheng, L., Welborn, M., Christensen, A. S. & Miller, T. F. III A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules. J. Chem. Phys. 150, 131103 (2019).

62. 62.

Sebastian, D. & Fernandez-Serra, M. Learning from the density to correct total energy and forces in first principle simulations. J. Chem. Phys. 151, 144102 (2019).

63. 63.

Snyder, J. C., Rupp, M., Hansen, K., Müller, K.-R. & Burke, K. Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012).

64. 64.

Snyder, J. C. et al. Orbital-free bond breaking via machine learning. J. Chem. Phys. 139, 224104 (2013).

65. 65.

Snyder, J. C., Rupp, M., Müller, K.-R. & Burke, K. Nonlinear gradient denoising: finding accurate extrema from inaccurate functional derivatives. Int. J. Quantum Chem. 115, 1102–1114 (2015).

66. 66.

Li, L. et al. Understanding machine-learned density functionals. Int. J. Quantum Chem. 116, 819–833 (2016).

67. 67.

Li, L., Baker, T. E., White, S. R. & Burke, K. Pure density functional for strong correlation and the thermodynamic limit from machine learning. Phys. Rev. B 94, 245129 (2016).

68. 68.

Hollingsworth, J., Li, L., Baker, T. E. & Burke, K. Can exact conditions improve machine-learned density functionals? J. Chem. Phys. 148, 241743 (2018).

69. 69.

Hansen, K. et al. Assessment and validation of machine learning methods for predicting molecular atomization energies. J. Chem. Theory Comput. 9, 3404–3419 (2013).

70. 70.

Ginzburg, I. & Horn, D. Combined neural networks for time series analysis. Adv. Neural Inf. Process. Syst. 224–231 (1994).

71. 71.

Parr, R. G. & Yang, W. Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989).

72. 72.

Fiolhais, C., Nogueira, F. & Marques, M. A Primer in Density Functional Theory. (Springer-Verlag, New York, 2003).

73. 73.

Levy, M. & Görling, A. Correlation energy density-functional formulas from correlating first-order density matrices. Phys. Rev. A 52, R1808 (1995).

74. 74.

Kim, M.-C., Sim, E. & Burke, K. Understanding and reducing errors in density functional calculations. Phys. Rev. Lett. 111, 073003 (2013).

75. 75.

Vuckovic, S., Song, S., Kozlowski, J., Sim, E. & Burke, K. Density functional analysis: the theory of density-corrected DFT. J. Chem. Theory Comput. 15, 6636–6646 (2019).

76. 76.

Zhu, W., Botina, J. & Rabitz, H. Rapidly convergent iteration methods for quantum optimal control of population. J. Chem. Phys, 108, 1953 (1998).

77. 77.

Wasserman, A. et al. The importance of being self-consistent. Annu. Rev. Phys. Chem. 68, 555–581 (2017).

78. 78.

Sim, E., Song, S. & Burke, K. Quantifying density errors in DFT. J. Phys. Chem. Lett. 9, 6385–6392 (2018).

79. 79.

Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Big data meets quantum chemistry approximations: the  Δ-machine learning approach. J. Chem. Theory Comput. 11, 2087–2096 (2015).

80. 80.

Braun, M. L., Buhmann, J. M. & Müller, K.-R. On relevant dimensions in kernel feature spaces. J. Mach. Learn. Res. 9, 1875–1908 (2008).

81. 81.

Tuckerman, M. E., Berne, B. J. & Martyna, G. J. Reversible multiple time scale molecular dynamics. J. Chem. Phys. 97, 1990–2001 (1992).

82. 82.

Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).

83. 83.

Pribram-Jones, A., Gross, D. A. & Burke, K. DFT: a theory full of holes? Annu. Rev. Phys. Chem. 66, 283–304 (2015).

84. 84.

Tuckerman, M. E., Marx, D., Klein, M. L. & Parrinello, M. On the quantum nature of the shared proton in hydrogen bonds. Science 275, 817 (1997).

85. 85.

Miura, S., Tuckerman, M. E. & Klein, M. L. An ab initio path integral molecular dynamics study of double proton transfer in the formic acid dimer. J. Chem. Phys. 109, 5920 (1998).

86. 86.

Tuckerman, M. E. & Marx, D. Heavy-atom skeleton quantization and proton tunneling in “intermediate-barrier” hydrogen bonds. Phys. Rev. Lett. 86, 4946 (2001).

87. 87.

Li, X. Z., Walker, B. & Michaelides, A. Quantum nature of the hydrogen bond. Proc. Natl Acad. Sci. USA 108, 6369 (2011).

88. 88.

Pérez, A., Tuckerman, M. E., Hjalmarson, H. P. & von Lilienfeld, O. A. Enol tautomers of Watson−Crick base-pair models are metastable because of nuclear quantum effects. J. Am. Chem. Soc. 132, 11510–11515 (2010).

89. 89.

Kaczmarek, A., Shiga, M. & Marx, D. Quantum effects on vibrational and electronic spectra of hydrazine studied by “on-the-fly. J. Phys. Chem. A 113, 1985 (2009).

90. 90.

Wang, H. & Agmon, N. Complete assignment of the infrared spectrum of the gas-phase protonated ammonia dimer. J. Phys. Chem. A 120, 3117 (2016).

91. 91.

Samala, N. R. & Agmon, N. Structure, spectroscopy, and dynamics of the phenol-(water)(2) cluster at low and high temperatures. J. Chem. Phys. 147, 234307 (2017).

92.

Jarvinen, T., Lundell, J. & Dopieralski, P. Ab initio molecular dynamics study of overtone excitations in formic acid and its water complex. Theor. Chem. Acc. 137, 100 (2018).

93.

Lee, S. J. R., Welborn, M., Manby, F. R. & Miller, T. F. Projection-based wavefunction-in-DFT embedding. Acc. Chem. Res. 52, 1359–1368 (2019).

94.

Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).

95.

Gyevi-Nagy, L. & Tasi, G. SYVA: a program to analyze symmetry of molecules based on vector algebra. Comput. Phys. Commun. 215, 156–164 (2017).

96.

Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).

97.

Frisch, M. J. et al. Gaussian 09 (2009).

98.

Bayly, C. I., Cieplak, P., Cornell, W. & Kollman, P. A. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J. Phys. Chem. 97, 10269–10280 (1993).

99.

Wang, J., Wang, W., Kollman, P. A. & Case, D. A. Antechamber: an accessory software package for molecular mechanical calculations. Abstr. Pap. Am. Chem. Soc. 222, U403 (2001).

100.

Tuckerman, M. E., Yarne, D. A., Samuelson, S. O., Hughes, A. L. & Martyna, G. J. Exploiting multiple levels of parallelism in molecular dynamics based calculations via modern techniques and software paradigms on distributed memory computers. Comput. Phys. Commun. 128, 333–376 (2000).

101.

Martyna, G. J., Klein, M. L. & Tuckerman, M. Nosé–Hoover chains: the canonical ensemble via continuous dynamics. J. Chem. Phys. 97, 2635–2643 (1992).

102.

VandeVondele, J. et al. Quickstep: fast and accurate density functional calculations using a mixed Gaussian and plane waves approach. Comput. Phys. Commun. 167, 103–128 (2005).

103.

Hutter, J., Iannuzzi, M., Schiffmann, F. & VandeVondele, J. CP2K: atomistic simulations of condensed matter systems. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 4, 15–25 (2014).

104.

Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).

105.

Lippert, G., Hutter, J. & Parrinello, M. A hybrid Gaussian and plane wave density functional scheme. Mol. Phys. 92, 477–488 (1997).

106.

VandeVondele, J. & Hutter, J. Gaussian basis sets for accurate calculations on molecular systems in gas and condensed phases. J. Chem. Phys. 127, 114105 (2007).

107.

Goedecker, S., Teter, M. & Hutter, J. Separable dual-space Gaussian pseudopotentials. Phys. Rev. B 54, 1703–1710 (1996).

108.

Krack, M. Pseudopotentials for H to Kr optimized for gradient-corrected exchange-correlation functionals. Theor. Chem. Acc. 114, 145–152 (2005).

109.

VandeVondele, J. & Hutter, J. An efficient orbital transformation method for electronic structure calculations. J. Chem. Phys. 118, 4365 (2003).

110.

Bahn, S. R. & Jacobsen, K. W. An object-oriented scripting interface to a legacy electronic structure code. Comput. Sci. Eng. 4, 56–66 (2002).

111.

Giannozzi, P. et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys.: Condens. Matter 21, 395502 (2009).

112.

Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).

113.

Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).

114.

Troullier, N. & Martins, J. L. Efficient pseudopotentials for plane-wave calculations. Phys. Rev. B 43, 1993–2006 (1991).

115.

Neese, F. The ORCA program system. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 73–78 (2012).

116.

Dunning, T. H. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 90, 1007–1023 (1989).

## Acknowledgements

The authors thank Dr. Felix Brockherde and Joseph Cendagorta for helpful discussions, Dr. Li Li for the initial water dataset, and Dr. Huziel Sauceda and Dr. Stefan Chmiela for the optimized geometries of ethanol and for helpful discussions. Calculations were run on NYU IT High Performance Computing resources and at TUB. Work at NYU was supported by the U.S. Army Research Office under contract/grant number W911NF-13-1-0387 (L.V.-M. and M.E.T.). K.-R.M. was supported in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea Government (No. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University), by the German Ministry for Education and Research (BMBF) under Grants 01IS14013A-E, 01GQ1115, 01GQ0850, 01IS18025A, 031L0207D and 01IS18037A, and by the German Research Foundation (DFG) under Grant Math+, EXC 2046/1, Project ID 390685689. K.B. was supported by NSF grant CHE 1856165. This publication reflects only the authors' views. Funding agencies are not liable for any use that may be made of the information contained herein.

## Author information

### Contributions

L.V.-M. initiated the project, and M.B. and L.V.-M. ran all simulations. M.E.T., K.-R.M., and K.B. conceived the theory and co-supervised the project. All authors guided the project design, contributed to data analysis, and co-wrote the manuscript.

### Corresponding authors

Correspondence to Mark E. Tuckerman or Klaus-Robert Müller or Kieron Burke.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information: Nature Communications thanks Reinhard Maurer and the other anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Bogojeski, M., Vogt-Maranto, L., Tuckerman, M.E. et al. Quantum chemical accuracy from density functional approximations via machine learning. Nat Commun 11, 5223 (2020). https://doi.org/10.1038/s41467-020-19093-1
