Abstract
Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal ⋅ mol^{−1} with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal ⋅ mol^{−1}) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C_{6}H_{4}(OH)_{2}) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.
Introduction
The recent rise in the popularity of machine-learning (ML) methods has engendered many advances in the molecular sciences. These include the prediction of properties of atomistic systems across chemical space^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}, the construction of accurate force fields^{27,28,29,30,31,32,33,34,35,36,37,38,39} for ML-based molecular dynamics (MD) simulations, the representation of the (high-dimensional) statistical distribution of molecular conformers^{40,41,42}, and the prediction of the kinetics of structural transformation of materials^{43}. In many applications, a key task for an ML model is to predict the outcome of an electronic structure calculation without having to perform the calculation explicitly. This could be done at any desired level of electronic structure theory, from density functional theory (DFT) to the current gold standard, namely, coupled-cluster with single, double, and perturbative triple excitations (CCSD(T)). While the latter is generally preferable, its putative N^{7} computational scaling with system size makes it prohibitive for large molecular systems, or even for small systems if many energy and energy gradient calculations are needed, as would be the case in MD simulations or geometry optimizations. Therefore, Kohn-Sham (KS) DFT, with its putative N^{3} scaling, is often employed as an acceptable compromise between computational efficiency and accuracy. Unfortunately, the wavefunction and DFT formalisms are so distinct that there is no known way to combine the accuracy of the former with the speed of the latter. Thus, an important advance could be achieved if the power of ML could be leveraged to allow large numbers of CCSD(T) calculations to be performed at a cost equal to or even less than that of the same number of DFT calculations for a given system.
An ML scheme capable of realizing the aforementioned objective should satisfy several important criteria. First, the ML framework should be able to deliver basic molecular properties, such as total energies, geometries, and, in principle, electronic properties, all at CCSD(T) accuracy. Beyond this, however, it should also allow geometry optimization and long timescale MD to be performed with energies and forces at the CCSD(T) accuracy level. The construction of such an ML approach requires a molecular descriptor flexible enough to accomplish both types of tasks, and for this, it seems natural to employ the electron density. It is worth noting that as molecular descriptors have evolved from objects such as SMILES strings^{44,45}, molecular graphs^{46,47}, and molecular graphs with feature vectors^{24,25,48}, there has been a progression toward descriptors that attempt to capture key features of the electron density in a simple manner^{15,48,49,50,51}. Admittedly, employing the full electron density carries with it a considerable computational cost; nevertheless, it is useful to develop such frameworks, considering that more optimal algorithms could follow. Previously, we had shown that the electron density could be used in a self-consistent manner to train a system-specific density functional (akin to a system-specific force field^{52}) using a mapping from the external potential to the electron density and a second map of the density to the total energy^{53}. Rather than delivering a solution to the KS equations, the first map (denoted the ML-HK map) bypasses the KS equations in a manner that is akin to solving the original Hohenberg-Kohn functional differential equation^{54}. The second map from density to energy predicts the result of plugging that solution back into the Hohenberg-Kohn functional to obtain the ground-state energy.
While other machine-learning methods for the prediction of electron densities or density functionals have appeared recently^{50,51,55,56,57,58,59,60,61,62}, the ML-HK map facilitates the use of both machine-learned densities, from which electronic properties could be computed, and density functionals for obtaining total energies and gradients for geometry optimization and MD simulation.
In this paper, we describe an approach for generating an ML framework that satisfies the criteria outlined above. The ML model employed in this work is kernel ridge regression (KRR), the basic principles of which in the construction of density functionals have been developed over several years^{63,64,65,66,67,68,69}. In order to advance our ML framework^{53} to the prediction of coupled-cluster (CC) energies, as opposed to DFT energies, one need only recognize that the basic ML construction procedure is independent of the source of inputs. Therefore, one could readily imagine training the aforementioned maps on a set of CC densities and energies. In practice, however, few quantum chemistry packages yield the CC electron density, as it is not needed to obtain a CC energy. Therefore, in order to avoid the need to compute a CC electron density, we show that the density-energy map can be constructed by considering the CC energy as a functional of a DFT density obtained within a standard approximation such as PBE, i.e., we regress the CC energy from the PBE density. The density is used as the aforementioned descriptor for a given potential and can also serve as an input for learning other properties. The ML algorithm then learns to predict the CC energy as a functional of the approximate ML-predicted (descriptor) density. Importantly, we find that it is roughly as easy to train a model that returns the CC energy from the DFT density as it is to train for the self-consistent DFT energy itself. We additionally find that the use of a crudely approximated density results in a reduction in accuracy (even for DFT energies), showing the importance of using accurate densities. Drawing on existing ML experience^{70}, we further show that it is possible to learn the difference between a DFT and a CC energy as a functional of the input DFT densities.
Importantly, this can be done with greater efficiency than learning either DFT or CC energies separately. Referring to this approach as Δ-DFT, we show that the error in the training curve for Δ-DFT drops far faster than that for learning either the DFT or the CC energies themselves, indicating that the error in DFT is much more amenable to learning than the DFT energy itself. Moreover, by exploiting molecular point group symmetries, we drastically reduce the amount of training data needed to achieve quantum chemical accuracy (~1 kcal mol^{−1}), allowing us to extract CC energies from standard DFT calculations, with essentially no additional cost (beyond the initial generation of training data). That is, we create a system-specific ML model capable of yielding CCSD(T) accuracy at the cost of a standard DFT calculation. A single water molecule (see Fig. 1a) is used as the first benchmark of the new scheme. We use the same PBE density as a functional of the potential as in ref. ^{53} but now with various ML maps of the energy as a functional of the density. While the DFT calculation loses accuracy rapidly when the molecule is either compressed or extended, Δ-DFT corrects these errors. We then consider the examples of ethanol, benzene, and resorcinol, all of which contain greater internal flexibility. We discuss the issue of sampling input geometries using finite-temperature MD simulations, arguing that care must be taken when these configurations do not reflect the target CCSD(T) energy surface (see Fig. 1b as an illustration for water). Resorcinol is further used as an example of using the ML scheme to generate an ab initio MD trajectory on the predicted underlying CCSD(T) energy surface. Obtaining such a trajectory typically requires hundreds to thousands or tens of thousands of energy and force calculations, which would be prohibitive using explicit CCSD(T) calculations but is routine using the ML model.
This example reveals the importance of having CCSD(T) accuracy to describe a conformational change for which DFT produces quantitatively incorrect barriers. Finally, we take a step toward creating a more general model capable of predicting CCSD(T) energies of a small set of similar, but not identical, molecules. Resorcinol, phenol, and benzene are used to create an ML functional capable of describing multiple molecules. Here, molecular point group symmetries are exploited to expand the training dataset, thereby reducing the number of explicit CCSD(T) calculations needed to obtain chemical accuracy.
Results
Theory
A central difficulty in quantum chemistry is the fundamental incompatibility of the formalisms of DFT and wavefunction-based ab initio methods such as CCSD(T). Both aim to deliver the ground-state energy of a molecule as a function of its nuclear coordinates. Ab initio methods directly solve the electronic Schrödinger equation, albeit in an approximate yet systematic and controllable fashion. KS-DFT, by contrast, buries all the quantum complexity into an unknown functional of the density, i.e., the exchange-correlation (XC) energy, which must be approximated^{71,72}. A myriad of different forms for such KS-DFT approximations exist. Unfortunately, there is currently no practical route for converting an approximation in one formalism to an approximation in the other, as there is no simple mathematical route to coupling the two formalisms.
In this work, we leverage ML to bypass this difficulty, by correcting DFT energies to CCSD(T) energies. Routine DFT calculations use some approximate XC functional and solve the Kohn-Sham equations self-consistently. However, an alternative approach has long been considered (e.g., ref. ^{73}), in which the exact energy, E, is found by correcting an approximate self-consistent DFT calculation:
\[E={E}^{{\mathrm{DFT}}}[{n}^{{\mathrm{DFT}}}]+\Delta E[{n}^{{\mathrm{DFT}}}]\qquad (1)\]
where DFT denotes the approximate DFT calculation, and ΔE, evaluated on the approximate density, is defined, formally, such that E is the exact energy. This is not the functional of standard KS-DFT, but it still yields exact energies and can be a more practical alternative in which one solves the KS equations within that approximation but corrects the final energy by ΔE. If n^{DFT} is a highly accurate approximation, then ΔE should not differ much from the intrinsic error of the DFT XC approximation. Recently, several classes of DFT calculations have been improved by using densities that are not self-consistent^{74,75}. Thus, regression of DFT densities to find CC energies can be considered a system-specific construction of ΔE[n^{DFT}] of the same kind as the system-specific construction of the HK map^{53}. This differs from a general-purpose, explicit XC functional approximation in that (i) it might only be accurate for the systems for which it has been trained, (ii) it has no simple closed form, and (iii) its functional minimum yields only an approximate density. However, using the results from the Supplementary Discussion 2.1, one can, in principle, construct the exact density from a sequence of such calculations. To avoid confusion, we note that Δ-DFT has nothing in common with, e.g., ΔSCF, a useful alternative to TDDFT for extracting excited state energies in DFT^{76}.
Coupled cluster accuracy from ML DFT
Details of our approach are found in the “Methods” section. In brief, the approach constitutes a realization of the part of the Hohenberg-Kohn theorem that establishes a one-to-one mapping between external potentials v(r) and ground-state densities n(r) for a specified number of electrons. This map is expressed through the functional relationship n[v](r). In practice, we expand the density in an orthonormal basis ϕ_{l}(r) as \({n}_{{\mathrm{ML}}}[v]=\mathop{\sum }\nolimits_{l = 1}^{L}{u}_{\mathrm{ML}}^{(l)}[v]{\phi }_{l}({\bf{r}})\) and learn the set of density expansion coefficients \(\{{{\bf{u}}}_{{\mathrm{ML}}}[v]\}\)^{53} in order to construct a learned DFT density \({n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}({\bf{r}})\). As previously noted, KRR is employed here as the ML model. A second KRR model is then used to predict energies from a higher level of theory, in this case CC energies, as a sum over the M training potentials v_{i}:
\[{E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]=\mathop{\sum }\limits_{i = 1}^{M}{\alpha }_{i}\,k({{\bf{u}}}_{{\mathrm{ML}}}[v],{{\bf{u}}}_{{\mathrm{ML}}}[{v}_{i}])\qquad (2)\]
where k(u_{ML}[v], u_{ML}[v_{i}]) is the kernel, and {α} are the coefficients learned in the second KRR model. This allows us to create \({E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\), the chemically accurate CC energy, as a functional of the learned DFT density. (This corresponds to learning E^{DFT} + ΔE in Eq. (1)).
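The density-to-energy map of Eq. (2) can be illustrated with a minimal, dependency-free sketch of kernel ridge regression. The Gaussian kernel, its width `sigma`, the regularizer `lam`, and all function names below are assumptions chosen for illustration; the text above does not specify them, and the actual settings are described in the “Methods” section.

```python
import math

def gaussian_kernel(u, w, sigma):
    """Gaussian kernel between two coefficient vectors (assumed kernel form)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, w))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_krr(U, E, sigma, lam):
    """Return the coefficients alpha solving (K + lam * I) alpha = E."""
    n = len(U)
    K = [[gaussian_kernel(U[i], U[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, E)

def predict_krr(u, U, alpha, sigma):
    """Evaluate E_ML[u] = sum_i alpha_i * k(u, u_i), as in Eq. (2)."""
    return sum(a * gaussian_kernel(u, ui, sigma) for a, ui in zip(alpha, U))

# Toy data: one-dimensional "density coefficients" and made-up energies.
U_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
E_train = [0.0, 1.0, 4.0, 9.0, 16.0]
alpha = fit_krr(U_train, E_train, sigma=1.0, lam=1e-8)
```

Here each row of `U_train` plays the role of a coefficient vector u_{ML}[v_{i}], and the energy labels could equally be DFT or CC energies; only the labels change, not the evaluation cost of the trained model.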
In order to demonstrate the methodology behind the map in Eq. (1), we begin by describing the process of learning the CC energy directly via Eq. (2) based on a set of 102 random water geometries (Fig. 1b and Supplementary Fig. 1). Note that the mean absolute error (MAE) of DFT energies relative to the CC energies (relative to the lowest energy conformer in the training set) is 1.86 kcal mol^{−1}, with maximum errors of more than 6 kcal mol^{−1}. The performance of the \({E}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\) and \({E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\) models was evaluated for training subsets containing 10, 15, 20, 30, 40 or 50 geometries, while the test set consisted of 52 geometries (Fig. 1c). Due to the small size of the dataset, we used cross-validation to obtain more stable estimates for the prediction accuracy of the models^{69}. Details of the evaluation procedure are provided in the “Methods” section. As expected, the accuracy of each model improves with increasing training set size, but the benefit of predicting CC energies compared to DFT energies is immediately obvious. For this dataset, the MAE of E^{DFT} relative to E^{CC} (used here as the ground truth) is reached by \({E}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\) with 40 training geometries. Quantum chemical accuracy of 1 kcal mol^{−1} is obtained with slightly fewer (30) samples for the energy functional \({E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\), which reaches an improved MAE of 0.24 kcal mol^{−1} with 50 training samples. Once constructed, the time to evaluate E_{ML}[n] is the same regardless of the energy on which it is trained (for a fixed amount of training data). There is a clear benefit of training the model on the more accurate CC energies as long as a good performance can be achieved with a small number of samples from the more computationally expensive method.
Standard semilocal density functionals such as PBE typically yield highly accurate densities near equilibrium, and errors in atomization energies are dominated by errors in the energy rather than the self-consistent density^{77}. However, far from equilibrium, these self-consistent densities can differ substantially from the exact density. In such density-sensitive cases, the energy error can be substantially increased by the error in the self-consistent density, leading to many failures of standard functionals^{78}. The need to find accurate densities is bypassed by the ML-CC energy map, as it learns accurate energies even as a functional of an inaccurate density, as in Eq. (1).
Reducing the CC cost with Δ-DFT
Inspired by the concept of delta learning^{79}, we also propose a machine-learning framework that is able to leverage densities and energies from lower-level theories (e.g., DFT) to predict CC-level energies. This is achieved by correcting DFT energies using delta learning, which we denote as Δ-DFT. Instead of predicting the CC energies directly using our machine-learning model, we can train a new map \(\Delta {E}_{{\mathrm{ML}}}^{{\mathrm{CC-DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\) that yields the error in a DFT calculation (relative to CC) for each geometry (i.e., the second term in Eq. (1)). We define the corresponding total energy as
\[{E}_{\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]={E}^{{\mathrm{DFT}}}[{n}^{{\mathrm{DFT}}}]+\Delta {E}_{{\mathrm{ML}}}^{{\mathrm{CC-DFT}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\qquad (3)\]
Correcting the DFT energies in this way leads to a dramatic improvement in the model performance, as seen in Fig. 1c. Remarkably, with only 10 training samples, the MAE of this \({E}_{\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\) model is already lower than the error of \({E}_{{\mathrm{ML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\) trained with 50 samples; using 50 training samples reduces the MAE of the Δ-DFT model to only 0.013 kcal mol^{−1}. The Δ-DFT correction is easier to learn than the energies themselves, as illustrated in Fig. 1d for symmetric water geometries that were not included in the previous dataset. Although the optimized geometry differs slightly between DFT and CC, the Δ-DFT approach provides a smooth map between the two types of electronic structure calculations as a functional of the density. For the most extreme geometries, the model errors for Δ-DFT are smaller than for the direct models (see Supplementary Fig. 3) and depend differently on the geometry, indicating that there is information contained in the density beyond that of the external nuclear potential. We note in passing that Δ-DFT links a particular DFT calculation to a particular CC level of theory, rendering comparisons between models trained on different calculations invalid (see Supplementary Discussion 2.2). The comparison between the Δ-DFT and total energy ML models is further explored with larger molecules in the subsequent sections.
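The delta-learning construction described above (train on the CC minus DFT difference, then add the learned correction to a fresh DFT energy) can be sketched in a few lines. This is a toy under stated assumptions: a scalar `descriptor` stands in for the density, a straight-line least-squares fit stands in for the second KRR model, and the energies are invented numbers rather than results from the paper.

```python
def fit_line(xs, ys):
    """Least-squares straight line; a stand-in for the KRR correction model."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def train_delta_model(descriptors, e_dft, e_cc):
    """Fit the correction Delta E = E_CC - E_DFT instead of E_CC itself."""
    deltas = [cc - dft for cc, dft in zip(e_cc, e_dft)]
    return fit_line(descriptors, deltas)

def predict_cc(descriptor, e_dft, model):
    """Corrected energy: DFT energy plus the learned correction."""
    slope, intercept = model
    return e_dft + slope * descriptor + intercept

# Hypothetical toy surfaces: a strongly curved "DFT" energy and a "CC"
# energy that differs from it by a small, smooth correction.
def toy_dft(x):
    return 10.0 * x * x

def toy_cc(x):
    return toy_dft(x) + 0.5 * x - 0.2

xs = [-1.0, 0.0, 1.0]
model = train_delta_model(xs, [toy_dft(x) for x in xs], [toy_cc(x) for x in xs])
corrected = predict_cc(0.3, toy_dft(0.3), model)
```

The design point is that the model only has to represent the small, smooth difference between the two surfaces, while the large, strongly varying part of the energy is supplied by the DFT calculation itself. The price is one DFT calculation per prediction.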
Δ-DFT with molecular symmetries
The next molecule chosen to evaluate our ML model is ethanol, using geometries and energies from the MD17 dataset^{32,33}. This molecule has two types of geometric minima, for which the alcohol OH is in either an anti or a doubly degenerate gauche position; the freely rotating CH_{3} group introduces additional variability into these possible geometries. Supplementary Fig. 4 shows the atomic distributions of the ethanol dataset after alignment based on heavy atom positions. The fact that ethanol possesses internal flexibility and a larger number of degrees of freedom than water naturally renders the learning problem more difficult. Hence, we expect that a greater number of training samples is needed to achieve chemical accuracy for the range of thermally accessible geometries. The dataset contains 1000 training and 1000 test samples with both DFT and CC energies (see Supplementary Fig. 5). The ML-HK map automatically incorporates equivalence for each chemical element, but we can also exploit the mirror symmetry of the molecule by reflecting H atoms through the plane defined by the three heavy atoms, effectively doubling the size of the training set, as outlined in the “Methods” section. To differentiate the models trained on datasets augmented by these symmetries, we add an s in front of the machine-learning model (e.g., sML). Table 1 shows the prediction accuracies of the various sML models for ethanol compared to some other state-of-the-art ML methods for the same dataset. The prediction error for DFT and CC energies is roughly equal to that of other ML models trained only on energies.
It is also important to note that using the \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) functional to correct low-cost DFT energies achieves an MAE for CC energies comparable to those of the most accurate force-based models (without incurring the cost of evaluating CC forces for each training point). We note that Δ-learning does not improve the energy prediction over a direct force-based sGDML model for CC energies (see Supplementary Table 1). The \({E}_{\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{ML}}}^{{\mathrm{DFT}}}]\) functional based only on the original 1000 training geometries has an MAE of 0.15 kcal mol^{−1} (see Supplementary Table 2); hence, using the ethanol symmetry reduces the MAE of the ML model by half while requiring the same number of CC calculations.
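The mirror-symmetry augmentation used for ethanol can be sketched as a reflection of every geometry through the plane spanned by its three heavy atoms, with the energy label copied unchanged. The function names and the toy four-atom geometry are hypothetical; the actual procedure is the one outlined in the “Methods” section. Reflecting all atoms is equivalent to reflecting only the H atoms, because the heavy atoms lie in the mirror plane.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def reflect_through_plane(points, p0, p1, p2):
    """Reflect points through the plane containing p0, p1 and p2."""
    n = cross(sub(p1, p0), sub(p2, p0))  # plane normal (not normalized)
    norm2 = dot(n, n)
    mirrored = []
    for p in points:
        d = dot(sub(p, p0), n) / norm2   # signed offset along the normal
        mirrored.append(tuple(pi - 2.0 * d * ni for pi, ni in zip(p, n)))
    return mirrored

def augment_with_mirror(geometries, energies, heavy_idx):
    """Append the mirror image of each geometry with the same energy."""
    aug_geoms = list(geometries)
    aug_energies = list(energies)
    for geom, energy in zip(geometries, energies):
        p0, p1, p2 = (geom[i] for i in heavy_idx)
        aug_geoms.append(reflect_through_plane(geom, p0, p1, p2))
        aug_energies.append(energy)
    return aug_geoms, aug_energies

# Toy geometry: three "heavy atoms" in the z = 0 plane and one "H" above it.
toy_geom = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.3, 0.4, 0.5)]
geoms, labels = augment_with_mirror([toy_geom], [-1.0], heavy_idx=(0, 1, 2))
```

Because the electronic energy is invariant under reflection, the mirrored copies add training points without any additional CC calculations.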
Molecule optimization using ML functionals
Neither the training nor test configurations from the MD17 dataset^{32,33} include the minimum energy conformers of ethanol. Using the ML models, we predicted the energy of the anti and gauche conformers optimized using MP2/6-31G^{*} and the electronic structure methods used to generate the energies for each model. Note that MP2 and PBE have gauche as the global minimum, but the CCSD(T) global minimum is anti. Although all training geometries have energies more than 4.5 kcal mol^{−1} higher than the global minimum, the ML models are able to predict the energies of the minima with errors below chemical accuracy (see Table 2).
In addition, the machine-learned energy function is sufficiently smooth to optimize ethanol using energy gradients computed from the ML model itself. Calculations for each conformer start from geometries optimized using MP2/6-31G^{*}, which are slightly different from both DFT- and CC-optimized geometries. Figure 2b shows that despite the sparsity of training data near the minimum energy configurations, the ML models trained with different energies can differentiate between the DFT and CC minima with remarkable fidelity.
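Optimizing a geometry on a learned energy surface only requires that the model return an energy for any input coordinates. A minimal sketch follows, assuming central finite-difference gradients and plain gradient descent; the text does not state how the gradients or the optimizer were implemented, and `toy_energy` is a made-up quadratic surface standing in for a trained ML energy model.

```python
def numerical_gradient(energy, x, h=1e-5):
    """Central finite-difference gradient of a scalar energy function."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((energy(xp) - energy(xm)) / (2.0 * h))
    return grad

def minimize(energy, x0, step=0.1, tol=1e-8, max_iter=10000):
    """Gradient descent until the largest gradient component is below tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = numerical_gradient(energy, x)
        if max(abs(gi) for gi in g) < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def toy_energy(x):
    """Hypothetical smooth surface with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

x_min = minimize(toy_energy, [0.0, 0.0])
```

Swapping `toy_energy` for models trained on DFT or on CC labels would, in the same way, lead the optimizer to the respective minimum of each learned surface.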
ML model sensitivity to density inputs
Our results show that we can use ML models to map learned electron densities to several types of energy targets. This naturally raises the question of how sensitive our results are to the input density. If one does not need accurate self-consistent densities, why bother with the density at all? Why not, instead, simply learn the energy directly from the nuclear potential? To answer this, consider benzene and the 1500 geometries in the MD17 dataset^{34} (see Supplementary Figs. 7, 8). Because the point group of benzene (D_{6h}) contains 24 symmetry operations, applying our symmetrization approach to 1000 CC training points produces an effective dataset size of 24,000 geometries.
We first investigate the difference between E_{sML} models trained using the self-consistent DFT densities (n^{DFT}) and those created by the ML-HK density map (\({n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}\)). Just as for ethanol, these models have accuracies comparable to other approaches that require CC forces for training (see Supplementary Table 3). Table 3 shows that for any of our energy functionals (\({E}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}\), \({E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}\), or \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\)), model performance differs negligibly when trained using these two electron density representations because the density-driven errors of the ML-HK maps are small^{53}. Relevant dimensionality estimation (RDE)^{80} quantifies the effective complexity that the ML models require for predicting, e.g., a particular set of energies given a set of densities (see Supplementary Tables 4, 5, 6). The direct E_{sML} models for benzene using the ground-state densities are all of similar complexity, with a comparable number of relevant data dimensions required to obtain similar accuracy. \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\) achieves higher accuracy with fewer relevant data dimensions than either direct model because the energy difference landscape is smoother and easier to learn.
Next, we consider model performance when the molecular electron density is approximated by a superposition of atomic densities (SAD), which are conceptually similar to the pseudo-densities used in other ML models^{9,15} and effectively translate the nuclear potential into electron densities, albeit without a proper description of the chemical bonds. While such densities (denoted as n^{SAD}) cost little to generate, Table 3 shows that ML models trained on these inputs have errors that are at least twice those of models using more accurate densities. The RDE analysis shows that models based on n^{SAD} have comparable dimensionality for direct energy models but significantly lower signal-to-noise ratios (defined in SI for RDE analysis, Supplementary Eq. 2), thus rendering the energy models less accurate. Nonetheless, given the ever-present trade-off between accuracy and computational cost, SAD densities may be useful to avoid self-consistent optimization of the electron density for each geometry. In the case of SAD inputs, energy labels for the ML models would reflect the DFT functional evaluated on the approximate density (e.g., E^{SAD}). For benzene, results are poorer for both the direct ML energy model (\({E}_{{\mathrm{sML}}}^{{\mathrm{SAD}}}\)) and Δ-DFT (\({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{SAD}}}^{{\mathrm{CC}}}\)), although they are still within chemical accuracy. We understand the larger errors to be due to the increased variance of E^{SAD} labels (seven times that of the self-consistent dataset; see Supplementary Fig. 9) as well as their overall lower signal-to-noise ratio, as evidenced by the RDE analysis (see Supplementary Table 6).
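The idea of a superposition of atomic densities can be illustrated with a toy model in which each atom contributes a fixed spherical density centered on its nucleus, so that the total depends only on the nuclear positions. The normalized Gaussian shape, the exponents, and the charges below are hypothetical placeholders; real SAD densities are assembled from densities of isolated atoms rather than from single Gaussians.

```python
import math

def sad_density(r, atoms):
    """Sum of spherical atomic densities evaluated at the point r.

    Each atom is (charge, exponent, center); every atomic term is a
    normalized Gaussian that integrates to `charge` over all space.
    """
    total = 0.0
    for charge, exponent, center in atoms:
        d2 = sum((ri - ci) ** 2 for ri, ci in zip(r, center))
        norm = (exponent / math.pi) ** 1.5
        total += charge * norm * math.exp(-exponent * d2)
    return total

# Two hypothetical atoms placed on the x axis.
atom_a = (1.0, math.pi, (0.0, 0.0, 0.0))
atom_b = (8.0, 2.0, (1.0, 0.0, 0.0))
point = (0.5, 0.0, 0.0)
n_total = sad_density(point, [atom_a, atom_b])
```

No self-consistent calculation enters this construction, which is why such densities are cheap, and also why they carry no description of chemical bonding.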
The results presented thus far demonstrate that reasonably accurate ML models can be created using approximate densities that are inconsistent with the energy targets. Such ML models can be generated for applications where speed is more important than accuracy, for example, in the first few cycles of an active learning scheme^{17}, where a cheap approximate density provides sufficient information to train models that ultimately would return CC energies with chemical accuracy. Finally, using accurate self-consistent densities as input significantly improves model performance for the same training and test geometries. These findings provide clear evidence that the electron density contains highly useful machine-learnable information about the molecular system beyond that contained in atomic positions alone.
MD using CC energies
The final molecular example of 1,3-benzenediol (resorcinol) illustrates the utility of learning multiple ML functionals for the same system. Combining the \({E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) with the more expensive and accurate \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) method, we demonstrate how to run self-consistent MD simulations that can be used to explore the configurational phase space based on CC energies.
Resorcinol has two rotatable OH groups, two molecular symmetry operations, and more degrees of freedom than water, ethanol, or benzene, making this a more stringent test of the ML functionals. The initial datasets are generated from 1 ns classical MD simulations at 500 K and 300 K for the training and test sets, respectively (details are found in the “Methods” section). For the density representation, the 1000 conformer training set is augmented with the two symmetries, resulting in an effective training set size of 4000 samples (see Supplementary Fig. 10). The molecular geometries in the MD-generated training set have energies between 7 and 50 kcal mol^{−1} above the equilibrium conformer (as shown in Supplementary Fig. 11); the four local minima are also included in the dataset using geometries from MP2/6-31G^{*} optimizations, leading to 1004 unique training geometries and a total effective training set size of 4004 samples. These local minima, which differ in the orientation of the two alcohol groups, are separated by a rotational barrier of ~4 kcal mol^{−1} (see Supplementary Fig. 12). The maximum relative energy errors between the DFT and the (ground truth) CC energies are 6.1 and 6.7 kcal mol^{−1}, respectively, for geometries included in the training and test sets.
As with the other examples, ML model performance improves with increasing training set size (see Supplementary Fig. 13). When trained on 1004 unique training geometries (4004 training points), the MAE of predicted energies is around 1.3 kcal mol^{−1} for both \({E}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) and \({E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\), and the error, when using \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\), is only 0.11 kcal mol^{−1}. The Δ-DFT accuracy is insensitive to the use of the ML-HK map for the density input, as shown in Supplementary Table 7, and is sufficient to run an MD simulation based on CC energies without the need for CC forces.
Although DFT energies may be sufficient for some molecules, the ability to use CC energies to determine the equilibrium geometries and thermal fluctuations is a promising advance. For resorcinol, the relative DFT energies can differ significantly from the CC energies, particularly near the OH rotational barrier that separates conformers (see Supplementary Fig. 12). Conformational changes are also rare events in the MD trajectories, making it crucial to describe the transitions accurately. For example, the exploration of the OH dihedral angles over a 10 ps MD trajectory from a DFT-based constant-temperature simulation at 350 K is shown in Supplementary Fig. 14. In this simulation, only one conformational change is observed, despite several excursions away from the local minima.
Using the \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}\) approach, we could easily correct energies after running a conventional DFT-MD simulation. However, as shown in Supplementary Fig. 15, for snapshots along a 1.5 ps constant-energy simulation starting from a point near a conformer change, the MAE of DFT energies compared to CC energies for each snapshot is 1.0 kcal mol^{−1}, with a maximum of just under 4.5 kcal mol^{−1}. Therefore, a more promising use of the ML functionals is to run MD simulations using the CC energy function directly. An example \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) trajectory starting from a random training point is shown in Supplementary Fig. 16, with an MAE of 0.2 kcal mol^{−1}.
Starting from a different point in the DFT-generated trajectory serves to illustrate the importance of generating MD trajectories directly on the CC energy surface. As seen in Fig. 3, for constant-energy simulations starting from the same initial condition, a DFT-based trajectory does not have sufficient kinetic energy to traverse the rotational barrier, while the conformer switch does occur for the CC-based trajectory. Astonishingly, the \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) trajectory has an MAE of only 0.18 kcal mol^{−1} relative to the true CC energies over a range of more than 15 kcal mol^{−1}.
Because the Δ-DFT method requires performing a DFT calculation at each step of the trajectory, its computational cost can be reduced by combining the ML models. The middle panel of Fig. 3b shows the CC trajectory using a reversible reference-system-based multi-time-step integrator^{81} to evaluate energies and forces primarily with the \({E}_{{\mathrm{sML}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) model as a reference and with periodic force corrections based on the more accurate \({E}_{{\mathrm{s}}\Delta \,\text{-}\,{\mathrm{DFT}}}^{{\mathrm{CC}}}[{n}_{{\mathrm{sML}}}^{{\mathrm{DFT}}}]\) every three steps (see Supplementary Note 1.4 and Supplementary Fig. 17 for more details). The resulting trajectory has an MAE of 3.8 kcal mol^{−1} relative to the true CC energies, with the largest errors in regions that are sparsely represented in the training set. This self-consistent exploration of the configurational space with the combined ML models provides an opportunity to improve the sampling in a cost-effective manner.
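The multi-time-step idea (a cheap reference force at every step, with a correction from a more accurate force every few steps) can be sketched for a single particle in one dimension. This follows the standard reversible reference-system splitting under stated assumptions: the two harmonic forces are hypothetical stand-ins for the direct ML model and the more accurate corrected model, and the step sizes are arbitrary. It is not the implementation described in Supplementary Note 1.4.

```python
def respa_step(x, v, mass, dt, n_inner, f_ref, f_acc):
    """One outer step of length n_inner * dt.

    The cheap reference force drives n_inner velocity-Verlet steps; the
    difference (accurate - reference) is applied as half-kicks before
    and after them, so the accurate force is needed once per outer step.
    """
    outer = n_inner * dt
    v += 0.5 * outer * (f_acc(x) - f_ref(x)) / mass
    for _ in range(n_inner):
        v += 0.5 * dt * f_ref(x) / mass
        x += dt * v
        v += 0.5 * dt * f_ref(x) / mass
    v += 0.5 * outer * (f_acc(x) - f_ref(x)) / mass
    return x, v

def f_accurate(x):
    """Hypothetical accurate force: harmonic oscillator with k = 1."""
    return -x

def f_reference(x):
    """Hypothetical cheap force, deliberately off by 10%."""
    return -0.9 * x

def run(x, v, n_outer):
    for _ in range(n_outer):
        x, v = respa_step(x, v, 1.0, 0.01, 3, f_reference, f_accurate)
    return x, v

x_end, v_end = run(1.0, 0.0, 200)
energy_end = 0.5 * v_end ** 2 + 0.5 * x_end ** 2  # on the accurate surface
```

With `n_inner = 3`, the accurate force is evaluated on every third step, mirroring the correction interval quoted above. The scheme is time-reversible, so integrating forward, flipping the velocity, and integrating again returns to the starting point up to rounding.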
Combining densities for improved sampling
The electron density provides some advantages as a descriptor of a chemical system over inputs that rely solely on local atomic environments or connectivity^{11,12,82}. For a given periodic cell and number of basis functions, the same density input structure is able to describe systems with different numbers, types, and orders of atoms. In contrast, models that rely on an atomistic decomposition of the energy must have representations for the environment of each separate element (for example, see refs. ^{6,26}). To improve the sampling represented in the training set for resorcinol, we can leverage overlap with configurational spaces sampled by similar, yet smaller and less costly, molecules. For example, adding data for phenol can provide better sampling of the rotation of an OH group, while the dynamics of benzene contains extensive sampling of C–C bonds.
To demonstrate this feature of density-based ML models, we use 1001 geometries for each of these two molecules as input configurations (see Supplementary Figs. 8, 18), along with the 1004 resorcinol configurations. We trained a set of density-to-energy maps, combining the symmetrized datasets pairwise and as a complete set, and then used the resorcinol test set to evaluate the performance of each model. In each case, the density-to-energy map was learned by combining the densities of the different molecules into a single dataset. The models using combinations of true or independently learned densities, displayed in Tables 4 and 5 and Supplementary Tables 8 and 9, show significant improvements in performance, with the prediction error being reduced by 30–60%. The results for models trained on DFT energies are similar to those for CC energies and can be found in Supplementary Table 10.
In addition, we can analogously train an ML-HK map by combining the artificial potentials of the different molecules into one dataset in order to produce a combined map (\({n}_{{\mathrm{sML}}{\mathrm{c}}}^{{\mathrm{DFT}}}\)). Using the combination of symmetrized phenol and resorcinol data to train the ML-HK map improves the performance of the direct ML energy models, although the Δ-DFT approach is again less sensitive to the density representation. We note that, unlike the models with independently learned densities, simply adding more training data by including benzene in the ML-HK map does not significantly change the results. Molecular similarity clearly affects the combination of ML-HK maps (see Supplementary Table 9 for resorcinol and benzene), but the ML density functionals are less sensitive and show improvement for all molecular combinations. We view this as a stepping stone toward learning a truly transferable model capable of predicting both densities and energies for a wide range of configurations and molecules.
Discussion
DFT is used in at least 30,000 scientific papers each year^{83}, and because of its low cost relative to wave-function-based ab initio methods, it can be used to compute energies of large molecules. Moreover, if geometry optimizations or MD simulations are desired, these would be beyond the reach of CCSD(T)-level calculations owing to the high computational cost. However, if CCSD(T) is affordable for a small number of carefully chosen configurations, then our methodology provides one possible bridge between the DFT and CCSD(T) levels of theory.
There are two distinct modes in which our results can be applied. With Δ-DFT, the cost of a gas-phase MD simulation is essentially that of the DFT-based MD with a given approximate functional, plus the cost of evaluating a few dozen CCSD(T) energies. While the optimal selection of training points is an open question in the field of machine learning, the Δ-DFT approach presented here may help to reduce the number of points necessary by learning an inherently smoother energy correction map. We stress that no forces are needed for training, making training set generation cheaper than for other methods with similar performance. Compared to other machine-learning models, Δ-DFT is well behaved and stable outside of the training set, since the zero-mean prior allows it to fall back on DFT results when far from the training set. The combination of Δ-DFT with the ML models for DFT energies of ref. ^{53} yields both the efficiency from bypassing the KS equations and the accuracy of CCSD(T). While this yields accurate energy functions within the training manifold, it occasionally yields inaccurate energy gradients or forces in an MD simulation, which can be corrected with the Δ-DFT forces using the appropriate integrators, as shown above.
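The fallback behaviour described above follows directly from the kernel form of the correction: with a Gaussian kernel and a zero-mean prior, the learned correction decays to zero away from the training data, leaving the base energy unchanged. The following is a minimal sketch on a one-dimensional toy coordinate. The functions `e_dft` and `e_cc` are hypothetical stand-ins for the cheap and the accurate energy, and the kernel width, regularization, and training points are illustrative assumptions rather than values from this work.

```python
import math

def gaussian_kernel(a, b, sigma):
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

# Hypothetical stand-ins for the base (cheap) and target (accurate) energies.
def e_dft(q):
    return 0.5 * q ** 2

def e_cc(q):
    return 0.5 * q ** 2 + 0.3 * math.sin(q)

SIGMA, LAMBDA = 0.5, 1e-8
train_q = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Only the difference between the two levels of theory is learned.
delta = [e_cc(q) - e_dft(q) for q in train_q]
gram = [[gaussian_kernel(a, b, SIGMA) + (LAMBDA if i == j else 0.0)
         for j, b in enumerate(train_q)]
        for i, a in enumerate(train_q)]
weights = solve(gram, delta)

def correction(q):
    return sum(w * gaussian_kernel(q, t, SIGMA) for w, t in zip(weights, train_q))

def e_delta_dft(q):
    # Base energy plus learned correction; the correction vanishes far from the data.
    return e_dft(q) + correction(q)
```

At a training point such as `q = 0.5` the corrected energy reproduces `e_cc`, whereas at `q = 10.0` the correction is numerically zero and the model returns the base energy.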
Clearly, our methodology can be applied to any gas-phase MD simulation or geometry optimization for which CCSD(T) calculations can be performed for a reasonable number of carefully selected configurations. Gas-phase MD, for example, has many applications. Earlier studies focused on comparing equilibrium properties from simulations excluding or including (via the Feynman path integral) nuclear quantum effects^{84,85,86,87,88}. More recent studies have focused on accurate spectroscopy and exploration of reactivity in small complexes and clusters^{89,90,91,92}. For geometry optimization at the CCSD(T) level or testing of DFT energetics against CCSD(T) energies, DFT geometries often must be used due to the prohibitive cost of finding an optimum CCSD(T) geometry. For molecules with many soft modes, finding the geometry can require hundreds of evaluations of energies and forces. Here, we have shown how relatively few energies are needed in Δ-DFT to produce an accurate energy functional, suggesting the possibility of using Δ-DFT to speed up such searches, producing CC geometries for molecules that were previously prohibitive. For larger molecules and/or molecules interacting with an environment, recent schemes that embed an ab initio core within a larger DFT calculation^{93} could also be treated by this method, especially if Δ-DFT need only be applied to the ab initio portion of the calculation. With suitable training sets, the ML approaches presented here have the potential to enable MD simulations for each of these systems.
Standard electronic structure methods require users to choose between accuracy and computational cost for each application. The success of our new ML approach connecting DFT densities to CC energies provides a new framework and strategy for linking formerly inconsistent calculations to reduce the penalty of this tradeoff. We have also demonstrated that the densities from a simpler molecule can be combined with those of a more complex system to improve the coverage of critical degrees of freedom. This promising result indicates that the smart use of combined densities from smaller molecular fragments could yield more accurate energies at even lower cost. Given that the CC-DFT energy difference landscape does not resemble the intrinsic energy landscape of either of the underlying electronic structure methods, we hope future work will further explore this dissimilarity as a function of training set size and composition for Δ-DFT models.
ML represents an entirely new approach to extracting energies from DFT calculations, avoiding some of the biases built into human-designed functionals, while also bypassing the need for strict self-consistency between the electron density and the resulting energy when an approximate result is sufficient. As shown here, ML provides a natural framework for incorporating results from more accurate electronic structure methods, thus bridging the gap between the CC and the DFT worlds while maintaining the versatility of DFT to describe electronic properties beyond energy and forces, such as the dipole moment, molecular polarizability, NMR chemical shifts, etc. Along with these insights, the long and successful history of KS-DFT suggests that using the density as a descriptor may thus prove to be an excellent strategy for improved simulations in the future.
Methods
Machine-learning model
In order to predict the total energy of a system given only the N^{a} atomic positions of a molecule, using the electron density as a key descriptor, we can use the ML-HK map introduced in ref. ^{53}; the entire procedure is illustrated in Fig. 1a. Initially, we characterize the Hamiltonian by the external nuclear potential v(r), which we approximate using a sum of Gaussians as^{94}
\[ v(\mathbf{r}) = \sum_{\alpha = 1}^{N^{a}} Z_{\alpha} \exp \left( -\frac{\| \mathbf{r} - \mathbf{R}_{\alpha} \|^{2}}{2\gamma^{2}} \right), \]
where r are the coordinates of a spatial grid, R_{α} is a vector containing the coordinates of atom α, and Z_{α} is the nuclear charge of atom α. Finally, γ is a width hyperparameter. This Gaussian potential is then evaluated on a 3D grid around the molecule and used as a descriptor for the ML-HK model. For each molecule, cross-validation is used to determine the width parameter, γ, and the grid spacing for discretization of the associated Gaussian potential.
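As an illustration of this descriptor, the sketch below evaluates a sum of Gaussians centered on the atoms on a regular 3D grid. It assumes the common form v(r) = Σ_α Z_α exp(−|r − R_α|² / (2γ²)); the normalization of the exponent, the helper names `gaussian_potential` and `cubic_grid`, the approximate water-like geometry, and the values of γ and the grid spacing are all assumptions chosen for illustration (in this work the latter two are set by cross-validation).

```python
import math

def gaussian_potential(grid_points, positions, charges, gamma):
    """Evaluate v(r) = sum_a Z_a * exp(-|r - R_a|^2 / (2 gamma^2)) at each grid point."""
    values = []
    for r in grid_points:
        total = 0.0
        for R, Z in zip(positions, charges):
            dist2 = sum((ri - Ri) ** 2 for ri, Ri in zip(r, R))
            total += Z * math.exp(-dist2 / (2.0 * gamma ** 2))
        values.append(total)
    return values

def cubic_grid(half_width, spacing):
    """Regular 3D grid covering the cube [-half_width, half_width]^3."""
    n = int(round(2 * half_width / spacing)) + 1
    axis = [-half_width + i * spacing for i in range(n)]
    return [(x, y, z) for x in axis for y in axis for z in axis]

# Approximate water-like geometry in angstrom (illustrative, not taken from the datasets).
positions = [(0.0, 0.0, 0.0), (0.76, 0.59, 0.0), (-0.76, 0.59, 0.0)]
charges = [8.0, 1.0, 1.0]
grid = cubic_grid(half_width=2.0, spacing=0.5)
descriptor = gaussian_potential(grid, positions, charges, gamma=0.4)
```

The flattened list `descriptor` has one entry per grid point and plays the role of the input vector to the kernel model; its length depends only on the grid, not on the number or type of atoms.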
After obtaining the Gaussian potential, we use a KRR model to learn the approximate DFT valence electron density. In order to simplify the learning problem and avoid representing the density on a 3D grid, we expand the density in an orthonormal basis set of L functions, and consequently learn the basis coefficients instead of the density grid points:
\[ n_{\mathrm{ML}}[v](\mathbf{r}) = \sum_{l = 1}^{L} u_{\mathrm{ML}}^{(l)}[v]\, \phi_{l}(\mathbf{r}), \]
where ϕ_{l}(r) is a basis function. In this work, a Fourier basis is employed. In the applications presented in this work, 12,500 basis functions (25 per dimension) proved sufficient for good performance. Use of KRR to learn these basis coefficients makes the problem more tractable for 3D densities, and more importantly, the orthogonality of the basis functions allows us to learn the individual coefficients independently from the M training potentials v_{i}:
\[ u_{\mathrm{ML}}^{(l)}[v] = \sum_{i = 1}^{M} \beta_{i}^{(l)}\, k(v, v_{i}), \]
where β^{(l)} are the KRR coefficients and k is a kernel functional.
The independent and direct prediction of the basis coefficients makes the ML-HK map more efficient and easier to scale to larger molecules, since the complexity only depends on the number of basis functions. In addition, we can use the predicted basis coefficients to reconstruct the continuous density at any point in space, making the predicted density independent of a fixed grid and enabling computations such as numerical integrals to be performed at an arbitrary accuracy.
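The independent learning of coefficients can be sketched as follows: a single regularized kernel matrix is built from the training descriptors, one weight vector is solved for per basis coefficient, and the predicted coefficients are then used to evaluate the density at an arbitrary point. This is a minimal sketch under stated assumptions; the toy one-component descriptors, the two-term coefficient vectors, the one-dimensional cosine basis (a stand-in for the 3D Fourier basis), and all function names are hypothetical.

```python
import math

def kernel(v_a, v_b, sigma):
    d2 = sum((a - b) ** 2 for a, b in zip(v_a, v_b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_coefficient_maps(train_v, train_u, sigma, lam):
    """Return beta[l][i]: one independent KRR weight vector per basis coefficient l."""
    m = len(train_v)
    gram = [[kernel(train_v[i], train_v[j], sigma) + (lam if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    n_basis = len(train_u[0])
    return [solve(gram, [u[l] for u in train_u]) for l in range(n_basis)]

def predict_coefficients(v, train_v, beta, sigma):
    k_vec = [kernel(v, v_i, sigma) for v_i in train_v]
    return [sum(b * k for b, k in zip(beta_l, k_vec)) for beta_l in beta]

def basis(l, x):
    """Orthonormal cosine basis on [0, 1]; a 1D stand-in for the 3D Fourier basis."""
    return 1.0 if l == 0 else math.sqrt(2.0) * math.cos(l * math.pi * x)

def density(coeffs, x):
    # The density can be evaluated at any x, independent of a fixed grid.
    return sum(u * basis(l, x) for l, u in enumerate(coeffs))

train_v = [[0.0], [1.0], [2.0]]                       # toy descriptors
train_u = [[1.0, 0.2], [1.0, 0.0], [1.0, -0.2]]       # toy basis coefficients
beta = fit_coefficient_maps(train_v, train_u, sigma=1.0, lam=1e-10)
u_new = predict_coefficients([0.5], train_v, beta, sigma=1.0)
n_at_point = density(u_new, 0.3)
```

Because every coefficient shares the same kernel matrix, the cost of adding basis functions is one additional linear solve each, which is the scaling argument made above.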
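As a final step, another KRR model is used to learn the total energy from the density basis coefficients of the M training examples:
\[ E_{\mathrm{ML}}[n] = \sum_{i = 1}^{M} \alpha_{i}\, k(u_{\mathrm{ML}}[v], u_{\mathrm{ML}}[v_{i}]), \]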
where k is the Gaussian kernel.
Exploiting point group symmetries
Training datasets for our machine-learning model can be easily enriched using point group symmetries. To extract the point group symmetries and the corresponding transformation matrices, we used the SYVA software package^{95}. Consequently, we can multiply the size of the training set by the number of point group symmetries without performing any additional quantum chemical calculations, simply by applying the point group transformations to our existing data.
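A minimal sketch of this kind of augmentation is given below. It applies each 3×3 transformation matrix to the atomic positions of every sample and reuses the energy label, since the energy is invariant under these operations. The four matrices are the C_{2v} operations for a water-like molecule lying in the xy-plane with its C_{2} axis along y; they are written out by hand here as an assumption for illustration, whereas in this work they are obtained from SYVA. The geometry, the energy value, and the function names are hypothetical.

```python
import math

def apply_operation(matrix, positions):
    """Apply a 3x3 point-group matrix to every atomic position."""
    return [tuple(sum(matrix[i][j] * p[j] for j in range(3)) for i in range(3))
            for p in positions]

def augment(dataset, operations):
    """Return one (positions, energy) pair per (sample, symmetry operation)."""
    augmented = []
    for positions, energy in dataset:
        for op in operations:
            augmented.append((apply_operation(op, positions), energy))
    return augmented

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# C2v operations (assumed orientation: molecule in the xy-plane, C2 axis along y).
C2V = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],     # identity
    [[-1, 0, 0], [0, 1, 0], [0, 0, -1]],   # C2 rotation about y
    [[1, 0, 0], [0, 1, 0], [0, 0, -1]],    # reflection in the xy-plane
    [[-1, 0, 0], [0, 1, 0], [0, 0, 1]],    # reflection in the yz-plane
]

# One distorted, water-like sample: (positions in angstrom, placeholder energy label).
dataset = [([(0.0, 0.05, 0.0), (0.80, 0.60, 0.02), (-0.74, 0.57, -0.01)], -76.3)]
augmented = augment(dataset, C2V)
```

Each distorted geometry yields four labeled samples at the cost of one reference calculation; interatomic distances, and hence the energy, are unchanged by the transformations.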
Cross-validation and hyperparameter optimization
Due to the small number of training and test samples, when evaluating the models on the water dataset, the data were shuffled 40 times, and for each shuffle a subset of 50 geometries was selected as the training set, with the remaining 52 being used as the out-of-sample test set. For the smaller training sets, a subset of the 50 training geometries was selected using k-means sampling.
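One way to realize k-means sampling is to cluster the candidate geometries in some feature space and keep the sample nearest to each cluster center. The sketch below is an illustrative implementation under stated assumptions: it uses plain Lloyd iterations with a deterministic farthest-point initialization, and operates on hypothetical two-dimensional feature vectors. The feature representation and initialization used in this work are not specified here, and the function names are made up for the example.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_select(points, k, n_iter=50):
    """Return indices of the points closest to k k-means centers (requires k <= len(points))."""
    # Deterministic farthest-point initialization (an assumption; any standard init works).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append([sum(m[d] for m in members) / len(members)
                                    for d in range(dim)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    # Pick the distinct data point closest to each center.
    selected = []
    for c in centers:
        candidates = [i for i in range(len(points)) if i not in selected]
        selected.append(min(candidates, key=lambda i: sq_dist(points[i], c)))
    return selected

# Hypothetical 2D features forming three well-separated groups of three samples.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
            (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]
chosen = kmeans_select(features, k=3)
```

For the toy data, one sample is chosen from each group, so a small training subset still covers the distinct regions of the sampled space.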
The hyperparameters for all models were tuned using five-fold cross-validation on the training set. For the ML-HK map from potentials to densities, the following three hyperparameters were optimized individually for each dataset: the width parameter of the Gaussian potential γ, the spacing of the grid on which the Gaussian potential is evaluated, and the width parameter σ of the Gaussian kernel k[v, v_{i}]. For each subsequent density-to-energy map \({E}_{{\mathrm{ML}}}^{* }[n]\), only the width parameter of the Gaussian kernel k(u_{ML}[v], u_{ML}[v_{i}]) needs to be chosen using cross-validation. Specific values are reported in Supplementary Tables 11–15.
Classical molecular dynamics
Training and test set geometries for resorcinol (1,3-benzenediol) and phenol were selected from a 1 ns trajectory generated via classical MD using the GAFF force field^{96}. The local minima were optimized using MP2/6-31G^{*} in Gaussian09^{97}. Symmetric atomic charge assignments were determined from a RESP fit^{98} to the HF/6-31G^{*} calculations, using the three distinct geometries with Boltzmann weights determined by the relative MP2 energies for resorcinol. All other standard GAFF parameters^{96} for the MD simulations were assigned using the AmberTools package^{99}. To generate resorcinol and phenol conformers, classical MD simulations in a canonical ensemble were run at 300 K and 500 K using the PINY_MD package^{100} with massive Nosé–Hoover chain (NHC) thermostats^{101} for atomic degrees of freedom (length = 4, τ = 20 fs, Suzuki–Yoshida order = 7, multiple time step = 4) and a time step of 1 fs.
For the resorcinol and phenol training sets, we selected the 1000 conformers closest to k-means centers from the 1 ns classical MD trajectory run at 500 K. The test sets comprise 1000 randomly selected snapshots from the 1 ns 300 K classical MD simulations. Datasets are aligned by minimizing the root mean square deviation (RMSD) of the carbon atoms to the global minimum energy conformer.
DFT molecular dynamics
Born–Oppenheimer MD simulations of a resorcinol molecule in the gas phase were run using DFT in the QUICKSTEP package^{102} of CP2K v. 2.6.2^{103}. The PBE XC functional^{104} was used to approximate exchange and correlation, and a mixed Gaussian/plane wave (GPW) basis-set scheme^{105} was employed with DZVP-MOLOPT-GTH (m-DZVP) basis sets^{106} paired with appropriate dual-space GTH pseudopotentials^{107,108}. Wave functions were converged to 10^{−7} Hartree using the orbital transformation method^{109} on a multiple grid (n = 5) with a cutoff of 900 Ry for the system in a cubic box (L = 20 bohr). For the constant-temperature simulation, a temperature of 350 K was maintained using massive NHC thermostats^{101} (length = 4, τ = 10 fs, Suzuki–Yoshida order = 7, multiple time step = 4) and a time step of 0.5 fs.
ML molecular dynamics
We used the Atomic Simulation Environment (ASE)^{110} with a 0.5 fs time step to run MD with ML energies. For the constant-temperature simulation, a temperature of 350 K was maintained via a Langevin thermostat with a friction value of 0.01 atomic units (0.413 fs^{−1}). Atomic forces were calculated using the finite difference method with ϵ = 0.001 Å.
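The force evaluation can be sketched as follows. The text specifies only a finite difference method with ϵ = 0.001 Å, so the central-difference stencil used below is an assumption, as are the function names and the harmonic two-atom toy energy that stands in for the ML energy model. The last line reproduces the unit conversion of the Langevin friction quoted above.

```python
import math

AU_TIME_FS = 0.024188843  # one atomic unit of time in femtoseconds

def finite_difference_forces(energy_fn, positions, eps=0.001):
    """Central-difference forces F = -dE/dR for a list of [x, y, z] positions."""
    forces = []
    for a in range(len(positions)):
        force_a = []
        for d in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[a][d] += eps
            minus[a][d] -= eps
            force_a.append(-(energy_fn(plus) - energy_fn(minus)) / (2.0 * eps))
        forces.append(force_a)
    return forces

def bond_energy(positions, k=10.0, r0=1.0):
    """Hypothetical harmonic bond between two atoms, standing in for an ML energy."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(positions[0], positions[1])))
    return 0.5 * k * (r - r0) ** 2

forces = finite_difference_forces(bond_energy, [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])

# Friction of 0.01 inverse atomic time units expressed in fs^-1.
friction_per_fs = 0.01 / AU_TIME_FS
```

With this stencil, each MD step requires six energy evaluations per atom (84 for the 14 atoms of resorcinol), which is affordable only because the ML energy models are cheap to evaluate.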
Electronic structure calculations
Optimizations for ethanol conformers were run using MP2/6-31G^{*} in Gaussian09^{97}. DFT calculations for the ML models were run using the Quantum ESPRESSO code^{111} with the PBE XC functional^{104} and the projector augmented-wave approach^{112,113} with Troullier–Martins pseudopotentials replacing explicit ionic core electrons^{114}. Molecules were simulated in a cubic box (L = 20 bohr) with a wave function cutoff of 90 Ry. The valence electron densities were evaluated on a grid with 125 points in each dimension. All CC calculations were run using Orca^{115} with CCSD(T)/aug-cc-pVTZ^{116} for water or CCSD(T)/cc-pVDZ^{116} for resorcinol and phenol.
Data availability
The data generated and used in this study are available at quantummachine.org/datasets.
Code availability
The code generated and used for this study is available at https://github.com/MihailBogojeski/mldft.
References
 1.
Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
 2.
Montavon, G. et al. Learning invariant representations of molecules for atomization energy prediction. Adv. Neural Inf. Process. Syst. 25, 440–448 (2012).
 3.
Montavon, G. et al. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 15, 095003 (2013).
 4.
Botu, V. & Ramprasad, R. Learning scheme to predict atomic forces and accelerate materials simulations. Phys. Rev. B 92, 094306 (2015).
 5.
Hansen, K. et al. Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett. 6, 2326–2331 (2015).
 6.
Bartók, A. P. & Csányi, G. Gaussian approximation potentials: a brief tutorial introduction. Int. J. Quantum Chem. 115, 1051–1057 (2015).
 7.
Rupp, M., Ramakrishnan, R. & von Lilienfeld, O. A. Machine learning for quantum mechanical properties of atoms in molecules. J. Phys. Chem. Lett. 6, 3309–3313 (2015).
 8.
Bereau, T., Andrienko, D. & von Lilienfeld, O. A. Transferable atomic multipole machine learning models for small organic molecules. J. Chem. Theory Comput. 11, 3225–3233 (2015).
 9.
De, S., Bartók, A. P., Csányi, G. & Ceriotti, M. Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 18, 13754–13769 (2016).
 10.
Podryabinkin, E. V. & Shapeev, A. V. Active learning of linearly parametrized interatomic potentials. Comput. Mater. Sci. 140, 171–180 (2017).
 11.
Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K.-R. & Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 8, 13890 (2017).
 12.
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet – a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
 13.
Faber, F. A. et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 13, 5255–5264 (2017).
 14.
Yao, K., Herr, J. E. & Parkhill, J. The many-body expansion combined with neural networks. J. Chem. Phys. 146, 014106 (2017).
 15.
Eickenberg, M., Exarchakis, G., Hirn, M., Mallat, S. & Thiry, L. Solid harmonic wavelet scattering for predictions of molecule properties. J. Chem. Phys. 148, 241732 (2018).
 16.
Ryczko, K., Mills, K., Luchak, I., Homenick, C. & Tamblyn, I. Convolutional neural networks for atomistic systems. Comput. Mater. Sci. 149, 134–142 (2018).
 17.
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
 18.
Grisafi, A., Wilkins, D. M., Csányi, G. & Ceriotti, M. Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 120, 036002 (2018).
 19.
Pronobis, W., Tkatchenko, A. & Müller, K.-R. Many-body descriptors for predicting molecular properties with machine learning: analysis of pairwise and three-body interactions in molecules. J. Chem. Theory Comput. 14, 2991–3003 (2018).
 20.
Faber, F. A., Christensen, A. S., Huang, B. & von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 148, 241717 (2018).
 21.
Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at http://arXiv.org/abs/1802.08219 (2018).
 22.
Hy, T. S., Trivedi, S., Pan, H., Anderson, B. M. & Kondor, R. Predicting molecular properties with covariant compositional networks. J. Chem. Phys. 148, 241745 (2018).
 23.
Schütt, K. T., Gastegger, M., Tkatchenko, A., Müller, K.-R. & Maurer, R. J. Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nat. Commun. 10, 1–10 (2019).
 24.
von Lilienfeld, O. A., Müller, K.-R. & Tkatchenko, A. Exploring chemical compound space with quantum-based machine learning. Nat. Rev. Chem. 4, 347–358 (2020).
 25.
Noé, F., Tkatchenko, A., Müller, K.-R. & Clementi, C. Machine learning for molecular simulation. Annu. Rev. Phys. Chem. 71, 361–390 (2020).
 26.
Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 10, 1–8 (2019).
 27.
Li, Z., Kermode, J. R. & De Vita, A. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Phys. Rev. Lett. 114, 096405 (2015).
 28.
Gastegger, M., Behler, J. & Marquetand, P. Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 8, 6924–6935 (2017).
 29.
Schütt, K. et al. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 30, 991–1001 (2017).
 30.
John, S. T. & Csányi, G. Many-body coarse-grained interactions using Gaussian approximation potentials. J. Phys. Chem. B 121, 10934–10949 (2017).
 31.
Huan, T. D. et al. A universal strategy for the creation of machine learning-based atomistic force fields. npj Comput. Mater. 3, 37 (2017).
 32.
Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
 33.
Chmiela, S., Sauceda, H. E., Müller, K.-R. & Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 9, 3887 (2018).
 34.
Chmiela, S., Sauceda, H. E., Poltavsky, I., Müller, K.-R. & Tkatchenko, A. sGDML: Constructing accurate and data efficient molecular force fields using machine learning. Comput. Phys. Commun. 240, 38–45 (2019).
 35.
Kanamori, K. et al. Exploring a potential energy surface by machine learning for characterizing atomic transport. Phys. Rev. B 97, 125124 (2018).
 36.
Zhang, L., Han, J., Wang, H., Car, R. & E, W. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
 37.
Glielmo, A., Zeni, C. & De Vita, A. Efficient nonparametric n-body force fields from machine learning. Phys. Rev. B 97, 184307 (2018).
 38.
Christensen, A. S., Faber, F. A. & von Lilienfeld, O. A. Operators in quantum machine learning: response properties in chemical space. J. Chem. Phys. 150, 064105 (2019).
 39.
Sauceda, H. E., Chmiela, S., Poltavsky, I., Müller, K.-R. & Tkatchenko, A. Molecular force fields with gradient-domain machine learning: Construction and application to dynamics of small molecules with coupled cluster forces. J. Chem. Phys. 150, 114102 (2019).
 40.
Schneider, E., Dai, L., Topper, R. Q., Drechsel-Grau, C. & Tuckerman, M. E. Stochastic neural network approach for learning high-dimensional free energy surfaces. Phys. Rev. Lett. 119, 150601 (2017).
 41.
Mardt, A., Pasquali, L., Wu, H. & Noé, F. VAMPnets for deep learning of molecular kinetics. Nat. Commun. 9, 5 (2018).
 42.
Noé, F., Olsson, S., Köhler, J. & Wu, H. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science 365, eaaw1147 (2019).
 43.
Rogal, J., Schneider, E. & Tuckerman, M. E. Neural-network-based path collective variables for enhanced sampling of phase transformations. Phys. Rev. Lett. 123, 245701 (2019).
 44.
Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inf. Model. 58, 1194 (2018).
 45.
Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).
 46.
Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: Moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595 (2016).
 47.
Schütt, K. T. et al. Machine Learning Meets Quantum Physics, volume 968 (Springer Lecture Notes in Physics, 2020).
 48.
Faber, F. A. et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 13, 5255 (2017).
 49.
Fabrizio, A., Grisafi, A., Meyer, B., Ceriotti, M. & Corminboeuf, C. Electron density learning of non-covalent systems. Chem. Sci. 10, 9424–9432 (2019).
 50.
Nagai, R., Akashi, R. & Sugino, O. Completing density functional theory by machine learning hidden messages from molecules. npj Comput. Mater. 6, 1–8 (2020).
 51.
Dick, S. & Fernandez-Serra, M. Machine learning accurate exchange and correlation functionals of the electronic density. Nat. Commun. 11, 1–10 (2020).
 52.
Steffen, J. & Hartke, B. Cheap but accurate calculation of chemical reaction rate constants from ab initio data via system-specific black-box force fields. J. Chem. Phys. 147, 161701 (2017).
 53.
Brockherde, F. et al. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 8, 872 (2017).
 54.
Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).
 55.
Welborn, M., Cheng, L. & Miller, T. F. Transferability in machine learning for electronic structure via the molecular orbital basis. J. Chem. Theory Comput. 14, 4772–4779 (2018).
 56.
Seino, J., Kageyama, R., Fujinami, M., Ikabata, Y. & Nakai, H. Semi-local machine-learned kinetic energy density functional with third-order gradients of electron density. J. Chem. Phys. 148, 241705 (2018).
 57.
Ryczko, K., Strubbe, D. & Tamblyn, I. Deep learning and density functional theory. Phys. Rev. A 100, 022512 (2019).
 58.
Sinitskiy, A. V. & Pande, V. S. Deep neural network computes electron densities and energies of a large set of organic molecules faster than density functional theory (DFT). Preprint at http://arXiv.org/abs/1809.02723 (2018).
 59.
Grisafi, A. et al. A transferable machine-learning model of the electron density. ACS Cent. Sci. 5, 57–64 (2019).
 60.
Chandrasekaran, A. et al. Solving the electronic structure problem with machine learning. npj Comput. Mater. 5, 22 (2019).
 61.
Cheng, L., Welborn, M., Christensen, A. S. & Miller, T. F. III. A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules. J. Chem. Phys. 150, 131103 (2019).
 62.
Dick, S. & Fernandez-Serra, M. Learning from the density to correct total energy and forces in first principle simulations. J. Chem. Phys. 151, 144102 (2019).
 63.
Snyder, J. C., Rupp, M., Hansen, K., Müller, K.-R. & Burke, K. Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012).
 64.
Snyder, J. C. et al. Orbital-free bond breaking via machine learning. J. Chem. Phys. 139, 224104 (2013).
 65.
Snyder, J. C., Rupp, M., Müller, K.-R. & Burke, K. Nonlinear gradient denoising: finding accurate extrema from inaccurate functional derivatives. Int. J. Quantum Chem. 115, 1102–1114 (2015).
 66.
Li, L. et al. Understanding machine-learned density functionals. Int. J. Quantum Chem. 116, 819–833 (2016).
 67.
Li, L., Baker, T. E., White, S. R. & Burke, K. Pure density functional for strong correlation and the thermodynamic limit from machine learning. Phys. Rev. B 94, 245129 (2016).
 68.
Hollingsworth, J., Li, L., Baker, T. E. & Burke, K. Can exact conditions improve machine-learned density functionals? J. Chem. Phys. 148, 241743 (2018).
 69.
Hansen, K. et al. Assessment and validation of machine learning methods for predicting molecular atomization energies. J. Chem. Theory Comput. 9, 3404–3419 (2013).
 70.
Ginzburg, I. & Horn, D. Combined neural networks for time series analysis. Adv. Neural Inf. Process. Syst. 224–231 (1994).
 71.
Parr, R. G. & Yang, W. Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989).
 72.
Fiolhais, C., Nogueira, F. & Marques, M. A Primer in Density Functional Theory. (SpringerVerlag, New York, 2003).
 73.
Levy, M. & Görling, A. Correlation energy density-functional formulas from correlating first-order density matrices. Phys. Rev. A 52, R1808 (1995).
 74.
Kim, M.-C., Sim, E. & Burke, K. Understanding and reducing errors in density functional calculations. Phys. Rev. Lett. 111, 073003 (2013).
 75.
Vuckovic, S., Song, S., Kozlowski, J., Sim, E. & Burke, K. Density functional analysis: the theory of density-corrected DFT. J. Chem. Theory Comput. 15, 6636–6646 (2019).
 76.
Zhu, W., Botina, J. & Rabitz, H. Rapidly convergent iteration methods for quantum optimal control of population. J. Chem. Phys. 108, 1953 (1998).
 77.
Wasserman, A. et al. The importance of being self-consistent. Annu. Rev. Phys. Chem. 68, 555–581 (2017).
 78.
Sim, E., Song, S. & Burke, K. Quantifying density errors in DFT. J. Phys. Chem. Lett. 9, 6385–6392 (2018).
 79.
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Big data meets quantum chemistry approximations: the Δ-machine learning approach. J. Chem. Theory Comput. 11, 2087–2096 (2015).
 80.
Braun, M. L., Buhmann, J. M. & Müller, K.-R. On relevant dimensions in kernel feature spaces. J. Mach. Learn. Res. 9, 1875–1908 (2008).
 81.
Tuckerman, M. E., Berne, B. J. & Martyna, G. J. Reversible multiple time scale molecular dynamics. J. Chem. Phys. 97, 1990–2001 (1992).
 82.
Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).
 83.
Pribram-Jones, A., Gross, D. A. & Burke, K. DFT: a theory full of holes? Annu. Rev. Phys. Chem. 66, 283–304 (2015).
 84.
Tuckerman, M. E., Marx, D., Klein, M. L. & Parrinello, M. On the quantum nature of the shared proton in hydrogen bonds. Science 275, 817 (1997).
 85.
Miura, S., Tuckerman, M. E. & Klein, M. L. An ab initio path integral molecular dynamics study of double proton transfer in the formic acid dimer. J. Chem. Phys. 109, 5920 (1998).
 86.
Tuckerman, M. E. & Marx, D. Heavy-atom skeleton quantization and proton tunneling in “intermediate-barrier” hydrogen bonds. Phys. Rev. Lett. 86, 4946 (2001).
 87.
Li, X. Z., Walker, B. & Michaelides, A. Quantum nature of the hydrogen bond. Proc. Natl Acad. Sci. USA 108, 6369 (2011).
 88.
Pérez, A., Tuckerman, M. E., Hjalmarson, H. P. & von Lilienfeld, O. A. Enol tautomers of Watson−Crick base-pair models are metastable because of nuclear quantum effects. J. Am. Chem. Soc. 132, 11510–11515 (2010).
 89.
Kaczmarek, A., Shiga, M. & Marx, D. Quantum effects on vibrational and electronic spectra of hydrazine studied by “onthefly. J. Phys. Chem. A 113, 1985 (2009).
 90.
Wang, H. & Agmon, N. Complete assignment of the infrared spectrum of the gas-phase protonated ammonia dimer. J. Phys. Chem. A 120, 3117 (2016).
 91.
Samala, N. R. & Agmon, N. Structure, spectroscopy, and dynamics of the phenol(water)(2) cluster at low and high temperatures. J. Chem. Phys. 147, 234307 (2017).
 92.
Jarvinen, T., Lundell, J. & Dopieralski, P. Ab initio molecular dynamics study of overtone excitations in formic acid and its water complex. Theor. Chem. Acc. 137, 100 (2018).
 93.
Lee, S. J. R., Welborn, M., Manby, F. R. & Miller, T. F. Projection-based wavefunction-in-DFT embedding. Acc. Chem. Res. 52, 1359–1368 (2019).
 94.
Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).
 95.
Gyevi-Nagy, L. & Tasi, G. SYVA: a program to analyze symmetry of molecules based on vector algebra. Comput. Phys. Commun. 215, 156–164 (2017).
 96.
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).
 97.
Frisch, M. J. et al. Gaussian 09 (2009).
 98.
Bayly, C. I., Cieplak, P., Cornell, W. & Kollman, P. A. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J. Phys. Chem. 97, 10269–10280 (1993).
 99.
Wang, J., Wang, W., Kollman, P. A. & Case, D. A. Antechamber: an accessory software package for molecular mechanical calculations. J. Am. Chem. Soc. 222, U403 (2001).
 100.
Tuckerman, M. E., Yarne, D. A., Samuelson, S. O., Hughes, A. L. & Martyna, G. J. Exploiting multiple levels of parallelism in molecular dynamics based calculations via modern techniques and software paradigms on distributed memory computers. Comput. Phys. Commun. 128, 333–376 (2000).
 101.
Martyna, G. J., Klein, M. L. & Tuckerman, M. Nosé–Hoover chains: the canonical ensemble via continuous dynamics. J. Chem. Phys. 97, 2635–2643 (1992).
 102.
VandeVondele, J. et al. Quickstep: fast and accurate density functional calculations using a mixed Gaussian and plane waves approach. Comput. Phys. Commun. 167, 103–128 (2005).
 103.
Hutter, J., Iannuzzi, M., Schiffmann, F. & VandeVondele, J. CP2K: atomistic simulations of condensed matter systems. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 4, 15–25 (2014).
 104.
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
 105.
Lippert, G., Hutter, J. & Parrinello, M. A hybrid Gaussian and plane wave density functional scheme. Mol. Phys. 92, 477–488 (2010).
 106.
VandeVondele, J. & Hutter, J. Gaussian basis sets for accurate calculations on molecular systems in gas and condensed phases. J. Chem. Phys. 127, 114105 (2007).
 107.
Goedecker, S., Teter, M. & Hutter, J. Separable dual-space Gaussian pseudopotentials. Phys. Rev. B 54, 1703–1710 (1996).
 108.
Krack, M. Pseudopotentials for H to Kr optimized for gradient-corrected exchange-correlation functionals. Theoretica Chim. Acta 114, 145–152 (2005).
 109.
VandeVondele, J. & Hutter, J. An efficient orbital transformation method for electronic structure calculations. J. Chem. Phys. 118, 4365 (2003).
 110.
Bahn, S. R. & Jacobsen, K. W. An object-oriented scripting interface to a legacy electronic structure code. Comput. Sci. Eng. 4, 56–66 (2002).
 111.
Giannozzi, P. et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys.: Condens. Matter 21, 395502 (2009).
 112.
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).
 113.
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).
 114.
Troullier, N. & Martins, J. L. Efficient pseudopotentials for plane-wave calculations. Phys. Rev. B 43, 1993–2006 (1991).
 115.
Neese, F. The ORCA program system. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 73–78 (2012).
 116.
Dunning, T. H. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 90, 1007–1023 (1989).
Acknowledgements
The authors thank Dr. Felix Brockherde and Joseph Cendagorta for helpful discussions, Dr. Li Li for the initial water dataset, and Dr. Huziel Sauceda and Dr. Stefan Chmiela for the optimized geometries of ethanol and for helpful discussions. Calculations were run on NYU IT High Performance Computing resources and at TUB. Work at NYU was supported by the U.S. Army Research Office under contract/grant number W911NF-13-1-0387 (L.V.M. and M.E.T.). K.R.M. was supported in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea Government (No. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University), and was partly supported by the German Ministry for Education and Research (BMBF) under Grants 01IS14013A-E, 01GQ1115, 01GQ0850, 01IS18025A, 031L0207D and 01IS18037A, and by the German Research Foundation (DFG) under Grant Math+, EXC 2046/1, Project ID 390685689. K.B. was supported by NSF grant CHE 1856165. This publication only reflects the authors' views. Funding agencies are not liable for any use that may be made of the information contained herein.
Author information
Contributions
L.V.M. initiated the project, and M.B. and L.V.M. ran all simulations. M.E.T., K.R.M., and K.B. conceived the theory and co-supervised the project. All authors guided the project design, contributed to data analysis, and co-wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Reinhard Maurer and the other anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bogojeski, M., Vogt-Maranto, L., Tuckerman, M.E. et al. Quantum chemical accuracy from density functional approximations via machine learning. Nat Commun 11, 5223 (2020). https://doi.org/10.1038/s41467-020-19093-1