Abstract
Densityfunctional theory (DFT) is a rigorous and (in principle) exact framework for the description of the ground state properties of atoms, molecules and solids based on their electron density. While computationally efficient densityfunctional approximations (DFAs) have become essential tools in computational chemistry, their (semi)local treatment of electron correlation has a number of wellknown pathologies, e.g. related to electron selfinteraction. Here, we present a type of machinelearning (ML) based DFA (termed Kernel Density Functional Approximation, KDFA) that is pure, nonlocal and transferable, and can be efficiently trained with fully quantitative reference methods. The functionals retain the meanfield computational cost of common DFAs and are shown to be applicable to noncovalent, ionic and covalent interactions, as well as across different system sizes. We demonstrate their remarkable possibilities by computing the free energy surface for the protonated water dimer at hitherto unfeasible goldstandard coupled cluster quality on a single commodity workstation.
Introduction
In their seminal 1964 paper, Hohenberg and Kohn proved that there exists a universal functional F[ρ] of the electron density ρ, that captures all electronic contributions to the total energy of a system of interacting electrons^{1}. This universal functional has since become something of a holy grail for chemistry, physics, and materials science, as its knowledge would allow determining the exact groundstate energy and electron density for any molecule or solid^{2}. Unfortunately, the concrete form of F[ρ] itself has remained elusive. Indeed, it has been shown that the universal functional likely has the same prohibitive computational complexity as solving the Schrödinger equation directly^{3}.
Nevertheless, Hohenberg and Kohn’s density functional theory (DFT) has become an essential method in the toolkits of computational chemistry, condensed matter physics, or surface science. This is mostly owing to the formulation of Kohn and Sham (KS), which reduces the problem to finding a density functional E_{xc}[ρ] for electronic exchange and correlation^{4}. Again, the exact form of E_{xc}[ρ] is unknown, but many useful density functional approximations (DFAs) exist, which are generally considered to offer a good tradeoff between computational complexity and accuracy.
The large zoo of DFAs that has been developed over the years is often organized in the hierarchy of Jacob’s ladder, where approximations are grouped according to the ingredients that are used in their functional form^{5}. At the lower rungs of this ladder, (semi)local DFAs are found, which only require information about the local density and its derivatives. Such functionals are sometimes called pure, because they can be computed from the electron density alone. At higherrungs, information about the occupied and/or unoccupied KS orbitals is also included^{6,7,8}. These socalled (double) hybrid DFAs are therefore no longer pure in the sense described above. This increases their computational complexity, but also makes them more accurate, because they incorporate nonlocal information.
In spite of their known limitations (e.g., regarding electron selfinteraction)^{9}, the pure functionals at the bottom of Jacob’s latter are a widely used stateoftheart, e.g., in practical calculations of large systems or with extensive sampling. This reveals a crucial dilemma of Jacob’s ladder, namely that very often it is not possible to use a higherrung functional due to computational constraints, even if the nature of the system of interest would in principle require a more accurate description.
Recently, machinelearning (ML)based DFAs have been shown to break the constraints of Jacob’s ladder, offering highly accurate, pure, and nonlocal density functionals for different onedimensional model systems^{10,11,12,13}. Although these approaches show that it is possible to learn the highly nonlinear mapping from the electron density to the energy from data, they cannot be directly transferred to real systems. A straightforward realspace representation of the electron density (e.g., on a grid) is extremely inefficient in three dimensions. Even for a small molecule, a single reference energy value would have to be fitted as a function of on the order of one hundred thousand input values. Furthermore, gridbased models are not in general sizeextensive, particularly if they require the grid to be of constant size. Very recently, Burke and coworkers developed MLbased DFT models, which represent the density in a planewave basis^{14,15}. These models have successfully been applied to real molecules, though they are still not sizeextensive.
These issues can be circumvented if the reference energy is projected onto the grid in the form of an energy density^{16,17,18}. This brings the number of target values to the same order of magnitude as the number of input values, leading to a much better defined fitting problem. On the other hand, this also makes the resulting MLDFAs more like the traditional functionals in Jacob’s ladder. For instance, if a semilocal Ansatz is chosen to fit a reference energy density, the resulting functional will display electron selfinteraction, even if the reference method does not^{17}. Developing correlation functionals in this manner is particularly challenging. Correlation energy densities based on highlevel wavefunction methods can, e.g., display significant positive values at the centers of stretched bonds, where the electron density is vanishingly small^{19}.
In this paper, we therefore propose a new route to pure MLbased DFAs, using a densityfitting representation of the electron density. This representation is much more compact than a realspace grid, and allows decomposing the density into atomic contributions. As a consequence, the presented MLDFAs can readily be applied to real molecules and are by construction sizeextensive. At the same time, any reference method can be used for training without the need for realspace projection or energy decomposition.
Results
Density representation
In KSDFT, the electron density is computed from the occupied KS orbitals ψ_{i}(r). These are in turn expanded as a linear combination of basis functions χ_{μ}(r):
with the density matrix elements D_{μν}. A noteworthy aspect of eq. (1) is that the density is expanded in terms of products of basis functions, which are not in general atomcentered, even if the basis functions themselves are (see Fig. 1, left). This leads to the appearance of memory intensive fourindex integrals in the computation of Coulomb and exchange contributions to the KS matrix.
In many electronic structure codes it is therefore common practice to use additional densityfitting (DF) basis functions ϕ_{Q}(r − r_{A}), which allow an atomcentered expansion of the density (see Supplementary Note 1):
Here, the first sum is over all atoms A, and \({C}_{Q}^{A}\) is the expansion coefficient of basis function Q on atom A. Unlike eq. (1), the DF expansion can readily (and unambiguously) be decomposed into atomic densities ρ_{A} (see Fig. 1, right)^{20,21}.
It should be noted that the accurate DF representation of ρ(r) requires significantly larger basis sets than those used for the expansion of the KS orbitals. Nonetheless, the density can typically be precisely represented with the knowledge of ca. 100 coefficients per atom and a set of DF basis functions that only depend on the element of the atom. This is in contrast to eq. (1), where the product basis is dependent on the geometry (i.e. on the relative position of pairs of atoms) of the molecule in question.
The DF basis thus offers a highly compressed and transferable representation of ρ, both of which are desirable properties for ML. However, there is also a significant downside to this representation, namely that the coefficients \({C}_{Q}^{A}\) are not in general rotationally invariant. In other words, rotating a molecule and its density will lead to a different set of coefficients \({C}_{Q}^{A}\), even though the target energy remains unchanged. Although this invariance could in principle be learned from data, this would require significantly more training data and only lead to an approximately invariant model.
This issue can be circumvented by borrowing a trick from the Smooth Overlap of Atomic Positions (SOAP) representation of atomic environments^{22}: Instead of using the coefficients directly, we use their rotationally invariant power spectrum p_{A} (see Supplementary Note 2), which has the added benefit of further compressing the representation. In the following, we will refer to this as the rotationally invariant density representation (RIDR).
Correlation functionals from ML
We can now proceed to construct a MLDFA based on the RIDR. Herein, we will focus on learning correlation energy functionals from calculated HartreeFock (HF) densities. Although exchange is treated exactly within HF, the combination of full HF exchange with conventional pure DFT correlation functionals leads to poor results^{23}. This is because the nonlocal exchange in HF is incompatible with (semi)local correlation^{5}. We therefore train our model on nonlocal correlation energies obtained from manybody wavefunction methods. Specifically, we will consider secondorder MøllerPlesset theory (MP2) and goldstandard Coupled Cluster theory with single, double, and perturbative triple excitations (CCSD(T)), to illustrate the approach for both an efficient and a fully quantitative treatment of electron correlation, respectively^{24,25,26}. In addition to being nonlocal, both these methods also provide much improved fractional charge behavior, overcoming one of the main pathologies of conventional DFAs^{9,27,28,29}.
Because wavefunctionbased reference calculations are computationally expensive, it is critical to choose an ML approach that is as dataefficient as possible. In chemistry applications, Kernel Ridge Regression (KRR) has been found to be a good choice in this respect^{30,31,32,33}. We can write a generic KRR correlation functional as:
where the sum runs over N training densities, α_{i} are regression coefficients and K(ρ, ρ_{i}) is a kernel function that measures the similarity between the target and training densities. To ensure sizeextensivity of the functional, we construct K(ρ, ρ_{i}) from the atomic densities provided by the DF representation (see eq. (2)):
Now a kernel k(ρ_{A}, ρ_{B}) that measures the similarity of atomic densities can be defined using the RIDR. A wide range of kernel functions are possible, but we found that a simple polynomial Ansatz (as used for the SOAP kernel) already displays very promising performance, as discussed below:
Having defined the kernel function, the regression coefficients α_{i} (in eq. (3)) can be determined in closed form, for a given training set (see Supplementary Note 3). In the following these functionals are referred to as kernel density functional approximations (KDFA).
Performance and applications
KDFAs need to be trained on data, and are consequently always to some extent system specific. Nonetheless, a useful MLDFA methodology should be general in the sense that it can easily be trained for different chemistries and system sizes^{34}. Herein, three types of systems are considered, namely water clusters, microsolvated protons and linear alkanes of different sizes. In all cases, training and test structures are randomly drawn from molecular dynamics (MD) simulations at 350 K (see Methods section for details). For this part, we will concentrate on the results obtained with MP2 reference data. As also further commented below, CCSD(T) energies can be learned with the same accuracy and a full account of these results is provided in the SI.
In Fig. 2, learning curves of ML correlation functionals for the three types of systems are shown. In all cases, mean absolute errors (MAEs) below 25 meV (roughly corresponding to k_{B}T at 300 K) are reached with 100 training examples. Indeed, in many cases socalled chemical accuracy (1 kcal mol^{−1} ≈ 43 meV) is already achieved with only 10 training structures. Within each class, the MAEs generally increase for larger systems. However, the errors for pure and protonated water clusters with three and four molecules are nearly identical. This is readily understood, as the intermolecular interactions are approximately additive in these systems. In contrast, the interactions in longer alkane chains are harder to learn, e.g., owing to more pronounced interactions between the molecular termini in butane (C_{4}H_{10}) than in propane (C_{3}H_{8}).
Overall, these results show that highly accurate, nonlocal correlation functionals can be learned from easily tractable reference data sets. In fact, the similarity of the learning curves obtained for the three systems suggests that highlevel energies for a large ensemble of configurations can already be obtained with only 10–100 reference calculations. In many applications, even such a limited number of highlevel reference calculations may not be feasible, however, owing to the prohibitive scaling of wavefunction methods with system size^{35,36}.
Here, the sizeextensivity of our approach becomes important, as it enables us to train on small systems and predict the energies of larger ones^{34,37}. In Fig. 3, this is highlighted for the water octamer and octane, with the corresponding KDFAs trained exclusively on the systems with up to four monomers (i.e., at most half the size of the predicted systems). Clearly, this is a challenging test for the transferability of the functionals. The observed MAEs are indeed somewhat larger than for models validated on configurations of the same size as the training set (74 and 48 meV for (H_{2}O)_{8} and octane, respectively). A larger error should be expected for larger systems, however, precisely owing to the extensive nature of the correlation energy^{38,39}. Furthermore, the prediction errors are somewhat systematic, in particular for (H_{2}O)_{8}. Consequently, the MAE for relative energies of water octamer configurations is further reduced to 42 meV (46 meV for octane).
Of comparable computational cost as conventional lowrung functionals, the transferable KDFA approach provides access to quantitative CCSD(T) simulations for systems that before were on the brink of largestscale supercomputing studies or simply not tractable at all. We demonstrate this for the shared proton in a protonated water dimer, as an intensely studied testbed for the role of electron correlation and nuclear quantum effects in hydrated excess protons^{40,41,42,43,44,45,46,47,48,49}. An ensemble of 100,000 uncorrelated configurations of the protonated water dimer was drawn from a 5 ns MD trajectory generated at the semiempirical GFN2XTB level of theory^{50}. From this ensemble, 100 structures were randomly sampled and their energies computed at the CCSD(T)/def2TZVP level. These configurations were used to train a KDFA, which was used to predict the energies of all 100,000 configurations. These KDFA energies were then used to generate a CCSD(T)level ensemble, via the Monte Carlo resampling (RSM) approach of Essex and coworkers (see Supplementary Note 4)^{51}.
The two dominant reaction coordinates for this system are the oxygen–oxygen distance (r_{OO}) and the proton transfer coordinate ν^{40}. The latter is defined as the difference between the OH distances of the shared proton and each oxygen atom, with a value of ν = 0 meaning that the proton is equidistant to both oxygens, whereas positive or negative values indicate that the proton is closer to one of the oxygen atoms. In Fig. 4 (left) the freeenergy surface derived from the semiempirical MD trajectory is shown with respect to these coordinates. This reveals a fairly broad singlewell potential with a minimum around r_{OO} ≈ 2.45 Å and ν = 0. The location of the minimum and the overall shape of the well are in good agreement with previously reported probability distributions obtained from MD simulations with dispersioncorrected (semilocal) DFT^{52}.
In contrast, the freeenergy surface for the CCSD(T)quality KDFA displays some strikingly different features. Here, the minimum is more narrow and at shorter r_{OO}. Furthermore, the potential well has a distinct heartshape, meaning that at larger values of r_{OO}, the proton is no longer equidistant to both oxygen atoms but preferentially located closer to one or the other. Finally, the energy range of both freeenergy plots differs by over 100 meV, indicating that the minimum is much deeper and narrower at the CCSD(T) level. The comparison with the semiempirical surface (and analogous semilocal DFT surfaces)^{40,52} shows that the electron delocalization errors of these methods also leads to proton overdelocalization. Interestingly, the inclusion of nuclear quantum effects actually does delocalize the proton more strongly^{40,41}. However, to obtain good agreement with the experimental properties of water both an explicit treatment of nuclear quantum effects (i.e., via pathintegral MD) and a quantitatively correct classical potential energy surface (as provided by our CCSD(T)quality KDFA) are required^{53,54}.
Overall, the features of the presented CCSD(T) surface are in good agreement with the one recently reported by Kühne and coworkers, which was based on tourdeforce MD simulations requiring millions of CCSD/ccpVDZ calculations^{41}, i.e., in comparison with our work without perturbative triples in the coupled cluster ansatz and using an inferior basis set. In contrast, our combined KDFA/RSM approach only required 100 CCSD(T) reference calculations and was quickly performed in a matter of hours on a single workstation, even though we chose to employ a much larger ensemble to arrive at a better freeenergy sampling.
Discussion
Given the large variety of ML approaches reported in recent years, it is important to discuss the proposed KDFA method in a broader context. Most prominently, kernel and neural network regression has been very successfully used to directly learn the relationship between atomic coordinates and groundstate energies, i.e., the potential energy surface (PES)^{22,30,31,33,48,55,56,57,58,59,60,61,62,63,64,65,66}. The main advantage of this direct approach is that electronic structure calculations are only required for training, whereas predictions can be performed using just the atomic coordinates as input. On the other hand, this also means that the complex physics underlying the PES has to be learned completely from data, which often translates to very large training sets from tens of thousands to millions of configurations, depending on the system (see Supplementary Note 5)^{67,68}. Another downside is that most such ML forcefields are by construction shortranged, meaning that longrange Coulomb and dispersion interactions are not included.
Von Lilienfeld and coworkers proposed the socalled ΔML approach to mitigate these downsides^{69}. Here, the idea is to use an inexpensive semiempirical method as a baseline and learn the energy difference to a higherlevel method like DFT or CC. They showed that this requires much less training data to reach a given accuracy than for a pure ML model. Broadly speaking, the KDFA approach proposed here also falls into this category. However, rather than learning the total energy differences between different approximations wholesale, we exclusively focus on the correlation energy, which is generally much smoother (and therefore easier to learn) than the total PES^{32,70}. Furthermore, unlike the original ΔML approach, we use the electron density as input, rather than geometric information. Our ML models are thus genuine density functionals.
The presented approach is closest in spirit to the work of Miller and coworkers, where paircorrelation energies are learned from molecular orbital based descriptors (MOBML)^{34,37}. As in their work, our models are based on a rigorous physical theory of electron correlation and are by construction sizeextensive. The main difference is that in our case no previous energy decomposition or orbital localization is required. This enables us to choose arbitrary reference methods, for which paircorrelation energies may not be available or implemented (i.e., the random phase approximation, RPA, or quantum Monte Carlo methods). Furthermore, the requirement of orbital localization makes the application of MOBML to metallic or small bandgap systems fundamentally difficult.
Of course, the work of Burke, Müller, Tuckermann, and coworkers on MLbased DFT is also highly pertinent^{10,11,13,14,15}. Their most recent method focuses on the integrated prediction of both the valence electron density and the exchangecorrelation energy^{14}. This work was originally aimed at accelerating the computation of established DFAs, but has very recently been extended to also learn corrections to higherlevel methods like CCSD(T)^{15}. The main difference to our work is their choice of density representation. Specifically, they work in a planewave basis instead of the atomcentered Gaussians used herein. This makes learning the density much easier, since all basis functions are orthogonal. As a consequence of this choice, their models are not sizeextensive, however. Furthermore, molecular symmetry can only be included in an approximate form, i.e., through dataaugmentation. Consequently, these models can only be applied if at least some number of highlevel CC calculations are affordable for the complete system, and if the predictions are done on fairly similar configurations (i.e., within an MD trajectory). It should also be noted that Ceriotti, Corminbeauf, and coworkers have recently developed a method for learning electron densities in atomcentered basis sets^{20,21,71}. In future work, integrated, sizeextensive KDFAs could thus be developed that can predict both densities and energies.
Finally, a note on selfconsistency is in order. KSDFT calculations are commonly carried out selfconsistently. One could therefore argue that the presented approach is more closely related to postHF methods like Coupled Cluster theory. Nonetheless, we feel that our models are more accurately described as DFAs, as they exclusively depend on the electron density. In particular, no information related to virtual orbitals is required. The models therefore share the favorable computational scaling of pure DFAs. We also note that higherrung DFAs (i.e., double hybrids or RPAbased functionals) are still commonly used nonselfconsistently^{6,8}. Indeed, even classic semilocal functionals like Becke’s B88 were developed in a nonselfconsistent framework^{72}. In the same vein, a selfconsistent extension of the proposed method is clearly possible and will be the subject of future work. This will, however, likely require significantly more training data.
To conclude, we have presented a novel method to learn pure, nonlocal and transferable density functionals for the correlation energy from data. The high accuracy and dataefficiency of the method was demonstrated for a range of noncovalent, ionic and covalent systems of different sizes. In all cases, MAEs below 25 meV could be achieved with <100 training structures. We also demonstrated the transferability and sizeextensivity of the approach by training and predicting on differently sized systems. As an exemplary application of the new method, a highly accurate freeenergy surface for the shared proton in a protonated water dimer was obtained by resampling a semiempirical MD trajectrory with a CCSD(T)based KDFA at meanfield computational cost.
Methods
Computational details
Most electronic structure calculations were performed with the Psi4 package^{73}. The KDFA method was implemented as an external plugin to Psi4 via the Psi4numpy interface^{74}. CCSD(T) and MP2 calculations were performed with ORCA^{75}. The Karlsruhe def2TZVP basis set and corresponding DF basis were used throughout^{76,77}.
Molecular dynamics
NVT MD simulations were performed with the semiempirical GFN2XTB method and the atomic simulation environment^{50,78}. All simulations were run with a timestep of 0.5 fs and a Langevin thermostat with a coupling constant of 0.1 atomic units. For the learning curves in Fig. 2, decorrelated snapshots from 50 ps trajectories at 350 K were used. For the freeenergy surfaces in Fig. 4, a trajectory of 5 ns at 300 K was calculated.
Data availability
All data sets used in this paper are available in the supplementary information (Supplementary Data 1).
Code availability
The code used to fit the ML models is available at https://gitlab.com/jmargraf/kdf.
References
Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).
Peverati, R. & Truhlar, D. G. Quest for a universal density functional: the accuracy of density functionals across a broad spectrum of databases in chemistry and physics. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 372, 20120476 (2014).
Schuch, N. & Verstraete, F. Computational complexity of interacting electrons and fundamental limitations of density functional theory. Nat. Phys. 5, 8 (2007).
Kohn, W. & Sham, L. J. Selfconsistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965).
Perdew, J. P. & Schmidt, K. Jacob’s ladder of density functional approximations for the exchangecorrelation energy. In AIP Conf. Proc., vol. 577, 1–20 (AIP, 2001).
Grimme, S. Semiempirical hybrid density functional with perturbative secondorder correlation. J. Chem. Phys. 124, 034108 (2006).
Becke, A. D. Density functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 98, 5648–5652 (1993).
Zhang, I. Y., Rinke, P., Perdew, J. P. & Scheffler, M. Towards efficient orbitaldependent density functionals for weak and strong correlation. Phys. Rev. Lett. 117, 133002 (2016).
Cohen, A. J., MoriSanchez, P. & Yang, W. Insights into current limitations of density functional theory. Science 321, 792–794 (2008).
Wagner, L. O., Stoudenmire, E. M., Burke, K. & White, S. R. Reference electronic structure calculations in one dimension. Phys. Chem. Chem. Phys. 14, 8581–8590 (2012).
Li, L., Baker, T. E., White, S. R. & Burke, K. Pure density functional for strong correlation and the thermodynamic limit from machine learning. Phys. Rev. B 94, 245129 (2016).
Schmidt, J., BenavidesRiveros, C. L. & Marques, M. A. L. Machine learning the physical nonlocal exchangecorrelation functional of densityfunctional theory. J. Phys. Chem. Lett. 10, 6425–6431 (2019).
Snyder, J. C., Rupp, M., Hansen, K., Müller, K.R. & Burke, K. Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012).
Brockherde, F. et al. Bypassing the KohnSham equations with machine learning. Nat. Commun. 8, 872 (2017).
Bogojeski, M., VogtMaranto, L., Tuckerman, M. E., Müller, K.R. & Burke, K. Quantum chemical accuracy from density functional approximations via machine learning. Nat. Commun. 11, 5223 (2020).
Nudejima, T., Ikabata, Y., Seino, J., Yoshikawa, T. & Nakai, H. Machinelearned electron correlation model based on correlation energy density at complete basis set limit. J. Chem. Phys. 151, 024104 (2019).
Margraf, J. T., Kunkel, C. & Reuter, K. Towards density functional approximations from coupled cluster correlation energy densities. J. Chem. Phys. 150, 244116 (2019).
Vyboishchikov, S. F. A simple local correlation energy functional for spherically confined atoms from ab initio correlation energy density. ChemPhysChem 18, 3478–3484 (2017).
Baerends, E. J. & Gritsenko, O. V. Quantum chemical view of density functional theory. J. Phys. Chem. A 101, 5383–5403 (1997).
Grisafi, A. et al. Transferable machinelearning model of the electron density. ACS Cent. Sci. 5, 57–64 (2019).
Fabrizio, A., Grisafi, A., Meyer, B., Ceriotti, M. & Corminboeuf, C. Electron density learning of noncovalent systems. Chem. Sci. 10, 9424–9432 (2019).
Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. Phys. Rev. B 87, 184115 (2013).
Oliphant, N. & Bartlett, R. J. A systematic comparison of molecular properties obtained using HartreeFock, a hybrid HartreeFock densityfunctionaltheory, and coupledcluster methods. J. Chem. Phys. 100, 6550–6561 (1994).
Urban, M., Noga, J., Cole, S. J. & Bartlett, R. J. Towards a full CCSDT model for electron correlation. J. Chem. Phys. 83, 4041–4046 (1985).
Raghavachari, K., Trucks, G. W., Pople, J. A. & HeadGordon, M. A fifthorder perturbation comparison of electron correlation theories. Chem. Phys. Lett. 157, 479–483 (1989).
Bartlett, R. J. Manybody perturbation theory and coupled cluster theory for electron correlation in molecules. Annu. Rev. Phys. Chem. 32, 359–401 (1981).
Cohen, A. J., MoriSánchez, P. & Yang, W. Secondorder perturbation theory with fractional charges and fractional spins. J. Chem. Theory Comput. 5, 786–792 (2009).
Steinmann, S. N. & Yang, W. Wave function methods for fractional electrons. J. Chem. Phys. 139, 074107 (2013).
Margraf, J. T. & Bartlett, R. Coupled cluster and manybody perturbation theory for fractional charges and spins. J. Chem. Phys. 148, 221103 (2018).
von Lilienfeld, O. A. Quantum machine learning in chemical compound space. Angew. Chem. Int. Ed. 57, 4164–4169 (2018).
Bartók, A. P. et al. Machine learning unifies the modeling of materials and molecules. Sci. Adv. 3, e1701816 (2017).
Margraf, J. T. & Reuter, K. Making the coupled cluster correlation energy machinelearnable. J. Phys. Chem. A 122, 6343–6348 (2018).
Rupp, M. Machine learning for quantum mechanics in a nutshell. Int. J. Quantum Chem. 115, 1058–1073 (2015).
Welborn, M., Cheng, L. & Miller, T. F. Transferability in machine learning for electronic structure via the molecular orbital basis. J. Chem. Theory Comput. 14, 4772–4779 (2018).
Bartlett, R. J. & Musiał, M. Coupledcluster theory in quantum chemistry. Rev. Mod. Phys. 79, 291–352 (2007).
Crawford, T. D. & Schaefer, H. F. An introduction to coupled cluster theory for computational chemists. Rev. Comput. Chem. 14, 33–136 (2000).
Cheng, L., Kovachki, N. B., Welborn, M. & Miller, T. F. Regression clustering for improved accuracy and training costs with molecularorbitalbased machine learning. J. Chem. Theory Comput. 15, 6668–6677 (2019).
Perdew, J. P., Sun, J., Garza, A. J. & Scuseria, G. E. Intensive atomization energy: rethinking a metric for electronic structure theory methods. Z. f.ür. Phys. Chem. 230, 737–742 (2016).
Margraf, J. T., Ranasinghe, D. S. & Bartlett, R. J. Automatic generation of reaction energy databases from highly accurate atomization energy benchmark sets. Phys. Chem. Chem. Phys. 19, 9798–9805 (2017).
Tuckerman, M. E., Marx, D., Klein, M. L. & Parrinello, M. On the quantum nature of the shared proton in hydrogen bonds. Science 275, 817 (1997).
Spura, T., Elgabarty, H. & Kühne, T. D. Onthefly coupled cluster pathintegral molecular dynamics: impact of nuclear quantum effects on the protonated water dimer. Phys. Chem. Chem. Phys. 17, 14355–14359 (2015).
Clark, T., Heske, J. & Kühne, T. D. Opposing electronic and nuclear quantum effects on hydrogen bonds in H2O and D2O. ChemPhysChem 20, 2461–2465 (2019).
Dagrada, M., Casula, M., Saitta, A. M., Sorella, S. & Mauri, F. Quantum Monte Carlo study of the protonated water dimer. J. Chem. Theory Comput. 10, 1980–1993 (2014).
Marx, D., Tuckerman, M. E., Hutter, J. & Parrinello, M. The nature of the hydrated excess proton in water. Nature 397, 601–604 (1999).
Auer, A. A., Helgaker, T. & Klopper, W. Accurate molecular geometries of the protonated water dimer. Phys. Chem. Chem. Phys. 2, 2235–2238 (2000).
Cheng, H. P. & Krause, J. L. The dynamics of proton transfer in H5O2+. J. Chem. Phys. 107, 8461–8468 (1997).
Valeev, E. F. & Schaefer, H. F. The protonated water dimer: Brueckner methods remove the spurious C1 symmetry minimum. J. Chem. Phys. 108, 7197–7201 (1998).
Schran, C., Behler, J. & Marx, D. Automated fitting of neural network potentials at coupled cluster accuracy: protonated water clusters as testing ground. J. Chem. Theory Comput. 16, 88–99 (2020).
Kapil, V., VandeVondele, J. & Ceriotti, M. Accurate molecular dynamics and nuclear quantum effects at low cost by multiple steps in real and imaginary time: Using density functional theory to accelerate wavefunction methods. J. Chem. Phys. 144, 054111 (2016).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2xTB  an accurate and broadly parametrized selfconsistent tightbinding quantum chemical method with multipole electrostatics and densitydependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
CaveAyland, C., Skylaris, C. K. & Essex, J. W. A Monte Carlo resampling approach for the calculation of hybrid classical and quantum free energies. J. Chem. Theory Comput. 13, 415–424 (2017).
Marsalek, O. & Markland, T. E. Ab initio molecular dynamics with nuclear quantum effects at classical cost: ring polymer contraction for density functional theory. J. Chem. Phys. 144, 054112 (2016).
Naserifar, S. & Goddard, W. A. Liquid water is a dynamic polydisperse branched polymer. Proc. Natl Acad. Sci. 116, 1998–2003 (2019).
HeadGordon, T. & Paesani, F. Water is not a dynamic polydisperse branched polymer. Proc. Natl Acad. Sci. 116, 13169–13170 (2019).
Dral, P. O. Quantum chemistry in the age of machine learning. J. Phys. Chem. Lett. 11, 2336–2347 (2020).
Jung, H. et al. Sizeextensive molecular machine learning with global representations. ChemSystemsChem 1900052, syst.201900052 (2020).
Dragoni, D., Daff, T. D., Csányi, G. & Marzari, N. Achieving DFT accuracy with a machinelearning interatomic potential: Thermomechanics and defects in bcc ferromagnetic iron. Phys. Rev. Mater. 2, 013808 (2018).
Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
Huo, H. & Rupp, M. Unified representation of molecules and crystals for machine learning https://arxiv.org/abs/1704.06439 (2017).
Faber, F. A. et al. Prediction errors of molecular machine learning models lower than Hybrid DFT error. J. Chem. Theory Comput. 13, 5255–5264 (2017).
Deringer, V. L. & Csányi, G. Machine learning based interatomic potential for amorphous carbon. Phys. Rev. B 95, 094203 (2017).
Rupp, M., Tkatchenko, A., Müller, K.R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).
Artrith, N., Morawietz, T. & Behler, J. Highdimensional neuralnetwork potentials for multicomponent systems: applications to zinc oxide. Phys. Rev. B 83, 153101 (2011).
Morawietz, T., Sharma, V. & Behler, J. A neural network potentialenergy surface for the water dimer based on environmentdependent atomic energies and charges. J. Chem. Phys. 136, 064103 (2012).
Behler, J. & Parrinello, M. Generalized neuralnetwork representation of highdimensional potentialenergy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
Smith, J. S. et al. Approaching coupled cluster accuracy with a generalpurpose neural network potential through transfer learning. Nat. Commun. 10, 2903 (2019).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Big data meets quantum chemistry approximations: the δmachine learning approach. J. Chem. Theory Comput. 11, 2087–2096 (2015).
Peyton, B., Briggs, C., D’Cunha, R., Margraf, J. T. & Crawford, T. Machinelearning coupled cluster properties through a density tensor representation. J. Phys. Chem. A 124, 4861–4871 (2020).
Grisafi, A., Wilkins, D. M., Csányi, G. & Ceriotti, M. Symmetryadapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 120, 36002 (2018).
Becke, A. D. Densityfunctional exchangeenergy approximation with correct asymptotic behavior. Phys. Rev. A 38, 3098–3100 (1988).
Smith, D. G. A. et al. Psi4Numpy: an interactive quantum chemistry programming environment for reference implementations and rapid development. J. Chem. Theory Comput. 14, 3504–3511 (2018).
Parrish, R. M. et al. Psi4 1.1: an opensource electronic structure program emphasizing automation, advanced libraries, and interoperability. J. Chem. Theory Comput. 13, 3185–3197 (2017).
Neese, F. Software update: the ORCA program system, version 4.0. WIREs Comput. Mol. Sci. 8, e1327 (2018).
Weigend, F. & Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Phys. Chem. Chem. Phys. 7, 3297–305 (2005).
Weigend, F. Hartreefock exchange fitting basis sets for H to Rn. J. Comput. Chem. 29, 167–175 (2008).
Hjorth Larsen, A. et al. The atomic simulation environment  a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).
Acknowledgements
We acknowledge funding from the German Research Foundation (DFG) through its Cluster of Excellence econversion EXC 2089/1.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Contributions
J.T.M. and K.R. devised the project. J.T.M. wrote the KDFT code and performed all calculations. All authors contributed to writing the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Margraf, J.T., Reuter, K. Pure nonlocal machinelearned density functional theory for electron correlation. Nat Commun 12, 344 (2021). https://doi.org/10.1038/s4146702020471y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s4146702020471y
This article is cited by

Human and machinecentred designs of molecules and materials for sustainability and decarbonization
Nature Reviews Materials (2022)

A transferable recommender approach for selecting the best density functional approximations in chemical discovery
Nature Computational Science (2022)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.