Neural network interpolation of exchange-correlation functional

Density functional theory (DFT) is one of the most widely used tools to solve the many-body Schrodinger equation. The core uncertainty inside DFT theory is the exchange-correlation (XC) functional, the exact form of which is still unknown. Therefore, the essential part of DFT success is based on the progress in the development of XC approximations. Traditionally, they are built upon analytic solutions in low- and high-density limits and result from quantum Monte Carlo numerical calculations. However, there is no consistent and general scheme of XC interpolation and functional representation. Many different developed parametrizations mainly utilize a number of phenomenological rules to construct a specific XC functional. In contrast, the neural network (NN) approach can provide a general way to parametrize an XC functional without any a priori knowledge of its functional form. In this work, we develop NN XC functionals and prove their applicability to 3-dimensional physical systems. We show that both the local density approximation (LDA) and generalized gradient approximation (GGA) are well reproduced by the NN approach. It is demonstrated that the local environment can be easily considered by changing only the number of neurons in the first layer of the NN. The developed NN XC functionals show good results when applied to systems that are not presented in the training/test data. The generalizability of the formulated NN XC framework leads us to believe that it could give superior results in comparison with traditional XC schemes provided training data from high-level theories such as the quantum Monte Carlo and post-Hartree-Fock methods.


I. INTRODUCTION
Since its initial formulation, density functional theory (DFT) 1,2 has become one of the main methods to solve the many-body Schrodinger equation.The source of the major ambiguity in DFT theory is the lack of a universal form of the exchange-correlation (XC) functional.Therefore, the success of DFT is mostly determined by the improvement of XC representations.The seminal Monte Carlo (MC) simulations of Ceperley and Adler 3 have resulted in a number of practical local density approximations (LDAs) [4][5][6] .The next major progress was produced by the introduction of the generalized gradient approximation (GGA), which takes into account local gradients of electron density [7][8][9] .This greatly enhanced the ability of DFT to describe systems with inhomogeneous electron densities, such as molecules and surfaces.
Despite the obvious success of the LDA and GGA, their construction is a highly complicated process that includes many heuristics steps.Even the XC functional form itself is based on some assumptions about the local nature of XC interactions.A large number of analytical conditions to fulfill 9 and a commitment to the local functional form make it difficult to incorporate new numerical results from high-level quantum mechanics theories.For example, only numerical data from MC simulations of homogeneous electron gas are used to produce LDA 6 and GGA 9 functionals.Therefore, much of the progress in quantum MC 10,11 and post-Hartree-Fock 12 methods is not adopted by XC approximations.One possible solution to overcome this problem is to use neural networks (NNs) a) Electronic mail: ryabov.alexandr@phystech.edub) Electronic mail: p.zhilyaev@skoltech.ru for the interpolation of XC functionals.The NN provides a general way to represent any functional relationship 13 and a highly flexible framework to embody both analytical and numerical training data.
There have been a number of studies in which NNs have been used to describe the potential energy surface to speed up electronic structure calculations [14][15][16][17][18][19][20][21] , while fewer studies have been devoted to developing NN XC functionals 22 .These works heavily use feature engineering and specify abstract representations of atomistic systems to use the NN approach.Only recently have several research groups 23,24 developed NN XC functionals for modeling 2D systems, avoiding any data preprocessing steps.
In this work, we focus on constructing NN XC functionals for 3D systems without any feature engineering.We show that even rather simple NN architectures reproduce LDA and GGA XC potentials with satisfactory accuracy.To construct the LDA, potential point-to-point mapping is used due to the local idea behind the LDA itself.GGA potential region-topoint mapping is achieved by increasing the number of input neurons to take into account the local environment.This paper is outlined as follows: In Section II, we describe the data generation process, NN topologies and hyperparameters.In Section III, we show that the constructed NN XC potentials have the capability to reproduce LDA and GGA results for real physical systems.In Section IV, we summarize our results and consider future research that could be done with the proposed framework, with a particular emphasis on incorporating data from high-level methods such as quantum Monte Carlo and post-Hartree-Fock.arXiv:1909.03860v1[physics.comp-ph]9 Sep 2019 FIG. 1. Topology of the developed neural network.Two hidden layers are used, with 1000 and 500 neurons, respectively.A base 10 logarithm is applied to the input data (electron density).This helps to scale the input data to the range that the neural network is most sensitive to.

II. METHODS
To obtain the training data needed to build the NN XC potential, we carry out calculations of three-dimensional (3D) electron gases in a framework of the Kohn-Sham formalism 1,2 for potentials of a simple harmonic oscillator (SHO).This potential takes the form: where k i is the spring constant and r = (x 1 , x 2 , x 3 ) is the radius vector.Parameter k i pertains to a simple harmonic oscillator randomly sampled from intervals 0.01 ≤ k i ≤ 0.9 Ha/a 2 0 , where a 0 is 1 Bohr radius.
The XC potential is evaluated on a 40 × 40 × 40 a 0 parallelepiped with 32 × 32 × 32 mesh, which corresponds to a spacing of 1.25 a 0 .Three electrons are placed in this 3D space.For each chosen parameter, DFT calculations in real space were performed using the Octopus code [25][26][27] .The parametrizations of the LDA exchange-correlation functional used include the Slater exchange 28,29 and Perdew & Zunger (modified) correlation part 30 .The GGA parameters used include the exchange and correlation parts from 9,31 .The total number of DFT calculations performed is 150 for each XC parametrization.The ranges of electron density represented by the Wigner-Seitz radius r s are [1; 5 × 10 6 ] and [1; 10 8 ] for LDA and GGA, respectively.One calculation then provides a 32 × 32 × 32 dataset size for LDA and a 28 × 28 × 28 size for GGA.These yield overall dataset sizes of approximately 5.0 million and 3.3 million for LDA and GGA, respectively, but only 3.3 million samples are used for both functionals.15% of all datasets are used for the test set.
The composition of Tensorflow 32 and TFLearn 33 is utilized for training the NN.The neural architecture used for the LDA is a fully connected network with one input neuron, one linear output neuron and two ReLU 34 hidden layers with 1000 and 500 neurons (see fig. 1).The only difference in the GGA case is that it takes into account the contribution of the local environment, which corresponds to an increase in the number of neurons in the first layer to 125 (5 × 5 × 5 cube).The Adam algorithm with a learning rate of 0.001 is used for training, with MSE loss.

III. RESULTS AND DISCUSSION
Multiple neural architectures, including different activation functions, numbers of layers and numbers of neurons, are tested.We choose robust and rather simple architectures, namely, the point-to-point and region-to-point architectures described above.We also test NNs with only one hidden layer with 1000 neurons, which have poor generalizability and result in a larger loss in comparison with the topology used.It should be noted that the topology chosen is heuristics, and it is possible that another number of hidden layers and neurons could give better results than those provided in this study.The  search for the optimal NN topology is a large task that needs to be carried out in future studies.
Because the density range contains mostly small values, base-10 logarithmic transformation is chosen for preprocessing the dataset.This scaling greatly improves the convergence speed of the backpropagation algorithm during the NN training cycle.No other preprocessing techniques and feature engineering are applied to the input data.
Training of the LDA NN is stopped after 1300 epochs.The NN LDA mean absolute error (MAE) is 0.03 mHa × Bohr 3 .The mean minimum and maximum values of V xc averaged over all configurations used for the LDA NN construction are 0.73 and 510 mHa×Bohr 3 , respectively.Training of the GGA NN is terminated after 320 epochs.The NN GGA mean absolute error (MAE) is 0.156 mHa × Bohr 3 .The mean minimum and maximum values of Vxc averaged over all configurations used for the GGA NN construction are 0 and 553 mHa × Bohr 3 , respectively.The results of the training for both the LDA and GGA NNs are summarized in Table I.
The results of applying an NN to the electron density from the test set are presented in fig. 2. Both XC potentials provided by the LDA and GGA NNs are in good agreement with the numerical calculations made by Octopus.The special features of the V xc form, for example, the "two humps", are also well reproduced by them.
To test the NN XCs for real physical systems, we consider Si in a diamond structure with 8 basis atoms in a cubic cell (lattice constant: 10.2 a 0 ).The numerical calculation is performed on a 10.2 × 10.2 × 10.2 a 0 parallelepiped with a 14 × 14 × 14 mesh, which corresponds to a spacing of 0.729 a 0 .The NN LDA and NN GGA MAEs are 0.6 and 18.1 mHa × Bohr 3 , respectively.The NN LDA results are in excellent agreement with the reference data obtained from Octopus.The NN GGA describes V xc reasonably well, except for regions with high density variations.We believe that this is because of the different spacings of the training/test data, which lead to the change in electron density gradients perceived by the NN XC.
The NN XC potentials are also tested on slices of 3D data  in the plane of the benzene molecule.The numerical calculation is performed on a 14 × 14 × 14 a 0 parallelepiped with a 44 × 44 × 44 mesh, which corresponds to a spacing of 0.318 a 0 .The NN LDA and NN GGA MAEs are 1.36 and 18.3 mHa × Bohr 3 , respectively.The overall form of the XC potential obtained by Octopus is similar to the form produced by both NNs.In the case of benzene, the MAE is substantially larger than the MAE obtained during the training/test procedure.This is due to the high electron density values obtained for benzene, which are not presented in the training/test dataset.Moreover, the mesh spacing in the benzene calculation is smaller than that in the dataset, which is important for NN GGA potential evaluation.Despite the lack of appropriate input data, both NN XC potentials show an impressive ability  III) demonstrates a large MAE for both systems considered, as the mesh spacing is not presented in the training procedure.Electron density spatial information not included explicitly in the training set leads to the fact that neural networks can take gradients into account in a way that cannot be properly generalized.Changing the mesh corrupts the learned representations and makes it not quite correct to apply the NN GGA to systems on meshes other than the one learned.
To construct mesh-independent nonlocal functionals such as the GGA, it is necessary to use the coordinates of mesh points at which the electron density is defined.This additional information will help the NN evaluate mesh-independent gradients or other higher-order variations of electron density and include them in the process of the interpolation of the exchange-correlation potential.The type of local coordinate system to use remains the subject of further research.

IV. SUMMARY
In this work, we developed, verified and tested a neuralnetwork approach to interpolating XC functionals.We show that the NN approach gives adequate results for both the LDA and GGA considered.When considering a model system based on three electrons placed in a parallelepiped box with an external potential of a 3D simple harmonic oscillator, the NN is in perfect agreement with numerical calculations from the test dataset.Moreover, the neural network shows a satisfactory error in real 3D physical systems not presented in the training/test data, such as silicon and benzene.Despite the lack of generalizability of the NN GGA for both systems, the analysis of those results led us to the idea of improving the NN approach by searching for a better type of coordinate system for proper representation of the input data.It is important to note that the neural network topology used in this work is very simple and has good potential for further improvements.
The main advantage of the NN approach in comparison with other interpolation techniques for XC functionals is its flexibility to incorporate exchange-correlation data from different sources, such as post-Hartree-Fock and quantum Monte Carlo.It is possible that application of the NN to interpolate high-level XC quantum data could eliminate many heuristics used in the traditional construction of XC functionals.

FIG. 2 .
FIG. 2. The results of applying NN XC functionals to the self-consistent electron density obtained from solving the Schrodinger equation in the framework of DFT for 3 electrons in a simple harmonic potential.LDA and GGA "true" V xc results are produced by the Octopus package.All presented images are slices of 3D data in the z = 0 plane; (a) -the slice of the electron density obtained in the DFT calculation using LDA XC funtional, (b) -the slice of the XC potential obtained in the same as (a) DFT calculation using LDA XC functional, (c) -the slice of the NN XC potential obtained from the (a) electron density, (d) -the slice of the electron density obtained in the DFT calculation using GGA XC funtional, (e) -the slice of the XC potential obtained in the same as (d) DFT calculation using GGA XC functional, (f) -the slice of the NN XC potential obtained from the (d) electron density.

FIG. 3 .
FIG. 3. The results of applying NN XC functionals to the self-consistent electron density obtained from solving the Schrodinger equation in the framework of DFT for a Si diamond structure with 8 basis atoms in a cubic cell.The LDA and GGA "true" V xc results are produced by the Octopus package.All presented images are slices of 3D data in the (001) plane defined by the Miller indices; (a) -the slice of the electron density obtained in the DFT calculation using LDA XC funtional, (b) -the slice of the XC potential obtained in the same as (a) DFT calculation using LDA XC functional, (c) -the slice of the NN XC potential obtained from the (a) electron density, (d) -the slice of the electron density obtained in the DFT calculation using GGA XC funtional, (e) -the slice of the XC potential obtained in the same as (d) DFT calculation using GGA XC functional, (f) -the slice of the NN XC potential obtained from the (d) electron density.

FIG. 4 .
FIG. 4. The results of applying NN XC functionals to the self-consistent electron density obtained from solving the Schrodinger equation in the framework of DFT for a benzene molecule.The LDA and GGA "true" V xc results are procuced by the Octopus package.All presented images are slices of 3D data in the plane of the benzene molecule; (a) -the slice of the electron density obtained in the DFT calculation using LDA XC funtional, (b) -the slice of the XC potential obtained in the same as (a) DFT calculation using LDA XC functional, (c) -the slice of the NN XC potential obtained from the (a) electron density, (d) -the slice of the electron density obtained in the DFT calculation using GGA XC funtional, (e) -the slice of the XC potential obtained in the same as (d) DFT calculation using GGA XC functional, (f) -the slice of the NN XC potential obtained from the (d) electron density.

TABLE II .
Summary of training results for NN LDA

TABLE III
The results obtained from the developed NN XC potentials show that NN LDA works well, provided that the electron density values are presented in the training/test dataset.Based on table II, the NN LDA MAE is rather small for silicon and becomes larger for benzene, in which the electron density contains values that are outside the training/test density range.It is not surprising that the NN GGA (see table