Ab initio calculation of real solids via neural network ansatz

Neural networks have been applied to tackle many-body electron correlations for small molecules and physical models in recent years. Here we propose an architecture that extends molecular neural networks with the inclusion of periodic boundary conditions to enable ab initio calculations of real solids. The accuracy of our approach is demonstrated on four different types of systems, namely the one-dimensional periodic hydrogen chain, two-dimensional graphene, the three-dimensional lithium hydride crystal, and the homogeneous electron gas, where the obtained results, e.g. total energies, dissociation curves, and cohesive energies, are competitive with many traditional ab initio methods. Moreover, electron densities of typical systems are calculated to provide physical intuition about various solids. Our method of extending a molecular neural network to periodic systems can be easily integrated into other neural network structures, highlighting a promising future for the ab initio solution of more complex solid systems using the neural network ansatz, and more generally endorsing the application of machine learning in materials simulation and condensed matter physics.


Introduction
Solving the many-body electronic structure of real solids from first principles is one of the grand challenges in condensed matter physics and materials science [1]. More accurate ab initio solutions can push the limit of our understanding of many fundamental and mysterious emergent phenomena, such as superconductivity, light-matter interaction, and heterogeneous catalysis, to name just a few [2]. The current workhorse method is density functional theory (DFT), whose accuracy depends quite sensitively on the choice of the so-called exchange-correlation functional, and unfortunately there is no systematic route towards the exact functional [3,4]. Other commonly used ab initio quantum chemistry methods, such as the coupled-cluster and configuration interaction theories [5], can provide more accurate solutions for molecules but face severe difficulties when applied to solid systems. Recently, several breakthroughs have been made in applying these quantum chemistry methods to solids [6,7], driving the study of solid systems towards a new frontier.
Meanwhile, in the last few years, new attempts to tackle the correlated wavefunction problem in molecules or model Hamiltonians using neural network based approaches have been reported by different groups [8-14]. The key idea is to use the neural network as the wavefunction ansatz in quantum Monte Carlo (QMC) simulations. The stochastic nature of QMC enables considerably economical time scaling [6,15-17] and efficient parallelization. The universal approximation capability behind the neural network ansatz significantly improves the accuracy of the traditional QMC method, and the strategy has proved successful in studying small molecules [10-12]. However, how to apply such a neural network ansatz to real solids, i.e. how to impose periodic boundary conditions (PBC) in the neural network, and whether it can describe the long-range electron correlations in extended systems, remain open questions.
Here we propose a powerful periodic neural network ansatz, based on which we develop a highly efficient QMC method for ab initio calculation of real solids and general periodic systems with unprecedented accuracy. We apply our method to periodic hydrogen chains, graphene, lithium hydride (LiH) crystal, and the homogeneous electron gas. These systems cover a wide range of interests, including materials dimensions from one to three, electronic structures from metallic to insulating, and bonding types from covalent to ionic. The calculated dissociation curves, cohesive energies, and correlation energies compare satisfactorily with available experimental values and other state-of-the-art computational approaches. Electron densities of typical systems are further calculated to test our neural network and explore the underlying physics. All the results demonstrate that our method achieves accurate electronic structure calculations of real solid/periodic systems.

Neural network for solid systems
Periodicity and anti-symmetry are two fundamental properties of the wavefunction of a solid system. Anti-symmetry can be ensured by the Slater determinant, which has been commonly used as the basic block in molecular neural networks. We likewise express the wavefunction by two Slater determinants, one for the spin-up channel and one for the spin-down channel. In this regard, our ansatz resembles the structure of FermiNet [10,11], whereas other neural network wavefunction ansatz may include extra terms in addition to the Slater determinants [12]. Each determinant is then constructed from a set of periodic orbitals, which inherit the physics captured by the Bloch functions: each orbital is a product of a plane-wave phase factor e^{ik⋅r} and a collective molecular orbital u_mol. Fig. 1 displays more details on the structure of our neural network.

Fig. 1 | Structure of the periodic neural network. The electron coordinates r are passed to two channels. In the first one, they build the periodic distance features d(r) using the periodic metric matrix M and the lattice vectors a_i, and the d(r) features are then fed into two molecular neural networks that represent the real and the imaginary parts of the wavefunction, respectively. In the second channel, r constructs the plane-wave phase factors on a selected set of crystal momentum vectors k. The total wavefunction of the solid is constructed from the two channels following the expression of Eq. (1).

Building an efficient and accurate periodic ansatz is the key step in developing ab initio methods for solids. Here we have followed the recently proposed scheme of Whitehead et al. [18] to construct a set of periodic distance features d(r) using the lattice vectors in real and reciprocal space (a_i, b_i). The periodic metric matrix M is constructed from periodic functions that retain the ordinary distance near the origin and regulate it to a periodic form at far distances, ensuring the asymptotic cusp form, continuity, and periodicity requirements at the same time.
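As a concrete illustration of this construction, the sketch below maps a displacement to scaled fractional coordinates via the reciprocal lattice and applies a componentwise periodic function before mapping back to Cartesian space. The choice f(ω) = sin(ω) is a simple stand-in for the smoother functions of Ref. [18], used here only to show how periodicity and the correct near-origin behavior can coexist:

```python
import numpy as np

def periodic_distance(r, lattice):
    """Periodic analogue of the Euclidean norm |r| (illustrative sketch).

    lattice: (3, 3) array whose rows are the lattice vectors a_i.
    sin(omega) behaves like omega near the origin and is 2*pi-periodic;
    the functions of Ref. [18] are smoother, so this is only a stand-in.
    """
    # Rows b_j of recip satisfy a_i . b_j = 2*pi*delta_ij.
    recip = 2 * np.pi * np.linalg.inv(lattice).T
    omega = recip @ r                 # fractional coordinates scaled by 2*pi
    # Map back to Cartesian space with a componentwise periodic function.
    r_per = (np.sin(omega) @ lattice) / (2 * np.pi)
    return np.linalg.norm(r_per)
```

Any lattice translation shifts ω by a multiple of 2π and leaves the feature unchanged, while for small |r| the feature reduces to the ordinary distance.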
The constructed periodic distance features d(r) can then be fed into molecular neural networks to form the collective orbitals u_mol. Specifically, in this work we represent the molecular networks with FermiNet [10], which incorporates the electron-electron interactions. The inclusion of all-electron features promotes the traditional single-particle orbitals to collective ones, so the description of the wavefunction and correlation effects is improved while fewer Slater determinants are required. In addition, the wavefunction of a solid system is necessarily complex-valued, and we introduce two sets of molecular orbitals to represent the real and imaginary parts of the solid wavefunction, respectively. The plane-wave phase factors e^{ik⋅r} in Fig. 1 are used to construct the Bloch-function-like orbitals, and the corresponding k points are selected to minimize the Hartree-Fock (HF) energy.
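A minimal numerical sketch of such complex Bloch-function-like orbitals follows; the envelope function here is a toy periodic stand-in for the collective orbital u_mol, and the momentum grid and twist are arbitrary illustrative choices, not the paper's settings:

```python
import numpy as np

L = 3.0  # toy cubic supercell edge
# Orbital momenta: supercell reciprocal grid points offset by a common twist k_s.
twist = np.array([0.3, 0.0, 0.0])
ks = 2 * np.pi / L * np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]) + twist

def envelope(r):
    # Stand-in for u_mol: any smooth function with the periodicity of the
    # lattice (the real collective orbitals are far more expressive).
    return 1.0 + 0.3 * np.cos(2 * np.pi * r / L).sum(axis=-1)

def wavefunction(elec):
    """elec: (4, 3) electron coordinates; returns det[e^{i k_i . r_j} u(r_j)]."""
    phase = np.exp(1j * elec @ ks.T)                 # (n_elec, n_orb)
    orbitals = (phase * envelope(elec)[:, None]).T   # orbital i at electron j
    return np.linalg.det(orbitals)
```

Translating one electron by a supercell lattice vector multiplies the determinant by the common phase e^{i k_s·L_s} (the twisted boundary condition), while exchanging two electrons flips its sign.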
Based on the variational principle, our neural network is trained using the variational Monte Carlo (VMC) approach. To optimize the network efficiently, a modified Kronecker-factored approximate curvature (KFAC) optimizer [19] is adopted, which significantly outperforms traditional energy minimization algorithms. Calculations are further accelerated by efficient and massive parallelization on multiple nodes of high-performance GPUs. More details on the theories, methods, and computations are included in the Methods section and the supplementary information.

Hydrogen chain
Hydrogen chain is one of the simplest models in condensed matter research. Despite its simplicity, the hydrogen chain is a challenging and interesting system, serving as a benchmark for electronic structure methods and featuring intriguing correlated phenomena [20]. The calculated energy of the periodic H10 chain as a function of the bond length is shown in Fig. 2a. The results from lattice-regularized diffusion Monte Carlo (LR-DMC) and traditional VMC are also plotted for comparison [20]. Our results nearly coincide with the LR-DMC results and significantly outperform traditional VMC (see Supplementary Table 3). In Fig. 2b, the energies of hydrogen chains with different atom numbers are calculated for extrapolation to the thermodynamic limit (TDL). The shaded bar in Fig. 2b illustrates the extrapolated energy of the periodic hydrogen chain at the TDL from auxiliary field quantum Monte Carlo (AFQMC), which is considered the current state of the art along with LR-DMC. Our TDL result is comparable with both AFQMC and LR-DMC (see Supplementary Table 4).

Fig. 2 | Results on the benchmark systems; our results are all labeled as Net. a, H10 dissociation curve. b, Energy of hydrogen chains of different sizes N, with the bond length fixed at 1.8 Bohr; LR-DMC and VMC both use the TZ-LDA basis and AFQMC is pushed to the complete basis limit [20]. c, Structure of graphene. d, Cohesive energy per atom of graphene at the Γ point and with the finite-size error corrected; the experimental cohesive energy is from Ref. [21], and graphene is calculated at its equilibrium bond length 1.421 Å. e, Structure of the rock-salt lithium hydride crystal. f, Equation of state of the LiH crystal, with the fitted Birch-Murnaghan parameters and experimental data also given; HF corrections are calculated using the cc-pVDZ basis, HF_∞ is approximated by HF at N = 8, and the arrows denote the corresponding HF corrections. g, Plot of the homogeneous electron gas system. h, Correlation error of the 54-electron HEG system at different r_s; DCD and BF-VMC results are displayed for comparison, and BF-DMC data is used as the reference [22,23].

Graphene
Graphene is arguably the most famous two-dimensional system (Fig. 2c), receiving broad attention in the past two decades for its mechanical, electronic, and chemical applications [24]. Here we carry out simulations to estimate its cohesive energy, which measures the strength of the C-C chemical bonding and long-range dispersion interactions. The calculations are performed on a 2 × 2 supercell of graphene using the twist average boundary condition (TABC) [25] in conjunction with the structure factor S(k) correction [26] (see Supplementary Fig. 3) to reduce the finite-size error. The calculated results are plotted in Fig. 2d along with the experimental value [21], showing that our neural network can treat graphene very well, producing a cohesive energy within 0.1 eV/atom of the experimental reference (see Supplementary Table 6). We also plot the result with simple PBC, namely the Γ-point-only result, which deviates from the experimental data by 1.25 eV/atom.

Lithium hydride crystal
For a three-dimensional system, we consider the LiH crystal with a rock-salt structure (Fig. 2e), another benchmark system for accurate ab initio methods [6,27,28]. Despite consisting of only simple elements, LiH represents typical ionic and covalent bondings upon changing the lattice constant. Using our neural network, we first simulate the equation of state of LiH on a 2 × 2 × 2 supercell, as shown in Fig. 2f. In addition, we employ a standard finite-size correction based on Hartree-Fock calculations of a large supercell (see Supplementary Fig. 5). In Fig. 2f we also show the Birch-Murnaghan fit to the equation of state, from which we can obtain thermodynamic quantities such as the cohesive energy, the bulk modulus, and the equilibrium lattice constant of LiH. As shown in the inset, our results for the thermodynamic quantities agree very well with experimental data [27] (see Supplementary Tables 8 and 9). For further validation, we have also computed directly the 3 × 3 × 3 supercell of LiH at its equilibrium lattice constant 4.061 Å, which contains 108 electrons. To the best of our knowledge, this is the largest electronic system computed using a high-quality neural network ansatz. The 3 × 3 × 3 supercell calculation predicts a total energy per unit cell of LiH of −8.160 Hartree and a cohesive energy per unit cell of −4.770 eV after the finite-size correction (see Supplementary Table 10), which is also very close to the experimental value of −4.759 eV [27].

Homogeneous electron gas
In addition to real solids, our computational framework applies straightforwardly to model systems such as the homogeneous electron gas (HEG). The HEG has long been studied to understand the fundamental behavior of metals and electronic phase transitions [29]. Several seminal ab initio works have reported the energy of the HEG at different densities [22,23,29-31], and recently more investigations have been conducted using neural network ansatz [30,31]. Here we broaden our tests to simulate a simple cubic cell containing 54 electrons in a closed-shell configuration (Fig. 2g). Fig. 2h shows the correlation energy error from our neural network calculations on the HEG at six different densities from r_s = 0.5 Bohr to 20.0 Bohr. The state-of-the-art results, namely VMC with backflow correlation (BF) [22] and distinguishable cluster with double excitations (DCD) [23], are also plotted for comparison, and the most accurate BF-DMC result is used as the reference for the correlation error. Overall, our neural network performs very well, with an error of less than 1% over a wide range of densities (see Supplementary Table 11). Generally, the correlation error increases as the density of the HEG decreases and the correlation effects become larger, a trend which also appears in the DCD calculations.

Electron density
Besides the total energy of solid systems, the electron density is also a key quantity to calculate. For example, the electron density is crucial for characterizing different solids, such as the difference between insulators and conductors, and the distinct nature of ionic and covalent crystals. The one-to-one correspondence between the many-body wavefunction and the electron density, proved by Hohenberg and Kohn in 1964 and underlying DFT, suggests that the electron density is a fundamental quantity of materials. However, an interesting survey found that while new functionals in DFT improve the energy calculation, the obtained density can somehow deviate from the exact one [32]. Here, with our accurate neural network wavefunction, we can also obtain accurate electron densities of solids and provide a valuable benchmark and guidance for method development.
A conductor features free-moving electrons, which undergo macroscopic movement under external electric fields. In contrast, in insulators the electrons are localized and constrained, causing considerable resistance. In Fig. 3, as an example, we show the calculated electron density of the hydrogen chain at two different bond lengths. For the compressed hydrogen chain (L = 2 Bohr), the electron density is rather uniform and has small fluctuations. As the chain is stretched, the electrons become more localized and the density profile shows larger variations. This observation is consistent with the well-known insulator-conductor transition of the hydrogen chain under varying bond length. To measure the transition more quantitatively, we further calculate the complex polarization Z as the order parameter of the insulator-conductor transition [33]. A conducting state is characterized by a vanishing complex polarization modulus |Z| ∼ 0, while an insulating state has finite |Z| ∼ 1. According to the calculated results, the conductor-insulator transition of the hydrogen chain occurs at a bond length of around 3 Bohr, which is also consistent with previous studies [33].
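The modulus of the complex polarization can be estimated directly from Monte Carlo samples. The sketch below (with hypothetical sample arrays, not the paper's data) returns |Z| ≈ 1 when the electrons are pinned in place and |Z| ≈ 0 when their summed coordinate is uniformly distributed:

```python
import numpy as np

def complex_polarization(samples, chain_length):
    """Estimate |Z| from Monte Carlo electron configurations.

    samples: (n_samples, n_elec) electron coordinates projected along the
    chain direction. Z = < exp(i * 2*pi/L * sum_j x_j) >; |Z| ~ 1 signals
    an insulating state, |Z| ~ 0 a conducting one.
    """
    phase = np.exp(1j * 2 * np.pi / chain_length * samples.sum(axis=1))
    return abs(phase.mean())
```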
Ionic and covalent bonds are the most fundamental chemical bonds in solids. While the physical pictures of these two types of bonding are very different, they both lie in the behavior of electrons as the "quantum glue", and the electron density distribution is a simple way to visualize different bonding types. Here we calculate the electron density of diamond-structured Si and of the rock-salt NaCl and LiH crystals at their equilibrium geometries. They are representative covalent and ionic crystals and have also been investigated with other high-level wavefunction methods, e.g. AFQMC [34]. Note that in the calculations of NaCl and Si, a correlation-consistent effective core potential (ccECP) is employed to reduce the cost; it removes the inert core electrons and keeps the behavior of the active valence electrons [13,35].
The electron density of diamond-structured Si in its (011) plane is plotted in Fig. 4b. Valence electrons are shared by nearest-neighbor Si atoms, forming apparent Si-Si covalent bonds. In contrast, valence electrons are localized around the atoms in the NaCl crystal, as Fig. 4c shows. All the valence electrons are attracted to the Cl atoms, forming effective Na+ and Cl− ions in the crystal. Moreover, the electron density of the LiH crystal is also calculated and plotted in Fig. 4d. The LiH crystal lies between a typical ionic and a typical covalent crystal. According to our network, the electrons are nearly equally distributed near the Li and H atoms. Detailed Bader charge analysis [36] shows that the atoms in the crystal become Li^0.67+ and H^0.67− ions, respectively (resolution ∼ 0.015 Å), which slightly deviates from the stable closed-shell configuration (see Supplementary Note 6 for more details).

Conclusion
The construction of a wavefunction for solid systems is a crucial but unsolved problem in the neural network community. The core mechanism of our neural network is the use of the periodic distance feature, which elegantly promotes molecular neural networks to their periodic counterparts and avoids time-consuming lattice summation. Considering the high-accuracy results obtained in this work, our neural network can be further applied to study more delicate physics and materials problems, such as phase transitions of solids, surfaces, interfaces, and disordered systems, to name just a few. Our ansatz also offers a flexible extension to other neural networks and an easy integration into traditional computational techniques. The naturally evolved many-body wavefunction from the neural network may provide more physical and chemical insights into emergent phenomena of complex materials.

Methods
Supercell approximation. Simulating a solid system requires solving the Schrödinger equation of many electrons within a large bulk. The supercell approximation is usually adopted to simplify the problem, considering a finite number of electrons and nuclei under periodic boundary conditions. The corresponding Hamiltonian reads (in atomic units)

Ĥ = −(1/2) Σ_i ∇²_{r_i} + Σ_{i<j} 1/|r_i − r_j| − Σ_{i,I} Z_I/|r_i − R_I| + Σ_{I<J} Z_I Z_J/|R_I − R_J|,   (3)

where r_i denotes the spatial position of the i-th electron in the supercell, R_I and Z_I denote the positions and charges of the nuclei, and the periodic Coulomb interactions are evaluated with the Ewald summation.
The supercell wavefunction is required to satisfy the twisted boundary condition

Ψ(r_1 + L_s, r_2, …, r_N) = e^{i k_s ⋅ L_s} Ψ(r_1, r_2, …, r_N)   (4)

for any supercell lattice vector L_s, where k_s and k_p denote the momentum vectors reduced into the first Brillouin zone of the supercell and of the primitive cell, respectively. Eq. (4) and the anti-symmetry condition together form the fundamental requirements on Ψ. As the size of the supercell increases, the simulation results gradually converge to the thermodynamic limit of the real solid system.

Wavefunction ansatz.
In conventional QMC simulations of solids, a Hartree-Fock type wavefunction ansatz composed of Bloch functions is often used, which reads

Ψ_HF(r_1, …, r_N) = det[e^{i k_i ⋅ r_j} u_{k_i}(r_j)].   (5)

In order to satisfy Eq. (4), the momentum vectors k_i in the determinant should lie on the grid of supercell reciprocal lattice vectors offset by k_s within the first Brillouin zone of the primitive cell. Moreover, the functions u_{k_i} in Eq. (5) should satisfy the translation-invariance condition with respect to the primitive-cell lattice vectors L_p,

u_{k_i}(r + L_p) = u_{k_i}(r).   (6)

Following the strategy of FermiNet [10], the Bloch functions in Eq. (5) can be promoted with collective distances,

ψ_i(r_j) = e^{i k_i ⋅ r_j} u_{k_i}(r_j; {r_{/j}}),   (7)

where {r_{/j}} denotes all the electron coordinates except r_j. These collective orbitals are constructed to be equivariant under electron permutations, which, combined with the Slater determinant, ensures the antisymmetric nature of the electrons. Moreover, we use the periodic distance features d(r) of Eq. (2) to substitute the ordinary distances |r| in the molecular neural network. The periodic functions used in Eq. (2) have their arguments reduced into [−π, π], so that Eq. (6) is satisfied without causing any discontinuity [18]. For an overall sketch of the neural network, see Algorithm 1. Note that the distances between electrons and nuclei are omitted for the HEG system since it does not contain any nucleus. Specific hyperparameters of each system are listed in Supplementary Note 1.

Neural network optimization.
Parameters θ of the neural network are optimized to minimize the energy expectation value ⟨E⟩, whose gradient reads

∇_θ⟨E⟩ = 2 Re ⟨(E_L − ⟨E_L⟩) ∇_θ ln Ψ*⟩,

where E_L = Ψ^{−1}ĤΨ denotes the local energy of the neural network ansatz Ψ. Besides plain energy minimization, stochastic reconfiguration optimization [37] has also been widely adopted and proved to be much more efficient; its preconditioned gradient reads

δθ = F^{−1} ∇_θ⟨E⟩,

with F the Fisher information matrix of the distribution |Ψ|². In this work, we adopt a modified KFAC optimizer, which approximates the block of F belonging to layer l as

F_l = E[g_l g_l†] − E[g_l] E[g_l]† ≈ E[a_l a_lᵀ] ⊗ E[s_l s_l†],   (12)

where g_l = vec(∇_{W_l} ln Ψ*), W_l denotes the weight parameters of layer l, and vec denotes the vectorized form. Here a_l and s_l denote the activation and sensitivity of layer l, respectively; since the per-sample gradient factorizes as ∇_{W_l} ln Ψ* = s_l a_lᵀ for a dense layer, g_l = a_l ⊗ s_l, which motivates the factorized form. Note that the activation is always real-valued, which explains the absence of conjugation on a_l. The first term in Eq. (12) is approximated as the Kronecker product of the expectation values, and the second term is omitted for simplification.
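The Kronecker factorization behind Eq. (12) can be demonstrated in a few lines. For a single sample the factorization is exact, since vec(s aᵀ) = a ⊗ s; this toy version ignores the mean-subtraction term and all of KFAC's damping and averaging machinery:

```python
import numpy as np

def kfac_block(acts, sens):
    """Kronecker-factored approximation of one layer's curvature block (sketch).

    acts: (batch, n_in) layer inputs a_l; sens: (batch, n_out) back-propagated
    sensitivities s_l. The exact block is E[vec(g) vec(g)^T] with per-sample
    gradient g = s a^T; KFAC swaps the expectation of a Kronecker product for
    the Kronecker product of expectations: F_l ~= E[a a^T] (x) E[s s^T].
    """
    A = acts.T @ acts / len(acts)
    S = sens.T @ sens / len(sens)
    return np.kron(A, S)
```

Inverting the two small factors A and S instead of the full block is what makes the natural-gradient step affordable for large layers.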
Twist average boundary condition. TABC is a conventional technique to reduce the finite-size error due to the constrained size of the supercell [25]. It averages the contributions from different twists k_s of the supercell and improves the convergence of the total energy. The formula reads

E_TABC = (1/Ω_B) ∫_{1.B.Z.} dk_s E(k_s),

where 1.B.Z. denotes the first Brillouin zone of the supercell, Ω_B its volume, and E(k_s) the energy at twist k_s. In practice, the integral is approximated by a discrete sum over a Monkhorst-Pack mesh (see Supplementary Note 3.3 for more details).
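A minimal sketch of the discrete twist average over a Monkhorst-Pack mesh follows; a toy energy function stands in for a real QMC evaluation, and real calculations typically fold symmetry-equivalent twists together with multiplicity weights instead of the plain mean used here:

```python
import numpy as np
from itertools import product

def monkhorst_pack_twists(n, recip):
    """n x n x n Monkhorst-Pack twist mesh in the supercell Brillouin zone.

    recip: (3, 3) array whose rows are the supercell reciprocal lattice vectors.
    """
    fracs = [(2 * i - n + 1) / (2 * n) for i in range(n)]
    grid = np.array(list(product(fracs, repeat=3)))  # (n^3, 3) fractional twists
    return grid @ recip

def tabc_energy(energy_fn, twists):
    # Uniform weights over the full mesh; symmetry folding would replace
    # this plain mean with multiplicity-weighted averaging.
    return np.mean([energy_fn(k) for k in twists])
```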
Structure factor correction. The finite-size error can be further reduced via the structure factor S(k) correction [26], which is usually calculated to correct the exchange-correlation potential V_xc. The correction reads

ΔV_xc = (2π/Ω) lim_{k→0} S(k)/k²,

where Ω is the supercell volume and the limit lim_{k→0} is practically estimated via interpolation of S(k) (see Supplementary Note 3.4 for more details).
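The structure factor entering the correction can be accumulated from the same Monte Carlo samples used for the energy. A minimal estimator is sketched below (the k → 0 interpolation that yields the final ΔV_xc is not shown):

```python
import numpy as np

def structure_factor(samples, k):
    """Monte Carlo estimate of S(k) = (<rho_k rho_-k> - |<rho_k>|^2) / N_e,
    with rho_k = sum_i exp(i k . r_i).

    samples: (n_samples, n_elec, 3) electron configurations; k: (3,) vector.
    """
    rho = np.exp(1j * samples @ k).sum(axis=1)  # rho_k for each sample
    n_e = samples.shape[1]
    return (np.mean(np.abs(rho) ** 2) - np.abs(rho.mean()) ** 2) / n_e
```

For a perfectly frozen configuration the density does not fluctuate and S(k) vanishes, while uncorrelated electrons give S(k) ≈ 1 at large k.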
Empirical correction formula. Empirical formulas are also commonly employed to reduce the finite-size error [16], one of which reads

E(TDL) ≈ E_Net(N) + [E_HF(∞) − E_HF(N)].

The simulation size of high-accuracy methods is usually limited by the high computational cost. Hence methods with much more favorable time scaling, such as HF, are usually used to give a posterior estimation of the finite-size error. All the results of LiH are corrected using this empirical formula with HF results in the cc-pVDZ basis (see Supplementary Note 4.3 for more details).
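The correction itself is simple arithmetic; the sketch below applies it with made-up placeholder energies (all values hypothetical, in Hartree, not the paper's results):

```python
def hf_corrected_energy(e_net_n, e_hf_n, e_hf_tdl):
    """Empirical finite-size correction:
    E(TDL) ~= E_Net(N) + [E_HF(TDL) - E_HF(N)].

    All inputs in Hartree; the caller supplies the neural network energy of
    the N-electron supercell and HF energies at size N and near the TDL.
    """
    return e_net_n + (e_hf_tdl - e_hf_n)

# Hypothetical numbers purely for illustration:
corrected = hf_corrected_energy(-8.10, -7.90, -7.96)  # -> -8.16
```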

Electron density analysis. The electron density ρ(r) is defined as

ρ(r) = ⟨Ψ| Σ_i δ(r − r_i) |Ψ⟩ / ⟨Ψ|Ψ⟩,

and it is practically evaluated by accumulating Monte Carlo samples of the electrons on a uniform grid over the simulation cell. The complex polarization Z is defined as [33]

Z = ⟨exp(i (2π/L) Σ_j r_j^∥)⟩,

where r_j^∥ denotes the projection of the electron coordinate along the chain direction and L is the length of the chain. Moreover, Bader charge analysis is employed to estimate the charge partition on each atom [36]. The convergence test of the Bader charge is shown in Supplementary Fig. 8.

Workflow and computational details. This work is developed upon the open-source FermiNet and PyQMC codes on Github; the deep learning framework JAX [38] is used, which supports flexible and powerful complex-number calculation. Ground-state energy calculations are performed with all electrons. Diamond-structured Si and the NaCl crystal are simulated with ccECP[Ne] [35]. The neural network is pretrained to a Hartree-Fock ansatz obtained with the PySCF software [39]. All expectation values over the distribution |Ψ|² are evaluated via the Monte Carlo approach, and the energy and wavefunction are then optimized using the modified KFAC optimizer (see Supplementary Figs. 1, 2, 4, 6, 7). The Ewald summation technique is implemented for the lattice summation in the energy calculation. After the training has converged, the energy is calculated in a separate inference phase. The code developed in this work is available on Github at https://github.com/bytedance/DeepSolid.

Supplementary Fig. 1. The correlation error is defined as (E − E_DMC)/(E_HF − E_DMC), where E_HF is calculated using the cc-pVDZ basis set and E_DMC is taken from Ref. [1].
and the weight factors originate from the different numbers of symmetry-equivalent k points.

Supplementary Note 3.3 Training curves
Training curves are plotted in Supplementary Fig. 2.

TABC is combined with the structure factor correction [3], and the combination is now seen as the standard scheme of applying QMC to solids. The structure factor S(k) is calculated to correct the exchange-correlation part V_xc of the total potential energy, which reads

ΔV_xc = (2π/Ω) lim_{k→0} S(k)/k²,   S(k) = (⟨ρ_k ρ_{−k}⟩ − ⟨ρ_k⟩⟨ρ_{−k}⟩)/N_e,   ρ_k = Σ_i e^{i k ⋅ r_i},

where r_i refers to the coordinate of each electron and N_e denotes the number of electrons in the simulation cell. The calculated S(k) and the corresponding ΔV_xc at the Γ point are plotted in Supplementary Fig. 3, and the corrections of all twists are quite close to each other. The final correction from the structure factor is 0.00122 Hartree/atom.

Supplementary Note 4.1 Geometry
Lithium hydride crystal has a rock-salt structure, whose lattice vectors and atom positions are given in Supplementary Table 7. The equation of state is fitted with the third-order Birch-Murnaghan form

E(V) = E_0 + (9 V_0 B_0 / 16) { [(V_0/V)^{2/3} − 1]³ B_0' + [(V_0/V)^{2/3} − 1]² [6 − 4 (V_0/V)^{2/3}] },

where E_0, V_0, B_0, B_0' are fitted quantities; their results and the corresponding experimental data [4] are listed in Supplementary Table 9. The training curve of the 3 × 3 × 3 LiH crystal at its equilibrium lattice constant L = 4.061 Å is plotted in Supplementary Fig. 6, and the corresponding Hartree-Fock corrections are also given. The final inference results from the neural network are listed in Supplementary Table 10.
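A Birch-Murnaghan fit of this kind can be reproduced with a standard least-squares routine; the sketch below fits synthetic data generated from the third-order form (the parameter values are made-up placeholders, not the paper's fitted results):

```python
import numpy as np
from scipy.optimize import curve_fit

def birch_murnaghan(v, e0, v0, b0, b0p):
    """Third-order Birch-Murnaghan equation of state E(V)."""
    eta = (v0 / v) ** (2.0 / 3.0)
    return e0 + 9.0 * v0 * b0 / 16.0 * (
        (eta - 1.0) ** 3 * b0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )

# Synthetic stand-in for the computed E(V) points (hypothetical values:
# E0 in Hartree, V0 in Bohr^3, B0 in Hartree/Bohr^3, B0' dimensionless).
true = (-8.16, 16.9, 0.012, 3.5)
vols = np.linspace(13.0, 21.0, 9)
energies = birch_murnaghan(vols, *true)

# Recover (E0, V0, B0, B0') from the data with a rough initial guess.
fit, _ = curve_fit(birch_murnaghan, vols, energies, p0=(-8.0, 16.0, 0.01, 4.0))
```

On noiseless data the fit recovers the generating parameters; with real Monte Carlo data the stochastic error bars of each E(V) point should be propagated into the fit.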

Supplementary Note 5.1 Training curve
The training curve of the HEG system containing 54 electrons is plotted in Supplementary Fig. 7. The HF and DMC energies are taken from Ref. [5]. Final results of the neural network, BF-DMC, BF-VMC and DCD [5,6] are listed in Supplementary Table 11.

Supplementary Note 6. Bader charge analysis
Detailed Bader charge analysis [7] is applied to the conventional LiH crystal, and the result is plotted in Supplementary Fig. 8.