Quantum machine learning for electronic structure calculations

Considering recent advancements and successes in the development of efficient quantum algorithms for electronic structure calculations—alongside impressive results using machine learning techniques for computation—hybridizing quantum computing with machine learning for the intent of performing electronic structure calculations is a natural progression. Here we report a hybrid quantum algorithm employing a restricted Boltzmann machine to obtain accurate molecular potential energy surfaces. By exploiting a quantum algorithm to help optimize the underlying objective function, we obtained an efficient procedure for the calculation of the electronic ground state energy for a small molecule system. Our approach achieves high accuracy for the ground state energy for H2, LiH, H2O at a specific location on its potential energy surface with a finite basis set. With the future availability of larger-scale quantum computers, quantum machine learning techniques are set to become powerful tools to obtain accurate values for electronic structures.

M achine learning techniques are demonstrably powerful tools displaying remarkable success in compressing high dimensional data 1,2 . These methods have been applied to a variety of fields in both science and engineering, from computing excitonic dynamics 3 , energy transfer in lightharvesting systems 4 , molecular electronic properties 5 , surface reaction network 6 , learning density functional models 7 to classify phases of matter, and the simulation of classical and complex quantum systems [8][9][10][11][12][13][14] . Modern machine learning techniques have been used in the state space of complex condensed-matter systems for their abilities to analyze and interpret exponentially large data sets 9 and to speed-up searches for novel energy generation/ storage materials 15,16 .
Quantum machine learning 17 -hybridization of classical machine learning techniques with quantum computation-is emerging as a powerful approach allowing quantum speed-ups and improving classical machine learning algorithms [18][19][20][21][22] . Recently, Wiebe et al. 23 have shown that quantum computing is capable of reducing the time required to train a restricted Boltzmann machine (RBM), while also providing a richer framework for deep learning than its classical analog. The standard RBM models the probability of a given configuration of visible and hidden units by the Gibbs distribution with interactions restricted between different layers. Here, we focus on an RBM where the visible and hidden units assume {+1,−1} forms 24,25 .
Accurate electronic structure calculations for large systems continue to be a challenging problem in the field of chemistry and material science. Toward this goal-in addition to the impressive progress in developing classical algorithms based on ab initio and density functional methods-quantum computing based simulation have been explored [26][27][28][29][30][31] . Recently, Kivlichan et al. 32 show that using a particular arrangement of gates (a fermionic swap network) is possible to simulate electronic structure Hamiltonian with linear depth and connectivity. These results present significant improvement on the cost of quantum simulation for both variational and phase estimation based quantum chemistry simulation methods.
Recently, Troyer and coworkers proposed using a restricted Boltzmann machine to solve quantum many-body problems, for both stationary states and time evolution of the quantum Ising and Heisenberg models 24 . However, this simple approach has to be modified for cases where the wave function's phase is required for accurate calculations 25 .
Herein, we propose a three-layered RBM structure that includes the visible and hidden layers, plus a new layer correction for the signs of coefficients for basis functions of the wave function. We will show that this model has the potential to solve complex quantum many-body problems and to obtain very accurate results for simple molecules as compared with the results calculated by a finite minimal basis set, STO-3G. We also employed a quantum algorithm to help the optimization of training procedure.

Results
Three-layers restricted Boltzmann machine. We will begin by briefly outlining the original RBM structure as described by 24 . For a given Hamiltonian, H, and a trial state, jϕi ¼ P x ϕðxÞjxi, the expectation value can be written as: 24 where ϕ(x) = 〈x|ϕ〉 will be used throughout this letter to express the overlap of the complete wave function with the basis function |x〉, ϕðxÞ is the complex conjugate of ϕ(x). We can map the above to a RBM model with visible layer units σ z 1 ; σ z 2 ::: σ z n and hidden layer units h 1 , h 2 ... h m with σ z i , h j ∈ {−1, 1}. We use a visible unit σ z i to represent the state of a qubit i -when σ z i ¼ 1; jσ z i i represents the qubit i in state j1i and when σ z i ¼ À1; jσ z i i represents the qubit i in state j0i. The total state of n qubits is represented by the basis jxi ¼ σ z 1 σ z 2 :::σ z n . ϕðxÞ ¼ ffiffiffiffiffiffiffiffiffi PðxÞ p where P(x) is the probability for x from the distribution determined by the RBM. The probability of a specific set x ¼ fσ z 1 ; σ z 2 :::σ z n g is: Within the above a i and b j are trainable weights for units σ z i and h j . w ij are trainable weights describing the connections between σ z i and h j (see Fig. 1.) By setting 〈H〉 as the objective function of this RBM, we can use the standard gradient decent method to update parameters, effectively minimizing 〈H〉 to obtain the ground state energy.
However, previous prescriptions considering the use of RBMs for electronic structure problems have found difficulty as ϕ(x) can only be non-negative values. We have thus appended an additional layer to the neural network architecture to compensate for the lack of sign features specific to electronic structure problems.
We propose an RBM with three layers. The first layer, σ z , describes the parameters building the wave function. The h's within the second layer are parameters for the coefficients for the wave functions and the third layer s, represents the signs associated |x〉: The s uses a non-linear function tanh to classify whether the sign should be positive or negative. Because we have added another function for the coefficients, the distribution is not solely decided by RBM. We also need to add our sign function into the distribution. Within this scheme, c is a regulation and d i are weights for σ z i . (see Fig. 1). Our final objective function, now with Fig. 1 Constructions of restricted Boltzmann machine. a The original restricted Boltzmann machine (RBM) structure with visible σ i z and hidden h j layers. b Improved RBM structure with three layers, visible, hidden and sign. a i , w ij , b j , d i , c are trainable weights describing the different connection between layers ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-06598-z jϕi ¼ P x ϕðxÞsðxÞjxi, becomes: After setting the objective function, the learning procedure is performed by sampling to get the distribution of ϕ(x) and calculating to get s(x). We then proceed to calculate the joint distribution determined by ϕ(x) and s(x). The gradients are determined by the joint distribution and we use gradient decent method to optimize 〈H〉 (see Supplementary Note 1). Calculating the the joint distribution is efficient because s(x) is only related to x.
Electronic structure Hamiltonian preparation. The electronic structure is represented by N single-particle orbitals which can be empty or occupied by a spinless electron: 33 where h ij and h ijkl are one and two-electron integrals. In this study we use the minimal basis (STO-3G) to calculate them. a y j and a j are creation and annihilation operators for the orbital j.
Equation (5) is then transformed to Pauli matrices representation, which is achieved by the Jordan-Wigner transformation 34 . The final electronic structure Hamiltonian takes the general form where σ x , σ y , σ z are Pauli matrices and I is the identity matrix: 35 Quantum algorithm to sample Gibbs distribution. We propose a quantum algorithm to sample the distribution determined by RBM. The probability for each combination y = {σ z , h} can be written as: Instead of P(y), we try to sample the distribution Q(y) as: where k is an adjustable constant with different values for each iteration and is chosen to increase the probability of successful sampling. In our simulation, it is chosen as O ÀP i;j jw ij j Á . We employed a quantum algorithm to sample the Gibbs distribution from the quantum computer. This algorithm is based on sequential applications of controlled-rotation operations, which tries to calculate a distribution Q′(y) ≥ Q(y) with an ancilla qubit showing whether the sampling for Q(y) is successful 23 .
This two-step algorithm uses one system register (with n + m qubits in use) and one scratchpad register (with one qubit in use) as shown in Fig. 2.
All qubits are initialized as |0〉 at the beginning. The first step is to use R y gates to get a superposition of all combinations of {σ z , h} where =k and |y〉 corresponds to the combination jyi ¼ jσ z 1 :::σ z n h 1 :::h m iÀ as before when h j ¼ 1; jhji represents the corresponding qubit in state j1i and when h j ¼ À1; jhji represents the corresponding qubit in state j0i.
The second step is to calculate e w ij σ z i h j . We use controlledrotation gates to achieve this. The idea of sequential controlledrotation gates is to check whether the target qubit is in state |0〉 or state |1〉 and then rotate the corresponding angle (Fig. 2). If qubits σ z i h j are in |00〉 or |11〉, the ancilla qubit is rotated by R y (θ ij,1 ) and otherwise by R y (θ ij,2 ), with θ ij;1 ¼ 2arcsin . Each time after one e w ij σ z i h j is calculated, we do a measurement on the ancilla qubit. If it is in |1〉 we continue with a new ancilla qubit initialized in |0〉, otherwise we start over from the beginning (details in Supplementary Note 2).
After we finish all measurements the final states of the first m + n qubits follow the distribution Q(y). We just measure the Fig. 2 The example circuit for the second step of the controlled-rotation gate approach with measurements first n + m qubits of the system register to obtain the probability distribution. After we get the distribution, we calculate all probabilities to the power of k and normalize to get the Gibbs distribution (Fig. 3). The complexity of gates comes to O(mn) for one sampling and the qubits requirement comes to O(mn). If considering the reuse of ancilla qubits, the qubits requirements reduce to O(m + n) (see Supplementary Note 4). The probability of one successful sampling has a lower bound e À1 k P i;j 2jw ij j and if k is set to O P i;j jw ij j it has constant lower bound (see Supplementary Note 3). If N s is the number of successful sampling to get the distribution, the complexity for one iteration should be O(N s mn) due to the constant lower bound of successful sampling as well as processing distribution taking O(N s ). In the meantime, the exact calculation for the distribution has complexity as O(2 m+n ). The only error comes from the error of sampling if not considering noise in the quantum computer.
Summary of numerical results. We now present the results derived from our RBM for H 2 , LiH and H 2 O molecules. It can clearly be seen from Fig. 4 that our three layer RBM yields very accurate results comparing to the disorganization of transformed Hamiltonian which is calculated by a finite minimal basis set, STO-3G. Points deviating from the ideal curve are likely due to local minima trapping during the optimization procedure. This can be avoided in the future by implementing optimization methods which include momentum or excitation, increasing the escape probability from any local features of the potential energy surface.
Further discussion about our results should mention instances of transfer learning. Transfer learning is a unique facet of neural network machine learning algorithms describing an instance (engineered or otherwise) where the solution to a problem can inform or assist in the solution to another similar subsequent problem. Given a diatomic Hamiltonian at a specific intermolecular separation, the solution yielding the variational parameters -which are the weighting coefficients of the basis functions-are adequate first approximations to those parameters at a subsequent calculation where the intermolecular separation is a small perturbation to the previous value.
Except for the last point in the Fig. 4d, we use 1/40 of the iterations for the last point in calculations initiated with transferred parameters from previous iterations of each points and still achieve a good result. We also see that the local minimum is avoided if the starting point achieve global minimum.

Discussion
In conclusion, we present a combined quantum machine learning approach to perform electronic structure calculations. Here, we have a proof of concept and show results for small molecular systems. Screening molecules to accelerate the discovery of new materials for specific application is demanding since the chemical space is very large! For example, it was reported that the total number of possible small organic molecules that populate the 'chemical space' exceed 10 60 36,37 . Such an enormous size makes a thorough exploration of chemical space using the traditional electronic structure methods impossible. Moreover, in a recent perspective 38 in Nature Reviews Materials the potential of machine learning algorithms to accelerate the discovery of materials was pointed out. Machine learning algorithms have been used for material screening. For example, out of the GDB-17 data base, consisting of about 166 billion molecular graphs, one can make organic and drug-like molecules with up to 17 atoms and 134 thousand smallest molecules with up to 9 heavy atoms were calculated using hybrid density functional (B3LYP/6-31G (2df,p). Machine learning algorithms trained on these data, were found to predict molecular properties of subsets of these molecules [38][39][40] .
In Gradient estimation. The two functions ϕ(x) and s(x) are both real function. Thus, the gradient for parameter p k can be estimated as 2 E loc D p k where E loc ðxÞ ¼ hxjHjϕi ϕðxÞsðxÞ is so called local energy, D p k ðxÞ ¼ ϕðxÞsðxÞ . 〈...〉 represents the expectation value of joint distribution determined by ϕ(x) and s(x) (details in Supplementary Note 1). Implementation details. In our simulation we choose small constant learning rate 0.01 to avoid trapping in local minimum. All parameter are initialized as a random number between (−0.02,0.02). The range of initial random parameter is to avoid gradient vanishing of tanh. For each calculation we just need 1 reusing ancilla qubit all the time. Thus, in the simulation, the number of required qubits is m + n + 1. All calculations do not consider the noise and system error (details in Supplementary Note 5).

Data availability
The data and codes that support the findings of this study are available from the corresponding author upon reasonable request.