Abstract
The characterization of observables, expressed via Hermitian operators, is a crucial task in quantum mechanics. For this reason, an eigensolver is a fundamental algorithm for any quantum technology. In this work, we implement a semi-autonomous algorithm to obtain an approximation of the eigenvectors of an arbitrary Hermitian operator using the IBM quantum computer. To this end, we only use single-shot measurements and pseudo-random changes handled by a feedback loop, reducing the number of measurements in the system. Due to the classical feedback loop, this algorithm can be cast into the reinforcement learning paradigm. Using this algorithm, for a single-qubit observable, we obtain both eigenvectors with fidelities over 0.97 with around 200 single-shot measurements. For two-qubit observables, we get fidelities over 0.91 with around 1500 single-shot measurements for the four eigenvectors, which is a comparatively low resource demand, suitable for current devices. This work is useful to the development of quantum devices able to decide with partial information, which helps to implement future technologies in quantum artificial intelligence.
Introduction
Increasing the computational capabilities of machines is an essential field in artificial intelligence. In this context, machine learning algorithms have emerged with great force in the last decades^{1,2}. This class of algorithms can be divided into two families, learning from big data and learning from interactions. Learning from big data can be classified into two categories, supervised and unsupervised learning. In the supervised learning paradigm, we have a set of labeled data named training data, from which we want to infer some classification function to sort unlabeled new data. Unsupervised learning algorithms do not use training data. In this paradigm, the goal is to extract the statistical structure of an unsorted data set and divide it into different groups according to some criteria (clustering problem)^{3,4,5,6,7,8}.
In the category of learning from interactions we have the Reinforcement Learning (RL) algorithms^{9,10,11,12,13,14,15,16,17,18}. The idea in this paradigm is that a known and manipulable system called agent (A) interacts with a non-manipulable system called environment (E). Here, the goal is to optimize a task \(\mathscr {G}(A, E)\), which depends on the state of A and E. For this, we use feedback loops to change the state of A using the information extracted from the interaction with E. Some impressive and recent examples of RL are the AI players for different strategy games like Go^{19}, Chess^{20}, or StarCraft II^{21}.
On the other hand, it has been shown that quantum computing^{22} can overcome some fundamental limits of classical computing, e.g., in searching problems^{23}, factorization algorithms^{24}, solving linear equation systems^{25,26}, and for linear differential equations^{27}. Therefore, it was natural to merge machine learning techniques with the advantages of quantum computing in the topic known as Quantum Machine Learning (QML)^{28,29,30,31,32,33,34,35}.
With the development of Noisy Intermediate-Scale Quantum (NISQ) devices^{36}, research on simple quantum information protocols (suitable for NISQ quantum computers) and on QML has grown in recent years. The IBM quantum computer is one of the most famous open NISQ devices, which can be programmed using Qiskit^{37}, an open-source Python package, to create and run quantum programs using the IBM quantum cloud service^{38}.
Quantum eigensolvers are among the most useful algorithms for linear algebra, and hence for quantum mechanics. Hybrid quantum-classical algorithms like the variational quantum eigensolver (VQE)^{39,40,41} are attractive because they are easy to implement in NISQ devices. The main idea of this class of algorithms is to calculate some expectation value (like the energy) with a quantum processor, and then use a classical optimizer (like a variational one) to reach the solution^{42}. Nevertheless, an algorithm that uses a quantum optimizer has been recently proposed^{43}. Each iteration of the classical optimizer involves many single-shot measurements in the quantum system, which are required to calculate an expectation value. The development of an algorithm with more quantum features will involve the use of a more primitive classical subroutine.
In this paper, we implement the semi-autonomous eigensolver proposed in Ref.^{44}. The protocol can obtain an approximation of all eigenvectors for an arbitrary observable using single-shot measurements instead of expectation values. Here, we use the most basic classical subroutine, which involves only pseudo-random changes handled by the outcome of the single-shot measurement and a feedback loop. Due to this feedback loop, this algorithm can be classified in the RL paradigm. Using our protocol, we can obtain a high-fidelity approximation for all eigenvectors. We get fidelities larger than 0.97 in the single-qubit case and larger than 0.91 for a two-qubit observable, in around 200 and 1500 single-shot measurements, respectively. This work opens the door to explore alternative paradigms in hybrid classical-quantum algorithms, which is useful for developing semi-autonomous quantum devices that decide with incomplete information.
Methods
Basics on RL paradigm
We briefly describe the basic components of the RL paradigm. As mentioned above, in an RL algorithm, we define two systems: the agent A and the environment E. The interaction between these systems can be described by three basic components: the policy, the reward function (RF), and the value function (VF). The policy refers to the general rules of the algorithm and can be subdivided into three stages: first, the interaction, where we specify how A and E interact; second, the action, which refers to how A changes its perception of E by modifying some internal parameters; and third, the information extraction, which defines the process used by A to infer information from E. The information extraction can be done directly by A or, if A cannot read the response of the environment, using an auxiliary system named register.
The RF is the criterion to reward or punish A in each iteration using the information collected from E. This step is the most important in any RL algorithm because the right choice of the RF ensures the optimization of the desired task \(\mathscr {G}(A, E)\). Finally, the VF evaluates a figure of merit related to the task \(\mathscr {G}(A, E)\), which gives us the utility of the algorithm. The main difference between the RF and the VF is that the former evaluates each iteration to increase the performance locally in time, without considering the history of the algorithm. The VF, in contrast, depends on the history of the algorithm: it takes into consideration a large number of iterations, giving the global performance of the algorithm.
RL protocol
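We define the basic parts of our protocol as an RL algorithm. The state of the agent is denoted by
$$\begin{aligned} \vert \mathscr {A}_k^{(j)} \rangle =\hat{D}_k\vert j \rangle , \end{aligned}$$(1)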
where \(\hat{D}_k\) is a unitary transformation to prepare the desired agent state, the state \(\vert j \rangle\) is the initial state provided by the quantum processor in the computational basis, and the subindex k denotes the iteration of the algorithm. The environment is expressed as an unknown Hermitian operator \(\hat{\mathscr {O}}\) written as
$$\begin{aligned} \hat{\mathscr {O}}=\sum _j\alpha ^{(j)}\vert \mathscr {E}^{(j)} \rangle \langle \mathscr {E}^{(j)} \vert , \end{aligned}$$(2)
with \(\alpha ^{(j)}\) and \(\vert \mathscr {E}^{(j)} \rangle\) the jth eigenvalue and eigenvector of \(\hat{\mathscr {O}}\), respectively. The task \(\mathscr {G}\) is set to maximize the fidelity between the state of the agent, \(\vert \mathscr {A}_N^{(j)} \rangle\), after N iterations, and the eigenvectors \(\vert \mathscr {E}^{(j)} \rangle\), or in other words, we want to find the matrix \(\hat{D}_k\) that diagonalizes the observable \(\hat{\mathscr {O}}\).
Now, the policy is as follows:
Interaction: The observable \(\hat{\mathscr {O}}\) generates an evolution given by the unitary transformation
$$\begin{aligned} \hat{E}=e^{-i\tau \hat{\mathscr {O}}}, \end{aligned}$$(3)
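where \(\tau\) is a constant related to the elapsed time of the interaction. The agent state after this evolution is
$$\begin{aligned} \vert \bar{\mathscr {A}}_k^{(j)} \rangle =e^{-i\tau \hat{\mathscr {O}}}\vert \mathscr {A}_k^{(j)} \rangle =\sum _{\ell }c^{(\ell )}\vert \mathscr {A}_k^{(\ell )} \rangle . \end{aligned}$$(4)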
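Information extraction: We measure the state \(\vert \bar{\mathscr {A}}_k^{(j)} \rangle\) in the basis \(\{\vert \mathscr {A}_k^{(\ell )} \rangle \}\). For this purpose we apply the transformation \(\hat{D}^{\dagger }_k\), obtaining
$$\begin{aligned} \hat{D}^{\dagger }_k\vert \bar{\mathscr {A}}_k^{(j)} \rangle =\sum _{\ell }c^{(\ell )}\vert \ell \rangle , \end{aligned}$$(5)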
followed by a single-shot measurement in the computational basis \(\{\vert \ell \rangle \}\), obtaining the outcome value m with probability \(\vert c^{(m)}\vert ^2\). This outcome refers to the resulting state \(\vert \mathscr {A}_k^{(m)} \rangle\) after the measuring process.
Action: According to Eq. (3), if \(\vert \mathscr {A}_k^{(j)} \rangle\) is equal to some eigenvector of \(\hat{\mathscr {O}}\), we obtain \(\vert c^{(j)}\vert =1\) in Eq. (4). Using this condition we define the following rule for the action. If the outcome is \(m\ne j\), then \(\vert c^{(j)}\vert \ne 1\) and \(\vert \mathscr {A}_k^{(j)} \rangle\) is not an eigenvector of \(\hat{\mathscr {O}}\). In this case (\(m\ne j\)), we modify the agent for the next iteration, defining the operator \(\hat{D}_{k+1}\) as
with
where,
Then,
up to a global phase. Therefore, \(\hat{u}(\theta , \phi , \lambda )\) is a general rotation in the \(\{\vert j \rangle ,\vert m \rangle \}\) subspace. The angles are random numbers given by
where the range amplitude \(w_k\) will be updated in each iteration according to the RF, which will be specified later. Now, for the case \(m=j\), the state \(\vert \mathscr {A}_k^{(j)} \rangle\) could be an eigenvector of \(\hat{\mathscr {O}}\), so we define
$$\begin{aligned} \hat{D}_{k+1}=\hat{D}_k. \end{aligned}$$(11)
We can summarize Eqs. (6) and (11) as
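Now, we define the reward function as
$$\begin{aligned} w_{k+1}={\left\{ \begin{array}{ll} p\,w_k, &{} m\ne j,\\ r\,w_k, &{} m=j, \end{array}\right. } \end{aligned}$$(13)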
where \(p>1\) is the punishment ratio, and \(0<r<1\) is the reward ratio. This means that each time we obtain the outcome \(m\ne j\), we increase the amplitude range \(w_{k+1}\), because \(m\ne j\) means that we are further away from an eigenvector and greater corrections are required. In the other case, \(m=j\) means that we are closer to an eigenvector, so we reduce the value of \(w_{k+1}\), obtaining smaller changes for future iterations.
Finally, the value function is the last value of the range amplitude, \(w_N\), after N iterations. If \(w_N\rightarrow 0\), we have measured \(m=j\) several times, so \(\vert c^{(j)}\vert \approx 1\), which implies that we have obtained a good approximation of an eigenvector.
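The classical part of this feedback loop can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in this work: the function names `single_shot`, `update_range` and `final_range` are hypothetical, and the update is assumed to be \(w_{k+1}=r\,w_k\) for \(m=j\) and \(w_{k+1}=p\,w_k\) otherwise, as described in the text.

```python
import random


def single_shot(amplitudes, rng=random):
    """Sample one outcome m with probability |c^(m)|^2 (single-shot measurement)."""
    probs = [abs(c) ** 2 for c in amplitudes]
    x = rng.random() * sum(probs)
    acc = 0.0
    for m, prob in enumerate(probs):
        acc += prob
        if x < acc:
            return m
    return len(probs) - 1  # guard against floating-point rounding


def update_range(w, m, j, r=0.9, p=1 / 0.9):
    """Reward (shrink w) when m == j, punish (grow w) otherwise."""
    return r * w if m == j else p * w


def final_range(outcomes, j, w1=1.0, r=0.9, p=1 / 0.9):
    """Value function: the range amplitude w_N after processing all outcomes."""
    w = w1
    for m in outcomes:
        w = update_range(w, m, j, r, p)
    return w


# Illustrative values only (r = 0.5 and p = 2 are not the values used in the paper).
print(final_range([1, 1, 0, 0], j=0, r=0.5, p=2.0))  # 1.0
```

Two punishments followed by two rewards bring the range amplitude back to its initial value when \(p=1/r\), which is why a long run of outcomes \(m=j\) is needed before \(w_N\) becomes small.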
Results
Single-qubit case
We implement the algorithm described above in the IBM quantum computer. We start with the simplest case, which is to find the eigenvectors of a single-qubit observable. Since there are only two eigenvectors, we only need to obtain one of them, because the second one is determined by orthogonality. Figure 1 shows the circuit diagram for this case. As we can see in Fig. 1, the agent in each iteration is given by
$$\begin{aligned} \vert \mathscr {A}_k^{(0)} \rangle =\hat{D}_k\vert 0 \rangle . \end{aligned}$$(14)
In this case, we have only one rotation (\(\hat{u}_{1,0}\)) of the form of Eq. (7). Then, for simplicity, we redefine the operator \(\hat{D}_k=\hat{D}(\theta _k,\phi _k,\lambda _k)\) as
where \({\hat{\sigma }}^{(a)}\) is the a-th Pauli matrix and
with \(\{\Delta _{\theta },\Delta _{\phi },\Delta _{\lambda }\}\in w_k[-\pi ,\pi ]\) and \(w_k\) given by Eq. (13), considering only two outcomes (\(m\in \{0,1\}\)) and \(j=0\) for the whole algorithm. The gate in Eq. (15) has the form of the general qubit rotation provided by Qiskit; therefore, it can be efficiently implemented in the IBM quantum computer. We denote by \(\mathscr {F}\) the maximum fidelity between the agent state, \(\vert \mathscr {A}_N^{(0)} \rangle\), and one of the eigenvectors at the end of the algorithm. We find that \(\mathscr {F}\) is related to the probability of obtaining the outcome \(m = 0\) (\(P_0\)) by (see appendix A)
$$\begin{aligned} \mathscr {F}=\frac{1}{2}\left( 1+\sqrt{1+\frac{2(P_0-1)}{1-\cos {\Delta }}}\right) , \end{aligned}$$(17)
where \(\Delta =\tau \vert \alpha ^{(0)}-\alpha ^{(1)}\vert\) is the gap between the eigenvalues of \(\tau \hat{\mathscr {O}}\) [see Eqs. (2) and (3)]. Figure 2 shows \(P_0\) as a function of the fidelity \(\mathscr {F}\) for different values of \(\Delta\).
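The relation between \(\mathscr {F}\) and \(P_0\) is easy to evaluate numerically. The sketch below assumes the closed form \(\mathscr {F}=\frac{1}{2}\left( 1+\sqrt{1+2(P_0-1)/(1-\cos \Delta )}\right)\), which reduces to the special cases quoted for \(\Delta =\pi\), \(\Delta =\pi /2\) and \(\Delta =2\). The function name `fidelity_from_p0` is hypothetical, and the clamp of the square-root argument at zero is an added safeguard for noisy estimates of \(P_0\), not part of the protocol.

```python
import math


def fidelity_from_p0(p0, delta):
    """Maximum fidelity with an eigenvector, given P_0 and the gap delta."""
    arg = 1.0 + 2.0 * (p0 - 1.0) / (1.0 - math.cos(delta))
    # A noisy P_0 can push the argument slightly below zero; clamp it.
    return 0.5 * (1.0 + math.sqrt(max(arg, 0.0)))


print(round(fidelity_from_p0(0.85, math.pi), 3))      # 0.961
print(round(fidelity_from_p0(0.90, math.pi / 2), 3))  # 0.947
```

For \(\Delta =\pi\) the denominator is maximal, so a given \(P_0\) translates into the highest fidelity; smaller gaps make \(P_0\) less sensitive to the distance from an eigenvector.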
For the implementation we use the initial values \(\theta _1=\phi _1=\lambda _1=0\), \(w_1=1\) and the quantum processor “ibmqx2”. The algorithm is run until \(w_N<0.1\). Since the algorithm converges stochastically to the eigenvectors, we perform 40 experiments in order to characterize the performance of the algorithm by the central values of the data set. We also compare the performance of our algorithm with that of the VQE algorithm for the same environments using the same quantum processor. To test the algorithm, we use three different environment Hermitian operators:

1.
\(\begin{aligned} \tau \hat{\mathscr {O}}=\frac{\pi }{2}\sigma _x\Rightarrow \Delta =\pi \Rightarrow \mathscr {F}=\frac{1}{2}(1+\sqrt{P_0}). \end{aligned}\)
Here, we choose the reward ratio \(r=0.9\) and the punishment ratio \(p=1/r\). The results of the 40 experiments are collected in Appendix Table 1 (see supplemental material) and summarized in the histograms of Fig. 3. From Fig. 3a, we can see that the probability \(P_0\) is larger than 0.85 in 36 cases, which implies, as shown in Fig. 3b, that most cases give fidelities larger than 0.94. Also, we have 36 experiments with \(\mathscr {F}> 0.96\), the average fidelity is \(\bar{\mathscr {F}}=0.98\), and the standard deviation is \(\sigma =0.019\), which represents \(2\%\) of the average fidelity \(\bar{\mathscr {F}}\). The average number of iterations of the algorithm in the 40 experiments is \(\bar{N}=103\), the minimum number of iterations is \(N_{min}=25\), and the maximum number of iterations is \(N_{max}=528\). This number may look large, but we remark that we use only one single-shot measurement per iteration. In comparison, calculating a single expectation value requires at least 1000 single-shot measurements for a single qubit. Then, for this case, our algorithm requires fewer resources than any classical-quantum algorithm that relies on expectation values. For the VQE algorithm, we first choose 500 single-shot measurements per step and COBYLA as the classical optimization method. VQE needs 33 COBYLA iterations to converge, which means 16500 single-shot measurements in total, i.e., more than 100 times the resources needed by our algorithm, and reaches a fidelity of 0.997. If we change the number of single-shot measurements to 8192 per step (the maximum number of shots allowed by IBM), we need 35 COBYLA iterations to converge, which means 286720 single-shot measurements, more than 1000 times the resources of our algorithm; the fidelity in this case is 0.999.

2.
\(\begin{aligned} \tau \hat{\mathscr {O}}=\frac{\pi }{4}\sigma _x\Rightarrow \Delta =\frac{\pi }{2}\Rightarrow \mathscr {F}=\frac{1}{2}(1+\sqrt{2P_0-1}). \end{aligned}\)
Now, we choose the reward ratio \(r=0.9\) and the punishment ratio \(p=1.5/r\). The results of the 40 experiments are collected in Appendix Table 2 (see supplemental material) and summarized in the histograms of Fig. 4. From Fig. 4a we can see that the probability \(P_0\) is larger than 0.9 in 35 cases, which implies, as shown in Fig. 4b, that most cases give fidelities larger than 0.94. Also, we have 30 experiments with \(\mathscr {F}> 0.96\), the average fidelity is \(\bar{\mathscr {F}}=0.97\), and the standard deviation is \(\sigma =0.022\), which represents \(2.3\%\) of the average fidelity \(\bar{\mathscr {F}}\). The average number of iterations of the algorithm in the 40 experiments is \(\bar{N}=116\), the minimum number of iterations is \(N_{min}=25\), and the maximum number of iterations is \(N_{max}=572\). Again, for this case our algorithm uses fewer resources than algorithms that use expectation values. As in the previous case, we compare the results with the VQE algorithm. For 500 shots per step, we get a fidelity of 0.883 with 23 COBYLA iterations, which means 11500 single-shot measurements, i.e., about 100 times the resources of our algorithm. For 8192 shots per step, the fidelity is 0.891 and we need 23 COBYLA iterations; the total number of single-shot measurements is 188416, i.e., more than 1000 times the resources of our algorithm.

3.
\(\begin{aligned} &\tau \hat{\mathscr {O}}=\cos {\frac{1}{10}}\sigma _x+\sin {\frac{1}{10}}\sigma _y\Rightarrow \Delta =2\\&\Rightarrow \mathscr {F}=\frac{1}{2}\left( 1+\sqrt{1+\frac{2(P_0-1)}{1-\cos {2}}}\right) \end{aligned}\)
We choose the reward ratio \(r=0.9\) and the punishment ratio \(p=1.5/r\) as in the previous case. The results of the 40 experiments are collected in Appendix Table 3 (see supplemental material) and summarized in the histograms of Fig. 5. From Fig. 5a we can see that the probability \(P_0\) is larger than 0.85 in 39 cases, which implies, as shown in Fig. 5b, that most cases give fidelities larger than 0.94. Also, we have 30 experiments with \(\mathscr {F}> 0.98\), the average fidelity is \(\bar{\mathscr {F}}=0.98\), and the standard deviation is \(\sigma =0.015\), which represents \(1.6\%\) of the average fidelity \(\bar{\mathscr {F}}\). The average number of iterations of the algorithm in the 40 experiments is \(\bar{N}=227\), the minimum number of iterations is \(N_{min}=26\), and the maximum number of iterations is \(N_{max}=782\). In this case, as \(N_{max}\) is around 800, we first run the VQE algorithm with 800 shots per step, obtaining a fidelity of 0.911 using 14 COBYLA iterations, which means a total of 11200 single-shot measurements, i.e., about 50 times the resources of our algorithm. When we use 8192 shots per step, the fidelity is 0.999 and we need 14 COBYLA iterations, for a total of 114688 single-shot measurements, i.e., about 500 times the resources of our algorithm.
Even if VQE allows us to reach fidelities larger than 0.98 (the mean fidelity of our algorithm), it does so at a high cost, more than 100 times the resources used by our algorithm, which is a great advantage of our proposal.
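The single-qubit procedure can be mimicked with a noise-free classical simulation. The sketch below is not the Qiskit code used in the experiments. It assumes a U3-type parametrization of \(\hat{D}\) (so that \(\hat{D}\vert 0 \rangle\) depends only on \(\theta\) and \(\phi\)), environments of the form \(\tau \hat{\mathscr {O}}=a(\cos \beta \,\sigma _x+\sin \beta \,\sigma _y)\), which covers the three operators above, and an update in which each angle receives a uniform random increment in \(w_k[-\pi ,\pi ]\) whenever \(m=1\). All function names are hypothetical.

```python
import cmath
import math
import random


def agent_state(theta, phi):
    """Column D|0> of an assumed U3-type rotation (lambda does not enter it)."""
    return (complex(math.cos(theta / 2)),
            cmath.exp(1j * phi) * math.sin(theta / 2))


def evolve(state, a, beta):
    """Apply exp(-i a (cos(beta) X + sin(beta) Y)) to a single-qubit state."""
    s0, s1 = state
    off = -1j * math.sin(a)
    return (math.cos(a) * s0 + off * cmath.exp(-1j * beta) * s1,
            off * cmath.exp(1j * beta) * s0 + math.cos(a) * s1)


def overlap_sq(u, v):
    """Squared modulus of the inner product <u|v>."""
    return abs(u[0].conjugate() * v[0] + u[1].conjugate() * v[1]) ** 2


def run(a, beta, r=0.9, p=1 / 0.9, w_stop=0.1, max_iter=5000, seed=0):
    """Noise-free simulation; returns (iterations, best fidelity with an eigenvector)."""
    rng = random.Random(seed)
    theta, phi, w, n = 0.0, 0.0, 1.0, 0
    while w >= w_stop and n < max_iter:
        psi = agent_state(theta, phi)
        p0 = overlap_sq(psi, evolve(psi, a, beta))  # probability of outcome m = 0
        n += 1
        if rng.random() < p0:   # m = 0: reward, keep the angles
            w *= r
        else:                   # m = 1: random change of the angles, then punish
            theta += rng.uniform(-w * math.pi, w * math.pi)
            phi += rng.uniform(-w * math.pi, w * math.pi)
            w *= p
    psi = agent_state(theta, phi)
    norm = 1 / math.sqrt(2)
    plus = (complex(norm), norm * cmath.exp(1j * beta))
    minus = (complex(norm), -norm * cmath.exp(1j * beta))
    return n, max(overlap_sq(plus, psi), overlap_sq(minus, psi))


n, fid = run(a=math.pi / 2, beta=0.0)  # tau * O = (pi / 2) sigma_x
print(n, round(fid, 3))
```

Because the simulation is stochastic and ignores hardware noise, the iteration count and fidelity it prints depend on the seed and should not be compared directly with the experimental tables; the `max_iter` cap is an added safeguard, not part of the protocol.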
Two-qubit case
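In this case, we have three different agent states given by
$$\begin{aligned} \vert \mathscr {A}_k^{(0)} \rangle =\hat{D}_k\vert 00 \rangle ,\quad \vert \mathscr {A}_k^{(1)} \rangle =\hat{D}_k\vert 01 \rangle ,\quad \vert \mathscr {A}_k^{(2)} \rangle =\hat{D}_k\vert 10 \rangle . \end{aligned}$$(18)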
We update the matrix \(\hat{D}_k\) according to Eq. (12). To decompose the matrix \(\hat{D}_k\) into a set of one- and two-qubit gates, we use the method already implemented in Qiskit^{45}. To find all the eigenvectors we divide the protocol into three stages. In the first stage, we consider the agent state \(\vert \mathscr {A}_k^{(0)} \rangle =\hat{D}_k\vert 00 \rangle\), with \(\hat{D}_1=\mathbb {I}\) and \(w_1=1\). The outcome of the measurement has four possibilities, \(m\in \{00,\,01,\,10,\,11\}\), and we run the algorithm until \(w_{n_1}<0.1\) (\(n_1\) iterations). After this, \(\vert \mathscr {A}_{n_1}^{(0)} \rangle =\hat{D}_{n_1}\vert 00 \rangle\) is the approximation of one of the eigenvectors of \(\hat{\mathscr {O}}\).
In the second stage, we consider the agent state \(\vert \mathscr {A}_k^{(1)} \rangle =\hat{D}_k\vert 01 \rangle\), with \(\hat{D}_{n_1+1}=\hat{D}_{n_1}\) and \(w_{n_1+1}=1\). Now, we take into account only three outcomes, \(m\in \{01,\,10,\,11\}\), since we suppose that \(\vert \mathscr {A}_{n_1}^{(0)} \rangle\) is a good enough approximation. If we obtain \(m=00\), we consider it an error and define \(\hat{D}_{k+1}=\hat{D}_k\) and \(w_{k+1}=w_k\); that is, we do nothing and do not apply the updating rule for \(\hat{D}_{k+1}\) and \(w_{k+1}\). We denote this error as \(c_{00}\). We run this stage for \(n_2\) iterations until \(w_{n_1+n_2}<0.1\). As we do not perform rotations in the subspace spanned by \(\{\vert 00 \rangle ,\,\vert 01 \rangle \}\) during this stage, we have \(\vert \mathscr {A}_{n_1+n_2}^{(0)} \rangle =\vert \mathscr {A}_{n_1}^{(0)} \rangle\). Now, we have the approximation of two eigenvectors, \(\vert \mathscr {A}_{n_1+n_2}^{(1)} \rangle =\hat{D}_{n_1+n_2}\vert 01 \rangle\) and \(\vert \mathscr {A}_{n_1+n_2}^{(0)} \rangle =\hat{D}_{n_1+n_2}\vert 00 \rangle\).
Finally, in the third stage, we consider the agent state \(\vert \mathscr {A}_k^{(2)} \rangle =\hat{D}_k\vert 10 \rangle\), with \(\hat{D}_{n_1+n_2+1}=\hat{D}_{n_1+n_2}\) and \(w_{n_1+n_2+1}=1\). Now, we have only two possibilities for the measurement outcome, \(m\in \{10,\,11\}\). Here, we also suppose that \(\hat{D}_{n_1+n_2}\vert 00 \rangle\) and \(\hat{D}_{n_1+n_2}\vert 01 \rangle\) are good enough approximations. If we obtain \(m=00\) or \(m=01\), we again consider them errors and do not apply the update rule, denoting these errors as \(c_{00}^{'}\) and \(c_{01}\), as in the previous stage. We run this stage for \(n_3\) iterations until \(w_{n_1+n_2+n_3}<0.1\). In this stage, we only modify the subspace spanned by \(\{\vert 10 \rangle ,\,\vert 11 \rangle \}\), so we have \(\vert \mathscr {A}_{n_1+n_2+n_3}^{(0)} \rangle =\vert \mathscr {A}_{n_1+n_2}^{(0)} \rangle =\vert \mathscr {A}_{n_1}^{(0)} \rangle\) and \(\vert \mathscr {A}_{n_1+n_2+n_3}^{(1)} \rangle =\vert \mathscr {A}_{n_1+n_2}^{(1)} \rangle\). After this procedure we obtain the approximation of all the eigenvectors \(\{\vert \mathscr {A}_{n_T}^{(0)} \rangle =\hat{D}_{n_T}\vert 00 \rangle ,\,\vert \mathscr {A}_{n_T}^{(1)} \rangle =\hat{D}_{n_T}\vert 01 \rangle ,\,\vert \mathscr {A}_{n_T}^{(2)} \rangle =\hat{D}_{n_T}\vert 10 \rangle ,\,\vert \mathscr {A}_{n_T}^{(3)} \rangle =\hat{D}_{n_T}\vert 11 \rangle \}\), with \(n_T=n_1+n_2+n_3\).
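The bookkeeping of the three stages can be summarized with a small classifier for the measurement outcome. This is a hedged sketch with hypothetical names (`BASIS`, `stage_step`): it only decides whether an outcome is ignored as an error, rewarded, or punished together with a rotation in the \(\{\vert j \rangle ,\vert m \rangle \}\) subspace, and it does not build the rotation itself.

```python
BASIS = ("00", "01", "10", "11")


def stage_step(stage, outcome, w, r=0.9, p=1 / 0.9):
    """Return (new_w, action) for one single-shot outcome in a given stage."""
    j = stage                  # stages 0, 1, 2 start from |00>, |01>, |10>
    m = BASIS.index(outcome)
    if m < j:                  # state assumed already converged: count an error
        return w, "error"
    if m == j:                 # reward: keep D_k and shrink the range
        return r * w, "keep"
    # punish: enlarge the range and rotate in the {|j>, |m>} subspace
    return p * w, f"rotate {BASIS[j]}<->{BASIS[m]}"


print(stage_step(1, "00", 0.5))                   # (0.5, 'error')
print(stage_step(2, "11", 1.0, r=0.5, p=2.0))     # (2.0, 'rotate 10<->11')
```

Counting the `"error"` results per stage gives the quantities called \(c_{00}\), \(c_{00}^{'}\) and \(c_{01}\) in the text.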
To test the algorithm, we choose three cases. First we consider the bilocal operator given by
In this case, the eigenstates and the eigenvalues are
We note that the ground state is degenerate, so any linear combination of the form \(\vert \phi \rangle =a\vert \mathscr {E}^{(0)} \rangle +b\vert \mathscr {E}^{(1)} \rangle\) is also a ground state of the operator, and the same holds for the other states. In this case we define the fidelity of our algorithm as the probability of measuring the initial state \(\vert j \rangle\),
$$\begin{aligned} \mathscr {F}_{j}=\left| \langle j \vert \hat{D}^{\dagger }_{n_T}e^{-i\tau \hat{\mathscr {O}}}\hat{D}_{n_T}\vert j \rangle \right| ^2. \end{aligned}$$(21)
We run this case using the IBM backend “ibmq_vigo”, and the results are shown in Appendix Table 4 (see supplemental material). We run the algorithm ten times, and the mean fidelities are \(\mathscr {F}_{00}=0.931\), \(\mathscr {F}_{01}=0.933\), \(\mathscr {F}_{10}=0.932\), and \(\mathscr {F}_{11}=0.919\). The mean number of iterations is \(\bar{N}=272\), and the mean errors are \(\bar{c}_{00}=10\), \(\bar{c}_{00}^{'}=8\) and \(\bar{c}_{01}=5\). Therefore, the fidelity of our algorithm is higher than 0.91 for each eigenstate with fewer than 300 single-shot measurements. As in the single-qubit case, we compare with the VQE algorithm. First, we choose 300 shots per step, which requires 56 COBYLA iterations, i.e., 16800 single-shot measurements, obtaining a fidelity of 0.976 for the ground state. Using 8192 shots per step, VQE needs 54 COBYLA iterations to converge, which means 442368 single-shot measurements, obtaining a fidelity of 0.997 for the ground state. In this case, VQE gets a significantly more accurate result, but only for the ground state, and it uses more than 1000 times the resources of our algorithm, which obtains all the eigenvectors.
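The resource comparison is plain arithmetic: the number of single-shot measurements used by VQE is the number of optimizer iterations times the shots per step, and it is compared with the mean number of iterations of our protocol (one shot per iteration). The short sketch below reproduces the figures quoted for the bilocal operator; the helper name `total_shots` is hypothetical.

```python
def total_shots(optimizer_iterations, shots_per_step):
    """Single-shot measurements used by an expectation-value based optimizer."""
    return optimizer_iterations * shots_per_step


mean_iterations = 272  # mean number of single-shot measurements of the protocol
for iters, shots in ((56, 300), (54, 8192)):
    total = total_shots(iters, shots)
    print(total, round(total / mean_iterations))
# 16800 62
# 442368 1626
```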
The second example is the molecular hydrogen Hamiltonian with a bond length of 0.2 Å^{46}:
with \(g_0=2.8489, g_1=0.5678, g_2=1.4508, g_3=0.6799, g_4=0.0791, g_5=0.0791\). In this case the environment is given by
with the next eigenvectors and eigenvalues
In this case, we calculate \(\mathscr {F}\) with the same method as in the previous case, we choose the IBM backend “ibmq_valencia”, and the results are shown in Appendix Table 5 (see supplemental material). We run the algorithm ten times, and the mean fidelities are \(\mathscr {F}_{00}=0.989\), \(\mathscr {F}_{01}=0.973\), \(\mathscr {F}_{10}=0.976\) and \(\mathscr {F}_{11}=0.979\). The mean errors are \(\bar{c}_{00}=7\), \(\bar{c}_{00}^{'}=4\) and \(\bar{c}_{01}=3\), and the mean number of iterations is \(\bar{N}=111\). In this case, we need fewer than 150 single-shot measurements to obtain fidelities over 0.97. For the VQE algorithm, we first choose 120 shots per step and need 59 COBYLA iterations, which means 7080 single-shot measurements, obtaining a fidelity of 0.994 for the ground state. When we use 8192 shots per step, VQE needs 64 COBYLA iterations to converge, which means 507904 single-shot measurements, obtaining a fidelity of 0.999 for the ground state. In this case, VQE can get better fidelities (larger than 0.99), but it again uses many more resources than our proposal, around 1000 times more, to get only one of the eigenvectors.
The third case that we consider to test the algorithm is the non-degenerate two-qubit operator
$$\begin{aligned} \tau \hat{\mathscr {O}}=\begin{pmatrix} \pi &{} \frac{\pi }{2} &{} \frac{\pi }{4} &{} \frac{\pi }{4} \\ \frac{\pi }{2} &{} \pi &{} \frac{\pi }{4} &{} \frac{\pi }{4} \\ \frac{\pi }{4} &{} \frac{\pi }{4} &{} \frac{\pi }{2} &{} 0\\ \frac{\pi }{4} &{} \frac{\pi }{4} &{} 0 &{} \frac{\pi }{2} \end{pmatrix}, \end{aligned}$$(25)
with eigenvectors and eigenvalues given by
We run the algorithm in the IBM quantum computer “ibmq_vigo”. In order to reduce the total number of iterations, we run the three stages of the algorithm four times as follows:

1.
We choose \(r=0.6,\,p=1/r,\,\hat{D}_1=\mathbb {I},\, w_1=1\). Suppose that the total number of iterations after the three stages is \(N_1=\eta _1\).

2.
We choose \(r=0.7,\,p=1/r,\,\hat{D}_{\eta _1+1}=\hat{D}_{\eta _1},\, w_{\eta _1+1}=1\). Suppose that the total number of iterations after the three stages is \(N_2=\eta _1+\eta _2\).

3.
We choose \(r=0.8,\,p=1/r,\,\hat{D}_{N_2+1}=\hat{D}_{N_2},\, w_{N_2+1}=1\). Suppose that the total number of iterations after the three stages is \(N_3=\eta _1+\eta _2+\eta _3\).

4.
We choose \(r=0.9,\,p=1/r,\,\hat{D}_{N_3+1}=\hat{D}_{N_3},\, w_{N_3+1}=1\), and suppose that the total number of iterations after the three stages is \(N=\eta _1+\eta _2+\eta _3+\eta _4\).
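A best-case estimate helps to read this schedule. The sketch below counts how many consecutive rewards are needed to bring the range amplitude from \(w=1\) below the stopping value for each reward ratio, assuming (as in the three-stage protocol described earlier) that every stage restarts at \(w=1\) and stops at \(w<0.1\), and that no punishment or error occurs. The name `min_rewards` is hypothetical.

```python
def min_rewards(r, w_start=1.0, w_stop=0.1):
    """Smallest number of consecutive rewards that brings w below w_stop."""
    n, w = 0, w_start
    while w >= w_stop:
        w *= r
        n += 1
    return n


schedule = (0.6, 0.7, 0.8, 0.9)
per_stage = [min_rewards(r) for r in schedule]
lower_bound = 3 * sum(per_stage)  # three stages per run of the schedule
print(per_stage, lower_bound)     # [5, 7, 11, 22] 135
```

Small reward ratios therefore give fast but coarse passes, while \(r=0.9\) demands a longer run of successful measurements; the measured number of iterations is expected to lie well above this idealized bound because punishments and errors also occur.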
We define the fidelity of each approximation as
To obtain a data set to evaluate the performance of our protocol, we perform ten independent experiments. These data are collected in Appendix Table 6 (see supplemental material). The average fidelities that we obtain are \(\bar{\mathscr {F}}_{00}=0.941,\,\bar{\mathscr {F}}_{01}=0.933,\,\bar{\mathscr {F}}_{10}=0.929,\,\bar{\mathscr {F}}_{11}=0.935\), the average number of iterations is \(\bar{N}=1396\), and the mean errors are \(\bar{c}_{00}=29\), \(\bar{c}_{00}^{'}=19\) and \(\bar{c}_{01}=18\). Therefore, in this case we obtain the four eigenvectors with fidelities larger than 0.92 in fewer than 1500 single-shot measurements, which corresponds to only about 6 measurements of mean values and is not enough for a classical-quantum algorithm based on the optimization of mean values. For the VQE algorithm, we choose 2000 shots per step using 77 COBYLA iterations, which means 157000 single-shot measurements, obtaining a fidelity of 0.918 for the ground state. For 8192 shots per step, VQE needs 88 COBYLA iterations to converge, which means 720896 single-shot measurements, obtaining a fidelity of 0.944. In this case, VQE cannot surpass the performance of our algorithm, and it uses more than 100 times the resources of our proposal for the ground state alone.
For an \(n\)-qubit observable (\(n>2\)), we can use the same protocol but considering more measurement outcomes, which implies more stages in the algorithm.
Conclusions
In this work, we successfully implement the approximate eigensolver^{44} using the IBM quantum computer. For the single-qubit case, we obtain fidelities larger than 0.97 for both eigenvectors using around 200 single-shot measurements. For the two-qubit case, we use around 1500 single-shot measurements to obtain the approximation of the four eigenvectors with fidelity over 0.9. Due to the stochastic nature of this protocol, we cannot ensure that the approximation converges asymptotically to the eigenvectors with the number of iterations. Nevertheless, it is useful for obtaining a fast approximation to use as an initial guess in another eigensolver that can reach maximal fidelity, like the eigensolver of Ref.^{43}. We also compare the performance of our proposal with the VQE algorithm. In the single-qubit case, VQE in general gets better fidelities but uses more than 100 times the resources of our algorithm. In the two-qubit case, VQE reaches a slightly higher maximal fidelity than our algorithm, but again at a high cost, i.e., more than 1000 times the resources used by our algorithm for all the eigenvectors. Also, the performance of the VQE algorithm depends on the variational ansatz used, which is not the case for our algorithm. This dependence means that the performance of VQE can be enhanced by using a better ansatz. The main goal of our algorithm is to get a high-fidelity approximation for all the eigenvectors with few resources. This goal is fully achieved when compared with the resources needed for VQE. On the other hand, by manipulating the convergence criteria of our algorithm, we can reach better fidelities. Finally, this work also paves the way for the development of future quantum devices suited to work with limited resources.
Data availability
The Qiskit codes of the one-qubit case and the two-qubit case are available at https://github.com/Panchiyue/QiskitCode/tree/main.
References
Russell, S. & Norvig, P. Artificial Intelligence: A Modern Approach (Prentice Hall, New Jersey, 1995).
Mehta, P. et al. A high-bias, low-variance introduction to machine learning for physicists. Phys. Rep. 810, 1–124 (2019).
Ghahramani, Z. Advanced Lectures on Machine Learning (Springer, Berlin, 2004).
Kotsiantis, S. B. Supervised machine learning: A review of classification techniques. Informatica 31, 249–268 (2007).
Wiebe, N., Braun, D. & Lloyd, S. Quantum algorithm for data fitting. Phys. Rev. Lett. 109, 050505 (2012).
Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum algorithms for supervised and unsupervised machine learning. arXiv:1307.0411 [quant-ph] (2013).
Rebentrost, P., Mohseni, M. & Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503 (2014).
Li, Z., Liu, X., Xu, N. & Du, J. Experimental realization of a quantum support vector machine. Phys. Rev. Lett. 114, 140504 (2015).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, Cambridge, 2018).
Jaderberg, M. et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019).
Lamata, L. Basic protocols in quantum reinforcement learning with superconducting circuits. Sci. Rep. 7, 1609 (2017).
Kaelbling, L. P., Littman, M. L. & Moore, A. W. Reinforcement learning: A survey. J. Artif. Intell. Res. 4, 237–285 (1996).
Dong, D., Chen, C., Li, H. & Tarn, T.-J. Quantum reinforcement learning. IEEE Trans. Syst. Man Cybern. B Cybern. 38, 1207–1220 (2008).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Riedmiller, M., Gabel, T., Hafner, R. & Lange, S. Reinforcement learning for robot soccer. Auton. Robot. 27, 55–73 (2009).
Yu, S. et al. Reconstruction of a photonic qubit state with reinforcement learning. Adv. Quantum Technol. 2, 1800074 (2019).
Albarrán-Arriagada, F., Retamal, J. C., Solano, E. & Lamata, L. Measurement-based adaptation protocol with quantum reinforcement learning. Phys. Rev. A 98, 042315 (2018).
Littman, M. L. Reinforcement learning improves behaviour from evaluative feedback. Nature 521, 445–451 (2015).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information: 10th Anniversary Edition (Cambridge University Press, New York, 2010).
Grover, L. K. A fast quantum mechanical algorithm for database search, in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing 212–219 (1996).
Shor, P. W. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput. 26, 1484–1509 (1997).
Harrow, A. W., Hassidim, A. & Lloyd, S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
Cai, X.-D. et al. Experimental quantum computing to solve systems of linear equations. Phys. Rev. Lett. 110, 230501 (2013).
Xin, T. et al. Quantum algorithm for solving linear differential equations: Theory and experiment. Phys. Rev. A 101, 032307 (2020).
Biamonte, J. et al. Quantum machine learning. Nature 549, 195–202 (2017).
Schuld, M., Sinayskiy, I. & Petruccione, F. An introduction to quantum machine learning. Contemp. Phys. 56, 172–185 (2015).
Dunjko, V., Taylor, J. M. & Briegel, H. J. Quantum-enhanced machine learning. Phys. Rev. Lett. 117, 130501 (2016).
Gao, J. et al. Experimental machine learning of quantum states. Phys. Rev. Lett. 120, 240501 (2018).
Schuld, M. & Killoran, N. Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 122, 040504 (2019).
Lau, H.-K., Pooser, R., Siopsis, G. & Weedbrook, C. Quantum machine learning over infinite dimensions. Phys. Rev. Lett. 118, 080501 (2017).
Wittek, P. Quantum Machine Learning: What Quantum Computing Means to Data Mining (Academic Press, New York, 2014).
Lamata, L. Quantum machine learning and quantum biomimetics: A perspective. Mach. Learn. Sci. Technol. 1, 033002 (2020).
Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2, 79 (2018).
Aleksandrowicz, G. et al. Qiskit: An open-source framework for quantum computing (2019).
IBMQ Experience (2019).
McClean, J. R., Romero, J., Babbush, R. & Aspuru-Guzik, A. The theory of variational hybrid quantum-classical algorithms. New J. Phys. 18, 023023 (2016).
Kandala, A. et al. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242–246 (2017).
Peruzzo, A. et al. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun. 5, 4213 (2014).
Lavrijsen, W. et al. Classical optimizers for Noisy Intermediate-Scale Quantum devices, in IEEE International Conference on Quantum Computing & Engineering (QCE20) (2020).
Wei, S., Li, H. & Long, G.-L. A full quantum eigensolver for quantum chemistry simulations. Research 2020, 1486935 (2020).
Albarrán-Arriagada, F., Retamal, J. C., Solano, E. & Lamata, L. Reinforcement learning for semi-autonomous approximate quantum eigensolver. Mach. Learn. Sci. Technol. 1, 015002 (2020).
Qiskit command operator.
O’Malley, P. J. J. et al. Scalable quantum simulation of molecular energies. Phys. Rev. X 6, 031007 (2016).
Acknowledgements
We acknowledge financial support from Spanish MCIU/AEI/FEDER (PGC2018-095113-B-I00), Basque Government IT986-16, projects QMiCS (820505) and OpenSuperQ (820363) of EU Flagship on Quantum Technologies, EU FET Open Grant Quromorphic, EPIQUS, and Shanghai STCSM (Grant No. 2019SHZDZX01-ZX04).
Author information
Contributions
E.S. and F.A.A. supervised and contributed to the theoretical analysis. N.B. carried out all calculations and prepared the figures. C.-Y.P. and M.H. wrote the Qiskit program that was run on the IBM Quantum Experience. All authors wrote the manuscript, contributed to the discussion of the results, and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Pan, CY., Hao, M., Barraza, N. et al. Experimental semi-autonomous eigensolver using reinforcement learning. Sci Rep 11, 12241 (2021). https://doi.org/10.1038/s41598-021-90534-7