Fast reconstruction algorithm based on HMC sampling

In Noisy Intermediate-Scale Quantum (NISQ) era, the scarcity of qubit resources has prevented many quantum algorithms from being implemented on quantum devices. Circuit cutting technology has greatly alleviated this problem, which allows us to run larger quantum circuits on real quantum machines with currently limited qubit resources at the cost of additional classical overhead. However, the classical overhead of circuit cutting grows exponentially with the number of cuts and qubits, and the excessive postprocessing overhead makes it difficult to apply circuit cutting to large scale circuits. In this paper, we propose a fast reconstruction algorithm based on Hamiltonian Monte Carlo (HMC) sampling, which samples the high probability solutions by Hamiltonian dynamics from state space with dimension growing exponentially with qubit. Our algorithm avoids excessive computation when reconstructing the original circuit probability distribution, and greatly reduces the circuit cutting post-processing overhead. The improvement is crucial for expanding of circuit cutting to a larger scale on NISQ devices.


Background Circuit cutting
Circuit cutting is also known as a time-like cut.In quantum computing, any unitary matrix can be theoretically decomposed into a set of combinations of Pauli matrices {I, X, Y , Z} 18 .Specifically, any matrix A can be decom- posed into the following formulas: We can further decompose the Pauli matrix into its eigenvalues and eigenvectors, and again derive the following Eq.( 2) Each trace operator corresponds to a set of measurements in a particular Pauli basis, while the density matrix comprising of eigenvectors corresponds to a set of initialization operations.Since the physical implementation of the measurement in the I base and Z base is identical, we can knit together the cut points by three measure- ment operations ( X, Y , Z ) and four initialization operations ( |0 �, |1 �, |+ �, |i � ).This facilitates the generation of three distinct upstream subcircuits (referred to as Fragment 1) with different measurement bases, and four downstream subcircuits (referred to as Fragment 2) with different initializations, as illustrated in Fig. 1.These subcircuits can be run independently, and the results can be measured.
However, it should be noted that since the last qubit of the upstream subcircuit does not appear in the final output of the uncut circuit, the results of the upstream subcircuits need additional processing.The results of the upstream subcircuits need to be multiplied by a factor α = ±1 .The sign of α depends on both the measurement base and the measurement result of the last qubit.When the measurement base is I , α = 1 regardless of the measurement result of the last qubit.However, when the measurement base is {X, Y , Z} , α = 1 if the measure- ment result of the last qubit is |0 � , and α = −1 if the measurement result is |1 �.This relationship is summarized in Eq. (3) below: (1) A = Tr(AI)I + Tr(AX)X + Tr(AY )Y + Tr(AZ)Z 2 (2) Figure 1.A quantum circuit can be divided into two parts using a single cut.These two parts create multiple subcircuits by incorporating different measurement bases and initialization operations.These subcircuits can run independently and the output of the original circuit can be reconstructed through classical post-processing.
Following the processing of the upstream subcircuit, it is necessary to compute the terms that correspond to both the upstream and downstream subcircuits to reconstruct the original circuit's probability distribution.Assuming that the original is cut as in Fig. 1, the cut position is at the n/2 nd qubit, and the circuit quantum state is |x 0 x 1 x 2 . . .x n � .The quantum state associated with the upstream subcircuit is x up = x { n 2 +1} . . .x n � , and the quantum state associated with the downstream subcircuit is x down = x 0 . . .x n 2 � ,which x i ∈ {0, 1}.According to Eqs. ( 2) and (3), we know that during the reconstruction process, the upstream subcircuit consists of four terms, The four terms of the downstream subcircuit are The probability of the original circuit quantum state, denoted as |x 0 x 1 x 2 . . .x n � , can be calculated by taking the Kronecker product of the output terms from the upstream and downstream subcircuits.These output terms form 4 pairs, and their results are subsequently summed up, as shown in Eq. ( 4).
The probability distribution of the original circuit is obtained simply by calculating the probability of each quantum state in the original circuit using Equation (4).

Virtual two-qubit gate
A virtual two-qubit gate is a technique that can decompose a two-qubit gate into a series of single-qubit gates, also known as a space-like cut.It essentially involves classical post-processing and sampling of single-qubit gates to 'simulate' the entanglement effect produced by two-qubit gates.This means that a two-qubit gate can be expressed as a series of individual quantum gate operations, such as the Pauli matrices {X, Y , Z} , as well as single-qubit rotating gates Rx,Ry , and Rz around the X , Y , and Z axes.
The super-operator S(U) shown in Eq. ( 5) corresponds to the evolution of a complete quantum state, where ρ is the density matrix of the quantum state and U is the unitary operator acting on the quantum state, Suppose A 1 , A 2 are both unitary operators, with Therefore, it can be deduced that a two-qubit gate of form e iθ A 1 ⊗A 2 can be decomposed into a series of single- qubit gates as shown in Fig. 2 If A ∈ {X, Y , Z} , then I + α 1 A 1 can be realized by projection measurements on the A-base, and I + iα 2 A 2 can be realized by a single qubit gate in rotation around the A-base, as derived in Ref. 21 . (3) Vol:.( 1234567890) Hamiltonian Monte Carlo, also known as hybrid Monte Carlo 22 , is a Markov chain Monte Carlo (MCMC) sampling algorithm based on Hamiltonian dynamics.It describes the motion of a given object at time t using its position x and momentum p .Position corresponds to potential energy, while momentum corresponds to kinetic energy.Therefore, the potential energy U(x) and kinetic energy K(p) are functions of position and momentum, respectively.The Hamiltonian is the sum of the potential energy and kinetic energy functions.Thus, the Hamiltonian can be expressed as: The Hamiltonian characterizes the interconversion between kinetic energy and potential energy as an object moves.The following Hamiltonian equation can be analyzed quantitatively by differentiation.
If it is possible to express ∂U(x)/∂x i and ∂K(p)/∂p i with some initial conditions (e.g., at time t 0 , initial posi- tion point x 0 , and initial momentum p 0 ), it becomes feasible to predict the object's position and momentum at any subsequent time instant t = t 0 + T.
The Hamiltonian equation captures the continuous time evolution of an object's motion.To numerically simulate Hamiltonian dynamics, it is essential to discretize time to obtain an approximate solution for the Hamiltonian equation.The time interval T can be divided into smaller sub-intervals of length δ , allowing for an approximate continuity of time.This process can be accomplished using the leapfrog algorithm 23 , which sequentially updates the momentum and position variables.The algorithm proceeds by first calculating the momentum over a period, updating the object's position over a slightly extended period δ, and finally completing the calculation of momen- tum for the next time interval.The algorithm follows the steps outlined below.The core idea of HMC is to construct the Hamiltonian function H(x, p) .By leveraging this function, it becomes more efficient to explore the target distribution P(x) .The canonical distribution of statistical mechanics can be used to relate H(x, p) to P(x) .Given the energy function of a state as E(θ) , the corresponding canonical distribution can be defined as where Z is the regularization factor that ensures that P(θ)dθ = 1 , thus creating a valid probability distri- bution.Since the energy function is equal to the sum of potential and the kinetic energy in the system, it gives the following: Then the canonical function of the Hamiltonian kinetic energy function can be expressed as www.nature.com/scientificreports/According to Eq. ( 14), we can decompose the joint distribution P(x, p) into the product of the distributions of P(x) and P(p) , indicating that the two variables are independent of each other.As a result, their respective distributions can be utilized to sample their joint probability distributions.The introduction of an auxiliary variable p can expedite the convergence of the Markov chain.Given that variables x and p are independent, the momentum variable p can be sampled from any distribution, with N(0, 1) commonly being selected.The func- tion connected to the potential energy in the Hamiltonian is given by: In HMC, after defining K(p) , the remaining work is how to find the potential energy function U(x) for a given target distribution P(x) , then the potential energy function is usually defined as: Calculate again the gradient function G(x) of the potential energy function: Next, Hamiltonian dynamics can be applied to MCMC to sample the objective function P(x) .However, dis- cretizing the time may introduce a specific error that may not match the target distribution, so the acceptance rate can be induced to offset the error, and the acceptance rate α is Here, x 0 , p 0 represents the initial state, while the new state x L , p L is obtained after executing the jump point algorithm L times.A random point u is chosen from a uniform distribution between 0 and 1.If the acceptance rate α is greater than u , the point x L is accepted in the Markov chain.Upon performing multiple samples to reach the burn-in period of HMC sampling, the Markov chain converges towards a stationary distribution, which corresponds to the target distribution P(x).
The random walk in the MCMC algorithm can lead the Markov chain to converge to a stationary distribution, p(x) , but it is often considered inefficient.Hamiltonian Monte Carlo leverages the principles of Hamiltonian dynamics in physics to calculate the future states of the Markov chain, rather than relying solely on a probability distribution.This approach enables more efficient exploration of the state space and achieves faster convergence compared to random wandering.

Fast reconstruction algorithm
Traditional exact reconstruction algorithms involve traversing through all possible quantum states in the original circuit's state space to reconstruct its probability distribution.This requires processing all potential combinations of qubit strings through brute force computation.However, the time complexity of these algorithms grows exponentially.As the number of qubits in the circuit increases, reconstruction time also experiences an exponential explosion.Additionally, exact reconstruction algorithms may encounter difficulties when applied to large-scale quantum circuits due to overhead.To overcome these challenges, this paper presents a fast reconstruction algorithm that differs from the approximate reconstruction method proposed by Chen et al. based on MH sampling 20 .In this work, we draw inspiration from MCMC sampling.However, our reconstruction algorithm deviates from the traditional MH approach, where the future state of a Markov chain is calculated through random wandering.Instead, our method employs Hamiltonian dynamics, rooted in the physical system concept, to determine future states.This technique enables more efficient analysis of the state space compared to random wandering and facilitates faster sampling of high probability solutions from state space, thus accelerating the convergence of Markov chains towards the original circuit's probability distribution.For a comprehensive understanding of HMC sampling, please refer to the references 24,25 .
Our reconstruction algorithm follows the steps outlined below.
Step 1: Assume that the original circuit probability distribution is P(x) , then customize the potential energy function U(x) = −logP(x) , the gradient function G(x) = ∂U(x) ∂x , and the kinetic energy function K p = p T p 2 .
Step 2: Once the potential, gradient, and kinetic energy functions have been initialized, we must establish an initial state x 0 , p 0 for HMC sampling.Here, x 0 represents a randomly chosen quantum state from the original circuit's probability distribution, while p 0 typically denotes a random number drawn from a standard normal distribution N(0, 1) .Additionally, it is essential to initialize a dictionary to store the original circuit's probability distribution.
Step 3: Based on the HMC sampling principle mentioned above, simulating Hamiltonian dynamics in numerical terms requires discretizing continuous time.This is typically achieved using the leapfrog algorithm.To obtain the new state x l , p l , the initial state x 0 , p 0 is iteratively updated L times using the leapfrog algorithm.
Step 4: Once the new state has been obtained, it is necessary to calculate the acceptance rate α = min 1, exp −U(x l ) + U(x 0 ) − K p l + K p 0 for the state x l , p l .This calculation is crucial because HMC sampling estimates the posterior distribution through probabilistic sampling.In essence, HMC sampling constructs a Markov chain that traverses the state space.The traversal is achieved by computing future ( 15) states using the principles of dynamics in physical systems.To ensure that this Markov chain possesses the properties of a stationary distribution, it is necessary to define an acceptance rate α for determining the viabil- ity of transitioning to future states.If the acceptance rate α is greater than a threshold u , the point is accepted as a value from the original circuit's probability distribution and stored in the dictionary for multiple iterations.Following the burn-in period of the algorithm, the Markov chain produced by HMC sampling stabilizes into the desired stationary distribution, which corresponds to the original circuit's probability distribution.
The pseudocode for our reconstruction algorithm can be obtained based on the description above.

Results
As the overhead of circuit reconstruction grows exponentially with the number of cuts, this paper also investigates how to reduce the minimum number of cuts required for circuit cutting, which can be achieved by virtual two-qubit gate decomposition.
The two-qubit gate decomposition technique can be applied to certain fully connected quantum circuits to reduce their structural complexity, resulting in fewer cuts and a significant reduction in the circuit's processing overhead.Figure 3 illustrates the change in the minimum number of required cuts before and after applying a two-qubit gate decomposition on a fully connected 5-qubit QFT 26   If the circuit shown in Fig. 3a is cut directly, we need to cut it at least 4 times to decompose the circuit.We try to decompose the CP gate between qubit 0 and qubit 3 of the 5-qubit QFT circuit, the decomposition method is shown in Equation ( 6), and the decomposed circuit is shown in Fig. 3b.The CP-Cut in Fig. 3b indicates the gates after the CP gate is decomposed.
After applying the two-qubit decomposition principle to decompose the 5-qubit QFT circuit, it was found that only three cuts were needed to separate the circuit.Furthermore, we conducted additional experiments using the circuit-knitting-toolbox toolkit to explore the furthest two-qubit gate decomposition for varying qubit sizes in QFT circuits.The minimum number of cuts required before and after decomposing the two-qubit gate was then calculated.The experimental results are depicted in Fig. 4.
Analysis of Fig. 4 reveals that decomposing two-qubit gates effectively reduces the number of cuts in 4-qubit to 21-qubit QFT circuits.This reduction is more prominent as the number of qubits in the QFT circuit increases.This trend can be extrapolated to complex quantum circuits, where decomposition of two-qubit gates serves as a strategy to minimize circuit cuts.Larger circuits benefit more from this technique, experiencing a greater reduction in the number of cuts after the decomposition of two-qubit gates.This paper evaluates the performance of our reconstruction algorithm through several experiments.Specifically, we measure the runtime of single cut reconstruction for randomized circuits with varying qubit sizes and compare it against three other algorithms: the traditional exact reconstruction algorithm, Tang's DD algorithm, and Chen's approximate reconstruction algorithm.In cases where a single cut is insufficient to cut the circuit, we apply two-qubit gate decomposition to minimize the number of required cuts.These experiments were conducted using the qasm simulator, and the corresponding results are illustrated in Fig. 5a.
Figure 5a shows that the traditional exact reconstruction algorithm (Exact) shows remarkably short runtime for small qubit sizes.However, due to its requirement to traverse all quantum states, the runtime of Exact experiences exponential growth as the number of qubits in the quantum circuit increases.Conversely, the other three reconstruction algorithms demonstrate smoother time distributions in Fig. 5a and do not show significant fluctuations with increasing qubit counts.Among the three algorithms, the DD algorithm shows the longest average runtime due to its utilization of continuous recursion for merging active qubits into bins , aiming to reconstruct  www.nature.com/scientificreports/solution states 18 .However, the time-consuming process of merging quantum states per recursion contributes to the relatively slower average runtime of the DD algorithm compared to both the fast reconstruction algorithm (FRA) and the approximate reconstruction algorithm (ARA) based on sampling.Notably, FRA outperforms ARA in terms of speed as it incorporates the concept of Hamiltonian dynamics in physical systems to compute future states within the Markov chain, in contrast to the random wandering approach employed by ARA.
The experimental results indicate that FRA outperforms other reconstruction algorithms in terms of runtime.Specifically, FRA's average runtime is 45.6 times faster than the traditional reconstruction algorithm, 2.32 times faster than the DD algorithm, and around 1.47 times faster than ARA.
In addition to the runtime analysis, this paper also compares the correctness of all quantum states of the probability distribution of the reconstruction results, and the evaluation metric used in this paper is the mean squared error ( MSE ), as in Eq. ( 19): Here, x i refers to the reconstructed result, while y i represents the result obtained from original circuit.A smaller MSE corresponds to a lower error rate, as depicted in Fig. 5b, which presents a comparison of the MSE values for different algorithms.Based on the experimental findings, the Exact shows the lowest error rate and closest proximity to the probability distribution of the original circuit.It is followed by the DD algorithm, the FRA proposed in this paper, and finally ARA.
The MSE of FRA shows improvement compared to the MSE of ARA in this experiment.However, there exists a disparity between traditional exact reconstruction and DD algorithms.This discrepancy arises due to the inherent sampling-based nature of FRA and ARA, which introduces a certain degree of error in contrast to alternative algorithms.
Quantum circuits can be broadly classified into two types.The first type of circuit results in a sparse probability distribution, where only a few solution states have a significantly high probability, while non-solution states have a probability of 0. Quantum algorithms like QAOA 27 , Grover 28 , and BV 29 generally belong to this type.On the other hand, the second type of circuit produces a dense probability distribution, where many quantum states have non-zero probabilities, such as the 2-D random circuits 30 .In QAOA and similar algorithms, it is sufficient to highlight the solution state in the reconstructed probability distribution to obtain the algorithm's solution.There is no need to excessively focus on achieving high accuracy for all possible solutions.Consequently, we have tested the first type of circuit, specifically the QAOA circuit, as demonstrated below using an example to address the Max-Cut problem.
This paper makes a single cut to the 6-qubit QAOA circuit.This circuit is used to solve the Max-Cut problem of Fig. 6a, and the probability distribution reconstructed using FRA in this paper is experimentally compared with the results of run of the original circuit.
The Max-Cut problem is a common combinatorial optimization problem in graph theory with important applications in statistical physics and circuit design.Michel Goemins and David Williamson proposed a classical algorithm based on semidefinite programming (SDP) approximation for solving the Max-Cut Problem in 1995, which is the best-known approximation algorithm for polynomial time 31 .The effectiveness of QAOA depends on the number of layers of the unitary transformation used, and in theory, it is possible to find an excellent approximation with enough layers, but it can also be time-consuming.
The Max-Cut problem involves dividing the nodes of a graph into two sets so that the number of edges between the sets is maximized.The objective function for its transformation into a combinatorial optimization problem can be formulated as follows: Assuming a max-cut is performed on the graph shown in Fig. 6a, we can then construct a quantum circuit graph illustrated in Fig. 6b based on the objective function of the Max-cut, represented by Eq. ( 20).We optimize the parameters β, γ ( 2β is the rotation angle of the Rzz gate, 2γ is the rotation angle of the Rx gate) by the classical COBYLA optimizer.The changes in the circuit's expected value and variance are observed and presented in Fig. 7.In the expectation value measurement, a larger shaded region indicates a smaller value, indicating a better fit of the parameters and closer proximity of the circuit results to the approximate optimal solution.On the other hand, the variance represents the stability of the solution, where a larger shaded region corresponds to a smaller variance, indicating a more stable and reliable solution.
After finding the optimal parameters to construct the complete QAOA circuit in conjunction with Fig. 7, the farthest two-qubit gate Rzz gate of the QAOA circuit is decomposed due to According to Eq. ( 6), the Rzz gate can be decomposed into a series of Z gates and a combination of Rz gates.Subsequently, we divide the circuit into two parts by making a single cut and then reconstruct the circuit results using FRA.We conducted a comparative analysis of the results obtained from running the original circuit, the circuit after decomposing the two-qubit gate, and the reconstructed circuit after decomposing the two-qubit gate and performing a single cut.These results are visually represented in Fig. 8.
From Fig. 8, we see that the correct results of the circuit are |010101> and |101010>, and the correct solution probabilities of the circuit after decomposing the two-qubit gate are 0.13 and 0.12 respectively, which are lower  than that of the original circuit at 0.15.However, the fast reconstruction algorithm is also able to reconstruct the high probability solution state based on the circuit after the two-qubit gate decomposition, with a correct solution probability of 0.19 and 0.18, which is even higher than that of the original circuit of 0.15.Although FRA does not fully reconstruct all quantum states within the state space of the original circuit, such as the low-probability solutions |011111> and others, this limitation results in a relatively low MSE .However, FRA shows stability in reconstructing high probability solutions, which closely resemble the original distribution.This level of accuracy is sufficient for solving circuits like QAOA.
To further validate the conclusion, we performed additional experiments on multiple QAOA circuits.Each circuit consisted of two high probability solutions, which were then evaluated through FRA after a single cut.The experiment results were sorted in descending order of probability after running all circuits.Subsequently, the first two solutions from the original circuits' running results were compared to the first two solutions of the FRA reconstruction results.The evaluation metric used in this comparison was the coincidence rate ( CR).
The CR is equal to 1 when both first two solutions coincide completely, 0.5 when only one solution coincides, and 0 when there are no coinciding solutions.
We conducted experiments on the QAOA circuits with 3-qubit to 13-qubit.As a result of the experiments, the CR of the reconstructed results of the FRA for all experimental circuits is 1, when compared to the results obtained from the original circuits.This signifies that FRA is capable of accurately reconstructing the solution states for quantum circuits, such as QAOA.Additionally, we have counted the number of results obtained by various reconstruction algorithms in the experiments to estimate the space overhead, as depicted in Fig. 9.
As shown in Fig. 9, of all the reconstruction algorithms for this experiment, the FRA reconstruction yields the least number of solutions.The Exact and DD algorithms reconstruct all quantum states within the state space of the circuit.However, the results obtained by these algorithms increase exponentially as the number of qubits grows.At a certain point, the storage device may be incapable of accommodating all the results.Conversely, FRA reconstructs high probability solutions from the original circuit, eliminating the presence of 0-probability or lowprobability solutions.Consequently, the storage space overhead can be significantly reduced, transforming from exponential to polynomial scale when compared to Exact and DD algorithm for large-scale quantum circuits.
In summary, FRA can efficiently acquire the reconstruction probability distribution of a quantum circuit in a shorter runtime compared to the Exact, DD, and ARA.Unlike the Exact and DD algorithm, FRA does not require the reconstruction of probabilities for all quantum states.Instead, it selectively focuses on high probability solutions, reducing time overhead and space overhead while finding the correct solution for the circuit.This means that FRA is a low overhead circuit-cutting post-processing algorithm.

Conclusion
This paper introduces an algorithm based on Hamiltonian Monte Carlo in circuit cutting reconstruction.Additionally, we apply two-qubit gate decomposition to minimize the number of cuts in experimental circuits during our experiments.Our comprehensive experiments demonstrate that our reconstruction algorithm efficiently samples high probability solutions from state space, without the need to traverse all possible states.Furthermore, it significantly reduces both time and space overheads.
In the NISQ era, circuit cutting plays a crucial role in extending NISQ devices to larger scales.However, due to its excessive post-processing overhead, operating on large-scale quantum circuits becomes impractical.Therefore, the algorithm proposed in this paper to reduce the post-processing overhead of circuit cutting holds the utmost www.nature.com/scientificreports/importance for the NISQ era.Additionally, the concept of the fast reconstruction algorithm, introduced in this paper, solely obtaining high probability solutions carries significant implications for the future development of quantum computing.Currently, our reconstruction algorithm is only applicable to circuits with a single cut, and it has not been extended to circuits with multiple cuts.Our reconstruction algorithm is particularly suitable for circuits such as QAOA.Investigating and exploring how the algorithm can be adapted for multiple cuts and extended to all quantum circuits are promising directions for future research.
a. Begin by calculating the change in momentum after half the time interval δ/2 : b.Next, compute the change in position over the entire time interval δ: c.Recalculate the change in momentum for the remaining half-time δ/2:

Figure 2 .
Figure 2. Decompose the two-qubit gate into a series of single-qubit gate sequences.

Figure 3 .
Figure 3.Comparison of the change in the minimum number of cuts required before and after two-qubit gate decomposition for a 5-qubit QFT circuit.

Figure 4 .
Figure 4. Comparison of the minimum number of cuts required for QFT circuit before and after decomposing a two-qubit gate.

Figure 5 .
Figure 5.Comparison of runtime and MSE for different post-processing algorithms for a single cut of random circuits.

Figure 6 .
Figure 6.(a) is a diagram of the maximum cut problem to be solved, and (b) is a QAOA quantum circuit built by the objective function; we decompose the farthest CZ gate of the circuit and make one cut of the circuit.

( 21 ) 2 Z⊗ZFigure 7 .
Figure 7. Figures (a) and (b) illustrate the expected value of the QAOA circuit and the variance of the measurements with the parameter β , γ for p = 1 (where p represents the number of iterations of the QAOA algorithm), respectively.

Figure 8 .
Figure 8. Results of the original circuit, the circuit after two-qubit gate decomposition, and results of running FRA on a single cut circuit after decomposing the two-qubit gate, with the probability values of some high probability solution states also marked in figure.

Figure 9 .
Figure 9.The Figure shows the number of reconstructed results obtained through various algorithms, namely Exact, DD, ARA, and FRA, for the 3-qubit to 13-qubit qubits QAOA