Low Rank Density Matrix Evolution for Noisy Quantum Circuits

In this work, we present an efficient rank-compression approach for the classical simulation of Kraus decoherence channels in noisy quantum circuits. The approximation is achieved through iterative compression of the density matrix based on its leading eigenbasis during each simulation step without the need to store, manipulate, or diagonalize the full matrix. We implement this algorithm in an in-house simulator, and show that the low rank algorithm speeds up simulations by more than two orders of magnitude over an existing implementation of full rank simulator, and with negligible error in the target noise and final observables. Finally, we demonstrate the utility of the low rank method as applied to representative problems of interest by using the algorithm to speed-up noisy simulations of Grover's search algorithm and quantum chemistry solvers.


I. INTRODUCTION
The scaling of quantum computers is limited by quantum decoherence [1][2][3][4]. State of the art quantum computers that consist of fewer than 100 qubits [5,6] are of interest for noisy intermediate-scale quantum (NISQ) applications, where qubits are used without error correction [7]. Therefore, classically simulating imperfect and noisy circuits is vital for the development and design of NISQ-era algorithms as well as characterizing errors in quantum hardware [8]. Existing classical simulators have typically focused on emulating noiseless circuits. In this case, simulations of quantum circuits with more than 50 qubits has been demonstrated [9][10][11]. There are several techniques developed for speeding up simulations of certain types of circuits [12][13][14][15][16] or algorithms [17][18][19][20]. For simulation of noisy circuits, there are high performance computations developed [21][22][23][24], as well as light-weight open source tools such as density matrix simulators in Qiskit [25] and Cirq [26]. These simulators are based on evolving full density matrices which is prohibitively expensive for large numbers of qubits.
In an open quantum system, the decoherence can be modeled as interactions with a large environment [27]. One can determine the properties of an open quantum system with the Monte Carlo wave function method where a system is decomposed into an ensemble of pure states that evolve individually and then are averaged [28][29][30][31][32]. On the other hand, the dynamics can also be characterized by the Lindblad equation [33,34] which well describes decoherence in various quantum hardware architectures [35][36][37]. To reduce the complexity of the Lindblad equation, one can project the quantum states onto a lower dimensional basis using filtering theory and simulate the states more efficiently with reasonable accuracy [38,39].
In this work, we combine the ideas of the pure state * Electronic address: yitchen@stanford.edu † Electronic address: cjf235@cornell.edu ‡ Electronic address: rob.parrish@qcware.com decomposition [28,29] and the low dimension basis projection [39] to efficiently simulate noisy quantum circuits. Compressed representations are commonly applied to classical simulation of quantum systems with high symmetry or low entanglement. Applications range from compressed sensing of quantum state tomography [40,41], limiting bond dimensions of tensor networks [14,42,43], and low-rank factorization of Hamiltonians [44] to efficiently represent of states [15,[45][46][47]. In our case, it is found that the von Neumann entropy of a density matrix is often small when the noise level is low, implying that it is possible to model the density matrix using a matrix of lower rank with minimal information loss. We achieve this by iteratively projecting onto a subspace of the eigenbasis, and evolving only a small ensemble of pure states.
In the following, we present a complete and explicit algorithm which decomposes a mixed density matrix into a low rank matrix representing an ensemble of pure states, applies gate and Kraus operators to this low rank matrix, and computes the output density matrix and probability distribution. The procedure involves iterative compression of the density matrix to maintain the most numerically compact form with minimal error. As an example, Fig. 1a shows a 6-qubits density matrix after a quantum circuit that solves a Grover's search problem for finding states with Hamming weight ≤ 2. The same circuit is simulated with depolarizing noise with noise strength p 0.33% by an exact method and by our low rank method. In this example, we use a low rank representation that has only 20% of the full rank. Fig. 1b and  1c show that the low rank method simulates noise with high accuracy. More extensive benchmarking and detailed descriptions on the performance and accuracy of the method are in Section III.
In fact, we show that it is possible to evolve and compute quantities of interest of a (2 N × 2 N ) density matrix without ever forming a matrix of size (2 N ×2 N ), where N is the number of qubits. The algorithm is then assessed by a sequence of random benchmarking under various types and strengths of noise channels to test its practical speed-up and error. We show that the algorithm per-arXiv:2009.06657v1 [quant-ph] 14 Sep 2020 forms consistently in random circuits and in structured circuits for quantum algorithms such as Grover's search, with a speed-up more than two orders of magnitude and with a small error (around 0.01%) in the probability distribution associated with the final output density matrix. Furthermore, as N becomes larger, and approaches the range for which classical simulations become difficult, the advantage of this algorithm continues to increase over the standard method of full density matrix evolution.

II. ALGORITHM FOR LOW RANK NOISE SIMULATION
In this section, we present an algorithm that simulates noisy circuits using a low rank representation of density matrices. The algorithm consists of two parts, low rank evolution and eigenvalue truncation, which are covered in section II A and II B below. In section II C, an iterative procedure consisting of these two parts is introduced. Then, in section II D we explain how to sample the associated probability distribution without explicitly forming the full density matrix.

A. Low Rank Evolution
A coherent quantum system can be represented by either a statevector |ψ or a density matrix ρ = |ψ ψ|. |ψ has dimension 2 N × 1, while ρ has dimensions 2 N × 2 N . Because of the substantial difference in the sizes of |ψ and ρ, most classical simulations of coherent quantum systems work directly with the statevector representation. Unfortunately, the statevector representation does not directly allow for the presence of decoherence. Instead, the evolution of a quantum system in a decoherent noise channel can only be described by a density matrix ρ, which is generally more computationally demanding to simulate. The evolution of a quantum state in a noisy quantum circuit is described by [48][49][50] where K α are Kraus matrices, p α is the corresponding probability, ρ (d) and ρ (d+1) are density matrices before and after a noise channel. This Kraus-based noise model is capable of encapuslating many different types of decoherence channels, such as bit flip, depolarizing, etc., and therefore is an extremely useful tool in modeling the operation of NISQ-era quantum algorithms in the presence of decoherence noise on real devices. However, most current implementations of this Kraus-based decoherence model explicitly work with the 2 N × 2 N noise-including density matrix. Our approach exploits the fact that for realistically small noise levels (p 0.01), the von Neuman entropy of the density matrix usually remains low, raising the possibility of working with an approximate but accurate low-rank representation of the density matrix. This observation is supported by an entropy analysis in Appendix A.
While the formal possibility of a low-entropy density matrix evolution is tantalizing, there remains to be resolved many pragmatical details about how to efficiently identify and exploit this rank structure while avoiding formation and manipulation of any density-matrix-sized quantities in the rank identification process. Here and in II.B we describe an algorithm that can accomplish this for noise-including density matrices that exhibit the desired low-rank structure. A density matrix, ρ, can always be decomposed as a outer product of the L ∈ C 2 N ×V matrix, for some V ∈ N. While the choice of L is not unique, in general, it is possible to find L with V equal to the rank of the density matrix using decomposition methods such as singular value decomposition. For density matrices with rank smaller than 2 N , this form most compactly represents the state with the minimal column dimension. Using the decomposition in Eq.
(2), we can evolve the density matrix by updating L without evaluating ρ (d) explicitly as in Eq.(1). For a gate operation where is a gate operation, and G (d) is its corresponding gate matrix. Likewise, for a Kraus operator, is a Kraus operator, and K ]. Note that, due to the concatenation, the number of columns of L changes after each noise operation; each column in L (d) will evolve to A columns in L (d+1) . For example, if the dimension of L (d) is 2 N × 3 and A = 2, then L (d+1) has dimension 2 N × 6.
FIG. 1: An example showing that the low rank method simulates noise with negligible error. (a) density matrix from noiseless simulation, ρ noiseless . (b) difference of density matrices from exact noise simulation and from noiseless simulation, ρ exact − ρ noiseless . (c) difference of density matrices from low rank noise simulation and from noiseless simulation, ρ LRET − ρ noiseless where LRET, the focus of this article, is a low rank method. The rank of ρ LRET is 20% of that of ρ exact , and corresponds to 2.2% of distortion (defined in Section III). The noise strength is p 0.33% and the truncation threshold is = 3 × 10 −4 .

B. Eigenvalue Truncation
From (d − 1)-th layer to (d)-th layer of a quantum circuit, the number of J α vectors grows by A times. For a system starting from a pure state, this number is A d at the d-th layer. This scaling makes tracking all J α vectors computationally intractable over time if left unchecked. Furthermore, in practice, when the noise level is small, the number of columns corresponding to significant eigenvalues of L (d) is often found to grow only polynomially with the system size. An eigenvalue truncation procedure is used to project the density matrix into lower rank and keep only those highest contributing columns, akin to the quantum filtering in simulating open quantum system [38]. We truncate those eigenvectors whose eigenvalues are negligible by where U (d) , Λ (d) are eigenvectors and eigenvalues of ρ (d) , and from which we defineŨ (d) ,Λ (d) as approximations for which the unimportant eigenvalues and eigenvectors are truncated. The truncation is based on a threshold . The descending-ordered eigenvalues are picked up one by one until they sum to 1 − . The remaining eigenvalues sum to are thrown away along with their associated eigenvectors. Although more sophisticated ways of truncation exist [51,52], we use this simple cutoff criteria to better control the error introduced by the procedure. Furthermore, this truncation method is the op-timal scheme to preserve the trace and the 2-norm of a matrix, known as the Eckart-Young theorem [53]. The representation above might be a useful method to retain only the maximal information L factors in Kraus noise models. However, the approach appears to have the computational problem that it involves the eigendecomposition of a 2 N ×2 N matrix. Solving an eigenvalue problem of a 2 N × 2 N matrix has a complexity of O((2 N ) 3 ), which is very expensive and would overwhelm the benefit of low rank simulation. However, using the result from theorem 1 in the appendix, we can efficiently compute the eigenvectors and eigenvalues without explicitly constructing the density matrix. The complexity of the eigenvalue problem is instead O((AV ) 3 ) where V < 2 N is the number of columns of L in Eqn. (2), and A is the number of Kraus matrices comprising the Kraus operator.

C. Kraus Operator Decomposition
We model noisy quantum channels with single-qubit Kraus operators. To model noise induced by gate operations, one may apply Kraus operators following each gate. If the circuit is sparse in terms of gates, correspondingly, there are only a few Kraus operators per layer. In this case, AV stays small and eigenvalue truncations can be done relatively efficiently. However, consider a dense noisy circuit with a depolarizing Kraus operator acting on every qubit at each time-step (Fig. 2). The depolarizing Kraus operator is comprised of 4 matrices [50], and therefore A = 4 N , making eigenvalue truncation intractable with complexity O((4 N V ) 3 ). This can be resolved by decomposing the Kraus operator in several groups This decomposition is possible because noise channels in a quantum computer can be well described by a combination of one and two-qubits Kraus operators [5]. In the example in Fig. 2, instead of an eigenvalue truncation after each whole Kraus layer, a truncation is applied β in order to prevents A from getting too large. In other word, the evolution where E is eigenvalue truncation operation. Because a Kraus operator is decomposed into B operations, each decomposed operation has onlyĀ = 4 N B Kraus matrices, denoted as K This quantity will be important to estimate the conditions for which low rank simulation is faster than full density matrix simulation in Section III. We focus on the simulation of NISQ-era circuits, which are shallow in terms of circuit depth. However, note that for very deep circuits, V I can grow larger than 2 N . In this case, there is no benefit of doing low rank evolution, and we switch back to full density matrix evolution. The algorithm of low rank noise simulation is summarized in Algorithm 1.

D. The LRET Algorithm
Section II A, II B and II C together describe the algorithm for getting the final low rank representation, L (D) . We refer this algorithm to as Low Rank simulation with Eigenvalue Truncation (LRET). The full procedure is summarized in Algorithm 1. The concatenation and the eigenvalue truncation in the algorithm are described in Section II A and II B, respectively. Note that although we use the |000... fudicial state as the initial state (as shown in Algorithm 1) throughout this article, the algorithm works with any single fiducial state or sparse linear combination of states. Also note that density matrix, probability distribution and expectation value in Algorithm 1 are optional and are included as examples of user-specified outputs.
Once we have the final low rank representation, L (D) , we can construct the density matrix using Eqn. (2). However, this full density matrix quantity is rarely needed in standard practice. For example, one may want to simulate the behavior of quantum hardware where the only information we get is from measurements in a fixed computational basis which sample the probability mass function, Prob(x), that is defined by the underlying density matrix. In this case, low rank simulation gains an additional speedup as the probability distribution is simply where subscript x and v run over computational basis dimension and column dimension of the matrix respectively. The measurement count for each state is then sampled from this distribution. Note that, if the goal of a circuit simulation is to observe and count the measurement outputs, a density matrix is not formed at any point of the simulation as long as the intermediate rank is smaller than 2 N . Similarly, we can evaluate observables in low rank form

Algorithm 1 Low Rank Simulation with Eigenvalue Truncation
end for end for Compute quantities of user's choice:

III. IMPLEMENTATION AND BENCHMARKING
We implemented the algorithm for low rank noise simulation in an in-house quantum circuit simulator built in Python. In our simulator, one can specify one of two options for a noisy simulation: full density matrix simulation (FDM), and low rank simulation with eigenvalue truncation (LRET) as described in Algorithm 1. We benchmark the two simulation methods in three scenarios: randomized benchmarking, state preparation for quantum chemistry and Grover's search algorithm. For time benchmarking, we use Cirq 0.5.0, a widely-used open source FDM simulator, to show that our implementation of FDM method is reasonably optimized and serves as a good baseline for comparison. All benchmarking are executed on an AWS c5.12xlarge instance.
The general result is that the LRET method is two orders of magnitude faster than the FDM method with a trade-off of ∼ 0.01% error. The error is measured by the distance between the output density matrices from the LRET method (ρ LRET ) and from an exact method (ρ exact ), such as FDM. Because this quantity depends on the noise level, we define a more appropriate measure for error benchmarking where ρ noiseless is the density matrix from the simulation of the same circuit without noise, and T is the variational distance [54] between the probability distributions defined by the two density matrices in the computational basis, ie.
To aid in a qualitative understating of the distortion measure, it is useful to note that the distance between ρ LRET and ρ exact captures the error or information loss incurred by the eigenvalue truncation procedure. The denominator scales this value relative to the change induced by the noise channel to ρ noiseless . For example, when the output error is 0.01% and the change induced by noise is 0.1%, the distortion is 10%.

A. Randomized Benchmarking
Randomized benchmarking is a standard tool used to evaluate the performance of quantum hardware [55][56][57]. We use the idea to benchmark time and error metrics for the two simulation methods on an ensemble of randomly generated circuits. The circuits are generated from random choices of common gates, including X, Y, Z, S, T, RX, RY, RZ, SWAP, CZ and CNOT. The Section III A 4 below discusses the different types of circuits we use for benchmarking. If not explicitly stated otherwise, the random circuits are dense circuits where 1-qubit and 2-qubit gates appear with equal probability in G (Fig. 2), and the 2-qubit gates connect to adjacent qubits. Dense circuits are those for which a gate acts on each qubit at each time-step. Inspired by the fact that noise is well described by a set of Kraus operators whose dimension does not scale with circuit size [5], all Kraus operators act on one qubit in the benchmarking, as illustrated in Fig. 2.
To understand the conditions for which the LRET method gains speed-up against FDM method, we also inspect the rank evolution of density matrix in LRET. The rank and the intermediate rank (as defined in Eq. (8)) directly influence the computational complexity and the speed of the LRET algorithm. In the following sections, ...

FIG. 2:
Schematic of a noisy quantum circuit. Here G (i) represents gate operations at the i-th layer and K represents a one-qubit Kraus operator that models the noise.
we first benchmark time, error, and rank of simulations under different noise channels. Then, we assess performance on a variety of differently characterized random circuits.

Depolarizing Noise Channel
In this section, we benchmark dense circuits with the depolarizing noise channel, which is defined as ρ It has been shown that noise in quantum hardware, in average, behaves like depolarizing noise [58,59], making it a good description of realistic noise channels. Fig. 3a shows that, while our FDM method and Cirq take a similar amount of time to run for 12-qubits circuits with p = 0.1% under depolarizing noise, the LRET method is much faster than both. In shallow circuits, LRET is 200× faster than FDM (Fig. 3b). Even for higher depth circuits, LRET remains roughly 100× faster. This can be understood by considering the size of the numerical representation these methods are keeping track of. While the FDM method evolves a 2 N × 2 N density matrix, LRET only keeps track of a 2 N ×V representation of a density matrix. Effective use of the LRET algorithm amounts to choosing the truncation threshold as to best manage the trade-off between the speed of the simulation and the error in the simulation results. This trade-off is characterized in Fig. 4.
Although V is always smaller than 2 N at low depth for N > 2 (Fig. 3c), the conditions for a speed-up is determined by the intermediate rank V I defined in Eq. (8); the LRET method is faster if V I < 2 N . As shown in Fig.  3d, a speed-up is only achieved for N > 7 for the circuit depth consider herein. Since V I increases approximately polynomially and 2 N increases exponentially in N , the range of depths for which LRET has an advantage will increase even more as the number of qubits increases. Critically, LRET has an advantage precisely in the range where classical simulations begin to become burdensome. Furthermore, the space of circuit sizes in which LRET provides a significant advantage also characterizes the circuits of the early NISQ area, with few tens of qubits, circuit depth and with noise strength p < 0.01 [7].
The LRET method gains a speed-up by truncating the negligible components of a density matrix. Although the truncation in each step is small, over time the discrepancy from the exact methods, like FDM, can build up.
Here, we benchmark the error introduced by the eigenvalue truncation in the LRET method. Fig. 4a shows the distortion as a function of the number of qubits (N ), depth of the circuit (D) and eigenvalue truncation threshold ( ). While the distortion depends on N and D, is the strongest factor. There is a general trend that the error starts to increase rapidly at 10 −4 , so we take = 10 −4 as a reasonable choice. From Fig.  4b, we can see that error < 8% for all the N and D considered herein. As we slice out the number of qubits and circuit depths axes in Fig. 4b, we can see that the error grows roughly linearly with N and D ( Fig. 4c and d).

Noise Strength
We now see how the noise strength affects the performance of the LRET method. All benchmarking in this section uses dense circuits with depolarizing noise channels of various strengths. In Fig. 5a, the speed-up of the LRET method against the FDM method degrades as the noise strength grows. From p = 0.1% to p = 1%, the speed-up drops from the order of 100× to 10× at D = 12. This degradation is due to the higher order terms in noise which scale super-linearly in p. While the truncation threshold is adapted linearly by fixing the ratio /p = 0.1 in this benchmarking, more higher order terms need to be included to meet the truncation threshold. This results in a larger V and V I (Fig. 5b), and thus a longer computational time. Fig. 5c shows that the distortion as a function of has a universal shape regardless of the circuit size (N × D) and/or noise strength p. The magnitude of the distortion is relatively insensitive to circuit size. The noise strength p proportionally shifts the curves in axis (ie. when p and are scaled by a same factor, the error stays in a similar range).

Other Noise Channels
We now consider noise simulations under bit flip and amplitude damping channels for dense circuits with p = 0.1% and = 10 −4 . Bit flip can be represented by ρ → (1 − p)ρ + pXρX in the operator-sum formalism. In other words, this channel takes a portion of the quantum state and project it uniformly in the X direction of the Hilbert space. Bit flip channel is a special case of anisotropic noises. The results for the bit flip channel generalizes to other types of anisotropic noise in randomized benchmarking, such as phase flip and all other channels described by ρ → (1 − p)ρ + pU ρU † where U is any 2 × 2 unitary matrix. The amplitude damping channel dissipates the energy of a qubit towards its lower energy basis, usually denoted as the |0 . We use the operatorsum formalism ρ → E 0 ρE 0 + E 1 ρE 1 , where E 0 and E 1 are Kraus operators for amplitude damping, defined as ing channel, where LRET completes in less than 3 seconds while FDM takes more than 25 minutes at depth = 12. The speed-up of the depolarizing and bit flip channels are about 100× faster while the speed-up for amplitude damping is almost 1000× faster (Fig. 6b). This is related to the slower increase of intermediate rank (Fig. 6c) due to the fact that amplitude damping has a preferred state, the |0 state, regardless of the details of the qubit state.
The error benchmarking for the bit flip channel (Fig.  7)a-d is very similar to that of depolarizing channel, except that bit flip is more tolerant to when the circuit size is small. At = 10 −3 , the distortion is ∼ 50% in bit flip channel while it is ∼ 100% in depolarizing channel. When = 10 −4 , distortion is reasonably small for all N and D considered herein, so we take 10 −4 as a recommended choice of .
In contrast to depolarizing and bit flip channels, under amplitude damping the distortion saturates at lower values when is large (Fig. 7e). This is possibly because amplitude damping prefers the ground state, favoring the LRET method which keeps only few important components of a quantum state. When = 10 −4 , the distortion is smaller than 4% for all circuits considered in Fig.  7f. The distortion grows slowly but linearly with circuit depth (Fig. 7g and h).

Sparsity and Connectivity of Quantum Circuits
A quantum circuit can be characterized by its sparsity and connectivity of gates. In all of the above benchmarking, the random circuits are dense, i.e. for all time steps a gate acts on each qubit (e.g. the circuit in Fig. 2a), and the connections are local, which means that two-qubit gates only connect adjacent qubits. Below, we consider other types of circuits. The first is dense and global, and the second is sparse and local. In sparse circuits, each qubit does not always interact with a gate at every time step and Kraus operators are only inserted after gates (i.e. the noise is as sparse as the gates). In a globally connected circuit, two-qubit gates can connect any pair of qubits in a circuit. In this section we use the bit flip channel for benchmarking the LRET method on all the aforementioned circuit-types with p = 0.1% and = 10 −4 . In terms of time-cost, simulating different circuit types goes from harder to easier as: dense-global → dense-local → sparse-local. In Figure 8a one can see that the LRET method retains it's speed-up for all circuit types. The time difference between the sparse and the dense is because there are about twice as many gates in dense circuits than in sparse circuits. The time difference between the global and the local is because the set of all fixeddepth globally connected circuits spans a larger Hilbert space. Therefore, the rank of dense-global circuits grows slightly faster (Fig. 8c) and the simulations are slightly slower (Fig. 8a). Interestingly, in Fig. 8a one can observe that in FDM simulation, due to the number of gates and the memory allocation, the time-complexity grows at a similar rate as that of LRET with increasing circuit depth. Thus, speed-ups are consistently ∼ 100× faster with LRET than with FDM (Fig. 8b).

B. State Preparation for Quantum Chemistry
Quantum simulation is one of the most promising application areas for NISQ devices [7]. Algorithms have been developed to solve optimization and physical problems through quantum simulation approaches [60][61][62]. Here, we use our low rank noise simulator to run a circuit that generates generalized-amplitude W states [63] and Dicke states [64]. Although states of this kind are not hard to simulate classically, they are commonly used as subroutines for quantum information processing [65][66][67][68][69] and thus are of high interest for simulations. Ordinarily the parameters of the circuit are initialized according to the solution of a configuration interaction singles chemistry problem, but for benchmarking purposes we set the parameters randomly.
Unlike most of the circuits used in randomized benchmarking above, the circuit in Fig. 9a is sparse. Two ways to model noise channels are (1) placing Kraus operators only after each gate, and (2) after each qubit at every time-step. We call the former sparse noise, and the later dense noise. We use a depolarizing noise channel with p = 0.1% and = 10 −4 . Figs. 9b and d show that the LRET method has at least a 10× speed-up, and more than 100× when N is larger. The distortion caused by LRET is about 5% in sparse noise and 15% in dense noise ( Fig. 9c and e). In the 13-qubit circuits, the rank of the final density matrix in LRET is 0.4% and 1% of the full rank in the sparse and dense noise cases, respectively. This is because the rank of a density matrix in a circuit model with dense noise increases faster, and the higher order terms thrown out by eigenvalue truncation become more important. The quantum states produced by this state preparation are highly entangled. The fact that the low rank method gains an order of magnitude speed-up demonstrates its utility when applied to practical algo- rithms.

C. Grover's Search Algorithm and Amplitude Amplification
Amplitude amplification is a generalization of Grover's quantum search algorithm [70][71][72]. The algorithm aims to find a solution, x, such that f (x) = 1 if x is a solution and f (x) = 0 otherwise, implying x is not a solution. If x is a solution we say that x is good. If one was to randomly sample from a search space then p good is the probability of sampling a solution. For a classical search algorithm, it is expected that one would have to sample from the input space on the order of 1 p good times to find a solution; however, using amplitude amplification, one can expect to find a solution using in only O( 1 p good ) samples -a quadratic speed-up over the classical case [70,73].
In the algorithm, qubits are initialized to the a uniform superposition over the entire search space, where each basis state in the superposition corresponds to an element, x i ∈ X , where X is the search space of the problem. Next, a number of unitaries, known as Grover Iterates, act on the initialized state and boost the amplitudes of states that correspond to good solutions. The number of Grover Iterates to apply is given by π 4 1 p good . A full measurement of the resulting circuit yields states corresponding to good solutions with high probability.
In our implementation, we define the function f such that f (x) = 1, i.e. a good input, when the binary string representation, x, has a Hamming Weight, HW(x), less than or equal to 2.
We run amplitude amplification on circuits ranging from 9 to 13 qubits with depolarizing noise with p = 0.1% and = 10 −4 and compare LRET and FDM methods (Fig.  10). In the 13-qubit circuit, the rank of final density matrix of LRET is 0.5% of the full rank with a trade-off of 3.7% distortion. The similarity of the measurement results from both methods demonstrates the accuracy of LRET; in other words, that any information loss from eigenvalue truncation is insignificant when sampling from the resulting density matrix. Time-benchmarking of the two methods illustrates the speed-up provided by LRET, which continues to improve as the number of qubits increases.
We note that it is not the intent of this study to most accurately predict the results of running this experiment on a particular hardware specification. When running on quantum hardware, gates must be decomposed into the set of gates native to the particular hardware, whereas here we model each of the Grover Iterates as a single unitary. Rather, the aim of the study is to show that LRET retains its accuracy and computational advantage not only for the random circuits used for benchmarking but also for circuits that may have legitimate applications and which exhibit more structure than the randomized circuits.

IV. SUMMARY AND OUTLOOK
In this work we have demonstrated a method to efficiently simulate the evolution of mixed quantum states in noise channels through density matrix decompositions and low rank approximations. Iterative compression of the density matrices enable us to take the advantage of low rank evolution throughout the simulation of a noisy circuit with minimal error. Provided that the noise level of the individual channel is smaller than 0.1 ∼ 1%, the density matrices are found to be well approximated by low rank matrices. We provide an entropy argument in the appendix to support this finding. Under the low noise assumption, our results show that the algorithm provides orders of magnitude of speed-up with a small error, on the order of 10 −4 , in the probability distributions associated with the output density matrix. The performance in speed and in distortion is robust in different circuit structures and for varying levels of entanglement, since we make no assumption on the symmetry or the entanglement of the circuits.
We posit that our methodology can be naturally extended to work with observable quantities beyond that of simple measurement probabilities. Furthermore, given that our approach is based in linear algebraic primitives it is likely that the use of GPUs could further improve the performance. While our attention rests on the column space of density matrices, further speed-ups can be achieved by optimizing the representation with respect to the computational basis [14,74,75].

V. ACKNOWLEDGEMENTS
The authors thank Dr. Sean Weinberg, Juan I. Adame, Dr. Fabio Sanches and Dr. Adam Bouland for discussions on the theoretical ground of this work and providing useful references. RMP owns stock/options in QC Ware Corporation. It is found that a density matrix can be well approximated by a low rank matrix when the noise level is low. In this section, we provide an entropy analysis to show that this statement is true for the bit-flip and the depolarizing channels. The quantum circuits considered here have the same structure as Fig. 2. It is known that the noise in quantum computers can be well characterized by only one and two-qubits Kraus operators [5]. The result derived here holds for both one and two-qubits Kraus operators when the number of qubits N is large. For convenience, we consider only one-qubit Kraus operators on each qubit. Each Kraus operator K is assumed to have the form where K α are Kraus matrices and Nα α=1 p α = p. We assume p is small. In other words, the noise level in the circuit is small.
This provides an inequality for entropy change ∆S ≡ S(ρ (d+1) ) − S(ρ (d) ) < − α p α log 2 (p α ). The Kraus operators considered here are a direct product of one-qubit Kraus operators. For the case that there is a bit-flip channel on each qubit, p α follows the Bernoulli distribution with sequence length N and probability {1 − p, p}.
The upper bound of the entropy change is ∆S bit = − α p α log 2 (p α ) = −N p log 2 (p) − N (1 − p) log 2 (1 − p). The literature on quantum hardware has reported qubits with noise level of 0.1% [78][79][80][81][82][83]. When the noise is small, ∆S bit is well described by first order approximation in p, ∆S bit = N (p − p log 2 (p)). The effective dimensionality of the density matrix in this approximation is where γ bit ≡ ( 2 p ) p − 1 0.008 when p = 0.1%. The effective dimensionality grows approximately linearly with number of qubits.
For the case that there is a depolarizing channel on each qubit, p α follows the categorical distribution with sequence length N and probability {1 − p, p 3 , p 3 , p 3 }. The upper bound of the entropy change is ∆S dep = − α p α log 2 (p α ) = −N p log 2 ( p 3 ) − N (1 − p) log 2 (1 − p) or ∆S dep = N (p−p log 2 ( p 3 )) in the first order approximation. The effective dimensionality of the density matrix in this approximation is where γ dep ≡ ( 6 p ) p − 1 0.009 when p = 0.1%. The effective dimensionality grows approximately linearly with number of qubits. These results suggest that, while the size of the density matrix grows exponentially with N , the effective rank of the density matrix grows linearly with N in a good approximation. The linearly approximation is good with error < 5% up to N = 130 qubits.

Appendix B: Low Rank Eigendecomposition
This section provides a theorem for finding eigenvalues efficiently without forming a density matrix explicitly. Theorem 1. Let B be a n × m matrix, where n > m. BB † and B † B share m eigenvalues. Furthermore, if u is a eigenvector of B † B, Bu is the eigenvector of BB † that shares the same eigenvalues.
Proof. The matrix B has the singular value decomposition where U is a n × m matrix with orthonormal columns, and V is a m × m orthonormal matrix. From B, we can construct two Hermitian matrices