Introduction

The time evolution of quantum systems is a century old subject that has been extensively studied for both fundamental and practical purposes. Open quantum dynamics is an important subfield of quantum physics that studies the time evolution of a system interacting with an environment1. Because the environment is usually too large to be treated exactly, in open quantum dynamics we often make approximations by averaging out the environment’s effect on the system. The resultant time evolution of the system density matrix is non-unitary and often governed by a master equation. The idea of simulating quantum systems with quantum algorithms was first proposed by Feynman2 and has received massive attention in recent years3,4,5,6,7,8,9,10,11,12,13,14,15,16,17 for its promise of outperforming the best available classical algorithms. However relatively few studies18,19,20,21,22,23 have been done to develop quantum algorithms for open quantum dynamics despite its importance. A key difficulty is the evolution of an open quantum system is often non-unitary, while quantum algorithms are mostly realized by unitary quantum gates. An early study18 tackled this problem by including the environment in the quantum simulation process therefore making the evolution unitary. Focused on Markovian dynamics, their procedure required a reset on the environment for every time step, which can become expensive if the system becomes entangled with the environment or the evolution time is large. It is known that any non-unitary quantum operation can be made into a unitary one by the Stinespring dilation theorem24. However, due to the large increase of the dimension of the Hilbert space, the computational cost required by naïve application of Stinespring dilation to the quantum operation can be prohibitive for actual implementation on a quantum computing device. Perhaps for this reason, most algorithms developed (see e.g. ref. 19,20) so far rely on the knowledge of the decomposition of the quantum channel which may not be generally available for a large system without costly computational efforts. So far as we know a general quantum algorithm to simulate an arbitrary quantum channel for a general density matrix with minimal resource has not been proposed and demonstrated. In this study we propose and demonstrate such a quantum algorithm utilizing the Sz.-Nagy dilation theorem – being a variation of the Stinespring dilation theorem the Sz.-Nagy dilation requires much smaller dimension increase and can save significant computational resources. Without assuming any specific property of the quantum channel, we work with the most general form of the time evolution for a density matrix – the operator sum representation:

$$\rho (t)=\sum _{k}{\rho }_{k}(t)=\sum _{k}{{\bf{M}}}_{k}\rho {{\bf{M}}}_{k}^{\dagger }$$
(1)

where ρ(t) is the system density matrix at time t, ρ is the initial system density matrix, and Mk’s are the Kraus operators that satisfy:

$$\sum _{k}{{\bf{M}}}_{k}^{\dagger }{{\bf{M}}}_{k}={\bf{I}}$$
(2)

It is also common that the time evolution of ρ(t) is described by a master equation:

$$\dot{\rho }(t)={\mathcal{L}}\rho (t)$$
(3)

where the time derivative of ρ(t) is given by the superoperator \({\mathcal{L}}\) applying to ρ(t) itself. It is known that any master equation of the form of Eq. (3) can be converted into the form of Eq. (1) 25,26, affirming the generality of Eq. (1). An example of converting a widely used type of master equation – the Lindblad equation – into the operator sum representation has been provided in ref. 27. In this work we focus on the more general operator sum representation. Our method starts with an initial ρ expressed by a sum of different pure quantum states weighted by the associated probabilities:

$$\rho =\sum _{i}{p}_{i}|{\phi }_{i}\rangle \langle {\phi }_{i}|$$
(4)

where pi is the probability of finding the state |ϕi〉 in the mixture with the condition \(\sum _{i}{p}_{i}=1\). Note here the different |ϕi〉’s are not necessarily orthogonal to each other, and Eq. (4) can be understood either as a knowledge of the initial physical composition of the system that is easily available from system preparation, or as a diagonal decomposition of ρ that requires additional resource to compute. We show in the following that the quantum algorithm proposed can simulate the time evolution of ρ(t). The outputs of the algorithm carry the full information of ρ(t) that can be extracted by quantum tomography28. However, we remark that the most important physical information carried by ρ(t)–e.g. the populations of different quantum states and the expectation value of an observable 〈O〉 = Tr(Oρ(t))–can be obtained without quantum tomography but by projection measurements instead, which greatly reduces the resources needed. This is realized by exploiting the positive-semidefiniteness of ρ(t). We then estimate the gate complexity of our method to be significantly lower than the conventional Stinespring dilation. Finally we demonstrate the application of the quantum algorithm to an amplitude damping channel with implementation on the IBM Qiskit29 quantum simulator and the IBM Q 5 Tenerife30 quantum device.

Results

Theory for the algorithm

In this section we present the quantum algorithm that evolves ρ(t) with the initial ρ given in the form \(\rho =\sum _{i}{p}_{i}|{\phi }_{i}\rangle \langle {\phi }_{i}|\) in Eq. (4), which represents a knowledge of the initial physical composition of the system. Here we prepare each |ϕi〉 as an input state vi in a given basis and want to evolve:

$$|{\phi }_{ik}(t)\rangle ={{\bf{M}}}_{k}{{\bf{v}}}_{i}={({c}_{ik1},{c}_{ik2},\ldots ,{c}_{ikn})}^{T}$$
(5)

First note that each Kraus operator Mk is a contraction such that it can be dilated into a unitary matrix. An operator A is a contraction if it shrinks or preserves the norm of any vector such that the operator norm \(\Vert {\bf{A}}\Vert =\sup \frac{\Vert {\bf{A}}{\bf{v}}\Vert }{\Vert {\bf{v}}\Vert }\le 1\). By Eq. (2) we have \(\sum _{k}{{\bf{M}}}_{k}^{\dagger }{{\bf{M}}}_{k}={\bf{I}}\), which implies Mk is a contraction (see the supplementary information (SI) for the proof). By the Sz.-Nagy dilation theorem31,32, any contraction A of a Hilbert space H has a corresponding unitary operator UA in a larger Hilbert space K such that:

$${{\bf{A}}}^{n}={{\bf{P}}}_{H}{{\bf{U}}}_{{\bf{A}}}^{n}{{\bf{P}}}_{H},\,n\le N$$
(6)

where PH is the projection operator into space H. The physical meaning of Eq. (6) is that the effect of a contraction A applied up to N times on a smaller space H can be replicated by a unitary UA applied up to N times on a larger space K, given the input vector lies entirely in H and the output vector is projected into H. For the purpose of creating a quantum circuit the Sz.-Nagy dilation theorem allows us to simulate the effect of any non-unitary matrix by a unitary quantum gate, because every operator on a finite dimensional space is bounded and therefore can be made into a contraction which has a unitary dilation. The Sz.-Nagy theorem also guarantees the existence of a minimal dilation in the sense that the space K has the smallest dimension to achieve Eq. (6). An example of a minimal unitary dilation of A with N = 1 is:

$${{\bf{U}}}_{{\bf{A}}}=(\begin{array}{cc}{\bf{A}} & {{\bf{D}}}_{{{\bf{A}}}^{\dagger }}\\ {{\bf{D}}}_{{\bf{A}}} & -{{\bf{A}}}^{\dagger }\end{array})$$
(7)

where \({{\bf{D}}}_{{\bf{A}}}=\sqrt{{\bf{I}}-{{\bf{A}}}^{\dagger }{\bf{A}}}\) is called the defect operator of A. We can easily verify that UA is unitary and \({\bf{A}}={{\bf{P}}}_{H}{{\bf{U}}}_{{\bf{A}}}{{\bf{P}}}_{H}\). However if we want to apply \({{\bf{A}}}^{2}={{\bf{P}}}_{H}{{\bf{U}}}_{{\bf{A}}}^{2}{{\bf{P}}}_{H}\), or \({\bf{B}}{\bf{A}}={{\bf{P}}}_{H}{{\bf{U}}}_{{\bf{B}}}{{\bf{U}}}_{{\bf{A}}}{{\bf{P}}}_{H}\) where UB is a unitary dilation of B, then we need minimal unitary dilations with N = 2:

$${{\bf{U}}}_{{\bf{A}}}=(\begin{array}{ccc}{\bf{A}} & 0 & {{\bf{D}}}_{{{\bf{A}}}^{\dagger }}\\ {{\bf{D}}}_{{\bf{A}}} & 0 & -{{\bf{A}}}^{\dagger }\\ 0 & {\bf{I}} & 0\end{array}),\,{{\bf{U}}}_{{\bf{B}}}=(\begin{array}{ccc}{\bf{B}} & 0 & {{\bf{D}}}_{{{\bf{B}}}^{\dagger }}\\ {{\bf{D}}}_{{\bf{B}}} & 0 & -{{\bf{B}}}^{\dagger }\\ 0 & {\bf{I}} & 0\end{array})$$
(8)

We see that the number N in Eq. (6) is an important parameter that defines both the form and the applicability of the minimal unitary dilation. In the following we will refer to a minimal unitary dilation with a given N an N-dilation31. A general rule is that to simulate the effect of N contractions multiplying successively, we need to convert all of them to N-dilations. Going back to our original goal of simulating \({{\bf{M}}}_{k}{{\bf{v}}}_{i}\) with unitary gates we construct a 1-dilation \({{\bf{U}}}_{{\bf{A}}}\) of the form in Eq. (7) with \({\bf{A}}={{\bf{M}}}_{k}\) and evolve:

$$|{\phi }_{ik}(t)\rangle ={{\bf{M}}}_{k}{{\bf{v}}}_{i}\mathop{\to }\limits^{<mml:mpadded xmlns:xlink="http://www.w3.org/1999/xlink" width="+0.611em" lspace="0.278em" voffset=".15em">\text{unitary dilation}</mml:mpadded>}{{\bf{U}}}_{{{\bf{M}}}_{k}}{({{\bf{v}}}_{i}^{T},0,...,0)}^{T}$$
(9)

If \({{\bf{M}}}_{k}\) has the dimension \(n\) by \(n\), then the 1-dilation \({{\bf{U}}}_{{{\bf{M}}}_{k}}\) is \(2n\) by \(2n\). Now \({{\bf{U}}}_{{{\bf{M}}}_{k}}\) can be further decomposed into sequences of two-level unitary gates with a procedure illustrated in ref. 3,33. The two-level unitary gates can be used as elementary gates in an optical beamsplitter setup33,34,35,36 and in the following we use the number of two-level unitary gates in the decomposition of a unitary gate to represent the gate complexity. Generally the number of two-level unitary gates needed to decompose a unitary gate is equal to the number of non-zero elements in the lower-triangular part of the gate3,33. For each \({{\bf{U}}}_{{{\bf{M}}}_{k}}\) of the form in Eq. (7) with \({\bf{A}}={{\bf{M}}}_{k}\), we have \(\frac{{n}^{2}-n}{2}\) non-zero elements from \({{\bf{M}}}_{k}\), \({n}^{2}\) non-zero elements from \({{\bf{D}}}_{{{\bf{M}}}_{k}}\), \(\frac{{n}^{2}-n}{2}\) non-zero elements from \(-{{\bf{M}}}_{k}^{\dagger }\), and the total gate count is \(2{n}^{2}-n\) for each \(k\) and \(i\). The classical complexity of evaluating \(|{\phi }_{ik}(t)\rangle ={{\bf{M}}}_{k}{{\bf{v}}}_{i}\) is incidentally also \(2{n}^{2}-n\) for each \(k\) and \(i\), counting all the multiplications and additions needed for a matrix multiplying a vector. Now we have the gate count for a single Kraus operator on a single pure state \({{\bf{M}}}_{k}{{\bf{v}}}_{i}\), the maximum number3 of Kraus operators in Eq. (1) is \(m={n}^{2}\), and if the number of pure states in the physical composition Eq. (4) is comparable to \(n\), we have totally \(2{n}^{5}-{n}^{4}\) two-level \(2n\) by \(2n\) gates. As a comparison, the conventional Stinespring dilation converts the whole quantum operation \({\mathcal{E}}(\rho )=\sum _{k}{{\bf{M}}}_{k}\rho {{\bf{M}}}_{k}^{\dagger }\) into a unitary operation \({\bf{U}}(\rho \otimes |e\rangle \langle e|){{\bf{U}}}^{\dagger }\) by considering the tensor product space of the initial \(\rho \) and an auxiliary environment3. By results from ref. 3, the dimension of \({\bf{U}}\) is given by \(n\cdot m\) where \(m\) is the total number of Kraus operators \({{\bf{M}}}_{k}\) in the sum \(\sum _{k}{{\bf{M}}}_{k}\rho {{\bf{M}}}_{k}^{\dagger }\). In the most general case \(m={n}^{2}\), \({\bf{U}}\) is \({n}^{3}\) by \({n}^{3}\) which requires \(\frac{{n}^{6}-{n}^{3}}{2}\) two-level \({n}^{3}\) by \({n}^{3}\) unitary gates for the gate decomposition3. Therefore comparing the total cost for the entire quantum operation our method has a moderate advantage (O(n5) vs. O(n6)) over the conventional Stinespring method. In addition, the complexity for realizing a \(2n\) by \(2n\) two-level unitary is significantly smaller than that for realizing a \({n}^{3}\) by \({n}^{3}\) two-level unitary on conventional quantum computers using 1-qubit and 2-qubit elementary gates, due to the complication involved in multi-qubit controlled gates3. Finally, the greatest advantage of our method is the ability to break the whole quantum operation into \({n}^{3}\) processes (each \({{\bf{M}}}_{k}{{\bf{v}}}_{i}\) costing \(2{n}^{2}-n\) two-level unitaries) that can be simulated in parallel. Because all current quantum computing platforms have a limit on the maximum circuit depth due to gate errors, our method can enable the simulation of more complicated open quantum systems by replacing a single O(n6) process with n3 parallel O(n2) processes.

Now to put the pieces back together, the full evolved density matrix can be assembled by \(\rho (t)=\sum _{ik}{p}_{i}\cdot |{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|\) and measured by quantum tomography28. However, we remark that the most important physical information such as the populations of different states and the expectation value of an observable can be extracted by performing projection measurements on the output \(|{\phi }_{ik}(t)\rangle \) without actually determining the state of \(|{\phi }_{ik}(t)\rangle \).This means that we do not need the phases of the coefficients associated with each basis state, but only the absolute values, thus saving considerable computational costs. To get the populations of states in the current basis, note the diagonal vector of \(|{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|\) is:

$$diag(|{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|)={({|{c}_{ik1}|}^{2},{|{c}_{ik2}|}^{2},\ldots ,{|{c}_{ikn}|}^{2})}^{T}$$
(10)

Equation (10) means we can obtain the diagonal element \({|{c}_{ikj}|}^{2}\) of \(|{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|\) by applying a projection measurement on the \({j}^{th}\) entry in the first \(n\)-dimensional subspace of \({{\bf{U}}}_{{{\bf{M}}}_{k}}{({{\bf{v}}}_{i}^{T},0,\ldots ,0)}^{T}\). Using an optical setup such as in ref. 36 the probability of measuring each entry in \({{\bf{U}}}_{{{\bf{M}}}_{k}}{({{\bf{v}}}_{i}^{T},0,\ldots ,0)}^{T}\) can be efficiently obtained by recording the photon distribution at the output of the optical modes. Adding the results through \(k\) and \(i\) gives the diagonal elements of \(\rho (t)\):

$$diag(\rho (t))=\sum _{ik}{p}_{i}\cdot diag(|{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|)$$
(11)

which gives the populations in the current basis. Although the off-diagonal elements of \(\rho (t)\) cannot be directly obtained without quantum tomography, they are nonetheless carried by \(|{\phi }_{ik}(t)\rangle \) and can become physically important. For example, if we want to get the populations in another basis, a unitary basis transformation can be applied to each \(|{\phi }_{ik}(t)\rangle \) before measuring the diagonal elements:

$$\begin{array}{cc} & {\bf{T}}|{\phi }_{ik}(t)\rangle ={\bf{T}}{{\bf{M}}}_{k}{{\bf{v}}}_{i}\mathop{\to }\limits^{<mml:mpadded xmlns:xlink="http://www.w3.org/1999/xlink" width="+0.611em" lspace="0.278em" voffset=".15em">\text{unitary dilation}</mml:mpadded>}(\begin{array}{cc}{\bf{T}} & 0\\ 0 & {\bf{I}}\end{array}){{\bf{U}}}_{{{\bf{M}}}_{k}}{({{\bf{v}}}_{i}^{T},0,...,0)}^{T}\\ & diag({\bf{T}}\rho (t){{\bf{T}}}^{\dagger })=\sum _{ik}{p}_{i}\cdot diag({\bf{T}}|{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|{{\bf{T}}}^{\dagger })\end{array}$$
(12)

The additional unitary \({\bf{T}}\) applied to \(|{\phi }_{ik}(t)\rangle \) requires no dilation and increases the quantum gate count by \(\frac{{n}^{2}-n}{2}\) (non-zero elements in the lower-triangular part of \({\bf{T}}\)) to a total of \(\frac{5{n}^{2}-3n}{2}\) for each \(k\) and \(i\). The classical complexity of evaluating \({\bf{T}}{{\bf{M}}}_{k}{{\bf{v}}}_{i}\) is doubled from \(2{n}^{2}-n\) to \(4{n}^{2}-2n\) for each \(k\) and \(i\). Note here the quantum algorithm outperforms the classical one by taking advantage of the unitarity of \({\bf{T}}\).

Next to evaluate the expectation value of an observable \(\langle {\bf{O}}\rangle =Tr({\bf{O}}\rho (t))\), we recognize \({\bf{O}}\rho (t)\) is not always positive-semidefinite and hence additional processing must be done before the trace can be obtained from projection measurements on the output quantum state. The idea here is to see the operator norm is bounded by the Hilbert-Schmidt norm: \(\Vert {\bf{O}}\Vert \le {\Vert {\bf{O}}\Vert }_{HS}\), such that we define:

$$\tilde{{\bf{O}}}=\frac{{\bf{O}}+{\bf{I}}{\Vert {\bf{O}}\Vert }_{HS}}{2{\Vert {\bf{O}}\Vert }_{HS}}$$
(13)

where Õ is always a contraction and positive-semidefinite (see the SI for a detailed proof) so we can apply the Cholesky decomposition37: \(\tilde{{\bf{O}}}={\bf{L}}{{\bf{L}}}^{\dagger }\), where \({\bf{L}}\) is a lower triangular matrix. Note we could have defined \(\tilde{{\bf{O}}}\) with \(\Vert {\bf{O}}\Vert \) in place of \({\Vert {\bf{O}}\Vert }_{HS}\) in Eq. (13), but \({\Vert {\bf{O}}\Vert }_{HS}\) requires fewer arithmetic steps to calculate. Now we have:

$$\langle \tilde{{\bf{O}}}\rangle =Tr(\tilde{{\bf{O}}}\rho (t))=Tr({\bf{L}}{{\bf{L}}}^{\dagger }\rho (t))=Tr({{\bf{L}}}^{\dagger }\rho (t){\bf{L}})=\sum _{k}Tr({{\bf{L}}}^{\dagger }{\rho }_{k}(t){\bf{L}})$$
(14)

where each \({{\bf{L}}}^{\dagger }{\rho }_{k}(t){\bf{L}}\) is positive-semidefinite. \({{\bf{L}}}^{\dagger }\) is obviously a contraction because Õ is a contraction. Now evolve \({{\bf{L}}}^{\dagger }|{\phi }_{ik}(t)\rangle \) with two 2-dilations of the form in Eq. (8) with \({\bf{A}}={{\bf{M}}}_{k}\) and \({\bf{B}}={{\bf{L}}}^{\dagger }\):

$${{\bf{L}}}^{\dagger }|{\phi }_{ik}(t)\rangle ={{\bf{L}}}^{\dagger }{{\bf{M}}}_{k}{{\bf{v}}}_{i}\mathop{\to }\limits^{<mml:mpadded xmlns:xlink="http://www.w3.org/1999/xlink" width="+0.611em" lspace="0.278em" voffset=".15em">\text{unitary dilation}</mml:mpadded>}{{\bf{U}}}_{{{\bf{L}}}^{\dagger }}{{\bf{U}}}_{{{\bf{M}}}_{k}}{({{\bf{v}}}_{i}^{T},0,...,0)}^{T}$$
(15)

and we can evaluate 〈Õ〉 by:

$$\langle \tilde{{\bf{O}}}\rangle =Tr(\tilde{{\bf{O}}}\rho (t))=\sum _{i,k}Tr({p}_{i}\cdot {{\bf{L}}}^{\dagger }|{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|{\bf{L}})$$
(16)

where the trace of \({{\bf{L}}}^{\dagger }|{\phi }_{ik}(t)\rangle \langle {\phi }_{ik}(t)|{\bf{L}}\) can be obtained by projection measurements in a way similar to what has been explained for Eq. (10). The difference here is that we do not need to measure individual diagonal elements and sum them together for the trace, but instead can measure the total probability of projecting into the first \(n\)-dimensional space of \({{\bf{U}}}_{{{\bf{L}}}^{\dagger }}{{\bf{U}}}_{{{\bf{M}}}_{k}}{({{\bf{v}}}_{i}^{T},0,\ldots ,0)}^{T}\). This can potentially save measurement costs by reducing the number of inquiries needed on the output vector. Finally 〈O〉 is calculated with:

$$\langle {\bf{O}}\rangle =Tr({\bf{O}}\rho (t))=Tr((2{\Vert {\bf{O}}\Vert }_{HS}\tilde{{\bf{O}}}-{\bf{I}}{\Vert {\bf{O}}\Vert }_{HS})\rho (t))=2{\Vert {\bf{O}}\Vert }_{HS}Tr(\tilde{{\bf{O}}}\rho (t))-{\Vert {\bf{O}}\Vert }_{HS}$$
(17)

where we have successfully obtained \(\langle {\bf{O}}\rangle \) for the original observable.

The \({{\bf{L}}}^{\dagger }\) gate (\({{\bf{L}}}^{\dagger }\) is upper triangular requiring reduced number of two-level unitaries for the decomposition) plus the additional level of dilation for \({{\bf{M}}}_{k}\) increases the quantum gate count to \(\frac{5{n}^{2}+3n}{2}\) for each \(k\) and \(i\). A classical overhead (independent from the \(k\) and \(i\) counts) cost of \(2{n}^{2}-1\) for \({\Vert {\bf{O}}\Vert }_{HS}\) and \(\frac{{n}^{3}}{3}\) for the Cholesky decomposition37 should also be counted towards the total cost of the quantum algorithm. On the other hand the classical complexity of evaluating \({{\bf{L}}}^{\dagger }{{\bf{M}}}_{k}{{\bf{v}}}_{i}\) is \(3{n}^{2}-n\) for each \(k\) and \(i\) (taking into account that \({{\bf{L}}}^{\dagger }\) is upper triangular), plus an overhead of \(\frac{{n}^{3}}{3}+2{n}^{2}-1\).

Complexity analysis

So far in evaluating the complexity of our method, each Kraus operator \({{\bf{M}}}_{k}\) is assumed to be completely general, and thus the number of 2-level unitary gates required to realize \({{\bf{U}}}_{{{\bf{M}}}_{k}}\) is polynomial in the system dimension n (O(n2)). If using a conventional qubit-based hardware, \(q\) qubits can represent a \(n={2}^{q}\)-dimensional system, and the number of 2-level unitaries is exponential in the qubit number \(q\) (O(4q)). We note that this exponential scaling in the qubit number is necessary if \({{\bf{M}}}_{k}\) is a general matrix without any special properties, because approximating arbitrary unitary gates is generically exponential3. That said, in the case when \({{\bf{M}}}_{k}\) is not a full matrix with arbitrary entries (\({{\bf{M}}}_{k}\) is sparse), the algorithm can become polynomial in \(q\). This happens in many realistic settings (like the amplitude damping channel shown below) when each \({{\bf{M}}}_{k}\) only mediates the transition process between \(l\) quantum states, where \(l < n\) is a constant of \(n\) and \(q\). This effectively reduces the dimension of the system from \(n\) to \(l\) for each \({{\bf{M}}}_{k}\) and the number of 2-level unitaries becomes O(l2), without a dependence on \(n\) and \(q\). Note that we still need to run all the \(m\) number of \({{\bf{M}}}_{k}\)’s through all the initial composite states \(|{\phi }_{i}\rangle \)’s, but this can be done in parallel and the complexity of the circuit for each \({{\bf{M}}}_{k}\) is greatly reduced as compared to the conventional Stinespring dilation. If we stop the analysis on the level of 2-level unitary gates the complexity for a sparse \({{\bf{M}}}_{k}\) is constant in \(n\) and \(q\). This desirable result is possible if using an optical beamsplitter setup33,34,35,36 to implement the 2-level unitaries as elementary gates. If a conventional qubit-based quantum computer is used, the 2-level unitries need to be further decomposed into 1-qubit and 2-qubit elementary gates, in which case a Gray code sequence and a multi-control gate sequence are constructed with the complexity logarithmic in \(n\) (O(log2 n)) and polynomial in \(q\) (O(q2))3.

When implemented on a qubit-based quantum computer, our method also requires ancillary qubits for the Sz.-Nagy dilation process that enlarges the Hilbert space. For the simple evolution described in Eqs. (9) and (12) we need to increase the dimension by two-fold and one ancillary qubit is needed. For the observable evaluation described in Eq. (15) we need to increase the dimension by three-fold and two ancillary qubits are needed. Note that our use of the ancillary qubits is very simple as they do not scale with the system size or the complexity of the quantum operation.

Application to the amplitude damping channel

In this section we use a quantum channel model to demonstrate the application of the proposed method. The amplitude-damping channel models the spontaneous emission of a 2-level atom. The Lindblad master equation for this model is:

$$\dot{\rho }(t)=\gamma ({\sigma }^{+}\rho (t){\sigma }^{-}-\frac{1}{2}\{{\sigma }^{-}{\sigma }^{+},\rho (t)\})$$
(18)

where \(\gamma \) is the spontaneous emission rate, \({\sigma }^{+}=|0\rangle \langle 1|\) is the Pauli raising operator that mediates the transition from the excited state to the ground state, and \({\sigma }^{-}={({\sigma }^{+})}^{\dagger }\). In the operator sum representation:

$$\begin{array}{ccc}\rho (t) & = & {{\bf{M}}}_{0}\rho {{\bf{M}}}_{0}^{\dagger }+{{\bf{M}}}_{1}\rho {{\bf{M}}}_{1}^{\dagger }\\ {{\bf{M}}}_{0} & = & \frac{1+\sqrt{{e}^{-\gamma t}}}{2}{\bf{I}}+\frac{1-\sqrt{{e}^{-\gamma t}}}{2}{\sigma }_{z}=(\begin{array}{cc}1 & 0\\ 0 & \sqrt{{e}^{-\gamma t}}\end{array})\\ {{\bf{M}}}_{1} & = & \sqrt{1-{e}^{-\gamma t}}{\sigma }^{+}=(\begin{array}{cc}0 & \sqrt{1-{e}^{-\gamma t}}\\ 0 & 0\end{array})\end{array}$$
(19)

To calculate the populations in the initial basis \(\{|0\rangle ,|1\rangle \}\) we can construct \({{\bf{U}}}_{{{\bf{M}}}_{0}}\) and \({{\bf{U}}}_{{{\bf{M}}}_{1}}\) of the 1-dilation form in Eq. (7) with \({\bf{A}}={{\bf{M}}}_{k}\) and \({{\bf{D}}}_{{{\bf{M}}}_{k}}=\sqrt{{\bf{I}}-{{\bf{M}}}_{k}^{\dagger }{{\bf{M}}}_{k}}\):

$${{\bf{D}}}_{{{\bf{M}}}_{0}}=(\begin{array}{cc}0 & 0\\ 0 & \sqrt{1-{e}^{-\gamma t}}\end{array}),\,{{\bf{D}}}_{{{\bf{M}}}_{1}}=(\begin{array}{cc}1 & 0\\ 0 & \sqrt{{e}^{-\gamma t}}\end{array})$$
(20)

Using an arbitrary initial \(\rho =\frac{1}{4}(\begin{array}{cc}1 & 1\\ 1 & 3\end{array})\) and assuming the physical composition \(\rho =\frac{1}{2}(|1\rangle \langle 1|+|+\rangle \langle +|)\) is known, the input states are:

$$|{\phi }_{1}\rangle =|1\rangle \to {{\bf{v}}}_{1}={(0,1,\mathop{\overbrace{0,...,0}}\limits^{m})}^{T}|{\phi }_{2}\rangle =|+\rangle \to {{\bf{v}}}_{2}=\frac{1}{\sqrt{2}}{(1,1,\mathop{\overbrace{0,...,0}}\limits^{m})}^{T}$$
(21)

where m = 2 matching the size of the vectors with the dilation \({{\bf{U}}}_{{{\bf{M}}}_{k}}\). We set \(\gamma =1.52\times {10}^{9}{{\rm{s}}}^{-1}\) (typical nanosecond lifetime), numerically calculate \({{\bf{U}}}_{{{\bf{M}}}_{k}}{{\bf{v}}}_{i}\) from \(t=0\) to t = 1000 ps with a time step of 10 picosecond, and obtain the populations of the ground and excited states from the first two entries of \({{\bf{U}}}_{{{\bf{M}}}_{k}}{{\bf{v}}}_{i}\). The results are shown as the smooth lines in Fig. 1.

Figure 1
figure 1

Showing the populations of the ground and excited states for the amplitude damping model. The smooth lines are obtained by classical numerical calculations of the output vectors. These agree exactly with analytic results and are used as benchmarks. The crosses are obtained by the IBM Qiskit simulator. The dots are obtained by the IBM Q 5 Tenerife device. The quantum circuits include 2 qubits and on average 13 elementary gates (see Fig. 4 for an example and the SI for all the circuits).

To calculate the populations in another basis \(\{|+\rangle ,|-\rangle \}\) where \(|\pm \rangle =\frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle )\), we need the transformation matrix \({\bf{T}}=\frac{1}{\sqrt{2}}(\begin{array}{cc}1 & 1\\ 1 & -1\end{array})\). We numerically calculate \((\begin{array}{cc}{\bf{T}} & 0\\ 0 & {\bf{I}}\end{array}){{\bf{U}}}_{{{\bf{M}}}_{k}}{{\bf{v}}}_{i}\) and obtain the populations of the \(|\pm \rangle \) states shown as the smooth lines in Fig. 2.

Figure 2
figure 2

Showing the populations of the \(|+\rangle \) and \(|-\rangle \) states for the amplitude damping model. The smooth lines are obtained by classical numerical calculations of the output vectors. These agree exactly with analytic results and are used as benchmarks. The crosses are obtained by the IBM Qiskit simulator. The dots are obtained by the IBM Q 5 Tenerife device. The quantum circuits include 2 qubits and on average 30 elementary gates (see the SI for the circuits).

Now we evaluate the expectation value of an observable \(\langle {\bf{O}}\rangle \) for \({\bf{O}}=(\begin{array}{cc}-2 & 0.5\\ 0.5 & 1\end{array})\) as an example. With \({\Vert {\bf{O}}\Vert }_{HS}=\)\(\frac{\sqrt{22}}{2}\approx 2.35\), we define \(\mathop{{\bf{O}}}\limits^{ \sim }=\frac{{\bf{O}}+{\bf{I}}{\Vert {\bf{O}}\Vert }_{HS}}{2{\Vert {\bf{O}}\Vert }_{HS}}\approx (\begin{array}{cc}0.0740 & 0.107\\ 0.107 & 0.713\end{array})\) and find \({\bf{L}}\approx (\begin{array}{cc}0.271 & 0\\ 0.393 & 0.748\end{array})\) through Cholesky decomposition \(\tilde{{\bf{O}}}={\bf{L}}{{\bf{L}}}^{\dagger }\). Next we construct \({{\bf{U}}}_{{{\bf{L}}}^{\dagger }}\) and \({{\bf{U}}}_{{{\bf{M}}}_{k}}\) of the 2-dilation form in Eq. (8) and apply them to the initial state \({{\bf{v}}}_{i}\) in the form of Eq. (21) with \(m=4\). Numerically calculating the output vector will give us \(\langle {\bf{O}}\rangle \) by Eqs. (16) and (17). The results are shown as the smooth line in Fig. 3:

Figure 3
figure 3

Showing the expectation values \(\langle {\bf{O}}\rangle \). The smooth line is obtained by classical numerical calculations of the output vectors. This agrees exactly with analytic results and is used as a benchmark. The crosses are obtained by the IBM Qiskit simulator. The quantum circuits include 3 qubits and on average 184 elementary gates (see the SI for the circuits). Due to the large number of gates required the quantum device is not used for these results.

The overall simplicity of the unitary dilation gates used in our algorithm allows us to further demonstrate it using the IBM Qiskit29 quantum simulator and the IBM Q 5 Tenerife30 quantum device. So far we have been using the 2-level unitaries obtained from the \({{\bf{U}}}_{{{\bf{M}}}_{0}}\) and \({{\bf{U}}}_{{{\bf{M}}}_{1}}\) gates as elementary gates in our complexity evaluation. This is possible if we use an optical platform for quantum computation involving beamsplitters33,35. To implement the algorithm on a conventional quantum platform such as the IBM simulator and devices, we further decompose the 2-level unitaries into elementary 1-qubit and 2-qubit gates. The constructed circuits require 2 qubits and on average 13 elementary gates for the basic \({{\bf{U}}}_{{{\bf{M}}}_{k}}{{\bf{v}}}_{i}\) evolution and 30 elementary gates for the basis transformation \((\begin{array}{cc}{\bf{T}} & 0\\ 0 & {\bf{I}}\end{array}){{\bf{U}}}_{{{\bf{M}}}_{k}}{{\bf{v}}}_{i}\). An example of the circuit for \({{\bf{U}}}_{{{\bf{M}}}_{0}}{{\bf{v}}}_{1}\) is shown in Fig. 4, and a full list of the circuits can be found in the SI.

Figure 4
figure 4

Showing the quantum circuit for \({{\bf{U}}}_{{{\bf{M}}}_{0}}{{\bf{v}}}_{1}\) as used in the IBM simulator and quantum device. For a full list of the quantum circuits please see the SI.

We implement these circuits on both the simulator and the quantum devices and show the results as the crosses (simulator) and dots (device) in Figs. 1 and 2. To obtain \(\langle \tilde{{\bf{O}}}\rangle \) we construct a circuit of 3 qubits with an average of 182 elementary gates for the \({{\bf{U}}}_{{{\bf{L}}}^{\dagger }}{{\bf{U}}}_{{{\bf{M}}}_{k}}{{\bf{v}}}_{i}\) operation (see the SI). The increase in the number of gates is due to the increased size of the \({{\bf{U}}}_{{{\bf{L}}}^{\dagger }}\) and \({{\bf{U}}}_{{{\bf{M}}}_{k}}\) matrices and the associated 2-level unitaries. As mentioned earlier the decomposition of 2-level unitaries into 1-qubit and 2-qubit elementary gates scales as O(log2 n). The large number of gates for this circuit prevents us from running it on the quantum device so that only the simulator results are shown as the crosses in Fig. 3. On the other hand the 2-level unitaries can be more easily implemented with a multiport photonic device as demonstrated in refs. 35,36. We will seek to demonstrate our methods on such a preferred device in a future study.

In Figs. 1, 2 and 3, the numerical results (smooth lines) agree exactly with analytic results and are used as benchmarks for the simulator and device results. The simulator results (crosses) fit the benchmark well while including the probabilistic error from the projection measurements. The quantum device results (dots) fit the benchmark reasonably well considering all experimental errors from gate fault, qubit decoherence, and measurement. These results demonstrate the ability of the proposed algorithm to simulate the time evolution of a density matrix and evaluate physical observables from the outputs.

Discussion

In this work we have presented a quantum algorithm for evolving the density matrix \(\rho (t)=\sum _{k}{{\bf{M}}}_{k}\rho {{\bf{M}}}_{k}^{\dagger }\) and extracting physical information from the output. The method takes each physical composition \(|{\phi }_{i}\rangle \) of \(\rho =\sum _{i}{p}_{i}|{\phi }_{i}\rangle \langle {\phi }_{i}|\) as the input and evolves it through minimal Sz.-Nagy dilations of the Kraus operators \({{\bf{M}}}_{k}\). The input vector has the base length \(n\) (plus additional zeros matching the dimension of the evolution matrices) and various realizations of the time evolution have the quantum gate count of O(n2) for each \(k\) and \(i\), which is a significant improvement over the O(n6) scaling of the conventional Stinespring dilation. In cases when \({{\bf{M}}}_{k}\) involves a reduced number of quantum states in the system (\({{\bf{M}}}_{k}\) is sparse) the complexity scaling can become O(log2 n), or polynomial in the number of qubits. The requirement of knowing the initial physical composition as a probability-weighted mixture of not necessarily orthogonal \(|{\phi }_{i}\rangle \)’s should be easy to satisfy from system preparation. If indeed the initial physical composition is unknown and we have to work with the most general matrix form of the density matrix, then we can either perform the eigen-decomposition on \(\rho \) to obtain the form in Eq. (4) or use a modified quantum algorithm. As details shown in the SI, the modified method uses the flattened vector of the initial density matrix as the input and requires O(n3) gates per Kraus operator for various realizations of the time evolution. Both methods can be easily generalized to other open quantum dynamical models because the procedures involved are essentially the same – only the Kraus operators \({{\bf{M}}}_{k}\)’s for the operator sum representation are different for different models. The generality of our methods–in the sense of not requiring particular dynamical models or costly decompositions of the density matrix and the quantum channel–opens up the possibility of simulating more interesting systems such as decohering qubits or excitonic structures interacting with multiple baths38,39 – the latter helps to understand natural light harvesting complexes and exploit quantum coherence effect to improve light harvesting efficiency in artificial photocells38,40. Finally we have demonstrated the implementation of the algorithm on the IBM Qiskit simulator and the IBM Q 5 Tenerife device. Although the gate complexity is larger than calculated with the preferred optical setup, the results show reasonable agreements with the analytic benchmarks considering gate fault, qubit decoherence, and measurement error. In future studies we will seek to demonstrate our quantum algorithms on the preferred photonic devices that can implement 2-level unitaries as elementary gates.

Methods

Theoretical derivations and numerical details essential to the study are presented in the Results section. More details can be found in the Supplementary Information. For the parameters and configurations of the IBM simulator and quantum machine, please go to IBM Quantum Experience at http://www.research.ibm.com/quantum.