## Introduction

Many problems in physics, chemistry, nanotechnology and even biology require knowledge about the evolution of quantum systems. The exponential scaling of the computational overheads needed to simulate the expanding number of states of such systems as the number of particles increases limits the usefulness of classical simulators. Building simulators from other quantum systems offer a way around the problem1.

Several models exist for the realisation of such a device. They can be classified into universal quantum computers, able to implement arbitrary unitary operation in their state space, and, so-called quantum orreries2, which are less versatile but capture more of the physical features of the real system under investigation. Examples of quantum processors relying on techniques such as superconducting qubits, photonics, ion traps, NMR and others3 already exist. The pre-eminent model for universal quantum computation is the circuit model, which alters the states of qubits in an arbitrary way using a small set of universal building blocks. Most commonly this is composed of one-qubit rotations and two-qubit CNOTs. Using these, a variety of quantum algorithms have been developed and are expected to provide a significant speed-up over their classical counterpart. The second type, quantum orreries, take a different approach to quantum computation. In these implementations, the universality of computation is traded for other advantages that manifest themselves in a smaller range of problems, like simulating quantum phase transitions2. In this instance, quantum orreries are believed to have the advantage in simulations due to their analogue rather than digital inner workings. In this work, we choose to focus on universal quantum computers, given a large amount of literature on the quantum circuit model4 and the release of prototype quantum computers accessible through the cloud5. In particular, we use the five-qubit quantum computers from IBM via the Quantum Experience program.

While the high level of noise and the limited number of qubits in these devices restrict the range of experiments one can presently implement, they allow us to trial possible applications and to develop a practical intuition about what a quantum computer can and cannot do.

Progress towards deploying Noisy Intermediate Scale Quantum Computers (NISQ) solutions to industrial problems has accelerated considerably through the development of hybrid variational algorithms6. These involve preparing a quantum register in a known state, applying a parameterised unitary operation and measuring certain operators (most commonly the Hamiltonian) while scanning through the state space. If the goal is to find the ground state of a system, a classical subroutine is used to adjust the parameters until a local minimum is reached. The central element of the procedure is the ansatz, for which several choices exist. Most of these, including the popular Unitary Coupled-Cluster Ansatz (UCCA), require synthesising the circuits to implement unitary operations on subsets of the total number of qubits available in the machine. We focus our work on designing optimal decompositions for three-qubit gates, which can be used in a modular fashion to build larger circuits while allowing also for piecewise process tomography and parameter optimisation to reduce systematic errors due to cross-talk. These can be combined with other error detection techniques to optimise results7. Nevertheless, in any real device, it is expected that statistical errors will reduce the purity of our states. If the simulation is carefully engineered, these statistical errors could gainfully be employed to mimic effects of noise affecting a real open system, or can be purified using techniques such as Variational Quantum State Diagonalization8. Furthermore, the decomposition may also be used as part of transpiler passes to reduce circuits where long sequences affect only a small number of qubits and can be integrated with compilers such as t$$\left|{\rm{ket}}\right\rangle$$9.

Much effort has gone into understanding the role of quantum effects in energy transport in photosynthetic organisms10,11. Mechanisms12 ranging from semi-classical to all-quantum models have been used to explain the ability of photosynthetic FMO protein complexes to steer excitons through a network of seven sites between the light-harvesting antenna and the chemical reaction centre, with near-perfect quantum yield. The latest consensus13 on this fascinating question is that Nature has remarkably engineered FMOs as finely tuned thermalisation devices that take advantage of thermodynamic steering and thermal noise to optimise the efficiency of energy transport, even in the very short time- and length-scales of exciton transfer. Recent studies14 suggest that the quantum beats observed in 2D pump-probe spectroscopy experiments15 can mainly be attributed to vibrational modes rather than to nonstationary excitonic coherences. The physical model that emerges, then, for energy transport through the seven-site network in FMO entails steering the quasi-particles down an energy gradient set by the site energies and excitonic couplings where interactions with a heat bath provide an essential mechanism for incoherent relaxation. The energy profile of the transport channel is encoded in the Hamiltonian used below. The interplay and balance between the parameters in FMO appear to have been tuned to a great effect on the efficiency of energy transport. They provide a framework for thinking about whether enhanced transport can result from the interplay between coherence, dephasing and relaxation in disordered systems. Environment-assisted Quantum Transport (ENAQT) is a term often used in the study of such phenomena that refers to the parametric region in-between Anderson localisation (too little noise) and the quantum Zeno effect (too much noise), both of which can inhibit transport. Delocalisation and quantum interference, which become manifest through the ability of quantum particles to explore multiple paths at the same time using quantum superposition, lead to something reminiscent of a generalised Grover algorithm. Another useful concept is that of invariant subspaces16, which suggests that destructive interference can be used to avoid certain sites associated with unwanted loss of excitation, which has no analogue in classical random walks. A deeper understanding of these processes in biological systems can lead to breakthroughs in engineering applications such as more efficient computer chip architectures or solar cells and artificial protein arrays for nanotech medicine17.

In this work, we undertake the preliminary steps towards simulating the equations of motion of systems where the environment plays an important role on NISQ devices. To this end, we build circuits from fundamental blocks which act on a restricted number of qubits, within the architectural constraints of the state-of-the-art quantum computers available today. We develop software for the construction of arbitrary three-qubit circuits and benchmark their effectiveness in the context of simulating unitary evolutions. We propose an error model for the study of these circuits and show that it can be used as a good approximation for a general quantum error channel. Furthermore, we demonstrate that the errors are only sensitive to the structure of the circuit and not the exact parametrisation. These are meaningful steps towards incorporating errors occurring in the quantum computer within the simulation of open quantum systems.

## Results

### Method

We restrict our attention to unitaries of the type:

$$U({\vec{t}})={e}^{i\mathop{\sum}\limits_{j}{t}_{j}{P}_{j}}$$
(1)

where Pj are Pauli strings, defined as elements of the set $${\{X,Y,Z,{\mathbb{I}}\}}^{\otimes n}$$ acting nontrivially (anything other than the identity) on a set of three qubits and tj are the degrees of freedom of the classical optimisation subroutine. We assume also that direct interaction in the form of CNOT gates is available between any of the three qubits. This can usually be achieved by carefully mapping the state space to qubits inside the quantum computer such that the locality of interaction is preserved. If this is not the case, we can always mediate CNOT interactions using the shortest path between them, at the cost of an increased CNOT count for the overall unitary. Here, we test the decomposition by generating the dynamics of small systems with up to eight states, such as the FMO molecule. In this case, our approach is self-evidently advantageous as it does not require Trotterization (it is therefore theoretically exact) and can be shown to be optimal in the sense that no better decomposition is known that can reduce the CNOT count in the general case. The interest in generating exact multi-qubit decompositions has been diminished by the observation that they do not scale efficiently with the number of qubits. This is not a problem if the Hamiltonian of the system we would like to simulate only contains low-weight Pauli strings as we can split it by Trotterization and then implement the factors directly. This is almost always the case for bosonic simulations and can be applied to fermionic simulations using the Bravyi–Kitaev encoding, which uses Pauli strings of weight $$O({\mathrm{log}}\,n)$$18, the weight of the string being defined as the number of non-identity operators in the string. We expect the decomposition of each string to scale like O(ek), where k is the weight of the string. This makes the overall decomposition efficient since the number of terms in the Hamiltonian is polynomial in n by locality and each string is implemented using O(n) resources.

### Transposing the problem

We generate a set of unitaries to be implemented on the quantum computer by evolving the Hamiltonian to a time t. This is done through exponentiation on a classical machine

$$U(t)=\exp \left[-iHt/\hslash \right],$$
(2)

where H is the Hamiltonian of the system. To reconstruct the dynamics, we compute the unitary matrix for a range of times t which spans the relevant time scale of the evolution. As we pointed out earlier, mapping the real system to the quantum computer is very important as it must preserve locality. Two important restrictions derive from this condition. First, the weight of Pauli strings must scale at most like the logarithm of a polynomial in n to compensate for the difficulties of exact decomposition. Second, the qubits on which individual Pauli strings act must be as close as possible in the quantum computer, as defined by the connectivity metric. This depends on the architecture of the device and the degree to which this can be achieved in conjunction with the first condition is an open problem. The systems we analyse explicitly are small, so we start with the generator of dynamics in matrix form. In the case of the FMO, we wish to start the evolution in an equal superposition between site 1 and site 6, as these form the outer layer of the molecule and are most likely to be excited. To take this into account, the states of the quantum computer chosen to represent them can be turned into one another by the negation of a single qubit (in particular we choose the states ‘000’ and ‘001’ to represent them in our three-qubit register), to reduce decoherence.

### Cartan decomposition

Below we summarise briefly the underlying mathematics of the decomposition. The problem can be formulated as decomposing a general element of the semi-simple Lie group SU(2n) into a sequence of one and two-qubit gates which can then be implemented on a quantum device as a quantum circuit.

For the three-qubit case, no circuit has yet been proved to be optimal. However, using the results in refs. 19,20, we can transpose an arbitrary unitary into a circuit consisting of at most 73 one-qubit rotations and 26 CNOT gates. A more detailed look at the decomposition algorithm used here can be found in Supplementary Note 1.

Following the nomenclature in ref. 21, we call a Cartan decomposition of a Lie algebra $${\mathfrak{g}}$$ an orthogonal split of the form

$${\mathfrak{g}}={\mathfrak{k}}\oplus {\mathfrak{m}},$$
(3)

where the elements in the subspaces $${\mathfrak{m}}$$ and $${\mathfrak{k}}$$ satisfy the following commutation relations

$$[{\mathfrak{k}},{\mathfrak{k}}]\subset {\mathfrak{k}},\quad [{\mathfrak{m}},{\mathfrak{k}}]={\mathfrak{m}},\quad [{\mathfrak{m}},{\mathfrak{m}}]\subset {\mathfrak{k}}.$$
(4)

If one finds a maximal subalgebra $${\mathfrak{h}}\subset {\mathfrak{m}}$$, then this is guaranteed to be Abelian by the relations (4) and is called a Cartan subalgebra of the pair $$({\mathfrak{g}},{\mathfrak{k}})$$. The decomposition of the algebra $${\mathfrak{g}}$$ then induces a decomposition of the group G

$${\bf{G}}={\bf{K}}{\bf{H}}{\bf{K}}.$$
(5)

We now shift our attention to the particular case of $${\mathfrak{g}}={\mathfrak{s}}{\mathfrak{u}}(8)$$. We adopt the definitions in ref. 19 for the subspaces $${{\mathfrak{h}}}_{n}$$, $${\mathfrak{{f}_{n}}}$$. After an iterative application of the Cartan decomposition, we arrive at the following expression for an arbitrary unitary matrix U SU(8):

$$U={K}_{1}{F}_{1}{K}_{2}H{K}_{3}{F}_{2}{K}_{4},$$
(6)

where $${F}_{j}\in \exp \left({{\mathfrak{f}}}_{3}\right)$$, $$H\in \exp \left({{\mathfrak{h}}}_{3}\right)$$, $${K}_{j}\in SU\left(4\right)\otimes SU(2)$$. This is referred to as the modified Khaneja–Glaser decomposition and can be illustrated in the quantum circuit format as seen in Fig. 1. While the operators H, F1 and F2 can be implemented straight away using the circuits in ref. 19, further decomposition of the remaining two-qubit gates in Kj in terms of CNOTs and one-qubit rotation is necessary. A theoretical treatment of this procedure is given in refs. 22,23. After concatenating the individual circuits according to Eq. (6) and using gate identities to simplify the circuit, we arrive at a quantum circuit that can be implemented on a quantum computer to simulate the evolution of the system.

Transforming the matrix into a quantum circuit in this way goes far beyond its applicability as a classical matrix, through the ability to operate on specific subsets of qubits in a quantum computer. Computing the action of the unitary on a highly entangled initial state is inefficient on classical machines and requires exponentially many resources just to store the initial and final state.

It is also worth noting that synthesising the circuit in this way guarantees the correct behaviour for all input states. This contrasts with approximate circuit synthetisation methods, which use variational principles and are only accurate for the same initial states used in their learning phase.

Our code (written in MATLAB) can compute all the phases necessary for the decomposition of a three-qubit gate, and assembles it in a Python script using IBM’s Qiskit24 package for easy integration with IBM Q Experience quantum computers. The details of the code can be found in Supplementary Note 1.

### Error analysis in quantum computers

We used two quantum computers, labelled ibmq_santiago and ibmqx2, to implement our simulations. The specifications of these machines are available on the IBM Q Experience website, but due to daily calibrations, the values quoted may differ over time. For reproducibility, we provide tabulated error rates in Supplementary Note 2. The first quantum computer is a new-generation five-qubit computer featuring the nearest-neighbour line architecture and a Quantum Volume (QV) of 32. It was chosen for its low error rate in both one-qubit gates and CNOTs. These were in the region of 6.5‰ for a CNOT and 0.25‰ for a one-qubit operation on the day the data were taken. Since the original circuit design is suited for full connectivity we had to transpile the circuits to satisfy the device constraints. This led to an increase in the number of CNOTs from 26 to 38. The second machine allowed us to implement the circuit directly, without further transpiling, because it has three directly connected qubits. However, it has a much lower QV of just 8, and the errors for a CNOT are around 12.5‰. These data allow us to estimate the fidelities of our circuits to ~76% on the ibmq_santiago and 69% for the ibmqx2. However, we present data obtained from ibmq_santiago only, due to the large systematic errors present in the ibmqx2. An overview of the latter can be found in Supplementary Note 2. To verify fidelity between unitaries and their representative quantum circuits, we also perform gate execution simulations using QASM24, a classical simulator capable of simulating operations on up to 32 qubits.

We classify errors into three categories, each with its own mitigation technique. The first type is caused by faulty state preparation and measurement (SPAM). This can be corrected by implementing calibration circuits, constructing a filter and applying that filter to the data to infer the probabilities before the faulty detection occurred. These are available through the Ignis24 library. We further separate the remaining errors into statistical and systematic. We run the maximum number of allowed shots per circuit to reduce statistical errors to a minimum. We could further reduce these errors by running the same circuits multiple times. This comes at the cost of introducing additional systematic errors due to the drifts accumulated in the large intervals of time in-between. The systematic errors are in general more difficult to deal with, especially when treating the quantum computer as a black box device. In general, they have a complicated dependence on the unitary being implemented.

The 8th state of the three-qubit ensemble plays no role in the simulation itself, but can be used to separate signal from noise. We employed a simple model to filter the noise by assuming that with a certain probability p, which depends on the unitary, the system jumps to a randomly chosen state and remains there for the rest of the evolution. We can describe this using density matrices:

$$\rho =(1-p){\rho }_{s}+\frac{pI}{8},$$
(7)

where ρ is the density matrix we sample and ρs is the signal we would like to obtain in the absence of errors. It follows that the final results contain a level of noise that is almost equally distributed across sites. To reverse this process we subtracted the counts in site 8 from every site and renormalised the distribution to unity. The result of this operation is referred to as the mitigated data in this work. The overlap between the mitigated data and the classically simulated curve provides a post-hoc verification that illustrates its qualitative features. We use the average square deviation of the data from the classical simulation to quantify the improvement in accuracy of our mitigation technique using the measure

$${A}^{2}=\frac{1}{N}\mathop{\sum}\limits_{n}{({y}_{n}-f({x}_{n}))}^{2}-{\sigma }_{n}^{2},$$
(8)

where (xn, yn) are data points, f is the function giving the correct evolution (simulated classically) and σn are standard deviations. A lower value of A signals a better accuracy. An explanation for the choice of metric is given in Supplementary Note 3.

### Linear-ordered chain

To illustrate the technique in general for systems with stronger coupling between states we implemented a simulation of a linear chain characterised by the Hamiltonian:

$$H=\mathop{\sum}\limits_{n}\left|n+1\right\rangle \left\langle n\right|+{\rm{h.a.}},$$
(9)

where h.a. stands for the hermitian adjoint. For simplicity, the site energies are set to 0 and the coupling strength between consecutive sites are set to 1.

The output from the ibmq_santiago quantum computer is plotted in Fig. 2, and reveals a situation that is physically equivalent to having a perturbation travelling through an elastic medium and being reflected at the boundaries. A careful inspection of the plots in Fig. 2 shows an excitation starting in site 1 and propagating through the chain up to site 7 and back, where the numbering starts at one end of the chain and ends at the other.

This is one of the first configurations considered in the context of ENAQT and previous studies show that a purely dephasing environment cannot provide any advantage if the excitation is injected at one end and collected at the other25. The environment can be beneficial in certain regimes if the excitation is injected in different positions on the chain, thus highlighting that static disorder is not a requirement26. This also does not preclude the possibility of enhancing transport through coupling to correlated, non-Markovian environments.

The model offers useful insight for observing ENAQT in an experimental setting since many of its characteristics will remain unchanged in more complex systems with networked branches.

The result of the QASM simulator is in agreement with the classical simulation, which provides corroboration on the connection between the unitary matrix and its associated circuit. The standard deviation of the data from the fit is too large to extract quantitative information, but the profile remains consistent with predictions.

The bump in site 1 at ~t = 4.5 (a.u.) occurs consistently in simulations implemented on ibmq_santiago, suggesting it is due to a systematic error during the execution of the unitary. Such discoveries can be used to reverse engineer quantum processors with a higher degree of control over such errors.

### The FMO complex

Below we present the results of our quantum simulation of the FMO molecule. As a seven-site complex system, it presents a robust challenge for our simulation technique, with a Hamiltonian given by27

$${\rm{H}}=\left(\begin{array}{lllllll}240&-87.7&5.5&-5.9&6.7&-13.7&-9.9\\ -87.7&315&30.8&8.2&0.7&11.8&4.3\\ 5.5&30.8&0&-53.5&-2.2&-9.6&6.0\\ -5.9&8.2&-53.5&130&-70.7&-17.0&-63.3\\ 6.7&0.7&-2.2&-70.7&285&81.1&-1.3\\ -13.7&11.8&-9.6&-17.0&81.1&435&39.7\\ -9.9&4.3&6.0&-63.3&-1.3&39.7&245\end{array}\right),$$
(10)

where the site energies have been shifted to set the energy of the final receptor site (the reaction centre of the FMO complex) to 0. The units are cm−1 (wavenumber), used commonly in spectroscopy as a unit of energy when multiplied by hc (the product of Planck’s constant and the speed of light).

The results of the simulation are shown in Fig. 3. A superposition of sites 1 and 6 was chosen as a possible initial state because it is believed to initiate the energy transfer upon the capture of photons from the environment. Sites 3 and 4 are the closest to the reaction centre. Due to the large energy mismatch between sites and relatively low coupling, the excitation is only slightly delocalised. Sites 3, 4 and 7 are only negligibly populated at all times.

Although the simulation technique and the circuit lengths are identical for both the FMO and the linear chain, the results of the FMO simulation have a much larger deviation from the expected curve (given by classical simulation). This simply means that our method of dealing with the systematic error is less accurate for more complex hamiltonians (coupling all states directly rather than nearest neighbour only). In the case of the linear chain, it is likely that certain gates in the decomposition become redundant so do not introduce additional noise, making the predictions more precise. We note the overestimation of the population peaks in sites 1 and 2 at the expense of an underestimation of the populations in site 6. This reveals a deviation of the real device from the simplified noise model and obviates the requirement for a more sophisticated calibration procedure. The presence of such systematic differences could be helpful in simulating realistic open quantum systems. This stems from the intuition that an inherently noisy system such as the FMO can be represented more faithfully by a noisy quantum computer than by a perfectly error-corrected one, assuming that one can tune its noise environment to mimic real heat baths.

### Fidelity

The quantum computer measures site 8 with an average probability of 0.053 and a standard deviation of 0.01. The low deviation from the mean indicates that the probability of an error occurring is nearly independent of the gate parameters. This is interesting as it can be used to benchmark the ansatz itself rather than individual circuits. Assuming that all sites suffer from this constant noise we can estimate the value of p from Eq. (7) to be ~0.42. This can be used to estimate the fidelity of the circuits to be ~0.65, which is slightly lower than the theoretical prediction. We expect the discrepancy is due to the long time interval between the moment the device is calibrated and the moment the circuits are implemented and the fact that we did not explicitly include the loss of fidelity due to relaxation and dephasing in the calculation (which occur even when we do not apply any gates). Since the mitigation technique comes with subsequent renormalisation, any statistical error is amplified in the final data. This is not an issue as this type of error can be reduced by averaging over a higher number of shots.

To benchmark the mitigation technique on the simulations, we computed the accuracy parameter for both raw and mitigated data in the cases of the linear chain and the FMO and observed a drastic improvement as a result of mitigation. This is presented in Supplementary Note 3.

## Discussion

Here, we consider the limitations of the simulation approach and propose possible solutions. In the current form, the actual output of the three-qubit block is far from pure, which prevents us from integrating it in larger circuits. Quantum error correction algorithms can be a powerful tool against this problem, provided that experimental noise can be held below the threshold defined by the fault-tolerance theorem28,29. The correction codes can be integrated into the logical representation of the unitaries to increase the purity of the outgoing state without changing the way we work and think about these blocks. This would open a path to numerous simulation possibilities30,31.

We are equally interested in scaling up the algorithm to more than three qubits. While the Cartan decomposition can always be used to deconstruct arbitrarily large unitaries, there is no guarantee that the circuit obtained is optimal. Advanced transpiling techniques can be employed to shorten the length of the circuit using gate identities such as those developed by ZX-Calculus32,33. Some of these are available in Python as part of t$$\left|{\rm{ket}}\right\rangle$$9 and Qiskit Terra24.

In realistic simulations of open complex systems, one is usually also interested in simulating dephasing, relaxation and loss of probability to an acceptor state. Implementing this in a universal quantum computer is facilitated by the fact that any operation can be performed by projective measurements on ancillary qubits4. This implies that one must measure and reinitialise the ancilla qubits several times during the time evolution, which, for the time being, is beyond current technological capabilities. As a proof of principle, it is possible to solve the evolution beforehand, as shown in this work, calculate the Kraus operators, find the unitary-and-projective measurement in the system-and-ancilla space that replicates the Kraus operators, and decompose these into a quantum circuit by Cartan decomposition. To avoid having to use more qubits, this can be done with a two-state system and a four-state environment, such that no more than three qubits are required. The use of ancillary qubits can be avoided altogether by implementing circuits for the different transfer matrices corresponding to various Kraus operators and taking the statistical average of the results, as done in ref. 34. This results in an increase in the number of necessary circuits by a factor of ~N2.

The fast pace of advances and developments in quantum engineering is likely to enable scalable quantum simulators of this type in the near future, which would showcase the full potential of simulation techniques like that presented here on molecules with a significantly larger number of degrees of freedom.