Parallel quantum simulation of large systems on small NISQ computers

Tensor networks permit computational and entanglement resources to be concentrated in interesting regions of Hilbert space. Implemented on NISQ machines they allow simulation of quantum systems that are much larger than the computational machine itself. This is achieved by parallelising the quantum simulation. Here, we demonstrate this in the simplest case; an infinite, translationally invariant quantum spin chain. We provide Cirq and Qiskit code that translates infinite, translationally invariant matrix product state (iMPS) algorithms to finite-depth quantum circuit machines, allowing the representation, optimisation and evolution of arbitrary one-dimensional systems. The illustrative simulated output of these codes for achievable circuit sizes is given.


INTRODUCTION
The insight underpinning Steve White's formulation of the density matrix renormalisation group (DMRG) is that entanglement is the correct resource to focus upon to formulate accurate, approximate descriptions of large quantum systems 1 . Later understood as an algorithm to optimise a matrix product state (MPS) 2 , this notion underpins the use of tensor networks as a variational parametrisation of wavefunctions with quantified entanglement resources. Such approaches allow one to concentrate computational resources in the appropriate region of Hilbert space and provide an effective and universal way to simulate quantum systems 3,4 . They also provide an effective framework to distribute entanglement resources in simulation on noisy intermediate-scale quantum (NISQ) computers.
Quantum computers such as those of Google, Rigetti, IBM and others implement finite-depth quantum circuits with controllable local two-qubit unitary gates. Innovations for quantum simulation include using these circuits as variational wavefunctions 5 , optimising them stochastically or by phase estimation 6 , and evolving them either by accurate Trotterisation of the evolution operator [7][8][9] or variationally 10 . Currently, available NISQ devices are limited by gate fidelity and the resultant restriction of available entanglement resources. Since the finite-depth quantum circuit may be equivalently described as a tensor network 11 , tensor networks provide a convenient framework with which to distribute entanglement to the useful regions of Hilbert space and to make efficient use of this relatively scarce resource.
We dub the implementation of a tensor network on such a NISQ device a Quantum tensor network. There are several advantages to this framework. It fits directly into a broader ecosystem of classical simulation of quantum systems. Indeed, because it is based upon the manipulation of explicitly unitary elements, the quantum circuit provides perhaps the most natural realisation of tensor networks. Canonicalisation at each step in a classical tensor network calculation amounts to reducing the tensors to isometries -a step that is not required in an explicitly unitary realisation. Moreover, the remaining elements of unitaries parametrise the tangent space of the variational manifold 12,13 .
Here we demonstrate that quantum tensor networks can be used to parallelise quantum simulation of systems that are much larger than available NISQ machines [14][15][16] . Central to this is dividing the quantum system into a number of sub-elements that are weakly entangled and can be simulated in parallel on different circuits. The influence of the different regions of the system upon one another can be summarised by an effective state on a much smaller number of quantum bits. We provide Cirq and Qiskit code for the simplest class of examples-infinite, translationally invariant quantum spin chains. This is a direct translation (mutatis mutandis) of iMPS algorithms to quantum circuit machines. The remarkably simple circuits revealed below allow the representation of an infinite quantum state, and its optimisation and realtime evolution for a given Hamiltonian.

RESULTS
Parallel quantum simulation across weakly-entangled cuts To parallelise our simulation on a small NISQ machine, we first identify partitions of the system where the effect of one partition upon the other can be summarised by a small amount of information. This is achieved by making Schmidt decompositions across the cut: are an orthonormal set of states to the left of the cut and ϕ α R the same on the right. The λ α are known as the Schmidt coefficients and D the Schmidt rank or bond order. Retaining λ α only above some threshold value provides a way to compress representations of a quantum state; the MPS construction can be obtained by applying this procedure sequentially along a spin chain 4 .
If an observation is made on the right-hand-side of such a cut, the effect of the quantum state on the left upon the observation can be summarised by just D variables corresponding to the Schmidt coefficients. This same effect can be achieved by an effective state on a spin chain of length log 2 D-see Fig. 1-which can be parametrised on the quantum circuit by an SU(D 2 ) unitary V L . This encodes both the Schmidt coefficients λ α and the orthonormal states ϕ α L . The latter does not contribute to observables on the right and so in principle, V L can be parametrised by just D variational parameters. The precise 1 numerical values must be determined by solving a quantum mechanical problem on the left of the system. Similarly, for observations made to the left of the cut, the effect of the righthand side can be summarised by a unitary V R .
This gives a prescription for parallel quantum simulation. Calculations of the quantum wave function to the left and right of the cut can be carried out on different quantum circuits or sequentially on the same circuit. The effects of the left partition upon the right partition and vice versa-through the environment unitaries V L and V R -are iterated to consistency. At each stage of this iteration, measurements must be performed in order to determine V L/R . The small Schmidt rank of the cut reduces the computational complexity of this process-if we were to do full state tomography, to OðpolyðDÞÞ, but with more sophisticated methods even to Oðlog ðDÞÞ, as in the example in the following section.
There are many physical situations in which this parallelisation might be useful. For example, large organic molecules that have localised chemical activity-this activity may be modulated or tuned by the surrounding parts of the molecule and the interplay of these effects could be calculated in parallel. In the following, inspired by iMPS tensor networks, we give quantum circuits that embody these ideas.

Parallel simulation with quantum tensor networks
The translationally invariant MPS gives an approximate representation of a translationally invariant spin 1/2 state as where n labels the lattice site, σ n labels the spin 1/2 basis states on the nth site and i n are auxiliary tensor indices that run from 1 to the bond order D. The tensors A σ ij are the same on every site reflecting the translational invariance. These states can be mapped to quantum circuits by taking the MPS tensors in isometric form and identifying them with the circuit unitaries as described in the "Methods" section. A variety of techniques have been developed for the classical manipulation of these states for quantum simulation 3,17,18 . Here we confine ourselves to discussing the quantum circuit realisation. The Supplementary Material gives a detailed summary of the connection between the classical MPSs and their quantum circuit realisation.
Representing the state. A translationally invariant, spin 1/2 MPS state of bond order D = 2 N can be represented by the infinite circuit shown in Fig. 2a. Expectations of local operators in this state can be evaluated by the finite circuit shown in Fig. 2b. The effect of contracting the infinite circuit to the left of the operator is trivial due to the unitarity of U ∈ SU(2D) (which automatically encodes the left canonical form of the related MPS tensor). The contraction to the right is described by the tensor V ∈ SU(D 2 ), which encodes an effective state over N ¼ log 2 D spins and their entanglement with the remaining system to the left-hand side. This unitary is determined self-consistently from U by the circuit shown in Fig. 2c. As demonstrated in ref. 4 , an MPS in this form can be constructed from any state by a sequence of Schmidt decompositions running from left to right. This guarantees the existence of the isometric MPS representation and the quantum circuit realisation of it. The operation of such a circuit at D = 2 was demonstrated in ref. 19 on an IBM quantum circuit, where analytic forms were known for both U and V along a line through the phase diagram of a model with topological phase transition. In general, V ≡ V(U) is not known and must be solved following Fig. 2c.
Optimising the state. We can find the ground state and the corresponding energy density of translationally invariant Hamiltonians by minimising the expectation value of the energy. The algorithm mirrors the variational quantum eigensolver. The expectation of the local Hamiltonian is found by measuring the corresponding Pauli strings on the physical qubits (see Fig. 2b). The result can then be minimised as a function of the ansatz parameters. Updates must be interleaved with updates to the environment, V, such that we optimise over valid translationally invariant states.
Evolving the state. Perhaps the most compelling feature of this implementation is the ease with which time-evolution can be achieved. The simple circuit shown in Fig. 3 a returns the unitary U 0 Uðt þ dtÞ that updates the state encoded by U(t) to a time t + dt under evolution with the Hamiltonian H. The first variation of this circuit with respect to U 0 returns the timedependent variational principle for iMPS in the form first presented by Haegeman et al. in ref. 12 . The equivalence uses the automatic encoding of the gauge-fixing of the state to the canonical form, as well as encoding of the tangent space and its gauge fixing (see "Methods" section and additional notes in Supplementary Materials). As in the determination of the best groundstate approximation above, the update involves two nested loops; one to find the update U 0 and one to find the environment tensors L LðU; U 0 Þ and R RðU; U 0 Þ -both of which are required in this case as the circuit corresponds to the overlap of two different states rather than expectations taken in a given state. We have used a slightly different way of representing these environments in Fig. 3 compared to that employed in Fig. 2. Quantum advantage. It is natural to ask whether there is any quantum advantage from using a quantum circuit in this way. Algorithms for manipulating iMPS (iDMRG, TDVP, etc.) 3,17,18 are classically efficient-they have the complexity of OðD 3 Þ. Where then is the room for improvement by implementation on a quantum circuit? The quantum advantage comes from the potentially exponential reduction in the dependence upon the bond dimension, D.
In a quantum circuit, the multi-qubit unitaries must be compiled to the available gate set. A translationally invariant state with entanglement S can be captured by a matrix product state of bond dimension D $ exp S. This requires a circuit depth of Oðexp SÞ. An arbitrary U 2 SUð2 exp SÞ unitary to implement this iMPS requires Oðexp SÞ $ OðDÞ gates 20 . However, a subset of nontrivial unitaries with entanglement S can be achieved with circuits whose depth is OðSÞ $ Oðlog DÞ giving an exponential speedup over the classical implementation 21 . This reduces the contraction time from OðD 3 Þ for a typical classical implementation to Oðlog DÞ in a quantum case. Though these shallow circuits exist, the question remains whether they have high fidelity with the states that we are interested in ref. 22 identifies a subset of shallow quantum circuit MPS that have high fidelity with the states produced by Hamiltonian evolution. An example at bond order D = 4 is shown in Fig. 2d.
This demonstrates a quantum advantage for U. We must also consider whether the environment V (or R and L) have high fidelity shallow circuit representations. A priori there is no guarantee that, given a shallow circuit U, the V that satisfies the fixed point equations in Fig. 2, or the R and L that satisfy the fixed point equations in Fig. 3 are themselves shallow. However, there is a rigorous argument for this. We start by constructing an initial shallow approximation. When optimising the energy, as in Fig. 2, we can take advantage of the ability to diagonalize the reduced density matrix to the right and choose a corresponding V. A shallow circuit approximation to this allows us to set log D Schmidt coefficients exactly with the remainder lying on a smooth interpolation between them. In the case of time evolution, the mixed environments R and L cannot necessarily be diagonalised simultaneously (though off-diagonal elements are of order dt 2 ) and a richer-though still shallow-variational parametrization allowing for this is necessary. In either case, a shallow approximation for the environment can be improved exponentially for a linear cost in qubits and circuit depth by applying the transfer matrix a linear number of times, i.e., by using the power method (see the Supplementary Materials for more details). This simply corresponds to inserting further copies of the transfer matrix in the center of the circuits in Fig. 3a. These arguments establish an asymptotic quantum advantage for our algorithm. In practice, we find that the initial shallow approximations prove remarkably accurate and these corrections are unnecessary. This equation is to be interpreted as equality of the reduced density matrices implied by the free qubit lines. We show in the "Methods" section how to implement this using swap gates. d A shallow circuit representation of the D = 4 states following ref. 22 . Such circuits have been shown to have high fidelity with states obtained in Hamiltonian evolution and are exponentially quicker to contract on a quantum circuit than they are classical.  provides a non-trivial test for our simulation. Our main findings are as follows: (i) When run without gate errors and complete representation of the unitaries U, V, L, and R, our code precisely reproduces the optimum iMPS and its time evolution using the time-dependent variational principle.
(ii) Factorisations of the unitaries reduce the fidelity of our results. These are systematically improved as the depth of expressibility of the ansatz is increased. Full parametrisations of the unitaries reproduce classical MPS results exactly.
(iii) Going from representing to optimising, to time-evolving states places increasing demands upon circuit depth and width. Accurate results require increasingly deep factorisation of the unitaries and suffer increasingly adverse effects of gate errors.
Optimising. The simulation results for the optimisation of the transverse field Ising model are given in Fig. 4. The unitary U is factorised using the ansatz in Fig. 4a. This ansatz shows good agreement with the exact bond dimension 2 MPS results with a low depth circuit. This factorisation is much lower depth than a full factorisation of SU (4). Alternative factorisations of the state unitary are possible for this problem. Lower depth ansatz can be used by identifying that the ground state of the transverse field Ising model is made up of real tensors (See Supplementary Material for details).
Time evolving. Using exact representations of unitary matrices we exactly reproduce the TDVP equations as expected. We demonstrate that shallow factorisations of the state unitaries are able to effectively capture dynamical phase transitions in the Loschmidt Echos. The ground state of the transverse field Ising model with g = 1 is prepared. This state is then evolved under the same Hamiltonian but with g = 0.2. At each step of the trajectory, the overlap of the current state with the original state is recorded. The log of this overlap is shown for the whole trajectory in Fig. 4c. A low depth state is able to exactly represent the states used in ref. 24 to produce Poincare maps, which are used to study slow quantum thermalisation in the PXP Model 25 . The states are defined with a two-parameter circuit shown in Fig. 5. The quantum TDVP code is able to recreate the Poincare maps for a two-site translationally invariant MPS state. It is possible in this case to discard erroneous points by identifying points with energies that deviate from the known value by more than some fixed threshold. This is a form of error mitigation that may be applicable to other problems when using the quantum TDVP algorithm and may help mitigate the impact of noise from NISQ devices. The larger structures in the Poincare Map are distorted by errors but are still visible in the presence of integration and stochastic optimisation errors. The effects of noise on the quantum TDVP algorithm are outlined in the Supplementary Materials.

DISCUSSION
We have presented a way to perform quantum simulations by translating tensor network algorithms to quantum circuits. Our approach allows parallel quantum simulations of large systems on small NISQ computers. We have demonstrated this for onedimensional translationally invariant spin chains. The translation of MPS algorithms naturally encodes fundamental features of matrix product states and the tangent space to the variational manifold that they form. In demonstrating the operation of such circuits, we have touched upon some immediate questions including the expressiveness of shallow circuit restrictions of tensor network states, their effect upon simulation alongside that of finite gate fidelity. These warrant further systematic study.
Our algorithms are readily extensible to inhomogeneous onedimensional systems and to higher dimensions following existing þ λσ x n is studied with a bond order D = 2 quantum matrix product state. a The SU(4) unitaries U and V are compiled to the circuit as shown. The parameter p is varied to increase the accuracy. Although more efficient parametrizations exist for 2 qubit unitaries 39 , as well as circuits more specifically tuned to this problem, we choose a generic circuit. It is readily extendible to higher bond orders [see Supplementary Materials]. b The optimum state is found using the circuits depicted in Fig. 2. The energy of this state is a better approximation to the true ground state energy as the depth of parametrization increases and converges to that obtained in a conventional MPS algorithm. In particular, we have checked that the parametrization of ref. 39 perfectly reproduces the MPS results. Note that at λ = 0 the Hamiltonian is optimised by a product state, which is captured perfectly with p = 1. c Transverse field Ising model displays dynamical phase transitions in the Loschmidt echo 23 . These are revealed in the simulated runs of the quantum time-dependent variational principle embodied by the circuits in Fig. 3. More accurate circuits are required to obtain good agreement. The results indicated as exact in the above are exact analytical results.
methods that wrap one-dimensional states to higher dimensional systems 17,18 . It would be interesting to study other gauge restrictions of MPS-such as the mixed gauge of modern classical time-dependent variational principle codes-which can also be implemented in quantum circuits. Generalisations of MPS that more directly describe higher dimensional systems are also available. For example, the projected entangled pair states (PEPS) give a two-dimensional generalisation. In realising these states on a quantum circuit, they must be formed from isometric tensors. Until recently, a suitable canonical form for PEPS was not available. The isometric version of PEPS presented in ref. 26 shows great promise and ought to be possible to implement on a suitably connected quantum circuit. Other tensor networks such as the multi-scale entanglement renormalisation ansatz 27 (MERA) are naturally based upon unitary operators and can be realised on a quantum circuit 28 . Indeed, MERA has been deployed for image classification on a small quantum circuit 29 and as a quantum convolutional neural network 30 .
The tensor network framework also provides a convenient route to harness potential quantum advantage in simulation. The onedimensional matrix product state ansatz is efficiently contractible. The time is taken to calculate the expectation of a local operator scales proportional to the length of the system. A quantum implementation has the advantage of a potentially exponential decrease in the prefactor to this scaling. While a classical tensor network may efficiently represent the important correlations of quantum state in higher dimensions, its properties may not be efficiently contractible. Contraction of a PEPS state is provably ♯P hard 31 . However, a physically relevant subset of these states can be efficiently contractible and an isometric representation of them (isometric PEPS for which the Moses move of ref. 26 can be carried out without approximation are indeed quasi-local and finitely contractible) could confer quantum advantage from the shallow representation of the constituent unitaries 32 . The balance of advantage and cost in quantum algorithms can be delicate; extracting the elements of the tensors is easy classically, but quantum-mechanically requires tomography of the circuit state, which is exponentially slow in the number of spins measured. This may be the bottleneck in hybrid algorithms 33 . Using the tensor network framework to distribute entanglement resources over the Hilbert space appropriately can mitigate some of these costs.
This work demonstrates the utility of translating tensor network algorithms to quantum circuits and opens an unexplored direction for quantum simulation. Potentially all of the advances of classical simulation of quantum systems using tensor networks can be translated in this way. Moreover, it provides a complementary perspective on classical algorithms suggesting related benefits in purely unitary implementations.

Quantum matrix product states
The mapping from MPS to quantum circuits that we use automatically embodies much of the variational manifold and its tangent space. The parsimony of this mapping to the quantum circuit suggests that it is the natural home for MPS. The fundamental building block of the circuits depicted in Figs. 2, 3 is the MPS tensor. A tensor of bond order D and local Hilbert space dimension D is represented by an SU(dD) matrix following 13,34-37 A σ ij ¼ U ð1jÞ;ðσiÞ as shown in Fig. 6. This translation automatically encodes the left canonical form of the MPS tensor; P σ ðA σ Þ y ij A σ jk ¼ δ ik . This follows directly from the unitary property of U. A classical implementation of an MPS algorithm involves returning the tensors to this form after each step in an algorithm using singular value decompositions-in a quantum algorithm, such manipulation is not required.
Moreover, the remaining elements of the unitary encode the tangent space structure to the sub-manifold of states spanded by MPS. These are important in constructing the projected Hamiltonian dynamics. Adopting the notation of ref. 12 , V (σ⊗δ≠1),(i⊗j) = U (δ≠1⊗j),(σ⊗i) and automatically satisfies the null or tangent gauge-fixing condition P σ ðA σ Þ y ij V σ;δ≠1 jk ¼ 0. This structure is responsible for the very compact quantum implementation of the time-dependent variational principle shown in Fig. 3. It obviates the need to calculate the tangent space structure at each step 12 .

The quantum time-dependent variational principle
The equivalence with the classical implementation of the time-dependent variational principle for matrix product states and its quantum version can  25 , displays a curious property known as many-body scarring 25 , whereby from certain starting states, persistent oscillations that can be described with a low bond-order MPS are found. These are amenable to study on a NISQ machine using the quantum time-dependent variational principle of Fig. 3. a A simple set of 2-site periodic states at bond order D = 2 are parametrised by circuits with just 2 parameters per site, so 4 in total. b A partial Poincare section through the plane θ 1 = 0.9 produced from a classical simulation using the matrix product state equations of motion presented in ref. 24 . Initial conditions are chosen on a constant energy surface H h i ¼ 0. The partial plot was produced with initial conditions along a line with spacing δϕ 1 = 3 × 10 −2 , with θ 1 = 0.9 and θ 2 = 5.41. The final parameter, ϕ 3 , chosen to fix the energy. c The same Poincare map produced simulating the quantum time-dependent variational principle in Cirq. While the figure is blurred somewhat by integration, the main features are still apparent. Taking the explicit overlap of the circuit in Fig. 3a) with the state 000 j iand then calculating its derivative with respect to X recovers the time-dependent variational principle as formulated in ref. 12 . The tensor X is to be compared with that in ref. 12 rescaled by the square root of the environment tensor. The quickest route to see this is to expand the circuit to quadratic order in the tensor X and bi-linear order in X and dt, before differentiating with respect to X.

Optimising quantum circuits
Our algorithms require the optimisation of expectations of observables-in Figs. 2b, and 3a-and the solution of fixed-point equations in Figs. 2b and 3b, c to determine the environment and mixed environment. In all cases, optimisations are carried out stochastically.
We use the Rotosolve algorithm 38 to speed up our stochastic searches. This utilises the fact that the dependence of expectations of a parametrised quantum circuit on any particular parameter is sinusoidal. As a result, after just three measurements one can take this parameter to its local optimum value. Extensions of this allow the variation to be calculated when several elements of the circuit dependent upon the same parameter.
The equations illustrated in Figs. 2b and 3b, c are implicitly identities between density matrices. We solve them using a version of the swap test that amounts to a stochastic optimisation of the objective function tr [ðr ÀŝÞ y ðr ÀŝÞ]. Details of how the swap test is implemented for the environment in Fig. 2b and for the mixed environments in Figs. 3b, c are given in the supplementary materials.

DATA AVAILABILITY
All data generated or analysed during this study are included in this published article (and its supplementary information files).