Quantum circuit cutting with maximum-likelihood tomography

We introduce maximum-likelihood fragment tomography (MLFT) as an improved circuit cutting technique for running clustered quantum circuits on quantum devices with a limited number of qubits. In addition to minimizing the classical computing overhead of circuit cutting methods, MLFT finds the most likely probability distribution for the output of a quantum circuit, given the measurement data obtained from the circuit’s fragments. We demonstrate the benefits of MLFT for accurately estimating the output of a fragmented quantum circuit with numerical experiments on random unitary circuits. Finally, we show that circuit cutting can estimate the output of a clustered circuit with higher fidelity than full circuit execution, thereby motivating the use of circuit cutting as a standard tool for running clustered circuits on quantum hardware.


I. INTRODUCTION
The advent of noisy intermediate-scale quantum (NISQ) technologies [1] makes quantum processors with increasing numbers of qubits available to the quantum computing community for experimentation.The rapid progress in the development and manufacturing of these devices is remarkable, with state-of-the-art superconducting quantum processors reaching ∼ 50 qubits with percent-level gate and readout errors [2][3][4].Advances on the hardware front have been matched by the theoretical development of suitable hardware benchmarks [5], which have in turn enabled proof-of-principle demonstrations of a computational advantage over classical computing systems [4].
Despite tremendous progress, existing devices still lack the number and quality of qubits required for practical NISQ-era applications such as digital quantum simulation [6,7], quantum optimization [8][9][10] and quantum machine learning [11,12].Without error correction, these applications are severely limited by the accumulation of errors that will only compound as devices scale up to more qubits and deeper circuits.Bridging the gap between the requirements of NISQ-era quantum algorithms and the capabilities of NISQ devices will require error mitigation techniques [13,14] and problem decompositions that trade quantum and classical computing resources [15,16].
One decomposition, inspired by the fragmentation methods used for quantum molecular cluster simulations [17][18][19], applies fragmentation to the execution of quantum circuits [16].This decomposition consists of first "cutting" a quantum circuit into smaller subcircuits, or "fragments", that can be executed on processors with fewer qubits, and then reconstructing the probability distribution over measurement outcomes for the original quantum circuit from probability distributions associated with its fragments.The severed quantum connections between circuit fragments are simulated by classical postprocessing of fragment data, which leads to a classical computing overhead that grows exponentially with the number of cuts that are made to a circuit.This approach is therefore suitable for simulating circuits that are decomposable into clusters of gates with a small number of inter-cluster interactions.Such circuits can make appearances in the context of Hamiltonian simulation [16], as well as near-term applications based on a variational ansatz that allows for some freedom in choosing circuit structure, such as the quantum approximate optimization algorithm (also the quantum alternating operator ansatz, QAOA) [8,9,20,21] and variational quantum eigensolvers (VQE) [16,22].
Due to the presence of fundamental shot noise (equivalently, finite sampling error), an unavoidable feature of the original fragment recombination method in Ref. [16] is that the distribution over measurement outcomes obtained by characterizing and recombining circuit fragments does not generally satisfy central axioms of probability theory, namely that a probability distribution must be non-negative and normalized.A naive fix to this problem would be to simply remove all negative probabilities and normalize the reconstructed distribution in question.In the spirit of maximum likelihood state tomography (MLST) [23], however, one would like to determine the "most likely" probability distribution that is consistent with available fragment data.
In this work, we find this "most likely" probability distribution by generalizing MLST and introducing maximum likelihood fragment tomography (MLFT), the use of which guarantees that reconstructed probability distributions are non-negative and normalized.We discuss how MLFT minimizes the classical computing resources necessary to characterize circuit fragments, and provide a tensor-network-based method for fragment recombination.We test our methods in numerical experiments with random unitary circuits, and demonstrate that MLFT estimates the probability distribution at the output of a fragmented quantum circuit with higher fidelity than the naive method of removing negative probabilities and normalizing.These benefits come at no cost to the computational complexity of circuit cutting, as they are achieved by post-processing fragment data in a manner that has a smaller computational cost than that of recombining fragment data to reconstruct a circuit output.As an added bonus, for a fixed number of queries to quantum hardware (known as "shots" or "trials" in e.g.Qiskit [24] or pyQuil [25]) we show that circuit cutting methods can outperform direct execution and sampling of a clustered circuit in order to estimate its associated probability distribution.We provide theoretical arguments to support this finding, which motivates the use of circuit cutting as a standard tool for evaluating clustered circuits on quantum hardware, even when all hardware requirements for full circuit execution are satisfied.

II. BACKGROUND
Here we provide a basic overview and discussion of the circuit cutting procedure first introduced in Ref. [16], and establish terminology that we will use throughout the rest of this work.We note that our overview of circuit cutting will use the language of quantum states and channels, rather than the language of tensor networks that was used in Ref. [16].These two formalisms are mathematically equivalent, but the former will allow for a more seamless integration with the material in Section III.

A. The general cut-and-stitch prescription
Given an arbitrary quantum state |ψ of N qubits, a straightforward resolution of the identity operator I = b∈{0,1} |b b| on qubit n implies that where I n denotes the action of I on qubit n; the relation denotes equality up to a permutation of tensor factors (i.e.qubit order); and n b | ψ is a sub-normalized state of N − 1 qubits acquired by projecting |ψ onto state |b of qubit n.If the structure of a quantum circuit that prepares |ψ allows, a similar resolution of the identity operator I can be used to "cut" the circuit by inserting I at a location that splits the circuit into two disjoint subcircuits.For example, if |ψ = V 23 U 12 |000 , where U 12 and V 23 are the two-qubit gates U and V acting on qubits 1, 2 and 2, 3, then by inserting the identity operator I 2 (on qubit 2) between U 12 and V 23 we find that where the factors are (generally sub-normalized) "conditional" states prepared by projecting onto |b or preparing |b , as appropriate.The identity in Eq. ( 2) is visualized in Figure 1, albeit with the use of density operators that we discuss below.
The above splitting method relies on the capability to project qubit n onto state |b while preserving phase information.Such capability is possible when running classical simulations of a circuit, but is not possible on quantum computing hardware.This limitation can be overcome by representing quantum states |ψ with density operators ρ = |ψ ψ|, whose diagonal entries in a given measurement basis define a classical probability distribution over measurement outcomes in that basis.For ease of language, we will at times blur the distinction between a state ρ and the probability distribution defined by its diagonal entries in a fixed computational basis.In the remainder of this work, we will discuss circuit splitting and reconstruction in way that is compatible with circuit execution on quantum computing hardware.Nonetheless, our methods can be applied just as well to classical state simulation, with minor simplifying modifications to account for the added capability of performing deterministic, phase-preserving qubit projections.
The identity analogous to Eq. ( 1) for density operators ρ reads where B is a basis of self-adjoint 2 × 2 matrices with normalization tr M (i) M (j) = 2δ ij for M (i) , M (j) ∈ B; tr n denotes a partial trace with respect to qubit n; and M n with n ∈ Z n denotes an operator that acts with M on qubit n and trivially (i.e. with the identity I) on all other qubits.To be concrete, we will use the set of Pauli operators together with the singe-qubit identity operator, B ≡ {X, Y, Z, I}, as our basis.The identity in Eq. ( 4) implies that the state prepared by the action of a threequbit circuit V 23 U 12 on the trivial state |0 0| ⊗3 can be decomposed as where now the factors have no straightforward interpretation as "conditional" states, as with |ψ 1 (b) and |ψ 2 (b) in Eq. (2).In order to decompose ρ into conditional states, we can expand each M ∈ B in its eigenbasis: Circuit cutting example.A 3-qubit GHZ circuit can be cut into two 2-qubit fragments by inserting an identity operator.Here B ≡ {X, Y, Z, I} is the set of Pauli operators X, Y, Z and the identity I, which together form an orthogonal basis for the space of single-qubit operators; λ (M ) denotes the spectrum of M ; and Ms ≡ |Ms Ms| is the projector onto an eigenstate |Ms of M with eigenvalue s. Green (red) boxes labeled by the state Ms (Mr) correspond to preparations (projections) of a qubit in the corresponding state.After cutting a circuit, the resulting fragments can be simulated independently, and an appropriate post-processing of simulation results recovers the output of the original (pre-cut) circuit.
where λ (M ) denotes the spectrum of M , i.e. λ (X) = λ (Y ) = λ (Z) = (+1, −1) and λ (I) = (1, 1); and M s ≡ |M s M s | with s ∈ λ (M ) is a projector onto an eigenstate of |M s of M with eigenvalue s.Note that the choice of eigenstates for the identity operator I is arbitrary as long as these two states are orthogonal, so we can reuse the eigenstates from one of the other operators.
The decomposition in Eq. ( 7) allows interpreting each ρ f (M s ) as a conditional state, obtained either by postselecting onto the measurement of a qubit in state |M s , or by preparing a qubit in state |M s , as appropriate (see Figure 1).This decomposition thus corresponds to the following procedure for circuit cutting and reconstruction: after cutting a circuit into (say) two fragments, characterize the classical probability distributions ρ f (M s ) over measurement outcomes by running the corresponding sub-circuit and either post-selecting on measurement outcomes M s or preparing states M s , as appropriate.Note that post-selected probability distributions are generally sub-normalized, and the normalization tr ρ 1 (M s ) is equal to the probability of getting outcome M s when measuring in the diagonal basis of M .After characterizing the conditional distributions ρ f (M s ) for each of f ∈ {1, 2}, M ∈ {X, Y, Z}, and s ∈ {+1, −1}, combine these distributions according to Eq. ( 7).This scenario is illustrated in Figure 1, which cuts a 3-qubit GHZ circuit preparing the state |ψ ∝ |000 + |111 into two 2-qubit fragments.

B. Refinements
In practice, recombining circuit fragments as prescribed by Eq. ( 7) is inefficient in two ways.First, the tensor products in Eq. ( 7) are a computational bottleneck for fragment recombination.It is therefore faster to post-process conditional distributions by first (i) for each fragment f , combining the six independent distributions ρ f (M s ) into four distributions: for any M ∈ {X, Y, Z}, and then (ii) combining the fragment distributions ρ f (M ) according to Eq. (5).In a circuit with K cuts, this post-processing reduces the number of tensor products that must be computed during recombination from 16 K to 4 K , which is an exponential reduction (in K) of the number of floating-point operations required to recombine fragment data a .
Second, the recombination formula in Eq. ( 7) nominally requires, for each fragment f incident on K f cuts, characterizing K 6 f probability distributions.This characterization is overcomplete, because the K 6 f distributions are not all linearly independent.In the case of a fragment with a single incident cut, for example, we can use the fact that In fact, a fragment with K f incident cuts can be completely characterized by K 4 f distributions, which can be deduced from the fact that the space of operators on the Hilbert space of a qubit has real dimension four.The symmetric, informationally complete, positive operatorvalued measure (SIC-POVM) Π SIC j : j ∈ Z 4 (consisting of projectors Π SIC j onto the states represented by the four corners of a regular tetrahedron inscribed in a Bloch sphere), for example, form a mutually unbiased basis for the space of single-qubit operators.Given any single-qubit operator M , we can therefore expand Finally, characterizing fragments is a noisy process, due to both (i) hardware errors that are unavoidable without error correction, as in all NISQ devices, and (ii) statistical sampling (shot) noise.As a result, the "experimentally inferred" distributions ρf (M s ) approximating the "true" distributions ρ f (M s ) will generally contain errors, and will fail to satisfy self-consistency conditions such as Eq. ( 8).When combining these distributions according to Eqs. ( 5) and ( 7) there is similarly no guarantee that the reconstructed probability distribution a The recombination procedure in Ref. [16] involves 8 K tensor products, rather than 16 K , because it consolidates "measurement" conditions, but not "preparation" conditions, which is equivalent to collapsing the sum over r in Eq. ( 7) but leaving the sum over s.
will satisfy conditions required of a probability distribution, such as non-negativity and normalization.
To address these shortcomings, in the following section we recast the task of characterizing conditional distributions into the task of performing fragment tomography, treating the fragments ρ f , rather than distributions ρ f (M s ), as first-class objects.In addition to being automatically efficient in terms of the classical memory footprint of characterizing each fragment, performing fragment tomography allows us to adapt the method of maximum likelihood tomography [23] to construct a model for each fragment that is, by construction, guaranteed to satisfy all appropriate self-consistency conditions.Fragment recombination is then similarly guaranteed to yield a probability distribution that is both non-negative and normalized.Finally, we show how the fragment models constructed via fragment tomography naturally admit a tensor-network-based method for recombination.

III. MAXIMUM LIKELIHOOD FRAGMENT TOMOGRAPHY
Once a circuit has been cut into fragments ρ f , rather than characterizing conditional distributions ρ f (M s ) we can perform a more systematic maximum likelihood fragment tomography (MLFT) procedure to characterize these fragments.The purpose of MLFT is to perform a "maximum likelihood" characterization, similar to the characterization of quantum states in Ref. [23], which guarantees that any probability distribution associated with these fragments will be (i) the "most likely" distribution consistent with available fragment data, while (ii) satisfying all necessary constraints for a valid (i.e.non-negative and normalized) probability distribution.MLFT is a type of quantum process tomography, which generalizes maximum likelihood state tomography (MLST) [23] to the case of channels (processes) with mixed (quantum/classical) inputs and outputs.
Any given fragment, nominally a unitary circuit on Q qubits, will generally have Q i "quantum input" and Q o "quantum output" qubits at the locations of cuts.We refer to these inputs and outputs as "quantum" because characterizing the fragment for circuit reconstruction will require performing full quantum tomography on the corresponding degrees of freedom.In contrast, the remaining ⊗Ci , and the remaining output" qubits are always measured in a fixed computational basis.For definiteness, we can first think of a fragment as a quantum channel E Λ on the state of Q qubits.The channel-state duality [26][27][28] implies that this channel is uniquely determined by a 4-partite state (density operator) of the form Λ ≡ k, ,m,n p,q,r,s where the bitstrings k, (m, n; p, q; r, s) index states in the Hilbert space of the quantum input (classical input; quantum output; classical output) qubits of the fragment, and are implicitly summed over Z Qi 2 (Z Ci 2 ; Z Qo 2 ; Z Co 2 ).Specifically, the channel E Λ maps a bipartite input state at its input to the bipartite state at its output.To account for the fact that classical outputs are only ever measured in a fixed computational basis, we can remove all parts of E Λ (ρ) that are off-diagonal with respect to the measurement basis of the corresponding qubits.In total, we therefore need only characterize the channel E Λ defined by where Λk ;pq;s ≡ Λ k ;0,0;pq;ss .
The task of performing MLFT thus reduces to performing tomography on the tri-partite block-diagonal state where Eq. ( 14) implicitly defines the blocks Λs .In words, the reduced state Λ is acquired from the full state Λ by conditioning on (i.e.fixing) a trivial state |0 i 0 i | on its classical inputs, and the block Λs is acquired from Λ by conditioning on measurement of the bitstring s on its classical outputs.The relationship between Λ, Λ, and Λs is sketched out in Figure 2. In a nutshell, MLFT is performed by providing a variety of quantum inputs to E Λ, and measuring its quantum outputs in a variety of bases.The blocks Λs are inferred by least-squares fitting to a linear operator that maps quantum inputs to quantum outputs, using all available data from experiments in which bitstring s was observed on the classical outputs of a fragment.This procedure yields an experimental ansatz state Λ A that approximates Λ, but that generally does not have the properties required of a density operator, such as a non-negative spectrum.The last step in MLFT is therefore to convert the ansatz state Λ A into a "maximum likelihood" state Λ ML by using an algorithm borrowed from MLST in Ref. [23].We describe MLFT in more detail below.
Block-diagonalizing circuit fragments.Each circuit fragment can be identified with a density operator Λ on the joint Hilbert space of its input (left) and output (right) qubits.Classical inputs and outputs of a fragment (gray) correspond to qubits that are either prepared in the trivial state |0 (labeled "0") or measured in a fixed computational basis (labeled "0/1").Quantum inputs (left, green) and outputs (right, red) correspond to qubits associated with cuts in a circuit.Due to the presence of trivial inputs, we only need to characterize a reduced state Λ on the Hilbert space of the quantum inputs and all outputs.Classical outputs give this reduced state a block-diagonal structure: Λ = s Λs ⊗ |s s|, where the block Λs is associated with the measurement of bitstring s on the classical outputs of the fragment.
MLFT (and MLST) begins by collecting measurement data to characterize the quantum state under consideration.In the case of the block-diagonal state Λ, one needs to characterize the expectation values for some complete basis of operators {σ i ⊗ σ o ⊗ z c } on the target Hilbert space of Λ, where σ i , σ o , and z c are respectively operators on the quantum input, quantum output, and classical output of the fragment in question, with z c strictly diagonal in the computational basis.MLST [23] collects data by performing informationally complete measurements of Λ, for example by choosing operators σ i,o from the set of all Pauli strings {I, X, Y, Z} ⊗Qi,o , and choosing z c from the set of diagonal Pauli strings {I, Z} ⊗Co .In the case of fragment tomography, however, we do not have direct access to the state Λ, and instead have access to the channel E Λ.It is therefore not possible to directly measure the degrees of freedom in Λ that are associated with inputs to the channel.Instead, MLFT characterizes the quantum input degrees of freedom in Λ by preparing an informationally complete set of states, making use of the fact that where σ T i denotes the transpose of σ i .Whereas the operators σ o and z c may still be chosen from the set of Pauli strings, the input state σ T i is restricted to satisfy tr σ T i = 1.This restriction excludes the possibility of choosing σ T i from an orthogonal basis for the space of the space of Q i -qubit operators (such as the set of Pauli strings), but any complete basis will suffice.For example, one can choose input states from the basis of pure states ⊗Qi .For an unbiased basis, one can take tensor products of symmetric informationally complete (SIC) states of a single qubit, or even consider bases of multi-qubit SIC states.The practical advantages of using these bases, however, generally depend on the fidelity with which one can prepare SIC states.Similar considerations apply for the choice of measurement basis for quantum outputs [29].Overall, in order to characterize a fragment with Q i quantum inputs and Q o quantum outputs one must prepare each of 4 Qi input states, and measure outputs in each of 3 Qo possible bases (for each quantum output qubit, the diagonal bases of X, Y, Z), so fragment tomography requires O 4 Qi 3 Qo experiments.
After collecting an informationally complete set of data on the state Λ, a straightforward least-squares fitting procedure yields an empirical ansatz Λ A for Λ, which is the MLFT analogue of the "experimentally noisy" matrix µ described in the original MLST work [23].The block diagonal structure of Λ = s Λs ⊗ |s s| implies that the least-squares fitting procedure can be performed independently for each block Λs of size 2 Qi+Qo × 2 Qi+Qo .Specifically, Λs is obtained by fitting to tr Λs ( where p s is the probability of observing bitstring s on the classical output of a fragment, and σ i ⊗ σ o zc=s is the expectation value of σ o (on the quantum outputs) when preparing the state σ T i (on the quantum inputs) and observing bitstring s (on the classical outputs) of the fragment.Because the ansatz state Λ A ≈ Λ is constructed from a fit to noisy measurement data, Λ A will generally have negative eigenvalues, which is not allowed for density operators.The final step in both MLST and MLFT is therefore to find the closest state to Λ A that has no negative eigenvalues.To this end, MLFT borrows the "fast algorithm for subproblem 1" in Ref. [23], which As proven in Ref. [23], this algorithm finds the closest positive semidefinite state Λ ML to Λ A with respect to the metric induced by the 2-norm A 2 ≡ tr (A † A).
In this sense, Λ ML is the "most likely" state consistent with Λ A .The only additional consideration for this algorithm when performing MLFT has to do with making use of block diagonal structure to diagonalize Λ A : each block of size 2 Qi+Qo × 2 Qi+Qo can be diagonalized independently.The overall serial runtime of the algorithm to find Λ ML from Λ A is therefore O 2 3(Qi+Qo) N c , where N c ≤ 2 Co is the number of blocks in Λ A , or equivalently the number of distinct bitstrings observed on the classical output of the fragment throughout tomography.As we will see, the maximum-likelihood corrections to Λ A are responsible for the benefits of MLFT in estimating a circuit's output.Moreover, the cost of computing these corrections is smaller than the unavoidable cost of fragment recombination, so the benefits of MLFT are free as far as the computational complexity of circuit cutting is concerned.
The treatment of fragments and their dual states Λ as first-class objects in MLFT enables a straightforward tensor-network-based circuit reconstruction method.Rather than explicitly computing and summing over each term of the fragment recombination formula in Eq. ( 4), the basic idea is to think of the entire sum as a contraction of two tensors.We sketch out this idea in Figure 3, making use of the relationship between fragment states Λ, their reductions Λ, and diagonal blocks Λs .In total, the full probability distribution over measurement outcomes for a reconstructed circuit can be acquired by a tensor network contraction of reduced states Λ, and the individual probabilities of measuring any given bitstring at the output of a circuit can be acquired by a similar contraction of diagonal blocks Λs .
If a circuit has K cuts and F fragments, and N (f ) c distinct bitstrings were observed on the classical output of fragment f ∈ {1, 2, • • • , F } throughout fragment tomography, then reconstructing the circuit's output requires contracting f N (f ) c tensor networks, each of which nominally involves summing over 4 K terms.Whereas the 4 K cost to contract a single tensor network g can be reduced to 2 O(cc(g)) , where cc(g) is the contraction complexity of g [16], the overall multiplicative cost in N (f ) c is unavoidable.In comparison, performing maximum-likelihood corrections to fragment models comes at a cost that is additive in N (f ) c .For this reason, fragment recombination is generally the computational bottleneck of circuit cutting, and maximum-likelihood corrections add no significant overhead.

IV. NUMERICAL EXPERIMENTS
In order to test the benefits of MLFT in an applicationagnostic setting, we run classical simulations of random unitary circuits (RUCs).Because the cost of circuit cutting scales exponentially with the number of cuts made to a circuit, we construct RUCs with a structure that makes them amenable to circuit cutting (see Figure 4).We then vary the number of qubits and clusters in our RUCs, as well as the total number of samples (known as "shots" in Qiskit [24] or "trials" in pyQuil [25]) in a simulation, where the result of each sample is a single bitstring representing one measurement outcome.In this way, we compare three methods to estimate the probability distribution over measurement outcomes at the end of a clustered RUC.
First, as a standard benchmark, we consider sampling an entire circuit S times without any circuit cutting, which we refer to as the method of "full" circuit execution.Second, we consider cutting a circuit into fragments, with each fragment corresponding to a cluster as shown in Figure 4, and reconstructing these fragments as prescribed by the original circuit cutting work [16], namely without maximum likelihood corrections.We refer to this second method as the "direct" method of circuit cutting and reconstruction.A fragment with Q i quantum inputs and Q o quantum outputs has 4 Qi × 3 Qo variants that must be simulated for circuit reconstruction, where each variant corresponds to a choice of state preparations and measurement bases on the quantum inputs and outputs of the fragment.We therefore divide the budget of S samples evenly among all fragment variants.Finally, we consider the full MLFT and recombination procedure, which we refer to as the "MLFT" Random unitary circuit (RUC) of ten qubits split into three clusters.Qubits are first split among clusters as evenly as possible, and each cluster is prepared in a random state by the application of a Haar-random unitary gate [30,31].Adjacent clusters are then entangled with random two-qubit gates, before again applying a layer of random unitaries on all clusters.A clustered RUC is cut into fragments (labeled A, B, C) by cutting the bottom legs (shown in red) of every inter-cluster entangling gate.
method.The direct and MLFT methods only differ in the classical post-processing of fragment simulation results.Specifically, the differences between the final outputs of the direct and MLFT methods are entirely due to the application (or non-application) of maximum-likelihood corrections to fragment models.
To compare the efficacy of the full, direct, and MLFT methods, we compute the fidelity of reconstructed probability distributions over measurement outcomes, p estimate , with the actual probability distribution p actual that is determined by exact classical simulations of a circuit: where p actual (s) , p estimate (s) are, respectively, the probabilities of measuring the N -qubit state (bitstring) s ∈ Z N 2 according to the distributions p actual , p estimate .The fidelity F is an analogue of the quantum state overlap | φ | ψ | 2 for classical probability distributions.The only caveat in our calculation of fidelities is that they are only well defined when dealing with valid (non-negative and normalized) probability distributions, whereas the direct circuit cutting method generally yields an unnormalized distribution that may have negative entries.We therefore convert the distribution yielded by the direct method into a valid probability distribution by eliminating all negative entries (setting them to zero), and normalizing the distribution.
Figure 5 shows the infidelities I = 1 − F of the probability distributions yielded by each simulation method.To ensure that results are not sensitive to the specific choice of random gates, these infidelities are averaged over 100 instances of each clustered RUC, although in practice we find that these infidelities vary by only ∼ 1-10% of their mean value (see Appendix D). Figure 5 also shows analytical estimates of infidelity for the full and direct simulation methods, derived in Appendices A-C.
An immediate takeaway from Figure 5 is that the MLFT method introduced in this work always outperforms the direct method: MLFT infidelities are always lower than direct infidelities.This result is consistent with theoretical arguments that MLFT finds the "most likely" fragment model consistent with noisy measurement data.Although we only consider shot noise in this work, it would be interesting to see how the benefits of MLFT change with the introduction of additional noise such as measurement and gate errors.We defer a study of the effect of such errors to future work.
Figure 5 also shows that the infidelity I for all simulation methods scales more or less identically with the sample number S, namely I ∼ 1/S for large S. Though some of the numerical data in Figure 5 may better be fit by I ∝ 1/S (1+η) for some η = 0, the deviation from η = 0 are minor, and may be an artifact of small circuit sizes.It is worth noting that the original circuit cutting work [16] proved that a reconstructed circuit output (probability distribution) can be estimated to an accuracy of with S = O(1/ 2 ) samples, which by dimensional analysis suggests that I ∼ 2 ∼ 1/S in all cases.
Though scaling with sample number does not strongly distinguish these methods, it is clear that the direct and MLFT methods scale much more favorably with circuit size: the full method has an infidelity I ∼ 2 Q for Q qubits, whereas cutting a circuit into F fragments results in I ∼ is the number of classical outputs on fragment f and f C f o = Q.The more favorable scaling for circuit cutting methods is surprising at first glance, as these methods require strictly fewer quantum computing resources: their sample budget is spent on executing smaller circuits (namely, fragment variants).The better performance of the circuit cutting methods can be understood by the fact that they use their sample budget in a targeted manner that exploits circuit structure, rather than blindly sampling the entire circuit.However, when circuits are sufficiently small for the fixed number of samples to explore the sample space of the entire circuit, full circuit sampling performs better than circuit cutting because it does not waste resources on characterizing numerous variants of nearly identical fragments.
Deferring a detailed derivation of expected infidelities to Appendices A-C, we can make the above intuition more quantitative by considering the difficulty of estimating a probability distribution defined by a Q-qubit RUC by (i) sampling the full circuit directly, versus (ii) sampling all fragment variants for circuit reconstruction.The first task requires, in principle, exploring a sample space of size 2 Q with S samples, so one might reasonably expect (as is indeed the case) that I ∼ 2 Q /S.If a circuit is cut into F fragments, meanwhile, then each fragment Open markers correspond to simulations of the full circuit ("full"), or simulations via circuit cutting before ("direct") and after ("MLFT") maximum likelihood corrections to fragment models.The last two markers in the legend correspond to analytical estimates of infidelity: 2 Q /S for the full method, and F f =1 2 C f o /n for the direct method, where C f o is the number of classical outputs on fragment f and n = S/V is the number of samples devoted to each of V total fragment variants.Whereas the estimates for the full method are quantitatively accurate, the estimates for the direct method are provided only to highlight approximate scaling relationships (see Appendices A-C).Results for each data point are averaged over 100 instances of a clustered RUC.
will have ∼ Q/F qubits, and if the number of qubits is independent of the number of fragments, then the overall sample space volume is reduced from 2 Q to 2 O(Q/F ) .Indeed, this argument agrees with the estimate of infidelity for the direct method of circuit cutting in Figure 5, where we show that I ∼ F × 2 Q/F /n with n = S/V the number of samples devoted to each of V total fragment variants.

V. CONCLUSIONS AND OUTLOOK
Circuit cutting is a promising technique for reducing the qubit requirements of running clustered quantum circuits.We have introduced improved circuit cutting methods by minimizing associated classical computing costs (with an exponential improvement over previous methods), and by using MLFT to reconstruct the "most likely" probability distribution defined by a quantum circuit, given the measurement data obtained from its fragments.To test our ideas in an application-agnostic setting, we ran classical simulations of random unitary circuits, which demonstrate the advantages of MLFT compared to the original circuit cutting method.Moreover, we also show that circuit cutting has advantages as a standard technique for running clustered circuits on quantum hardware, even when full circuit execution is possible.
Our work opens several avenues for the improvement and application of circuit cutting techniques.For example, MLFT guarantees that fragment models satisfy appropriate self-consistency conditions, but MLFT makes no use of the fact that each fragment corresponds to a unitary quantum channel.Furthermore, our present work neglects the effects of hardware errors that are important to consider in the context of NISQ devices.Because MLFT has the capability to mitigate shot noise, we expect the advantages of MLFT over full circuit execution to be enhanced when additionally considering the effects of hardware errors.We likewise expect unitarity constraints to provide additional benefits for mitigating sources of noise.Our work thus complements ongoing efforts that study the benefits of circuit cutting in the presence of hardware errors, which have generally found that circuit cutting helps mitigate the effects of noise [32].Having framed fragment characterization as a tomography task, it would also be interesting to adapt and apply different quantum process tomography techniques [33] to the task of circuit cutting, and compare their performance and cost to that of MLFT.
As a final point, we note that circuit cutting in its current form estimates a probability distribution associated with a given circuit.Ideally, one would like to sample this probability distribution (defined over an exponentially large space of possible measurement outcomes) without having to reconstruct it in full.To this end, our work makes important progress in understanding the mechanics of circuit cutting, by providing a convenient and efficient framework for thinking about individual circuit fragments.We hope that this framework will help in achieving the ultimate the goal of sampling a quantum circuit by sampling its fragments.

Sampling infidelity
Let p b be the probability of observing bitstring b at the end of a circuit, and pb = p b + b an empirical estimate of p b .The infidelity of the estimated probability distribution is where we tentatively assume that all p b = 0. Expanding the square root as ), up to O 3 corrections we find that where 2 Q = b 1 is the total number of bitstrings that can be measured at the output of Q qubits.A few comments concerning the result in Eq. (B2) are in order.First, the restriction that I ∈ [0, 1] implies that Eq. (B2) can only hold for 4n > 2 Q − 1, and that if 4n is comparable to 2 Q then the O( 3 ) contributions to I must become relevant.Second, if any p b = 0, then the factor 2 Q should be replaced by the sample space volume |{b : p b = 0}|.This second observation in particular suggests that Eq. (B2) can only hold for sufficiently 'generic' probability distributions, as large separations of scale in the probabilities p b should reduce 2 Q to some smaller 'effective' sample space volume, likely determined by the output entropy S(p) = − b p b log p b .Finally, we point out that Eq. (B2) also describes the infidelity with which n samples estimate the conditional probability distributions associated with a single fragment of a cut-up circuit.Unfortunately, the presence of quantum correlations between circuit fragments implies that the infidelity of a reconstructed circuit output is not additive in the infidelities of the fragments.Nonetheless, we show in the following section that the infidelity of a reconstructed circuit still scales inversely with the number of fragment samples, and exponentially in fragment size, i.e.
where now n is the number of samples used to estimate each variant of each fragment, f indexes a single fragment, and C f o is the number of classical output bits on fragment f .structure of our calculations, and therefore leaves our main conclusions (namely, how reconstruction infidelity I scales with different fragment parameters) in tact.
In order to compute the expected infidelity of p, we now need to determine the statistical means b , covariances b c , and variances 2 b .The means b = 0 because all contributions to b are either (i) proportional to a single error in the estimate of a multinomially distributed random variable, which is mean-zero, or (ii) a product of multiple independent (uncorrelated) errors, which is also mean-zero.

a. Covariances
To compute the covariance b c , we note that are only working to second order in error variables, whereas the contributions to b c from the ∼ 1 ⊗ 2 terms of Eq. (C6) are either fourth order or zero; we are therefore free to neglect these terms.Additionally throwing out terms that vanish because they are the product of uncorrelated random variables, we find that where f ∈ {1, 2} and f = f , i.e. such that f, f = {1, 2}; and b f , c f are the substrings of b, c associated with fragment f .We now expand where we again throw out terms that are zero or fourth order in error variables.The normalization errors β f (M r ) , β f (M s ) are zero for initialization conditions, and are always correlated for measurement conditions because they are errors in mutually exclusive measurement outcomes.The probability distribution errors γ f b (M r ) , γ f c (M s ), meanwhile, are independent unless M r = M s .The covariances between these errors are determined by multinomial distribution sampling errors, so where n is the number of times that we sample fragment each variant of each fragment (i.e. each choice of initialization conditions and measurement bases on a fragment), a f (M r ) n is the expected number of times that we sample fragment f with condition M r , Q f o is the number of quantum outputs (i.e. or measurement conditions) on fragment f .If δ Q f o ,0 = 0, then the ∼ a f (M r ) δ rs contributions from Eq. (C10) cancel out with the ∼ q f b (M r ) q f c (M r ) contributions from Eq. (C11) when substituting these results into Eq.( we thus find that b,c b c = 0. (C16) The infidelity I is therefore determined entirely by the variances 2 b .

b. Variances
We now consider the variances where from Eq. (C12) we know that In principle, we have to simplify but this calculation is intractable due to the sum over M ∈ B in the denominator.We therefore settle for trying to find an upper bound on this sum, to which end we observe that on fragment f , averaged over all possible conditions.This mean probability is p f b f = p f b f (I) /2 Q f i , where Q f i is the number of quantum inputs (initialization conditions) on fragment f , so In turn, we can bound The factor p 1 b1 p 2 b2 /p b is a measure of the quantum correlation between two fragments: it is equal to 1 if there is no correlation, and is smaller (greater) than 1 if quantum correlations cause constructive (destructive) interference for measurement outcome b on the combined circuit.In principle, this factor can be made arbitrarily large, but that requires fine tuning, and generally speaking we would expect p 1 b1 p 2 b2 /p b ∼ O(1) for random circuits.We therefore expect that where (C34) Though unsure how to evaluate this quantity, we can use the fact that a f M f ≤ a f I f 2 ≤ 4 K f to bound and similarly As in the case of one cut and two fragments, we now bound The contribution to infidelity I from the variances 2 b is then so altogether In practice, we find this asymptotic bound to be overly pessimistic with regards to the scaling with K.There are other ways in which this bound is too optimistic: by assuming that fragments are weakly correlated and 1 p b h p h b h ∼ O(1), this bound does not capture the effect of errors due to noisy virtual teleportation of qubits across cuts between fragments, at their quantum inputs and outputs.Nonetheless, the bound in Eq. (C44) demonstrates ∼ 1

FIG. 3 .
FIG.3.Fragment recombination as a tensor network contraction problem.The full probability distribution over measurement outcomes for a circuit reconstructed from fragments A, B, C can be represented by a tensor contraction of the reduced states Ã, B, C, obtained by performing MLFT on the fragments.The probability to measure a given bitstring k mn (i.e. a concatenation of k, , m, n ∈ Z2) on the output of the fragment is given by the contraction of the diagonal blocks Ãk , B m , Cn.The lack of classical inputs to fragment B implies that B = B.
FIG.4.Random unitary circuit (RUC) of ten qubits split into three clusters.Qubits are first split among clusters as evenly as possible, and each cluster is prepared in a random state by the application of a Haar-random unitary gate[30,31].Adjacent clusters are then entangled with random two-qubit gates, before again applying a layer of random unitaries on all clusters.A clustered RUC is cut into fragments (labeled A, B, C) by cutting the bottom legs (shown in red) of every inter-cluster entangling gate.

FIG. 5 .
FIG. 5. Infidelity in reconstructed circuit outputs.The infidelity I = 1 − F as a function of sample number S (a, b, c) or qubit number Q (d, e, f) for clustered random unitary circuits (RUCs) with F = 2 (a, d), 3 (b, e) or 4 (c, f) fragments.Open markers correspond to simulations of the full circuit ("full"), or simulations via circuit cutting before ("direct") and after ("MLFT") maximum likelihood corrections to fragment models.The last two markers in the legend correspond to analytical estimates of infidelity: 2 Q /S for the full method, and F f =1 2 C f o /n for the direct method, where C f o is the number of classical outputs on fragment f and n = S/V is the number of samples devoted to each of V total fragment variants.Whereas the estimates for the full method are quantitatively accurate, the estimates for the direct method are provided only to highlight approximate scaling relationships (see Appendices A-C).Results for each data point are averaged over 100 instances of a clustered RUC.
factors p f b f (I), we can express this bound in terms of the mean probability p f b f to get bitstring b f C36) which implies that the contribution to I from the covariances b b satisfies

n f 2
C f o scaling with shot number n and fragment size C f o , which (all else equal) are not affected by these considerations.
FIG. D1.Standard deviation of the infidelities shown in Figure 5.

Figure 5
Figure 5 of the main text shows infidelities I of reconstructed outputs for clustered random unitary circuits (RUCs).To provide a sense of the robustness of I to circuit variations, Figure D1 shows the standard deviation σ (I) = (I − I ) 2 in the same simulations.Generally speaking, σ (I) is orders of magnitude smaller than