Variational consistent histories as a hybrid algorithm for quantum foundations

Although quantum computers are predicted to have many commercial applications, less attention has been given to their potential for resolving foundational issues in quantum mechanics. Here we focus on quantum computers’ utility for the Consistent Histories formalism, which has previously been employed to study quantum cosmology, quantum paradoxes, and the quantum-to-classical transition. We present a variational hybrid quantum-classical algorithm for finding consistent histories, which should revitalize interest in this formalism by allowing classically impossible calculations to be performed. In our algorithm, the quantum computer evaluates the decoherence functional (with exponential speedup in both the number of qubits and the number of times in the history) and a classical optimizer adjusts the history parameters to improve consistency. We implement our algorithm on a cloud quantum computer to find consistent histories for a spin in a magnetic field and on a simulator to observe the emergence of classicality for a chiral molecule.


Jochen Gemmer

Run on a quantum computer, this would most likely cause a leap forward in consistency-based research on the foundations of quantum physics.
Overall the paper is clearly written and the algorithm is to my best knowledge original. The result is timely and possibly of major impact. In principle I would thus recommend publication of the paper. There are however a few points that I would ask the authors to address prior to a final decision:
1. At the end of the first paragraph it reads: "...the emergence of classical behavior (objectivity, irreversibility, lack of interference, etc.)...." Should "irreversibility" really be in this list? At bottom, quantum and classical physics are both reversible.
In this context, we mean irreversibility in the sense of statistical mechanics associated with increasing entropy. This sense of irreversibility is important for the behavior of many aspects of the classical world we experience.
2. Near the end of the introduction it reads: "Hence, useful implementations of our algorithm will be feasible on near-term quantum devices." This may be perceived as a bit optimistic (speculative).
We acknowledge that this is speculative and have changed "will" to "may". We note, however, that this speculation is not wildly unreasonable given the recent progress that has been made with quantum computers.
3. The Hamiltonian of the first example reads H = -\gamma B \sigma_z. What is the concrete value of \gamma B that was used to compute Fig. 3? What is the time elapsed before and between the two measurements?
Rather than specifying values of \gamma, B, and \Delta t separately, we specify only their product, since that is all that matters for the current situation. For our purposes, we set \gamma B \Delta t = 2 radians. We have clarified this in the section "Spin in a Magnetic Field".
4. In the caption of Fig. 3 the initial state is |+><+|. What is this state? Is its definition similar to the one given later in the context of the chiral molecule?
We have also clarified this point in the section "Spin in a Magnetic Field".
5. What is the "simulator" (Fig. 3a)? The first example is simple enough to allow for a sufficiently precise computation of the cost landscape using standard numerical means. So is the landscape from the simulator practically exact?
Our numerical simulator is fairly standard and capable of being essentially exact, but we choose to reproduce the "noise" of the finite statistics associated with the number of samples that we took on the physical quantum computer we used. This is now clarified in the text.
6. Also in the caption of Fig. 3 it reads: "Note that negative cost values are possible due to finite statistics". What statistics does that refer to? I guess the simulator does not need to rely on any statistics?
The statistics are those mentioned in response to point 5.
7. What are the "non-unique" minima marked by the white x-signs in Fig. 3 b? What is special about them? Are they also found by the simulator? And if not, why?
In principle, there is nothing special about these points. They just happen to have been found on the noisy quantum computer by imposing a noise cutoff on points found with a minimization procedure. This has also been clarified in the text.
8. The Section on "experimental realizations" is a bit poorly structured. The subsections are "quantum hardware" and "simulator", but in fact two different examples are discussed in the two subsections. The clarity of presentation of this Section could be improved.
We agree that the presentation here needed improvement and have now changed the subsection headings to hopefully improve the clarity.

An Equal Opportunity Employer / Operated by Triad National Security, LLC for the U.S. Department of Energy's NNSA
Reviewer #2 (Remarks to the Author): The paper by Arrasmith and collaborators discusses the use of a quantum computer within the consistent histories framework to address quantum foundational problems. Although the algorithm presented uses many of the same tools these authors have used in previous studies, the work describes a new way to use variational algorithms to answer interesting questions, and shows a proof-of-principle computation on an experimental device. While this is in principle a novel way to use a quantum device, what is less clear to me is the impact that could result from this use. While the work has this potential, I think the authors could strengthen the exposition in a few ways to better merit publication, which I detail below.
In particular, many works on quantum foundations posit the power of the approach to answer fundamental questions about interpretation and connection to classical correspondence, but often do so in an abstract sense. To better gauge the potential for impact, it would be helpful if the authors could posit at least one use case they would apply the technique to in the aspirational limit of many/perfect qubits to get a result that has not yet been classically accessible. It does not need to be performed on a device, but rather a sketch of the question to be answered, which system it would be applied to, how one uses their technique to measure it, and about how much resources one might need. It is difficult for someone outside of quantum foundations to know what measurements or results would settle an unanswered question. If no concrete use cases are known where one wants to measure a beyond-classical result that would answer a real foundational question, it may be safe to assume this work will have minimal impact.
In response to this point, we have added additional discussion on two concrete use cases in the "Discussion" section, and expanded on them in the Supplementary Material. In particular, we focused on two systems for which experimentalists have observed a sort of quantum-to-classical transition, but for which there are no theoretical treatments of these transitions. The first example is spin diffusion, where the scale and abruptness of the transition is unknown. Exploring this may be relevant to the development of spin qubits for quantum computing. The second example is protein folding, where there is an ongoing debate about whether proteins fold by many pathways or by a single deterministic pathway. Applying VCH to this case could help to resolve this debate. Please see our revised Discussion section, as well as our newly added Appendix F, for details about these applications, including estimates for the qubit resource requirements.
On the more technical side, I find one aspect a bit troubling that I hope the authors can correct me on. In particular, there are likely to be an almost infinite number (in a continuous construction) of possible families of consistent histories. This could lead to a large redundancy in the optimization landscape, where even at convergence one can drift between different consistent families. The assertion that sampling is efficient and one finds a dominant set of classical paths would seem to depend on having found a family of histories where such classical paths are ingrained in the projectors. If I randomize these projectors continuously, but still get a consistent set of histories, I will likely find an inefficient distribution, circumventing most of the useful aspects of this approach, and rendering it somewhat useless. In the body and text of the methods section, the authors assert that one can penalize such high entropy consistent histories through a particular measurement. However, it seems like this could lead to projectors which have quantum character (project into superpositions of spins for example) in order to minimize this entropy measure, which seems like it defeats the purpose of a consistent histories approach telling the story in a classically understandable way. I imagine in the non-variational version of consistent histories, these projectors are probably just chosen this way. I feel to strengthen the paper, the authors need to go into more detail as to how their approach can find paths that are both *likely* and *classically interpretable*, otherwise I can't really see the utility in the approach.
We understand the concern here. However, we wish to reassure the reviewer that there is a very strong connection between pathways that are "likely" and those that are "classically interpretable". There is an extensive body of literature (e.g., see the newly added references [51,52]) that connects classicality to predictability. This concept (or phenomenon) is called the predictability sieve, and it was partially developed by one of the authors (Zurek). The idea is that families of histories that we associate with classical dynamics also happen to correspond to families that have low entropy (high predictability) relative to other families for the same system.
In the context of our VCH algorithm, the situation is fortuitous. We are lucky that the families for which VCH may be efficient (low entropy families) also happen to correspond to the families in which we are most interested (classical families).
The benefit of our proposed alternative cost function \tilde{C} is that we are actually biasing our search towards histories that are more predictable and hence more classical (via Refs [51,52]). We have added a mention of this connection at the end of the "Precision of probability readout" sub-section in the "Methods".

Reviewer #3 (Remarks to the Author):
This is an interesting piece of work which covers a wide variety of aspects and applications of the field of decoherent histories. I believe that with small changes it is suitable for publication and will, as the authors state, revitalize the field. I have only minor comments which I ask the authors to address.
1. There is only one small error that I could spot. The authors state after Eq.(4) that D(a,a) is a probability if and only if Eq.(4) holds. Actually, we only need the real part of the LHS of Eq.(4). This needs to be corrected (and any subsequent consequences worked out). Alternatively, the authors could observe that although only the real part is required, typically both the real and imaginary parts vanish, hence it is no particular loss to work with the stronger condition Eq.(4).
We thank the reviewer for pointing this out. Indeed, we had left out a mention that the weak decoherence condition (where only the real part is required to vanish) suffices for a lack of interference. We have clarified this point in the text.
2. The authors make the general remark early on about how the number of histories increases dramatically with system size and number of times, and hence the computational intractability increases correspondingly. I make one small comment here which is that for linear systems (and perturbation about them) described by continuous variables one can often solve for the decoherence functional in essentially exact terms (even though one has an infinite dimensional Hilbert space). Some brief (optional) discussion around this case may be useful by way of contrast to the finite dimensional systems considered here.
We have added a brief comment in the introduction acknowledging that there are also some cases of simple continuous systems where path integral approaches can facilitate the evaluation of the decoherence functional, along with a relevant citation.
3. A closely related situation to consider is the linear positive condition of Goldstein and Page, in which the histories are assigned a quasi-probability q(a) = Re Tr(C^a rho). When positive, it is a probability satisfying all sum rules exactly and is an alternative but weaker condition to the decoherence condition Eq.(4) (and also involves a smaller number of component conditions). These conditions have been the focus of some interest over the years, so it may be a nice extension of the present work to show how to apply the technique to this case. Also, note that q(a) < 0 implies that the histories must be inconsistent, so this is a potentially cheaper way of identifying inconsistent histories which does not involve all the off-diagonal terms of Eq.(3). This is again an optional change.
We do not feel that it would be helpful to include a discussion of this alternative as this work is about the consistent history framework. Additionally, while we can see a way to use methods similar to what we have used here for evaluating the linear positive condition for a single particular history, we can see no way to do so that is more efficient than evaluating the VCH cost function and thus getting all of the off-diagonal terms. We therefore also do not expect that there would be a computational advantage to the linear positive condition over consistent histories on quantum computers to motivate such an extension.

Sincerely,
The foundations of quantum mechanics (QM) have been debated for the past century [1,2], including topics such as the EPR paradox, hidden-variable theories, Bell's Theorem, Born's rule, and the role of measurements in QM. This also includes the quantum-to-classical transition, i.e., the emergence of classical behavior (objectivity, irreversibility, lack of interference, etc.) from quantum laws [3][4][5].
The Consistent Histories (CH) formalism was introduced by Griffiths, Omnès, Gell-Mann, and Hartle to address some (though not all) of the aforementioned issues [6][7][8]. One inventor considered CH to be "the Copenhagen interpretation done right" [6], as it resolves some of the paradoxes of quantum mechanics by enforcing strict rules for logical reasoning with quantum systems. In this formalism, the Copenhagen interpretation's focus on measurements as the origin of probabilities is replaced by probabilities for sequences of events (histories) to occur; by avoiding measurements, the formalism thereby avoids the measurement problem. The sets of histories whose probabilities are additive (as the histories do not interfere with each other) are considered to be consistent and are thus the only ones that can be reasoned about in terms of classical probability and logic [7].
Here we present a scalable algorithm for the CH formalism that achieves an exponential speedup over classical methods both in terms of the system size and the number of times considered. It will allow exploration beyond toy models, such as the quantum-to-classical transition in mesoscopic quantum systems. We expect this to revitalize interest in the CH approach to quantum mechanics by increasing its practical utility.
Our algorithm is a variational hybrid quantum-classical algorithm (VHQCA). With the impending arrival of the first noisy intermediate-scale quantum computers [18], the field of VHQCAs, which make the most of short quantum circuits combined with classical optimizers, has been taking off. VHQCAs have now been demonstrated for a myriad of tasks ranging from factoring to finding ground states, among others [19][20][21][22][23][24][25][26]. The VHQCA framework potentially brings the practical applications of quantum computers years closer to fruition. Hence, useful implementations of our algorithm may be feasible on near-term quantum devices.
Below we introduce our algorithm, present experimental results from implementing our method on IBM's superconducting qubit quantum processor as well as on a simulator, and then discuss future applications.

CONSISTENT HISTORIES BACKGROUND
In the CH framework [27][28][29], a history $Y_\alpha$ is a sequence of properties (i.e., projectors onto the appropriate subspaces) at a succession of times $t_1 < t_2 < \cdots < t_k$,
$$Y_\alpha = (P_1^{\alpha_1}, P_2^{\alpha_2}, \ldots, P_k^{\alpha_k}), \tag{1}$$
where $P_j^{\alpha_j}$ is chosen from a set $\mathcal{P}_j$ of projectors that sum to the identity at time $t_j$. For example, for a photon passing through a sequence of diffraction gratings and then striking a screen, a history could be the photon passed through one slit in the first grating, another slit in the second, and so on. Clearly, we find interference between such histories unless there is some sense in which the photon's path has been recorded. Since there is interference, we cannot add the probabilities of the different histories classically and expect to correctly predict where the photon strikes the screen.
The CH framework provides tools for determining when a family (i.e., a set that sums to the multi-time identity operator) of histories $\mathcal{F} = \{Y_\alpha\}$ exhibits interference, which is not always obvious. In this framework, one defines the so-called class operator
$$C_\alpha = P_k^{\alpha_k}(t_k) \cdots P_2^{\alpha_2}(t_2)\, P_1^{\alpha_1}(t_1), \tag{2}$$
which is the time-ordered product of the projection operators (now in the Heisenberg picture and hence explicitly time dependent) in history $Y_\alpha$. If the system is initially described by a density matrix $\rho$, the degree of interference or overlap between histories $Y_\alpha$ and $Y_{\alpha'}$ is
$$D(\alpha, \alpha') = \mathrm{Tr}\left(C_\alpha\, \rho\, C_{\alpha'}^\dagger\right). \tag{3}$$
This quantity is called the decoherence functional. The consistency condition for a family of histories $\mathcal{F}$ is then
$$\mathrm{Re}\, D(\alpha, \alpha') = 0 \quad \text{for all } \alpha \neq \alpha'. \tag{4}$$
If and only if this condition holds do we say that $D(\alpha, \alpha)$ is the probability for history $Y_\alpha$. For computational convenience, we will instead work with a stronger condition [28]:
$$D(\alpha, \alpha') = 0 \quad \text{for all } \alpha \neq \alpha'. \tag{5}$$
Since we are presenting a numerical algorithm, it will also be useful to consider approximate consistency, where we merely insist that the interference is small in the following sense:
$$|D(\alpha, \alpha')|^2 \le \epsilon^2\, D(\alpha, \alpha)\, D(\alpha', \alpha') \quad \text{for all } \alpha \neq \alpha', \tag{6}$$
which guarantees that probability sum rules for $\mathcal{F}$ are satisfied within an error of $\epsilon$ [30].
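As a rough illustration of the class operator, the decoherence functional, and the two consistency conditions above, the following Python/NumPy sketch evaluates $D$ by brute force for a single spin in a magnetic field. It assumes $H = -\gamma B \sigma_z$ with $\gamma B \Delta t = 2$ rad between successive times (including from $t_0$ to $t_1$) and the initial state $|+\rangle$, the values quoted in the response letter; the helper names `evolution`, `projector_set`, and `decoherence_functional` are hypothetical. Class operators are built in the Schrödinger picture, which gives the same $D$ as the Heisenberg-picture product because the leftover unitary cancels inside the trace.

```python
import itertools

import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def evolution(phase):
    """exp(-i H dt) for H = -gamma*B*sigma_z, with phase = gamma*B*dt (hbar = 1)."""
    return np.diag([np.exp(1j * phase), np.exp(-1j * phase)])


def projector_set(theta, phi):
    """Projectors onto spin up/down along the Bloch direction (theta, phi); they sum to I."""
    n_sigma = (np.sin(theta) * np.cos(phi) * SX
               + np.sin(theta) * np.sin(phi) * SY
               + np.cos(theta) * SZ)
    return [(np.eye(2) + n_sigma) / 2, (np.eye(2) - n_sigma) / 2]


def decoherence_functional(rho, unitaries, projector_sets):
    """Return history labels and the matrix D[a, b] = Tr(C_a rho C_b^dagger).

    unitaries[j] evolves from t_{j-1} to t_j and projector_sets[j] holds the
    Schrodinger-picture projectors at t_j. C_a = P_k U_k ... P_1 U_1 differs from
    the Heisenberg-picture class operator only by a unitary on the left, which
    cancels inside the trace.
    """
    labels = list(itertools.product(*[range(len(p)) for p in projector_sets]))
    chains = []
    for history in labels:
        c = np.eye(rho.shape[0], dtype=complex)
        for u, projectors, a in zip(unitaries, projector_sets, history):
            c = projectors[a] @ u @ c
        chains.append(c)
    d = np.array([[np.trace(ca @ rho @ cb.conj().T) for cb in chains] for ca in chains])
    return labels, d


def off_diagonal(d):
    return d - np.diag(np.diag(d))


plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())
u = evolution(2.0)  # gamma*B*dt = 2 rad

# Energy (z) basis at both times versus the x basis at both times.
_, d_z = decoherence_functional(rho, [u, u], [projector_set(0.0, 0.0)] * 2)
_, d_x = decoherence_functional(rho, [u, u], [projector_set(np.pi / 2, 0.0)] * 2)

print("z basis, max |D(a,b)|, a != b:", np.max(np.abs(off_diagonal(d_z))))
print("x basis, max |Re D(a,b)|, a != b:", np.max(np.abs(off_diagonal(d_x).real)))
```

In this sketch the z (energy) basis satisfies even the strong condition, because its projectors commute with the evolution, whereas the x basis violates the weak condition. The brute-force loop builds all $2^{nk}$ class operators, which is exactly the cost that becomes prohibitive for larger systems.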
To study consistency arising purely from decoherence (i.e., records in the environment), researchers have proposed a functional that instead takes a partial trace over $E$, which is (a subsystem of) the environment [31,32]:
$$D_E(\alpha, \alpha') = \mathrm{Tr}_E\left(C_\alpha\, \rho\, C_{\alpha'}^\dagger\right). \tag{7}$$
With this modification, the consistency condition is
$$D_E(\alpha, \alpha') = \mathbf{0} \quad \text{for all } \alpha \neq \alpha', \tag{8}$$
where $\mathbf{0}$ is the zero matrix. Instead of only signifying the lack of interference, partial-trace consistency singles out whether or not the records of the histories in the environment interfere. Note that the full-trace condition of Eq. (5) is satisfied when this partial-trace consistency is satisfied, but the converse does not hold [31].

With this formalism in hand, we can now see why classical numerical schemes for CH have faced difficulty. For example, consider histories of a collection of $n$ spin-1/2 particles for $k$ time steps, depicted in Fig. 1. The number of histories is $2^{nk}$, and hence there are $\sim 2^{2nk}$ decoherence functional elements. Furthermore, evaluating each decoherence functional element $D(\alpha, \alpha')$ requires the equivalent of a Hamiltonian simulation of the system, i.e., the multiplication of $2^n \times 2^n$ matrices. This means modern clusters would take centuries to evaluate the consistency of a family of histories with $k = 2$ time steps and $n = 10$ spins. Given this limitation, we can see why, for the most part, only toy models have been analyzed in this framework thus far.
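A minimal sketch of the difference between full-trace and partial-trace consistency follows. It assumes a one-qubit system S prepared in $|+\rangle$, a one-qubit environment E in $|0\rangle$, and a single history time with z-basis projectors; the CNOT "record-making" interaction and all helper names are hypothetical choices for illustration. Single-time families are always full-trace consistent, but partial-trace consistency additionally requires that the environment hold orthogonal records of the alternatives.

```python
import numpy as np

KET0 = np.array([1, 0], dtype=complex)
PLUS = np.array([1, 1], dtype=complex) / np.sqrt(2)
P_Z = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]
# Hypothetical system-environment interaction: S (first factor) controls a NOT on E,
# copying the z value of the system into an environmental record.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def partial_trace_env(op):
    """Tr_E of an operator on S (x) E (one qubit each), leaving a 2x2 operator on S."""
    return np.einsum("iaja->ij", op.reshape(2, 2, 2, 2))


def functionals(interaction):
    """Full-trace D and partial-trace D_E for single-time z-basis histories."""
    psi0 = np.kron(PLUS, KET0)
    rho = np.outer(psi0, psi0.conj())
    chains = [np.kron(p, np.eye(2)) @ interaction for p in P_Z]
    d_full = np.array([[np.trace(ca @ rho @ cb.conj().T) for cb in chains]
                       for ca in chains])
    d_env = [[partial_trace_env(ca @ rho @ cb.conj().T) for cb in chains]
             for ca in chains]
    return d_full, d_env


d_full_free, d_env_free = functionals(np.eye(4, dtype=complex))  # no record
d_full_rec, d_env_rec = functionals(CNOT)                        # record in E

print("no record:", abs(d_full_free[0, 1]), np.linalg.norm(d_env_free[0][1]))
print("record:   ", abs(d_full_rec[0, 1]), np.linalg.norm(d_env_rec[0][1]))
```

Without the interaction the off-diagonal $D_E$ is a nonzero operator on S even though the full-trace $D$ vanishes; with the record, both vanish.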

HYBRID ALGORITHM FOR FINDING CONSISTENT HISTORIES
Fig. 2: VCH takes as its inputs a physical model (i.e., an initial state $\rho$ and a Hamiltonian $H$) and some ansatz for the types of projectors to consider. It outputs a family $\mathcal{F}$ of histories that is (approximately) full and/or partial trace consistent, in the form of projection operators prepared on a quantum computer, and a measure of how consistent $\mathcal{F}$ is (d). This is accomplished via a parameter optimization loop (b), which is a hybrid quantum-classical computation. Here the classical computer adjusts the projector parameters (contained in the gates $\{B_j(\theta)\}$, where $B_j(\theta)$ diagonalizes the $\mathcal{P}_j$ projectors) and a quantum computer returns the cost. Note that $\mathcal{P}_j$ denotes the set of Schrödinger-picture projectors at the $j$th time. The optimal parameters are then used to compute the probabilities of the most likely histories in $\mathcal{F}$ (panel c) and to prepare the projectors for any history in $\mathcal{F}$ (panel e, where $X$ is the Pauli-X operator). While the quantum circuits are depicted for a one-qubit system, the SM discusses the generalizations to multi-qubit systems, non-trivial environment $E$, coarse-grained histories, and branch-dependent histories.

We refer to our VHQCA as Variational Consistent Histories (VCH); see Fig. 2. In VCH, a quantum computer evaluates a cost function that quantifies the family's inconsistency, while a classical optimizer adjusts the family (i.e., varies the projector parameters) to reduce the cost. Classical optimizers for VHQCAs are actively being investigated [26,33], and one is free to choose the classical optimizer on an empirical basis.
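The hybrid loop can be mimicked classically for the spin example. This sketch assumes a hypothetical one-parameter ansatz, projectors onto the columns of $R_y(\theta)$ with the same basis at both times, together with $H = -\gamma B \sigma_z$, $\gamma B \Delta t = 2$ rad, and initial state $|+\rangle$. The function `cost` stands in for the quantum computer by returning the full-trace cost of Eq. (9) computed exactly, and a plain grid search stands in for the classical optimizer; neither choice is taken from the paper.

```python
import itertools

import numpy as np

PHASE = 2.0  # gamma*B*dt in radians, as quoted for the spin example
U = np.diag([np.exp(1j * PHASE), np.exp(-1j * PHASE)])  # exp(-i H dt), H = -gamma*B*sigma_z
PSI0 = np.array([1, 1], dtype=complex) / np.sqrt(2)     # initial state |+>


def basis_projectors(theta):
    """Hypothetical one-parameter ansatz: projectors onto the columns of R_y(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    b = np.array([[c, -s], [s, c]], dtype=complex)
    return [np.outer(b[:, i], b[:, i].conj()) for i in range(2)]


def cost(theta, k=2):
    """Full-trace cost sum_{a != b} |D(a, b)|^2 for the same basis at all k times.

    For a pure initial state, D(a, b) = <psi_b|psi_a> with psi_a = C_a |psi0>.
    """
    projectors = basis_projectors(theta)
    branches = []
    for history in itertools.product(range(2), repeat=k):
        psi = PSI0
        for a in history:
            psi = projectors[a] @ (U @ psi)
        branches.append(psi)
    branches = np.array(branches)
    d = branches @ branches.conj().T  # d[a, b] = <psi_b|psi_a> = D(a, b)
    off = d - np.diag(np.diag(d))
    return float(np.sum(np.abs(off) ** 2))


# Classical outer loop: a grid search stands in for the optimizer, and cost(theta)
# stands in for the value a quantum computer would return.
thetas = np.linspace(0.0, np.pi, 181)
costs = np.array([cost(t) for t in thetas])
best = thetas[np.argmin(costs)]
print(f"best theta = {best:.3f} rad, cost = {costs.min():.2e}")
```

A gradient-free grid search is only practical for one or two parameters; for realistic ansatzes one would substitute a proper optimizer, and the landscape may contain several degenerate minima.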
To compute the cost, note that the elements of the decoherence functional form a positive semi-definite matrix with trace one. In VCH, we exploit this property to encode $D$ in a quantum state $\sigma_A$, whose matrix elements are $\langle \alpha | \sigma_A | \alpha' \rangle = D(\alpha, \alpha')$.
Step b of Fig. 2 shows a quantum circuit that prepares $\sigma_A$. This circuit transforms an initial state $\rho \otimes |0\rangle\langle 0|$ on systems $SA$, where $S$ simulates the physical system of interest and $A$ is an ancilla system, into a state $\sigma_{SA}$ whose marginal is $\sigma_A$. For full-trace consistency, we introduce a global measure of the (in)consistency that quantifies how far $\sigma_A$ is from being diagonal, which serves as our cost function:
$$C = D_{HS}\big(\sigma_A, \mathcal{Z}_A(\sigma_A)\big) = \sum_{\alpha \neq \alpha'} |D(\alpha, \alpha')|^2, \tag{9}$$
where $D_{HS}$ is the Hilbert-Schmidt distance and $\mathcal{Z}_A(\sigma_A)$ is the dephased (all off-diagonal elements set to zero) version of $\sigma_A$. This quantity goes to zero if and only if $\mathcal{F}$ is consistent. For the partial trace case, we arrive at a similar cost function but with $\sigma_A$ replaced by $\sigma_{SA}$:
$$C_E = D_{HS}\big(\sigma_{SA}, \mathcal{Z}_A(\sigma_{SA})\big) = \sum_{\alpha \neq \alpha'} \|D_E(\alpha, \alpha')\|_{HS}^2. \tag{10}$$
Here the notation $\mathcal{Z}_A(\sigma_{SA})$ indicates that the dephasing operation only acts on system $A$, and the absolute squares of Eq. (9) have been generalized to Hilbert-Schmidt norms, $\|M\|_{HS}^2 := \mathrm{Tr}(M^\dagger M)$. In the Methods section, we present quantum circuits that compute these cost functions from two copies of $\sigma_A$ or $\sigma_{SA}$. Derivations of the second equalities in Eq. (9) and Eq. (10) can be found in the Supplementary Material (SM). We remark that alternative cost functions may be useful, for example, to penalize families $\mathcal{F}$ with high entropy (see Methods) or to obtain a larger cost gradient by employing local instead of global observables (see Ref. [26]).
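To make the encoding concrete, the NumPy sketch below uses random density matrices as stand-ins for $\sigma_A$ and $\sigma_{SA}$ and checks numerically that the Hilbert-Schmidt form of each cost equals the corresponding sum over off-diagonal elements, and that the full-trace cost also equals $\mathrm{Tr}(\sigma_A^2) - \mathrm{Tr}(\mathcal{Z}_A(\sigma_A)^2)$. The tensor ordering $A \otimes S$ and all helper names are assumptions made for this sketch, not the paper's conventions.

```python
import numpy as np


def random_density_matrix(dim, seed):
    """Random full-rank density matrix standing in for a state prepared by VCH."""
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def dephase(sigma):
    """Z_A on sigma_A: zero every off-diagonal element in the history basis."""
    return np.diag(np.diag(sigma))


def full_trace_cost(sigma_a):
    """Hilbert-Schmidt form of the cost, Tr[(sigma_A - Z_A(sigma_A))^2]."""
    diff = sigma_a - dephase(sigma_a)
    return np.trace(diff @ diff).real


def partial_trace_cost(sigma_sa, dim_a, dim_s):
    """Same construction with Z_A acting only on A; factors ordered A (x) S here."""
    blocks = sigma_sa.reshape(dim_a, dim_s, dim_a, dim_s)
    z_blocks = np.zeros_like(blocks)
    for a in range(dim_a):
        z_blocks[a, :, a, :] = blocks[a, :, a, :]
    diff = (blocks - z_blocks).reshape(dim_a * dim_s, dim_a * dim_s)
    return np.trace(diff.conj().T @ diff).real


# Four two-time histories of one qubit: sigma_A is 4 x 4, with D(a, b) = sigma_A[a, b].
sigma_a = random_density_matrix(4, seed=0)
c_hs = full_trace_cost(sigma_a)
c_offdiag = np.sum(np.abs(sigma_a - dephase(sigma_a)) ** 2)
c_purity = (np.trace(sigma_a @ sigma_a) - np.trace(dephase(sigma_a) @ dephase(sigma_a))).real

# Partial-trace version: block (a, b) of sigma_SA plays the role of D_E(a, b).
sigma_sa = random_density_matrix(8, seed=1)
blocks = sigma_sa.reshape(4, 2, 4, 2)
c_e = partial_trace_cost(sigma_sa, 4, 2)
c_e_blocks = sum(np.linalg.norm(blocks[a, :, b, :]) ** 2
                 for a in range(4) for b in range(4) if a != b)
print(c_hs, c_offdiag, c_purity, c_e, c_e_blocks)
```

The purity-difference form is the one that two-copy circuits can estimate, since each of its terms is an overlap between copies.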
The parameter optimization loop results in an approximately consistent family $\mathcal{F}$ of histories, where the consistency parameter $\epsilon$ is upper bounded in terms of the final cost (see Methods). In Step c in Fig. 2, we then generate the probabilities for the most likely histories by repeatedly preparing $\sigma_A$ and measuring in the standard basis, where the measurement frequencies give the probabilities.
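The probability readout of Step c amounts to sampling the diagonal of $\sigma_A$. The sketch below simulates it with NumPy's multinomial sampler for a hypothetical four-history $\sigma_A$ (the numbers are illustrative, not data from the paper). Each estimated probability carries a binomial standard error of roughly $\sqrt{p(1-p)/N}$ for $N$ shots, which is the finite-statistics noise that limits readout precision.

```python
import numpy as np


def read_out_probabilities(sigma_a, shots, seed=0):
    """Simulate Step c: measure sigma_A in the standard basis `shots` times and
    return the empirical frequency of each history label."""
    rng = np.random.default_rng(seed)
    probs = np.clip(np.diag(sigma_a).real, 0.0, None)
    counts = rng.multinomial(shots, probs / probs.sum())
    return counts / shots


# Hypothetical sigma_A for four two-time histories (illustrative numbers only).
sigma_a = np.array([[0.70, 0.01, 0.00, 0.00],
                    [0.01, 0.20, 0.00, 0.00],
                    [0.00, 0.00, 0.08, 0.00],
                    [0.00, 0.00, 0.00, 0.02]], dtype=complex)
labels = ["00", "01", "10", "11"]
shots = 8192
freqs = read_out_probabilities(sigma_a, shots)
for idx in np.argsort(freqs)[::-1]:
    p = freqs[idx]
    stderr = np.sqrt(p * (1 - p) / shots)  # binomial standard error of the frequency
    print(f"history {labels[idx]}: {p:.3f} +/- {stderr:.3f}")
```

Histories whose probability is well below $1/N$ are typically never observed, which is why the readout is only efficient when a modest number of histories carries most of the weight.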
Step e shows how one prepares the set of projection operators for any given history in F. These projectors can then be characterized with an efficient number of observables (i.e., avoiding full state tomography) to learn important information about the histories.
Let us discuss the scaling of VCH. With the potential exceptions of the Hamiltonian evolution and the projection operators, the complexity of our quantum circuits (i.e., the gate count, circuit depth, and total number of required qubits) scales linearly with both the system size n and the number of times k considered. The complexity of Hamiltonian evolution to some accuracy is problem dependent, but we typically expect polynomial scaling in n for physical systems with properties like translational symmetry [34]. On the other hand, we consider the circuit depth for preparing the history projectors to be a refinement parameter. One can begin with a short-depth ansatz for the projectors and incrementally increase the depth to refine the ansatz, potentially improving the approximate consistency. We therefore expect the overall scaling of our quantum circuits to be polynomial in n and k for the anticipated use cases of VCH.
The complexity of minimizing our non-convex cost function is unknown, which is typical for VHQCAs. As classical methods for finding consistent families also involve optimizing over some parameterization for the projectors, classical methods also need to deal with this optimization complexity issue.
While the number of required repetitions of the probability readout step can scale inefficiently in n and k for certain families of histories, we assume that minimizing the cost outputs a family F for which the probability readout step is efficient. (See Methods for elaboration on this point.) This scaling behavior means that for systems that can be tractably simulated on a quantum computer and whose properties of interest are simple to implement, we achieve an exponential speedup and reduction in the needed resources as compared to classical approaches to this problem.
Simulator: Chiral Molecule
To highlight applications that will be possible on future hardware, we simulate VCH to observe the quantum-to-classical transition for a chiral molecule [36,37]. It has been modeled as a two-level system in which the right $|R\rangle$ and left $|L\rangle$ chirality states are described as $|R\rangle/|L\rangle = |+\rangle/|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$ [37]. A chiral molecule in isolation would tunnel between $|R\rangle$ and $|L\rangle$, but we consider the molecule to be in a gas, where collisions with other molecules convey information about the molecule's chirality to its environment. This information transfer is modeled by a rotation by angle $\theta_x$ about the x axis of an environment qubit, controlled on the system's chirality, and for simplicity we suppose such collisions are evenly spaced at five points in time. (See the SM for further details.) We then consider simple families of stationary histories [37], where the projector set corresponds to the same basis at all five times (just after a collision occurs).
Letting $\theta_z$ be the precession angle due to tunneling in the time between collisions, we can then explore the competition between decoherence and tunneling. Figure 4 shows our results for this model. Notably, we observe the transition from a quantum regime, where the chirality is not consistent, to a classical regime, where the chirality is both consistent and stable over time.
FIG. 4. [...] and thus we find that the energy eigenbasis (z axis) is the only consistent stationary family, as all others will branch as they evolve. In contrast, panels c and d are the full and partial trace cost functions, respectively, for the case where the environment interactions dominate ($\theta_z = 0.01$ rad, $\theta_x = 5$ rad). One can see in c and d a significant difference between the full and partial trace costs for the y axis, meaning that this family of histories is consistent but not classical. In this regime, we also see that the chirality basis (the x axis) is a local minimum for both cost functions and thus is approximately consistent and classical. For this chirality basis family, there is a $\sim 0.01\%$ chance that the molecule will change chirality during the evolution, showing that the quantum-to-classical transition leaves this system in a stabilized chiral state.
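The $\sim 0.01\%$ chirality-change figure can be cross-checked with a back-of-the-envelope estimate. The estimate rests on assumptions that the text does not spell out, listed here:

- tunneling between collisions is $R_z(\theta_z) = e^{-i\theta_z Z/2}$;
- the environment starts uncorrelated with the molecule;
- there are four tunneling intervals between the five projection times.

Because the collision coupling is controlled on chirality, it commutes with the chirality projectors and drops out of the diagonal elements $D(\alpha,\alpha)$. Each interval therefore flips chirality with probability $\sin^2(\theta_z/2)$. The function names below are hypothetical.

```python
import cmath
import math


def rz(theta: float):
    """R_z(theta) = exp(-i theta Z / 2) as a nested list (assumed convention)."""
    return [[cmath.exp(-0.5j * theta), 0.0], [0.0, cmath.exp(0.5j * theta)]]


def flip_probability_per_interval(theta_z: float) -> float:
    """|<L| R_z(theta_z) |R>|^2 with |R> = |+> and |L> = |->."""
    s = 1 / math.sqrt(2)
    right, left = [s, s], [s, -s]
    u = rz(theta_z)
    evolved = [u[0][0] * right[0] + u[0][1] * right[1],
               u[1][0] * right[0] + u[1][1] * right[1]]
    amplitude = left[0] * evolved[0] + left[1] * evolved[1]  # <L| has real entries
    return abs(amplitude) ** 2


def chirality_change_probability(theta_z: float, num_times: int = 5) -> float:
    """Chance that a stationary chirality-basis history changes value at least once.
    Assumes only tunneling flips chirality (the chirality-controlled coupling
    commutes with the projectors) and that each of the num_times - 1 intervals
    flips it with the same probability, whatever the current chirality."""
    p_flip = flip_probability_per_interval(theta_z)
    return 1.0 - (1.0 - p_flip) ** (num_times - 1)


if __name__ == "__main__":
    p = chirality_change_probability(theta_z=0.01)
    print(f"P(chirality changes) = {p:.3e}  ({100 * p:.4f} %)")
```

With $\theta_z = 0.01$ rad, this gives $1-\cos^8(0.005)\approx 1.0\times10^{-4}$, or about 0.01%. That is the same order as the value quoted for Fig. 4. Under the other common convention $e^{-i\theta_z Z}$, the estimate would be about four times larger.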

A. Evaluation of the Cost
The $\mathrm{Tr}((\sigma^A)^2)$ and $\mathrm{Tr}((\sigma^{SA})^2)$ terms are computed via the Swap Test, with a depth-two circuit and classical post-processing that scales linearly in the number of qubits [49,50]. A similar but even simpler circuit, called the Diagonalized Inner Product (DIP) Test [26], calculates the $\mathrm{Tr}(Z^A(\sigma^A)^2)$ term with a depth-one circuit and no post-processing. Finally, the $\mathrm{Tr}(Z^A(\sigma^{SA})^2)$ term is evaluated with the Partial-DIP (PDIP) Test [26], a depth-two circuit that is a hybridization of the Swap Test and the DIP Test.
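The post-processing can be sketched under two assumptions:

- The depth-two Swap Test of [49,50] is the destructive, Bell-basis variant. It applies a CNOT and a Hadamard to each qubit pair of the two copies, then measures in the computational basis.
- The DIP Test measures only the second copy after a single layer of CNOTs.

With two copies of $\sigma^A$, the first estimator targets $\mathrm{Tr}((\sigma^A)^2)$. The second targets $\sum_\alpha D(\alpha,\alpha)^2$, which is how we read $\mathrm{Tr}(Z^A(\sigma^A)^2)$ with $Z^A$ taken as the dephasing channel. The shot format and function names are hypothetical.

```python
from typing import Sequence, Tuple

# One shot of the Bell-basis Swap Test: bits measured on copy 1 and on copy 2.
Shot = Tuple[Sequence[int], Sequence[int]]


def swap_test_estimate(shots: Sequence[Shot]) -> float:
    """Estimate Tr(rho sigma) from destructive (Bell-basis) Swap Test shots.

    Circuit per qubit pair: CNOT(copy1 -> copy2), then H on copy 1, then measure.
    Outcome (1, 1) on a pair marks the singlet, the only SWAP eigenvalue -1,
    so each shot contributes (-1)**(sum_i a_i * b_i); cost is linear in qubits.
    """
    total = 0
    for a_bits, b_bits in shots:
        parity = sum(a & b for a, b in zip(a_bits, b_bits)) % 2
        total += 1 - 2 * parity
    return total / len(shots)


def dip_test_estimate(second_copy_shots: Sequence[Sequence[int]]) -> float:
    """Estimate sum_i rho_ii sigma_ii from DIP Test shots.

    Circuit: one layer of CNOTs (copy1 -> copy2), then measure copy 2; the
    estimate is simply the fraction of all-zero outcomes.
    """
    hits = sum(1 for bits in second_copy_shots if not any(bits))
    return hits / len(second_copy_shots)


if __name__ == "__main__":
    # Idealized, equally frequent outcomes for two copies of the pure state |+>.
    purity = swap_test_estimate([((0,), (0,)), ((0,), (1,))])
    diag_sq = dip_test_estimate([(0,), (1,)])
    print(purity, diag_sq, purity - diag_sq)  # 1.0 0.5 0.5
```

For $|+\rangle$, the difference of 0.5 equals the sum of the squared off-diagonal density-matrix elements, $2 \times (1/2)^2$. This is the quantity the full-trace cost collects.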

C. Approximate Consistency
Here we discuss how VCH outputs an upper bound on the consistency parameter $\epsilon$. Let us first relate the cost $C$ to $\epsilon$. For any pair of histories $Y_\alpha$ and $Y_{\alpha'}$ in $F$, the cost bounds $|D(\alpha,\alpha')|$ as in Eq. (14). This follows from Eq. (9) and the fact that $|D(\alpha,\alpha')| = |D(\alpha',\alpha)|$. Let us define $\epsilon_{\alpha,\alpha'}$ as in Eq. (15). Then it follows from Eq. (14) that this pair satisfies the approximate consistency condition of Eq. (6) with parameter $\epsilon_{\alpha,\alpha'}$. Hence, the probability sum rules for these two histories are satisfied within error $\epsilon_{\alpha,\alpha'}$. This error can be calculated from Eq. (15) for histories in $F_c$, since the probabilities are known for these histories. Next, consider histories in $\bar{F}_c$. As we do not have enough information to differentiate these histories, we advocate combining the elements of $\bar{F}_c$ into a single coarse-grained history $Y_\gamma$.
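The step from Eq. (9) can be illustrated numerically. We assume, as the surrounding text indicates, that Eq. (9) defines the full-trace cost as $C = \sum_{\alpha\neq\alpha'}|D(\alpha,\alpha')|^2$. Each off-diagonal pair then appears twice in the sum, so $2|D(\alpha,\alpha')|^2 \le C$ for every pair. The sketch checks this for a hypothetical decoherence functional built as a Gram matrix of random vectors, which makes it Hermitian and positive semidefinite. It illustrates only this step; it makes no claim about the exact form of Eqs. (14) and (15).

```python
import math
import random


def random_decoherence_functional(num_histories: int, dim: int, seed: int = 0):
    """Hypothetical D(a, a') = <v_a'|v_a> for random vectors v_a standing in for
    C_a|psi>; a Gram matrix, hence Hermitian and PSD. Normalized to unit trace."""
    rng = random.Random(seed)
    vecs = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
            for _ in range(num_histories)]
    d = [[sum(x * y.conjugate() for x, y in zip(vecs[a], vecs[b]))
          for b in range(num_histories)] for a in range(num_histories)]
    trace = sum(d[a][a].real for a in range(num_histories))
    return [[entry / trace for entry in row] for row in d]


def full_trace_cost(d) -> float:
    """C = sum over a != a' of |D(a, a')|^2 (our reading of Eq. (9))."""
    m = len(d)
    return sum(abs(d[a][b]) ** 2 for a in range(m) for b in range(m) if a != b)


def max_off_diagonal(d) -> float:
    """Largest |D(a, a')| over pairs of distinct histories."""
    m = len(d)
    return max(abs(d[a][b]) for a in range(m) for b in range(m) if a != b)


if __name__ == "__main__":
    d = random_decoherence_functional(num_histories=4, dim=3)
    cost = full_trace_cost(d)
    print(f"C = {cost:.4f}; max |D(a,a')| = {max_off_diagonal(d):.4f}; "
          f"sqrt(C/2) = {math.sqrt(cost / 2):.4f}")
```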
Let $Y_\beta$ be the least likely history in $F_c$. Then, defining $\delta^2 = D(\gamma,\gamma)/D(\beta,\beta)$, we can use the positive semidefiniteness of $\sigma^A$ to write [...]. Since $Y_\beta$ is the least likely history in $F_c$, this expression lets us bound the error on the probability sum rule between $Y_\gamma$ and any $Y_\alpha \in F_c$ (giving a weaker approximate consistency condition [30]) [...]. It is then possible to characterize the approximate consistency of the histories of $F$ pairwise with $\epsilon_{\alpha,\alpha'}$ and $\delta$. Alternatively, to give an upper bound on the overall consistency $\epsilon$, we take the greatest of these pairwise bounds [...].
For applications where we work with the partial trace consistency, the notion of approximate consistency is less transparent. To generate probabilities and bound $\epsilon$, we therefore recommend evaluating the full trace cost function at the minimum found with the partial trace cost. This is justified because any partial trace consistent family is also full trace consistent, whereas the partial trace consistency does not directly allow one to discuss probabilities in the same way. This approach lets us apply the approximate consistency framework above directly.
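The coarse-graining step can be sketched numerically. We assume the coarse-grained history's class operator is the sum $C_\gamma = \sum_{\beta'} C_{\beta'}$ over the histories being merged, the standard coarse-graining rule. Its decoherence-functional elements are then sums of fine-grained ones: $D(\gamma,\alpha) = \sum_{\beta'} D(\beta',\alpha)$ and $D(\gamma,\gamma) = \sum_{\beta',\beta''} D(\beta',\beta'')$.

The sketch splits a hypothetical decoherence functional into the set $F_c$ of histories with known probabilities and the remaining histories (written $\bar F_c$ here). It computes $\delta$ and then checks one consequence of positive semidefiniteness. By Cauchy–Schwarz, $|D(\gamma,\alpha)|^2 \le D(\gamma,\gamma)D(\alpha,\alpha) = \delta^2 D(\beta,\beta)D(\alpha,\alpha) \le \delta^2 D(\alpha,\alpha)^2$. This is our derivation, not necessarily the exact bound used in the text.

```python
import random


def random_decoherence_functional(num_histories: int, dim: int, seed: int = 1):
    """Hypothetical PSD decoherence functional: D(a, a') = <v_a'|v_a>, unit trace."""
    rng = random.Random(seed)
    vecs = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
            for _ in range(num_histories)]
    d = [[sum(x * y.conjugate() for x, y in zip(vecs[a], vecs[b]))
          for b in range(num_histories)] for a in range(num_histories)]
    trace = sum(d[a][a].real for a in range(num_histories))
    return [[entry / trace for entry in row] for row in d]


def coarse_grain(d, known, rest):
    """Merge the histories in `rest` (playing the role of F_c-bar) into one history
    gamma. With C_gamma the sum of their class operators, D(gamma, gamma) and
    D(gamma, a) are sums of fine-grained elements."""
    d_gg = sum(d[b1][b2] for b1 in rest for b2 in rest).real
    d_ga = {a: sum(d[b][a] for b in rest) for a in known}
    return d_gg, d_ga


def delta_and_check(d, known, rest):
    """Return delta = sqrt(D(gamma,gamma)/D(beta,beta)), beta the least likely
    history in `known`, and whether |D(gamma,a)| <= delta * D(a,a) for all a there."""
    d_gg, d_ga = coarse_grain(d, known, rest)
    beta = min(known, key=lambda a: d[a][a].real)
    delta = (d_gg / d[beta][beta].real) ** 0.5
    # Cauchy-Schwarz for a PSD matrix: |D(g,a)|^2 <= D(g,g) D(a,a)
    #   = delta^2 D(b,b) D(a,a) <= delta^2 D(a,a)^2, since D(b,b) <= D(a,a).
    ok = all(abs(d_ga[a]) <= delta * d[a][a].real + 1e-12 for a in known)
    return delta, ok


if __name__ == "__main__":
    d = random_decoherence_functional(num_histories=6, dim=4)
    delta, ok = delta_and_check(d, known=[0, 1, 2], rest=[3, 4, 5])
    print(f"delta = {delta:.4f}; bound holds: {ok}")
```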

Supplementary Material for "Variational Consistent Histories: A Hybrid Algorithm for Quantum Foundations"

Appendix A: Generalizations
Here we discuss various generalizations of the circuits shown in the main text, which presented our VCH algorithm for the special case of branch-independent histories of a one-qubit system S with no environment E.

Multi-Qubit Systems
The circuits in the main text showed systems S composed of a single qubit. The generalization to multi-qubit systems is straightforward. We must discuss the generalizations of both the state preparation circuit in Fig. 2 and the cost evaluation circuits in Fig. 5. For the state preparation circuit, Fig. S.1 shows how the portion that entangles the system with the ancillas generalizes for the case of a fine-grained set of projectors. (The case of a coarse-grained set of projectors is discussed in the next subsection.) The cost evaluation circuits in Fig. 5 generalize as follows. For fine-grained histories, one needs n ancillas for each time step and hence a total of nk ancillas. The circuits in Fig. 5, shown for k ancillas, generalize in a straightforward way to nk ancillas. In addition, the circuits in Fig. 5b also involve the S system, so all n qubits in S must be included in these circuits. Again, these n qubits are included in the most straightforward way (in the same way that the single-qubit S system appears in the circuits in Fig. 5b).
FIG. S.1. The generalization of our state preparation circuit to multi-qubit systems S. In this example, we show the portion of the circuit that entangles the system and the ancillas, for the special case of a fine-grained set of projectors. In this fine-grained case, one employs the same number of ancilla qubits as are in S, i.e., n qubits.

Coarse Grained Histories
Multi-qubit systems S allow for non-trivial coarse-grained histories. In such families of histories, the sets $P_j$ are composed of projectors whose ranks may be greater than one. We remark that coarse-grained histories are often important to the study of macroscopic systems and the quantum-to-classical transition. VCH can easily be adapted to study coarse-grained histories as follows.
For each time $t_j$, one should decide (prior to running VCH) which projector ranks are of interest. VCH will then optimize over sets of projectors with these particular ranks. The projector ranks can therefore be viewed as hyperparameters, i.e., parameters that one fixes for a given run of VCH.
For instance, suppose S is composed of a pair of spins. In this case, Fig. S.2 shows two examples of the state preparation circuit for a single time step. In the first example, Fig. S.2a, we consider a projector set that contains two rank-two projectors revealing whether the spins are aligned or anti-aligned. In the second example, Fig. S.2b, we consider a projector set that contains a rank-three and a rank-one projector, which respectively indicate whether the spins are in the triplet states or the singlet state. Note that the ranks of the projectors are determined by the gate that entangles the system with the ancilla, which is a single CNOT gate in [...]
FIG. S.2. Examples of implementing coarse-grained projector sets in our state preparation circuit, when S corresponds to two spin-1/2 particles. The projectors in panel a record whether the two spins are aligned or anti-aligned, while the projectors in panel b differentiate between the spin singlet and spin triplet states.
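The claim that the entangling gate fixes the projector ranks can be checked by counting. Suppose the entangling gate is a classical reversible map $|s\rangle|0\rangle \to |s\rangle|f(s)\rangle$ acting in the frame set by $B_j$. Then each ancilla value $v$ selects the projector onto the basis states with $f(s) = v$, and its rank is the number of such states.

The sketch uses two illustrative choices of $f$:

- The parity $s_1 \oplus s_2$ reproduces the aligned/anti-aligned set, with ranks 2 and 2.
- A flag on $|11\rangle$ gives ranks 3 and 1. This matches the triplet/singlet set if $B_j$ is a Bell-basis rotation, for example a CNOT followed by a Hadamard, which maps the singlet to $|11\rangle$.

These record functions are our assumptions and need not match the gates drawn in Fig. S.2.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Bits = Tuple[int, ...]


def projector_supports(record: Callable[[Bits], int], num_spins: int) -> Dict[int, List[Bits]]:
    """Group basis states |s> (in the frame after B_j) by the ancilla value f(s)
    written by a permutation-type entangling gate |s>|0> -> |s>|f(s)>.
    Each group spans one projector, so the group size is that projector's rank."""
    groups: Dict[int, List[Bits]] = {}
    for state in product((0, 1), repeat=num_spins):
        groups.setdefault(record(state), []).append(state)
    return groups


def parity(state: Bits) -> int:
    """Aligned (0) versus anti-aligned (1) record for spins in the z basis."""
    return sum(state) % 2


def both_ones(state: Bits) -> int:
    """Flags |11>; with a Bell-basis rotation as B_j, |11> is the image of the singlet."""
    return int(all(state))


if __name__ == "__main__":
    for name, rule in [("aligned/anti-aligned", parity), ("singlet/triplet", both_ones)]:
        ranks = {value: len(states) for value, states in projector_supports(rule, 2).items()}
        print(name, ranks)
```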

Nontrivial Environments
For many applications of VCH (e.g., the chiral molecule example in the main text), it will be helpful to explicitly model an environment E. We can think of this case as a particular choice of coarse graining where the projectors we consider act only on a subsystem of our model (the S system) and do not directly record any information about E. Note that the Hamiltonian evolution involves both S and E, as shown in Fig. S.3.
FIG. S.3. Simple example with an environment E. The projectors still act only on S, but the evolution includes both S and E.

Branch Dependent Histories
The final generalization we consider is to families of branch-dependent histories [1]: histories in which the projector set at a given time may depend on the properties of the system at earlier points in the history. VCH can accommodate these histories as follows.
The basic idea is that the unitary gate B j that determines the projector set at time t j now becomes a controlled unitary. Specifically, the control system(s) for B j are (potentially) all the ancilla qubits associated with times t i < t j . So the choice of projector set at some time is influenced by the ancilla states for earlier times.
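Branch dependence amounts to classical control of later basis choices by earlier records. The sketch mimics this for one qubit with no evolution between two times. The first projector set is the z basis. The second is the z basis if $\alpha_1 = 0$ and the x basis if $\alpha_1 = 1$, a hypothetical rule standing in for a $B_2$ controlled on the first ancilla. The sketch builds $D(\alpha,\alpha')$ from the class operators $C_\alpha = P_2^{\alpha_2}(\alpha_1)P_1^{\alpha_1}$, the objects that the controlled circuit encodes in $\sigma^A$. All names are illustrative.

```python
import math
from itertools import product

S = 1 / math.sqrt(2)
Z_BASIS = [(1.0, 0.0), (0.0, 1.0)]  # |0>, |1>
X_BASIS = [(S, S), (S, -S)]         # |+>, |->


def project(vec, basis_vec):
    """Apply the rank-one projector |b><b| (real vectors) to vec."""
    overlap = basis_vec[0] * vec[0] + basis_vec[1] * vec[1]
    return (overlap * basis_vec[0], overlap * basis_vec[1])


def second_basis(alpha1: int):
    """Hypothetical branch rule standing in for a B_2 controlled on ancilla 1."""
    return Z_BASIS if alpha1 == 0 else X_BASIS


def branch_dependent_d(psi):
    """D(a, a') = <C_a' psi|C_a psi>, C_a = P_2^{a2}(a1) P_1^{a1}; no evolution.
    All vectors are real here, so no complex conjugation is needed."""
    histories = list(product((0, 1), repeat=2))
    branch = {}
    for a1, a2 in histories:
        v = project(psi, Z_BASIS[a1])
        branch[(a1, a2)] = project(v, second_basis(a1)[a2])
    d = {(a, b): sum(x * y for x, y in zip(branch[a], branch[b]))
         for a in histories for b in histories}
    return histories, d


if __name__ == "__main__":
    psi = (math.cos(0.4), math.sin(0.4))
    histories, d = branch_dependent_d(psi)
    for h in histories:
        print(h, round(d[(h, h)], 4))
    print("D((0,0),(1,0)) =", round(d[((0, 0), (1, 0))], 4))
```

For this initial state, the element $D((0,0),(1,0)) = \tfrac{1}{2}\psi_0\psi_1$ is nonzero. This branch-dependent family is therefore not consistent, which a nonzero cost would register.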

Appendix B: Generalized state preparation
We now present the details of our generalized state preparation circuit (as shown in Fig. S.5) and show that $\sigma^{SA}$ and $\sigma^A$ have the properties we claim in the main text. Note that our treatment here includes all of the generalizations discussed above in Appendix A. We begin with the input state $\rho^{SE} \otimes |0\rangle\langle 0|^A$, where the superscript SE denotes the system and its environment and A denotes the ancillas. We then apply the gate sequence associated with the $P_1$ projector set. This sequence consists of $B_1$, then a multi-qubit gate that entangles S and A (which we refer to as the "entangling gate"), and then $B_1^\dagger$. This gives the state [...]. Note that the system and ancilla are (possibly) entangled at this point. Next in our state preparation circuit is the time evolution from $t_1$ to $t_2$, given by $e^{-iH\Delta t_{1,2}}$. This is followed by the gate sequence associated with $P_2$, which in general may be branch dependent. The resulting state is [...], where the notation $P_2^{\alpha_2}(\alpha_1)$ indicates that the second projector set depends on $\alpha_1$. We repeat this state evolution until we have applied the gate sequences associated with all k projector sets. Switching to the Heisenberg picture, we end up with [...]. Note that we have suppressed explicit branch dependence here to simplify notation. Branch dependence does not alter the formalism except to make the later projectors functions of the earlier $\alpha_i$'s, so our treatment remains fully general.
If we then trace out the environment (which in the circuit means not measuring it), we are left with $\sigma^{SA}$ [...]. By examining Eq. (B4), we can see that $(\mathbb{1}\otimes\langle\alpha|)\,\sigma^{SA}\,(\mathbb{1}\otimes|\alpha'\rangle)$ is precisely $D_{pt}(\alpha,\alpha') = \mathrm{Tr}_E(C_\alpha \rho^{SE} C_{\alpha'}^\dagger)$. Further, if we similarly trace over the system S, we get $\sigma^A$ [...]. We can thus see that we have prepared a density matrix whose elements are $D(\alpha,\alpha') = \mathrm{Tr}(C_\alpha \rho^{SE} C_{\alpha'}^\dagger)$, as claimed in the main text. Let us now derive the equivalence stated in the definition of our full trace cost function, Eq. (9). Starting with the definition of C, we have [...]. Therefore, the circuits we use to calculate $\mathrm{Tr}((\sigma^A)^2)$ and $\mathrm{Tr}(Z^A(\sigma^A)^2)$ implement this cost function as claimed.
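Both claims can be checked numerically for a small case: that the circuit prepares $\sigma^A$ with elements $D(\alpha,\alpha')$, and the identity behind Eq. (9). The pure-Python sketch simulates the circuit for one system qubit with no environment. At each time it applies $B_j$, then a CNOT from S onto ancilla $j$, then $B_j^\dagger$, with a unitary $U$ between times. The CNOT is our assumed entangling gate for fine-grained one-qubit projectors.

The sketch makes two comparisons:

- It compares the reduced ancilla state with $\mathrm{Tr}(C_\alpha\rho C_{\alpha'}^\dagger)$ computed directly from the projectors $P_j^{x} = B_j^\dagger|x\rangle\langle x|B_j$.
- It checks that $\mathrm{Tr}((\sigma^A)^2) - \mathrm{Tr}(Z^A(\sigma^A)^2)$ equals $\sum_{\alpha\neq\alpha'}|D(\alpha,\alpha')|^2$, reading $Z^A$ as the dephasing channel in the ancilla basis.

The gate choices ($R_y$ for $B_j$, $R_x$ for $U$) and all parameter values are arbitrary.

```python
import cmath
import math
from itertools import product


def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state (qubit 0 = most significant bit)."""
    new = list(state)
    step = 1 << (n - 1 - q)
    for i in range(len(state)):
        if not i & step:
            a, b = state[i], state[i | step]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[i | step] = gate[1][0] * a + gate[1][1] * b
    return new


def apply_cnot(state, control, target, n):
    """CNOT: swap amplitudes of the target bit wherever the control bit is 1."""
    c, t = 1 << (n - 1 - control), 1 << (n - 1 - target)
    return [state[i ^ t] if i & c else state[i] for i in range(len(state))]


def ry(t):
    return [[math.cos(t / 2), -math.sin(t / 2)], [math.sin(t / 2), math.cos(t / 2)]]


def rx(t):
    return [[math.cos(t / 2), -1j * math.sin(t / 2)], [-1j * math.sin(t / 2), math.cos(t / 2)]]


def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def sigma_a_from_circuit(psi, bs, u):
    """Simulate: for each time j apply B_j, CNOT(S -> A_j), B_j^dagger, with U between
    times; return the ancilla density matrix sigma^A (system traced out)."""
    k = len(bs)
    n = 1 + k                                  # qubit 0 = S, qubits 1..k = ancillas
    state = [0j] * (1 << n)
    state[0], state[1 << k] = psi[0], psi[1]   # |psi>_S |0...0>_A
    for j, b in enumerate(bs):
        if j > 0:
            state = apply_1q(state, u, 0, n)   # evolution from t_j to t_{j+1}
        state = apply_1q(state, b, 0, n)
        state = apply_cnot(state, 0, 1 + j, n)
        state = apply_1q(state, dagger(b), 0, n)
    dim_a = 1 << k
    return [[sum(state[s * dim_a + a] * state[s * dim_a + c].conjugate() for s in (0, 1))
             for c in range(dim_a)] for a in range(dim_a)]


def d_direct(psi, bs, u):
    """D(a, a') = <C_a' psi|C_a psi>, C_a = P_k U ... U P_1, P_j^x = B_j^dag |x><x| B_j."""
    vecs = []
    for alpha in product((0, 1), repeat=len(bs)):
        v = list(psi)
        for j, (b, x) in enumerate(zip(bs, alpha)):
            if j > 0:
                v = matvec(u, v)
            w = matvec(b, v)
            w[1 - x] = 0                       # keep only the |x> component
            v = matvec(dagger(b), w)
        vecs.append(v)
    return [[sum(p * q.conjugate() for p, q in zip(va, vb)) for vb in vecs] for va in vecs]


if __name__ == "__main__":
    psi = [math.cos(0.3), cmath.exp(0.7j) * math.sin(0.3)]
    bs, u = [ry(0.4), ry(1.1)], rx(0.9)
    sigma, d = sigma_a_from_circuit(psi, bs, u), d_direct(psi, bs, u)
    m = len(d)
    print("max |sigma - D| =", max(abs(sigma[a][c] - d[a][c]) for a in range(m) for c in range(m)))
    purity = sum(abs(sigma[a][c]) ** 2 for a in range(m) for c in range(m))  # Tr((sigma^A)^2)
    diag_sq = sum(sigma[a][a].real ** 2 for a in range(m))                    # Tr(Z^A(sigma^A)^2)
    off = sum(abs(d[a][c]) ** 2 for a in range(m) for c in range(m) if a != c)
    print(f"Tr(sigma^2) - Tr(Z(sigma)^2) = {purity - diag_sq:.6f}, sum |D|^2 off-diag = {off:.6f}")
```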

Partial trace cost
Arriving at the expression for the partial trace cost function (Eq. (10)) is similar, if slightly more complicated [...]. As with the full trace cost function, the circuits we use to calculate $\mathrm{Tr}((\sigma^{SA})^2)$ and $\mathrm{Tr}(Z^A(\sigma^{SA})^2)$ thus implement this cost function as claimed.
Appendix D: Reading out the Decoherence Functional Elements
While VCH avoids the need to compute the exponentially many $D(\alpha,\alpha')$'s in order to determine the consistency of a family F, we can efficiently read out any particular $D(\alpha,\alpha')$ if desired. Figure S.6 shows the circuit that one can use to read out the real and/or imaginary parts of $D(\alpha,\alpha')$ for $\alpha \neq \alpha'$. The post-processing is similar to that of the Swap test [2,3], except that we add a conditional statement.
FIG. S.6. Circuit to read out $D(\alpha,\alpha')$. The controlled $U(\alpha,\alpha')$ prepares the state $|\alpha\rangle$ on the B registers when the control qubit C is in the state $|0\rangle$ and $|\alpha'\rangle$ when the control qubit is in the state $|1\rangle$. The combination of the Hadamard gate on C and the controlled $U(\alpha,\alpha')$ therefore prepares a superposition of the histories. The z-rotation in the green box is excluded when we calculate the real part of $D(\alpha,\alpha')$ and included when we calculate the imaginary part. The post-processing is described in the text.

Again, we combine these to get $\mathrm{Im}(D(\alpha,\alpha')) = \frac{1}{2}(I_1 - I_0)$. We also note that the controlled $U(\alpha,\alpha')$ used here can be implemented with depth that scales linearly in the number of bits by which $|\alpha\rangle$ and $|\alpha'\rangle$ differ. This is accomplished by acting with X gates on all of the registers where the bit-string associated with $|\alpha\rangle$ is 1, followed by CNOT gates from C to each of the registers where the bit-strings for $|\alpha\rangle$ and $|\alpha'\rangle$ differ.
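This construction can be verified by tracking computational basis states, since every gate involved is classical and reversible (X or CNOT). Acting linearly, the gates map $|0\rangle_C|0\cdots0\rangle$ to $|0\rangle_C|\alpha\rangle$ and $|1\rangle_C|0\cdots0\rangle$ to $|1\rangle_C|\alpha'\rangle$. The gate-list representation and function names below are hypothetical.

```python
from typing import List, Sequence


def controlled_u_gates(alpha: Sequence[int], alpha_p: Sequence[int]) -> List[tuple]:
    """Gate list for the controlled U(alpha, alpha'): prepares |alpha> on the B registers
    when the control qubit C is |0> and |alpha'> when C is |1>.
    Gates are ("X", i) or ("CNOT", "C", i), with i the index of a B register."""
    gates = [("X", i) for i, bit in enumerate(alpha) if bit]  # one parallel layer
    # All CNOTs share the control C, so they run sequentially: depth = Hamming distance.
    gates += [("CNOT", "C", i) for i, (x, y) in enumerate(zip(alpha, alpha_p)) if x != y]
    return gates


def run_on_basis_state(gates: List[tuple], control: int, num_registers: int) -> List[int]:
    """Track the B registers (initially all 0) for a classical value of C.
    All gates are permutations, so by linearity C in (|0> + |1>)/sqrt(2)
    yields (|0>|alpha> + |1>|alpha'>)/sqrt(2)."""
    regs = [0] * num_registers
    for gate in gates:
        if gate[0] == "X":
            regs[gate[1]] ^= 1
        elif gate[0] == "CNOT" and control == 1:
            regs[gate[2]] ^= 1
    return regs


if __name__ == "__main__":
    alpha, alpha_p = (1, 0, 1, 1), (0, 0, 1, 0)
    gates = controlled_u_gates(alpha, alpha_p)
    print(gates)
    print(run_on_basis_state(gates, 0, 4), run_on_basis_state(gates, 1, 4))
    print("CNOT count (depth beyond the X layer):", sum(g[0] == "CNOT" for g in gates))
```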
Finally, we comment that reading out $D(\alpha,\alpha)$ is simpler than the general case. We merely have to prepare $|\alpha\rangle\langle\alpha|$ (which requires a single layer of X gates) on the B registers and perform the Swap test, without any need for or reference to C.
Appendix E: Implementation Circuits

Spin in a Magnetic Field
For our simulations of the spin-1/2 particle in a magnetic field, Fig. S.7 shows the quantum circuit that was used on the simulator and IBM's ibmqx5 processor to perform the cost minimization and to generate the cost landscape plots (shown in Fig. 3).

Chiral Molecule
Figure S.8 shows the quantum circuit that was used on a simulator to map the cost function landscapes for the chiral molecule (shown in Fig. 4). The tunneling between the chirality states was modeled as a rotation about the z-axis by an angle $\theta_z$. We considered the chiral molecule to be in a gas, so its environment is composed of other surrounding molecules that may collide with the molecule of interest. Our model for these collision interactions was implemented as a rotation about the x-axis by an angle $\theta_x$, which sets the interaction strength. This rotation acts on an environmental qubit representing the colliding molecule and is controlled by the chirality of the molecule of interest.
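To see how a collision conveys chirality information, consider a single collision with a fresh environment qubit prepared in $|0\rangle$. As an assumption for this sketch, the rotation acts in only one chirality branch. The two branches then leave the environment in $|0\rangle$ and $R_x(\theta_x)|0\rangle$. The coherence between $|R\rangle$ and $|L\rangle$ in the reduced system state is multiplied by the overlap of these two states, whose magnitude is $|\cos(\theta_x/2)|$. The rotation convention and the fresh-qubit bookkeeping are our choices and may differ from Fig. S.8.

```python
import math


def rx(theta: float):
    """R_x(theta) = exp(-i theta X / 2) (assumed rotation convention)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def environment_overlap(theta_x: float) -> complex:
    """<e_R|e_L> after one collision with a fresh environment qubit in |0>:
    one chirality branch leaves it in |0>, the other in R_x(theta_x)|0>."""
    e_unrotated = [1.0, 0.0]
    gate = rx(theta_x)
    e_rotated = [gate[0][0], gate[1][0]]  # R_x(theta_x)|0> is the first column
    return sum(a.conjugate() * b for a, b in zip(e_unrotated, e_rotated))


if __name__ == "__main__":
    for theta in (0.0, math.pi / 2, math.pi, 5.0):
        factor = abs(environment_overlap(theta))
        print(f"theta_x = {theta:.3f} rad: coherence factor |<e_R|e_L>| = {factor:.4f}")
```

A factor of 1 means no decoherence ($\theta_x = 0$), and 0 means a perfect which-chirality record ($\theta_x = \pi$). Repeated collisions compound the suppression. Because tunneling mixes the branches between collisions, however, the per-collision factors cannot in general simply be multiplied.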
FIG. S.7. Quantum circuit that we employed to evaluate the cost functions for the spin in a magnetic field. The wires labeled S represent the copies of the spin and those labeled A represent the ancillas. Note that this circuit prepares two copies of $\sigma^A$. The gates and measurements inside the solid green box are included only to calculate $\mathrm{Tr}((\sigma^A)^2)$; without them, this is the circuit to calculate $\mathrm{Tr}(Z^A(\sigma^A)^2)$.

Appendix F: Highlighted Applications