Probabilistic state synthesis based on optimal convex approximation

When preparing a pure state with a quantum circuit, there is an unavoidable approximation error due to the compilation error in fault-tolerant implementation. A recently proposed approach called probabilistic state synthesis, where the circuit is probabilistically sampled, is able to reduce the approximation error compared to conventional deterministic synthesis. In this paper, we demonstrate that the optimal probabilistic synthesis quadratically reduces the approximation error. Moreover, we show that a deterministic synthesis algorithm can be efficiently converted into a probabilistic one that achieves this quadratic error reduction. We also numerically demonstrate how this conversion reduces the $T$-count and analytically prove that this conversion halves an information-theoretic lower bound on the circuit size. In order to derive these results, we prove general theorems about the optimal convex approximation of a quantum state. Furthermore, we demonstrate that these theorems can be used to analyze an entanglement measure.


I. INTRODUCTION
The latest quantum computer applications require various nontrivial quantum states for computation, secure communication, and the fundamental investigation of quantum mechanics. Examples include the ground state (or its approximation) of a Hamiltonian, which is used to compute the ground energy in quantum chemistry [1]; a graph state (or its variants [2,3]), which has a wide range of applications such as measurement-based quantum computation [4], blind computation [5], and secret-sharing [6]; and data-hiding states, which are utilized for quantum data hiding [7] and the study of local indistinguishability [8,9]. In addition, quantum linear system solvers [10,11], which have various applications in machine learning, require a quantum state encoding classical data.
These applications have motivated researchers to optimize a subroutine that synthesizes a target quantum state. In order to capture the complexity of the state synthesis, there are extensive studies about the size and depth of a circuit consisting of a sequence of $k(\le 2)$-qubit unitary gates needed to generate a target state by applying the circuit to a fixed state $|0\rangle^{\otimes N}$ [12][13][14][15][16][17][18]. While these studies focus on the exact synthesis of a target state, a certain level of error is allowed in many quantum information processing protocols and algorithms. In practice, we have no choice but to approximately synthesize a target state due to imperfections and discretization when implementing unitary gates in a synthesis circuit. The imperfection of gates can be almost removed for specific unitary gates, called elementary gates, according to the nature of the system [19] or the quantum error correction [20]. The set of elementary gates is usually a finite set of unitary gates, e.g., Clifford gates (on a constant number of qubits) + $T$ gates, which causes an approximation error when we synthesize a target state since there are infinitely many quantum states. We focus on the synthesis of a target state by using a finite number of perfectly implementable elementary gates. In this case, the objective of the optimization is to reduce the size or depth of a circuit consisting of elementary gates in order to synthesize a target state with a certain level of approximation error. In other words, the objective is to reduce the approximation error within a fixed circuit size or depth.
Unfortunately, a simple volume consideration implies that the size of a circuit required for the approximate synthesis of a quantum state in an $N$-qubit system grows exponentially with $N$. However, it is important to optimize the state synthesis even on a small number of qubits since such small systems are often used repeatedly in quantum cryptography [6,7] and metrology [21,22] protocols. Such optimization is also beneficial to generate an intermediate quantum state required for synthesizing a state on a large system. Recently, theoretical physicists have taken an interest in the minimum circuit size or depth for the state synthesis on large systems due to its nontrivial physical interpretations [23][24][25], even if it may not be practically implementable.
The final goal of conventional synthesis algorithms is to deterministically find one of the best circuits for the approximation (even if an algorithm [15] succeeds probabilistically). Thus, the minimum approximation error obtained by such deterministic state synthesis is given by $\min_{x\in X}\|\phi-\phi_x\|_{\mathrm{tr}}$, where $\phi$ is a target state, $\|\rho-\sigma\|_{\mathrm{tr}}$ is the trace distance between two states $\rho$ and $\sigma$, and $X$ is the label set of pure states $\phi_x$ generated by circuits $C_x$ within a given cost, e.g., the circuit size, depth, or number of $T$-gates.
While it makes sense to approximate a target pure state by utilizing an approximated state generated by a single circuit, a recently proposed approach called probabilistic state synthesis probabilistically samples a circuit for the approximation. Suppose that the probabilistic algorithm independently samples a circuit $C_x$ (generating $\phi_x$) in accordance with a probability distribution $p(x)$ each time the subroutine synthesizing $\phi$ is called. Then, each generated state is described by a mixed state $\sum_x p(x)\phi_x$. This can be interpreted as the transition from unitary errors to stochastic errors [26][27][28], and recent studies have experimentally demonstrated that this transition reduces the approximation error [29].
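The effect of mixing can be illustrated with a single qubit. The following is a minimal sketch, not the synthesis algorithm of this paper: it assumes two hypothetical approximating states placed symmetrically around the target on the XZ great circle of the Bloch sphere, and it uses the standard fact that the trace distance between qubit states is half the Euclidean distance between their Bloch vectors. The names `bloch_xz`, `trace_distance`, and `mix` are introduced here for illustration only.

```python
import math

def bloch_xz(angle):
    """Bloch vector (x, y, z) of a pure qubit state on the XZ great circle."""
    return (math.sin(angle), 0.0, math.cos(angle))

def trace_distance(r, s):
    """Trace distance of two qubit states: half the Euclidean Bloch distance."""
    return 0.5 * math.dist(r, s)

def mix(vectors, probs):
    """Bloch vector of the probabilistic mixture sum_x p(x) phi_x."""
    return tuple(sum(p * v[i] for p, v in zip(probs, vectors)) for i in range(3))

delta = 0.2                       # assumed angular offset of the approximants
target = bloch_xz(0.0)            # target pure state
left, right = bloch_xz(-delta), bloch_xz(delta)

# Best error achievable with a single (deterministic) approximant.
eps_det = min(trace_distance(target, left), trace_distance(target, right))
# Error of the equal-weight mixture of the two approximants.
eps_prob = trace_distance(target, mix([left, right], [0.5, 0.5]))

print(eps_det, eps_prob, eps_det ** 2)
```

In this symmetric configuration the single-state error is $\sin(\delta/2)$ and the mixture error is $\sin^2(\delta/2)$, so the mixture error equals the square of the deterministic error.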
Despite its importance, the limitation of probabilistic state synthesis, especially the minimum approximation error $\min_p\|\phi-\sum_x p(x)\phi_x\|_{\mathrm{tr}}$, remains unknown, nor is it clear how to find the optimal probability distribution $p$. While a few analytical results are obtained for the case of a qubit state [30][31][32] in the context of the optimal convex approximation of a quantum state, minimax optimization to compute the minimum approximation error makes analyses quite difficult in general.
where the maximization of $\phi$ is taken over the set of pure states. This theorem compares the worst approximation errors occurring when one synthesizes the target state that is most difficult to approximate by using $\{\phi_x\}_x$. It implies that the optimal probabilistic synthesis always quadratically reduces the worst approximation error; moreover, it is impossible to further reduce the approximation error.
In many cases, there is no need to synthesize all possible pure states. Instead, it is more useful to understand the limitations of probabilistic synthesis when a target state is chosen from a subset $S_G$ of pure states. As shown in Fig. 1(b), we can also anticipate the quadratic error reduction in this scenario. This expectation is confirmed in the comprehensive version of Theorem 1, which includes the case of Fig. 1(b).
The technique used to prove Theorem 1 is also applicable to analyzing the minimum trace distance between a general mixed state $\rho$ and a convex hull of $\{\phi_x\}_x$. For example, we can analyze the entanglement measure by setting $\{\phi_x\}_{x\in X}$ to be the set of pure product states. As a byproduct, we obtain
where SEP represents the set of separable states, and $\rho^{\mathrm{WER}}_q$ and $\rho^{\mathrm{ISO}}_q$ represent the Werner and isotropic states with a parameter $q$, respectively. These coincide with a conjecture numerically found in [33]. Moreover, we provide an alternative succinct proof of a recently identified coincidence between the entanglement measure and coherence measure [34].
We also show an efficient way to convert a deterministic state synthesis algorithm into a probabilistic one that achieves quadratic error reduction. We assume there exists a deterministic state synthesis algorithm $D$ with
INPUT: a target pure state $\phi$ in a constant number of qubits and target approximation error $\epsilon$,
OUTPUT: circuit $C_x$ (generating $\phi_x$) such that $\|\phi-\phi_x\|_{\mathrm{tr}}\le\epsilon$ and a matrix representation of $\phi_x$ can be obtained within runtime $\mathrm{polylog}\left(\frac{1}{\epsilon}\right)$.
We can construct $D$ by combining algorithms to generate an exact synthesis circuit where arbitrary unitary transformations on a constant number of qubits are allowed [12][13][14][15][16][17][18] with the Solovay-Kitaev algorithm [35] to decompose the unitary transformations into a sequence of elementary gates. Recent numerical analysis suggests that we could construct a better $D$ that reduces the size of a synthesis circuit by skipping the exact synthesis as an intermediate step [17,36]. The efficient conversion is shown in the following theorem.

Theorem 2. (informal version)
There exists a probabilistic state synthesis algorithm $P$ that calls a deterministic state synthesis algorithm $D$ as an oracle, and has
INPUT: a target pure state $\phi$ in a constant number of qubits and target approximation error $\epsilon$
OUTPUT: circuit $C_x$ (generating $\phi_x$) sampled in accordance with probability distribution $p : X \to [0, 1]$
such that $P$ satisfies the following properties:
• Efficiency: $P$ calls $D$ a constant number of times, and the runtime of $P$ is $\mathrm{polylog}\left(\frac{1}{\epsilon}\right)$,
• Quadratic improvement: The approximation error $\|\phi-\sum_{x\in X}p(x)\phi_x\|_{\mathrm{tr}}$ obtained with this algorithm is upper bounded by $\epsilon^2$, whereas $\min_{x\in X}\|\phi-\phi_x\|_{\mathrm{tr}}\le\epsilon$.
Since probabilistic state synthesis reduces the approximation error, it also reduces the size of a circuit to approximately generate a target state for a given approximation error. However, the reduction rate depends on the circuit's construction, e.g., what kind of elementary gates and synthesis algorithms are used. Since there is an established way to synthesize a single qubit state by using Clifford + $T$ gates, we perform a numerical simulation to demonstrate how the probabilistic synthesis reduces the number of $T$-gates, called a $T$-count, for a randomly selected target state in $S_G$ defined in Fig. 1(b).
As a rigorous estimation, we also analyze a universal lower bound on the size of synthesis circuits obtained by regarding the circuit as a classical encoding of a pure state, where a description of a circuit $C_x$ and the state $\phi_x$ generated by $C_x$ correspond to a label encoding a pure state and the reconstructed state by a decoder, respectively. To analyze how probabilistic synthesis reduces this lower bound, we investigate the minimum length of classical bit strings that encodes a pure state $\phi$ so as to approximately reconstruct the original state as shown in Fig. 2.

FIG. 2. Probabilistic encoding of pure state $\phi$ on a $d$-dimensional system using $n$-bit strings and a decoder $\Gamma$ that generates an approximated state $\rho$. State $\phi$ is probabilistically encoded in label $x$ in a finite set $X$ in accordance with probability distribution $p_\phi : X \to [0, 1]$. As a special case of probabilistic encoding, we also consider deterministic encoding that utilizes probability distribution $p_\phi : X \to \{0, 1\}$. Note that the length of classical bit strings to represent $x \in X$ is given by $n = \lceil\log_2 |X|\rceil$.
We compare two types of encoding: (1) deterministic encoding that associates each $\phi$ to a single label $x$, and (2) probabilistic encoding that associates each $\phi$ to a label $x$ in accordance with a probability distribution $p_\phi(x)$. The decoder $\Gamma$ generates, in general, a mixed state $\rho_x$ based on the input label $x$. Thus, the reconstructed state in the deterministic and probabilistic encoding is given by $\rho=\rho_x$ and $\rho=\sum_x p_\phi(x)\rho_x$, respectively. In the following theorem, we show that probabilistic encoding exactly halves the bit length required for deterministic encoding in the asymptotic limits.
Theorem 3. (simplified version) Let $n_{\mathrm{det}}$ (or $n_{\mathrm{prob}}$) be the minimum bit length required for deterministic (or probabilistic) encoding that reconstructs a state $\rho$ satisfying $\|\phi-\rho\|_{\mathrm{tr}}\le\epsilon$ for any pure state $\phi$ in a $d$-dimensional Hilbert space. Then, it holds that

B. Technical Outline
Although several probabilistic synthesis methods suggest that the approximation error can be reduced from $\epsilon$ to $O(\epsilon^2)$ [26-29, 37, 38], these methods are not applicable for analyzing the achievable minimum approximation error. This is mainly because the prior research relies on the first-order approximation to show the error reduction, which provides little information about the lower bound on the error reduction. The achievable minimum approximation error for the probabilistic unitary synthesis was obtained in our previous work [39]. However, this result cannot be directly applied to the state synthesis since the generated state in state synthesis is obtained by applying a gate sequence to a fixed input state while the approximation error in unitary synthesis is quantified for the worst input state. Moreover, a target state could be approximated by probabilistically mixing two unitary transformations whose behaviors are totally different, except for the fixed input state.
In the proof of Theorem 1, we analyze the minimum approximation error which contains minimax optimization by definition. The main tool for the analysis is the strong duality of semidefinite programming. This enables us to formulate the minimum approximation error as a semidefinite program (SDP). Moreover, we show that the SDP can be dramatically simplified when both $\phi$ and $\{\phi_x\}_x$ exhibit symmetry. As discussed in the previous subsection, these techniques can be utilized to analyze the minimum trace distance between a general mixed state and a convex set, such as the set of separable states.
The reformulation of the minimum approximation error as an SDP enables us to compute the optimal probability distribution to achieve it efficiently. By using Theorem 1, we can verify that by solving this SDP with $\{\phi_x\}_{x\in X}$ satisfying $\max_\phi\min_{x\in X}\|\phi-\phi_x\|_{\mathrm{tr}}\le\epsilon$, which is called an $\epsilon$-covering, we obtain a probability distribution $p$ that achieves quadratic reduction of the approximation error, i.e., $\|\phi-\sum_{x\in X}p(x)\phi_x\|_{\mathrm{tr}}\le\epsilon^2$. However, the size of this SDP is too large to achieve the efficiency shown in Theorem 2, since the size $|X|$ of the $\epsilon$-covering is $(1/\epsilon)^{\Omega(1)}$. This problem can be resolved by proving that any $\phi_x$ in the support of the optimal probability distribution in the minimum approximation error is close to $\phi$; more precisely, $\|\phi-\phi_x\|_{\mathrm{tr}}\le 2\epsilon$. This enables us to construct a modified SDP whose size is independent of $\epsilon$.
Theorem 3 is obtained by combining Theorem 1 with the estimation of the minimum size of the $\epsilon$-covering. Due to its prominent role in algorithm design and asymptotic geometric analysis, the order of the minimum size of the $\epsilon$-covering has been well-studied [40][41][42]. However, to obtain Theorem 3, we precisely analyze the constant factor in the order, which refines the previous estimations [40,41].

A. Preliminaries
We consider only finite-dimensional Hilbert spaces in this paper. The two-dimensional Hilbert space $\mathbb{C}^2$ is called a qubit. The trace distance $\|\rho-\sigma\|_{\mathrm{tr}}$ of two quantum states $\rho,\sigma\in S(\mathcal{H})$ is defined via $\|M\|_{\mathrm{tr}} := \frac{1}{2}\mathrm{tr}\sqrt{MM^\dagger}$ for $M\in L(\mathcal{H})$. It represents the maximum total variation distance between probability distributions obtained by measurements performed on two quantum states. Thus, it satisfies $\|\rho-\sigma\|_{\mathrm{tr}}=\max_{0\le M\le I}\mathrm{tr}[M(\rho-\sigma)]$. A similar notion measuring the distinguishability of $\rho$ and $\sigma$ is the fidelity function, defined by
$$F(\rho,\sigma) := \max\left|\langle\Phi_\rho|\Phi_\sigma\rangle\right|^2,$$
where $|\Phi_\rho\rangle$ and $|\Phi_\sigma\rangle$ are purifications of $\rho$ and $\sigma$, and the maximization is taken over all the purifications. The Fuchs-van de Graaf inequalities [43] provide relationships between the two measures with respect to the distinguishability as follows:
$$1-\sqrt{F(\rho,\sigma)}\le\|\rho-\sigma\|_{\mathrm{tr}}\le\sqrt{1-F(\rho,\sigma)}\qquad(5)$$
holds for any states $\rho,\sigma\in S(\mathcal{H})$, where the equality of the right inequality holds when $\rho$ and $\sigma$ are pure.
An operator $A:\mathcal{H}\to\mathcal{H}$ is called antilinear if it satisfies $A(\alpha|\phi\rangle+\beta|\psi\rangle)=\alpha^*A|\phi\rangle+\beta^*A|\psi\rangle$, where $\alpha^*$ represents the complex conjugate of $\alpha\in\mathbb{C}$. The Hermitian adjoint $A^\dagger$ of an antilinear operator $A$ is defined by $\langle\psi|A^\dagger|\phi\rangle=\langle\phi|A|\psi\rangle$. An antilinear operator $U$ is called antiunitary if it satisfies $U^\dagger U=I$. An antiunitary operator $\Theta$ is called a conjugation if it satisfies $\Theta^\dagger=\Theta$. An example of a conjugation is the complex conjugation $\theta$ with respect to the computational basis. Note that for Hermitian operators $M_1$ and $M_2$ and an antilinear operator $A$, the cyclic property $\mathrm{tr}[M_1AM_2A^\dagger]=\mathrm{tr}[A^\dagger M_1AM_2]$ of the trace holds.
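For pure qubit states, the trace distance and the fidelity can be evaluated directly, which makes the pure-state equality in the Fuchs-van de Graaf inequalities easy to check numerically. The sketch below assumes the squared-overlap convention $F=|\langle\psi|\phi\rangle|^2$ for pure states and the factor $\frac{1}{2}$ in the trace norm; the helper names are hypothetical and the example states are arbitrary.

```python
import cmath
import math

def projector(psi):
    """Density matrix |psi><psi| of a normalized qubit state (a, b)."""
    return [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]

def trace_norm_hermitian_2x2(m):
    """(1/2) * sum of |eigenvalues| of a 2x2 Hermitian matrix."""
    a, b, d = m[0][0].real, m[0][1], m[1][1].real
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    return 0.5 * (abs(mean + radius) + abs(mean - radius))

def trace_distance(psi, phi):
    """Trace distance between the pure states |psi><psi| and |phi><phi|."""
    p, q = projector(psi), projector(phi)
    diff = [[p[i][j] - q[i][j] for j in range(2)] for i in range(2)]
    return trace_norm_hermitian_2x2(diff)

def fidelity_pure(psi, phi):
    """Squared overlap |<psi|phi>|^2 (assumed fidelity convention)."""
    return abs(sum(a.conjugate() * b for a, b in zip(psi, phi))) ** 2

psi = (complex(1.0), complex(0.0))
phi = (complex(math.cos(0.3)), cmath.exp(0.7j) * math.sin(0.3))

T = trace_distance(psi, phi)
F = fidelity_pure(psi, phi)
print(1 - math.sqrt(F), T, math.sqrt(1 - F))
```

The middle and right printed values agree for pure states, while the left value is a strict lower bound in this example.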

B. Quadratic reduction of approximation error
We first show the lower bound of the approximation error obtained by the optimal probabilistic mixture in the following lemma.
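Lemma 1. For a finite set $\{\phi_x\}_{x\in X}\subseteq P(\mathcal{H})$ of pure states and a pure state $\phi\in P(\mathcal{H})$, it holds that
$$\min_p\Big\|\phi-\sum_{x\in X}p(x)\phi_x\Big\|_{\mathrm{tr}}\ge\min_{x\in X}\|\phi-\phi_x\|_{\mathrm{tr}}^2.\qquad(6)$$
Proof. Let $p$ minimize the left-hand side of Eq. (6). The following calculation completes the proof.
$$\Big\|\phi-\sum_{x\in X}p(x)\phi_x\Big\|_{\mathrm{tr}}\ge\mathrm{tr}\Big[\phi\Big(\phi-\sum_{x\in X}p(x)\phi_x\Big)\Big]=1-\sum_{x\in X}p(x)F(\phi,\phi_x)\ge 1-\max_{x\in X}F(\phi,\phi_x)=\min_{x\in X}\|\phi-\phi_x\|_{\mathrm{tr}}^2,$$
where we use $\|\rho-\sigma\|_{\mathrm{tr}}\ge\max_{\phi\in P(\mathcal{H})}\mathrm{tr}[\phi(\rho-\sigma)]$ in the first inequality and use the right equality in Ineq. (5) in the last equality.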
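The bound of Lemma 1, namely that no mixture can beat the square of the best single-state error, can be checked numerically for qubits. The following sketch draws random pure states and random probability distributions; it is a sanity check under the Bloch-vector representation (trace distance equals half the Euclidean distance), not a proof, and all names are introduced here for illustration.

```python
import math
import random

def random_pure_bloch(rng):
    """Uniformly random point on the Bloch sphere (a pure qubit state)."""
    z = rng.uniform(-1.0, 1.0)
    az = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(az), r * math.sin(az), z)

def trace_distance(r, s):
    """Trace distance of two qubit states from their Bloch vectors."""
    return 0.5 * math.dist(r, s)

def random_distribution(rng, n):
    """Random probability vector of length n (normalized positive weights)."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(1)
violations = 0
for _ in range(2000):
    target = random_pure_bloch(rng)
    states = [random_pure_bloch(rng) for _ in range(4)]
    p = random_distribution(rng, 4)
    mixture = tuple(sum(pi * s[i] for pi, s in zip(p, states)) for i in range(3))
    best_single = min(trace_distance(target, s) for s in states)
    # Lemma 1: mixture error >= (best single-state error)^2.
    if trace_distance(target, mixture) < best_single ** 2 - 1e-12:
        violations += 1

print("violations:", violations)
```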
This lemma shows that the reduction rate of the approximation error by using probabilistic synthesis is, at best, quadratic. However, the two examples given in Fig. 1 indicate that a precisely quadratic reduction is possible if we consider the worst approximation error occurring when we synthesize the target state that is most difficult to approximate in a particular subset $S_G$ of states. To achieve the quadratic reduction, it is important to carefully select $S_G$. We use group symmetries in the following lemma to characterize $S_G$ and prove the quadratic reduction. This characterization also makes it easier to apply this lemma to various settings in the state synthesis.
Lemma 2. Let $X$ be a finite set, $G$ be a finite subgroup of unitary and antiunitary operators, and
Lemma 2 is a direct consequence of the following lemma for computing the minimum trace distance between a mixed state and a convex subset of mixed states.
Lemma 3. Let $X$ be a finite set and $G$ be a finite subgroup of unitary and antiunitary operators. Let $P_G$ be the set of positive semidefinite operators invariant under the action of $G$, i.e.,
where the minimization is taken over a probability distribution $p$ over $X$. In particular, when $\rho$ is a pure state $\phi$, it holds that
Proof. We start from a mixed state $\rho$. By using the minimax theorem, we obtain
(L.H.S. of Eq. (9)) = min = max
This proves (L.H.S.) ≥ (R.H.S.). Let $M$ maximize Eq. (13). Due to the invariance of $\rho$ and $\{\rho_x\}_x$ under the action of $G$, we can verify that $U^\dagger MU$ also maximizes Eq. (13). By defining
where we use Eq. (13) in the last equality.
When $\rho$ is a pure state $\phi$, we can derive
where we use the fact that the dimension of the eigenspace of $\phi-\sum_{x\in X}p(x)\rho_x$ associated with positive eigenvalues is zero or one in the first equality, and use the minimax theorem in the second equality. We complete the proof of Eq. (10) by using the following observation: When (L.H.S. of Eq. (10)) $=0$, Eq. (10) holds since there exists $x\in X$ such that $\rho_x=\phi$. When (L.H.S. of Eq. (10)) $>0$, $\sigma$ maximizing Eq. (15) is a pure state. For if $\sigma$ with $\|\sigma\|_\infty<1$ maximizes Eq. (15), we can show a contradiction by setting $\rho=\phi$ and $M=\sigma/\|\sigma\|_\infty$ in Eq. (13).
Proof of Lemma 2. By setting $\rho_x$ in Eq. (10) to be $\phi_x$, we obtain
where we use the right equality in Ineq. (5) in the last equality.
As consequences of Lemma 2 or Lemma 3, we obtain the following implications.
1. When $G=\{I\}$, we obtain $S_G=P(\mathcal{H})$. This case is applicable to any $\{\phi_x\}_{x\in X}$ and proves the quadratic reduction of the approximation error given in Fig. 1(a).
In this case, the quadratic reduction of the worst approximation error occurring when we synthesize a target state in $S_G$ is possible if $\{\phi_x\}_x$ is reflection-symmetric with respect to the XZ-plane in the Bloch representation. This proves the quadratic reduction of the approximation error given in Fig. 1(b). In general, conjugation-invariant pure states are often utilized in the optimal parameter estimation [22].

When
In this case, the quadratic reduction of the worst approximation error occurring when we synthesize a target state in $V$ is possible if $\{\phi_x\}_x$ is reflection-symmetric under the action of $2\Pi-I$. This is because
where we use Eq. (10) in the first equation. In general, preparing a state in a particular subspace is a widely used subroutine in various quantum information processing tasks.
We obtain the following theorem as a summary of Lemmas 1 and 2.
Theorem 1. Let $X$ be a finite set, $G$ be a finite subgroup of unitary and antiunitary operators, and
This theorem indicates that by using mixed states, we can reduce the approximation error with respect to the trace distance. When attempting to estimate the expectation value $\mathrm{tr}[O\phi]$ of an observable $O$ for $\phi$, this theorem implies that the bias of the expectation value can be reduced by using $\sum_{x\in X}p(x)\phi_x$ instead of using $\phi_x$ as a substitute for $\phi$.

C. Efficient probabilistic state synthesis algorithm
In this section, we present an efficient method for converting any deterministic state synthesis algorithm, denoted as $D$, into a probabilistic one. If it takes $\mathrm{polylog}\left(\frac{1}{\epsilon}\right)$-time for $D$ to achieve an approximation error $\epsilon$ with an $l(\epsilon)$-size circuit, then our method allows us to construct a probabilistic synthesis algorithm that achieves an approximation error $\epsilon^2$ by sampling $l(\epsilon)$-size circuits, with a total runtime of $\mathrm{polylog}\left(\frac{1}{\epsilon}\right)$. Note that our method assumes the target state is taken from a constant-dimensional Hilbert space. As mentioned in the introduction, constant-qubit states are commonly utilized in quantum cryptography and metrology protocols. Although the existence of highly complex pure states results in an exponential runtime with respect to the number of qubits for any state synthesis algorithm, we discuss the potential of probabilistic state synthesis for a high-dimensional system in Appendix C. Our conversion is based on the following proposition and lemma.
Proposition 1. Let $\rho$ and $\{\rho_x\}_{x\in X}$ be a target mixed state and a finite set of mixed states in $S(\mathcal{H})$, respectively. Then, the distance $\min_p\|\rho-\sum_{x\in X}p(x)\rho_x\|_{\mathrm{tr}}$ and the optimal probability distribution $\{p(x)\}_{x\in X}$, which minimizes the distance, can be computed with the following SDP:
Note that the strong duality holds in this SDP, i.e., the optimum primal and dual values are equal.
Proof. Recall that for two states $\rho$ and $\sigma$, $\|\rho-\sigma\|_{\mathrm{tr}}$ can be computed by the following SDP:
A formal SDP and the verification of the strong duality are provided in Appendix A. By extending the dual problem of this SDP to include the minimization of probability distribution $\{p(x)\}_{x\in X}$, we obtain Eq. (22). Note that the last condition $\sum_{x\in X}p(x)\le 1$ in the dual problem is different from the condition $\sum_{x\in X}p(x)=1$ of a probability distribution; however, the optimum dual value can be achieved under the latter condition. Again, a formal SDP and the verification of the strong duality are provided in Appendix A.
To understand this lemma, it is helpful to refer to the examples shown in Fig. 1. If the goal is to optimally approximate a target state $\phi$ depicted by the red point in (a) (or (b)), it is sufficient to mix three (or two) Pauli eigenstates that are $2\epsilon$ (or $2\epsilon$) close to $\phi$. This fact is shown to be true for any target state in this lemma, and its proof can be found in Appendix B as it involves technical details.
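For a single qubit and two candidate states, the optimization over the probability distribution can be illustrated without an SDP solver. The sketch below is a stand-in for the SDP, not the SDP itself: it assumes states on the XZ great circle, represents them by Bloch vectors, and uses a ternary search over the single mixing weight, which is valid because the trace distance to a mixture is a convex function of the weight. The function `best_two_state_mixture` and the chosen angles are hypothetical.

```python
import math

def bloch_xz(angle):
    """Bloch vector of a pure qubit state on the XZ great circle."""
    return (math.sin(angle), 0.0, math.cos(angle))

def trace_distance(r, s):
    """Trace distance of two qubit states from their Bloch vectors."""
    return 0.5 * math.dist(r, s)

def best_two_state_mixture(target, a, b, iters=200):
    """Minimize the error of p*a + (1-p)*b over p in [0, 1] by ternary search."""
    def err(p):
        mixed = tuple(p * a[i] + (1.0 - p) * b[i] for i in range(3))
        return trace_distance(target, mixed)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if err(m1) <= err(m2):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    p = 0.5 * (lo + hi)
    return p, err(p)

delta = 0.2
target = bloch_xz(0.0)
a, b = bloch_xz(-delta), bloch_xz(delta)
p_opt, err_opt = best_two_state_mixture(target, a, b)
eps = trace_distance(target, a)   # error of either single state

print(p_opt, err_opt, eps ** 2)
```

In this symmetric example the search returns a weight close to one half and an error close to the square of the single-state error. For larger candidate sets or higher dimensions, a general-purpose SDP solver would be needed instead of this one-dimensional search.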
By combining Proposition 1 and Lemma 4, we can efficiently convert a deterministic state synthesis algorithm into a probabilistic one. We assume there exists a deterministic state synthesis algorithm $D$ with
INPUT: a target pure state $\phi\in S_G$ in a constant-dimensional Hilbert space and a target approximation error $\epsilon\in(0,1)$
OUTPUT: a set $\{C_x^{(U)}\}_{U\in G}$ of circuits (generating $U\phi_xU^\dagger$) such that $\|\phi-\phi_x\|_{\mathrm{tr}}\le\epsilon$ and a matrix representation of $U\phi_xU^\dagger$ can be obtained within runtime $\mathrm{polylog}\left(\frac{1}{\epsilon}\right)$,
where $G$ is a finite subgroup of unitary and antiunitary operators and $S_G$ is the set of pure states invariant under the action of $G$.
Theorem 2. For a given gate set, there exists a probabilistic state synthesis algorithm $P$ that calls a deterministic synthesis algorithm $D$ as an oracle, and has
INPUT: a target pure state $\phi\in S_G$ in a constant-dimensional Hilbert space, a target approximation error $\epsilon\in(0,1)$, and precision $\delta\in(0,1)$
OUTPUT: circuit $C_x$ (generating $\rho_x$) sampled from a set $X$ in accordance with probability distribution $p:X\to[0,1]$
such that $P$ satisfies the following properties:
• Efficiency: $P$ calls $D$ a constant number of times, and the runtime of $P$ is $\mathrm{poly}\left(\log\frac{1}{\epsilon},\log\frac{1}{\delta}\right)$,
• Quadratic improvement: The approximation error $\|\phi-\sum_{x\in X}p(x)\rho_x\|_{\mathrm{tr}}$ obtained by $P$ is upper bounded by $\epsilon^2+\delta$, whereas $\min_{x\in X}\|\phi-\rho_x\|_{\mathrm{tr}}\le\epsilon$.
Proof.In the following, we explicitly construct the algorithm.

Sample $C_x^{(U)}$ in accordance with $p$, whose domain is X = X × G.
The two properties can be verified as follows:
• Efficiency: We can verify that all steps of the algorithm take $\mathrm{poly}\left(\log\frac{1}{\epsilon},\log\frac{1}{\delta}\right)$-time by using the following observations: We can construct a list $\{\phi_x\}_{x\in X}$ whose size is independent of $\epsilon$. From the assumption on $D$, we can also obtain a list of matrix representations of $\{U\phi_xU^\dagger\}_{x\in X,U\in G}$ within $\mathrm{polylog}\left(\frac{1}{\epsilon}\right)$-time. The ellipsoid method guarantees that the optimal value of our SDP can be computed in $\mathrm{poly}\left(\log\frac{1}{\epsilon},\log\frac{1}{\delta}\right)$-time within an approximation error $\delta$ [44].
• Quadratic improvement: The minimum approximation error $\min_p\|\phi-\sum_{x\in X}p(x)\rho_x\|_{\mathrm{tr}}$ is at most
While this theorem assumes the dimension $d$ of the Hilbert space is constant, we can also provide an estimation of the runtime of $P$ when $d$ grows. The runtime varies depending on the symmetry $G$ that target states possess (see Appendix C). In the worst case where target states have no common symmetry, i.e., $G=\{I\}$, the size of $X$ will be $|X|=\mathrm{poly}(\exp(d))$. In this case, we can provide the upper bound on the runtime of $P$ as $\mathrm{poly}\left(\log\frac{1}{\epsilon},\log\frac{1}{\delta},\exp(d)\right)$-time, based on the proof of Theorem 2.
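The final sampling step of a probabilistic synthesis algorithm can be sketched as follows. This is only an illustration of drawing a circuit label according to a given distribution each time the state is requested; the labels and probabilities below are hypothetical placeholders, not outputs of the SDP or of any synthesis algorithm.

```python
import random
from collections import Counter

def sample_circuits(labels, probs, shots, seed=0):
    """Draw `shots` circuit labels independently according to `probs`."""
    rng = random.Random(seed)
    return rng.choices(labels, weights=probs, k=shots)

labels = ["C_a", "C_b", "C_c"]   # hypothetical circuit labels
probs = [0.5, 0.3, 0.2]          # hypothetical optimized distribution p

draws = sample_circuits(labels, probs, shots=10_000)
freq = {lab: cnt / len(draws) for lab, cnt in Counter(draws).items()}
print(freq)
```

Averaged over many calls, the states produced by the sampled circuits are described by the mixture weighted by the distribution, which is the object whose error the theorem bounds.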

D. Numerical simulation of T -count reduction
In this section, we demonstrate how Theorem 2's probabilistic synthesis algorithm can reduce the $T$-count through numerical simulation. We select a target state $\phi$ from $S_G=\{\phi\in P(\mathbb{C}^2):|\phi\rangle=\cos t|0\rangle+\sin t|1\rangle,\ t\in\mathbb{R}\}$, as shown in Fig. 1(b). Recall that $S_G$ consists of $G$-invariant pure states, where $G=\{I,\theta\}$ with the complex conjugation $\theta$.
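The invariance of real-amplitude states under complex conjugation can be seen in the Bloch representation, where conjugation in the computational basis flips the sign of the $y$ component. The sketch below uses the standard Bloch-vector formulas for a state $a|0\rangle+b|1\rangle$; the helper names and the example parameters are chosen here for illustration.

```python
import cmath
import math

def bloch(a, b):
    """Bloch vector (x, y, z) of the pure state a|0> + b|1>."""
    c = a.conjugate() * b
    return (2.0 * c.real, 2.0 * c.imag, abs(a) ** 2 - abs(b) ** 2)

def conjugate_state(a, b):
    """Complex conjugation with respect to the computational basis."""
    return (a.conjugate(), b.conjugate())

t = 0.4
real_state = (complex(math.cos(t)), complex(math.sin(t)))           # in S_G
phase_state = (complex(math.cos(t)), cmath.exp(0.5j) * math.sin(t))  # not in S_G

print(bloch(*real_state))                     # y component is zero
print(bloch(*phase_state))                    # y component is nonzero
print(bloch(*conjugate_state(*phase_state)))  # y component flips sign
```

States with real amplitudes lie on the XZ great circle and are left unchanged by the conjugation, whereas a state with a relative phase is mapped to its mirror image across the XZ-plane.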
We assume that the set of elementary gates consists of Clifford gates and the $T$-gate, which is a commonly utilized gate set in FTQC based on stabilizer codes or surface codes. Considering that the implementation cost of a $T$-gate is much higher than that of Clifford gates, it is necessary to minimize the $T$-count of the circuits. To do this, we use the Ross-Selinger algorithm [45]  approximation error is smaller than $\epsilon$. Note that without exploiting the symmetry of the target state, we need 13 states to form a $(0.7\sqrt{\epsilon})$-covering of the $(2\sqrt{\epsilon})$-ball around $\phi$ due to the disk covering problem. We examine how the $T$-count and the approximation error for a specific target state are related in Fig. 3. As we can see, we were able to achieve a 50-60% reduction in $T$-count. We observe similar behavior for other randomly selected target states (see https://github.com/akibue/prob-synthesis for details).

E. Halving bit representation of pure states
We verify that the existence of probabilistic and deterministic encoding given in Fig. 2 can be reduced into a property of output states of the decoder $\Gamma$, as shown in the following propositions.
Proposition 2. A probabilistic encoding of $P(\mathbb{C}^d)$ with approximation error $\epsilon$ and a label set $X$ exists if and only if there exists set
where the minimization is taken over a probability distribution $p$ over $X$.
A set $\{\rho_x\}_{x\in X}$ of mixed states satisfying Eq. (25) is called an external $\epsilon$-covering of $P(\mathbb{C}^d)$. A set $\{\rho_x\in P(\mathbb{C}^d)\}_{x\in X}$ of pure states satisfying Eq. (25) is called an internal $\epsilon$-covering of $P(\mathbb{C}^d)$. The minimum size of internal (or external) $\epsilon$-coverings is called the internal (or external) covering number and denoted by $I_{\mathrm{in}}$ (or $I_{\mathrm{ex}}$). Note that $I_{\mathrm{ex}}\le I_{\mathrm{in}}$ by definition and the minimum bit length $n_{\mathrm{det}}$ required for deterministic encodings is equal to $\lceil\log_2 I_{\mathrm{ex}}\rceil$. We obtain the following lemma by using the volume consideration and applying the construction of an $\epsilon$-covering shown in [42].
Lemma 5. For any $\epsilon\in\left(0,\frac{1}{2}\right]$ and an integer $d\ge 2$ specified below, the internal and external covering numbers $I_{\mathrm{in}}$ and $I_{\mathrm{ex}}$ of an $\epsilon$-covering of $P(\mathbb{C}^d)$ are bounded by
where $l(d,\epsilon):=(d-1)\log_2\frac{1}{\epsilon}$. Moreover, if $d\ge 4$, the first lower bound can be strengthened as $2\cdot l(d,\epsilon)\le\log_2 I_{\mathrm{ex}}$.
The details of the proof are given in Appendix D. By combining this lemma with Theorem 1, we obtain the following theorem about the minimum bit length.
Theorem 3. For any $\epsilon\in\left(0,\frac{1}{2}\right]$ and an integer $d\ge 2$ specified below, the minimum bit length $n_{\mathrm{det}}$ (or $n_{\mathrm{prob}}$) of the deterministic (or probabilistic) encoding of $P(\mathbb{C}^d)$ with approximation error $\epsilon$ is bounded by
where $l(d,\epsilon):=(d-1)\log_2\frac{1}{\epsilon}$. Moreover, if $d\ge 4$, the first lower bound can be strengthened as $2\cdot l(d,\epsilon)\le n_{\mathrm{det}}$.
Proof. Since the bounds on $n_{\mathrm{det}}$ are a direct consequence of Lemma 5, we show the bounds on $n_{\mathrm{prob}}$. The upper bound is obtained by setting $\{\rho_x\}_x$ in Proposition 2 to be the minimum internal $\sqrt{\epsilon}$-covering of $P(\mathbb{C}^d)$. This is because Theorem 1 with $G=\{I\}$ guarantees that $\{\rho_x\}_x$ satisfies Eq. (24), and an upper bound on the size of the internal $\sqrt{\epsilon}$-covering is given by Lemma 5.
Next, we show the lower bound on $n_{\mathrm{prob}}$. Let $\{\rho_x\in S(\mathbb{C}^d)\}_{x\in X}$ satisfy Eq. (24). We obtain
where we use $\|\rho-\sigma\|_{\mathrm{tr}}\ge\max_{\phi\in P(\mathcal{H})}\mathrm{tr}[\phi(\rho-\sigma)]$ in the second inequality. By letting $\rho_x=\sum_{i=1}^{d}p(i|x)\phi_{i|x}$, we ensure that for any $\phi\in P(\mathbb{C}^d)$, there exists $i$ and $x$ such that
Thus, $\{\phi_{i|x}\}_{i,x}$ is an internal $\sqrt{\epsilon}$-covering of $P(\mathbb{C}^d)$. Hence, the lower bound can be obtained by applying Lemma 5.
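The origin of the factor of one half can be seen from the scaling function $l(d,\epsilon)=(d-1)\log_2\frac{1}{\epsilon}$ alone. The sketch below is a leading-order illustration under a stated assumption: it takes $2\,l(d,\epsilon)$ as the leading term for deterministic encoding (the strengthened lower bound quoted above for $d\ge 4$) and $2\,l(d,\sqrt{\epsilon})$ as the leading term for probabilistic encoding (an internal $\sqrt{\epsilon}$-covering, as in the proof), and it ignores all lower-order terms.

```python
import math

def l(d, eps):
    """Scaling function l(d, eps) = (d - 1) * log2(1 / eps)."""
    return (d - 1) * math.log2(1.0 / eps)

d, eps = 4, 1e-6

# Assumed leading-order bit lengths (lower-order terms are ignored).
det_leading = 2 * l(d, eps)              # eps-covering
prob_leading = 2 * l(d, math.sqrt(eps))  # sqrt(eps)-covering

ratio = prob_leading / det_leading
print(det_leading, prob_leading, ratio)
```

Since $\log_2\frac{1}{\sqrt{\epsilon}}=\frac{1}{2}\log_2\frac{1}{\epsilon}$, replacing an $\epsilon$-covering by a $\sqrt{\epsilon}$-covering halves the leading term for every $d$ and $\epsilon$.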

F. Applications for analysis on entanglement measure
Determining whether a quantum state $\rho$ is separable or entangled is a crucial inquiry in quantum information, as entanglement provides quantum advantages in various information processing tasks. The separability test is also fundamental to various optimization problems in distributed quantum computation. The separability test is computationally hard even if we are given the matrix representation of $\rho$ [46]. Further analysis of the computational complexity of the separability test has resulted in several important findings relating to QMA(2) [47][48][49][50]. Although the separability test for general states is challenging, there are specific classes of states that make it easier to test for separability, e.g., low rank [51,52] and symmetric [53,54] states.
In order to identify the tractable states in the separability test, the study of the optimal convex approximation examines a generalized problem of how to approximate a target state $\rho$ with a probabilistic mixture of a restricted subset $\{\rho_x\}_x$ of quantum states [30][31][32]. When this subset consists of product states, it becomes the separability test. From this general perspective, we demonstrated that restricting a target state to be rank-one or symmetric simplifies the optimization, as shown in Lemma 3. Furthermore, we demonstrate that our general lemma for the optimal convex approximation can reproduce the nontrivial facts about entanglement, either already known or derivable through known facts, in a simpler and unified way.
Recall that the set of separable states is defined as follows.
In [33], Girardin et al. used a neural network to conjecture Eqs. (2). Recall that $\Pi_\wedge$ with Hermitian projectors $\Pi_\vee$ and $\Pi_\wedge$ whose ranges are the symmetric subspace and antisymmetric subspace and $\sum_{i=0}|ii\rangle$, respectively. Since the Werner (or isotropic) state is entangled if and only if $\frac{1}{2}<q\le 1$ (or $\frac{1}{d+1}<q\le 1$), we assume they are entangled in Eqs. (2). By exploiting the symmetry of the Werner (or isotropic) state and using Lemma 3, we can prove this conjecture. The complete proof is given in Appendix E. Note that Eqs. (2) can be proven straightforwardly by combining the following two facts: (i) the closest separable state can be assumed to be the Werner (or isotropic) state without loss of generality, and (ii) the Werner (or isotropic) state is separable if and only if $0\le q\le\frac{1}{2}$ (or $0\le q\le\frac{1}{d+1}$). In contrast, our proof directly computes the minimum trace distance without constructing the closest separable state; moreover, it includes a proof for (ii). Since a POVM element $M$ appearing in Eq. (9) can be regarded as an entanglement witness, our proof can be regarded as a method for "quantifying entanglement with witness operators" [55,56]. Since our method does not require the closest separable state, its advantage is expected to become clear when the closest separable state is unknown or analytically hard to obtain, as shown in the next example.
Due to its clear operational meaning, the resource measure based on trace distance has been investigated for various resource theories, including entanglement and coherence [57]. Lemma 3 provides an alternative concise proof for the following recently identified coincidence between entanglement and coherence measures.
where $I := \mathrm{conv}\{|i\rangle\langle i|\}_{i=0}^{d-1}$ is called the set of incoherent states and $\{|i\rangle\}_{i=0}^{d-1}$ is an orthonormal basis. Since it is suggested that a simple closed-form formula for Eq. (31) might not exist [34], the closest separable state is also hard to obtain. However, our method is applicable to show the relationship of the minimum approximation error between different types of probabilistic approximation by exploiting the purity of the target states. Moreover, it simplifies the proof of [34, Theorem 3]. The complete proof is given in Appendix E.

III. DISCUSSION
We investigated the limitation of the optimal probabilistic state synthesis and its potential for reducing the size of a synthesis circuit. As a main result, we verified the tight relationship between the approximation error obtained by the optimal probabilistic state synthesis and that obtained by the optimal deterministic one. We also constructed an efficient method to convert a deterministic synthesis algorithm into a probabilistic one that quadratically reduces the approximation error.
To estimate how the error reduction reduces the size of a synthesis circuit, we performed a numerical simulation and evaluated the length of the classical bit string required to approximately encode a pure state. As a result, we found that probabilistic encoding asymptotically halves the bit length. Note that in the presence of noise on elementary gates, which was not taken into account in this study, certain conditions on the noise may be required to achieve the quadratic reduction of the approximation error. However, our SDP can still be used to numerically determine the optimal probabilistic synthesis in cases where the noise is explicitly described.
In addition to our contribution to the state synthesis, our result would improve the performance of classical simulation of a quantum computer as well as that of optimization algorithms that include a brute-force search over pure states, e.g., the separability test [58]. This is because we essentially show that the set of pure states can be approximated with the same accuracy either by its ε-covering or by probabilistic mixtures of its √ε-covering, where the size of the minimum √ε-covering is almost the square root of that of the minimum ε-covering. These results are based on general theorems about the optimal convex approximation of a quantum state. While the optimal convex approximation and state synthesis have been studied in different contexts, our theorems have demonstrated that analyzing the former problem provides not only the fundamental limitation of probabilistic synthesis but also a construction of an efficient synthesis algorithm. Furthermore, our theorems contribute to the original motivation of the studies of the optimal convex approximation [30][31][32], which is quantifying a resource measure in convex resource theories [59][60][61] such as the resource theory of entanglement. Indeed, the SDP constructed in Proposition 1 would provide a basis for numerical investigation of such resource measures. Our theorems would reveal more quantitative relationships between different resource measures, as shown in Proposition 4.
Proof. We show the nontrivial inequality (L.H.S.) ≥ (R.H.S.). We also consider the nontrivial case when ε < 1/2. By using Lemma 3, we obtain where Re(β) represents the real part of a complex number β. We can draw the possible region of a Re(β) as the shaded region in Fig. 4. By using $0 < t_1 - t_2 \leq \frac{\pi}{2} - t_2$ and Fig. 4, we obtain This implies (R.H.S.) = 0, which completes the proof.
In the proof, we represent some parameters explicitly, which are tailored to the ε-covering with respect to the trace distance. Assume $d \geq 2$ and let $D = 2(d-1)\ (\geq 2)$. Let $\{\phi_j \in P(\mathbb{C}^d)\}_{j=1}^{J_R}$ be a finite set of pure states sampled randomly with respect to the product measure $\mu^{J_R}$. The expected volume of the region not covered by $A := \cup_{j=1}^{J_R} B_{\epsilon_R}(\phi_j)$ ($0 < \epsilon_R \leq 1$) can be calculated as follows: where we use Fubini's theorem and Eq. (D5) in the second and the third equations, respectively. Note that $I[X] \in \{0, 1\}$ is the indicator function, i.e., $I[X] = 1$ iff $X$ is true.
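The expectation described above can be sketched as the following worked computation. It is a reconstruction, not the original display: it assumes that Eq. (D5) reads $\mu(B_\epsilon(\phi)) = \epsilon^{D}$ with $D = 2(d-1)$, which is consistent with the bound $\exp(-J_R \epsilon_R^D)$ that the proof uses immediately afterwards. The second line uses Fubini's theorem and the third uses the assumed form of Eq. (D5), together with $1 - x \leq e^{-x}$.

```latex
\begin{align}
\mathbb{E}\left[\mu(A^{c})\right]
&= \int d\mu^{J_R}(\phi_1,\dots,\phi_{J_R}) \int d\mu(\psi)\,
   I\left[\psi \notin A\right] \\
&= \int d\mu(\psi) \prod_{j=1}^{J_R} \int d\mu(\phi_j)\,
   I\left[\psi \notin B_{\epsilon_R}(\phi_j)\right] \\
&= \left(1 - \epsilon_R^{D}\right)^{J_R}
 \;\leq\; \exp\left(-J_R\,\epsilon_R^{D}\right).
\end{align}
```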
Thus, there exists $\{\phi_j\}_{j=1}^{J_R}$ such that $\mu(A^c) \leq \exp(-J_R \epsilon_R^D)$. Pick as many $\{\psi_j\}_{j=1}^{J_P}$ as possible such that the balls $B_{\epsilon_P}(\psi_j)$ are disjoint and contained in $A^c$. When $0 < \epsilon_P \leq \epsilon_R \leq 1$, we can verify that $\{\phi_j\}_{j=1}^{J_R} \cup \{\psi_j\}_{j=1}^{J_P}$ is an $(\epsilon_R + \epsilon_P)$-covering and its size $J := J_R + J_P$ is upper bounded as x , and $\epsilon_R = \frac{x}{1+x}\epsilon$ with $x \geq 1$, we obtain the following upper bound: .

Volume analysis 1
We compute the volume of $B_\epsilon(\phi)$ in $P(\mathbb{C}^d)$ as follows: where $\mu$ is the unitarily invariant probability measure on the Borel sets of $P(\mathbb{C}^d)$. When $d = 1$, Eq. (D5) holds. By assuming $d \geq 2$, we proceed as follows: where the first equality uses a fixed pure state $|0\rangle$ and the unitary invariance of $\mu$ and the trace distance, the second equality uses Eq. (5), and the third equality uses the relationship between $\mu$ and the uniform spherical probability measure $\xi$. Using a spherical coordinate system, we can proceed as follows: and the domain of the integration $D_\epsilon$ is given by $\{(\theta, \phi) : \theta, \phi \in (0, \pi), \sin\theta \sin\phi < \epsilon\}$. Since this domain and the integrand have reflection symmetries about the two lines $\theta = \frac{\pi}{2}$ and $\phi = \frac{\pi}{2}$, it is sufficient to perform the integration in the domain $\hat{D}_\epsilon := \{(\theta, \phi) : \theta, \phi \in (0, \frac{\pi}{2}), \sin\theta \sin\phi < \epsilon\}$. By changing the variables as $(x, y) = (\sin\theta \sin\phi, \sin\theta)$, we obtain for $\epsilon \in [0, 1]$. This completes the calculation.
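The volume computed in this subsection can be checked numerically. The following is a minimal Monte Carlo sketch, assuming that the closed form in Eq. (D5) is $\mu(B_\epsilon(\phi)) = \epsilon^{2(d-1)}$, which matches the exponent $D = 2(d-1)$ used in the covering argument. The function names are illustrative and not taken from the paper. Haar-random pure states are sampled as normalized complex Gaussian vectors, and the trace distance to $|0\rangle$ is evaluated as $\sqrt{1 - |\langle 0|\psi\rangle|^2}$.

```python
import math
import random


def haar_random_state(d, rng):
    """Sample a Haar-random pure state in C^d as a normalized complex Gaussian vector."""
    amps = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(d)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]


def trace_distance_to_zero(psi):
    """Trace distance between |psi><psi| and |0><0|, i.e. sqrt(1 - |<0|psi>|^2)."""
    return math.sqrt(max(0.0, 1.0 - abs(psi[0]) ** 2))


def estimate_ball_volume(d, eps, samples, seed=0):
    """Fraction of Haar-random pure states within trace distance eps of |0>."""
    rng = random.Random(seed)
    hits = sum(
        trace_distance_to_zero(haar_random_state(d, rng)) <= eps
        for _ in range(samples)
    )
    return hits / samples


if __name__ == "__main__":
    d, eps = 3, 0.5
    estimate = estimate_ball_volume(d, eps, samples=100_000)
    # Assumed closed form of Eq. (D5): eps ** (2 * (d - 1)).
    print(f"Monte Carlo estimate : {estimate:.4f}")
    print(f"eps ** (2 * (d - 1)) : {eps ** (2 * (d - 1)):.4f}")
```

By unitary invariance it suffices to centre the ball at $|0\rangle$, which is the same reduction the first equality of the derivation relies on.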

Volume analysis 2
We show the following upper bound on the volume of the ε-ball $B_\epsilon(\rho)$. For any ε between 0 and 1/2, it holds that where $\{|i\rangle\}_i$ is a set of eigenvectors of $\rho$ and the eigenvalues are arranged in decreasing order, i.e., $p_0 \geq p_1 \geq \cdots$. Since $\mu(B_\epsilon(\rho))$ depends not on the eigenvectors but on the eigenvalues of $\rho$, it is sufficient to consider only $\rho$ that is diagonal with respect to a fixed basis. However, it is difficult to calculate $\mu(B_\epsilon(\rho))$ precisely due to the complicated relationship between $\psi$ and the largest eigenvalue of $\psi - \rho$, resulting from the condition $\epsilon \geq \|\psi - \rho\|_{\mathrm{tr}} = \lambda_{\max}(\psi - \rho)$.

FIG. 1. Quadratic reduction of the approximation error by using probabilistic synthesis. We assume that we can exactly generate an eigenstate $\hat{\phi}_x$ of the Pauli operators, represented by the six extreme points of the octahedron. We represent the Bloch sphere by a sphere with radius 1/2, where the trace distance between two quantum states equals the Euclidean distance between the corresponding points. (a) We can compute $\min_p \|\phi - \sum_x p(x)\hat{\phi}_x\|_{\mathrm{tr}}$
$L(H)$ and $\mathrm{Pos}(H)$ represent the sets of linear operators and positive semidefinite operators on a Hilbert space $H$, respectively. $I \in \mathrm{Pos}(H)$ represents the identity operator. For Hermitian operators $A$ and $B$ on $H$, $A \geq B$ represents $A - B \in \mathrm{Pos}(H)$, and $A > B$ means that $A - B$ is positive definite. $S(H) := \{\rho \in \mathrm{Pos}(H) : \mathrm{tr}[\rho] = 1\}$ and $P(H) := \{\rho \in S(H) : \mathrm{tr}[\rho^2] = 1\}$ represent the sets of quantum states and pure states, respectively. A pure state $\phi \in P(H)$ is sometimes alternatively represented by a complex unit vector $|\phi\rangle \in H$ satisfying $\phi = |\phi\rangle\langle\phi|$.
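The octahedron picture of Fig. 1 can be explored with a short numerical sketch. Using the convention stated in the caption (Bloch sphere of radius 1/2, trace distance equal to Euclidean distance), the deterministic error is the distance from the target to the nearest vertex, and the optimal probabilistic error is the distance to the octahedron itself, which is an $\ell_1$ ball of radius 1/2. The projection routine below is the standard sort-based $\ell_1$-ball projection, swapped in here for the SDP; all function names are illustrative assumptions.

```python
import math

# Bloch vectors (radius 1/2) of the six Pauli eigenstates.
VERTICES = [
    (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0),
    (0.0, 0.5, 0.0), (0.0, -0.5, 0.0),
    (0.0, 0.0, 0.5), (0.0, 0.0, -0.5),
]


def project_onto_octahedron(v, radius=0.5):
    """Euclidean projection of v onto {w : |w_1| + |w_2| + |w_3| <= radius}."""
    if sum(abs(c) for c in v) <= radius:
        return list(v)
    u = sorted((abs(c) for c in v), reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        candidate = (running - radius) / j
        if uj - candidate > 0:
            theta = candidate  # the last index satisfying the test gives the threshold
    return [math.copysign(max(abs(c) - theta, 0.0), c) for c in v]


def deterministic_error(v):
    """Trace distance from the target to the nearest exactly preparable state."""
    return min(math.dist(v, vertex) for vertex in VERTICES)


def probabilistic_error(v):
    """Trace distance from the target to the best mixture of the six states."""
    return math.dist(v, project_onto_octahedron(v))


if __name__ == "__main__":
    # Target along (1, 1, 1): equally far from the three nearest vertices.
    target = [0.5 / math.sqrt(3.0)] * 3
    det = deterministic_error(target)
    prob = probabilistic_error(target)
    print(f"deterministic error : {det:.6f}")
    print(f"probabilistic error : {prob:.6f}")
    print(f"deterministic ** 2  : {det ** 2:.6f}")
```

For this particular target the probabilistic error coincides with the square of the deterministic error, which illustrates the quadratic reduction that the caption describes.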

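Lemma 4.
Let $G$ be a finite subgroup of unitary and antiunitary operators, and let $S_G := \{\phi \in P(H) : \forall U \in G, [U, \phi] = 0\}$ be the set of pure states invariant under the action of $G$. For a positive number $\epsilon > 0$, if $\phi \in S_G$ and $\{\hat{\phi}_x\}_{x \in X}$ is a finite ε-covering of $S_G$ that is invariant under the action of $G$, i.e., $\max_{\psi \in S_G} \min_{x \in X} \|\psi - \hat{\phi}_x\|_{\mathrm{tr}} \leq \epsilon$ and $\{\hat{\phi}_x\}_{x \in X} = \{U \hat{\phi}_x U^\dagger\}_{x \in X}$ for all $U \in G$, then $\min_{p} \|\phi - \sum_{x \in X} p(x) \hat{\phi}_x\|_{\mathrm{tr}} = \min_{\hat{p}} \|\phi - \sum_{x \in \hat{X}} \hat{p}(x) \hat{\phi}_x\|_{\mathrm{tr}}$ holds, where $\hat{X} := \{x \in X : \|\phi - \hat{\phi}_x\|_{\mathrm{tr}} \leq 2\epsilon\}$ and the minimizations over $p$ and $\hat{p}$ are taken over probability distributions over $X$ and $\hat{X}$, respectively.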

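3. Call D to find $C^{(U)}_x$ generating $U \hat{\phi}_x U^\dagger$ such that $\|\phi_x - \hat{\phi}_x\|_{\mathrm{tr}} \leq c'\epsilon$ for all $x \in \hat{X}$ and all $U \in G$.
4. Numerically solve the SDP shown in Proposition 1 by setting $\rho = \phi$ and $\{\hat{\rho}_x\}_{x \in \hat{X}} = \{U \hat{\phi}_x U^\dagger\}_{x \in \hat{X}, U \in G}$, and obtain a probability distribution $\hat{p}$ whose approximation error is δ-close to $\min_p \|\phi - \sum_{x \in \hat{X}} p(x) \hat{\rho}_x\|_{\mathrm{tr}}$.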
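The last step, obtaining a mixing distribution for a given finite set of synthesized states, can be illustrated in the single-qubit case without an SDP solver. The sketch below is not the SDP of Proposition 1: it swaps in a Frank-Wolfe iteration, which is valid here only because, for qubits drawn in a Bloch ball of radius 1/2, the trace distance equals the Euclidean distance, so the problem becomes a projection onto a convex hull. The function name `optimal_mixture` and the iteration count are assumptions made for illustration.

```python
import math


def optimal_mixture(target, points, iterations=5000):
    """Frank-Wolfe minimization of |target - sum_x p(x) points[x]| over distributions p.

    target and points are Bloch vectors (3-tuples) in a ball of radius 1/2.
    Returns the distribution p and the achieved distance.
    """
    n = len(points)
    p = [1.0 / n] * n
    for _ in range(iterations):
        center = [sum(p[i] * points[i][k] for i in range(n)) for k in range(3)]
        residual = [target[k] - center[k] for k in range(3)]
        # Linear minimization step: pick the point best aligned with the residual.
        best = max(
            range(n),
            key=lambda i: sum(points[i][k] * residual[k] for k in range(3)),
        )
        direction = [points[best][k] - center[k] for k in range(3)]
        denom = sum(c * c for c in direction)
        if denom < 1e-18:
            break  # the current mixture already sits on the best point
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = sum(residual[k] * direction[k] for k in range(3)) / denom
        gamma = min(1.0, max(0.0, gamma))
        if gamma == 0.0:
            break  # no feasible descent direction: the mixture is optimal
        p = [(1.0 - gamma) * q for q in p]
        p[best] += gamma
    center = [sum(p[i] * points[i][k] for i in range(n)) for k in range(3)]
    return p, math.dist(target, center)


if __name__ == "__main__":
    pauli_eigenstates = [
        (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0),
        (0.0, 0.5, 0.0), (0.0, -0.5, 0.0),
        (0.0, 0.0, 0.5), (0.0, 0.0, -0.5),
    ]
    a = 0.5 / math.sqrt(3.0)
    p, err = optimal_mixture((a, a, a), pauli_eigenstates)
    print("mixing probabilities:", [round(q, 3) for q in p])
    print("approximation error :", round(err, 4))
```

Frank-Wolfe converges only sublinearly, so the returned error is an upper estimate of the optimum; for higher-dimensional states the SDP remains the appropriate tool.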
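where $\{\psi_y \in S_G\}_y$ is a finite ε-covering of $\{\psi \in S_G : \|\phi - \psi\|_{\mathrm{tr}} > 2\epsilon\}$ and $\|\phi - \psi_y\|_{\mathrm{tr}} > 2\epsilon$ for any $y$, $\{\hat{\rho}_x\}_{x \in \hat{X}} \cup \{\psi_y\}_y$ is invariant under the action of $G$, and we can thus apply Theorem 1 and Lemma 4.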

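Proposition 3.
A deterministic encoding of $P(\mathbb{C}^d)$ with approximation error ε and a label set $X$ exists if and only if there exists a set $\{\hat{\rho}_x \in S(\mathbb{C}^d)\}_{x \in X}$ of mixed states satisfying $\max_{\phi \in P(\mathbb{C}^d)} \min_{x \in X} \|\phi - \hat{\rho}_x\|_{\mathrm{tr}} \leq \epsilon$.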

Proposition 4.
[34, Theorem 3] For pure states $|\Phi\rangle = \sum_{i=0}^{d-1} \alpha_i |ii\rangle$ and $|\phi\rangle = \sum_{i=0}^{d-1} \alpha_i |i\rangle$, it holds that $\min_{\sigma \in \mathrm{SEP}} \|\Phi - \sigma\|_{\mathrm{tr}} = \min_{\rho \in I} \|\phi - \rho\|_{\mathrm{tr}}$.
holds for any linear operators $Y \in L(H)$ and $P \in L(\mathbb{C}^{|X|})$, where $P(x)$ represents a diagonal element $\langle x|P|x\rangle$. The primal problem is obtained by observing that the adjoint of $\Xi^\dagger$ satisfies Ξ tr [M ρx ] |x⟩⟨x| + Q − tI (A7) for any linear operators $M, M' \in L(H)$ and $Q \in L(\mathbb{C}^{|X|})$. We can verify the strong duality of this SDP by observing Ξ I 2 ⊕ I 2 ⊕ I 2 ⊕ 1 = B and applying Slater's theorem.

Appendix B: Proof of Lemma 4
We perform probabilistic synthesis based on Theorem 2 to synthesize the same multiple target states $|\phi\rangle$ with the same multiple target approximation errors. When the target approximation error is ε, we execute the Ross-Selinger algorithm within a target approximation error of 0.3 A set consisting of the target state $|\phi_1\rangle = \cos t|0\rangle + \sin t|1\rangle$ and two shifted states $\{|\phi_x\rangle\}_{x=2}^{3}$ = {cos t′|0⟩ + sin t′|1⟩ : t′ = Thus, we obtain three gate sequences to generate states $\{\hat{\phi}_x\}_{x=1}^{3}$ after executing the Ross-Selinger algorithm. To apply Theorem 2, we also require gate sequences to generate $\{\theta \hat{\phi}_x \theta\}_{x=1}^{3}$, the complex conjugates of $\{\hat{\phi}_x\}_{x=1}^{3}$. These gate sequences can be obtained by modifying the gate sequence that generates $\hat{\phi}_x$ without increasing the $T$-count. This is because $\theta T \theta \propto ZST$ and the set of Clifford gates is closed under complex conjugation. After obtaining six synthesized states $\{\hat{\phi}_x, \theta \hat{\phi}_x \theta\}_{x=1}^{3}$, we solve the SDP described in Proposition 1 to determine the actual approximation error 3. Theorem 2 guarantees the actual t ($\simeq R_y(2t)$). This allows us to obtain an approximated state $U_t|0\rangle$ ($\simeq R_y(2t)|0\rangle = \cos t|0\rangle + \sin t|1\rangle$) 2. We run this deterministic synthesis algorithm for multiple randomly selected target states $\phi$ in $S_G$, with multiple target approximation errors. By utilizing the description of each output gate sequence, we determine the $T$-count and the actual approximation error. G : ‖ψ − φ‖_tr ≤ 2√ε}.
FIG. 3. Relationship between the $T$-count and the approximation error for synthesizing $|\phi\rangle = \cos t|0\rangle + \sin t|1\rangle$ with $t = 1$. For each target approximation error, we run the Ross-Selinger algorithm to obtain a gate sequence to approximate $\phi$. The blue dashed line interpolates points, each of which represents a target approximation error and the $T$-count of the gate sequence. The actual approximation error and the $T$-count achieved by the gate sequence are plotted by blue dots. Note that both the target and actual approximation errors are represented by ε. For each of the target approximation errors, we run the probabilistic synthesis algorithm and obtain a list of six gate sequences to be probabilistically sampled. The purple dashed line interpolates points, each of which represents a target approximation error and the maximum $T$-count of the gate sequences in the list. The actual approximation error and the maximum $T$-count achieved by optimally mixing the gate sequences are plotted by purple dots.
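The benefit of mixing shifted states in the family $\cos t|0\rangle + \sin t|1\rangle$ can be seen in a toy calculation. The sketch below is an idealized illustration, not the numerical experiment reported above: it assumes two states shifted by exactly $\pm\delta$ from the target, with $\delta$ a hypothetical shift rather than a value from the paper, and mixes them with equal weights instead of solving the SDP. Each shifted state alone has error $\sin\delta$, while the equal mixture has error $\sin^2\delta$.

```python
import math


def real_pure_state(t):
    """Density matrix of cos(t)|0> + sin(t)|1> as a 2x2 real symmetric matrix."""
    c, s = math.cos(t), math.sin(t)
    return [[c * c, c * s], [c * s, s * s]]


def trace_distance(a, b):
    """Trace distance (half the trace norm) between real symmetric unit-trace 2x2 matrices.

    The difference has the form [[dz, dx], [dx, -dz]], whose eigenvalues are
    +/- hypot(dz, dx), so half the trace norm equals hypot(dz, dx).
    """
    dz = a[0][0] - b[0][0]
    dx = a[0][1] - b[0][1]
    return math.hypot(dz, dx)


def mix(a, b, weight):
    """Convex combination weight * a + (1 - weight) * b of two 2x2 matrices."""
    return [
        [weight * a[i][j] + (1.0 - weight) * b[i][j] for j in range(2)]
        for i in range(2)
    ]


if __name__ == "__main__":
    t, delta = 1.0, 0.05  # delta is a hypothetical shift chosen for illustration
    target = real_pure_state(t)
    left = real_pure_state(t - delta)
    right = real_pure_state(t + delta)
    det_err = trace_distance(target, right)
    prob_err = trace_distance(target, mix(left, right, 0.5))
    print("single shifted state:", det_err, "vs sin(delta)    =", math.sin(delta))
    print("equal mixture       :", prob_err, "vs sin(delta)**2 =", math.sin(delta) ** 2)
```

In the actual procedure the synthesized states are not placed symmetrically around the target, which is why the mixing weights are determined by the SDP rather than fixed at 1/2.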