Taming numerical errors in simulations of continuous variable non-Gaussian state preparation

Numerical simulation of continuous variable quantum state preparation is a necessary tool for optimization of existing quantum information processing protocols. A powerful instrument for such simulation is the numerical computation in the Fock state representation. It unavoidably uses an approximation of the infinite-dimensional Fock space by finite complex vector spaces implementable with classical digital computers. In this approximation we analyze the accuracy of several currently available methods for computation of the truncated coherent displacement operator. To overcome their limitations we propose an alternative with improved accuracy based on the standard matrix exponential. We then employ the method in the analysis of a non-Gaussian state preparation scheme based on coherent displacement of a two mode squeezed vacuum with subsequent photon counting measurement. We compare different detection mechanisms, including avalanche photodiodes, their cascades, and photon number resolving detectors in the context of engineering non-linearly squeezed cubic states and construction of qubit-like superpositions of vacuum and single-photon states.


Jan Provazník*, Radim Filip & Petr Marek
Quantum information theory exploits fundamental features of quantum physics to design protocols and algorithms that offer significant improvements over their classical counterparts [1][2][3][4] . There are several candidate physical systems suitable for these applications, each with distinct advantages. Continuous variable quantum information processing with light offers feasible and fast generation and manipulation of entangled Gaussian quantum states that are at the core of the information protocols [5][6][7][8][9][10][11][12] . However, truly universal quantum information processing also requires elements of quantum non-Gaussianity [13][14][15][16][17] . Protocols based on Gaussian states and Gaussian operations are not universal 13 and can be efficiently simulated on a classical device 18 .
For continuous variables of light, the non-Gaussianity is commonly introduced by photon number counting detectors, either the most basic on-off detectors capable of discerning the presence of light 19 , or the more advanced detectors truly distinguishing the photon numbers [20][21][22][23][24][25][26][27][28][29] . Such detectors can be employed for direct conditional implementation of non-Gaussian operations [30][31][32][33][34][35][36][37] , or for conditional preparation of non-Gaussian quantum states [38][39][40][41][42][43][44][45][46][47][48][49][50] . The latter can then be used as a resource in deterministic implementation of non-Gaussian gates 14,37,51 . One thing these approaches have in common is the inherent probabilistic nature of the measurement, which results in several trade-offs between the quality of the implemented operation or the prepared quantum state, the rate with which the desired operation succeeds, and the experimental challenges of the photon number resolving detector [26][27][28][29]52 . For any given set of realistic detectors and any desired task we then need the ability to faithfully simulate the optical circuit to find out the required parameters leading to the optimal performance, or to find out whether the task is even feasible.
However, numerical simulation of simple quantum optical circuits, even though it is often employed in continuous variable quantum information processing [53][54][55][56][57] , is not a straightforward task. It is burdened by various difficulties, including discretization errors in numerical models relying on continuous representation, truncation errors in discrete models 53 , the omnipresent rounding errors due to the finite precision of arithmetic [58][59][60][61][62] and numerical truncation errors occurring in finite approximations of infinite processes 60,62 . If not prevented by rigorous analysis, these numerical artifacts can dominate the computed values and lead to rapid divergence from correct results.
In this paper we evaluate the numerical errors arising when an optical circuit for probabilistic preparation of non-Gaussian quantum states of light 14,38 is simulated on a classical digital computer. We then propose an alternative method for construction of truncated unitary operators aiming to curtail these errors. Finally we take advantage of these tools to fully simulate the circuit for preparation of resource states for the cubic phase gate 51 , and single mode qubit-like superpositions of zero and one photon. The goal is to find the optimal trade-offs between the quality of the states and the probability of success for a range of available photon counting detectors 20,[26][27][28][29]52 . This paper is structured as follows. In the State preparation circuit section we review the state preparation circuit. In the Perils of numerical simulation of CV systems section we describe the errors naturally occurring in numerical simulations. In the section The curious case of coherent displacement we focus on coherent displacement and identify the numerical errors appearing in different methods of its calculation. In the Truncated approximate matrix exponential (TAME) section we propose an alternative method for its calculation, followed by an overview of the verification process in the Verification of approximated matrices section. We then proceed with the Numerical simulation of the preparation circuit section, where we describe the methodology of the actual simulation, and present the results of its applications in the final two sections.

State preparation circuit
The most common method of conditional state preparation is based on suitable manipulation of an EPR state with coherent displacement and subsequent photon counting measurement 14,39,63,64 . In Fig. 1 we present a variant of the circuit which can be used for preparation of simple non-Gaussian quantum states, including the qubit-like superpositions of |0⟩ and |1⟩ . Our circuit accounts for basic imperfections limited to detection inefficiencies and propagation losses. A physical EPR resource, generating the two mode squeezed vacuum (TMSV), lies at its very heart and serves as a source of perfectly correlated photons. One of the entangled modes is then displaced with controllable amplitude and phase and consequently measured. The detection can use either an avalanche photodiode (APD), a photon number resolving detector (PNRD) or its approximation employing an APD cascade 52 . The resulting marginal state, conditioned on the detection outcome π characterized by the POVM element Ω̂₂(π),

ϱ = tr₂[ (1̂₁ ⊗ Ω̂₂(π)) D̂₂(ξ) G^η₂(|γ⟩₁,₂⟨γ|) D̂₂(ξ)† ] / P(π),   (1)

is obtained with the probability of success

P(π) = tr[ (1̂₁ ⊗ Ω̂₂(π)) D̂₂(ξ) G^η₂(|γ⟩₁,₂⟨γ|) D̂₂(ξ)† ].   (2)

In both the expressions (1) and (2) we use the lower right indices to emphasize which modes the operators and channels act on. Starting from the inner-most component, the initial TMSV state is denoted with

|γ⟩₁,₂ = cosh⁻¹γ Σ_{i=0}^∞ tanhⁱγ |i⟩₁|i⟩₂,

where the parameter γ ∈ ℝ sets the experimentally controllable squeezing strength. We model the overall losses and inefficiencies in the preparation scheme as attenuation of the measured mode prior to its displacement. This can be represented by a Gaussian quantum channel G^η₂(ρ) with its action on the mode given in terms of Kraus operators 65 ,

G^η₂(ρ) = Σ_{l=0}^∞ K̂_l ρ K̂_l†,   K̂_l = √((1 − η)ˡ / l!) η^{N̂₂/2} Â₂ˡ,

where N̂₂ := Â₂†Â₂ defines the photon number operator and Â₂ denotes the annihilation operator respective to the measured mode. The parameter η ∈ [0, 1] describes the efficiency of the preparation circuit; the converse 1 − η characterizes the overall losses and inefficiencies in the preparation scheme.
The displacement of the second mode is given by the unitary operator D̂₂(ξ) = exp(ξÂ₂† − ξ*Â₂) , where ξ ∈ ℂ is the displacement amplitude 66 . In a more realistic analysis of the preparation circuit it would be straightforward to include the propagation losses affecting the mode carrying the resulting state. This form of decoherence can be accounted for by modifying the squeezing strength of the nonlinearly squeezed state 67 and does not influence the fundamental properties of these non-Gaussian states. Consequently we do not consider this additional attenuation.
From the experimental perspective the parameters γ and ξ can be fine tuned to engineer a desired state ϱ with optimal performance given a particular experimental configuration characterized by the efficiency η, conditioned on the detection outcome π with the respective POVM element Ω̂(π).
We can utilize this scheme to prepare a variety of quantum states. Consider now a lossless configuration employing an ideal PNRD. Its POVM elements correspond to projectors |f⟩⟨f| onto individual Fock states f . The output state, conditioned on the detection of a particular Fock state f , is then proportional to Σ_{i=0}^∞ µ_i(γ) [D(ξ)]_{fi} |i⟩ , where the coefficients µ_i(γ) follow from the definition of the TMSV state and [D(ξ)]_{fi} = ⟨f|D(ξ)|i⟩ are matrix elements of the displacement operator. By tuning the parameters γ and ξ we can construct a set of states parametrized by the possible combinations of the µ_i and [D(ξ)]_{fi} coefficients.

Figure 1. Variation of the conditional preparation scheme. We start with a two mode squeezed vacuum state |γ⟩ . One of its modes is then displaced with D̂(ξ) and measured, using either APD, PNRD or an APD cascade approximating PNRD. The detection outcome is characterized by the POVM element Ω̂ . We model overall losses and inefficiencies within the scheme using a beam splitter with intensity transmittance η to represent attenuation of the signal state in the setup.
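In the lossless case this heralded superposition can be prototyped in a few lines of Python. The sketch below is illustrative only: the function names are our own, and the displacement matrix is produced by a plain truncated matrix exponential, whose accuracy limits are examined in the following sections.

```python
import numpy as np
from scipy.linalg import expm

def displacement_matrix(xi, dim):
    """Truncated displacement operator built from a plain matrix exponential."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)  # truncated annihilation operator
    return expm(xi * a.conj().T - np.conj(xi) * a)

def heralded_state(gamma, xi, f, dim=40):
    """Fock coefficients of the state heralded by an ideal PNRD detecting f
    photons, together with the (truncated) success probability."""
    mu = np.tanh(gamma) ** np.arange(dim) / np.cosh(gamma)  # TMSV Schmidt weights
    psi = mu * displacement_matrix(xi, dim)[f, :]  # sum_i mu_i [D(xi)]_{f i} |i>
    prob = np.vdot(psi, psi).real
    return psi / np.sqrt(prob), prob
```

Heralding on f = 0 with ξ = 0 simply projects the TMSV onto vacuum with probability cosh⁻²γ, which provides a quick sanity check of the sketch.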
Possible applications of this scheme include construction of generally non-classical superpositions of Fock states 38,39 and, in particular, non-linearly squeezed non-Gaussian states 14,63 . Every application can be translated into constrained optimization of the tunable parameters with the constraint and objective functions embodying the nature of the particular application.
For example, if one were to construct a specific state |ψ� , the optimization objective could be to maximize some metric of similarity with the target state, e.g., fidelity. It would also be practical to construct the state with non-negligible probability of success. This requirement could be expressed either as a constraint P ≥ τ allowing only solutions with the probability greater than some threshold, or as an additional optimization objective in multi-objective optimization.
The constraint and objective functions generally involve the success probability (2) and the resulting density operator (1), both of which can be obtained by simulating the preparation procedure numerically on a classical digital computer. But alas, numerical simulations come with their own hurdles, which will be identified and subsequently addressed in the following sections.

Perils of numerical simulation of CV systems
Classical digital computers 68 encode information into finite sequences of bits and it is therefore impossible to represent arbitrary real numbers. The standard approach [59][60][61][62]69 is to approximate real numbers with floating point (FP) numbers. Real numbers are then rounded to their closest representable FP neighbors. This generally introduces rounding errors. To make matters worse, arithmetic with FP numbers does not necessarily produce exactly representable results. Results of FP arithmetic must be rounded, possibly introducing additional rounding errors [59][60][61][62]69 . Consequently complex sequences of arithmetic operations possess the potential to accumulate and even amplify rounding errors. Even the most straightforward tasks, such as adding up a sequence of FP numbers, can produce widely different results with varying degrees of accuracy based on the algorithm of choice [59][60][61] . Rounding error analysis is therefore a crucial part of algorithm design 58-61 and commonly used numerical algorithms are frequently accompanied by rigorous rounding error analysis. Nevertheless numerical simulation cannot be considered completely accurate as the error analysis only establishes upper bounds on the numerical errors 58,60-62 .
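The algorithm dependence of something as basic as summation is easy to demonstrate by pitting Python's naive left-to-right `sum` against the compensated `math.fsum`:

```python
import math

values = [0.1] * 10  # 0.1 is not exactly representable in binary FP

naive = sum(values)              # plain left-to-right accumulation
compensated = math.fsum(values)  # Shewchuk's error-compensated summation

# The naive loop accumulates ten rounding errors and misses 1.0 by one unit
# in the last place, while the compensated sum recovers 1.0 exactly.
```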
The practical concerns, when dealing with numerical simulation, are therefore always related to the size of the errors, rather than to their presence. This is a familiar concept in physics, a discipline well acquainted with the limited precision of measured quantities 70,71 . Numerical simulation of CV systems suffers from further issues related to the fundamental representation of quantum states and quantum operations. CV states reside in infinite-dimensional Hilbert spaces and can be, in principle, described in two distinct ways. The first description employs continuous functions, either wave functions given in position or momentum representation, or quasi-probability distributions 5-7 which combine the two quadratures. The practical issue with this approach is the continuous nature and generally infinite support of these functions: their support must be limited to finite intervals and both their domains and ranges discretized during numerical integration 61,62 , introducing additional numerical errors.
Alternatively we can take advantage of the discrete Fock basis spanned by eigenstates of the number operator. This basis is still infinite but, unlike a basis spanned by eigenstates of continuous operators, the number of its elements is countable. While exact representation of CV states in the Fock basis remains impossible, we can truncate the basis to a finite number of elements and approximate the original Hilbert space with this truncated, finite-dimensional restriction. We thus avoid discretization errors and deal with truncation errors instead. Consequently numerical simulations utilizing truncated Hilbert spaces spanned by the truncated Fock basis are often employed in detailed analysis of CV quantum circuits.

Formal definition of truncated Fock spaces. Let H∞ denote the original Hilbert space and let S = {|0⟩, |1⟩, ...} denote its Fock basis (FB). The truncated Fock basis (TFB) S_F = {|0⟩, ..., |F − 1⟩} retains only the first F elements of S. The basis therefore remains orthonormal. The linear hull of S_F forms the F dimensional truncated Fock space (TFS) H_F . So far we have only defined TFS itself and the transition from FB to TFB. In the following we define the transition of vectors from H∞ into H_F and of linear operators from L(H∞) into L(H_F). For a vector |ζ⟩ ∈ H∞ the truncation

trunc_F{|ζ⟩} := Σ_{i=0}^{F−1} ⟨i|ζ⟩ |i⟩

defines its truncated counterpart in H_F , and similarly for an operator Ô ∈ L(H∞) the truncation

trunc_F{Ô} := Σ_{i,j=0}^{F−1} ⟨i|Ô|j⟩ |i⟩⟨j|

defines its truncated analogue on L(H_F) . A natural extension of this approach allows for transitions from higher-dimensional spaces to lower-dimensional spaces.

Navigating truncated Fock spaces. In this description, pure quantum states become complex F dimensional vectors of numbers, linear operators turn into complex F × F matrices and the operations we would otherwise perform reduce to linear algebraic expressions such as matrix multiplication, Kronecker products and matrix traces. There is, however, a hefty price to be paid for this simplification, manifesting in the form of truncation errors with several distinct effects on the simulation. Firstly, it is impossible to represent general quantum states exactly. Take an arbitrary quantum state |ζ⟩ ∈ H∞ and its truncated variant trunc_F{|ζ⟩} ∈ H_F . The quality of the truncated state can be determined from its normalization, or rather the lack of it, using the cutoff error

E_F(ζ) := 1 − Σ_{i=0}^{F−1} |c_ζ(i)|²,   (3)

where c_ζ(i) = [|ζ⟩]_{S i} ≡ ⟨i|ζ⟩ are the vector components of the state |ζ⟩ in Fock representation. In essence the quality of the representation is loosely given by the support of the state relative to the dimension of the TFS. This is not the only conceivable metric, but it is a convenient one as it is straightforward to calculate.
Secondly, the algebraic structure of the space changes with the transition to finite dimension. As a result the usual commutation rules no longer apply, since for any pair of operators Ĝ and Ĥ the relation

trunc_F{[Ĝ, Ĥ]} = [trunc_F{Ĝ}, trunc_F{Ĥ}]

does not necessarily hold anymore. We can illustrate the change in algebraic structure on bosonic creation and annihilation operators. In the regular infinite-dimensional case we have [Â, Â†] = 1̂ , that is, the two operators commute to identity. With the truncated commutator trunc_F{[Â, Â†]} the result remains the same, an identity matrix of the corresponding dimension F . Conversely the commutator of the truncated annihilation and creation operators,

[trunc_F{Â}, trunc_F{Â†}] = 1̂_F − F |F − 1⟩⟨F − 1|,

differs from identity in the final element on the diagonal, which can be understood as a truncation error due to the product of two truncated matrices.
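The difference between truncating the commutator and commuting the truncated operators is immediate to verify numerically, for instance with NumPy:

```python
import numpy as np

F = 5
a = np.diag(np.sqrt(np.arange(1, F)), k=1)  # trunc_F of the annihilation operator

commutator = a @ a.conj().T - a.conj().T @ a  # [trunc_F{A}, trunc_F{A dagger}]

# The result is the identity everywhere except the final diagonal element,
# which equals -(F - 1) instead of 1.
```

The lone defective element sits exactly where the product of truncated matrices loses the contribution of the discarded Fock states.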
Thirdly and finally, replacing infinite-dimensional operators in arguments of operator functions with their truncated versions may not be without consequences. Consider an operator function f(Q̂) . In principle trunc_F{f(Q̂)} ≠ f(trunc_F{Q̂}) for general operator arguments. This has grave consequences for numerical simulation of unitary evolution. It is customary to approximate the exponential operator, trunc_F{exp(Q̂)} , with the matrix exponential expm(trunc_F{Q̂}) of the truncated operator argument 53 . However, this method can not be relied upon precisely because trunc_F{exp(Q̂)} ≠ expm(trunc_F{Q̂}) . We must therefore seek alternative approaches: there are three primary techniques available for numerical simulation. The first one relies on the knowledge of a closed form formula for elements of the unitary operator. It has to be derived analytically and is not always attainable. The second method, proposed in the recent paper 53 , is numerical and derives individual elements of unitaries by recurrent formulae. In the third approach the matrix exponential is simply computed with the truncated matrix argument as expm(trunc_F{Q̂}) and the dimension of the computation space is chosen large enough so that the errors are irrelevant in the particular simulation.
Neither approach is perfect. Each suffers from specific numerical errors. This is a valid concern even for the first method, which uses analytical forms: mathematical expressions, especially those involving factorials, large powers of non-negligible numbers, or special functions (themselves often defined through similar expressions or recurrent formulae), still need to be evaluated numerically with finite precision in floating point arithmetic, leading to introduction and eventual accumulation of rounding errors. The numerical errors cannot be straightforwardly calculated without a priori knowledge of the ideal operator or without thorough numerical analysis of rounding errors, an area of expertise that is mostly out of the scope of theoretical physics and therefore scarcely present in research reports.
In the following section we apply these methods of construction to the simplest experimentally testable example, coherent displacement, and use this particular case study to demonstrate the fundamental shortcomings of each approach.

The curious case of coherent displacement
Coherent displacement is a fundamental Gaussian operation in quantum optics used in a broad range of quantum protocols for quantum state preparation, manipulation, and measurement [5][6][7]13,14,66 . Coherent displacement is represented by the unitary operator

D̂(ξ) = exp(ξÂ† − ξ*Â),   (5)

where ξ ∈ ℂ gives the displacement amplitude and Â, Â† represent the annihilation and creation operators. It is one of the operations for which a closed form formula exists 66 , given as

⟨m|D̂(ξ)|n⟩ = √(n!/m!) ξ^{m−n} e^{−|ξ|²/2} L_n^{m−n}(|ξ|²),   m ≥ n,   (6)

where L^α_β(x) denotes the associated Laguerre polynomial function 72 . This relation only covers the lower triangular matrix; the rest of the matrix can be easily recovered from (6) using

⟨n|D̂(ξ)|m⟩ = ⟨m|D̂(−ξ)|n⟩*.   (7)

The formula (6) can be computed in multiple different ways with varying numerical accuracy impacted by the simplifications made in the expression and the order of their evaluation. When implemented exactly as it stands in (6), it is plagued by the limitations of FP arithmetic. Its first term, √(n!/m!) , underflows for comparatively large m , while the second term, ξ^{m−n} , overflows for |ξ| > 1 and large enough difference m − n . When the numerical underflow and the overflow coincide, the ill-defined expression 0 × ∞ is evaluated, resulting in error. We discuss the circumstances in detail in Sect. S1 of the Supplementary material and establish a set of acceptable combinations of the m , n and |ξ| parameters such that formula (6) is always well defined. We can utilize the recurrent formulae 53 or the plain matrix exponential 73,74 with a truncated argument instead of the closed form formula (6). While we can not ascertain their accuracy without a priori knowledge of the ideal operator, we can easily determine whether the generated matrices G are outright incorrect by checking the normalisation of displaced truncated Fock states {|0⟩^(F), ..., |F − 1⟩^(F)} .
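For illustration, the closed form formula can be implemented with SciPy's associated Laguerre polynomials. In this sketch of ours the factorial ratio is evaluated through log-gamma functions, which postpones, but does not remove, the underflow and overflow discussed above:

```python
import numpy as np
from scipy.special import eval_genlaguerre, gammaln

def displacement_element(m, n, xi):
    """<m|D(xi)|n> via the closed form Laguerre formula; the hermiticity
    D(xi)^dagger = D(-xi) supplies the upper triangle (m < n)."""
    if m < n:
        return np.conj(displacement_element(n, m, -xi))
    log_ratio = 0.5 * (gammaln(n + 1) - gammaln(m + 1))  # sqrt(n!/m!) in log space
    return (np.exp(log_ratio - np.abs(xi) ** 2 / 2) * xi ** (m - n)
            * eval_genlaguerre(n, m - n, np.abs(xi) ** 2))
```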
It corresponds to the sum of squared absolute values of elements in the jth column of the truncated displacement matrix G := trunc_F{D̂(ξ)} , or of its approximation employing the matrix exponential expm(trunc_F{Q̂}) with truncated argument,

N_j := Σ_{i=0}^{F−1} |[G]_{ij}|²,   (8)

where we set Q̂ := ξÂ† − ξ*Â . In Fig. 2 we show the normalisation (8) for trunc₁₀₁{D̂(3 − 2ı)} constructed using the closed form formula (6), represented with a blue solid line, the recurrent formula 53 shown with a red dash-dotted line, and approximated with the matrix exponential (black dashed line). We utilize double precision 60,69 in the computation and try to avoid numerical issues plaguing the direct method (6) by keeping the working dimension sufficiently low. There are two regions of qualitatively distinct behaviour in the plot. The first region, spanning the first 40 Fock states, shows correct normalization for all three methods of construction. In the following region the normalisation dwindles for both the closed form and the recurrent formulae, whilst the matrix exponential incorrectly remains normalized. It remains normalized only because the matrix exponential function, by definition, produces unitary matrices from anti-Hermitian arguments. Unitarity is not the desired outcome here since the goal is to obtain the correct trunc₁₀₁{exp(Q̂)} matrix rather than the computed approximation expm(trunc₁₀₁{Q̂}).
Let us explicitly discuss the issue at hand. The displacement operator (5) is unitary by definition. Columns of its matrix representation can be understood as coefficient vectors of displaced Fock states. In the infinite-dimensional case these states should be normalized, that is, the vector 2-norm 75 of each column should satisfy ‖D(ξ)_j‖₂ ≡ 1 ∀ j ∈ S . However, this will not generally hold in finite dimension, where we can find a threshold state |τ⟩^(F) ∈ S_F that, when displaced, will not be properly represented on the TFS. The states j ≥ τ will suffer from non-negligible cutoff errors (3), making their normalization ‖trunc_F{D(ξ)}_j^(F)‖₂ < 1.
The plot in Fig. 2 reveals that when the matrix is constructed via (6), the higher states are correctly denormalized. Conversely the matrix exponential produces incorrectly normalized states. In this context such behavior can be considered a manifestation of truncation errors.
The normalisation of the recurrently computed matrix starts to rise exponentially somewhere around j ≈ 50 due to accumulation of rounding errors. This behavior depends on the chosen ξ and the breakdown is more prominent when ξ is large. Here the displacement ξ = 3 − 2ı was chosen to emphasize this effect. For instance, when ξ = 1 , a similar exponential breakdown appears for j ≈ 400 instead.

Truncated approximate matrix exponential (TAME)
So far we have seen that, when it comes to numerically generating truncated representations of unitary operators, both direct calculation and the recurrent formulae have fundamental issues leading to significant rounding errors or numerically invalid expressions. The matrix exponential function avoids these issues mostly at the cost of truncation errors and their subsequent amplification. However, the observations in Fig. 2 also suggest that these errors tend to be significant only in higher regions of said matrices. This opens up a new possibility of approximating the exponential operators. We can use the matrix exponential on a sufficiently higher dimension d₁ and only then truncate the result to the required d₀ , thus avoiding the erroneous areas, while, at the same time, keeping the computational dimension d₁ low enough to avoid needlessly increasing the time of computation. We call this approach truncated approximate matrix exponential (TAME). Consider the approximation of the truncated displacement operator,

trunc_{d₀}{D̂(ξ)} ≈ trunc_{d₀}{expm(trunc_{d₁}{Q̂})}.   (9)

Here d₁ represents the initial working dimension and d₀ the final dimension of the target TFS. Following (5) we set Q̂ := ξÂ† − ξ*Â . In Fig. 3 we compare trunc₁₀₁{D̂(3 − 2ı)} constructed using the closed form formula (6) and approximated with TAME. We chose the dimension d₀ and the displacement magnitude |ξ| to accommodate the limits established in Sect. S1 of the Supplementary material. The secondary dimension d₁ = 161 was chosen high enough to suppress the effects of truncation errors. The plot suggests that our method produces results equal to the closed form formula in terms of the normalisation (8). Further comparison of individual matrix elements reveals that, on average, the approximate matrix matches (6) up to 14 decimal places with the worst difference matching only up to 11 decimal places.
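A minimal sketch of the TAME prescription in Python, with `scipy.linalg.expm` standing in for the scaling-and-squaring matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def tame_displacement(xi, d0, d1):
    """Approximate trunc_{d0}{D(xi)} by exponentiating on a larger
    d1-dimensional space first and truncating the result afterwards."""
    assert d1 >= d0
    a = np.diag(np.sqrt(np.arange(1, d1)), k=1)  # trunc_{d1} annihilation operator
    q = xi * a.conj().T - np.conj(xi) * a        # trunc_{d1}{Q}, anti-Hermitian
    return expm(q)[:d0, :d0]                     # keep only the d0 x d0 block
```

With d₁ sufficiently above d₀ the retained block agrees with the closed form formula (6); setting d₁ = d₀ recovers the plain, truncation-error-prone matrix exponential of Fig. 2.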
What remains to be determined is the proper choice, or rather the methodology of choosing a sufficiently large working dimension d 1 given the target dimension d 0 . In the subsequent paragraphs we are going to show that it is practical to set the dimension d 1 as small as possible. The de facto standard scaling and squaring matrix exponentiation algorithm 73,74 relies on matrix multiplication with the actual number of matrix products depending on the binary logarithm of the 1-norm 75 of the exponentiated matrix.
The 1-norm 75 of the trunc_{d₁}{Q̂} argument inside the matrix exponential within (9) reads

‖trunc_{d₁}{Q̂}‖₁ = |ξ| (√(d₁ − 1) + √(d₁ − 2)) ≈ 2|ξ|√d₁,

where the final approximation holds asymptotically. Therefore the matrix exponential in (9) asymptotically requires O(log₂ d₁) matrix products, each of which can be computed in O(d₁^2.807) steps with the Strassen algorithm. Consequently the computational complexity of (9) scales as O(d₁^2.807 log₂ d₁) under optimal conditions. It is therefore imperative to keep the dimension d₁ as low as possible. We propose a simple iterative algorithm for finding the optimal d₁ . Suppose a sufficiently sized expm(trunc_q{Q̂}) matrix is correct on some region spanning {|0⟩^(u), ..., |u − 1⟩^(u)} where u ≤ q . Suppose the matrix exponential ( expm ) algorithm is also consistent: for a differently sized trunc_p{Q̂} matrix with dimension p > q the computed matrix exponential is correct on a region of at least the same size. Given these assumptions, which are upheld by the standard expm implementation 73,74 , we introduce Algorithm 1 as follows. First we take the desired dimension d₀ of the correct region and set an equality tolerance ε₁ for small numbers: our condition with ε₁ = 10⁻¹³ proclaims two numbers identical if they match up to their twelfth decimal place. Then we search for a pair of larger matrices such that their d₀ regions match. The search process is significantly simplified by taking the dimension of the second larger matrix to be constantly shifted from the first larger matrix. To improve its speed we always recycle one of the matrices in the next iteration instead of recalculating it every time. The depth of the search is specified by the factor h . In our experience the dimension is found somewhere well below q = 3 · d₀ in the case of displacement, hence we set the depth h above that. Once the search algorithm finishes successfully, we obtain d₁ .
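The search just described can be sketched as follows; the parameter names `shift` and `h` and the callback `q_of`, which builds trunc_dim{Q̂} for the operator being exponentiated, are our own illustrative choices:

```python
import numpy as np
from scipy.linalg import expm

def find_dimension(q_of, d0, eps1=1e-13, shift=8, h=10):
    """Iteratively search for a working dimension d1 such that the d0 x d0
    block of expm(trunc_{d1}{Q}) has converged to within eps1."""
    previous = expm(q_of(d0))
    for q in range(d0 + shift, h * d0 + 1, shift):
        current = expm(q_of(q))  # the previous matrix is recycled, not recomputed
        if np.max(np.abs(previous[:d0, :d0] - current[:d0, :d0])) < eps1:
            return q
        previous = current
    raise RuntimeError("no converged dimension found below h * d0")
```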

Verification of approximated matrices
In general, we can not verify the matrix (9) constructed via TAME simply by comparing its elements against some exact solution for the obvious reason: if we knew the exact solution we would not be in this situation in the first place.
We have used normalisation (8), or more precisely the implied necessary condition of unitarity max_{ij} |[G]_{ij}| ≤ 1 , to detect outright incorrect matrices in Fig. 2, but alas, necessary conditions alone can not be used to prove the matrix correct. In Fig. 2 we determined that employing the recurrent formula 53 in construction of trunc_F{D̂(ξ)} was ill-advised due to accumulation and consequent amplification of rounding errors over the course of the computation. While we can not safely use the recurrent formula to construct an arbitrary truncated displacement matrix, we can use it to determine whether a candidate matrix, for example one constructed via TAME (9), possesses the appropriate structure, as the formulae define relations between neighboring matrix elements.
We can repurpose the recurrence relations to compute the differences between each matrix element and its value predicted from its neighbors, obtaining an element-wise measure of structural consistency (11). In Fig. 4 (a) the matrix constructed via TAME is structurally correct, with the maximal difference still matching up to 11 decimal places. On average the differences fall below the unit round-off, essentially making the errors negligible. In Fig. 4 (b) the matrix constructed using the plain matrix exponential maintains the correct structure in the first third of its columns; however, the truncation errors begin to manifest at that point. This can be observed as an exponential explosion in the maximal difference (around the 75th column) and a steady rise in the mean value. We saw a similar manifestation of truncation errors in Fig. 2, where the columns incorrectly retained their normalization as if the truncated matrix remained unitary.

Numerical simulation of the preparation circuit
The CV nature of the preparation scheme in Fig. 1, described with relations (1) and (2), makes its exact numerical simulation not only impractical, but outright impossible. We can, however, perform an approximate numerical simulation of the formulae on a TFS. We have already proposed TAME as the method for approximating the truncated displacement operator. We have yet to ascertain a key ingredient of the simulation: the optimal dimension d₀ of the TFS, which should be large enough to support all the quantum states occurring in the simulation. Following Fig. 1, we begin with the TMSV state. One of its modes is attenuated by the G^η₂ channel. This only reduces its energy and, as a consequence, the required support shrinks in size. We can therefore safely disregard the attenuating channel and simplify the expression for the marginal state (1) into ϱ ∝ tr₂[ Ω̂₂(π) D̂₂(ξ) |γ⟩₁,₂⟨γ| D̂₂(ξ)† ] . We then require that both the initial and the displaced TMSV states are faithfully approximated on the d₀ dimensional TFS for all the possible values of γ and ξ . By taking the largest displacement ξ⋆ and squeezing rate γ⋆ considered in the simulation, we can iteratively determine d₀ as the least dimension such that the cutoff error (3) falls below some threshold ε₀ . This condition reads

1 − Σ_{i,j=0}^{d₀−1} | [D̂(ξ⋆)]_{S ij} [|γ⋆⟩]_{S jj} |² < ε₀

for the displaced TMSV state. While the coefficients [|γ⋆⟩]_{S ii} = cosh⁻¹γ⋆ tanhⁱγ⋆ of the TMSV state are determined trivially, the matrix elements [D̂(ξ⋆)]_{S ij} of the displacement operator can not, in general, be obtained analytically with (6) and we must employ alternate means such as TAME.
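For the bare TMSV state the cutoff error (3) reduces to tanh^{2d₀}γ, so the search for d₀ collapses to a short loop. The sketch below is simplified: it omits the displacement D̂(ξ⋆), which the full condition includes.

```python
import numpy as np

def cutoff_error(coefficients):
    """Cutoff error (3): the norm deficiency of a truncated pure state."""
    return 1.0 - np.sum(np.abs(coefficients) ** 2)

def tmsv_dimension(gamma_max, eps0=1e-12):
    """Least d0 such that the TMSV with squeezing gamma_max fits the TFS."""
    d0 = 1
    while True:
        mu = np.tanh(gamma_max) ** np.arange(d0) / np.cosh(gamma_max)
        if cutoff_error(mu) < eps0:
            return d0
        d0 += 1
```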
In the following sections we use the numerical methodology we developed to determine the benefits of using PNRD, APD, and APD cascades in a pair of applications of the preparation circuit. First we discuss the preparation of non-linearly squeezed states (Section "Engineering non-linearly squeezed states") and then follow with the construction of well defined non-classical superpositions of Fock states (Section "Preparation of high fidelity qubit in Fock basis"). In both applications the figures of merit are functions of the resulting density matrix (1) and the associated probability of success (2). We approach the analysis with a rudimentary grid based exploratory strategy for optimization. We divide the region [0 ≤ γ ≤ 1] ⊗ [0 ≤ ξ ≤ 1] into an equidistant 1001 × 1001 grid of points pⱼ := (γⱼ, ξⱼ) and evaluate the numerically approximated relations (1) and (2) for each point pⱼ and each experimental scenario qᵢ := (ηᵢ, πᵢ), defined by the overall efficiency ηᵢ ∈ {0.80, 1.00} of the setup and the expected measurement outcome πᵢ respective to the POVM elements. These entail the click of the ideal APD and the first six PNRD outcomes relevant in our preparation scheme, as well as PNRD approximations employing APD cascades, where Π̂ᴹₙ := Σ_{k=0}^{d₀−1} p_M(n|k) |k⟩₂⟨k|. The POVM elements Π̂ᴹₙ represent outcomes where exactly n detectors click within an APD cascade comprising M detectors 52 . The individual probabilities p_M(n|k) are given by relation (14). This way we procure an assortment of probabilities P(i, j) and density matrices ̺(i, j) corresponding to the qᵢ and pⱼ sequences. We then utilize these values in objective and constraint functions, discussed in detail in the following sections, to analyze the performance of the preparation scheme in particular applications and its response to different experimental configurations.
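A plausible closed form for the conditional click statistics p_M(n|k) assumes that k photons split uniformly across M balanced detectors, which is the standard multiplexing model behind cascade detection 52 . We cannot reproduce the paper's exact relation (14) here, so the inclusion-exclusion formula below should be read as our assumption.

```python
import numpy as np
from math import comb

def p_click(M, n, k):
    # Probability that exactly n of M equally illuminated APDs click when
    # k photons enter the cascade (uniform splitting, inclusion-exclusion).
    return comb(M, n) * sum((-1) ** j * comb(n, j) * ((n - j) / M) ** k
                            for j in range(n + 1))

def povm_element(M, n, d0):
    # Diagonal POVM element for the outcome "exactly n of M detectors click"
    # on a d0-dimensional truncated Fock space.
    return np.diag([p_click(M, n, k) for k in range(d0)])
```

For example, two photons entering a two-detector cascade bunch into a single detector with probability 1/2, so exactly one click occurs half of the time.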

Engineering non-linearly squeezed states
Nonlinear squeezing was originally introduced 51 as a measure quantifying the quality of approximate cubic states suitable for optical measurement-induced quantum gates 37,51 . It has been shown to apply to higher order phase squeezing gates as well 37 and was recently discussed in detail 67 . The ideal cubic operation facilitates unitary evolution with an interaction Hamiltonian proportional to X³. When approximately implemented in the measurement-induced fashion 51 its action in the Heisenberg picture can be represented by the operator transformation X_S → X_S, P_S → P_S + νX_S² + (P_A − νX_A²), where the X_S, P_S operators correspond to the signal state and X_A, P_A to some ancillary mode. The first terms of both relations correspond to the ideal cubic interaction exp(iνX³/3). The additional term (P_A − νX_A²) represents the nonlinear quadrature of the ancillary mode and embodies the undesirable noisy contribution. It can be suppressed by choosing an appropriate ancillary state with the right structure. Effects of this contribution, or more precisely its variance and mean, vanish for the ideal cubic state. In general, neither the variance nor the mean vanishes for physical approximations of the ideal cubic state. Good approximations, however, minimize their values, and consequently the variance of this contribution may be used to quantify the quality of these approximate cubic states.
The preparation scheme presented in Fig. 1 can be utilized for production of quantum states approximating the ideal cubic state. We have discussed the methodology of the simulation in detail in Section Numerical simulation of the preparation circuit. In essence we search for optimal values of squeezing γ and displacement ξ that lead to high quality cubic state approximations while maximizing the probability of successful preparation. The optimization is performed for various experimental scenarios involving different detectors and taking a range of overall losses into account.
To measure the approximation quality we adapted the nonlinear quadrature and the concept of nonlinear squeezing discussed in 51 to fit our simulation. We employ the nonlinear quadrature Ŷ = μP − 2^(−1/2) μ⁻² X² and use its variance V(ρ) = ⟨(Ŷ − ⟨Ŷ⟩_ρ)²⟩_ρ to measure the nonlinear squeezing of arbitrary states ρ. Potential effects of Gaussian squeezing 51,67 on V(ρ) are eliminated by minimizing over the parameter μ. Consequently we base our analysis on the minimized quantity M(ρ) := G⁻¹ min_μ V(ρ), normalized with respect to the minimal variance G := min_{ρ_G} min_μ V(ρ_G) ≡ 0.75 achievable by Gaussian states ρ_G 67 .
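A numerical sketch of this evaluation follows. It is purely illustrative: the paper minimizes over μ analytically and deliberately avoids operator matrices, whereas we use truncated matrices and a grid over μ. We assume the convention X = (a + a†)/√2, under which the quoted Gaussian bound G = 0.75 is saturated by the vacuum.

```python
import numpy as np

d = 30                                        # TFS dimension for the sketch
a = np.diag(np.sqrt(np.arange(1, d)), k=1)    # annihilation operator
X = (a + a.conj().T) / np.sqrt(2)             # quadratures with [X, P] = i
P = (a - a.conj().T) / (1j * np.sqrt(2))

def nonlinear_variance(rho, mu):
    # Variance of the nonlinear quadrature Y = mu * P - mu^-2 X^2 / sqrt(2).
    Y = mu * P - X @ X / (np.sqrt(2) * mu ** 2)
    mean = np.trace(rho @ Y).real
    return np.trace(rho @ Y @ Y).real - mean ** 2

def M_of_rho(rho, mus=np.linspace(0.2, 5.0, 2000)):
    # Grid minimization over mu, normalized by the Gaussian bound G = 0.75.
    return min(nonlinear_variance(rho, mu) for mu in mus) / 0.75

vacuum = np.zeros((d, d)); vacuum[0, 0] = 1.0  # vacuum density matrix
```

The vacuum, being Gaussian, should saturate the bound: min_μ V equals 0.75 at μ = 1, so M(vacuum) = 1.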
The numerical simulation yields density matrices ̺(i, j) along with the P(i, j) probabilities of success corresponding to different experimental parameters. We then compute the individual moments required in the calculation of V (ρ) from the elements of density matrices ̺(i, j) . We avoid the matrix representation of the operators in the computation to avert truncation errors and employ closed form formulae instead. The minimization with respect to µ within M(ρ) is solved analytically.
We thus obtain M(i, j) values for their respective ̺(i, j) matrices and P(i, j) probabilities. We then divide the dataset corresponding to each experimental scenario q i into bins based on values of the variance M(i, j) and find the maximal attainable probability P(i, j) within each bin.
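The binning step can be sketched with numpy; the function name and bin conventions are ours.

```python
import numpy as np

def pareto_bins(M_vals, P_vals, edges):
    # Assign every (M, P) pair to a bin over M and keep the maximal success
    # probability attained inside each bin (empty bins report 0).
    idx = np.digitize(M_vals, edges)
    best = np.zeros(len(edges) + 1)
    for i, p in zip(idx, P_vals):
        best[i] = max(best[i], p)
    return best

# Toy data: two points fall into the first bin, one into the second.
best = pareto_bins(np.array([0.1, 0.2, 0.8]),
                   np.array([0.5, 0.7, 0.9]), edges=[0.5])
```

Sweeping the bin in which the maximum is taken traces out the attainable probability as a function of the variance, which is the trade-off curve shown in Fig. 5.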
In Fig. 5 we present a comparison of the attainable variance M(ρ) as a function of the success probability P. We examine different detection outcomes, in particular the PNRD projection onto |3⟩ (red line) and its three approximations realized through an APD cascade 52 where three APD detectors out of four (dashed magenta), five (magenta) and ten (blue) click. Their respective POVM elements Π̂⁴₃, Π̂⁵₃ and Π̂¹⁰₃ are given by relation (14). We consider a single APD detector (black line) as well. The plots show (a) 99%, (b) 90%, and (c) 80% transmission efficiency η. The results are normalized with respect to the minimal variance achievable by Gaussian states. The optimal cubic state approximations 51 v⋆ ∈ H_v constructed on v-dimensional TFS are marked with dashed horizontal lines. These states were found by searching for pure states spanning the first v Fock states that minimize the variance M(ρ) of the non-linear quadrature 51 . Their inclusion makes a qualitative comparison with the states produced by our scheme possible.
In general, using PNRD yields the best results. In the idealized scenario with 99% efficiency the PNRD projecting onto |3⟩ approaches the variance of the optimal H₄ non-linearly squeezed state 4⋆. It also attains the best values consistently across the considered transmission efficiencies, therefore producing comparatively better approximations of the cubic state than either the APD cascades or a single APD. In the 90% and 99% regimes, the APD cascade comprising ten detectors promises better performance than a single APD or any other cascade configuration. In the low-efficiency regime (80%) we can see that a single APD outperforms APD cascades for probabilities of success greater than 5%. This can be attributed to the imperfections inherent to APD cascades 52 . Their flaws become emphasized with increased loss, rendering a single APD the better choice.
In conclusion, unless a PNRD capable of distinguishing at least three photons is available, it is advantageous to use a single APD in any practical scenario with non-ideal transmission efficiency, as long as success probabilities larger than approximately 5% are desired. The advantage of a single APD can be offset only by using an exorbitant number of detectors within an APD cascade.

Preparation of high fidelity qubit in Fock basis
The qubit-like superposition |ϑ⟩ := cos ϑ|0⟩ + sin ϑ|1⟩ represents one of the simplest non-Gaussian quantum states of light. It serves an important role in quantum information processing 84 and is one of the resources available in contemporary experimental quantum optics 85-87 . As such it has been employed in experimental demonstrations of various theoretical concepts including witnessing of non-Gaussianity 88-91 and hybrid entanglement 44,92 in quantum communication.
This family of quantum states can be produced with the preparation scheme we have previously introduced in Fig. 1. We can search for the optimal squeezing γ and displacement ξ parameters to obtain a given target state |ϑ⟩ with sufficient fidelity F = ⟨ϑ|ρ|ϑ⟩ and maximal performance in terms of success probability.
We compute the corresponding Fϑ(i, j) values for the P(i, j) probabilities and ̺(i, j) density matrices obtained in the simulation described in detail in Section "Numerical simulation of the preparation circuit". We then divide the dataset for each experimental scenario qᵢ into bins comprising subsets of data satisfying Fϑ(i, j) ≥ τ, where τ specifies a moving fidelity threshold. The maximal attainable probability P(i, j) is then found for each subset.
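The filtering procedure can be sketched as follows; the helper names are ours, and the fidelity with the pure target reduces to a quadratic form in the density matrix.

```python
import numpy as np

def qubit_fidelity(rho, theta):
    # F = <theta| rho |theta> for the target cos(theta)|0> + sin(theta)|1>.
    target = np.zeros(len(rho), dtype=complex)
    target[0], target[1] = np.cos(theta), np.sin(theta)
    return (target.conj() @ rho @ target).real

def success_vs_threshold(F_vals, P_vals, taus):
    # Maximal success probability among grid points whose fidelity meets the
    # moving threshold tau (0 when no point qualifies).
    F_vals, P_vals = np.asarray(F_vals), np.asarray(P_vals)
    return [P_vals[F_vals >= tau].max(initial=0.0) for tau in taus]
```

Raising τ shrinks the admissible subset, so the resulting curve of maximal probability versus τ is monotonically non-increasing.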
In Fig. 6 we demonstrate the relative improvement in the probability of successfully engineering |ϑ⟩ states by employing different detectors instead of a single APD. We consider a pair of target states, |π/3⟩ and |π/6⟩, both evaluated for 99% and 80% transmission efficiencies. These target states were chosen to probe the improvement for unbalanced superpositions biased either towards |0⟩ or |1⟩ states. In the plot we show the result obtained for projection onto |1⟩ (red line) realized by PNRD and the results obtained with its approximations realized through APD cascades where a single detector out of ten (Π̂¹⁰₁, magenta), five (Π̂⁵₁, blue) and two (Π̂²₁, black) clicks. The POVM elements Π̂ᴹₙ of the cascades were defined in (14). The figure of merit is defined as L := log₁₀ P_• − log₁₀ P_APD, with P_• respective to the individual detection outcomes.

Figure 5. A comparison of the attainable variance M(ρ) as a function of success probability. The variance is normalized with respect to the minimal variance achievable by Gaussian states. We use the same vertical and horizontal axes in all plots to show the contrast between the almost ideal (a) and lossy (b, c) scenarios with 99%, 90% and 80% transmission efficiencies. Horizontal dashed lines mark the optimal cubic state approximations v⋆ ∈ H_v constructed on low-dimensional TFS. We encode the information about the POVM elements as follows: APD click with a solid black line, PNRD projection onto |3⟩ with solid red, APD cascades comprising four (Π̂⁴₃, dashed magenta), five (Π̂⁵₃, magenta) and ten (Π̂¹⁰₃, blue) detectors where three detectors click. Overall, utilizing the PNRD |3⟩ (solid red) produces states with the lowest non-linear variance, therefore producing comparatively better approximations of the cubic state. In both (b) and (c) a single APD outperforms the APD cascades comprising five and four detectors for probabilities greater than 1%. In this regime the cascade comprising ten detectors still offers an advantage over a single APD. In (c) a single APD outperforms APD cascades comprising either four, five or ten detectors for success probabilities larger than roughly 5%.

In general, conditioning on the PNRD |1⟩ detection outcome yields the best results. In the high-fidelity regime with 99% efficiency the relative improvement is roughly tenfold for |π/3⟩ and roughly fourfold for |π/6⟩. The APD cascades comprising ten and five detectors follow. The relative lead of the PNRD diminishes in the 80% efficiency regime. Its advantage also dwindles when we consider target states closer to |0⟩, such as the |π/6⟩ state. While the cascade comprising two detectors falls short in every case, it still outperforms a single APD, albeit not by much.

Conclusions
We have analyzed the numerical accuracy of several currently available methods 53,66 used in the construction of the truncated coherent displacement operator, an essential ingredient of state preparation in quantum optics 14,30,32,36 and of many protocols used in quantum information processing 5-7,13,14,66 . We have proposed an alternative approach promising better accuracy. Our method is based on the standard matrix exponential 73,74 with truncated argument. We compute the matrix exponential on a higher-dimensional space and truncate the resulting matrix to the target dimension, thus stripping erroneous matrix elements away from the truncated displacement operator. To avoid negatively impacting computational performance, the higher dimension should ideally be kept as low as possible. To this end we provide an off-line search algorithm that can be used to determine its optimal value. To ascertain the accuracy of the resulting matrix we complement the construction method with a verification strategy based on the recurrent formulae discussed in 53 .
We have used our construction method for analysis of a non-Gaussian state preparation scheme based on suitable manipulation of a two mode squeezed vacuum with subsequent photon counting measurement 14,39,63,64 in the context of engineering non-linearly squeezed cubic states 14,38,51 for measurement induced cubic gates 14,37,51 and construction of qubit-like superpositions between vacuum and single photon states. The latter application can be verified experimentally with currently available technology. We have compared the effects of different detection mechanisms, including APD, PNRD and its approximations using APD cascades 52 with a varying number of APD detectors, to determine a practical approach towards state preparation. In our analysis we have optimized the free parameters of the preparation scheme, the initial squeezing and the displacement, to attain optimal results in both applications. This analysis also provides an additional metric which can be used to quantify the quality of APD cascades. We have found that in practical applications when PNRD is not available, using a single APD to engineer non-linearly squeezed states offers better performance compared to employing APD cascades comprising a small number of detectors. We attribute this counter-intuitive result to the imperfections inherent to APD cascades 52 which are exaggerated with increased loss; these flaws became significant at 20% overall loss. The primary cause of this behaviour lies within the employed avalanche detectors, as a single click may be triggered by multiple photons. While this is a critical issue for multi-photon state engineering, it is not as significant for single-photon states. We have determined that using an APD cascade, even one comprising only a pair of APD detectors, improves upon using a single APD in the preparation of high-fidelity non-Gaussian qubit-like superpositions.

Figure 6. Benchmarking the performance of PNRD and its approximations using APD cascades relative to a single APD detector in the preparation of particular superpositions |ϑ⟩ := cos ϑ|0⟩ + sin ϑ|1⟩ parametrized with ϑ ∈ ℝ. The PNRD projection onto |1⟩ is represented by the red line, whereas the magenta line corresponds to an APD cascade comprising ten detectors where a single detector clicks (Π̂¹⁰₁), the blue line to a cascade of five detectors (Π̂⁵₁) and the black line depicts the case with two detectors (Π̂²₁). The plots demonstrate preparation of two distinct states while considering different transmission efficiencies. In (a) and (c) we aim to prepare |π/3⟩. In (b) and (d) we target |π/6⟩. In plots (a) and (b) we consider 99% transmission efficiency, while in (c) and (d) we consider a mere 80%. The horizontal dashed line marks a twofold improvement in each plot. (a) In the high-efficiency regime we obtain a roughly tenfold improvement in the high-fidelity preparation of the |π/3⟩ state. The PNRD approximations using more than two detectors offer a significant improvement as well. (b) While the advantage of PNRD is reduced when targeting the state biased towards |0⟩, it still offers roughly four times better performance. (c) The PNRD detector and its approximations offer two to three times higher success probability even in the lower-efficiency scenario. (d) The PNRD detector and its approximations offer a roughly twofold improvement.

Our circuit variant can be extended to utilize multiple displacements and detectors. Similarly, the proposed method for numerical construction of truncated unitary operators is not limited to the displacement operator; it can be applied to, for example, squeezing or cubic phase-shift operators. Furthermore, the method could be employed in the preparation of a wider variety of quantum states with practical applications, such as GKP states 14 .

Data availability
The datasets generated and analyzed in the current study are available from the corresponding author on reasonable request.

Code availability
The source code used to generate and analyze the datasets is available from the corresponding author on reasonable request.