Theoretical error performance analysis for variational quantum circuit based functional regression

Noisy intermediate-scale quantum devices enable the implementation of the variational quantum circuit (VQC) for quantum neural networks (QNNs). Although VQC-based QNNs have succeeded in many machine learning tasks, the representation and generalization powers of the VQC still require further investigation, particularly when the dimensionality of classical inputs is concerned. In this work, we first put forth an end-to-end QNN, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional regression. We then analyze the error performance of the TTN-VQC in terms of representation and generalization powers, and we characterize the optimization properties of the TTN-VQC by leveraging the Polyak-Lojasiewicz (PL) condition. Moreover, we conduct experiments of functional regression on a handwritten digit classification dataset to justify our theoretical analysis.


INTRODUCTION
The advent of quantum computing devices opens up new possibilities for exploiting quantum machine learning (QML) [1,2,3] to improve the efficiency of classical machine learning algorithms in many new scientific domains like drug discovery [4] and efficient solar conversion [5]. Although the exploitation of quantum computing devices to carry out QML is still in its early exploratory stage, the rapid development of quantum hardware has motivated advances in quantum neural networks (QNNs) that run on noisy intermediate-scale quantum (NISQ) devices [6,7,8,9,10], where not enough qubits can be spared for quantum error correction and the imperfect qubits have to be directly employed at the physical layer [11,12,13]. To cope with these constraints, a compromised QNN has been proposed as a quantum-classical hybrid model that relies on the optimization of a variational quantum circuit (VQC) [14,15]. The resilience of the VQC to certain types of quantum noise errors, together with its high flexibility concerning coherence time and gate requirements, allows the VQC to be applied to many promising applications on NISQ devices [16,17,18,19,20,21,22,23]. Although many empirical studies of the VQC for quantum machine learning have been reported, its theoretical understanding requires further investigation in terms of representation and generalization powers, particularly when a non-linear operator is employed for dimensionality reduction. This work introduces a tensor-train network (TTN) on top of the VQC model to implement a TTN-VQC. The TTN is a non-linear operator that maps high-dimensional features into low-dimensional ones; the resulting low-dimensional features then go through the framework of the VQC. Compared with a hybrid model where the operation of dimensionality reduction is carried out by a classical neural network (NN) [24], a TTN can be genuinely realized by utilizing universal quantum circuits [25,26,19], so that an end-to-end quantum neural network can be truly set up. In this work, we discuss the theoretical
performance of the TTN-VQC in the context of functional regression. Functional regression refers to building a vector-to-vector operator such that the regression output can approximate a target operator. In more detail, given a Q-dimensional input vector space R^Q and a measurable U-dimensional output vector space R^U, the TTN-VQC-based vector-to-vector regression aims to find a TTN-VQC operator f: R^Q → R^U such that the output vectors of f can approximate a desirable target. In particular, this work concentrates on the error performance analysis for TTN-VQC-based functional regression by leveraging the error decomposition technique [27] to factorize an expected loss over the TTN-VQC operator into the sum of the approximation error, estimation error, and training error. We separately upper bound each error component by harnessing statistical machine learning theory. More specifically, we define F_TV as the TTN-VQC hypothesis space, which represents a collection of TTN-VQC operators. Then, given a data distribution D, a smooth target function h*_D, and a set S of N training data drawn independently and identically distributed from D, for a loss function ℓ, an expected loss over an operator f ∈ F_TV is defined as

L_D(f) = E_{x∼D}[ ℓ(f(x), h*_D(x)) ],

which can be minimized by using an empirical loss as

L_S(f) = (1/N) Σ_{n=1}^{N} ℓ(f(x_n), h*_D(x_n)).

Since the mean absolute error (MAE) [28] is 1-Lipschitz continuous [29], the loss function ℓ is set as the MAE. Furthermore, we separately define f*_D, f*_S, and f̂_S as the optimal TTN-VQC operator, the empirical optimal operator, and the operator actually returned by training. Then, as shown in Figure 1, the error decomposition technique [27] factorizes the expected loss L_D(f̂_S) into three error components, which can be bounded as

L_D(f̂_S) ≤ L_D(f*_D) + 2 R̂_S(F_TV) + ν,

where L_D(f*_D) is associated with the approximation error, R̂_S(F_TV) is the empirical Rademacher complexity [30] over the family F_TV, and ν refers to the training error that results from the optimization bias of gradient-based algorithms. The Rademacher
complexity R̂_S(F_TV) measures the model complexity and is particularly suitable for the regression problem [27]. In this work, our theoretical results concentrate on the error analysis by upper-bounding each error component, and our empirical results are presented to corroborate our theoretical analysis.
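Since the analysis relies on the loss ℓ being 1-Lipschitz, the following quick numerical check (our own illustration, not part of the original derivation; `mae` is a hypothetical helper name) sketches the MAE's Lipschitz property with respect to the ℓ1-norm of its input:

```python
import numpy as np

def mae(y_pred, y_true):
    """Mean absolute error: the 1-Lipschitz loss used throughout the paper."""
    return np.mean(np.abs(y_pred - y_true))

rng = np.random.default_rng(0)
y_true = rng.normal(size=10)

# |MAE(y1, t) - MAE(y2, t)| <= (1/U) * ||y1 - y2||_1 <= ||y1 - y2||_1,
# i.e. the MAE is 1-Lipschitz with respect to the l1-norm of its argument.
violations = 0
for _ in range(1000):
    y1, y2 = rng.normal(size=10), rng.normal(size=10)
    lhs = abs(mae(y1, y_true) - mae(y2, y_true))
    if lhs > np.sum(np.abs(y1 - y2)) + 1e-12:
        violations += 1
```

Over random probes, no violation of the Lipschitz inequality should occur.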

Main Results
Our derived theoretical results and the significance of TTN-VQC-based functional regression are summarized as follows:
• Representation power: we derive an upper bound on the approximation error that decreases with both U and M, where U and M separately denote the number of qubits and the count of quantum measurements. The result suggests that the expressive capability of the TTN-VQC is mainly determined by the number of qubits, and the quality of the expressiveness is also affected by the count of quantum measurements. Larger U and M mean that more algorithmic qubits and a longer decoherence time are required to ensure stronger representation power of the TTN-VQC. Furthermore, since more qubits are more likely to bring about the problem of Barren Plateaus during VQC training, the introduction of the PL condition is significant for handling this problem.
• Generalization power: we derive an upper bound on the estimation error in terms of the empirical Rademacher complexity R̂_S(F_TV), which is further upper bounded as

R̂_S(F_TV) ≤ (P/√N) ( Σ_{k=1}^{K} Λ_k + Λ′ ).

Here, P, N, and K separately denote the input power, the amount of training data, and the order of the multi-dimensional tensor; Λ_k and Λ′ refer to the upper bounds on the Frobenius norms of the TTN and VQC parameters, respectively. The result suggests that, given the training data and model structure, additive noise corresponds to a larger value of P, which enlarges the upper bound and hence implies a weaker generalization capability.
• Optimization bias: the PL condition is employed to initialize the TTN-VQC parameters, under which the training error converges exponentially to a small loss value. The problem of Barren Plateaus is a serious issue in training quantum neural networks [31]; especially for a randomized QNN architecture, the variance of the gradients vanishes exponentially as the number of qubits grows. In this work, we claim that a model setting satisfying the PL condition is beneficial to the training of the TTN-VQC.
Besides, our empirical results on functional regression are designed to corroborate the corresponding theoretical results on the representation and generalization powers, as well as the analysis of the optimization performance.

Related Work
The related work comprises theoretical and technical aspects. On the theoretical side, Du et al. [32] analyze the learnability of quantum neural networks with parameterized quantum circuits and a gradient-based classical optimizer. A theoretical comparison between this work and Du et al. [32] is shown in Table 1, where our theoretical results mainly follow the error decomposition method [27,33]. More specifically, we factorize an expected loss based on the MAE over a TTN-VQC operator into three error components: approximation error, estimation error, and training error. We separately derive upper bounds on each error component, and the results are summarized in Table 1.

Table 1: A comparison of learning theory for VQC between this work and Du et al. [32]

Category                    This work      Du et al. [32]
Learning problem            Regression     Classification
Dimensionality reduction    TTN            N/A

Besides, the techniques of this work rely on the TTN and VQC models. The TTN, also known as the matrix product state (MPS) [34], was first put forth by Alexander et al. [35] in machine learning applications. Chen et al. [26] employ the MPS to extract low-dimensional features for the VQC. Although this work also leverages the TTN for dimensionality reduction, we rebuild the TTN as a parallel neural network architecture, where a sigmoid activation function is separately imposed upon each neural network. We choose the TTN for dimensionality reduction for two reasons: (1) although classical neural networks can also be applied for feature dimensionality reduction [19,18,16], the resulting classical-quantum hybrid system may take up more computational resources, and it is intractable to place classical neural networks on quantum devices; (2) numerous works have shown that classical neural networks can be converted into TT formats, and the TTN models can maintain or even outperform their classical counterparts [36,37,38]. Moreover, since VQC models have been widely used in quantum machine learning [39,40,41], we follow the standard VQC pipeline so that our theoretical results apply to the general VQC model.

Preliminaries
Before we delve into the detailed architecture of the TTN-VQC, we first introduce the basic components of TTN and VQC, which have been previously proposed and widely used in quantum machine learning.

Variational Quantum Circuit
As shown in Figure 2, a VQC is composed of three components: (1) Tensor Product Encoding (TPE); (2) Parametric Quantum Circuit (PQC); (3) Measurement. The TPE model, shown in Figure 2(a), was proposed in [42]; it converts classical data x into a quantum state |x⟩ by adopting the one-to-one mapping

|x⟩ = ⊗_{i=1}^{U} ( cos(π/2 x_i)|0⟩ + sin(π/2 x_i)|1⟩ ),

where each x_i is strictly restricted to the domain [0, 1] such that the conversion between x and |x⟩ is a reversible one-to-one mapping.
The PQC framework is illustrated in Figure 2(b), where U quantum channels correspond to the U qubits currently accessible on NISQ devices. Here, the controlled-NOT (CNOT) gates realize quantum entanglement, and the single-qubit rotation gates R_X, R_Y, and R_Z compose the PQC model with free model parameters {α_i, β_i, γ_i}. The PQC model corresponds to a linear operator T_θvqc that transforms the quantum input state |x⟩ into the output state |z⟩. The PQC block in the green dashed square can be repeatedly copied to compose a deeper architecture.
The measurement framework, shown in Figure 2(c), outputs expectation values with respect to the Pauli-Z operators, namely z = [⟨σ_z^(1)⟩, ⟨σ_z^(2)⟩, ..., ⟨σ_z^(U)⟩]^T. The expectation vector z is classical data, and it is connected to the operation of functional regression.
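Putting the TPE and the measurement together, a minimal numpy sketch (our own illustration; `tpe_encode` and `pauli_z_expectations` are hypothetical helper names) shows that the encoding is reversible, since ⟨σ_z^(i)⟩ = cos(π x_i) for the resulting product state:

```python
import numpy as np

def tpe_encode(x):
    """Tensor Product Encoding: map x in [0,1]^U to a 2^U state vector
    |x> = kron_i ( cos(pi/2 x_i)|0> + sin(pi/2 x_i)|1> )."""
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(np.pi / 2 * xi), np.sin(np.pi / 2 * xi)])
        state = np.kron(state, qubit)
    return state

def pauli_z_expectations(state, num_qubits):
    """Expectation <sigma_z^(i)> for each qubit, from the state vector."""
    probs = state ** 2                       # amplitudes are real here
    idx = np.arange(len(probs))
    expvals = []
    for i in range(num_qubits):
        # bit of qubit i in the basis index: 0 contributes +1, 1 contributes -1
        bits = (idx >> (num_qubits - 1 - i)) & 1
        expvals.append(np.sum(probs * (1 - 2 * bits)))
    return np.array(expvals)

x = np.array([0.2, 0.5, 0.9])
state = tpe_encode(x)
z = pauli_z_expectations(state, len(x))
x_rec = np.arccos(z) / np.pi                 # invert <sigma_z^(i)> = cos(pi x_i)
```

Because each x_i lies in [0, 1], the arccosine recovers the classical vector exactly, which is the one-to-one property the TPE relies on.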

Tensor-Train Network
A TTN refers to a tensor network aligned in a 1-dimensional array and is generated by repeatedly applying singular value decomposition (SVD) [43] to a many-body wave function [25]. To utilize the TTN for dimensionality reduction, we first define the tensor-train decomposition (TTD) for a 1-dimensional vector and a tensor-train representation for a 2-dimensional matrix. More specifically, given a vector x ∈ R^D where D = Π_{k=1}^{K} D_k, we reshape x into a K-order tensor X ∈ R^{D_1 × D_2 × ··· × D_K}. Then, given a set of tensor-train ranks (TT-ranks) {R_1, R_2, ..., R_{K+1}} (R_1 and R_{K+1} are set as 1), all elements of X can be represented by multiplying K matrices X^[k]_{d_k} in the TT format as

X_{d_1, d_2, ..., d_K} = X^[1]_{d_1} X^[2]_{d_2} ··· X^[K]_{d_K},

where each X^[k]_{d_k} ∈ R^{R_k × R_{k+1}} is a matrix, so that the product on the right-hand side is a scalar. Next, we are concerned with the TTD for a 2-dimensional matrix. A feed-forward neural network with U neurons has the form

y = Wx, with W ∈ R^{U × D}.

If we assume that U = Π_{k=1}^{K} U_k, then we can reshape the 2-order matrix W as a K-order double-indexed tensor W, which can be factorized into the TT format as

W_{(d_1, u_1), (d_2, u_2), ..., (d_K, u_K)} = W^[1]_{(d_1, u_1)} W^[2]_{(d_2, u_2)} ··· W^[K]_{(d_K, u_K)},

where each W^[k] ∈ R^{R_k × D_k × U_k × R_{k+1}} is a 4-order core tensor, and each element W^[k]_{(d_k, u_k)} is a matrix. Then, we can reshape the input vector x and the output vector y into two tensors of the same order, X ∈ R^{D_1 × ··· × D_K} and Y ∈ R^{U_1 × ··· × U_K}, and we build the mapping between the input tensor X_{d_1, d_2, ..., d_K} and the output tensor Y_{u_1, u_2, ..., u_K} as

Y_{u_1, u_2, ..., u_K} = Σ_{d_1, ..., d_K} W_{(d_1, u_1), (d_2, u_2), ..., (d_K, u_K)} X_{d_1, d_2, ..., d_K}.
Then, by employing the TTD for the 1-dimensional vector X_{d_1, d_2, ..., d_K} and the 2-dimensional matrix W_{(d_1, u_1), (d_2, u_2), ..., (d_K, u_K)} separately defined in Eq. (6) and Eq. (7), we attain

Y_{u_1, u_2, ..., u_K} = Σ_{d_1, ..., d_K} ( W^[1]_{(d_1, u_1)} ∘ X^[1]_{d_1} ) ··· ( W^[K]_{(d_K, u_K)} ∘ X^[K]_{d_K} ),

where ∘ refers to an element-wise multiplication of two matrices with matching TT-ranks, and the overall product on the right-hand side is a scalar because the boundary TT-ranks equal 1. Based on the framework of the TTN, two necessary requirements need to be met: (a) given a D-dimensional input vector, the input dimension must factorize as D = Π_{k=1}^{K} D_k; (b) the output dimension must factorize as U = Π_{k=1}^{K} U_k. In particular, the output dimension U in this work corresponds to the number of qubits.
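The TT-format matrix-vector product above can be sketched in numpy for a concrete K = 3 case (all shapes and names are our own arbitrary choices: D_k = (2, 3, 4), U_k = (2, 2, 2), TT-ranks (1, 2, 2, 1)); the einsum contractions are checked against the dense U × D matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
D_dims, U_dims = (2, 3, 4), (2, 2, 2)        # D = 24, U = 8
ranks = (1, 2, 2, 1)                          # TT-ranks, R_1 = R_{K+1} = 1

# 4-order TT-cores W^[k] of shape (R_k, D_k, U_k, R_{k+1})
cores = [rng.normal(size=(ranks[k], D_dims[k], U_dims[k], ranks[k + 1]))
         for k in range(3)]

def tt_matvec(cores, x, D_dims):
    """Apply the TT-format weight tensor to a vector without forming the
    dense matrix (written out for K = 3 for readability)."""
    t = x.reshape(D_dims)                            # K-order input tensor X
    t = np.einsum('abc,iaju->jubc', t, cores[0])     # contract d_1
    t = np.einsum('jubc,ubkv->jkvc', t, cores[1])    # contract d_2 and rank
    t = np.einsum('jkvc,vclw->jklw', t, cores[2])    # contract d_3 and rank
    return t.reshape(-1)                             # boundary ranks collapse

# Dense U x D matrix reconstructed from the cores, for a correctness check
W = np.einsum('iaju,ubkv,vclw->jklabc', *cores).reshape(8, 24)
x = rng.normal(size=24)
y_tt = tt_matvec(cores, x, D_dims)
```

The factorized contraction and the dense product `W @ x` agree to machine precision, which is exactly the equivalence the TT representation asserts.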

Theoretical Results
This section first exhibits the architecture of TTN-VQC, and then we analyze the upper bounds on the representation and generalization powers and the optimization performance.

The Architecture of TTN-VQC
The TTN-VQC pipeline is shown in Figure 3, where (a) denotes the framework of the TTN, (b) is associated with the VQC model, and (c) represents the operation of functional regression. The VQC model is based on the standard architecture shown in Section 2.1.1, and the TTN is designed according to the framework in Section 2.1.2. To introduce non-linearity into the TTN model, a sigmoid activation function Sigm(·) is applied to each channel output Y^[k]_{u_k} such that

Ŷ^[k]_{u_k} = Sigm(Y^[k]_{u_k}),

which introduces non-linearity into the TTN features and corresponds to a parallel neural network structure.
The parallel DNN structure is illustrated in Figure 4, where a K-order tensor X_{d_1, d_2, ..., d_K} is first factorized into K matrices X^[k]_{d_k}, and each channel output Y^[k]_{u_k} is non-linearly activated by the sigmoid function before the channel outputs are multiplied together into a K-order tensor Ŷ_{u_1, u_2, ..., u_K}. By iterating over u_k ∈ [U_k] while fixing the other indices u_1, ..., u_{k-1}, u_{k+1}, ..., u_K, we separately collect a vector associated with the k-th order of Y. More significantly, the non-linearity introduced by the sigmoid function sets up a parallel DNN structure for the TTN and helps to build a one-to-one mapping in the TPE framework, because the sigmoid function compresses the functional values into the domain (0, 1). Proposition 1 suggests that the resulting classical-to-quantum mapping T_y is a reversible one-to-one mapping. Proposition 1 can be justified based on Eq. (4), where cos(π/2 x_i) and sin(π/2 x_i) are reversible one-to-one functions because each x_i ∈ (0, 1). Then, we can deduce the original classical vector y given the quantum state |y⟩.
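The channel-wise structure described above can be sketched as follows. This is a simplified stand-in that collapses each TTN channel to a single weight matrix and omits the rank structure; all names and shapes are ours:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(2)
D_dims, U_dims = (2, 3, 4), (2, 2, 2)

# One weight matrix per channel (simplified stand-in for the TT cores)
Ws = [rng.normal(size=(U_dims[k], D_dims[k])) for k in range(3)]
xs = [rng.normal(size=D_dims[k]) for k in range(3)]

# Each channel is a feed-forward layer followed by the sigmoid
ys = [sigmoid(W @ x) for W, x in zip(Ws, xs)]

# The activated channel outputs multiply into the K-order output tensor
Y_hat = np.einsum('i,j,k->ijk', *ys)
```

Because every sigmoid output lies in (0, 1), so does every entry of the product tensor, which is precisely what the TPE's one-to-one mapping requires.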
The VQC outputs a classical vector z = [⟨σ_z^(1)⟩, ..., ⟨σ_z^(U)⟩]^T, and z is then connected to the framework of functional regression, where a fixed linear regression operator T_lr further transforms z into the output vector. The MAE is taken to measure the loss value, and the gradients of the loss function are used to update the parameters of both the VQC and TTN models.

Upper Bounds On the Approximation Error
Theorem 1 shows an upper bound on the approximation error. The bound relies on the theoretical analysis of the inherent parallel structure of the TTN model and on the universal approximation theory for neural networks [44,45,46]. Theorem 1 suggests that the representation power of the linear operator M ∘ T_θvqc ∘ T_y is strengthened by applying the non-linear operator T_θttn(x).

Theorem 1. Given a smooth target function h*_D : R^Q → R^U and classical data x, there exists a TTN-VQC operator g(x; θ_vqc, θ_ttn) whose approximation error ‖h*_D(x) − E[g(x; θ_vqc, θ_ttn)]‖ admits an upper bound that decays with both U and M, where U and M separately refer to the number of qubits and the count of quantum measurements, and E[g(x; θ_vqc, θ_ttn)] represents the expected value of the output measurement.
The upper bound in Eq. (11) implies that the number of qubits U and the count of measurements M jointly decide the representation power of the TTN-VQC, and larger values of U and M lower the upper bound. However, a larger U requires an advanced quantum computer with more logical qubits, and more qubits are likely to degrade the optimization performance because of the problem of Barren Plateaus. To strike a balance between a large number of qubits and a low optimization bias, the PL condition is introduced to initialize the TTN-VQC model.
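To see why the measurement count M enters the bound, one can simulate shot noise for a single qubit prepared by the TPE. This is a toy illustration under our own assumptions; the error of the M-shot estimate of ⟨σ_z⟩ shrinks roughly like 1/√M:

```python
import numpy as np

rng = np.random.default_rng(5)
x = 0.3
true_exp = np.cos(np.pi * x)                 # exact <sigma_z> under the TPE
p_plus = np.cos(np.pi / 2 * x) ** 2          # probability of outcome +1

def shot_estimate(M):
    """Estimate <sigma_z> from M projective measurements (+1/-1 shots)."""
    shots = rng.choice([1.0, -1.0], size=M, p=[p_plus, 1 - p_plus])
    return shots.mean()

# Average absolute estimation error over repeated runs, for several M
errs = {M: np.mean([abs(shot_estimate(M) - true_exp) for _ in range(200)])
        for M in (10, 100, 10000)}
```

Each 100-fold increase in M shrinks the mean estimation error by roughly an order of magnitude, consistent with the role of M in the representation-power bound.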

Upper Bounds on the Estimation Error
Theorem 2 presents the upper bound on the estimation error. The bound is derived from the empirical Rademacher complexity R̂_S(F_TV), which is defined as

R̂_S(F_TV) = E_ε[ sup_{f ∈ F_TV} (1/N) Σ_{n=1}^{N} ε_n f(x_n) ],

where S = {x_1, x_2, ..., x_N} contains N samples and ε = {ε_1, ε_2, ..., ε_N} refers to a set of N Rademacher random variables taking on the values 1 and −1 with equal likelihood. The empirical Rademacher complexity measures how well the functional family F_TV correlates with the random noise ε on the dataset S, and it describes the richness of the family F_TV: a richer family F_TV can generate more functions f that better correlate with the random noise on average.

Theorem 2. Based on the TTN-VQC setup in Theorem 1, the estimation error is upper bounded by twice the empirical Rademacher complexity, 2R̂_S(F_TV), where

R̂_S(F_TV) ≤ R̂_S(F_TTN) + R̂_S(F_VQC) ≤ (P/√N) ( Σ_{k=1}^{K} Λ_k + Λ′ ),

subject to ‖W^[k](T_θttn)‖_F ≤ Λ_k and ‖W(T_θvqc)‖_F ≤ Λ′. Here, F_TTN and F_VQC separately denote the families of the TTN and VQC, P, Λ′, and Λ_k are constants, W(T_θvqc) refers to a matrix associated with the operator T_θvqc, W^[k](T_θttn) corresponds to a 4-order core tensor of the TTN, and ‖·‖_F represents the Frobenius norm of a matrix or a tensor.
The upper bound on the estimation error in Eq. (13) shows that, when an input x and an initialized TTN-VQC model are given, a sufficiently large amount of training data N is needed to lower the bound. On the other hand, a noise perturbation with power P_noise imposed upon the input corresponds to a larger total power P = P_in + P_noise, which enlarges the upper bound on the estimation error and accordingly weakens the generalization power.
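For a norm-bounded linear class, the supremum inside the empirical Rademacher complexity has a closed form via the Cauchy-Schwarz inequality, which makes the 1/√N decay easy to check numerically. This is a sketch under our own simplifications, not the TTN-VQC class itself:

```python
import numpy as np

def rademacher_linear(X, Lam, trials=2000, rng=None):
    """Monte-Carlo estimate of the empirical Rademacher complexity of the
    class {x -> <w, x> : ||w||_2 <= Lam}. By Cauchy-Schwarz, the supremum
    over w equals (Lam / N) * ||sum_n eps_n x_n||_2 for each draw of eps."""
    rng = rng or np.random.default_rng(0)
    N = X.shape[0]
    total = 0.0
    for _ in range(trials):
        eps = rng.choice([-1.0, 1.0], size=N)   # Rademacher variables
        total += Lam / N * np.linalg.norm(eps @ X)
    return total / trials

rng = np.random.default_rng(3)
small = rademacher_linear(rng.normal(size=(50, 8)), Lam=1.0)
large = rademacher_linear(rng.normal(size=(5000, 8)), Lam=1.0)
```

Increasing N from 50 to 5000 shrinks the estimated complexity roughly tenfold, matching the √N in the denominator of the bound in Theorem 2.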

Upper Bounds on Optimization Error
A QNN system often suffers from the problem of Barren Plateaus [31], which results from optimizing a non-convex objective function whose gradients may vanish almost everywhere during training. To alleviate this problem, we introduce a new initialization strategy based on the Polyak-Lojasiewicz (PL) condition [47,48,49]. More specifically, given the set of model parameters θ = {θ_ttn, θ_vqc} for the TTN-VQC, if an empirical loss function L_S satisfies the μ-PL condition, the L_2-norm of the first-order gradient ∇L_S with respect to θ satisfies the inequality

‖∇L_S(θ)‖_2^2 ≥ 2μ L_S(θ).

Theorem 3. If a 1-Lipschitz loss function L_S over the set of TTN-VQC parameters θ satisfies the PL condition, the gradient descent algorithm with a learning rate of 1 leads to an exponential convergence rate. More specifically, at epoch T we have

L_S(θ_T) ≤ (1 − μ)^T L_S(θ_0),

where θ_0 and θ_T separately denote the parameters at the initial stage and at epoch T. Furthermore, given a radius r = 2√(2 L_S(θ_0)) / μ for a closed ball B(θ_0, r), there exists a global minimum hypothesis θ* ∈ B(θ_0, r) such that the optimization error becomes sufficiently small.

Furthermore, Proposition 2 states a necessary condition for a TTN-VQC operator f ∈ F_TV to satisfy the μ-PL setup of L_S(θ), which is related to the tangent kernel of the operator f.

Proposition 2. For a TTN-VQC operator f ∈ F_TV, we define the tangent kernel K_f as ∇f(θ)∇f(θ)^T. If a 1-Lipschitz loss function L_S(θ) satisfies the μ-PL condition, then the smallest eigenvalue λ_min(K_f) of K_f meets the condition

λ_min(K_f) ≥ μ.

Theorem 3 suggests that the μ-PL condition for the TTN-VQC ensures an exponential convergence rate, and the training loss can reach as low as 0.
Proposition 2 allows us to check whether the μ-PL condition is met by computing the tangent kernel. Our theorems suggest that a TTN-VQC model meeting the PL condition can better deal with the problem of Barren Plateaus, but we cannot guarantee that a model with a low optimization bias must meet the PL condition. In other words, the PL condition is one of several potential approaches to ensure that the VQC handles the optimization issue.
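The exponential rate in Theorem 3 can be reproduced on a toy μ-PL objective, a quadratic with eigenvalues in [μ, 1] (our own stand-in, not the TTN-VQC loss):

```python
import numpy as np

rng = np.random.default_rng(4)

# Quadratic L(theta) = 0.5 theta^T A theta with eigenvalues in [mu, 1];
# such a loss satisfies the mu-PL condition ||grad L||^2 >= 2 mu L(theta).
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
mu = 0.1
A = Q @ np.diag(np.linspace(mu, 1.0, 6)) @ Q.T

loss = lambda th: 0.5 * th @ A @ th
grad = lambda th: A @ th

theta = rng.normal(size=6)
L0 = loss(theta)
losses = []
for t in range(50):
    theta = theta - 1.0 * grad(theta)        # learning rate 1, as in Theorem 3
    losses.append(loss(theta))

# Exponential convergence: L(theta_T) <= (1 - mu)^T L(theta_0)
bound_ok = all(losses[t] <= (1 - mu) ** (t + 1) * L0 + 1e-12
               for t in range(50))
```

Every iterate respects the (1 − μ)^T envelope, and the loss drops by several orders of magnitude within 50 steps, mirroring the convergence behavior claimed for PL-initialized TTN-VQC training.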

Putting It All Together
Based on the derived upper bounds, under the setup of the μ-PL condition, the upper bounds on the error components can be combined into an aggregated upper bound as

L_D(f̂_S) ≤ L_D(f*_D) + 2 R̂_S(F_TV) + ν.

The aggregated upper bound in Eq. (17) shows that the training error ν can be reduced to nearly 0 under the μ-PL condition, so the expected loss is mainly determined by the upper bounds on the approximation and estimation errors.

Empirical Results
To corroborate our theoretical analysis of the TTN-VQC, our experiments are composed of two groups: (1) to evaluate the representation power, the training and test datasets are set in the same environment; (2) to assess the generalization power of the TTN-VQC, the test data are separately mixed with additive Gaussian and Laplacian noises, where the SNR levels are set to 8dB and 12dB, respectively. Our baseline system is a linear PCA-VQC model that employs principal component analysis (PCA) [50], a standard method for reducing data dimensionality with a linear transformation in an unsupervised manner. Our experiments compare the performance of the TTN-VQC and PCA-VQC models, and particularly aim at verifying the following points:
1. The TTN-VQC leads to better performance than the PCA-VQC in both matched and unmatched environmental settings.
2. Increasing the number of qubits improves the representation power of the TTN-VQC.
3. Exponential convergence rates demonstrate that our configurations of the TTN-VQC model satisfy the μ-PL condition.
We evaluate the performance of the TTN-VQC on the standard MNIST dataset [51] for handwritten digit classification, and we configure the models according to the number of qubits. In particular, we separately assess models with 8 qubits and 12 qubits, where the parameters (U_1, U_2, U_3) are set to (2, 2, 2) for 8 qubits and (2, 3, 2) for 12 qubits. Stochastic gradient descent (SGD) [52] with an Adam optimizer [53] is utilized in the training process, with a mini-batch size of 50 and a learning rate of 1. The 1-Lipschitz continuous loss function based on the MAE is taken to meet the PL condition.
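The noisy-input conditions used throughout the experiments can be sketched as follows (a hypothetical helper of our own; the paper does not specify its exact noise-mixing code). The scale of the additive noise is set from the target SNR via SNR = 10·log10(P_in / P_noise):

```python
import numpy as np

def add_noise_at_snr(x, snr_db, kind="gaussian", rng=None):
    """Corrupt a clean signal with additive noise scaled to a target SNR (dB):
    P_noise = P_in / 10^(snr_db / 10)."""
    rng = rng or np.random.default_rng(0)
    p_in = np.mean(x ** 2)
    p_noise = p_in / 10 ** (snr_db / 10)
    if kind == "gaussian":
        noise = rng.normal(scale=np.sqrt(p_noise), size=x.shape)
    else:
        # Laplacian: variance is 2 b^2, so b = sqrt(p_noise / 2)
        noise = rng.laplace(scale=np.sqrt(p_noise / 2), size=x.shape)
    return x + noise

x = np.random.default_rng(6).random(784)     # stand-in for a flattened image
noisy_8db = add_noise_at_snr(x, 8.0)
noisy_12db = add_noise_at_snr(x, 12.0, kind="laplacian")
```

The same routine covers both the matched-noise training setup and the mismatched Gaussian/Laplacian test conditions by varying `snr_db` and `kind`.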

Experiments for Representation Power of TTN-VQC
To corroborate Theorem 1 on the representation power of the TTN-VQC, both training and test data are mixed with Gaussian noise at a 15dB SNR level, and we compare the performance of the TTN-VQC with the PCA-VQC in the generated noisy settings. Figure 5 demonstrates the related empirical results, where TTN-VQC 8Qubit and TTN-VQC 12Qubit represent the TTN-VQC models with 8 and 12 qubits, and PCA-VQC 8Qubit and PCA-VQC 12Qubit denote the PCA-VQC models with 8 and 12 qubits, respectively. Our experiments show that the TTN-VQC significantly outperforms the PCA-VQC counterparts in terms of lower training and test loss values. Moreover, our results suggest that more qubits improve the empirical performance of both the TTN-VQC and PCA-VQC models. Table 2 presents the final results on the test dataset. The TTN-VQC 12Qubit model owns more parameters than the TTN-VQC 8Qubit model (0.636Mb vs. 0.452Mb), but it attains better empirical performance in terms of a lower MAE score on the test dataset (0.0156 vs. 0.0597).

Experiments for Generalization Power of TTN-VQC
To assess the generalization power of the TTN-VQC, the test data are separately mixed with additive Gaussian and Laplacian noises at 8dB and 12dB SNR levels. Based on the well-trained TTN-VQC and PCA-VQC models with 8 qubits, we further assess their performance on the test data under the Gaussian and Laplacian noisy conditions to evaluate their generalization power.
Based on the upper bound on the generalization power in Theorem 2, given the input dataset, a noisier setting corresponds to a larger P_noise, which results in a larger total power P = P_in + P_noise. Thus, we corroborate our theorem by evaluating the empirical performance under different noisy conditions. Meanwhile, to highlight the advantage of the non-linearity of the TTN-VQC, we also compare the experimental results of the TTN-VQC and PCA-VQC. On one hand, Figure 6 suggests that the TTN-VQC models significantly outperform the PCA-VQC counterparts in the two noisy settings, and Table 3 shows the MAE scores of the TTN-VQC and PCA-VQC models, where the TTN-VQC models achieve much better performance in terms of lower MAE scores under all kinds of noisy environments. On the other hand, we observe that the performance of the TTN-VQC models degrades (higher MAE scores) under the more adverse Gaussian and Laplacian noisy settings, which corresponds to our theoretical analysis. Moreover, our derived upper bound on the estimation error is also associated with the amount of training data. To test the effect of the amount of training data on the generalization capability, the number of training data is gradually incremented from a subset of the data to the whole set.

DISCUSSION
This work focuses on the theoretical error performance analysis for VQC-based functional regression, particularly when a TTN is employed for dimensionality reduction. Our theoretical results provide upper bounds on the representation and generalization powers of the TTN-VQC. They suggest that the approximation error is inversely proportional to the square root of the number of qubits, which means that increasing the number of qubits leads to better representation power of the TTN-VQC. The estimation error of the TTN-VQC is related to its generalization power, which is upper bounded via the empirical Rademacher complexity. The optimization error can be lowered to a small score by leveraging the PL condition to realize an exponential convergence rate with the SGD algorithm. To the best of our knowledge, no prior work has delivered such a complete error characterization.
Our experiments on vector-to-vector regression on the MNIST dataset are designed to corroborate the theoretical results. We first compare the representation power of the TTN-VQC models with that of the PCA-VQC counterparts. We observe that more qubits and the non-linear property of the TTN-VQC improve the empirical performance, which matches our theoretical analysis. Further, we assess the generalization power of the TTN-VQC by taking different noisy inputs into account, and we demonstrate that more mismatched and noisier inputs worsen the generalization power. Besides, the non-linear TTN-VQC models outperform the linear PCA-VQC models in terms of both representation and generalization powers, which implies that the non-linearity of the TTN-VQC greatly contributes to the improvement of VQC performance.
We also note that the TTN-VQC models attain exponential convergence rates. The optimization error is eventually reduced to 0 in the training process, which corresponds to the PL condition in our theoretical analysis. Moreover, the empirical results on the test dataset consistently exhibit a decreasing trend. These results imply that the model setup for the TTN-VQC meets the PL condition and can thus handle the problem of Barren Plateaus. Our future work will discuss how to initialize the VQC model based on the PL condition to minimize the optimization bias.
Furthermore, our theoretical results are built upon a Lipschitz loss function for the regression problem. The theoretical contributions can also be generalized to classification tasks, where loss functions like the hinge loss and cross-entropy satisfy a data-dependent Lipschitz continuity, so the Lipschitz constant does not keep the same value on different datasets.

METHOD
This section provides detailed proofs of our theoretical results. We first present the upper bound on the representation power, and then derive the upper bound on the generalization power. The analysis of the optimization performance is also conducted based on the PL condition.

Proof for Theorem 1
The derivation of Theorem 1 is mainly based on the classical universal approximation theorem [44,45,46] and the parallel structure of the TTN. We first denote by g_m(x; θ_vqc, θ_ttn) the m-th measurement of the TTN-VQC operator g(x; θ_vqc, θ_ttn), and the sum Σ_{m=1}^{M} g_m(x; θ_vqc, θ_ttn) is defined through the operator H = T_θvqc ∘ T_y, which refers to a unitary matrix; M_m denotes the m-th measurement operator and M′ = Σ_{m=1}^{M} M_m. Moreover, H^{-1} is the inverse linear unitary operator of H, and g_m refers to the function after the quantum measurement. The remaining derivation follows by noting that each of the K channels of the TTN is equivalent to a feed-forward neural network layer with the sigmoid activation function (see Figure 7): the input x^[k] is derived from the reshape of X^[k] and goes through a feed-forward layer with the weight matrix W^[k] followed by the sigmoid function, and the output y^[k] corresponds to the array for the k-th order of Ŷ.

Proof for Theorem 2
Based on Eq. (9) and Figure 4, the k-th channel is equivalent to a feed-forward neural network layer with the sigmoid function. More specifically, the input X^[k] is reshaped into the vector x^[k], which goes through the feed-forward layer with the weight matrix W^[k]; after the sigmoid operation, we obtain the output vector y^[k]. As for the upper bound for the TTN-VQC model on the estimation error, we separately upper bound the terms for the TTN and VQC families by leveraging the empirical Rademacher complexity. Moreover, we define R̂_S(F^[k]_TTN) as the functional family for the k-th channel associated with Figure 7. Thus, based on the Rademacher identities, we attain

R̂_S(F_TTN) ≤ Σ_{k=1}^{K} R̂_S(F^[k]_TTN).

Furthermore, we upper bound each R̂_S(F^[k]_TTN) by utilizing the Talagrand contraction inequality [54], which removes the 1-Lipschitz sigmoid and leaves a linear function class. Assuming ‖x_n‖_2 ≤ P_k for the k-th channel and ‖W^[k](T_θttn)‖_F ≤ Λ_k, the Cauchy-Schwarz inequality yields

R̂_S(F^[k]_TTN) ≤ P_k Λ_k / √N,

so that R̂_S(F_TTN) ≤ (P/√N) Σ_{k=1}^{K} Λ_k, where P upper bounds the channel-wise input powers P_k. Similarly, we can also obtain R̂_S(F_VQC) ≤ P Λ′ / √N with the constraint that ‖W(T_θvqc)‖_F ≤ Λ′. This completes the proof of Theorem 2.

Figure 1 :
Figure 1: An illustration of the error decomposition technique. h*_D is a smooth target function in the family of all functions Y^X over a data distribution D; F_TV denotes the family of TTN-VQC operators, as shown in the dashed square; f*_D represents the optimal hypothesis from the space of TTN-VQC operators over the distribution D; f*_S denotes the best empirical hypothesis over the set of training samples S; f̂_S is the returned hypothesis based on the training dataset S.

Figure 2 :
Figure 2: A VQC model consists of three components: (a) Tensor Product Encoding (TPE); (b) Parametric Quantum Circuit (PQC); (c) Measurement. The TPE employs a series of R_Y(π/2 x_i) gates to transform classical data into quantum states. The PQC is composed of CNOT gates and single-qubit rotation gates R_X, R_Y, R_Z with free model parameters α, β, and γ. The CNOT gates impose quantum entanglement among qubits, and the gates R_X, R_Y, and R_Z are adjustable during the training stage. To build a deeper model, the PQC block in the green dashed square is repeatedly copied. The measurement converts the quantum states into the expectation values ⟨σ_z^(1)⟩, ..., ⟨σ_z^(U)⟩, which are connected to a loss function so that gradient descent algorithms can be used to update the VQC parameters.

Figure 3 :
Figure 3: An illustration of the TTN-VQC architecture. (a) Tensor-Train Network (TTN); (b) Variational Quantum Circuit (VQC); (c) Functional Regression. T_θttn and T_θvqc represent the TTN and VQC operators with trainable parameters θ_ttn and θ_vqc, respectively. T_y refers to a reversible classical-to-quantum mapping. The VQC block in the green dashed square can be repeatedly copied to generate a deep parametric model. The framework of functional regression outputs loss values and evaluates the gradients of the loss function to update the model parameters θ_vqc and θ_ttn. T_lr refers to a fixed regression matrix.

Figure 4 :
Figure 4: Reformulating the TTN model in a parallel structure. Each element of the input K-order tensor X_{d_1, d_2, ..., d_K} is factorized into K matrices X^[k]_{d_k} by utilizing the TTD. Each X^[k]_{d_k} goes through the TTN channel associated with the model parameters W^[k]. The sigmoid function is imposed upon the output Y^[k]_{u_k}, and all the Y^[k]_{u_k} are multiplied to form the output Ŷ_{u_1, u_2, ..., u_K}.

Figure 6 :
Figure 6: Empirical results of the vector-to-vector regression on the MNIST dataset to evaluate the generalization power of TTN-VQC and PCA-VQC with 8 qubits.There are two noisy settings on the test dataset to evaluate the performance of the TTN-VQC and PCA-VQC models: (a) Gauss-8dB and Gauss-12dB separately denote the Gaussian noisy conditions of 8dB and 12dB SNR levels; (b) Laplace-8dB and Laplace-12dB refer to the Laplacian noisy settings of 8dB and 12dB SNR levels, respectively.
The MNIST dataset covers handwritten 10-digit classification, with 60,000 examples for training and 10,000 for testing. In our experiments, we randomly sample 10,000 examples from the training data and 2,000 from the test data. Both training and test data are corrupted with noise signals at different SNR levels, and the generated noisy data are taken as the input to the quantum-based models. The target of the models is set to the clean data during the training stage, where the model-enhanced data are expected to be as close as possible to the target. We measure the model performance in the test stage by calculating the L_1-norm loss between the enhanced data and the target. As the experimental baseline, a hybrid PCA-VQC model is built, where PCA serves as a simple feature extractor followed by the VQC. The PCA-VQC represents a linear VQC model, in contrast to the non-linear TTN-VQC model. We include 4 PQC blocks in the VQC employed in the experiments. As for the experiments on the TTN-VQC, the image data are reshaped into 3-order 7 × 16 × 7 tensors. Given the set of TT-ranks R = {1, 3, 3, 1}, we can set the 3 trainable core tensors as W^[1] ∈ R^{1 × 7 × U_1 × 3}, W^[2] ∈ R^{3 × 16 × U_2 × 3}, and W^[3] ∈ R^{3 × 7 × U_3 × 1}.

Table 2 :
Empirical results of TTN-VQC and PCA-VQC models on the test dataset.

In Table 4, we observe that a larger amount of training data leads to lower MAE scores, which corresponds to better generalization power.

Table 3 :
Empirical results of the TTN-VQC and PCA-VQC models on the test dataset with either Gaussian or Laplacian noise at 8dB or 12dB SNR levels.

Table 4 :
Empirical results of the TTN-VQC on datasets of different sizes. Three training-set sizes are attempted, and a larger amount of training data achieves a lower MAE score.