Quantum machine learning with differential privacy

Quantum machine learning (QML) can complement the growing trend of using learned models for a myriad of classification tasks, from image recognition to natural speech processing. There exists the potential for a quantum advantage due to the intractability of quantum operations on a classical computer. Many datasets used in machine learning are crowdsourced or contain some private information, but to the best of our knowledge, no current QML models are equipped with privacy-preserving features. This raises concerns, as it is paramount that models do not expose sensitive information. Thus, privacy-preserving algorithms need to be implemented with QML. One solution is to make the machine learning algorithm differentially private, meaning the effect of a single data point on the training dataset is minimized. Differentially private machine learning models have been investigated, but differential privacy has not been thoroughly studied in the context of QML. In this study, we develop a hybrid quantum-classical model that is trained to preserve privacy using a differentially private optimization algorithm. This marks the first proof-of-principle demonstration of privacy-preserving QML. The experiments demonstrate that differentially private QML can protect user-sensitive information without significantly diminishing model accuracy. Although the quantum model is simulated and tested on a classical computer, it demonstrates potential to be efficiently implemented on near-term quantum devices (noisy intermediate-scale quantum, NISQ). The approach's success is illustrated via the classification of spatially classed two-dimensional datasets and a binary MNIST classification. This implementation of privacy-preserving QML will ensure confidentiality and accurate learning on NISQ technology.

name and the confidence levels outputted from the "black box" model. Furthermore, in many applications, a hostile adversary also may have access to the model parameters. In mobile applications, the model usually is stored on the device to reduce communication with a central server [39]. Differential privacy (DP) is an optimization framework to address these issues.
DP involves a trade-off of accuracy and power to protect the identity of data. Differentially private QML will allow private and efficient processing of big data. We hypothesize that the benefits of QML will offset the decrease in accuracy arising from DP. This research aims to create a hybrid quantum-classical model based on a variational quantum circuit (VQC) and train it using a differentially private classical optimizer. The classification of two-dimensional (2D) data to two classes is used to test the efficiency of the DP-VQC. As controls in the experiment, we will compare its accuracy to classical neural networks (with and without DP) and a non-private quantum circuit. Two classification tasks are used as benchmarks to compare the efficiencies of private and non-private VQCs to their classical analogs.
The novel work detailed in Section III represents the main contribution of this research: we develop a framework that ensures privacy-preserving QML and employ it in two benchmark examples. Specifically, we:

• Demonstrate differentially private training on VQC-based ML models.
Section II introduces the concept of differentially private ML and the required QML background. Section III illustrates the proposed differentially private QML. Section IV describes the experimental settings and performance of the proposed differentially private quantum learning and is followed by additional discussions in Section V. Section VI is the conclusion.

II. BACKGROUND

A. Supervised Learning
Supervised learning is an ML paradigm that learns or trains a function that maps the input to output given the input-output pairs [40]. That is, given the training dataset $\{(x_i, y_i)\}$, it is expected that after successful training, the learned function $f_\theta$ is able to output the correct or approximate value $y_j$ provided the testing case $x_j$. To make the training possible, we must specify the loss function or cost function $L(\hat{y}, y)$, which defines how close the output of the ML model $\hat{y} = f_\theta(x)$ is to the ground truth $y$. The learning or training of an ML model generally aims to minimize the loss function.
In classification tasks, the model is trained to output discrete labels or targets $y$ given the input data $x$. For example, in computer vision applications, it is common to train ML models to classify images. The most famous example is the MNIST dataset [41], which contains 70,000 images of handwritten digits of the numbers 0-9. In this case, the ML model is trained to output the probability distribution $P(y_i|x)$. Here, $P(y_i|x)$ represents the probability of label $y_i$ for each number $i \in \{0, \dots, 9\}$ given the input data, which is an image in this scenario.
In classification, the cross-entropy loss is the common choice for the loss function. It can be written as

$$L = -\sum_{c=1}^{M} y_{o,c} \log(\hat{y}_{o,c}),$$

where

• $M$ = the number of classes,
• $\log$ = the natural log,
• $y_{o,c}$ = the binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $o$,
• $\hat{y}_{o,c}$ = the predicted probability that observation $o$ is of class $c$.
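For concreteness, the following is a minimal NumPy sketch of this loss for one-hot labels. It is illustrative only; the experiments below rely on PyTorch's built-in cross-entropy, and the array shapes here are our assumptions.

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """Mean cross-entropy for one-hot labels y_true and predicted
    probabilities y_pred, both of shape (n_observations, M)."""
    eps = 1e-12  # guards against log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

# Two observations, M = 2 classes
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
print(cross_entropy(y_true, y_pred))  # ~0.164
```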
The loss function then is used to optimize the model parameters $\theta$. In current DL practice, the model parameters are updated via various gradient-descent methods [42]. The "vanilla" form of gradient descent is

$$\theta \leftarrow \theta - \eta \nabla_\theta L,$$

where $\theta$ is the model parameters, $L$ is the loss function, and $\eta$ is the learning rate, or the step size of each updating step. Mini-batch stochastic gradient descent (SGD) simplifies ML by approximating the loss gradient when the dataset is large or when it is impractical to calculate the loss for the whole dataset at once. Suppose the training data include $N$ points; then define a randomly sampled subset of points $B$. This is the mini-batch. Equation 3 approximates the gradient over the whole training set, $\nabla_\theta \frac{1}{N}\sum_{i=1}^{N} L(f_\theta(x_i), y_i)$, with a loss gradient calculated for a subset of the training set, the mini-batch:

$$g_B = \frac{1}{|B|} \sum_{i \in B} \nabla_\theta L(f_\theta(x_i), y_i),$$
where $B$ is the mini-batch set randomly sampled from the complete set of inputs and associated ground-truth labels. This batch gradient is used in the step update rule instead of the total loss gradient: $\theta \leftarrow \theta - \eta g_B$. The batch gradient is recalculated $N/|B|$ times per epoch, and the model parameters are updated for each gradient batch.
However, this vanilla form does not always work. For example, it may easily become stuck in local optima [42], or it can make the model difficult to train or converge. There are several gradient-descent variants that are successfully applied in DL [42-44]. Based on previous works [21, 30], we use the RMSProp optimizer to optimize our hybrid quantum-classical model. RMSProp [43] is a special kind of gradient-descent method with an adaptive learning rate that updates the parameters $\theta$ as

$$E[g^2]_t = \alpha E[g^2]_{t-1} + (1 - \alpha) g_t^2,$$
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\, g_t,$$

where $g_t$ is the gradient at step $t$ and $E[g^2]_t$ is the weighted moving average of the squared gradient with $E[g^2]_{t=0} = g_0^2$. In this paper, the hyperparameters are set for all experiments as follows: learning rate $\eta = 0.05$, smoothing constant $\alpha = 0.9$, and $\epsilon = 10^{-8}$.
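As a sketch, this update rule can be written in a few lines of NumPy with the stated hyperparameters; the experiments themselves use PyTorch's torch.optim.RMSprop rather than this hand-rolled version.

```python
import numpy as np

def rmsprop_step(theta, grad, avg_sq, lr=0.05, alpha=0.9, eps=1e-8):
    """One RMSProp update; avg_sq carries the moving average E[g^2]_t."""
    avg_sq = alpha * avg_sq + (1 - alpha) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(avg_sq) + eps)
    return theta, avg_sq

theta, avg_sq = np.zeros(3), np.zeros(3)
theta, avg_sq = rmsprop_step(theta, np.array([0.1, -0.2, 0.3]), avg_sq)
```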

B. Quantum Computing Basics
Because of the power of superposition and entanglement generated by quantum gates, quantum computing can create a huge speedup in certain difficult computational tasks and afford quantum advantages to ML [45, 46]. A qubit is the basic unit of quantum information processing and can consist of any two-state system, e.g., the spin of an electron or the polarization of a photon. Such a state is written as $|\psi\rangle = \alpha|1\rangle + \beta|0\rangle$, where the probabilities of measuring $|1\rangle$ and $|0\rangle$ are $|\alpha|^2$ and $|\beta|^2$, respectively. Because all classical operations can be considered a set of reversible logical operations, analogous quantum operations can be formalized. These operators are unitary and can be thought of as successive rotations, such that the logic operators are equivalent to quantum rotations. The basic components of quantum rotations are the Pauli matrices

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

With the Pauli matrices, we can define the single-qubit rotation along each of the $X$, $Y$, and $Z$ axes as

$$R_j(\theta) = e^{-i\theta\sigma_j/2}, \quad j \in \{x, y, z\}.$$

The general single-qubit rotation can be constructed with two of these single-qubit rotation types, e.g., $R(\alpha, \beta, \gamma) = R_z(\gamma) R_y(\beta) R_z(\alpha)$. For example, the quantum NOT gate, also known as the "Pauli-X gate," corresponds to a $\pi$ rotation about the $X$-axis [47].
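A short numerical check of these definitions, using SciPy's matrix exponential; this is a sketch for illustration, not part of the paper's pipeline.

```python
import numpy as np
from scipy.linalg import expm

# Pauli-X matrix
sx = np.array([[0, 1], [1, 0]], dtype=complex)

def rot(sigma, theta):
    """Single-qubit rotation R_j(theta) = exp(-i * theta * sigma_j / 2)."""
    return expm(-1j * theta / 2 * sigma)

# A pi rotation about X equals the Pauli-X (NOT) gate up to a global phase -i
print(np.allclose(rot(sx, np.pi), -1j * sx))  # True
```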
The true power of quantum computing stems from quantum entanglement, which can be achieved by using two-qubit quantum gates. The controlled-NOT (CNOT) gate, shown in Eq. 9, is a gate commonly used to entangle qubits. It reverses the state of the second qubit if the first qubit (the control qubit) is in the $|1\rangle$ state:

$$\text{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
Its operation can be described as follows: consider a two-qubit system $|\Psi\rangle \otimes |0\rangle$, where $|\Psi\rangle$ is a single-qubit state. Concretely, if $|\Psi\rangle$ is in the state $\alpha|0\rangle + \beta|1\rangle$, then under the CNOT operation, the state becomes $\alpha|00\rangle + \beta|11\rangle$, an entangled state. The set of CNOT and single-qubit rotation operators allows for a rich group of quantum algorithms that already have been shown to be faster than their classical counterparts, for example, in factorization problems [48] and database searching [49]. The quantum algorithm output is the observation of the final quantum state. On a real quantum computing device, the expectation values can be retrieved through repeated measurements (shots). In simulation, the expectation values can be calculated analytically. For a more detailed review of quantum computing, measurements, and algorithms, refer to [47, 50, 51].
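As an illustration of this entangling behavior, a minimal PennyLane sketch; the device name and the choice of an $R_y(\pi/2)$ to prepare the superposition are ours, not the paper's exact circuit.

```python
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def bell_like():
    qml.RY(np.pi / 2, wires=0)  # put the control qubit in an equal superposition
    qml.CNOT(wires=[0, 1])      # entangle: state becomes (|00> + |11>) / sqrt(2)
    return qml.expval(qml.PauliZ(0))

print(bell_like())  # ~0.0: the control is equally likely to be |0> or |1>
```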

C. Variational Quantum Circuits
In recent years, quantum computing has become feasible due to many breakthroughs in condensed matter physics and engineering. Companies such as IBM [52], Google [7], and D-Wave [53] are creating NISQ devices [6]. However, noise limits the reliability and scalability of the quantum circuits that can be used. For example, quantum algorithms requiring large numbers of qubits or large circuit depth cannot be faithfully implemented on these NISQ devices.
Because current cloud-based quantum devices are not suitable for the training described in this research, quantum circuit simulators are used [54].
VQCs are a special kind of quantum circuit, equipped with tunable or learnable parameters that are subject to iterative optimization [11, 12]. Figure 1 presents the basic components of a VQC. VQCs potentially can be robust against device noise as they can absorb the noise effects into their parameters in the optimization process [9, 10]. Numerous efforts have been made to design quantum algorithms based on VQCs [9, 10], including the calculation of chemical ground states [8] and optimization problems [55].
Several theoretical studies have shown that VQCs are more capable than conventional deep neural networks [56-59] in terms of the number of parameters or convergence speed.
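To make the structure concrete, here is a minimal PennyLane sketch of a VQC in the style of Figure 1: variational encoding via arctan rotation angles, CNOT entanglement, and trainable general rotations $R(\alpha, \beta, \gamma)$. The two-qubit size, layer count, and initialization are illustrative assumptions, not the paper's exact circuits.

```python
import pennylane as qml
import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(x, weights):
    # Variational encoding (cf. Figure 1): angles from arctan of the inputs
    for i in range(n_qubits):
        qml.RY(np.arctan(x[i]), wires=i)
        qml.RZ(np.arctan(x[i] ** 2), wires=i)
    # Trainable layers: entangle, then general rotations R(alpha, beta, gamma)
    for layer in range(weights.shape[0]):
        qml.CNOT(wires=[0, 1])
        for i in range(n_qubits):
            qml.Rot(*weights[layer, i], wires=i)
    return qml.expval(qml.PauliZ(0))

weights = 0.01 * np.random.randn(2, n_qubits, 3)  # 2 layers, 3 Euler angles/qubit
print(vqc(np.array([0.3, -0.7]), weights))
```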

D. Differential Privacy
Many technology companies collect data about the online presence of their users, and these data are shared, sometimes publicly, for use in focused marketing. This can create a breach in privacy because anonymizing data requires more than just erasing the name from each data entry [65]. Privacy also can be breached by ML models that use crowdsourced information and data scraped from the Internet. Previous studies have shown that models memorize their training samples, and even models with millions of parameters can be attacked to output memorized data [37].

FIG. 2: Information in data under the view of differential privacy. In a DP context, general information is that of the entire population in the data. On the other hand, private information is specific to a particular data entry.
Section I detailed the necessity of protecting information through privacy-preserving training algorithms. In other words, anonymizing data requires more than just censoring personally identifiable information (PII) from each data entry [65]. The solution requires using DP to curtail privacy leaks.
DP is a powerful framework to restrict the information that adversaries can obtain from attacking a trained ML model, but it is not an all-powerful technique. There are two kinds of information under the perspective of DP: general information and private information.
General information refers to information that does not specify any particular data entry and can be seen as a general property of the underlying population. On the other hand, private information refers to information that is specific to an individual data entry (Figure 2). For a concrete example [65], consider a study about smokers. An adversary may still learn information from the trained model: a differentially private query could show that smoking correlates with lung cancer, yet it is impossible to deduce whether or not a specific person was involved in the study. This correlation is general information. It remains possible to deduce that an individual smoker is likely to have lung cancer, but this deduction is not due to her/his presence in the study. DP does protect an individual's private information.
The power of DP is that deductions about an individual cannot be influenced by the fact that the person did or did not participate in the study [65].
DP seeks to create a randomized mechanism, characterized by the hyperparameters ε and δ, which gives roughly the same output for two similar datasets. In the context of ML, the output here is the trained model. This means an adversary cannot deduce the dataset from the output, even with auxiliary information or infinite computing resources. Figure 3 illustrates the concept of DP by comparing the outputs between two datasets, where one record, X, opts out of the dataset. Changing the input means the output could be very different, but DP ensures that the two output distributions differ by, at most, a factor governed by ε. In other words, DP combats extraction attacks by having the output be just as likely produced from a model with or without a given training point [66].
In DP, we are interested in mechanisms $\mathcal{M}$, which are randomized algorithms. Suppose $\mathcal{M}$ has a domain $A$ and a discrete range $B$. A randomized algorithm maps its domain $A$ to the probability space over $B$: given an input $a \in A$, the algorithm $\mathcal{M}$ outputs $\mathcal{M}(a) = b$ with probability $(\mathcal{M}(a))_b$ for each $b \in B$. In general, a point in the domain may be a database (i.e., a collection of records). A collection of records can be represented by a histogram, so the domain is the set of all possible histograms $\mathbb{N}^{|\mathcal{X}|}$. An $x \in \mathbb{N}^{|\mathcal{X}|}$ has $|\mathcal{X}|$ elements, where $x_i$ is the number of elements of type $i \in \mathcal{X}$. Additionally, an $\ell_1$ norm is defined such that $\|x - y\|_1 \leq 1$ represents the fact that $x$ and $y$ are neighboring databases, i.e., they differ by up to one record [65].
Rigorously, the definition of DP is [65]: a randomized algorithm $\mathcal{M}$ with domain $\mathbb{N}^{|\mathcal{X}|}$ is $(\varepsilon, \delta)$-differentially private if, for all $S \subseteq \text{range}(\mathcal{M})$ and for all $x, y \in \mathbb{N}^{|\mathcal{X}|}$ such that $\|x - y\|_1 \leq 1$,

$$\Pr[\mathcal{M}(x) \in S] \leq e^{\varepsilon} \Pr[\mathcal{M}(y) \in S] + \delta,$$

where
• $\mathbb{N}^{|\mathcal{X}|}$ = the space of databases; in the ML context, the union of the input and label sets.
• $S$ = a set of outputs of the randomized algorithm; in the ML context, some subset of all possible model configurations or parameters.
• x = set of records used for model training.
• y = another set of records for model training, neighboring x.
• ε = privacy loss for the randomized algorithm.
• δ = cutoff on DP, the percentage chance that the model does not preserve privacy.
(ε, δ)-DP is a relaxation of ε-DP because there is a chance δ that the privacy guarantee is broken. DP bounds the worst-case-scenario privacy loss, so a smaller ε does not necessarily mean the realized privacy is better. However, a smaller ε requires additional noise, which typically means that accuracy is worse.
An important characteristic in determining the effectiveness of a differentially private algorithm is the privacy loss. The privacy loss is defined for a given observation $\xi \in \text{range}(\mathcal{M})$ and quantifies the likelihood of observing $\xi$ from $\mathcal{M}(x)$ versus $\mathcal{M}(y)$ [66]:

$$\mathcal{L}^{(\xi)} = \ln \frac{\Pr[\mathcal{M}(x) = \xi]}{\Pr[\mathcal{M}(y) = \xi]}.$$

Combining these two equations shows that an $(\varepsilon, \delta)$-differentially private algorithm has a privacy budget of $\varepsilon$.

E. Differential Privacy in Machine Learning
In ML, we can interpret the randomized algorithm $\mathcal{M} : A \rightarrow B$ as a training algorithm with a training set $x \in A$, which produces a model $b \in B$ [39, 67]. The definition of DP implies that two training sets that only differ by the omission of a record should be (almost) equally likely to output a given model, i.e., the set of parameters completely describing the model. The most basic technique to ensure DP is the Gaussian mechanism, as defined in [39, 65, 68]. Every deterministic function $f(d)$ has a defined sensitivity $S_f = \max(|f(d) - f(d')|)$, given that $d, d'$ are adjacent databases. The Gaussian mechanism perturbs $f$ with noise drawn from $\mathcal{N}(0, S_f^2 \sigma^2)$ and is $(\varepsilon, \delta)$-differentially private for a noise multiplier $\sigma$ satisfying (for $\varepsilon \in (0, 1)$)

$$\sigma \geq \frac{\sqrt{2 \ln(1.25/\delta)}}{\varepsilon}.$$

There is an infinite number of pairs $(\varepsilon, \delta)$ that can be defined for a given noise multiplier $\sigma$, although usually, as in [67], $\delta$ is defined as a constant. Likewise, for ML, the most important techniques for creating DP are adding Gaussian noise and clipping the loss gradients [39]. The gradient clip reduces the effect any single data entry can have on the model training, making membership inference difficult. The hyperparameters associated with these operations are the noise multiplier, $\sigma$, and a cutoff for the $\ell_2$ norm, $S$ [39, 67].
After calculating the gradients, if the batch gradient has an $\ell_2$ norm greater than the cutoff, it is scaled down to have a norm equal to the cutoff. After clipping the gradient, the gradient for the mini-batch has Gaussian noise added with a standard deviation equal to the $\ell_2$ norm cutoff multiplied by the noise factor, $\sigma$.
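A minimal sketch of this clip-then-noise step on per-example gradients; the function name and averaging convention are ours (cf. the PyVacy details in Appendix A 2).

```python
import numpy as np

def privatize_gradients(per_example_grads, S, sigma, rng=None):
    """Clip each per-example gradient to l2 norm <= S, average, add Gaussian noise."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, S / (np.linalg.norm(g) + 1e-6))
               for g in per_example_grads]
    batch = np.mean(clipped, axis=0)
    # Noise std is the clip norm S times the noise multiplier sigma, per batch
    return batch + rng.normal(0.0, S * sigma / len(clipped), size=batch.shape)
```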
As in Equation 13, a relationship among $\varepsilon$, $\delta$, $\sigma$, and $S$ exists, but its calculation is beyond the scope of this review. More information about the privacy loss calculator can be found in Appendix A 1. This modification to the optimizer can be applied to any ML optimization algorithm (SGD, Adam, RMSProp, etc.). The DP-SGD algorithm is based on the techniques from [39, 68].
Details specific to the software package used in this study to implement DP, PyVacy, are available in Appendix A 2.

III. DIFFERENTIALLY PRIVATE QUANTUM MACHINE LEARNING

A. Quantum Encoding
A quantum circuit operates on the quantum state. To make QML useful, the first step is to encode the classical data into a quantum state.

1. Amplitude Encoding
Amplitude encoding is a technique to encode a classical vector $(\alpha_1, \dots, \alpha_N)$ into the amplitudes of a quantum state,

$$|\psi\rangle = \sum_{i=1}^{N} \alpha_i |i\rangle,$$

where the vector is normalized such that $\sum_i |\alpha_i|^2 = 1$ and $|i\rangle$ is the $i$-th computational basis state. The advantage of using this encoding method is that it is possible to significantly reduce the number of qubits and potentially the number of parameters of the quantum circuit. An $N$-dimensional input vector requires only $\log_2 N$ qubits to encode. Refer to [69, 70] for details regarding this encoding procedure.
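A short PennyLane sketch of this scheme, with dimensions matching the MNIST setup described later; qml.AmplitudeEmbedding handles the padding and normalization. This is a sketch, not the paper's exact circuit.

```python
import pennylane as qml
import numpy as np

n_qubits = 10  # log2(1024) qubits for a 784-pixel image padded to 1024 entries
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def encode(x):
    # Pads x with zeros to length 2**n_qubits and normalizes it
    qml.AmplitudeEmbedding(x, wires=range(n_qubits), pad_with=0.0, normalize=True)
    return qml.state()

x = np.random.rand(784)  # a flattened 28x28 grayscale image
print(encode(x).shape)   # (1024,)
```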

2. Variational Encoding
In variational encoding, the input values are used as the quantum rotation angles. A single-qubit gate with rotation along the $j$-axis by angle $\alpha$ is given by

$$R_j(\alpha) = e^{-i\frac{\alpha}{2}\sigma_j} = \cos\frac{\alpha}{2}\, I - i \sin\frac{\alpha}{2}\, \sigma_j,$$

where $I$ is the identity matrix and $\sigma_j$ is the Pauli matrix with $j = x, y, z$. In this work, given a vector input $x$ with $N$ dimensions, we rotate each qubit $i \in [0, N)$ by

$$R_i(x) = R_z(\arctan(x_i^2))\, R_y(\arctan(x_i)).$$

Each single-qubit state is initialized by a rotation about the $y$-axis and then about the $z$-axis. This allows our inputs, $x \in X$, to be encoded into a quantum state of $N$ qubits. Figure 1 depicts this particular encoding scheme. For a detailed review of different quantum encoding schemes, refer to [69].

B. Quantum Gradients
Modern DL practices heavily depend on gradient-based optimization methods. Classically, the gradients of DL models are calculated using backpropagation methods [71]. In QML, the corresponding method is the parameter-shift rule, which can calculate the analytical gradients of quantum models [11, 54].
For the parameter-shift rule, knowledge of certain observables is required. A VQC's output can be modeled as a function $f(x; \theta)$ of its parameters $\theta$. Then, in most cases, the partial derivative of the VQC, $\nabla_\theta f(x; \theta)$, can be evaluated with the same quantum circuit, only with the parameters shifted [11]. We illustrate the procedure as follows: consider a quantum circuit with a parameter $\theta$ whose output can be modeled as the expectation of some observable $\hat{B}$ for the prepared state $|\psi\rangle = U(\theta) U_0(x) |0\rangle$, i.e.,

$$f(x; \theta) = \langle 0 |\, U_0^\dagger(x) U^\dagger(\theta)\, \hat{B}\, U(\theta) U_0(x)\, | 0 \rangle. \quad (17a)$$

This is simplified by considering the first unitary operation as preparing the state $|x\rangle$ and the other unitary operators as a linear transformation of the observable, $U^\dagger(\theta) \hat{B} U(\theta) = \mathcal{M}_\theta(\hat{B})$, so that $f(x; \theta) = \langle x |\, \mathcal{M}_\theta(\hat{B})\, | x \rangle$.
It can be shown [11] that a finite shift $s$ exists such that

$$\nabla_\theta f(x; \theta) = c\left[f(x; \theta + s) - f(x; \theta - s)\right], \quad (17b)$$

where, for gates generated by Pauli operators, $c = 1/2$ and $s = \pi/2$.
This implies that the quantum circuit can be shifted to allow for a calculation of the quantum gradient with the same circuit.
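A minimal sketch of the rule for a single Pauli-rotation parameter; PennyLane computes this automatically via diff_method="parameter-shift", so the explicit shift below is purely illustrative.

```python
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def f(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))  # f(theta) = cos(theta)

def parameter_shift_grad(theta, s=np.pi / 2):
    # For Pauli-rotation gates: df/dtheta = [f(theta + s) - f(theta - s)] / 2
    return (f(theta + s) - f(theta - s)) / 2.0

theta = 0.4
print(parameter_shift_grad(theta))  # analytic gradient from shifted circuits
print(-np.sin(theta))               # matches d/dtheta cos(theta)
```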
Now that DP and our VQC architecture are introduced, we unveil our differentially private optimization algorithm, the first of its kind to ensure privacy-preserving QML. Our differentially private optimization framework starts by calculating the quantum gradient using the parameter-shift rule. Next, we apply Gaussian noise and clipping mechanisms to this gradient, $\nabla_\theta f(x; \theta)$. The differentially private gradient, $\nabla_\theta^{DP} f(x; \theta)$, now is used in the parameter update step instead of the non-private gradient. This parameter update rule can be SGD, adaptive momentum, or RMSProp. In this study, we solely use RMSProp to update parameters.
Concretely, the per-example quantum gradients over a mini-batch $B$ are clipped and noised,

$$\nabla_\theta^{DP} f(x;\theta) = \frac{1}{|B|}\left[\sum_{i \in B} \nabla_\theta f(x_i;\theta)\, \min\!\left(1, \frac{S}{\|\nabla_\theta f(x_i;\theta)\|_2}\right) + \mathcal{N}\!\left(0,\, S^2\sigma^2 I\right)\right],$$

where each $\nabla_\theta f(x_i;\theta)$ is evaluated via the parameter-shift rule applied to $\mathcal{M}_\theta(\hat{B})$ as defined in Equation 17a, and $S$, $\sigma$ are the hyperparameters implicitly defining the level of privacy $(\varepsilon, \delta)$. This novel framework seamlessly incorporates privacy-preserving algorithms into the training of a VQC, ensuring $(\varepsilon, \delta)$-differential privacy. In this work, we choose the standard classification task to demonstrate the proof-of-concept result. However, the proposed framework is rather generic and can be applied to any hybrid quantum-classical ML scenario.
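Putting the pieces together, the following PennyLane sketch performs one differentially private RMSProp step on a toy two-qubit VQC. The circuit, the squared-error loss, and all names here are illustrative assumptions; the paper's experiments use the circuits of Figures 6, 7, 12, and 13 with cross-entropy loss.

```python
import pennylane as qml
from pennylane import numpy as np  # autograd-enabled NumPy
import numpy as onp

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(x, weights):
    for i in range(2):  # variational encoding
        qml.RY(np.arctan(x[i]), wires=i)
        qml.RZ(np.arctan(x[i] ** 2), wires=i)
    qml.CNOT(wires=[0, 1])
    for i in range(2):
        qml.Rot(*weights[i], wires=i)
    return qml.expval(qml.PauliZ(0))

def loss(weights, x, y):
    return (circuit(x, weights) - y) ** 2  # toy loss; the paper uses cross-entropy

grad_fn = qml.grad(loss, argnum=0)

def dp_rmsprop_step(weights, avg_sq, batch, S=1.0, sigma=1.1,
                    lr=0.05, alpha=0.9, eps=1e-8):
    acc = np.zeros_like(weights)
    for x, y in batch:
        g = grad_fn(weights, x, y)  # per-example parameter-shift gradient
        acc = acc + g * min(1.0, S / (float(np.linalg.norm(g)) + 1e-6))  # clip
    g_dp = (acc + onp.random.normal(0.0, S * sigma, acc.shape)) / len(batch)
    avg_sq = alpha * avg_sq + (1 - alpha) * g_dp ** 2  # RMSProp moving average
    return weights - lr * g_dp / (np.sqrt(avg_sq) + eps), avg_sq

weights = np.array(0.01 * onp.random.randn(2, 3), requires_grad=True)
avg_sq = np.zeros_like(weights)
batch = [(onp.array([0.2, -0.5]), 1.0), (onp.array([-0.3, 0.8]), -1.0)]
weights, avg_sq = dp_rmsprop_step(weights, avg_sq, batch)
```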

IV. EXPERIMENTS AND RESULTS
To demonstrate the hypothesized quantum advantage, this study compares differentially private VQCs (DP-VQCs) to non-private VQCs, as well as private and non-private neural networks. We also illustrate the efficacy of our differentially private QML framework. Two different types of classification are investigated as benchmarks: 1) labeling points in a 2D plane and 2) a binary classification on the MNIST dataset, differentiating between the '0' and '1' digits. The 2D datasets are standard benchmarks from scikit-learn [72] that are useful in QML because the inputs are low dimensional, thus easy to simulate on classical computers [46]. Meanwhile, the MNIST dataset is used to study the performance of the proposed model with larger dimensional inputs.
We implement the model with several open-source software packages. The high-level quantum algorithms are implemented with PennyLane [73]. The quantum simulation backend is Qulacs [74], which is a high-performance choice when the number of qubits is large.
The hybrid quantum-classical model is built with the PyTorch interface [75]. For differentially private optimization, we employ the PyVacy package [76].
The experiments are characterized by the hyperparameters of the neural network training process: the optimizer, number of epochs, number of training samples, learning rate, batch size, momentum, and weight penalty. When differentially private optimizers are used, the additional hyperparameters needed are the $\ell_2$ norm clip, noise multiplier, number of iterations, and ε. After preliminary experiments, the RMSProp optimizer was selected for use in all of the experiments presented in this paper. Most of the model's hyperparameters are the same for both the MNIST and scikit 2D set classification tasks. The learning rate is set to 0.05, while the portions of training and testing are 60% and 40%, respectively. In addition, the batch size used is 32 with a momentum value of 0.5, but no weight regularization is used.
An ε is calculated from the DP hyperparameters $S$ and $\sigma$. Because all tasks are classifications, cross-entropy is used as the loss function for all training. According to [65, 66], the probability of breaking ε-DP should be $\delta \sim O(1/n)$ for $n$ samples. A δ larger than $1/n$ always will be able to satisfy DP simply by releasing $n\delta$ complete records. Therefore, ε is determined by hyperparameter choice, and δ is set to $10^{-5}$ for the entire study.
As part of the investigation into differentially private QML, classical and quantum classifiers are compared. For both the MNIST and 2D classifiers, the quantum circuit has two modules that contain the parameters for the unitary transforms comprising the two quantum subcircuits.
FIG. 6: First quantum circuit block for 2D classification. The single-qubit gates $R_y(\arctan(x_i))$ and $R_z(\arctan(x_i^2))$ represent rotations along the y-axis and z-axis by the given angles $\arctan(x_i)$ and $\arctan(x_i^2)$, respectively. The state is prepared with variational encoding. The dashed box denotes one layer of a quantum subcircuit that is repeated twice. At the end of this circuit, two qubits are measured, and the $Z$ expectation values are calculated. The output from this circuit is a 2D vector.

A. Two-dimensional Mini-benchmark Datasets
Three datasets of 2D classification from scikit-learn are considered. The generated datasets are divided into training, validating, and testing sets with 60%, 20%, and 20% proportions, respectively. Different datasets are used because the decision boundary between the two classes is increasingly nonlinear and more difficult to classify. Thus, they make good benchmarks for DP training. The leftmost plots of Figure 11 display the input sets, which are named "blobs," "moons," and "circles" based on the shapes they form. The more transparent points are those not part of the training but instead used for testing the model's accuracy.

FIG. 7: Second quantum circuit block for 2D classification. The parameters labeled $R_y(\arctan(x_i))$ and $R_z(\arctan(x_i^2))$ are for state preparation. $x_1$ and $x_2$ are the outputs of the first circuit block. The dashed box denotes one block of a quantum circuit that is repeated twice. At the end of this circuit, two qubits are measured, and the $Z$ expectation values are calculated. The output from this circuit is a 2D vector. In the context of cross-entropy loss, the outputs will be interpreted as the probability that the 2D point belongs to class one or two, respectively.
As a baseline for the study, Figure 4 illustrates the classical neural network, built with two classical layers. The classical classifier uses tanh as the activation function after each layer and softmax at the end of the calculation. The neural network has Xavier weight initialization. The linear layer sizes are chosen such that the number of trainable parameters is 24 for the VQC and 36 for the neural network; i.e., the quantum classifier has 66% of the classical classifier's trainable parameters.
The VQC to classify the 2D test set consists of two successive quantum subcircuits (Figures 6 and 7). Each quantum subcircuit has two wires, while each unitary transform can be thought of as rotations on each qubit. Thus, each subcircuit is parameterized by 12 Euler angles, or parameters, because there are two layers of transforms per subcircuit. The angles are initialized on a normal distribution with mean 0 and standard deviation 1.0 and then scaled by 0.01.
Table II summarizes the key results from the 2D classification experiments. Three different levels of privacy have been investigated: non-private, $(1.628, 10^{-5})$-DP, and $(0.681, 10^{-5})$-DP, on three different input sets: blobs, moons, and circles. For most pairs of model architecture and input set, the differentially private result has a lower accuracy than the non-private one. The one exception is that the VQC classifies the moons more accurately with $(1.628, 10^{-5})$-DP than without privacy.
As detailed in Table II, the classical and quantum classifiers are almost equally successful for the blobs and moons sets. On the other hand, Figures 9 and 10 demonstrate that the DP-VQC affords superior performance for the circles set, as the quantum classifier's accuracy is between 13% and 17% higher than the DP-neural network's. The last two columns of Figure 11 depict the decision boundaries and accuracies of privacy-preserving $(0.681, 10^{-5})$-differentially private classical neural networks and VQCs. The comparison of Figure 8 and Figure 9 demonstrates that the neural network's efficiency under DP training differs for different datasets. For the moons input set, the accuracy degradation from DP is somewhat significant at 10%. Yet with the circles set, the accuracy decreases by 40%, and the final loss is double that of the non-private loss. Figures 9 and 10 illustrate that the VQC converges faster than the classical classifier, implying a potential quantum advantage over a classical neural network.

B. MNIST Binary Classification
The MNIST classification task is prepared similarly to the 2D classification problem.
Because of the computational complexity of simulating large quantum systems, the problem is reduced to a binary classification of distinguishing the handwritten digits '0' and '1.' The digits are grayscale images with a total of 784 pixels. The variational quantum classifier uses amplitude encoding (described in Section III A 1) to compress the number of inputs to fit within 10 qubits. Therefore, the 784 inputs are padded with additional zeros to make the inputs 1024 dimensional. Next, amplitude encoding transforms the 1024 pixels into a 10-qubit quantum state for operating the variational quantum classifier.
The neural network uses the same padded 1024 pixels as an input. The hidden layer has only one node, and the output layer has two nodes. Hence, the classical model has 1029 parameters divided between the two weight matrices and biases. The design for this classical benchmark aims to limit the number of parameters for fair comparison to the quantum model.
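A sketch of this classical benchmark in PyTorch; the tanh/softmax choices mirror the 2D classical classifier described earlier and are our assumption for the MNIST variant.

```python
import torch
import torch.nn as nn

class MNISTBenchmark(nn.Module):
    """1024 -> 1 -> 2 network: (1024*1 + 1) + (1*2 + 2) = 1029 parameters."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(1024, 1)
        self.out = nn.Linear(1, 2)

    def forward(self, x):
        return torch.softmax(self.out(torch.tanh(self.hidden(x))), dim=-1)

model = MNISTBenchmark()
print(sum(p.numel() for p in model.parameters()))  # 1029
```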
The quantum classifier has two quantum subcircuits. The first has 10 inputs, eight layers of unitary transforms, and four outputs (Figure 12). Each qubit has a tunable unitary transform per layer, so there are 8 × 10 × 3 = 240 parameters in the first subcircuit. The second subcircuit has four inputs, two outputs, and four layers (Figure 13), so it has 4 × 4 × 3 = 48 tunable parameters associated with the rotations of the quantum bits. Consequently, the VQC has 288 parameters. Importantly, this represents roughly only a quarter (27.99% exactly) of the number of parameters associated with the analogous classical neural network used for the same classification task. The MNIST results are summarized in Table III.
Multiple levels of privacy are created by varying the noise multiplier from 1.0 to 5.0. The corresponding privacy budget ranges from 1.73 down to 0.07, respectively. Figure 14 and Table III exemplify that the accuracies of both neural networks and VQCs decrease as ε decreases.
This emphasizes the trade-off between utility and privacy in differentially private algorithms.

V. DISCUSSION

A. Applications

This study has presented the implementation and successful proof-of-concept application of DP to QML models. The approach can be extended to a myriad of applications that require privacy-preserving learning and the power advantage stemming from QML. One potential application is facial recognition. These models must train on thousands of faces, whose identities are not protected. Therefore, this area would intrinsically benefit from DP, and QML could create even more accurate predictions [17]. Quantum convolutional neural networks (QCNNs) would be another logical application of a private QML algorithm, as QCNNs already are being investigated with MNIST and other benchmarks [61, 62, 77]. Our results show that the private VQC distinguishes between the '0' and '1' digits with an accuracy exceeding 90%. As such, it is expected that a privacy-preserving framework would benefit these application scenarios, including QCNNs.
With recent QML developments impacting a spectrum of applications, such as speech recognition [34], quantum recurrent neural networks (QRNNs) and quantum LSTMs for sequential learning [21, 33, 63], and even certain emerging applications in medical imaging [78], we expect the framework described by this work to benefit these new scenarios as well.
An important point to consider is the limitation of extending these results to real-world quantum devices. High-dimensional input, such as MNIST, takes an extremely long time to run on a cloud-based quantum computer, although such runs are possible in principle. One limitation of this study is that the VQCs are simulated with noise-free quantum computers. A future study could investigate the results of running privacy-preserving quantum optimization on a noisy simulator or cloud-based quantum computer.

B. Success of Differentially Private QML
This research seeks to show that a differentially private variational quantum classifier can be trained to identify the decision boundary between two classes. Figure 11 shows that the given hyperparameters achieve nearly perfect classification. After 30 epochs, both the quantum and classical classifiers achieve accuracies greater than 95% for data organized into blobs and concentric circles. On the other hand, the classical network achieves 99% accuracy for the moons classification, while the moons dataset proved to be the most difficult input for the quantum classifier, which achieves merely 86% accuracy. It may be conjectured that the VQC had difficulties in learning the highly convex decision boundary necessary for the moons input set. In spite of that, the VQC generally trains just as well as a classical neural network with only 66% of the total parameters.
While DP training usually causes models to fail to capture the long tail of a data distribution, the DP-QML training is just as successful as the non-private algorithm for the blobs and moons datasets, where only a modest accuracy penalty occurs. While the accuracy of the private training for the circles classification is much lower than its non-private counterpart, the DP-VQC still is much more successful at the task than the classical differentially private neural network. Our study demonstrates that a quantum advantage can offset the usual compromise between privacy and accuracy seen in other DP applications [39, 67].
The MNIST binary classification problem creates an even more compelling case for the QML algorithm being advantageous compared to a classical ML algorithm. Figure 14 demonstrates that a privacy-preserving variational quantum classifier can learn to distinguish between the handwritten digits '0' and '1' from the MNIST dataset to an accuracy of nearly 100%. The same results show that a classical neural network also can accomplish the task.
The quantum advantage arises because the quantum network has only 288 parameters, compared to the 1029 parameters characterizing the classical neural network. Furthermore, the differentially private VQC attains better accuracy than the classical neural network for values of ε between 0.4 and 1.4 (shown in Table III). This range of ε falls within the regime where differentially private techniques attain good privacy, as defined in [39].
This work mainly focuses on the numerical demonstration of potential quantum advantages, leaving the theoretical investigation for future work.

Appendix A 2: PyVacy

For each micro-batch, the loss and its gradient are calculated. Then, the micro-batch step function is called, clipping the gradients.
After all the micro-batches, the step function is called to add Gaussian noise and update the parameters according to these altered gradients [76].
The mini-batch step takes the parameter gradients for each micro-batch and calculates the effective mini-batch gradients. First, the total norm of the parameter gradients, $N$, is calculated. Then, the scaled micro-batch gradients are added to a new parameter, called the accumulated gradient. The gradients are scaled by a coefficient $c = \min\left(\frac{S}{N + 10^{-6}}, 1\right)$, which scales them down to have a total norm equal to, at most, the norm cutoff $S$. The accumulated gradient adds the micro-batch gradients together to create a new effective gradient for the mini-batch. This effective gradient is scaled so that the loss gradients calculated from a given micro-batch do not have too-large norms. This creates DP by limiting the effect each training point has on the batch gradient [39]. In the overridden step method, the accumulated gradients have Gaussian noise added. The Gaussian noise is proportional to the norm cutoff and the noise multiplier, $S$ and $z$, respectively. The accumulated gradient then is scaled by the ratio of micro-batch to mini-batch sizes, where the micro-batch size usually is set to 1. This has the effect of using the accumulated gradients in place of the original parameter gradients in the step update rule.
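A condensed sketch of this accumulate-clip-noise-update flow, paraphrasing the PyVacy description above [76]; the function names are illustrative, not PyVacy's API, and plain SGD stands in for RMSProp.

```python
import torch

def make_private_stepper(params, S, z, microbatch_size, minibatch_size, lr):
    accum = [torch.zeros_like(p) for p in params]  # accumulated gradients

    def microbatch_step():
        # Clip this micro-batch's total gradient norm to <= S, then accumulate
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        c = min(S / (float(total_norm) + 1e-6), 1.0)
        for a, p in zip(accum, params):
            a += c * p.grad
            p.grad.zero_()

    def step():
        # Add Gaussian noise scaled by S * z, rescale, and apply an SGD update
        with torch.no_grad():
            for a, p in zip(accum, params):
                g = (a + S * z * torch.randn_like(p)) * (microbatch_size / minibatch_size)
                p -= lr * g
                a.zero_()

    return microbatch_step, step
```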

FIG. 1: Variational quantum circuit component. The single-qubit gates $R_y(\arctan(x_i))$ and $R_z(\arctan(x_i^2))$ represent rotations along the y- and z-axes by the given angles $\arctan(x_i)$ and $\arctan(x_i^2)$, respectively. Arctan is used because the input values are not in the interval $[-1, 1]$. The CNOT gates are used to entangle quantum states from each qubit, and $R(\alpha, \beta, \gamma)$ represents the general single-qubit unitary gate with three parameters. The state-preparation rotations are not subject to iterative optimization; the parameters labeled $\alpha_i$, $\beta_i$, and $\gamma_i$ are optimized iteratively. The dashed box denotes one layer of a quantum subcircuit. The dial to the far right represents that the circuit has one output, which is the $\sigma_z$ measurement of the first qubit.

FIG. 4: Architecture for the classical neural network used as a control. The left layer is the 2D input, while the right layer is the output. The output is a vector of the probability of being in each class given the input.

FIG. 5: Differential privacy in quantum machine learning. In the proposed framework, the outputs from the quantum circuit are processed on a classical computer. The gradients of the quantum function $\nabla_\theta f(x; \theta)$ and the differentially private gradients $\nabla_\theta^{DP} f(x; \theta)$ are calculated. The quantum circuit parameters are updated according to the differentially private gradients and fed back to the quantum computer.

FIG. 8: Results for the "moons" classical classifier with 200 samples, a learning rate of 0.05, and the RMSProp optimizer, with and without DP.

FIG. 10: Results for the "circles" variational quantum classifier with 200 samples, a learning rate of 0.05, and the RMSProp optimizer, with and without DP.

FIG. 11: Results from the 2D ML experiments. The first column shows the three input sets. The subsequent columns show different models tasked with classifying and learning the decision boundary. The array of plots illustrates the decision boundaries formed by the different models. The solid points are those used in training, and the transparent ones are part of the testing set. Total accuracy after 30 epochs is displayed on the lower right of each plot.

FIG. 12: First quantum circuit block for MNIST classification. The first VQC block encodes the MNIST image. The 1024-dimensional vector is encoded via amplitude encoding into a $\log_2(1024) = 10$-qubit state. $U(x)$ denotes the quantum algorithm for amplitude encoding, as explained in [69, 70]. $\alpha_i$, $\beta_i$, and $\gamma_i$ are the parameters to optimize. The dashed box denotes one block of a quantum circuit that is repeated eight times. Thus, there are 30 × 8 = 240 parameters to the circuit block. The dial to the far right represents that the circuit has four outputs. The expectation of $\sigma_z$ is measured on four qubits. The output becomes the input for the next circuit block.
VI. CONCLUSION

In this work, a QML algorithm in a differentially private framework is developed, and the quantum advantage is maintained when the ML algorithm is improved to preserve privacy. Overall, the QML algorithm attains the same accuracy in the MNIST classification task as the classical ML algorithm with only 28% of the number of parameters, making it more efficient than the classical ML algorithm. This research also shows that VQCs maintain their quantum advantage under DP in the classification of the handwritten digits '0' and '1' and in 2D nonlinear classifications with careful selection of hyperparameters. This novel framework combines differentially private optimization with QML. Including DP in the algorithm ensures privacy-preserving learning. We also demonstrate a capacity for high-fidelity privacy and high accuracy in variational quantum classifiers with two different benchmarks. Notably, we show the superior performance, in terms of convergence, of differentially private QML over classical DP-ML. These results indicate the potential benefits quantum computing will bring to privacy-preserving data analytics.

ACKNOWLEDGMENTS

This work was supported by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS) under the Science Undergraduate Laboratory Internships Program (SULI).

TABLE I: Hyperparameters chosen for non-private and differentially private classifiers. The neural networks and VQCs use the same hyperparameters for both classification tasks. The learning rate (LR) is the same across all experiments. Different noise multipliers are used to compare differentially private networks. The "varies" noise parameter means that multiple values of noise have been used in the DP-neural network and DP-VQC experiments. ε also varies among DP experiments as it directly depends on the noise multiplier. The last four hyperparameters are applicable only with differentially private optimizers.

FIG. 9: Results for the "circles" classical classifier with 200 samples, a learning rate of 0.05, and the RMSProp optimizer, with and without DP.

TABLE II: Accuracies of differentially private neural networks and variational quantum classifiers after 30 epochs for the 2D input sets: "blobs," "moons," and "circles." The quantum classifier achieves DP with more accuracy for the circles set. For the blobs and moons, the quantum and classical classifiers have nearly the same accuracy under a given level of DP.

FIG. 13: Second quantum circuit block for MNIST classification. The second subcircuit uses variational encoding to encode the output from the first block as the input for this subcircuit. $\alpha_i$, $\beta_i$, and $\gamma_i$ are the parameters to optimize. The dashed box denotes one block of a quantum circuit that is repeated four times. There are 12 × 4 = 48 parameters to the circuit block. The dial to the far right represents that the circuit has two outputs; the expectation of $\sigma_z$ is measured on two qubits. In the context of cross-entropy, the outputs will be interpreted as the probability that the image is of a '0' or a '1,' respectively.

TABLE III: Results from binary MNIST classification. Accuracies of differentially private neural networks and variational quantum classifiers after 30 epochs. The private quantum classifier is more accurate and successful for values of ε between 0.41 and 1.34.

TABLE IV: Results