Speeding up quantum perceptron via shortcuts to adiabaticity

The quantum perceptron is a fundamental building block for quantum machine learning, a multidisciplinary field that brings capabilities of quantum computing, such as state superposition and entanglement, into classical machine learning schemes. Motivated by the techniques of shortcuts to adiabaticity, we propose a sped-up quantum perceptron in which a control field on the perceptron is inversely engineered, leading to a rapid nonlinear response with a sigmoid activation function. This results in faster overall perceptron performance compared to quasi-adiabatic protocols, as well as in enhanced robustness against imperfections in the controls.

In the era of information expansion, the merger of quantum information and artificial intelligence will have a transformative impact on science, technology, and our societies [1–3]. In particular, classical networks of artificial neurons (or nodes) represent a successful framework for machine learning strategies, with the perceptron being the simplest example of a node [4]. The perceptron is based on the McCulloch-Pitts neuron [5], and it was originally proposed by Rosenblatt in 1957 to create the first trained networks [6]. Nowadays, extensions of these original ideas, such as multilayer perceptrons in networks with interlayer connectivity, are exploited to deal with demanding computational tasks.
The emergence of quantum computing and machine learning has boosted the development of both fields [7–13], giving rise to the field of quantum machine learning. In this context, quantum neural networks (QNNs) have attracted growing interest [14,15] since the seminal idea proposed by Kak [16]. In particular, bringing classical machine learning techniques into the quantum domain has the potential to accelerate applications such as classification and pattern recognition [2,17–23]. In addition, the excellent degree of quantum control over the registers of modern quantum platforms [24–27] allows quantum operations to be performed with high fidelity, which further supports the prospect of reliable QNNs. However, the linear and unitary framework of quantum mechanics raises a serious dilemma, since neural networks present nonlinear and dissipative behaviours that are hard to reproduce at the quantum level. To address this challenge, many approaches have been explored, exploiting quantum measurements [16,28], the quadratic kinetic term to generate nonlinear behaviours [29], dissipative [16] or repeat-until-success [30] quantum gates, and reversible circuits [31]. Among them, gate-based QNNs [32] with training optimization procedures [33] are feasible to implement with a set of unitary operations. Furthermore, gate-based QNNs can behave as variational quantum circuits that encode highly nonlinear transformations while remaining unitary [20]. Also, a quantum algorithm implementing the quantum version of a binary-valued perceptron was introduced in Ref. [18], showing an exponential advantage in storage resources. Remarkably, a universal quantum perceptron has been proposed as an efficient approximator in Ref. [34], where the quantum perceptron is encoded in an Ising model with a sigmoid activation function.
In particular, the sigmoid nonlinear response is parametrized by the potential exerted by other neurons, and driven by adiabatic techniques.
In this article, motivated by the nonadiabatic control provided by shortcuts to adiabaticity (STA) techniques [35,36], we design fast sigmoidal responses with the aid of invariant-based inverse engineering (IE) [37–39]. The IE method is based on the dynamical modes of the Lewis-Riesenfeld invariant rather than on the instantaneous eigenstates of the original reference Hamiltonian [40,41]. Because IE directly imposes boundary conditions on the evolution of the wave function, the nonlinear activation function of the quantum perceptron, encoded in the probability of the excited state, can be achieved in a fast and robust way. In particular, we design an external control field on the perceptron that leads to a fast nonlinear activation function with a wide tolerance window to variations of the input potential induced by neurons in the previous layer. We demonstrate that our method produces

Results
Quantum perceptron. The capacity of feed-forward neural networks to classify complex data relies on the universal approximation theorem proved by Cybenko [43], which states that any continuous function can be approximated to arbitrary accuracy by a linear combination of sigmoid functions. A QNN has also been shown to be a universal approximator of continuous functions [34]. In a classical network, a perceptron (or neuron) generates the signal $s_j = f(x_j)$ as a sigmoidal response to the weighted sum of the signals (or outputs) of the neurons in the previous layer. More specifically, $x_j = \sum_{i=1}^{k} w_{ji} s_i - b_j$, with the neuron interconnectivities $w_{ji}$, the bias $b_j$, and $s_i$ the output of the $i$th neuron in the previous layer. In analogy with classical neurons, a quantum perceptron can be constructed as a qubit that encodes the nonlinear response to an input potential in its excitation probability; see Fig. 1. One possibility for the latter is the gate of Eq. (1) [34], where, in close similarity with the classical case, $\hat{x}_j = \sum_{i=1}^{k} w_{ji}\hat{\sigma}^z_i - b_j$, with $\hat{\sigma}^z_i$ the $z$ Pauli matrix of the $i$th neuron (qubit), $w_{ji}$ the interaction between perceptron $j$ and the $i$th neuron in the previous layer, and $b_j$ the bias of the perceptron. The transformation in Eq. (1) can be engineered by adiabatically evolving the qubit with the Ising Hamiltonian of Eq. (2) ($\hbar = 1$), in which the $j$th qubit (encoding the quantum perceptron) is controlled by an external field $\Omega(t)$, leading to a tunable energy gap in the dressed-state qubit basis $|\pm\rangle$, with $\hat{\sigma}^x_j|\pm\rangle = \pm|\pm\rangle$. When this perceptron is integrated in a feed-forward neural network, the potential depends on the neurons in earlier layers, since the perceptron interacts with the neurons of the previous layer (labeled by $i = 1, \dots, k$) via the potential $\hat{x}_j$; see Fig. 1. Therefore, the network is encoded in a Hilbert space via the external potential exerted by other neurons. The Ising Hamiltonian in Eq.
(2) has the reduced eigenstate given in Eq. (3), where $x_j$ now represents the lowest eigenvalue of the operator $\hat{x}_j$, while $f(x)$ in Eq. (4) corresponds to a sigmoid excitation probability.

Figure 1. Schematic configuration of a quantum perceptron. When the perceptron is integrated in a feed-forward neural network, the potential depends on the neurons in earlier layers. The activation function of the quantum perceptron is the probability of the excited state, $P_j(x_j/\Omega_f)$, at the final time $t = t_f$, which has a sigmoid shape (inset).

www.nature.com/scientificreports/

In order to generate the state on the right-hand side of Eq. (1), we propose the following strategy. First, a Hadamard gate is applied to drive the state from $|0\rangle$ to $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. Second, by appropriately tuning $\Omega(t)$ with inverse engineering (IE) techniques (explained below), the state $|\Psi(0)\rangle = |+\rangle$ evolves along an eigenstate of the Lewis-Riesenfeld invariant of $\hat{H}$ to $|\Psi(t_f)\rangle = |\Phi(x_j/\Omega_f)\rangle$ (up to a phase factor that can eventually be canceled by a phase gate), where $|\Phi(x_j/\Omega_f)\rangle$ is the instantaneous eigenstate of $\hat{H}(t = t_f; \Omega_f)$ and $\Omega_f \equiv \Omega(t_f)$. Note that, unlike the fast quasi-adiabatic passage (FAQUAD) approach [34], our IE-based method does not need to fulfil the initial condition $\Omega(0) \gg |x_j|$, since the initial state is not required to be an eigenstate of $\hat{H}(0)$. This results in a smooth control field $\Omega(t)$ that is easy to use in experiments.
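The ingredients introduced so far can be checked with a short NumPy sketch. It first builds the diagonal of $\hat{x}_j = \sum_i w_{ji}\hat{\sigma}^z_i - b_j$ for hypothetical weights and bias: every computational basis state $|z_1 \dots z_k\rangle$ of the previous layer ($z_i = \pm 1$) is an eigenstate, so the perceptron sees the classical weighted sum with $s_i \to z_i$. It then applies the Hadamard gate to $|0\rangle$ and compares the excitation probability of the ground state of $\hat{H}(t_f)$ with a sigmoid. Because Eqs. (2) to (4) are not reproduced here, both the Hamiltonian $\hat{H} = -\tfrac{1}{2}(\Omega\hat{\sigma}^x - x\hat{\sigma}^z)$ (sign chosen so that the response rises from 0 at $-x_{\max}$ to 1 at $x_{\max}$) and the sigmoid $f(x) = \tfrac{1}{2}\left[1 + x/\sqrt{1 + x^2}\right]$ are assumptions of this sketch; with $\Omega_f = 1$ the latter gives $f(12) \approx 0.998$, consistent with the value of $P_j(x_{\max})$ quoted in the IE performance section. All function names are illustrative.

```python
from functools import reduce

import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
HADAMARD = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def input_potential_diagonal(w_j, b_j):
    """Diagonal of x_j = sum_i w_ji sigma^z_i - b_j on the k previous-layer qubits.

    The operator is diagonal in the computational basis (np.kron ordering, qubit 1
    most significant): on |z_1 ... z_k>, with z_i = +/-1, it equals
    sum_i w_ji z_i - b_j, i.e. the classical weighted sum with s_i -> z_i.
    """
    k = len(w_j)
    diag = np.full(2**k, -float(b_j))
    for i, w in enumerate(w_j):
        # sigma^z on qubit i, identity elsewhere (only diagonals are needed)
        factors = [np.diag(SZ) if m == i else np.ones(2) for m in range(k)]
        diag += w * reduce(np.kron, factors)
    return diag


def sigmoid_f(x):
    """Assumed activation f(x) = [1 + x / sqrt(1 + x^2)] / 2."""
    return 0.5 * (1.0 + x / np.sqrt(1.0 + x**2))


def final_ground_p1(x, omega_f=1.0):
    """|<1|Phi>|^2 for the ground state Phi of H(t_f) = -(omega_f SX - x SZ)/2.

    The sign convention of H is an assumption of this sketch.
    """
    _, vecs = np.linalg.eigh(-0.5 * (omega_f * SX - x * SZ))  # ascending eigenvalues
    return abs(vecs[1, 0]) ** 2  # component on |1> of the ground state


w = np.array([0.5, -1.0, 2.0])  # hypothetical weights w_j1, w_j2, w_j3
b = 0.25                        # hypothetical bias b_j
print(input_potential_diagonal(w, b))  # one input value per basis state |z1 z2 z3>

plus = HADAMARD @ np.array([1.0, 0.0])  # |0> -> |+> = (|0> + |1>)/sqrt(2)
print(plus)  # [0.70710678 0.70710678]
for x in (-12.0, 0.0, 12.0):
    print(x, final_ground_p1(x), sigmoid_f(x))
```

Within this assumed convention, the ground-state excitation probability at $t_f$ equals $f(x_j/\Omega_f)$ exactly, which is why reaching $|\Phi(x_j/\Omega_f)\rangle$ is enough to realize the activation function.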
Another possibility to achieve $|\Psi(t_f)\rangle$ from $|\Psi(0)\rangle$ is adiabatic driving in a Landau-Zener scheme. However, as discussed in Ref. [34], this takes a long time and may be unfeasible depending on the coherence time of the physical setup that implements the Hamiltonian in Eq. (2).
Accelerating quantum perceptron by IE. We adopt the IE method to achieve the state transfer $|\Psi(0)\rangle \to |\Phi(x_j/\Omega_f)\rangle$ in a shorter time than FAQUAD [44]. The control field $\Omega(t)$ is engineered to guarantee that at the final time $t = t_f$ the qubit excitation probability $P_j(x_j/\Omega_f)$ is a sigmoid-like response, i.e., a single-valued function $f$ satisfying $\lim_{x\to-\infty} f(x) = 0$ and $\lim_{x\to+\infty} f(x) = 1$. Since the universality of neural networks does not rely on the specific shape of the sigmoid function [43,45], e.g., Eq. (4), we quantify the performance of the control field through a distance $C$. In all our numerical results, the activation function is found to be well behaved, i.e., monotonic and sigmoid-like, with the same limits. As we will see later, our IE technique also provides robustness with respect to timing errors. We now show the procedure to find the control $\Omega(t)$. To this end, we start by parametrizing the dynamical state in Eq. (5) with two unknown polar and azimuthal angles on the Bloch sphere, $\theta \equiv \theta(t)$ and $\beta \equiv \beta(t)$. With the state in Eq. (5) at hand, the corresponding orthogonal state $|\Psi^{\perp}(t)\rangle$ is completely determined, and the Lewis-Riesenfeld invariant can thus be constructed with constant eigenvalues [37,38]. Substituting one of the states ($|\Psi(t)\rangle$ or $|\Psi^{\perp}(t)\rangle$) into the time-dependent Schrödinger equation driven by the Hamiltonian in Eq. (2), we obtain two coupled differential equations, Eqs. (6) and (7) (see Methods for details); the latter reads $x_j = \dot{\theta}\cot\theta\cot\beta - \dot{\beta}$. Setting $|\Psi(0)\rangle = |+\rangle$ and $|\Psi(t_f)\rangle = |\Phi(x_j/\Omega_f)\rangle$ at the initial and final times leads to the boundary conditions, where the parameter $\kappa$ is taken to be infinitely large so that $|\Phi(x_j/\kappa)\rangle = |+\rangle$.
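The slow adiabatic route mentioned at the start of this paragraph can be simulated in a few lines, which makes its cost in time concrete. The NumPy sketch below evolves $|+\rangle$ for many inputs in parallel under an assumed Hamiltonian $\hat{H}(t) = -\tfrac{1}{2}[\Omega(t)\hat{\sigma}^x - x\hat{\sigma}^z]$ (the sign convention is our assumption, since Eq. (2) is not reproduced here) and compares the final excitation probability with an assumed sigmoid $f(x) = \tfrac{1}{2}[1 + x/\sqrt{1+x^2}]$. The exponential ramp from $\Omega = 200$ to $\Omega_f = 1$ over $t_f = 200$ (in units of $t_0$) and the function names are hypothetical choices, not the paper's protocol. In this long-time limit the response tracks $f$ closely; shortening $t_f$ in the sketch eventually breaks adiabatic following, which is the limitation that motivates the faster IE protocol.

```python
import numpy as np


def evolve_p1(omega_of_t, xs, t_f, n_steps):
    """Return |<1|psi(t_f)>|^2 for psi(0) = |+>, one value per input potential in xs.

    Assumed Hamiltonian (hbar = 1): H(t) = -(Omega(t) sigma^x - x sigma^z) / 2.
    Each step uses the exact 2x2 propagator of H frozen at the step midpoint:
    exp(-i H dt) = cos(phi) I + i sin(phi) (n . sigma), with phi = R dt / 2,
    R = sqrt(Omega^2 + x^2) and unit vector n = (Omega, 0, -x) / R.
    """
    xs = np.atleast_1d(np.asarray(xs, dtype=float))
    dt = t_f / n_steps
    a0 = np.full(xs.shape, 1 / np.sqrt(2), dtype=complex)  # amplitude on |0>
    a1 = a0.copy()                                          # amplitude on |1>
    for n in range(n_steps):
        om = omega_of_t((n + 0.5) * dt)
        r = np.sqrt(om**2 + xs**2)
        c, s = np.cos(0.5 * r * dt), np.sin(0.5 * r * dt)
        nx, nz = om / r, -xs / r
        a0, a1 = ((c + 1j * s * nz) * a0 + 1j * s * nx * a1,
                  1j * s * nx * a0 + (c - 1j * s * nz) * a1)
    return np.abs(a1) ** 2


def slow_ramp(t, omega_0=200.0, omega_f=1.0, t_f=200.0):
    """Hypothetical exponential ramp from omega_0 (>> x_max) down to omega_f."""
    return omega_0 * (omega_f / omega_0) ** (t / t_f)


x_max = 12.0
xs = np.linspace(-x_max, x_max, 25)
p1 = evolve_p1(slow_ramp, xs, t_f=200.0, n_steps=20_000)
f_target = 0.5 * (1 + xs / np.sqrt(1 + xs**2))  # assumed sigmoid, Omega_f = 1
print(np.max(np.abs(p1 - f_target)))  # small in this slow, adiabatic limit
```

Piecewise-constant propagation with closed-form $2\times 2$ exponentials keeps the sketch dependency-free apart from NumPy; a general-purpose ODE solver would work equally well.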
Note that $\kappa$ need not equal the value of our control $\Omega(t)$ at $t = 0$, since $|\Phi(x_j/\kappa)\rangle$ is not necessarily an eigenstate of $\hat{H}[t = 0; \Omega(0)]$. In addition, Eq. (6) yields conditions for the first derivatives of $\theta$ at the boundaries. We can interpolate $\theta$ either with a simple polynomial, $\theta = \sum_{i=0}^{N} a_i t^i$, or with a trigonometric function, $\theta = a_0 + a_1 t + \sum_{i=2}^{N} a_i \sin[(i-1)\pi t/t_f]$, which requires fewer coefficients to match the same boundary conditions [46]. An appropriate choice of the coefficients can bring the solution close to the one obtained from optimal control theory [47]. The Supplementary Information compares the activation functions obtained with IE using these two ansatzes and using exponential functions inspired by regularized optimal solutions. We stress that, unlike the method of Refs. [38,48], in our case $\theta$ and $\beta$ are correlated. We impose $\beta(t_f) = \pi/2$ and $\beta(0) = \pi - \epsilon$, where $\epsilon$ allows for a small deviation (see below). Once $\theta$ is constructed, $\beta$ is obtained by solving Eq. (7) with the boundary condition $\beta(t_f) = \pi/2$. After $\theta$ and $\beta$ are obtained, the control field $\Omega(t)$ follows from Eq. (6).
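Once boundary values for $\theta$ are fixed, both interpolation ansatzes reduce to small linear systems. The sketch below uses hypothetical boundary data $\theta(0)$, $\theta(t_f)$, $\dot{\theta}(0)$, $\dot{\theta}(t_f)$ (the actual values follow from the boundary conditions and Eq. (6) discussed above, which are not reproduced here). It returns the coefficients of the polynomial ansatz, optionally with free higher-order terms such as $a_4, a_5$ for $N = 5$, and of the trigonometric ansatz truncated at $N = 3$. Function names are illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def poly_ansatz(t_f, th0, thf, dth0, dthf, extra=()):
    """Coefficients (a_0, ..., a_N), increasing order, of theta(t) = sum_i a_i t^i.

    a_0..a_3 enforce theta(0), theta(t_f), theta'(0) and theta'(t_f). Optional
    free coefficients `extra` = (a_4, a_5, ...) leave the t = 0 conditions
    untouched (their powers are >= 4) and are compensated at t = t_f.
    """
    extra = np.asarray(extra, dtype=float)
    k = np.arange(4, 4 + extra.size)
    val_hi = np.sum(extra * t_f**k)              # value of the extra terms at t_f
    der_hi = np.sum(extra * k * t_f ** (k - 1))  # their slope at t_f
    a0, a1 = th0, dth0
    m = np.array([[t_f**2, t_f**3], [2 * t_f, 3 * t_f**2]])
    rhs = np.array([thf - a0 - a1 * t_f - val_hi, dthf - a1 - der_hi])
    a2, a3 = np.linalg.solve(m, rhs)
    return np.concatenate(([a0, a1, a2, a3], extra))


def trig_ansatz(t_f, th0, thf, dth0, dthf):
    """(a_0, a_1, a_2, a_3) of theta = a_0 + a_1 t + sum_{i=2}^{3} a_i sin[(i-1) pi t/t_f].

    The sine terms vanish at both ends, so a_0 and a_1 follow from the values of
    theta; a_2 and a_3 then follow from the two slope conditions.
    """
    a0 = th0
    a1 = (thf - th0) / t_f
    w = np.pi / t_f
    # theta'(0) = a1 + w a2 + 2w a3 ;  theta'(t_f) = a1 - w a2 + 2w a3
    m = np.array([[w, 2 * w], [-w, 2 * w]])
    a2, a3 = np.linalg.solve(m, np.array([dth0 - a1, dthf - a1]))
    return np.array([a0, a1, a2, a3])


def trig_eval(t, a, t_f):
    """theta(t) and theta'(t) for the trigonometric ansatz with coefficients a."""
    w = (np.arange(2, len(a)) - 1) * np.pi / t_f
    t = np.atleast_1d(t)[:, None]
    theta = a[0] + a[1] * t[:, 0] + np.sum(a[2:] * np.sin(w * t), axis=1)
    dtheta = a[1] + np.sum(a[2:] * w * np.cos(w * t), axis=1)
    return theta, dtheta


t_f = 1.0
bc = (np.pi / 2, 2.5, 0.3, -0.4)  # hypothetical theta(0), theta(t_f), theta'(0), theta'(t_f)
a_n3 = poly_ansatz(t_f, *bc)                     # N = 3
a_n5 = poly_ansatz(t_f, *bc, extra=(0.7, -0.2))  # N = 5 with free a_4, a_5
a_trig = trig_ansatz(t_f, *bc)
t = np.linspace(0.0, t_f, 5)
print(P.polyval(t, a_n3), P.polyval(t, a_n5), trig_eval(t, a_trig, t_f)[0])
```

In the $N = 5$ case the two extra coefficients are free knobs that leave the boundary conditions intact, which is the extra freedom that can be tuned toward a quasi-optimal-time solution.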
The solution for $\beta$ from Eq. (7) depends on $x_j$, leading to a family of controls $\Omega \equiv \Omega(t, x_j)$. However, to make the control independent of the input potential, we set $\Omega(t) = \Omega(t, x_j = y)$, where $y$ is chosen to minimize the distance $C$ for different $x_j$ in a certain interval (see next section).

IE performance.
As the state evolves from $|\Psi(0)\rangle = |+\rangle$, the parameter $\kappa$ should be large compared with the input potential $x_j$. We numerically study the case $\kappa = 2000$ and explore the range $x_j/\Omega_f \in [-x_{\max}, x_{\max}]$ with $x_{\max} = 12$, although our results are not limited to this specific value. We use dimensionless units by setting the unit of time $t_0$, such that the control field $\Omega(t)$ is given in units of $1/t_0$. In addition, we consider an unbiased perceptron with $b_j = 0$. Since it is not tied to one particular large value of $\kappa$, our method offers flexibility in the design of a feasible control field. For a case in which we impose $\Omega_f = 1$ and solve Eq. (7) with the fixed value $x_j/\Omega_f = y/\Omega_f = 12$, we find $\theta(0) = 1.576 \simeq \pi/2$. Figure 2a shows the resulting $\theta$ and $\beta$ for this case, in which we have also set the operation time $t_f = 1$. The boundary condition for $\beta(0)$ is also satisfied, with a tiny error $\epsilon = 2 \times 10^{-5}$. In this specific case, the designed control at $t = 0$ is $\Omega(0) = 1999.6 \approx \kappa$ for $\kappa = 2000$, so the initial state corresponds to an eigenstate of the Hamiltonian. We also observe that $\beta(0)$ tends to $\pi$ as $t_f$ increases. Figure 2b shows the control field $\Omega(t)$ obtained with our method. This $\Omega(t)$ leads to an excitation probability that reaches $P_j(x_{\max}) = 0.998$. Using the same control field $\Omega(t)$, we find that the probability of the state $|1\rangle$ for other input potentials $x_j/\Omega_f \in [-x_{\max}, x_{\max}]$ is a sigmoid-like response ranging from 0 to 1 over the interval, as shown in the inset of Fig. 2b. This demonstrates the successful construction of a sigmoid-shaped transfer function, which is a crucial ingredient of a quantum perceptron. The fields calculated with $\kappa = 1000$ and $\kappa = 500$ lead to the same sigmoid activation function, which, as shown in the inset of Fig. 2b, cannot be distinguished from the one derived with $\kappa = 2000$.
Our IE method provides a wider range of $y/\Omega_f$ than FAQUAD for constructing sigmoid transfer functions. Figure 3a shows the distance $C$ obtained with the IE method as a function of $y/\Omega_f$ for various operation times $t_f$. Low values of $C$ appear for large values of $|y|$ and $t_f$. We have checked (also for $t_f = 1$) the appearance of nonlinear perceptron responses that connect 0 and 1 with a sigmoid shape. In particular, these lead to $C < 10^{-2}$ in the range $y/\Omega_f \in [5, 12]$, with control fields $\Omega(t)$ for $t_f = 1$ similar to the one in Fig. 2b. In contrast, with FAQUAD techniques [44], $C$ approaches 2 at $y/\Omega_f = -x_{\max}$, and the transfer function can be produced only for long $t_f$ and in the regime $y/\Omega_f \to x_{\max}$; see Fig. 3b.
The target state $|\Psi(t_f)\rangle = |\Phi(x_j/\Omega_f)\rangle$ depends on the value of the driving field at the final time; see Eq. (3). In general, we observe that with our IE method a larger value of the control field at $t = t_f$ (i.e., $\Omega_f$) offers higher fidelity. As an example, Fig. 4 shows the value of $C$ as a function of $\Omega_f$ for $t_f = 0.2$.
Quasi-optimal-time solution. As the activation function $P(x_j/\Omega_f)$ connects 0 and 1 at $-x_{\max}$ and $x_{\max}$, we set $C < 0.01$ as the criterion for the successful construction of a quantum perceptron. Figure 5a illustrates the dependence of $C$ on $t_f$ for IE with the polynomial ansatz $\theta = \sum_{i=0}^{N} a_i t^i$ with $N = 3$ and $N = 5$, as well as for FAQUAD [34]. For $N = 3$, the smallest $t_f$ satisfying $C < 0.01$ is 0.2, whereas with FAQUAD it is $t_f = 0.3$. This minimal $t_f$ can be reduced further, since the IE method can approach the quasi-optimal-time solution when more degrees of freedom are introduced in the ansatz for $\theta$ [47], leading to faster quantum perceptrons. With $N = 5$ (i.e., a solution with two additional parameters, $a_4$ and $a_5$; dotted-black curve in Fig. 5a), we obtain a speed-up of 2 with respect to the FAQUAD method, with a minimal operation time $t_f^{\min} = 0.15$. The values of the transfer function at $-x_{\max}$ and $x_{\max}$, together with the value of $C$, obtained with IE using polynomial, trigonometric and exponential ansatzes, as well as with FAQUAD, are given in the Supplementary Information; they show that a high-order polynomial ansatz can yield a quasi-optimal-time solution.
Moreover, we find that the IE method is robust with respect to timing errors, i.e., variations of the operation time $t_f$. More specifically, once the minimal value of $C$ is reached (solid-blue curve in Fig. 5a), $C$ does not show any appreciable oscillation for $t_f > t_f^{\min}$. Conversely, FAQUAD driving leads to the dashed-red curve in Fig. 5a, which shows an oscillatory behavior of $C$, indicating that the sigmoid transfer function can be constructed only at specific values of $t_f$.
Remarkably, for short times, e.g., $t_f = 0.15$, the transfer functions and driving fields of the IE and FAQUAD protocols are completely different. The insets of Fig. 5a,b show in detail the transfer functions and driving fields designed with IE. On the one hand, the FAQUAD protocol cannot produce a sigmoid function connecting 0 and 1 at the edges; see the dashed-red curve in the inset of Fig. 5a. On the other hand, IE with the polynomial ansatz of $N = 3$ fails to reach the state $|0\rangle$, giving $P(-x_{\max}) = 0.2$ (solid-blue curve). However, we can overcome this limitation by increasing the order of the polynomial ansatz to $N = 5$. Here, we compare the activation functions obtained with the different strategies at the same value $y/\Omega_f = 12$. It is worth mentioning that by increasing $x_{\max}$, which means supplying more energy to the system, one can recover a more stretched sigmoid with the FAQUAD protocol or with IE using the $N = 3$ polynomial ansatz. However, in this work we find the external driving $\Omega(t)$ for which the perceptron has a sigmoidal response in a fixed Hamiltonian configuration with the range $[-x_{\max}, x_{\max}]$.
In addition, the controls $\Omega(t)$ derived with IE are smooth and present values close to zero at $t = 0$; see Fig. 5b. Compared with the case $t_f = 1$, a shorter operation time leads to a larger $\epsilon$, so that $\Omega(0)$ is farther from $\kappa$. This contrasts with the control $\Omega(t)$ derived from FAQUAD techniques, which demands an abrupt change from $\Omega(0) = 2000$ to $\Omega(t_f) = 1$; see Fig. 2b. This demonstrates that our IE-derived controls are well suited to experimental implementation. In this regard, the next section gives estimates based on state-of-the-art experimental parameters for NV centers in diamond, which demonstrate the suitability of our method for this quantum platform.

Discussions
We have demonstrated that the enhanced performance of our IE-based method leads to sigmoid activation functions within a minimal operation time $t_f^{\min} = 0.15\,t_0$. If, for instance, one selects $t_0 = 500$ ns, the maximum value of the control amounts to $|\Omega_{\max}| \approx 50$ MHz for the kind of solutions presented in Fig. 5b (see the horizontal axis limits in that figure). This permits the application of our controls in modern quantum platforms such as NV centers in diamond, whose coherence times are much longer than $0.15\,t_0 = 0.15 \times 500\ \text{ns} = 75$ ns even at room temperature [49,50]. In addition, current arbitrary waveform generators can change the amplitude of the delivered microwave field (and consequently the Rabi frequency $\Omega$) on time scales significantly shorter than 1 ns [42,51]. Hence, the controls in Fig. 5b can readily be introduced to produce nonlinear sigmoid responses in NV centers. IE is also helpful for achieving robust control in a specific physical setup [52–54] when the Ising model includes unwanted transitions between the target two-level system and other levels. In this manner, one could envision a diamond chip with several NVs, each with nearby nuclear spin qubits, as quantum hardware to construct QNNs using IE methods.
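For orientation, the conversion between the dimensionless units used above (time in units of $t_0$, $\Omega$ in units of $1/t_0$) and laboratory units for the example value $t_0 = 500$ ns is worked out below. We assume that the quoted MHz figures are ordinary rather than angular frequencies, which the text does not specify; with an extra factor of $2\pi$ the dimensionless amplitude would change accordingly.

$$
t_f^{\min} = 0.15\,t_0 = 0.15 \times 500\ \mathrm{ns} = 75\ \mathrm{ns},
\qquad
\frac{1}{t_0} = \frac{1}{500\ \mathrm{ns}} = 2\ \mathrm{MHz},
$$

so that, under this assumption, $|\Omega_{\max}| \approx 50$ MHz corresponds to a dimensionless peak amplitude $|\Omega_{\max}|\,t_0 \approx 25$.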
Fast quasi-adiabatic method. Another protocol to construct a quantum perceptron by controlling the qubit gate is the FAQUAD strategy [34,44], which achieves a fast, adiabatic-like process. The adiabaticity parameter is kept constant, $\mu(t) = c$, during the whole control process, where the instantaneous eigenstates of the Hamiltonian in Eq. (2) are $|\phi_l\rangle = \cos(\alpha/2)|1\rangle + (-1)^l \sin(\alpha/2)|0\rangle$, with eigenenergies $E_l = -(-1)^l\sqrt{\Omega^2 + x_j^2}/2$, $\alpha = \arccos\left(-x_j/\sqrt{\Omega^2 + x_j^2}\right)$ and $l \in \{0, 1\}$. In order to construct a universal quantum gate, a single control should not depend on the neuron potential $x_j$. The largest value of $|\mu|$ occurs at $|x_j/\Omega_f| \approx 1.272$, and we take this $\mu$ value as the condition that works for all input neuron configurations. As the relation between the field and time is invertible, we can apply the chain rule to Eq. (12) and obtain the corresponding equation for $d\Omega/dt$, where the negative sign indicates that $\Omega(t)$ decreases monotonically from $\Omega(0)$ to $\Omega(t_f)$. The total duration is rescaled as $s = t/t_f$, so that $\tilde{\Omega}(s) := \Omega(s\,t_f)$ and $d\Omega/dt = t_f^{-1}\, d\tilde{\Omega}/ds$. As a result, we have
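The worst-case input $|x_j/\Omega_f| \approx 1.272$ can be checked numerically. This sketch assumes the standard FAQUAD adiabaticity parameter $\mu = |\langle\phi_0|\partial_t\phi_1\rangle|/|E_1 - E_0|$ ($\hbar = 1$) and an orthonormal eigenbasis parametrized by the mixing angle $\alpha$ above, for which $|\langle\phi_0|\partial_\Omega\phi_1\rangle| = \tfrac{1}{2}|\partial_\Omega\alpha| = |x_j|/[2(\Omega^2 + x_j^2)]$. Keeping $\mu = c$ constant then gives $c\,t_f = \int_{\Omega_f}^{\Omega_0} |x_j|\,d\Omega/[2(\Omega^2 + x_j^2)^{3/2}]$, and the input requiring the largest $c$ is the one that fixes the single, input-independent control. In this sketch the maximum lies at $x_j/\Omega_f = \sqrt{(1+\sqrt{5})/2} \approx 1.272$. The same integral, normalized, gives the rescaled time $s = t/t_f$ as a function of $\Omega$, which is inverted by interpolation. The value $\Omega_0 = 2000$ mirrors the FAQUAD initial field quoted earlier; grid sizes and function names are illustrative.

```python
import numpy as np


def faquad_density(omega, x):
    """|<phi_0|d_Omega phi_1>| / |E_1 - E_0| for the perceptron qubit (hbar = 1).

    With alpha = arccos(-x/R) and R = sqrt(Omega^2 + x^2), |d alpha/d Omega| = |x|/R^2;
    the coupling is half of that and the gap is R, giving |x| / (2 R^3).
    """
    return 0.5 * np.abs(x) / (omega**2 + x**2) ** 1.5


def c_tf_closed(x, omega_f=1.0):
    """Closed form of c * t_f = int_{omega_f}^{inf} faquad_density dOmega (x != 0)."""
    ax = np.abs(x)
    return (1.0 - omega_f / np.sqrt(omega_f**2 + ax**2)) / (2.0 * ax)


def faquad_schedule(x, omega_0=2000.0, omega_f=1.0, n=200_001):
    """Constant-mu schedule on a grid: returns (s, omega, c * t_f).

    -dOmega/dt * density = c implies s = t/t_f = int_Omega^{omega_0} density
    divided by int_{omega_f}^{omega_0} density, so Omega decreases as s goes 0 -> 1.
    """
    omega = np.geomspace(omega_0, omega_f, n)          # decreasing grid
    g = faquad_density(omega, x)
    steps = -0.5 * (g[1:] + g[:-1]) * np.diff(omega)   # trapezoids; diff(omega) < 0
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    return cum / cum[-1], omega, cum[-1]


# Input potential requiring the largest constant mu for a fixed t_f (Omega_f = 1).
xs = np.linspace(0.5, 3.0, 25_001)
x_star = xs[np.argmax(c_tf_closed(xs))]
print(round(x_star, 3))  # 1.272

s, omega, c_tf = faquad_schedule(x_star)


def omega_at(s_query):
    """Omega at rescaled time s = t/t_f, by linear interpolation on the grid."""
    return np.interp(s_query, s, omega)


print(omega_at(np.array([0.0, 0.5, 1.0])))
```

Using the worst-case input to fix $c$ guarantees that $\mu \le c$ for every other input with the same schedule, which is what makes a single control usable for all neuron configurations.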