Introduction

In the era of information expansion, the merge of quantum information and artificial intelligence will have a transformative impact in science, technology, and our societies1,2,3. In particular, classical networks of artificial neurons (or nodes) represent a successful framework for machine learning strategies, with the perceptron being the simplest example of a node4. The perceptron is based on the McCulloch-Pitts neuron5, and it was originally proposed by Rosenblatt in 1957 to create the first trained networks6. Nowadays, extensions of these original ideas such as multilayer perceptrons in networks with interlayer connectivity are exploited to deal with demanding computational tasks.

The emergence of quantum computing and machine learning has boosted the development of both fields7,8,9,10,11,12,13, giving rise to the field of quantum machine learning. In this context, quantum neural networks (QNNs) have attracted growing interest14,15 since the seminal idea proposed by Kak16. In particular, the entering of classical machine learning techniques into the quantum domain has the potential to accelerate the performance of different applications such as classification and pattern recognition2,17,18,19,20,21,22,23. In addition, nowadays the excellent degree of quantum control over the registers in modern quantum platforms24,25,26,27 allows the performance of quantum operations with high fidelity, which further feeds the idea of having reliable QNNs. However, the linear and unitary framework of quantum mechanics raises a serious dilemma, since neural networks present nonlinear and dissipative behaviours which are hard to reproduce at the quantum level. To address this challenge, many efforts have been attempted by exploiting quantum measurements16,28, the quadratic kinetic term to generate nonlinear behaviours29, dissipative16 or repeat-until-success30 quantum gates, and reversible circuits31. Among them, gate-based QNNs32 with training optimization procedures33 are feasible to implement by a set of unitary operations. Furthermore, gate-based QNNs can behave as variational quantum circuits that encode highly nonlinear transformations while remaining unitary20. Also, a quantum algorithm implementing the quantum version of a binary-valued perceptron was introduced in Ref.18, showing an exponential advantage in resources storage. Remarkably, a universal quantum perceptron has been proposed as an efficient approximator in Ref.34, where the quantum perceptron is encoded in an Ising model with a sigmoid activation function. In particular, the sigmoid nonlinear response is parametrized by the potential exerted by other neurons, and driven by adiabatic techniques.

In this article, motivated by the nonadiabatic control provided by shortcuts to adiabaticity (STA) techniques35,36, we design fast sigmoidal responses with the aid of the invariant-based inverse engineering (IE)37,38,39. The IE method is based on dynamical modes of Lewis-Riesenfeld invariant instead of one instantaneous eigenstates of the original reference Hamiltonian40,41. As IE directly imposes boundary conditions in the wave function evolution, the nonlinear activation function of the quantum perceptron encoded in the probability of the excited state can be achieved in a fast and robust way. In particular, an external control field on the perceptron is designed such that it leads to a fast nonlinear activation function with a wide tolerance window to the variation of the input potential induced by neurons in the previous layer. We demonstrate that our method produces solutions that outperform those based on adiabatic techniques, which significantly facilitates the implementation of quantum perceptrons in modern platforms such as nitrogen vacancy (NV) centers in diamond. Note that, the latter are settings where external control fields can be introduced with extraordinary precision42.

Results

Quantum perceptron

The capacity of feed-forward neural networks to classify complex data relies in the “universal approximation theorem” proved by Cybenko43, claiming that any continuous function can be written as a linear combination of sigmoid functions. A QNN is also demonstrated as a universal approximator of continuous functions34. In a classical network, a perceptron (or neuron) generates the signal \(s_j = f(x_j)\) as a sigmoidal response to the weighted sum of the signals (or outputs) from the neurons in the previous layer. More specifically, \(x_j = \sum _{i=1}^k w_{ji} s_i - b_j\) with the neuron interconnectivities \(w_{ji}\), the bias \(b_j\), and \(s_i\) being the output of the ith neuron in the previous layer. In analogy with classical neurons, a quantum perceptron can be constructed as a qubit that encodes the nonlinear response to an input potential in the excitation probability, see Fig. 1. One possibility for the latter is the following gate34:

$$\begin{aligned} {\hat{U}}_j({\hat{x}}_j; f) |0_j\rangle = \sqrt{1-f({\hat{x}}_j)} |0_j\rangle + \sqrt{f({\hat{x}}_j)} |1_j\rangle , \end{aligned}$$
(1)

where, in close similarity with the classical case, we have \({\hat{x}}_j = \sum _{i=1}^k w_{ji} {\hat{\sigma }}^z_{i} - b_j\), where \({\hat{\sigma }}^z_{i}\) is the z Pauli matrix of the ith neuron (qubit), \(w_{ji}\) is interaction between the perceptron j and the ith neuron in the previous layer, \(b_j\) is the bias of the perceptron. The transformation in Eq. (1) can be engineered by evolving adiabatically the qubit with the Ising Hamiltonian (\(\hbar = 1\))

$$\begin{aligned} {{\hat{H}}}(t)= & {} \frac{1}{2}\left[ {\hat{x}}_j {\hat{\sigma }}^z_j + \Omega (t){{\hat{\sigma }}}_j^x\right] \nonumber \\= & {} \frac{1}{2} \left[ \sum _{i=1}^k (w_{ji} {\hat{\sigma }}^z_i{{\hat{\sigma }}}^z_j) - b_j {\hat{\sigma }}_j^z + \Omega (t) {{\hat{\sigma }}}_j^x \right] , \end{aligned}$$
(2)

where the jth qubit (encoding the quantum perceptron) is controlled by an external field \(\Omega (t)\), leading to a tunable energy gap in the dressed-state qubit basis \(|\pm \rangle \), with \({{\hat{\sigma }}}^x_j|\pm \rangle = \pm |\pm \rangle \). When this perceptron is integrated in a feed-forward neural network, the potential depends on the neurons in earlier layers, as the perceptron interacts with other neurons in the previous layer (labeled by \(i = 1, \ldots , k\)) via the \(x_j\) potential, see Fig. 1. Therefore, the network is encoded in a Hilbert space via the external potential exerted by other neurons. The Ising Hamiltonian in Eq. (2) has the reduced eigenstate,

$$\begin{aligned} |\Phi (x_j/ \Omega (t)) \rangle = \sqrt{1- f(x_j / \Omega (t))} |0\rangle + \sqrt{f(x_j / \Omega (t))} |1\rangle ,~~~~ \end{aligned}$$
(3)

where \(x_j\) now represents the lowest eigenvalue of the operator \({\hat{x}}_j\), while f(x) corresponds to a sigmoid excitation probability

$$\begin{aligned} f(x) = \frac{1}{2} \left( 1+\frac{x}{\sqrt{1+x^2}}\right) . \end{aligned}$$
(4)
Figure 1
figure 1

Schematic configuration of a quantum perceptron. When it is integrated in a feed-forward neural network, the potential depends on neurons in earlier layers, e.g., \({\hat{x}}_j = \sum _{i=1}^k w_{ji} {\hat{\sigma }}^z_{i} - b_j\), where the activation function of the quantum perceptron is the probability of the excited state \(P_j (x_j / \Omega _f)\) at the final time \(t=t_f\) in the form of sigmoid-shape, shown in the inset.

In order to generate the state on the right side of Eq. (1), we propose the following strategy: First, a Hadamard gate is applied to drive the state from \(|0\rangle \) to \(|+ \rangle = (|0\rangle +|1\rangle ) / \sqrt{2}\). Secondly, by appropriately tuning \(\Omega (t)\) according to inverse engineering (IE) techniques (to be explained later), the state \(|\Psi (0)\rangle = |+\rangle \) evolves to \(|\Psi (t_f)\rangle =|\Phi (x_j/\Omega _f) \rangle \) (up to some phase factor that can be eventually canceled by a phase gate), along with one eigenstate of the Lewis-Riesenfeld invariant of \({\hat{H}}\), with \(|\Phi (x_j/\Omega _f)\rangle \) being the instantaneous eigenstate of \({{\hat{H}}}(t=t_f;\Omega _f)\), and \(\Omega _f \equiv \Omega (t_f)\). It is noteworthy to mention that, unlike the fast quasi-adiabatic passage (FAQUAD) approach34, our method based on IE does not need to achieve the initial condition \(\Omega (0) \gg |x_j|\), as it is not required that the initial state meets one eigenstate of \({{\hat{H}}}(0)\). The latter results in a smooth control field \(\Omega (t)\) which is easy to be used in experiments.

Another possibility to achieve \(|\Psi (t_f)\rangle \) from \(|\Psi (0)\rangle \) is by an adiabatic driving in a Landau-Zener scheme. However, as it is discussed in Ref.34, this spends long time and may be unfeasible depending on the coherence time of the physical setup that implements the Hamiltonian in Eq. (2).

Accelerating quantum perceptron by IE

We adopt the IE method to achieve the \(|\Psi (0)\rangle \rightarrow |\Phi (x_j/\Omega _f)\rangle \) state transfer with shorter time than FAQUAD44. The control field \(\Omega (t)\) is then engineered to guarantee that at the final evolution time \(t=t_f\) the qubit excitation probability \(P_j(x_j/\Omega _f)\) corresponds to a sigmoid-like response, i.e. to a mono-valuate f function satisfying \(\lim \limits _{x\rightarrow -\infty }f(x)\rightarrow 0\) and \(\lim \limits _{x\rightarrow \infty }f(x)\rightarrow 1\). Since the universality of neural networks does not rely on the specific shape of the sigmoid function43,45, e.g. Eq. (4), we quantify the performance of the control field \(\Omega (t)\) in the interval \([-x^{\text {max}}, x^{\text {max}}]\) with the distance \(C=2-F_0-F_1\). Here \(F_0= |\langle 0 | \Psi (t_f; x_j / \Omega _f=-x^{\text {max}}) \rangle |^2\) and \(F_1= |\langle 1| \Psi (t_f; x_j / \Omega _f= x^{\text {max}})|^2\) characterize how the engineered states overlap with \(|0\rangle \) and \(|1\rangle \), at \(x_j / \Omega _f=-x^{\text {max}}\) and \(x_j / \Omega _f=x^{\text {max}}\) respectively. Note that, for a sigmoid-like function, \(C\rightarrow 0\), for \(x^{\text {max}} \rightarrow \infty \). Meanwhile, in all the numerical results, the activation function is found to be well-behaved, i.e., the function is monotonic and with a sigmoid-like behaviour, \(\lim \limits _{x\rightarrow -\infty }f(x)\rightarrow 0\) and \(\lim \limits _{x\rightarrow \infty }f(x)\rightarrow 1\). As we will see later, our IE technique also provides with robustness with respect to timing errors.

Figure 2
figure 2

(a) The functions of \(\theta \) (solid-blue) and \(\pi -\beta \) (dashed-red), where \(\theta \) is interpolated by a polynomial ansatz \(\theta = \sum _{i=0}^3 a_i t^i\), and \(\beta \) is solved from Eq. (7) for \(t_f =1\) and \(\kappa =2000\), with \(\epsilon =2 \times 10^{-5}\). (b) The control fields \(\Omega (t)\) designed from IE (solid-blue) with the help of \(\theta \), \(\beta \) and from FAQUAD (dashed-red), where \(\kappa =2000\). We also show the control fields derived from \(\kappa =1000\) (dotted-black) and \(\kappa =500\) (dot-dashed-green). The inset in (b) displays the corresponding activation functions for different \(\kappa \), which coincide with each other. In both plots, \(y / \Omega _f =12\).

Now we show the procedure to find the control \(\Omega (t)\). To this end, we start with the parameterisation of the dynamical state

$$\begin{aligned} |\Psi (t)\rangle = \cos (\theta /2) e^{i\beta /2} |0\rangle + \sin (\theta /2) e^{-i\beta /2} |1\rangle , \end{aligned}$$
(5)

with the two unknown polar and azimuthal angles, \(\theta \equiv \theta (t)\) and \(\beta \equiv \beta (t)\), on the Bloch sphere. Having the state in Eq. (5) at hand, the corresponding orthogonal state \(|\Psi _{\perp }(t)\rangle \) gets completely determined and the Lewis-Riesenfeld invariant can be thus constructed with constant eigenvalues37,38. Substituting one of the states (\(|\Psi (t)\rangle \) or \(|\Psi _{\perp }(t)\rangle \)) into the time-dependent Schrödinger equation driven by the Hamiltonian in Eq. (2), we obtain the following coupled differential equations (for more details see Methods.)

$$\begin{aligned} \Omega (t)= & {} {\dot{\theta }}/\sin \beta , \end{aligned}$$
(6)
$$\begin{aligned} x_j= & {} {\dot{\theta }} \cot \theta \cot \beta - {\dot{\beta }}. \end{aligned}$$
(7)

Setting the wavefunction \(|\Psi (0)\rangle =|+\rangle \) and \(|\Psi (t_f)\rangle =|\Phi (x_j/\Omega _f) \rangle \) at the initial and final times leads to the boundary conditions

$$\begin{aligned} \theta (0)= & {} 2 \sin ^{-1}\left[ \sqrt{f({x}_j/ \kappa )}\right] , \nonumber \\ \theta (t_f)= & {} 2 \sin ^{-1}\left[ \sqrt{f({x}_j/ \Omega _f)}\right] , \end{aligned}$$
(8)

with the introduced \(\kappa \) parameter being infinitely large which results in \(|\Phi (x_j/\kappa )\rangle = |+\rangle \). Also, it is important to remark that \(\kappa \) does not need to equal the value of our control \(\Omega (t)\) at \(t=0\), as \(|\Phi (x_j/\kappa )\rangle \) is not necessarily the eigenstate of \({{\hat{H}}}[t=0; \Omega (0)]\). In addition, from Eq. (6) one can find the following conditions for the first derivatives of \(\theta \) at the boundaries

$$\begin{aligned} {{\dot{\theta }}}(0) = \Omega (0) \sin \beta (0), \quad {{\dot{\theta }}} (t_f) = \Omega _f \sin \beta (t_f). \end{aligned}$$
(9)

We can interpolate \(\theta \) by choosing a simple polynomial function \(\theta = \sum _{i=0}^N a_i t^i \) and a trigonometric fuction \(\theta = a_0 + a_1 t + \sum _{i=2}^N a_i \sin [(i-1)\pi t/t_f]\) with less coefficients required for matching the same boundary conditions46. The appropriate adoptions on the coefficients can make the solution approach the one gained from optimal control theory47. We present the comparison of the performance of activation function by using IE with these two ansatzes and exponential functions inspired by regularized optimal solutions in Supplementary Information. We stress that, unlike the method in Ref.38,48, in our case \(\theta \) and \(\beta \) are correlated. We impose \(\beta (t_f) =\pi /2\) and \(\beta (0)=\pi -\epsilon \) (note that we will allow a certain deviation by introducing the \(\epsilon \) parameter, see later). Once we construct \(\theta \), the function \(\beta \) can be obtained by solving Eq. (7) with the boundary condition \(\beta (t_f) =\pi /2\). After the functions \(\theta \) and \(\beta \) are obtained, the control field \(\Omega (t)\) is deduced using Eq. (6).

The solution to \(\beta \) from Eq. (7) depends on \(x_j\) leading to a set of \(\Omega \equiv \Omega (t, x_j)\). However, in order to make the control independent of the input potential, we set \(\Omega (t)=\Omega (t, x_j=y)\) where the value of y is chosen such that it minimizes the C distance for different \(x_j\) in a certain interval (see next section).

IE performance

As the state evolves from \(|\Psi (0)\rangle = |+\rangle \), the \(\kappa \) parameter should be a large number compared to the input potential \(x_j\). We numerically study situations where \(\kappa = 2000\) and explored the range \(|x_j|/\Omega _f \in [-x^{\text {max}}, x^{\text {max}}]\), with \( x^{\text {max}}=12\). Note that, we consider the situation where \(x^{\text {max}}=12\), although our results are not limited to the specific number. We use dimensionless units, by setting the unit of time \(t_0\) such that the control field \(\Omega (t)\) is given in terms of \(1/t_0\). In addition, we consider an unbiased perceptron with \(b_j=0\).

Figure 3
figure 3

The dependence of C value on y with the application of IE \(\theta = \sum _{i=0}^3 a_i t^i \) (a) and FAQUAD (b) for different operation times \(t_f = 0.1\) (solid-blue), \(t_f=0.2\) (dashed-red), \(t_f=0.5\) (dotted-black), and \(t_f=1\) (dot-dashed-green).

Not limited to a fixed large number of \(\kappa \), our method shows the flexibility and the feasibility of the control field. For a case in which we impose \(\Omega _f=1\) and solve Eq. (7) with a fixed value for \(x_j/\Omega _f=y/\Omega _f=12\), we find \(\theta (0) = 1.576 \simeq \pi /2\). Figure 2a indicates the obtained solutions for \(\theta \) and \(\beta \) for this case in which we have also selected the operation time \(t_f=1\). We find that the boundary condition for \(\beta (0)\) is also satisfied with a tiny error of \(\epsilon =2\times 10^{-5}\). In this specific case, we find that the designed control \(\Omega (t)\) at \(t=0\) is \(\Omega (0) = 1999.6 \approx \kappa \) when \(\kappa =2000\), the initial state corresponds to the eigenstate state of the Hamiltonian. Also, we observed that \(\beta (0)\) tends to \(\pi \) when \(t_f\) gets larger. In Fig. 2b, the control field \(\Omega (t)\) obtained with our method is illustrated. This \(\Omega (t)\) leads to an excitation probability such that it arrives at \(P_j(x^{\text {max}}) = 0.998\). Using the same control field \(\Omega (t)\), we find that the probability of the state \(|1\rangle \) for other input neural potentials \(x_j / \Omega _f \in [-x^{\text {max}}, x^{\text {max}}]\) is in the form of a sigmoid-like response ranging from 0 to 1 during the interval, as shown in the inset of Fig. 2b. This proves the successful construction of a sigmoid-shape transfer function, which is a crucial factor for a quantum perceptron. The fields calculated from \(\kappa =1000\), \(\kappa =500\) lead to the same sigmoid activation function which, as shown in the inset of Fig. 2b, cannot be distinguished to the one derived from \(\kappa = 2000\).

Our IE method provides a wider range of \(y/\Omega _f\) than FAQUAD to construct sigmoid transfer functions. In Fig. 3a the value of the distance C obtained with the IE method, as a function of \(y/\Omega _f\) for various operation times \(t_f\), is shown. It can be observed that a low value for C appears with large values for |y| and \(t_f\). We have checked (also for \(t_f = 1\)) the appearance of nonlinear perceptron responses that connect 0 and 1 with a sigmoid shape. In particular, these lead to \(C< 10^{-2}\) in the range \(y/\Omega _f \in [5,12]\) with control fields \(\Omega (t)\) for \(t_f=1\) similar to the one in Fig. 2b. In contrast, C goes to almost 2 at \(y / \Omega _f=-x^{\text {max}}\) by FAQUAD techniques44, in which only for long \(t_f\) and in the regime \(y / \Omega _f \rightarrow x^{\text {max}}\) the transfer function can be produced, see Fig. 3b.

The target state \(|\Psi (t_f)\rangle = |\Phi (x_j/\Omega _f)\rangle \) depends on the value of the driving field at the final time, see Eq. (3). In general we observe that, with our IE method, a larger value of the control field at \(t=t_f\) (i.e. \(\Omega _f\)) offers higher fidelity. As an example of the latter, in Fig. 4 we show the value of C as a function of \(\Omega _f\) for \(t_f=0.2\) with the application of IE (solid-blue) and FAQUAD (dashed-red). In this figure one can observe the improved performance of our IE method. Actually, every point of the lower value C by IE implies the successful discovery of sigmoid-shape transfer function and driving field \(\Omega (t)\).

Figure 4
figure 4

Dependence of C value on \(\Omega _f\) is shown for IE \(\theta = \sum _{i=0}^3 a_i t^i \) (solid-blue) and FAQUAD (red-dashed) protocols, when \(t_f=0.2\), \(y/\Omega _f=12\).

Quasi-optimal-time solution

As the activation function \(P(x_j / \Omega _f)\) connects 0 and 1 at \(-x^{\text {max}}\) and \(x^{\text {max}}\), we set \(C<0.01\) as the criteria of successful construction of a quantum perceptron. In Fig. 5a, we illustrate the dependence of C value on \(t_f\) by using the polynomial ansatz \(\theta = \sum _{i=0}^N a_i t^i \) with \(N=3\) and \(N=5\) of IE as well as FAQUAD34. When \(N=3\), the smallest \(t_f\), such that \(C < 0.01\) is satisfied, is 0.2, while employing techniques based on FAQUAD, this is at \(t_f=0.3\). The further reduction of the smallest \(t_f\), such that \(C < 0.01\) is satisfied, can be improved since IE method allows to approach the quasi-optimal-time solution by introducing more degrees of freedom in the ansatz of \(\theta \)47, leading to faster quantum perceptrons. With \(N = 5\) (i.e. a solution with two additional parameters, namely \(a_4\) and \(a_5\)), see Fig. 5a (dotted-black curve) we get a speed up of 2 with respect to FAQUAD method, leading to the minimal operation time \(t_f^{\text {min}} = 0.15\). The values of the transfer function at \(-x^{\text {max}}\) and \(x^{\text {max}}\) and C value with the application of IE strategies in polynomial, trigonometric and exponential functions as well as FAQUAD can be seen in Supplementary Information, showing that high-order polynomial ansatz can give a quasi-optimal-time solution.

Moreover, we find that the IE method is robust with respect to timing errors, i.e. variations on the operation time \(t_f\). More specifically, once the minimal value of C is reached for solid-blue in Fig. 5a, C does not show any appreciable oscillation for \(t>t_f^{\text {min}}\). Conversely, the FAQUAD driving leads to the dashed-red curve in Fig. 5a that shows an oscillatory behavior of C, indicating that only at some specific \(t_f\) the sigmoid transfer function can be constructed.

Figure 5
figure 5

(a) Dependence of C as a function of the final time \(t_f\), using IE in the cases of \(\theta = \sum _{i=0}^3 a_i t^i \) (solid-blue), \(\theta = \sum _{i=0}^5 a_i t^i \) (dotted-black) and FAQUAD (dashed-red). The inset of (a) shows the corresponding transfer functions for \(t_f =0.15\), where the dotted-black curve represents the quasi-optimal-time solution with \(a_2 = -50\), \(a_3 = -3980\). (b) For \(t_f=0.15\), the driving field \(\Omega (t)\) designed from IE in the cases of using \(\theta = \sum _{i=0}^3 a_i t^i \) (solid-blue), using \(\theta = \sum _{i=0}^5 a_i t^i \) with the optimal parameters \(a_2 = -50\), \(a_3 = -3980\) (dotted-black), and \(y/\Omega _f =12\).

Remarkably, for short times, e.g. \(t_f=0.15\), the transfer functions and driving fields are completely different for IE and FAQUAD protocols. In the inset of Fig. 5a,b, we give the detailed demonstration of transfer functions and driving fields designed from IE. On the one hand, FAQUAD protocol cannot produce the sigmoid function, by connecting from 0 to 1 at the edges, see the inset of Fig. 5a dashed-red curve. On the other hand, we find that the case of IE with the polynomial ansatz of \(N=3\) fails to connect the state \(|0\rangle \) presenting \(P(-x^{\text {max}}) = 0.2\) (solid-blue curve). However, We can overcome this limitation by increasing the order of the polynomial ansatz to \(N=5 \). Here, we compare the activation functions achieved by different strategies at the same value of \(y/\Omega _f=12\). It is worth mentioning that by increasing the value of \(x^{\text {max}}\) which means more energy is supplied to the system, we can recover a more stretched sigmoid with the FAQUAD protocol or IE with the polynomial ansatz of \(N=3\). However, in this work, we find the external driving \(\Omega (t)\) by which the perceptron can have a sigmoidal response in a fixed Hamiltonian configuration with the range \([-x^{\text {max}}, x^{\text {max}}]\).

In addition, the derived controls \(\Omega (t)\) from IE methods are smooth and present values close to zero at \(t=0\), see Fig. 5b. Compared to the case of \(t_f=1\), shorter operation time leads to larger \(\epsilon \) so that \(\Omega (0)\) is farther away from \(\kappa \). This is in contrast with the control \(\Omega (t)\) derived from FAQUAD techniques that demands an abrupt change from \(\Omega (0) = 2000\) to \(\Omega (t_f) = 1\), see Fig. 2b. This demonstrates the appropriateness of our IE derived controls to be implemented experimentally. In this regard, in the next section we give estimations based on state of the art experimental parameters in NV centers in diamond that demonstrates the suitability of an implementation of our method in such quantum platform.

Discussions

We have demonstrated that the enhanced performance of our method using IE techniques leads to sigmoid activation functions within a minimal operation time of \(t_f^{\text {min}}=0.15 \ t_0\). If, for instance, one selects \(t_0 = 500 \) ns, the maximum value for the control \(\Omega (t)\) amounts to \(|\Omega _{\mathrm{max}}| \approx 50\) MHz for the kind of solutions presented in Fig. 5b (see horizontal axis limits in that figure). This permits the application of our controls in modern quantum platforms such as NV centers in diamond that present coherence times much longer than \(0.15 \ t_0=0.15 \times 500 = 75\) ns even at room temperature49,50. In addition, current arbitrary waveform generators allow to change the amplitude of the delivered microwave field (and consequently of the Rabi frequency \(\Omega \)) in time-scales significantly smaller than 1ns42,51. Then, one can easily introduce the controls in Fig. 5b to produce nonlinear sigmoid responses in NV centers. IE is also helpful to achieve the robust control in a specific physical setup52,53,54 when one considers the Ising model with unwanted transitions between the target two-level system and other levels. In this manner one could envision a diamond chip with several NVs, each of them with available nearby nuclear spin qubits, as a quantum hardware to construct QNN using IE methods.

Methods

Inverse engineering and derivation of auxiliary differential equations

The quantum perceptron gate evolves a qubit with the general Hamiltonian (Eq. (2)) which has the instantaneous ground state (Eq. (3)) with the basis \(|0\rangle = (0,1)^T\) and \(|1\rangle = (1,0)^T\) and a sigmoid excitation probability (Eq. (4)). Therefore, we need to control the final state exactly as \(|\Psi (t_f)\rangle = |\Phi (x_j / \Omega (t_f))\rangle \) in the form of Eq. (3). Inverse engineering by parameterizing the Bloch sphere angles \(\theta \) and \(\beta \) can manipulate the dynamical state evolution in a fast way. After substituting the wave function \(|\Psi (t)\rangle \) (Eq. (5)) or the orthogonal state \(|\Psi _{\perp }(t)\rangle \) into Schrödinger equation, we can obtain two equations

$$\begin{aligned} -i {\dot{\theta }}\sin \frac{\theta }{2} - {\dot{\beta }}\cos \frac{\theta }{2}= & {} x_j \cos \frac{\theta }{2} + \Omega (t) \sin \frac{\theta }{2} e^{-i \beta }, \end{aligned}$$
(10)
$$\begin{aligned} i {\dot{\theta }} \cos \frac{\theta }{2} + {\dot{\beta }}\sin \frac{\theta }{2}= & {} -x_j \sin \frac{\theta }{2} +\Omega (t) \cos \frac{\theta }{2} e^{i \beta }. \end{aligned}$$
(11)

Eq. (10) \(\times \sin (\theta /2)\) \(+\) Eq. (11) \(\times \cos (\theta /2)\) and Eq. (10) \(\times \sin (\theta /2)\) − Eq. (11) \(\times \cos (\theta /2)\), respectively, result in the analytical expressions of \(\Omega (t)\) (Eq. (6)) and \(\beta \) (Eq. (7)). Once setting the operation time \(t_f\) and the dynamics of the polar angle \(\theta \), we can obtain the function \(\beta \) by solving Eq. (7) with the boundary condition \(\beta (t_f) = \pi /2\). Hence, from Eq. (6), we derive the applied field \(\Omega (t)\).

Fast quasi-adiabatic method

Another protocol to construct a quantum perceptron by controlling the qubit gate is to use FAQUAD strategy34,44, which can achieve the fast and adiabatic-like procedure. The adiabatic parameter

$$\begin{aligned} \mu (t) = \hbar \left| \frac{\langle \phi _0(t) | \partial _t \phi _1(t) \rangle }{E_1(t) - E_0(t)} \right| \end{aligned}$$
(12)

is kept as a constant \(\mu (t) =c\) during the whole control process, where the instantaneous eigenstates for the Hamiltonian (Eq. 2) are

$$\begin{aligned} |\phi _l\rangle = \cos (\alpha /2) |1\rangle +(-1)^l \sin (\alpha /2) |0\rangle \end{aligned}$$
(13)

with the eigenenergies are \(E_l = - (-1)^l \hbar \sqrt{\Omega ^2 + x^2_j} /2\)\(\alpha = \arccos \left[ -x_j / \sqrt{\Omega ^2+x^2_j}\right] \) and \(l \in \{0,1\}\). In order to construct a universal quantum gate, a single control should not depend on the neuron potential \(x_j\). The largest value \(|\mu |\) occurs at \(|x_j / \Omega _f| \approx 1.272\). We take this \(\mu \) value as an optimal condition that works for all input neuron configurations. As the relation between the field and time is invertible, we can apply the chain rule to Eq. (12) and obtain

$$\begin{aligned} \frac{d \Omega }{d t} = - \frac{c}{\hbar } \left| \frac{E_1(\Omega ) - E_0(\Omega )}{\langle \phi _0(\Omega ) | \partial _\Omega \phi _1 (\Omega ) \rangle } \right| , \end{aligned}$$
(14)

where the negative sign represents \(\Omega (t)\) monotonously decreases from \(\Omega (0)\) to \(\Omega (t_f)\). The total duration time is rescaled as \(s = t/t_f\) so that \({\tilde{\Omega }}(s) := \Omega (s ~ t_f)\) and \(d\Omega / dt =t_f^{-1} d\tilde{\Omega } /ds\). As a result, we have

$$\begin{aligned} \frac{d {\tilde{\Omega }}}{d s}= & {} - \frac{{\tilde{c}}}{\hbar } \left| \frac{E_1 - E_0}{\langle \phi _0 | \partial _{{\tilde{\Omega }}} \phi _1\rangle } \right| _{\tilde{\Omega }}, \end{aligned}$$
(15)
$$\begin{aligned} {\tilde{c}}= & {} c t_f = -\hbar \int _{\tilde{\Omega }(0)}^{\tilde{\Omega }(1)} \frac{d \tilde{\Omega }}{\left| \frac{E_1 -E_0}{\langle \phi _0 | \partial _{{\tilde{\Omega }}}\phi _1\rangle }\right| _{\tilde{\Omega }}}. \end{aligned}$$
(16)

A selection of \(t_f\) corresponds to different scaling of \({\tilde{c}}\) and \(\Omega (t =s t_f) = \tilde{\Omega }(s) \). Consequently, we can derive \(\Omega (t)\) from \(\tilde{\Omega }(s)\) by solving the differential equation (Eq. (15)).