Introduction

Quantum computing and machine learning have come together in recent years as a rapidly growing, cross-disciplinary field with transformative potential through quantum advantage. This new field of quantum machine learning (QML) aims to further revolutionize many of the areas that are already seeing real transformation from new approaches in machine learning and artificial intelligence. One area critical to business and research is predicting, or forecasting, sequential time series data, which is paramount in finance, business, economics, climatology, meteorology, and ecology.

Reservoir computing (RC) is a paradigm for time series prediction that draws on some of the successful properties of RNNs, such as sequential memory, while greatly improving learning efficiency by fixing the reservoir weights for all but a single trainable output layer1,2,3,4,5,6. RC is well suited to dynamical system modeling and has been proven to be a universal approximator for sequential functions7. The quantum-enhanced version of RC (QRC) leverages a quantum reservoir: a natural quantum many-body system or a programmable quantum computer circuit. QRC provides a path to quantum advantage by using a quantum reservoir with an exponentially larger computational space and greater complexity for time series prediction.

Several QRC frameworks have been developed, under which QRC algorithms can be classified. For example, QRC reservoir nodes have been realized with quantum basis states8,9 in contrast to qubits or qudits. One novel framework leverages hybrid quantum-classical RNNs, such as quantum long short-term memory (QLSTM), as a reservoir10. QRC reservoirs can be based directly on known quantum system models or on hardware-efficient quantum feature map designs; this work falls under the latter. In this work we focus on noisy or dissipative quantum reservoirs, pioneered in works11,12,13. Furthermore, we demonstrate a successful, novel approach to quantum reservoir optimization, where other schemes have been explored in works14,15,16.

QRC has been applied to many prediction tasks including nonlinear time series prediction10,11, time series classification11, image recognition17, and stock market value14 and volatility prediction15. In quantum information science, QRC has been used for entanglement recognition, non-linear function estimation and quantum state tomography18,19,20.

QRC may have begun to demonstrate superior computational capacity to classical RC. In one example, numerical studies have shown that quantum reservoirs consisting of 5–7 qubits possess computational capacities comparable to conventional recurrent neural networks of 100–500 nodes21. In this work we demonstrate excellent prediction capacity of few-qubit reservoirs.

Quantum advantage, a main goal of QML, is a measurable performance improvement over classical computation on a well-defined objective task (e.g. a business time series prediction task) using quantum computation22. Quantum advantage with QRC likely exists if the quantum reservoir requires a complex, many-qubit entangled architecture that is intractable to classical computation. This view is the same as that expressed in23 for quantum-enhanced feature spaces, a closely related approach to QRC, where the quantum reservoir acts as a sequential feature map.

In this work we build on the quantum noise-induced reservoir (QNIR) framework11,12, with a novel approach to parameterized quantum circuits for the reservoir and a systematic reduction of circuit complexity. The term reduction is used for minimizing quantum circuit resources to clearly differentiate it from the reservoir optimization achieved with parameterized noise channels. QNIR is a type of QRC that relies on quantum hardware noise or, as in the focus of this work, artificial noise models in quantum software, as a resource to generate rich, dissipative quantum reservoir dynamics. In the current, transitional NISQ phase of quantum computing, QNIR can use inherent hardware noise. However, in future strongly error-mitigated and fault-tolerant quantum computers, QNIR noise channels can be coded instructions in a quantum program along with quantum gates. This approach abstracts this QNIR algorithm from the underlying physical device. In a novel approach we implement parameterized artificial noise models programmed to a quantum computer for improved time series prediction performance. With this, we address the important need of reservoir tuning, in QNIR and QRC in general, for classes of prediction tasks.

Dual annealing and evolutionary optimization (EO) offer powerful approaches to optimizing reservoir noise. EO can optimize quantum systems at various levels, from quantum circuit parameters, as successfully realized in this work and in previous works24,25,26, to quantum circuit architecture. Here we use a previously successful EO algorithm27 in which model parameters were evolved for quantum reinforcement learning agents in a hybrid quantum-classical neural network approach.

Quantum noise-induced reservoir computing

Theoretical framework

We develop QNIR theory starting from general RC theory. RC is a computational paradigm and class of machine learning algorithms that derives from RNNs. RC involves mapping input signals, or time series sequences, into higher dimensional feature spaces provided by the dynamics of a non-linear system with fixed coupling constants, called a reservoir. Having a small number of trainable weights confined to a single output layer is a core benefit of RC because it makes training fast and efficient compared to RNNs. RC has a number of requirements that should be met28,29, including adequate reservoir dimensionality, nonlinearity, fading memory/echo state property (ESP) and response separability.

For the univariate case, a reservoir, f, is a recurrent function of an input sequence, \(u_t\), and prior reservoir states, \(\bar{x}_{t-1}\), as

$$\begin{aligned} \bar{x}_t = f(\bar{x}_{t-1},u_t). \end{aligned}$$
(1)

From the output sequences, \(\bar{x}_t\), training sequences are selected between time steps \(t=t_i\) and \(t=t_f\) to form a training design matrix, \(\textbf{X}_{tr}\). The initial sequence, \(t<t_i\), is a washout interval required for fading memory/ESP. A multiple linear regression model with the initial form:

$$\begin{aligned} \textbf{y} = W^T \textbf{X}_{tr}, \end{aligned}$$
(2)

is trained based on least squares, where \(\textbf{y}\) is the target vector and W is an initial weight vector. The trained model has the form:

$$\begin{aligned} \hat{\textbf{y}} = W^T_{opt}\textbf{X}, \end{aligned}$$
(3)

with an optimized weight vector, \(W^T_{opt}\), to give a predicted sequence, \(\hat{\textbf{y}}\), from new sequences, \(\textbf{X}\).
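For concreteness, the readout training of Eqs. (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration assuming a generic reservoir function f and NumPy's least-squares solver; rows of the design matrix are time steps, and all names are illustrative rather than taken from our implementation.

```python
import numpy as np

def run_reservoir(f, u, x0):
    """Drive reservoir f with input sequence u from initial state x0 (Eq. 1)."""
    states, x = [], x0
    for u_t in u:
        x = f(x, u_t)            # x_t = f(x_{t-1}, u_t)
        states.append(x)
    return np.array(states)      # shape: (time steps, reservoir features)

def train_readout(X_tr, y_tr):
    """Least-squares fit of the single trainable output layer (Eq. 2)."""
    W_opt, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return W_opt

def predict(W_opt, X):
    """Trained linear model applied to new reservoir sequences (Eq. 3)."""
    return X @ W_opt

# Usage: discard the washout interval t < t_i before training, e.g.
# X = run_reservoir(f, u, x0); W = train_readout(X[t_i:t_f], y[t_i:t_f])
```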

Figure 1
figure 1

Circuit channel diagrams of the QNIR computer in the unrolled view, composed using30. The initial state of the quantum reservoir is \(|+\rangle ^{\otimes n}\) and the quantum channels labeled \(\mathscr {T}_{u_i}\) evolve the density operator as in Eq. (4), where N quantum circuits are required for N time steps. A number of output sequences, n, are concatenated from sequential, single-qubit expectation value measurements \(\langle Z_{i} \rangle\) on n qubits.

For QNIR with artificial noise channels, the RC framework that has been developed is now instantiated in the following way. The density operator evolves in time steps as

$$\begin{aligned} \rho _t = \mathscr {T}_{u_t}(\rho _{t-1}), \end{aligned}$$
(4)

where the reservoir map \(\mathscr {T}_{u_t}\) is composed of a sequence of unitary quantum gates, \(U_i\), and associated artificial noise channels, \(\mathscr {E}_i\), that are completely positive and trace preserving (CPTP). The reservoir map can be represented as a composition of quantum channels

$$\begin{aligned} \mathscr {T}_{u_t}(\rho _{t-1}) = \mathscr {E}_{U_K} \circ \ldots \circ \mathscr {E}_{U_2} \circ \mathscr {E}_{U_1} (\rho _{t-1}), \end{aligned}$$
(5)

where the notation \(\mathscr {E}_{U_i} = \mathscr {E}_i( U_i \rho U_i^{\dagger } )\) is used for clarity and to emphasize that each quantum gate is followed by a noise channel, and K is the number of noise channels in the time step. We will refer to \(\mathscr {T}_{u_t}\) as a noisy quantum circuit. QNIR requires an initial washout phase, \(t<t_i\), in which the reservoir forgets its initial state before a steady state is reached.

The unitary, noiseless part of the quantum circuit is composed of an initial layer of RX gates followed by an entanglement scheme of \({RZ\!Z}_{i,j}\) gates, which are 2-qubit entangling gates

$$\begin{aligned} (C\!X_{i,j}RZ_j(\theta )C\!X_{i,j})RX^{\otimes n}(\theta ) = {RZ\!Z}_{i,j}(\theta )RX^{\otimes n}(\theta ), \end{aligned}$$
(6)

where all \(RX(\theta )\) and \(RZ(\theta )\) rotation gates encode the time series data with a scaling map, \(\theta =\phi (u)\). The purpose and structure of the unitary encoding gates are detailed in subsection: Reservoir circuit designs.

Single-qubit expectation values, \(\langle Z_{i} \rangle = Tr(Z_i \rho )\), are measured for all n qubits at each time-step,

$$\begin{aligned} h_t = [\langle Z_{1} \rangle ,\langle Z_{2} \rangle ,\ldots ,\langle Z_{n} \rangle ]^T, \end{aligned}$$
(7)

as shown in the circuit diagram in Fig. 1. Figure 2 depicts how time series values are encoded to all reservoir qubits and \(\langle Z_{i} \rangle\) are measured on all qubits and concatenated at each time step to give n reservoir feature sequences \(q_i = \{\langle Z_{i} \rangle \}_{t=0}^N\), where N is the number of time steps. In turn, the \(q_i\) form a design matrix \(\textbf{X}\) and the QNIR model is trained as in Eq. (3). A schematic of the full QNIR computer is shown in Fig. 3.

Figure 2
figure 2

This drawing represents many repeats of data encoding of a single value, \(u_i\), to all reservoir qubits (left) and measurements of single-qubit Z expectation values (right). This two-part process occurs at each time step i to build feature signals by concatenation. Noisy quantum circuits are shown for each time step in Fig. 1. This drawing shows an example of a four-qubit reservoir with fixed, pair-separable dynamics.

Figure 3
figure 3

In this graphic the first layer contains an array of duplicates of a single time series value. Each value in the input array is encoded to all qubits of the reservoir as in Eq. (6). The second layer is a quantum reservoir with arbitrary entanglement scheme, represented by connecting lines between qubit nodes. The Z observable expectation value, \(\langle Z_{i}\rangle\), is measured for all qubits. These measurements are repeated and concatenated to build output signals, \(q_i\). In the final layer, these signals are used in multiple linear regression for time series prediction, as in Eq. (3).

It is important in RC and by extension QRC that the reservoir system can capture the temporal dynamics of the target system. To ensure this we implement a reservoir optimization scheme for QNIR. The artificial noise channels, \(\mathscr {E}_i\), of the quantum reservoir circuit are iteratively updated by an optimization routine with an MSE cost function based on the time series prediction performance. This serves to optimize the quantum reservoir for time series prediction. Details of the optimization approach are in subsection: Reservoir noise parameterization.

Reservoir circuit designs

This section is concerned with the architecture and purpose of the unitary gates of the quantum circuit, the high-level structure of the noisy quantum circuits and entanglement scheme. The details of the noise scheme are covered in subsection: Reservoir noise parameterization.

The initial state of the quantum reservoir, \(|+\rangle ^{\otimes n}\), is prepared by an initial Hadamard gate layer. Continuing with Eq. (6), an n-qubit QNIR circuit has a fixed sequence of quantum gates

$$\begin{aligned} \begin{aligned} U_{b}(u)&= (C\!X_{i,j}RZ_j(\theta )C\!X_{i,j})RX^{\otimes n}(\theta ) \\&= {RZ\!Z}_{i,j}(\theta )RX^{\otimes n}(\theta ) \end{aligned} \end{aligned}$$
(8)

where i, j are indices for two qubits that denote the placement of multiple 2-qubit \(RZ\!Z\) entangling gates. The decomposed form of the circuit with \(C\!X\) and RZ gates23 is implemented with noise channels (see subsection: Reservoir noise parameterization). A time series data value, u, is encoded to all \(RX(\theta )\) and \(RZ\!Z(\theta )\) gates by the angle \(\theta = \phi (u)\), where \(\phi\) is a scaling map.
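A minimal Qiskit sketch of the encoding block of Eqs. (6) and (8) follows; the qubit pairing and the scaling map \(\phi\) are left as arguments since they are task-dependent, and the initial Hadamard layer of state preparation is applied separately. The helper name is illustrative.

```python
from qiskit import QuantumCircuit

def encoding_block(n_qubits, pairs, u, phi=lambda x: x):
    """One time step of Eq. (8): RX layer, then RZZ entanglers, theta = phi(u)."""
    theta = phi(u)
    qc = QuantumCircuit(n_qubits)
    for q in range(n_qubits):
        qc.rx(theta, q)        # RX^{\otimes n}(theta) encoding layer
    for (i, j) in pairs:       # RZZ_{i,j}(theta) decomposed as CX . RZ_j . CX
        qc.cx(i, j)
        qc.rz(theta, j)
        qc.cx(i, j)
    return qc

# Example: a 4-qubit pair-separable block encoding the value u = 0.3
# qc = encoding_block(4, [(0, 1), (2, 3)], 0.3)
```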

To implement the recurrent architecture of QNIR, a set of N quantum circuits is executed for a time series \(\{u_t\}^N_{t=0}\). The first circuit encodes \(\{u_0\}\), the second circuit encodes \(\{u_0,u_1\}\), and the Nth circuit encodes \(\{u_t\}^N_{t=0}\) as

$$\begin{aligned} \text {U}_{t=N} = U_{b}(u_N) \ldots U_{b}(u_1)U_{b}(u_0). \end{aligned}$$
(9)

All unitaries \(\text {U}_t\) for arbitrary t constrain the n single-qubit expectation values to zero,

$$\begin{aligned} \langle Z_{i} \rangle _{t} = \langle \Phi _0|\text {U}^{\dagger }_t Z_i \text {U}_t |\Phi _0\rangle = 0, \quad \forall i, \end{aligned}$$
(10)

where \(|\Phi _0\rangle = |+\rangle ^{\otimes n}\) is the initial reservoir state and the \(Z_i\) are the n single-qubit Z measurement operators. It is the action of noise that ensures the qubit signals are non-zero feature sequences, \(q_i\). Now considering the full QNIR circuits with artificial noise, the noisy quantum circuit for the final iteration, encoding \(\{u_t\}^N_{t=0}\), is the quantum channel

$$\begin{aligned} {\varvec{\mathscr {T}}}_{N} = {\mathscr {T}}_{u_N} {\circ } \ldots {\circ } {\mathscr {T}}_{{u}_{2}} {\circ } {\mathscr {T}}_{{u}_{1}}. \end{aligned}$$
(11)

The artificial noise scheme of the noisy quantum circuit will be detailed in the next subsection: Reservoir noise parameterization. This scheme may further reduce resources by circuit truncation based on a memory criterion29,31,32,33.

For \(RZ\!Z_{i,j}\) gates, the degree of entanglement between qubits i and j is a function of \(u_t\). It is important that the range of magnitudes of the data values is constrained; we observe that values much larger than \(2\pi\) cause undesirable effects. We consider benchmarks that do not require re-scaling.

Drawing from the close connection with quantum feature maps23,34,35,36, entanglement schemes are defined by the number and placement, i.e. the architecture, of \(RZ\!Z\) gates in Eq. (6). Common entanglement schemes that could be trialed are full, linear, pair-wise, and what we call pair-separable, used in Suzuki et al.11. The pair-separable (PS) and linear entanglement (LE) schemes explored in this work have \(RZ\!Z\) gates indexed as \(i,j \in \{(0,1),(2,3),(4,5),...,(N-1,N)\}\) and \(i,j \in \{(0,1),(1,2),(2,3),...,(N-1,N)\}\), respectively. To clarify, for an LE scheme, every additional \(RZ\!Z\) gate is in a new circuit layer, increasing the circuit depth each time. The LE scheme creates whole-circuit entangled states23. The state vector for a PS entanglement scheme evolves in a product state of qubit pairs, \(|\psi \rangle = \bigotimes _{i=1}^{n/2} |\phi \rangle _i\), where \(|\phi \rangle _i\) are two-qubit entangled states. The state, \(|\psi \rangle\), can be efficiently classically simulated and can be parallelized in classical simulation or on quantum computers37,38.
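The two index sets can be generated programmatically; the following small helper is illustrative (not from the paper's code), using 0-based indices for an n-qubit register.

```python
def pair_separable(n):       # {(0,1),(2,3),(4,5),...}; n even
    return [(i, i + 1) for i in range(0, n - 1, 2)]

def linear_entanglement(n):  # {(0,1),(1,2),(2,3),...}
    return [(i, i + 1) for i in range(n - 1)]

assert pair_separable(4) == [(0, 1), (2, 3)]
assert linear_entanglement(4) == [(0, 1), (1, 2), (2, 3)]
```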

Reservoir noise parameterization

QNIR uses noise as a necessary resource to generate non-trivial feature sequences. We use artificial noise that can be programmed to a quantum computer. Within this scheme, many such artificial noise models can be implemented to produce different effects. To implement a noise scheme, we associate parameterized, single-qubit noise channels with each unitary gate in the quantum circuit, Eq. (6), as shown in Fig. 4. Note that this differs from Kubota et al.12, where noise channels were situated at the end of every time step. In the following, we assume each noise channel depends on a single noise parameter.

Figure 4
figure 4

A 2-qubit quantum circuit channel diagram of a reservoir noise parameterization. Each unitary gate has an associated noise channel represented by \(\mathscr {E}(p_i)\). This represents the novel quantum circuit parameterization approach proposed in this work.

Figure 5
figure 5

This graphic shows the QNIR noise optimization scheme. The quantum model is trained and tested iteratively in a classical optimization loop, where dual annealing or evolutionary optimization are used. The quantum reservoir circuits have a number of gate-associated noise channels, each of which has a single error probability parameter that is iteratively updated.

Noise channels are associated with all quantum gates in the reservoir circuit in Fig. 4. Each noise channel \(\mathscr {E}(p)\) is a function of a probability for the noise effect to occur. We use the probabilities, \(p_i\), to parameterize the reservoir for optimization. The number of probability parameters scales linearly with the number of qubits: for a pair-separable entanglement reservoir, the number of parameters is \(n_{p_i} = \frac{7}{2} n\), where \(n=2,4,6,...\), and for a linear entanglement reservoir \(n_{p_i} = 6n-5\), where \(n=2,3,4,...\).

QNIR resource-noise optimization is performed through iterative training (Eq. 2) and testing (Eq. 3) of QNIR, giving optimized noise probability parameters, \(p_i \in \textbf{p}\) (see Fig. 5). The parameters in the initial parameter vector, \(\textbf{p}\), are probabilities randomly selected from a uniform distribution, \(p_i \sim U(0,1), \forall i\).

Two optimization approaches were trialed in this work: evolutionary optimization27 and dual annealing39, the latter available in the SciPy optimization package40. The mean squared error (MSE) was used as the cost function to measure prediction performance, which is minimized as

$$\begin{aligned} \min _{\textbf{p}}\; \{ \text {MSE}(\hat{\textbf{y}}(\textbf{p}),\textbf{y}) : p_i \in [0,1], \forall i \}, \end{aligned}$$
(12)

where \(\hat{\textbf{y}} = W^T_{opt} \textbf{X}(\textbf{p})\) is the QNIR test set prediction and \(\textbf{X}(\textbf{p})\) is the matrix of reservoir signals, which depends on the noise probabilities \(\textbf{p}\).
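The optimization loop of Eq. (12) can be sketched with SciPy's dual_annealing as below. The reservoir simulator is passed in as a callable returning \(\textbf{X}(\textbf{p})\); its name and the data variables are placeholders, not our exact implementation.

```python
import numpy as np
from scipy.optimize import dual_annealing

def make_mse_cost(reservoir, u, y_train, y_test):
    """Cost of Eq. (12); `reservoir(u, p)` returns the design matrix X(p)."""
    n_tr = len(y_train)
    def cost(p):
        X = reservoir(u, p)                                     # reservoir signals
        W, *_ = np.linalg.lstsq(X[:n_tr], y_train, rcond=None)  # train, Eq. (2)
        y_hat = X[n_tr:] @ W                                    # predict, Eq. (3)
        return float(np.mean((y_test - y_hat) ** 2))            # MSE
    return cost

# bounds = [(0.0, 1.0)] * n_params   # p_i in [0, 1]
# result = dual_annealing(make_mse_cost(run_qnir, u, y_tr, y_te), bounds)
# p_opt = result.x
```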

In this work, we use only reset noise channels that can be simply implemented with a classical ancilla system (see next subsection: Reset noise).

Reset noise

We propose a simple hybrid quantum-classical algorithm for a reset noise channel that consists of probabilistically triggering a reset instruction using a classical ancillary system. A deterministic reset instruction is an important element of a quantum instruction set, owing to the need to reset qubit states; a quantum instruction set is an abstract quantum computer model41,42. In this work we consider a reset-to-\(|0\rangle\) noise channel given by \(\mathscr {E}_{PR}(\rho ) = p|0\rangle \langle 0| + (1-p)\rho\), where p is the reset probability43. \(\mathscr {E}_{PR}(\rho )\) is trace-preserving, \(Tr(\mathscr {E}_{PR}(\rho ))=1\).

Using dynamic circuits, quantum computers can implement a reset instruction with a mid-circuit measurement followed by a classically controlled quantum X gate that depends on the measurement outcome44 (see Fig. 6). For example, this is how a reset is now implemented on IBM quantum computers supported by OpenQASM341.

Figure 6
figure 6

A deterministic RESET instruction (left) is executed with this dynamic circuit. This can be used as a basis for a reset noise channel, \(\mathscr {E}_{PR}\). A single line represents a qubit and a double-line represents a classical bit. A model classical ancillary system (right) would be executed on a classical computer. The classical NOT gate, \(X_p\), is executed with probability p, which in turn triggers a classical controlled RESET instruction with probability p.

In classical computing, execution of a probabilistic instruction is triggered using a random number generator (RNG), such as those widely available in software as PRNGs or in hardware as HRNGs. Here we employ a classical RNG to probabilistically activate a reset, which is statistically equivalent to reset noise. In this way, artificial reset noise is implemented without ancilla qubits. Ancilla qubits would be an undesirable overhead in the larger scheme presented in this work, in which unitary gates require potentially many corresponding noise channels. This hybrid approach may be viable for other noise channels; for example, reset noise can approximate amplitude damping noise to high precision43.
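A minimal sketch of the hybrid mechanism, assuming Qiskit circuit construction and Python's standard RNG (per-shot control flow will differ across platforms):

```python
import random
from qiskit import QuantumCircuit

def apply_reset_noise(qc: QuantumCircuit, qubit: int, p: float, rng=random):
    """With probability p, apply a deterministic reset to |0> (channel E_PR)."""
    if rng.random() < p:   # classical coin flip replaces a quantum ancilla
        qc.reset(qubit)    # mid-circuit measurement + conditional X on hardware
    return qc
```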

Methods

Reservoir complexity reduction

Reservoir complexity reduction was performed for all benchmark tasks to reduce the quantum resource footprint and prevent overfitting. This involved reductions in both reservoir entanglement scheme complexity and the number of qubits. Reduction was performed as a typical optimization procedure in which resources increase until a stopping condition is satisfied. Reductions in circuit resources were determined largely by reservoir optimization and final MSE. See Methods: Noise optimization.

Entanglement scheme complexity is quantum circuit complexity45, determined by the number of entangling gates and the resultant circuit depth, i.e. it is the cost of the quantum circuit. Linear entanglement schemes were trialed first for both benchmarks and were comparable to the pair-separable entanglement schemes that were finally selected by the reduction principle.

The numbers of qubits in the quantum reservoirs were reduced to smaller values that still offered good performance; diminishing returns were observed for reservoirs with larger numbers of qubits.

In preparation for this work, an artificial quantum noise scheme was downsized from a physical device noise model consisting of 10 submodels of thermal relaxation, depolarization, and state preparation and measurement (SPAM) noise, to a single reset noise model. A systematic reduction approach for noise channels is not presented in this work.

Noise optimization

Dual annealing optimization and evolutionary optimization were employed for the NARMA and Mackey-Glass benchmarks, respectively. Dual annealing from SciPy's40 optimization package was used for reservoir optimization with default settings. This stochastic approach, derived from39, couples generalized simulated annealing, which generalizes classical simulated annealing (CSA) and fast simulated annealing (FSA)46, with a local search strategy47. Evolutionary optimization (EO) is a population-based approach to optimization in which candidate solutions, represented as a population of agents, are initialized through random sampling. The fitness of each candidate solution is then determined by evaluating it against a predefined objective metric. The superior solutions are selected and used to generate the candidate population for the subsequent iteration. This process continues until satisfactory solutions have been identified. Here we employ the EO algorithm of Chen et al.27.
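As a rough illustration of the EO loop just described (random initialization, fitness evaluation, selection, regeneration), the following sketch uses elitist selection with Gaussian mutation; the hyperparameters and the mutation operator are illustrative assumptions, not those of the algorithm in27.

```python
import numpy as np

def evolve(cost, n_params, pop_size=20, n_elite=5, sigma=0.1, n_gens=50, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 1.0, size=(pop_size, n_params))   # p_i ~ U(0,1)
    for _ in range(n_gens):
        fitness = np.array([cost(p) for p in pop])           # evaluate agents
        elite = pop[np.argsort(fitness)[:n_elite]]           # keep the best
        children = elite[rng.integers(n_elite, size=pop_size - n_elite)]
        children = np.clip(children + rng.normal(0.0, sigma, children.shape), 0, 1)
        pop = np.vstack([elite, children])                   # next generation
    return pop[np.argmin([cost(p) for p in pop])]
```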

Reset noise probabilities were optimized to maximize prediction performance, as detailed in subsection: Reservoir noise parameterization. Optimization algorithms require stopping conditions. The three stopping conditions were: multiple consecutive iterations with only small changes in MSE, a long iteration runtime without an update, and a maximum of 5 iterations, which is generally observed to be a large number for the optimization algorithms used. These stopping conditions returned reproducible final MSEs, indicating that the results were near optimal for the optimization algorithms.

Simulations

The quantum reservoir circuits with artificial noise channels were simulated using the Qiskit SDK48. The QASM simulator was used with an ideal density matrix method. This theoretical approach allows a simulation to be performed in a single computational run of a single QuantumCircuit object. The single-qubit Z expectation values were computed from the intermediate density matrices at each time step. Simulations could not be performed with linear entanglement reservoirs larger than 12 qubits because of the memory demands of density matrix simulation.

Reset noise channels are coded as Kraus instructions added directly to a QuantumCircuit. A reset_error channel is available with a single probability parameter, the target of optimization, which was passed to the Kraus instruction.
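The essential simulation loop can be sketched with qiskit.quantum_info alone, evolving a density matrix through the encoding block and reset-noise Kraus channels and reading out \(\langle Z_{i} \rangle\) at each step. For brevity the sketch places one reset channel per qubit per time step, rather than one per gate as in our scheme, and reuses the encoding_block helper sketched earlier.

```python
import numpy as np
from qiskit.quantum_info import DensityMatrix, Kraus, Pauli

def reset_kraus(p):
    """Kraus operators of E_PR(rho) = p|0><0| + (1 - p) rho."""
    k0 = np.sqrt(p) * np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0| component
    k1 = np.sqrt(p) * np.array([[0.0, 1.0], [0.0, 0.0]])   # |0><1| component
    k2 = np.sqrt(1.0 - p) * np.eye(2)                      # identity component
    return Kraus([k0, k1, k2])

def run_qnir(u_seq, n_qubits, pairs, probs):
    rho = DensityMatrix.from_label('+' * n_qubits)         # initial |+>^n state
    signals = []
    for u_t in u_seq:                                      # one channel T_{u_t} per step
        rho = rho.evolve(encoding_block(n_qubits, pairs, u_t))
        for q, p in zip(range(n_qubits), probs):           # reset noise channels
            rho = rho.evolve(reset_kraus(p), qargs=[q])
        signals.append([rho.expectation_value(Pauli('Z'), [q]).real
                        for q in range(n_qubits)])         # h_t, Eq. (7)
    return np.array(signals)                               # design matrix X
```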

Memory capacity

Recurrence in an RC enables retention of information, or a short-term memory, of past signals in reservoir states. The memory capacity (MC) is a measure which quantifies this ability to retain information about past inputs, and it plays a crucial role in the prediction abilities of a reservoir computer1.

To calculate MC, a random sequence drawn from a uniform distribution is first prepared that is appropriate for the optimized QNIR model. The minimum and maximum values of the random sequence, i.e. the scale of the values, are made equivalent to the scale of the benchmark time series that the model was optimized for. QNIR is then trained to predict the signal d time steps before the reservoir input sequence, \(u_k\), where the target signal is \(\hat{y}_k = u_{k-d}\). The memory function (MF) is defined as the square of the Pearson correlation coefficient,

$$\begin{aligned} M\!F_d = \frac{\text {cov}^2(y_k, \hat{y}_k)}{\sigma ^2(y_k)\sigma ^2(\hat{y}_k)}, \end{aligned}$$
(13)

and the MC is then calculated as the sum of the MFs for all the delays as

$$\begin{aligned} M\!C = \sum _{d}^{} M\!F_d. \end{aligned}$$
(14)

In Results, MCs are calculated for QNIR models that were trained and optimized for the three NARMA and two Mackey-Glass systems.
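A NumPy sketch of the MF and MC calculation of Eqs. (13) and (14), reusing the readout helpers sketched earlier; the index conventions (washout t_i, train/test split) are illustrative and assume d is at most t_i.

```python
import numpy as np

def memory_function(X, u, d, t_i, split):
    """MF_d: squared Pearson correlation on the delay-d recall task."""
    y = u[t_i - d:len(u) - d]                 # targets y_k = u_{k-d}
    Xw = X[t_i:]                              # drop the washout interval
    W = train_readout(Xw[:split], y[:split])  # least-squares readout
    y_hat = predict(W, Xw[split:])
    c = np.corrcoef(y[split:], y_hat)[0, 1]   # Pearson correlation
    return c ** 2                             # Eq. (13)

def memory_capacity(X, u, d_max, t_i=20, split=200):
    return sum(memory_function(X, u, d, t_i, split)
               for d in range(1, d_max + 1))  # Eq. (14)
```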

Metrics

The metrics normalized mean squared error (NMSE) and normalized root-mean-square error (NRMSE) are frequently used in the relevant literature and are therefore used here for convention and comparison. The mean absolute scaled error (MASE) metric of Hyndman and Koehler49 is used for its many properties that allow comparison between time series of different scales, and it is readily interpretable due to its symmetry and linearity. Furthermore, MASE is used because we compare QNIR prediction performance with the Naive model, whose performance is better than that of a linear model and which thus provides a more challenging reference prediction. The Naive model is one of the simplest forecasting models: the next time series value is predicted to be equal to the current value50.

The mean squared error (MSE), used as an optimization cost function, is defined as

$$\begin{aligned} \text {MSE} = \frac{1}{n} \sum _i (y_i - \hat{y}_i)^2. \end{aligned}$$
(15)

NMSE, used to evaluate prediction performance, is defined as

$$\begin{aligned} \text {NMSE} = \frac{\sum _i (y_i - \hat{y}_i)^2}{\sum _i y_i^2}. \end{aligned}$$
(16)

NRMSE is defined as

$$\begin{aligned} \text {NRMSE} = \frac{\sqrt{\frac{1}{T}\sum _t (y_t - \hat{y}_t)^2}}{\sigma (\textbf{y})} \end{aligned}$$
(17)

where \(\sigma (\textbf{y})\) is the sample standard deviation of the true values. The MASE forecasting metric is defined as

$$\begin{aligned} \text {MASE} = \frac{\frac{1}{J} \sum _j \vert {e_j}\vert }{\frac{1}{T-1} \sum _{t=2}^T \vert {y_t - y_{t-1}} \vert } \end{aligned}$$
(18)

where \(e_j = y_j - f_j\) is the true value minus the forecasted value. The denominator is the mean absolute error (MAE) of the non-seasonal Naive out-of-sample forecast.
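For reference, the four metrics transcribe directly to NumPy; here y are the true test-period values and y_hat the forecasts, with the Naive scaling computed over the same evaluation period, matching the out-of-sample convention above.

```python
import numpy as np

def mse(y, y_hat):                                    # Eq. (15)
    return np.mean((y - y_hat) ** 2)

def nmse(y, y_hat):                                   # Eq. (16)
    return np.sum((y - y_hat) ** 2) / np.sum(y ** 2)

def nrmse(y, y_hat):                                  # Eq. (17)
    return np.sqrt(np.mean((y - y_hat) ** 2)) / np.std(y, ddof=1)

def mase(y, y_hat):                                   # Eq. (18)
    naive_mae = np.mean(np.abs(np.diff(y)))           # Naive one-step MAE
    return np.mean(np.abs(y - y_hat)) / naive_mae
```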

Results

NARMA

We show that QNIR with noise optimization has excellent theoretical performance on the nonlinear auto-regressive moving average (NARMA) sequence prediction benchmarks11,51. A NARMA regression task involves learning the nonlinear NARMA map between a fixed input sequence and a NARMA output sequence. We label the sequences by the order of the NARMA map; we consider three NARMA sequences of orders 2, 5 and 10, denoted NARMA2, NARMA5 and NARMA10.

Figure 7
figure 7

Plot (a) shows the input sequence for all NARMA tasks, Eq. (20). Plots (b)–(d) show QNIR training and prediction of the NARMA2, 5 and 10 maps, respectively. Training data sequences are between time step indices 20–80 and test data sequences between 81 and 100. Test set prediction performance metrics are in Table 1.

The NARMA2 sequence51 is given by the recurrence relation

$$\begin{aligned} y_{t+1} = 0.4y_t + 0.4 y_t y_{t-1} + 0.6 u_t^3 + 0.1, \end{aligned}$$
(19)

where the two initial sequence values are \(\{0.196, 0.19468\}\). The input values \(u_t\) are from the smooth function

$$\begin{aligned} u_t = 0.1 \sin \left( \frac{2\pi a t}{T} \right) \sin \left( \frac{2\pi b t}{T} \right) \sin \left( \frac{2\pi c t}{T} \right) + 0.1 \end{aligned}$$
(20)

where \((a,b,c,T)=(2.11,3.73,4.11,100)\). NARMA5 and NARMA10 are described by the following general recursive function

$$\begin{aligned} y_{t+1} = \alpha y_t + \beta y_t \left( \sum _{i=0}^{n-1} y_{t-i} \right) + \gamma u_{t-( n-1)} u_t + \delta . \end{aligned}$$
(21)

For NARMA5, the initial sequence is \(\{0,0,0,0,0.196\}\) and the first four zeroes are excluded from the target sequence. The function parameters for NARMA5 are \((\alpha ,\beta ,\gamma ,\delta ,T)=(0.3, 0.05, 1.5, 0.1, 100)\). Similarly for NARMA10, the first nine values in the initial sequence are zeroes and are not included in the target sequence. The function parameters used are the same as for NARMA5. NARMA time series values were encoded directly to the angle of the encoding gates. Temporal train and test split indices are 20–80 and 81–100, respectively. The initial 20 time steps were excluded as a washout phase.
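The benchmark data of Eqs. (19)–(21) can be generated with a short NumPy script; the initial values and parameters follow the text, while the function names are illustrative.

```python
import numpy as np

def narma_input(N=100, a=2.11, b=3.73, c=4.11, T=100):
    t = np.arange(N)                                         # Eq. (20)
    return (0.1 * np.sin(2 * np.pi * a * t / T) * np.sin(2 * np.pi * b * t / T)
            * np.sin(2 * np.pi * c * t / T) + 0.1)

def narma2(u, y0=(0.196, 0.19468)):
    y = list(y0)                                             # Eq. (19)
    for t in range(1, len(u) - 1):
        y.append(0.4 * y[t] + 0.4 * y[t] * y[t - 1] + 0.6 * u[t] ** 3 + 0.1)
    return np.array(y)

def narma_n(u, n, alpha=0.3, beta=0.05, gamma=1.5, delta=0.1):
    y = [0.0] * (n - 1) + [0.196]                            # initial sequence
    for t in range(n - 1, len(u) - 1):                       # Eq. (21)
        y.append(alpha * y[t] + beta * y[t] * sum(y[t - i] for i in range(n))
                 + gamma * u[t - (n - 1)] * u[t] + delta)
    return np.array(y)
```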

Table 1 QNIR performance metrics are explicitly compared with the Naive model, which has an out-of-sample MASE of 1 by definition (Methods: Metrics).

Excellent simulation results have been achieved for the NARMA2, 5 and 10 tasks, plotted in Fig. 7 and recorded in Table 1. The primary reason for the high quality results is the effectiveness of the reset noise parameterization and subsequent optimization, first implemented in this work. This effectiveness is demonstrated by a two orders of magnitude improvement from the random initialization of the reset noise probabilities to the final optimization iteration for all three NARMA tasks, as plotted in Fig. 8. For the three NARMA tasks, three distinct sets of optimal parameters were obtained. Information processing capacity (IPC) analysis52 has been used to show that QNIR can solve the NARMA2 task12. The excellent performance on NARMA2 in particular provides evidence that noise parameterization is suitable for approaching an optimal solution model.

In combination with reservoir noise optimization, reduction of the quantum reservoirs was performed in the number of qubits and in reservoir circuit complexity in terms of entanglement schemes (see "Methods"). Pair-separable (PS) and linear entanglement (LE) scheme-based reservoirs were optimally reduced to 12 (6\(\times\)2) qubits for all NARMA tasks. Reservoirs with larger numbers of qubits did not improve NARMA prediction performance and smaller numbers of qubits showed a drop in performance, see Fig. 8. The 12-qubit reservoirs are parameterized with 42 and 67 reset noise probabilities for PS and LE reservoirs, respectively. Next we consider the second dimension of reduction, that of entanglement schemes. For NARMA2 and NARMA10, the reservoir complexity and associated entanglement were reduced to a PS reservoir because the results for LE reservoirs were comparable. However, for NARMA5 an LE reservoir was not reduced because of its improved performance. The quantum state of the LE reservoir is non-separable due to a higher degree of entanglement.

Figure 8
figure 8

Log plots of MSE cost curves for reservoir tuning iterations for both pair-separable (PS) and linear entanglement (LE) reservoir designs. The number of qubits in both PS and LE reservoirs were increased from 4 qubits in steps of 4 qubits until 12-qubit reservoirs were selected. For NARMA2 and 10 both PS and LE reservoirs had similar final MSE values. For NARMA5 the LE reservoir provided a better option. See "Methods" for details on optimization and simulation matters.

This result strongly indicates an improvement over recent work11 in terms of reducing resource-noise requirements to a single reset noise model. Those experimental results11 have inherent measurement sampling error; however, our result demonstrates that multifaceted physical device noise is not required for the NARMA tasks defined in these works, as only reset noise channels were required here. These results may further suggest that a reset noise QNIR would be a favorable direction for more NARMA benchmarks12.

The MCs of the systems were observed to saturate at \(4.55(\pm 0.07)\), \(4.50(\pm 0.07)\), and \(4.70(\pm 0.06)\) at delays of 8, 8, and 10 for NARMA2, 5, and 10, respectively (see Fig. 11). Confidence intervals at \(\alpha =0.05\) are indicated in brackets. The MF profiles differ from those of unoptimized QNIR reservoirs, which suggests structural differences in short-term memory: unoptimized reservoirs have more sigmoid or S-shaped curves with lower MC. The NARMA10 MF plot has a very clear structure, with larger MF values at longer delays up to 10, as needed for the higher order function. A partial reason for the lower NARMA10 prediction result may be the smaller MF value at \(d=10\). The memory-nonlinearity trade-off inherent in RC algorithms53 should be established and investigated in a dedicated work for QNIR to aid interpretation of these metrics.

Mackey-Glass

The Mackey-Glass (MG) nonlinear system54 is a commonly used benchmark for time series prediction that is difficult to predict due to challenging chaotic dynamics under some parameters. The Python package ReservoirPy55 was used to generate the MG system time series, which are discretized using the Runge-Kutta method and initialized with a default seed value. For MG benchmarking, we extend the training sequence from the 60 data points used for NARMA to 250, and the testing sequence from 20 to 100 data points. This extension is designed to stress test QNIR.

The MG delay differential equation (DDE) is

$$\begin{aligned} \frac{dx}{dt} = \frac{ax(t-\tau )}{1+x(t-\tau )^n} - bx(t). \end{aligned}$$
(22)

To generate time series for benchmarking, parameters \((x_0,a,b,n) = (1.2,0.2,0.1,10)\) were used. The input and target time series are defined as \(x(t-\tau )\) and x(t), respectively. We considered two distinct, chaotic MG systems determined by integer delay values \(\tau =19\) and 25, which we denote MG19 and MG25, respectively. The generated time series were then downsampled by a factor of 2. For both downsampled MG19 and MG25 time series, chaoticity is indicated by positive Lyapunov exponents56, calculated using the nolds Python library57.

The time series were downsampled from 800 time steps to 400 time steps. Temporal train and test split indices are 20–300 and 301–400, respectively. The initial 20 time steps were excluded as a washout phase. The MG time series values were encoded directly to the angle of the encoding gates.
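The data preparation can be sketched with ReservoirPy and nolds as below; the keyword arguments follow those packages' public APIs, and the delay pairing and downsampling order are illustrative assumptions.

```python
import numpy as np
from reservoirpy.datasets import mackey_glass
from nolds import lyap_r

tau = 19                         # MG19; use tau=25 for MG25
x = np.asarray(mackey_glass(800, tau=tau, a=0.2, b=0.1, n=10, x0=1.2)).ravel()

inp, tgt = x[:-tau], x[tau:]     # input x(t - tau), target x(t), Eq. (22)
inp, tgt = inp[::2], tgt[::2]    # downsample by a factor of 2

print(lyap_r(tgt))               # positive largest Lyapunov exponent => chaotic
```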

Table 2 QNIR performance metrics are explicitly compared with the Naive model, which has an out-of-sample MASE of 1 by definition (Methods: Metrics).
Figure 9
figure 9

The input and target time series are defined as \(x(t-\tau )\) and x(t), respectively (Eq. 22). Plots (a,b) are MG19 input and 19-step delay target time series and prediction result, respectively. The same applies for plots (c,d) for MG25. Training data sequences are between time step indices 20–300 and test data sequences are between 301 and 400. Test set prediction performance metrics are in Table 2.

Figure 10
figure 10

Log plots of MSE cost curves for reservoir tuning iterations. MSEs improved by up to 2.5 orders of magnitude from the initial to the final iterations, demonstrating the effectiveness of noise optimization. In the two plots comparing LE and PS reservoirs, it can be seen that there is no notable difference between final MSEs for MG19 or MG25 data up to 12 qubits. By the principle of resource reduction, PS reservoirs should be favored. See Methods for details on optimization and simulation matters.

Figure 11
figure 11

QNIR memory functions for NARMA and Mackey-Glass tasks over 30 trials plotted against delay d in the input signal. The colored bands correspond to the standard deviations.

We report good prediction performance, plotted in Fig. 9 and recorded in Table 2. QNIR has demonstrated prediction performance much better than the Naive model and shows promise for modeling challenging chaotic dynamics with exponential sensitivity to initial conditions. Reservoirs with three times the number of qubits were required for MG modeling compared to NARMA, indicating greater prediction difficulty. However, it is worth emphasizing that 32-qubit reservoirs are still relatively small compared with conventional approaches.

The effectiveness of noise parameterization and optimization can be seen in Fig. 10 with the large initial drops from iteration 0 to 1, where iteration 0 reflects randomly initialized parameters. This is a main result in demonstrating the effectiveness of this noise parameterization approach.

By the reduction procedure, reservoirs were reduced to 32 (16\(\times\)2) qubits for the MG19 and MG25 tasks. Increasing the number of qubits beyond this saw diminishing returns, and reductions below it caused a drop-off in performance, as can be seen in Fig. 10. The 32-qubit reservoirs were parameterized with 112 noise probabilities. Comparable performance was obtained for both LE and PS reservoirs; therefore, by reduction, PS reservoirs were selected.

The MCs of the systems saturated at \(4.56(\pm 0.07)\) and \(4.55(\pm 0.08)\) at delays of 6 and 8 for MG19 and MG25, respectively (see Fig. 11). Confidence intervals at \(\alpha =0.05\) are indicated in brackets. Since the MCs for these larger MG reservoirs were similar to those of the reservoirs utilized for the NARMA benchmarks, the larger modeling capacity may be provided by the threefold number of reservoir output signals available to the linear regression layer. The MF plots suggest a small memory design for these QNIR reservoirs, although an in-depth analysis of the MG system would be required to confirm this. Further investigation would center on the memory-nonlinearity trade-off, which may explain why smaller memories were traded for greater reservoir nonlinearity for the chaotic MG time series.

Conclusions

We have demonstrated a new QNIR reservoir optimization approach that uses parameterized resource noise to address the need for quantum reservoir tuning for improved prediction performance. This parameterization approach to reservoir tuning embodies a new, general quantum circuit parameterization approach for QML models.

Benchmarking has demonstrated that resource noise parameterization, and optimization with dual annealing and an evolutionary algorithm, is effective for improving prediction performance. Our simulations showed that few-qubit QNIR computers are capable of predicting nonlinear dynamics, including the challenging chaotic dynamics of the Mackey-Glass system. We demonstrated a significant reduction of noise resources over previous QNIR work, resulting in a single reset noise model being selected for the benchmarks chosen in this work.

Systematic reduction of quantum resources, in the number of qubits and the entanglement scheme of the reservoir, was employed. While reduction of entanglement scheme complexity may produce quantum circuits that are efficient to compute classically, this process is desirable when the learning task does not require quantum advantage, consistent with the machine learning principles of model selection and resource reduction. Furthermore, the QNIR framework accommodates complex entanglement schemes and therefore opens a path towards investigating quantum advantage.

Reducing quantum circuit complexity has positive implications for quantum hardware efficiency, which is critical for current quantum computers hindered by noise. Therefore, we recommend implementation on current quantum computers using error mitigation.