Optimizing quantum noise-induced reservoir computing for nonlinear and chaotic time series prediction

Quantum reservoir computing is emerging strongly for sequential and time series data prediction in quantum machine learning. We advance the quantum noise-induced reservoir, in which reservoir noise is used as a resource to generate expressive, nonlinear signals that are efficiently learned with a single linear output layer. We address the need for quantum reservoir tuning with a novel and generally applicable approach to quantum circuit parameterization, in which tunable noise models are programmed into the quantum reservoir circuit so that the noise is fully controlled for effective optimization. Our systematic approach also reduces the quantum reservoir circuits in the number of qubits and in entanglement scheme complexity. We show that with only a single noise model and small memory capacities, excellent simulation results are obtained on nonlinear benchmarks that include the Mackey-Glass system, predicted 100 steps ahead in the challenging chaotic regime.


I. INTRODUCTION
Reservoir computing (RC) derives from artificial recurrent neural networks (RNNs) and from neural network models in neuroscience developed at the beginning of the millennium. RC has been proven to be a universal approximator for sequential functions [1]. Seminal works on RC can be found in [2][3][4] and authoritative reviews in [5][6][7].
Quantum reservoir computing (QRC) is a class of algorithms in the new field of quantum machine learning (QML) in which quantum systems are utilized as reservoirs. There are currently many different QRC approaches. One type, and the focus of this work, is QRC with noisy, dissipative quantum reservoirs [8][9][10]. In contrast, another approach derives from hybrid quantum-classical RNNs, such as QLSTM or QGRU, and involves training the final layer of weights [11]. Quantum extreme learning machines are also being investigated [6]. Another class of QRC approaches is characterized by reservoir nodes realized with basis states [12]. All QRC approaches rely on the state space of a quantum reservoir scaling exponentially in the number of qubits, 2^n, providing a large, nonlinear feature space that generates rich output signals.
QRC has been applied to many prediction tasks such as signal processing, time series prediction and classification, speech recognition, natural language processing, sequential motor control of robots, and stock market prediction. Prior work has focused on optimizing RC for chaotic time series prediction [13] and optimizing QRC for financial time series forecasting (S&P 500 index) [14], an area that is investigated in this work. In quantum information science, QRC has been used for entanglement recognition, nonlinear function estimation and quantum state tomography [15][16][17].
Numerical studies have shown that quantum systems consisting of 5-7 qubits possess computational capabilities comparable to conventional recurrent neural networks of 100-500 nodes [25]. Small quantum systems have demonstrated significant computational capacities [26]. In this work we further demonstrate the capabilities of few-qubit reservoirs. Superior QRC capability is an area of active research [26,27]. In general, quantum advantage is a measurable performance improvement over classical computation on a well-defined objective task (e.g. a business time series prediction task) using quantum computation [28]. Quantum advantage with QRC likely exists only if the quantum reservoir requires a complex, many-qubit entangled architecture or a non-Clifford architecture that is intractable to classical simulation.
An evolutionary algorithm is one of the optimization routines used here to optimize the reservoir noise. Evolutionary optimization (EO) can be used to optimize quantum computing components at various levels; for example, recent works use EO to optimize quantum circuit architectures or quantum circuit parameters [29][30][31]. A particular EO algorithm is used in this work due to previous success [32], in which model parameters were evolved for quantum reinforcement learning agents in a hybrid quantum-classical neural network approach.
Our perspective is that noise channels are implemented as quantum circuit operations, but our approach may also be relevant to quantum hardware noise optimization. An example of the latter is the subject of [33], where the authors consider evolutionary optimization of a single layer of quantum nodes in a universal quantum reservoir computer, with most nodes coupled via random and uncontrolled quantum tunnelling.
This paper is organized as follows: Section II A contains a theoretical overview of the quantum noise-induced reservoir computer; Section II B explains the architecture and purpose of the unitary gates of the quantum circuit, the high-level structure of the noisy quantum circuits, the entanglement scheme, and the proposed reservoir circuit truncation; Section II C gives a simple argument for how noise is a paramount resource and why the type of noise used is very important; Section II D contains details on the artificial noise channels and the noise optimization strategy; and Section II E proposes a resource-efficient reset noise implementation. Section III contains the benchmarking methodology used for the univariate NARMA and Mackey-Glass benchmarks, on which results are presented in Section IV.

A. Theoretical overview
In this work we build on the quantum noise-induced reservoir (QNIR) framework [8,9]: the QNIR reservoir is optimized and the reservoir circuit complexity is reduced as determined by the resource requirements of a learning problem. QNIR is a type of QRC that relies on quantum hardware noise or artificial noise as a resource to generate rich, dissipative quantum reservoir dynamics. In the current, transitional phase of noisy, intermediate-scale quantum computing, QNIR can use inherent hardware noise. In future strongly error-mitigated and fault-tolerant quantum computers, QNIR noise channels can be implemented directly in quantum circuits. Indeed, in this work we focus on artificial noise models programmed into a quantum computer and treat noise probabilities as hyperparameters that are optimized for improved prediction performance.
We develop QNIR theory starting from general RC theory. RC is a computational paradigm and class of machine learning algorithms that derives from recurrent neural networks (RNNs). RC involves mapping input signals, or time series sequences, into the higher-dimensional feature space provided by the dynamics of a nonlinear system with fixed coupling constants, called a reservoir. Having a small number of trainable weights confined to a single output layer is a core benefit of RC, because it makes training fast and efficient compared to RNNs. RC has a number of requirements that should be met [34]. Chief among these are adequate reservoir dimensionality, nonlinearity and memory; a fading memory/echo state property; and response separability.
For the univariate case, a reservoir, f, is a recurrent function of an input sequence, u_t, and the prior reservoir output, x_{t−1},

x_t = f(x_{t−1}, u_t).

From these output sequences, training sequences are selected between time-steps t = t_i and t = t_f, and form a training design matrix, X_tr. A multiple linear regression model is then trained by least squares, calculated by the Moore-Penrose pseudo-inverse of the linear equation

y = W^T X_tr,

where y is the target vector and W is an initial weight vector. The trained model has the form

ŷ = W_opt^T X,

with an optimized weight vector, W_opt^T, giving a predicted sequence, ŷ, from new sequences, X. For QNIR with artificial noise channels, the RC framework developed above is instantiated in the following way. The density operator evolves in time steps as

ρ_t = T_{u_t}(ρ_{t−1}),

where the reservoir map T_{u_t} is composed of a sequence of unitary quantum gates, U_i, and associated artificial noise channels, E_i, that are completely positive and trace preserving (CPTP). The reservoir map can be represented as a composition of quantum channels

T_{u_t} = E_{U_L} ∘ ... ∘ E_{U_2} ∘ E_{U_1},

where the notation E_{U_i}(ρ) = E_i(U_i ρ U_i†) is used for clarity and to emphasize that each quantum gate is followed by a noise channel. We will refer to T_{u_t} as a noisy quantum circuit. QNIR has an initial washout phase, t < t_i, during which the reservoir forgets its initial state before a steady state is reached. This exemplifies the echo state property.
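As an illustration of the classical readout step, a minimal NumPy sketch of training and applying the linear output layer with the Moore-Penrose pseudo-inverse might look as follows; the variable names and the appended bias row are our own illustrative choices, not part of the original method description.

```python
import numpy as np

def train_readout(X_tr: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fit the linear output layer by least squares.
    X_tr: reservoir signals, shape (n_signals, T_train); y: targets, shape (T_train,)."""
    X_aug = np.vstack([X_tr, np.ones(X_tr.shape[1])])   # bias row (assumption)
    # Least-squares solution via the Moore-Penrose pseudo-inverse: W_opt^T = y X^+
    return y @ np.linalg.pinv(X_aug)

def predict(W_opt: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply the trained readout to new reservoir signals X."""
    X_aug = np.vstack([X, np.ones(X.shape[1])])
    return W_opt @ X_aug
```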
The unitary, noiseless part of the quantum circuit is composed of an initial layer of RX gates followed by an entanglement scheme of RZZ_{i,j} gates, which are 2-qubit entangling gates. All RX and RZ rotation gates encode the time series data with a linear scaling map θ = φ(u). The purpose and structure of the unitary encoding gates are detailed in Section II B.
Expectation values, ⟨Z_i⟩ = Tr(Z_i ρ), are measured for all n qubits individually at each time-step to give n output sequences from the reservoir, as shown in the reservoir circuit diagram. It is important in RC, and by extension QRC, that the reservoir system can capture the temporal dynamics of the target system. To ensure this we implement a hyperparameter optimization scheme for QNIR. The artificial noise channels, E_i, of the quantum reservoir circuit are iteratively updated by an optimization routine with an MSE cost function based on the time series prediction performance. This serves to optimize the quantum reservoir noise for time series prediction.
A particular type of noise channel is necessary for a given initial state of the quantum reservoir, i.e. there is a dependency between the prepared initial state and the noise type needed to generate feature sequences. Within this framework, reservoir circuits with various entanglement schemes can be used.
In this drawing the first layer contains an array of duplicates of a single time series value. Each value in the input array is encoded to all qubits of the reservoir as in Eq. 6. The second layer is a quantum reservoir with an arbitrary entanglement scheme, represented by connecting lines between qubit nodes. The Z observable expectation value, ⟨Z_i⟩, is measured for all qubits. These measurements are repeated and concatenated to build output signals, h_i, as in Eq. 7. In the final layer these signals are used in multiple linear regression for time series prediction, as in Eq. 3.

B. Reservoir circuit designs
This Section is concerned with the following: the architecture and purpose of the unitary gates of the quantum circuit, the high-level structure of the noisy quantum circuits, the entanglement scheme, and finally reservoir circuit truncation determined by the reservoir memory capacity. The details of the noise scheme are covered in Section II D.
The initial state of the quantum reservoir, |+⟩^⊗n, is prepared by an initial layer of Hadamard gates. Continuing with Eq. 6, an n-qubit QNIR reservoir circuit has a fixed sequence of quantum gates in which i, j are qubit indices that denote the placement of the 2-qubit RZZ entangling gates. The decomposed form of the circuit, with CX and RZ gates [36], is implemented with noise channels (see Section II D). A time series data value, u, is encoded into all RX(θ) and RZZ(θ) gates through the angle θ = φ(u), where φ is a linear scaling map.
To implement the recurrent architecture of QNIR, a set of N quantum circuits is executed for a time series {u_t}_{t=0}^N. The first circuit encodes {u_0}, the second circuit encodes {u_0, u_1}, and the N-th circuit encodes the full sequence {u_t}_{t=0}^N. For all unitaries U_t, i.e. for arbitrary t, the noiseless circuit constrains the expectation values to the zero bitstring,

⟨Z_i⟩ = ⟨Φ| U_t† Z_i U_t |Φ⟩ = 0,

where |Φ⟩ = |+⟩^⊗n is the initial reservoir state and Z_i represents the n single-qubit Z measurement operators. It is the action of a particular noise type that ensures the qubit signals are non-zero feature sequences, as in Eq. 7. Now considering the full QNIR circuits with artificial noise, the noisy quantum circuit for the final iteration, encoding {u_t}_{t=0}^N, is the quantum channel

T_{u_N} ∘ ... ∘ T_{u_1} ∘ T_{u_0}(ρ_0).

The noisy quantum circuit with the artificial noise scheme will be detailed in Section II D.
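For concreteness, a minimal sketch of the unitary (noiseless) part of the reservoir circuit in Qiskit is shown below; the linear scaling map phi, the pair-separable qubit pairing, and all function names are illustrative assumptions rather than the exact construction used in our simulations.

```python
import numpy as np
from qiskit import QuantumCircuit

def phi(u, u_min, u_max, theta_max=np.pi):
    # Assumed linear scaling map: data value -> rotation angle in [0, theta_max]
    return theta_max * (u - u_min) / (u_max - u_min)

def reservoir_step(n_qubits: int, theta: float) -> QuantumCircuit:
    """One unitary reservoir iteration: RX encoding layer + pair-separable RZZ entanglers."""
    qc = QuantumCircuit(n_qubits)
    for q in range(n_qubits):
        qc.rx(theta, q)
    for q in range(0, n_qubits - 1, 2):      # pair-separable scheme: pairs (0,1), (2,3), ...
        qc.rzz(theta, q, q + 1)
    return qc

def reservoir_circuit(u_seq, n_qubits=4) -> QuantumCircuit:
    """Full unitary circuit for a sequence u_0..u_t; Hadamards prepare |+>^n."""
    u_min, u_max = min(u_seq), max(u_seq)
    qc = QuantumCircuit(n_qubits)
    qc.h(range(n_qubits))
    for u in u_seq:
        qc.compose(reservoir_step(n_qubits, phi(u, u_min, u_max)), inplace=True)
    return qc
```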
For the RZZ_{i,j} gates, the degree of entanglement between qubits i and j is a function of u_t. It is important that the range of magnitudes of the data values is constrained, i.e. that the data is scaled before being encoded as rotation angles. We observe that values much larger than 2π cause undesirable effects.
We propose that the QNIR reservoir circuits can be truncated based on the short-term memory capacity. This would favorably limit the circuit depth to a number of time-steps, i.e. a window, w, after which the memory capacity saturates and further circuit depth is not utilized. The truncated noisy quantum circuit would then be the composition of the window of iterations from i to i+w, encoding the time series values {u_i, u_{i+1}, ..., u_{i+w}}. The resource saving for circuit depth is potentially large, as the window can be small compared to the time series length. We do not implement circuit truncation in this work but include it here as part of the resource-reduced QNIR scheme.

C. Noise as a resource
Circuit noise is a necessary resource for QNIR. For a particular initial state of the quantum reservoir, a particular noise type is required. Without loss of generality, we explain how noise is used as a resource by considering the expectation value of the Z measurement operator, ⟨Z⟩ = Tr(Zρ), and how it evolves under single-qubit noise channels. A single-qubit quantum channel is a function C : C^{2×2} → C^{2×2} that maps between density operators, ρ ∈ C^{2×2}, which is ensured by C being a completely positive and trace-preserving (CPTP) map. The Z operator has two eigenvalues, {1, −1}, and therefore the expectation value has a maximum of 1 and a minimum of −1, ⟨Z⟩ ∈ [−1, 1]. In the following we compare how amplitude damping (AD) and depolarization noise channels have different effects on the evolution of ⟨Z⟩, and how to identify a required noise channel for a given initial reservoir state.
Consider an arbitrary single-qubit density operator

ρ = a|0⟩⟨0| + b|0⟩⟨1| + b*|1⟩⟨0| + c|1⟩⟨1|,   with a + c = 1.

The expectation value of ρ is ⟨Z⟩_ρ = a − c, i.e. the probability of measuring |0⟩ minus the probability of measuring |1⟩. A density operator evolves under an AD channel with damping probability γ as

E_AD(ρ) = (a + γc)|0⟩⟨0| + √(1−γ) b|0⟩⟨1| + √(1−γ) b*|1⟩⟨0| + (1−γ)c|1⟩⟨1|.

The expectation value after the AD channel is

⟨Z⟩_{E_AD} = a − c + 2γc.

Comparing ⟨Z⟩ before and after AD allows us to derive the inequality

⟨Z⟩_{E_AD} − ⟨Z⟩_ρ = 2γc ≥ 0,

where the probabilities c, γ ∈ [0, 1], therefore the inequality holds. This tells us that ⟨Z⟩ after AD is always larger, except for the special case where the initial state of the qubit is |0⟩, on which AD has no effect. In other words, after an AD channel the probability of measuring the qubit in state |0⟩ has increased. If the initial reservoir state is |0⟩^⊗n then AD noise will have no effect on ⟨Z⟩. We compare the AD channel with a depolarization channel, which for a single qubit is

E_dep(ρ) = p_ρ ρ + p_I I/2,

where the probabilities sum to one, p_ρ + p_I = 1. The expectation value after depolarization is

⟨Z⟩_{E_dep} = p_ρ(a − c).

Comparing ⟨Z⟩ before and after depolarization allows us to derive the relation

⟨Z⟩_{E_dep} = p_ρ ⟨Z⟩_ρ.

This tells us that ⟨Z⟩ is shrunk towards 0 by a scalar equal to the probability p_ρ ∈ [0, 1], except for the special case where the initial state of the qubit is |+⟩, or more generally on the Bloch equator, (|0⟩ + e^{iθ}|1⟩)/√2, where a depolarization channel has no effect on ⟨Z⟩. This means depolarization noise is not suitable for the initial state that we use for the quantum reservoir in this work, |+⟩^⊗n. If the initial state of the quantum reservoir circuit were not on the Bloch equator but, for example, at the poles, |0⟩^⊗n or |1⟩^⊗n, then depolarization noise would evolve the expectation value.
While only AD and depolarization channels are discussed here, the argument generalizes to any channel that drives ⟨Z⟩ away from its initial value. The key implication is that a noise type must be present that evolves the expectation value from its initial value; otherwise the noise will have no effect and QNIR will generate invariant ⟨Z_i⟩ signals. Many types of noise may be present, but it is necessary that at least one can evolve the expectation value. Furthermore, it appears to be beneficial for the expectation values to have a larger range over which to be driven, perhaps at least from ⟨Z⟩_ρ = 0 towards ⟨Z⟩_E → +1.
In QNIR we need to understand the cumulative effect of noise over N time-step iterations. Considering Eq. 14, the expectation value tends to 1, ⟨Z⟩_{E_AD} → 1, as the probability P(|0⟩) tends to 1. Composing E_AD operations N times, the probability of measuring the ground state is

lim_{N→∞} P(|0⟩) = lim_{N→∞} [1 − (1 − γ)^N c] = 1,

and we see the limit tends to 1 as the second term tends to 0. Considering the factor containing γ in the second term, the rate at which P(|0⟩) tends to 1 depends on the magnitude of γ: a qubit a with γ_a > γ_b converges faster than a qubit b. This implies that the qubit expectation values ⟨Z_i⟩, evolved by different AD probabilities γ_i, will converge at different rates.
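A small numerical check of this convergence behaviour can be written with plain NumPy. The Kraus operators below are the standard amplitude-damping operators, and the two damping rates are arbitrary illustrative values; the script simply confirms that larger γ drives ⟨Z⟩ towards +1 faster when the channel is composed repeatedly.

```python
import numpy as np

def amplitude_damping_kraus(gamma: float):
    # Standard single-qubit amplitude-damping Kraus operators
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

def apply_channel(rho, kraus):
    return sum(K @ rho @ K.conj().T for K in kraus)

Z = np.diag([1.0, -1.0])
rho_plus = 0.5 * np.ones((2, 2), dtype=complex)    # |+><+|, so <Z> starts at 0

for gamma in (0.05, 0.2):                           # illustrative damping rates
    rho = rho_plus.copy()
    for _ in range(30):                             # 30 composed AD applications
        rho = apply_channel(rho, amplitude_damping_kraus(gamma))
    # Larger gamma gives faster convergence of <Z> towards +1
    print(f"gamma={gamma}: <Z> after 30 steps = {np.real(np.trace(Z @ rho)):.4f}")
```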
In conjunction with this convergence property, encoding scaled data value sequences into the reservoir circuits prevents convergence to |0⟩ and induces a steady state, allowing the sequential composition of non-trivial feature sequences, {h_i(ρ_t)}_{t=0}^N. The data values need to be scaled to a magnitude that is in a similar range to the noise signal. Further work is required to understand these dynamics.

D. Artificial noise scheme and optimization
QNIR uses noise as a necessary resource to generate non-trivial feature sequences. We use artificial noise that can be programmed into a quantum computer. There are many noise models that can be implemented in different ways to produce different effects.
To implement a noise scheme we associate parameterized, single-qubit noise channels with each unitary gate in the quantum circuit, Eq. 6, as shown in Fig. 4. In the following we assume each noise channel depends on a single noise parameter.
Noise channels are associated with all quantum gates, {RX, CX, RZ}, in the reservoir circuit in Fig. 4. Each noise channel E(p) is a function of a probability for the noise effect to occur. We use the probabilities, p_i, to parameterize the reservoir for optimization. The number of probability parameters scales linearly with the number of qubits: for a pair-separable entanglement reservoir the number of parameters is n_{p_i} = (7/2)n, where n = 2, 4, 6, ..., and for a linear entanglement reservoir n_{p_i} = 6n − 5, where n = 2, 3, 4, .... QNIR resource-noise optimization is performed through iterative training (Eq. 2) and testing (Eq. 3) of QNIR, giving optimized noise probability parameters, p_i ∈ p (see Fig. 5). The parameters in the initial parameter vector, p, are randomly sampled from a uniform distribution on the unit interval, p_i ∼ U(0, 1), ∀i.
Two optimization approaches were trialed in this work: evolutionary optimization [32] and dual annealing [40], where the latter is implemented and available in the SciPy optimization package [41]. The mean squared error (MSE) was used as the cost function measuring prediction performance, which is minimized as

min_p MSE(y, ŷ(p)),

where ŷ = W_opt^T X(p) is the QNIR test set prediction and X(p) is the reservoir signal matrix, which depends on the noise probabilities p.
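A minimal sketch of this optimization loop using SciPy's dual_annealing is given below. The qnir_signals function is a placeholder stub standing in for the noisy-reservoir simulation, and the targets, parameter count and iteration budget are illustrative values only.

```python
import numpy as np
from scipy.optimize import dual_annealing

rng = np.random.default_rng(0)
T_train, T_test, n_signals, n_params = 60, 20, 12, 42
y_train, y_test = rng.random(T_train), rng.random(T_test)       # placeholder targets

def qnir_signals(p):
    """Placeholder for the noisy-reservoir simulation: returns (X_train, X_test).
    In the real workflow this would run the QNIR circuits with reset probabilities p."""
    rng_p = np.random.default_rng(abs(hash(p.tobytes())) % (2**32))
    return rng_p.random((n_signals, T_train)), rng_p.random((n_signals, T_test))

def mse_cost(p):
    X_train, X_test = qnir_signals(p)
    X_aug = np.vstack([X_train, np.ones(T_train)])
    W_opt = y_train @ np.linalg.pinv(X_aug)                      # least-squares readout
    y_hat = W_opt @ np.vstack([X_test, np.ones(T_test)])
    return float(np.mean((y_test - y_hat) ** 2))                 # test-set MSE cost

result = dual_annealing(mse_cost, bounds=[(0.0, 1.0)] * n_params, maxiter=50, seed=0)
p_opt = result.x                                                 # optimized noise probabilities
```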
In this work we use only reset noise channels that can be simply implemented with a classical ancilla system (Section II E).

E. Reset noise implementation
We propose an implementation of a reset noise channel that probabilistically triggers a reset instruction [42] using a classical ancillary system. Reset noise can serve as the necessary noise type for QNIR with initial state |+⟩^⊗n. Only reset to |0⟩ is considered. Reset noise is similar to AD noise and can approximate it to high precision [43]. The reset noise channel is given by E_PR(ρ) = p|0⟩⟨0| + (1 − p)ρ, where p is the reset probability [43]. E_PR(ρ) is trace preserving, Tr(E_PR(ρ)) = 1. IBM Quantum systems support dynamic circuits, enabling a reset instruction to be executed by a mid-circuit measurement and a classically controlled X gate that depends on the measurement outcome [44]. Future quantum computers will likely implement a reset instruction in the same way, due to the general importance of dynamic circuits and the simplicity of the operation.
FIG. 6. A deterministic RESET instruction (left) is executed with a dynamic circuit on IBM Quantum systems. This can be extended to a physical implementation of a reset noise channel, E_PR, which is identically a probabilistic RESET instruction. This can be done with an extended classical ancillary system (right). The classical NOT gate, X_p, is executed with probability p, which in turn triggers a classically controlled RESET instruction with probability p.
To implement a reset noise channel, a random number generator can be used to probabilistically trigger a reset instruction, Fig. 6. In this way artificial reset noise is implemented without ancilla qubits. Ancilla qubits are an undesirable overhead, especially where unitary gates require potentially many corresponding noise channels.
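In simulation, the same channel can be represented directly as a Kraus map. The operators below are one valid Kraus decomposition of E_PR(ρ) = p|0⟩⟨0| + (1 − p)ρ (not necessarily the decomposition used elsewhere), and the check that it is CPTP uses Qiskit's quantum_info module.

```python
import numpy as np
from qiskit.quantum_info import Kraus, DensityMatrix

def probabilistic_reset_channel(p: float) -> Kraus:
    """Kraus decomposition of E_PR(rho) = p|0><0| + (1-p) rho (reset-to-|0> noise)."""
    K0 = np.sqrt(p) * np.array([[1, 0], [0, 0]], dtype=complex)      # |0><0|
    K1 = np.sqrt(p) * np.array([[0, 1], [0, 0]], dtype=complex)      # |0><1|
    K2 = np.sqrt(1 - p) * np.eye(2, dtype=complex)                   # identity branch
    return Kraus([K0, K1, K2])

chan = probabilistic_reset_channel(0.3)
print(chan.is_cptp())                     # True: completely positive and trace preserving

# Acting on |+><+| mixes the state towards |0><0| with weight p, as E_PR prescribes.
rho_plus = DensityMatrix.from_label("+")
print(rho_plus.evolve(chan))
```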

A. Reservoir complexity reduction
Reservoir complexity reduction was performed for each benchmark to reduce the quantum resource footprint and prevent overfitting. This involved reductions in:
1. the number of noise models,
2. the entanglement scheme complexity,
3. the number of qubits.
The artificial quantum noise scheme was reduced from a number of composed noise models, consisting of thermal relaxation, depolarization and SPAM noise, to a single reset noise model, which is a necessary noise type for QNIR with initial state |+⟩^⊗n. Entanglement scheme complexity here means quantum circuit complexity [45] and is determined by the circuit depth of the entanglement scheme and the number of gates, i.e., it is the cost of the quantum circuit. Linear entanglement schemes were trialed first for both benchmarks but did not offer an improvement over the pair-separable entanglement schemes that were finally settled on.
The number of qubits in the quantum reservoirs was reduced to a smaller number that still offered good performance. Diminishing returns were observed for reservoirs with larger numbers of qubits.

B. Noise optimization
Dual annealing optimization and evolutionary optimization were employed for NARMA and Mackey-Glass benchmarks, respectively.
Dual annealing from SciPy's optimization package [41] was used for reservoir optimization with default settings. This stochastic approach, derived from [40], combines generalized classical simulated annealing (CSA) and fast simulated annealing (FSA) [46] with a local search strategy [47].
Evolutionary optimization (EO) is a population-based approach to optimization in which candidate solutions, represented as a population of agents, are initialized through random sampling. The fitness of each candidate solution is then determined by evaluating it against a predefined objective metric. The superior solutions are selected and used to generate the candidate population for the subsequent iteration. This process continues until satisfactory solutions have been identified. The EO algorithm of [32] is employed here.
Reset noise probabilities were optimized to maximize prediction performance, as detailed in Section II D.

C. Simulations
The quantum reservoir circuits with artificial noise channels were simulated using the Qiskit SDK [48] with an ideal density matrix backend simulator. This theoretical approach allows for single-shot, single-circuit simulations. The single-qubit Z expectation values were computed from the intermediate density matrices at each time step.
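One possible way to realise this density-matrix simulation with qiskit.quantum_info is sketched below. The pair-separable step, the toy input sequence, the uniform reset probabilities, and the placement of one reset channel per qubit per time step (rather than per gate) are simplifying assumptions made for illustration.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import DensityMatrix, Pauli, Kraus

n = 4
rho = DensityMatrix.from_label("+" * n)          # initial reservoir state |+>^n

def reset_channel(p: float) -> Kraus:
    # Probabilistic reset channel E(rho) = p|0><0| + (1-p) rho, as in Section II E
    return Kraus([np.sqrt(p) * np.array([[1, 0], [0, 0]], dtype=complex),
                  np.sqrt(p) * np.array([[0, 1], [0, 0]], dtype=complex),
                  np.sqrt(1 - p) * np.eye(2, dtype=complex)])

def unitary_step(theta: float) -> QuantumCircuit:
    qc = QuantumCircuit(n)
    for q in range(n):
        qc.rx(theta, q)
    for q in range(0, n - 1, 2):                 # pair-separable entanglement
        qc.rzz(theta, q, q + 1)
    return qc

p_reset = np.full(n, 0.1)                        # illustrative reset probabilities
signals = [[] for _ in range(n)]

for u in np.linspace(0.1, 1.0, 20):              # toy input sequence
    rho = rho.evolve(unitary_step(np.pi * u))    # unitary encoding step
    for q in range(n):                           # apply reset noise per qubit
        rho = rho.evolve(reset_channel(p_reset[q]), qargs=[q])
    for q in range(n):                           # record <Z_q> at this time step
        signals[q].append(np.real(rho.expectation_value(Pauli("Z"), qargs=[q])))
```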

D. Memory capacity
The presence of recurrent nodes in a reservoir computer implies that it can retain information about past signals in its internal state. Memory capacity is a measure that quantifies this ability to retain information about past inputs, and it plays a crucial role in the prediction abilities of a reservoir computer [2].
The memory capacity of a reservoir is calculated by first creating a random sequence in a range appropriate to the reservoir; the minimum and maximum of this range correspond to those of the benchmark time series input to the QNIR reservoir. The reservoir computer is trained to predict the signal d time steps before the current state of the reservoir (x_k), that is, the target signal is y_k = u_{k−d}. The memory function, MF(d), is defined as the square of the Pearson correlation coefficient between the target and the prediction, and the memory capacity is then calculated as the sum of the memory functions over all delays,

MC = Σ_d MF(d).

In Section IV, we calculate the memory function, as well as the memory capacity, for both the NARMA and the Mackey-Glass systems.
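A minimal sketch of this calculation is given below, assuming a reservoir signal matrix X produced for a random input sequence u; the helper names, the washout length, and the maximum delay are illustrative assumptions.

```python
import numpy as np

def memory_function(X: np.ndarray, u: np.ndarray, d: int, washout: int = 20) -> float:
    """Squared Pearson correlation between the delayed input u_{k-d} and the linear
    readout prediction from reservoir signals X (shape: n_signals x T). Requires d <= washout."""
    y = u[washout - d : len(u) - d]                  # target: input delayed by d steps
    X_win = X[:, washout:]                           # drop the washout phase
    X_aug = np.vstack([X_win, np.ones(X_win.shape[1])])
    W = y @ np.linalg.pinv(X_aug)                    # least-squares readout
    y_hat = W @ X_aug
    r = np.corrcoef(y, y_hat)[0, 1]
    return float(r ** 2)

def memory_capacity(X: np.ndarray, u: np.ndarray, d_max: int = 20) -> float:
    # Sum of the memory functions over all delays up to d_max
    return sum(memory_function(X, u, d) for d in range(1, d_max + 1))
```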

E. Metrics
The NMSE and NRMSE metrics are frequently used in the relevant literature and so are used here for convention and comparison. The MASE metric of Hyndman and Koehler [49] is used because it has many properties that allow comparison between time series of different scales and is readily interpretable due to its symmetry and linearity. Furthermore, MASE is used because we compare QNIR prediction performance with a Naive model, which provides good reference performance that is better than a linear model. See Section VI A for further details.

A. NARMA
We show that QNIR with noise optimization has excellent theoretical performance on the Nonlinear Auto-Regressive Moving Average (NARMA) sequence prediction benchmark [8,50]. A NARMA regression task involves learning the NARMA map between a fixed input sequence and a NARMA output sequence. We label these tasks NARMAn, where n is the order of the NARMA map, and consider three NARMA sequences of orders 2, 5 and 10. It is important to highlight that previous work has shown that the NARMA2 task can be solved by a natural-noise QNIR [8], and our results show that with noise optimization the attainable performance is indeed very high.
The NARMA2 sequence [50] is given by the recurrence relation

y_{t+1} = 0.4 y_t + 0.4 y_t y_{t−1} + 0.6 u_t^3 + 0.1,

where the two initial sequence values are {0.196, 0.19468}. The input values u_t are from the smooth function

u_t = 0.1 [sin(2πat/T) sin(2πbt/T) sin(2πct/T) + 1],

where (a, b, c, T) = (2.11, 3.73, 4.11, 100). NARMA5 and NARMA10 are described by the general recurrence

y_{t+1} = α y_t + β y_t ( Σ_{j=0}^{n−1} y_{t−j} ) + γ u_{t−n+1} u_t + δ.

For NARMA5 the initial sequence is {0, 0, 0, 0, 0.196} and the first four zeroes are excluded from the target sequence. The function parameters for NARMA5 are (α, β, γ, δ) = (0.3, 0.05, 1.5, 0.1). Similarly, for NARMA10 the first nine values in the initial sequence are zeroes and are not included in the target sequence. The parameters used are the same as for NARMA5.
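A sketch of how such benchmark sequences can be generated is given below, following the standard NARMA forms written above; the exact conventions of [50] should be checked against this, and the initial-value handling is an illustrative choice.

```python
import numpy as np

def narma_input(T_steps: int, a=2.11, b=3.73, c=4.11, T=100.0) -> np.ndarray:
    # Smooth, product-of-sines input sequence used for all NARMA tasks
    t = np.arange(T_steps)
    return 0.1 * (np.sin(2*np.pi*a*t/T) * np.sin(2*np.pi*b*t/T) * np.sin(2*np.pi*c*t/T) + 1)

def narma2(u: np.ndarray, y_init=(0.196, 0.19468)) -> np.ndarray:
    y = np.zeros(len(u))
    y[0], y[1] = y_init
    for t in range(1, len(u) - 1):
        y[t+1] = 0.4*y[t] + 0.4*y[t]*y[t-1] + 0.6*u[t]**3 + 0.1
    return y

def narma_n(u: np.ndarray, order: int, alpha=0.3, beta=0.05, gamma=1.5, delta=0.1) -> np.ndarray:
    # General NARMA-n recurrence; the first (order-1) zero entries are later
    # excluded from the target sequence, as described in the text.
    y = np.zeros(len(u))
    y[order - 1] = 0.196
    for t in range(order - 1, len(u) - 1):
        y[t+1] = alpha*y[t] + beta*y[t]*np.sum(y[t-order+1:t+1]) + gamma*u[t-order+1]*u[t] + delta
    return y

u = narma_input(100)
y2, y5, y10 = narma2(u), narma_n(u, 5), narma_n(u, 10)
```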
Temporal train and test split indices are 20-80 and 81-100, respectively.The initial 20 time steps were excluded as a washout phase.
Reservoirs with larger numbers of qubits were trialed and then reduced to a 12-qubit (6 × 2-qubit pair) reservoir for all three NARMA benchmarks. A 12-qubit reservoir is parameterized with 42 reset noise probabilities. For each of the three NARMA tasks, a distinct set of optimal parameters was obtained. For these three optimized models, we calculated the memory functions and memory capacities of the reservoir. The memory function results from 30 trials are shown in Fig. 8. The memory capacities of the systems were observed to saturate at approximately 4.55 (±0.07), 3.81 (±0.08), and 4.70 (±0.06) at delays of 8, 6, and 10 for NARMA2, 5, and 10, respectively; the numbers in brackets indicate confidence intervals at α = 0.05. The theoretical results achieved for the NARMA2 task, plotted in Fig. 7 and recorded in Table I, present a notable improvement over recent work [8] in terms of quantum model complexity, particularly the reduction in the number of reservoir qubits from 520 to 12. In addition, in terms of circuit complexity with artificial noise, it is worth noting that there is a reduction from 10 noise models to 1 reset noise model. Our result also compares favorably with an echo state network of 110 nodes [8].
Information processing capacity (IPC) analysis [8] has shown that QNIR can be used to solve the NARMA2 task. The excellent performance on the NARMA2 task in this work demonstrates the efficacy of QNIR optimization. Furthermore, due to the quantum model complexity reduction, this QNIR model was found to be tractable to classical simulation; there is therefore a strong indication that the NARMA tasks targeted in this work are not candidates for quantum advantage.

B. Mackey-Glass
The Mackey-Glass (MG) system [51] is a commonly used benchmark for time series prediction that is difficult to predict due to its chaotic dynamics. The Python package ReservoirPy [52] was used to generate the MG time series, which are discretized using the Runge-Kutta method and initialized with a default seed value. For MG benchmarking, relative to the NARMA setup, we extend the training sequence from 60 to 250 data points and the testing sequence from 20 to 100 data points.
The MG delay differential equation (DDE) is

dx/dt = a x(t−τ) / (1 + x(t−τ)^n) − b x(t).

To generate time series for benchmarking, the parameters (x_0, a, b, n) = (1.2, 0.2, 0.1, 10) were used. The input and target time series are defined as x(t − τ) and x(t), respectively. We considered two distinct, chaotic MG systems determined by the integer delay values τ = 19 and 25, which we denote MG19 and MG25, respectively. The generated time series were then downsampled by a factor of 2. For both downsampled MG19 and MG25 time series, chaoticity is indicated by positive Lyapunov exponents [53], calculated using the Python nolds library [54].
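A sketch of this data-generation step is shown below. The reservoirpy and nolds calls reflect our understanding of those packages' APIs and should be checked against their documentation, and the delay/downsampling bookkeeping is an illustrative choice rather than the exact preprocessing used for the benchmarks.

```python
import numpy as np
from reservoirpy.datasets import mackey_glass   # assumed API: mackey_glass(n_timesteps, tau=..., ...)
import nolds

def make_mg_task(tau: int, n_timesteps: int = 800, downsample: int = 2):
    # Generate the MG series with (x0, a, b, n) = (1.2, 0.2, 0.1, 10)
    x = mackey_glass(n_timesteps, tau=tau, a=0.2, b=0.1, n=10, x0=1.2).ravel()
    u, y = x[:-tau], x[tau:]                    # input x(t - tau) and target x(t)
    return u[::downsample], y[::downsample], x[::downsample]

u19, y19, x19 = make_mg_task(19)
u25, y25, x25 = make_mg_task(25)

# A positive largest Lyapunov exponent (Rosenstein's method) indicates chaos.
print(nolds.lyap_r(x19), nolds.lyap_r(x25))
```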
The time series were downsampled from 800 time steps to 400 time steps. Temporal train and test split indices are 20-300 and 301-400, respectively. The initial 20 time steps were excluded as a washout phase. Reservoirs were reduced in size to 32 qubits (16 × 2-qubit pairs) and 36 qubits (18 × 2-qubit pairs) for the MG19 and MG25 tasks, respectively. Further reductions caused a drop-off in performance. The 32- and 36-qubit reservoirs were parameterized with 112 and 126 reset noise probabilities, respectively. For these two optimized models, we calculated the memory functions and memory capacities of the reservoirs. The memory function results from 30 trials are shown in Fig. 10. The memory capacities of the systems were observed to saturate at approximately 4.56 (±0.07) and 4.55 (±0.08) at delays of 6 and 8 for MG19 and MG25, respectively; the numbers in brackets indicate confidence intervals at α = 0.05.
We report good prediction performance, plotted in Fig. 9 and recorded in Table II. QNIR demonstrated prediction performance much better than the Naive models and shows promise for modelling chaotic dynamics. Larger reservoirs, with three times the number of qubits, were required for the MG benchmarks compared to NARMA, indicating greater prediction difficulty. However, it is worth emphasizing that 32- and 36-qubit reservoirs are still relatively small compared with conventional approaches. Since the memory capacities for these larger reservoirs were similar to those used for the NARMA benchmarks, it stands to reason that the additional modelling capacity was provided by the threefold increase in the number of qubits and reservoir output signals.

V. CONCLUSIONS & DISCUSSION
We proposed the first QNIR scheme that uses parameterized resource noise as a target of optimization for improving model performance.This is a new resource noise optimization approach and, furthermore, embodies a new non-unitary quantum circuit optimization approach.Our benchmarking demonstrated that this optimization approach is effective for improving prediction performance.
Our simulations showed that few-qubit QNIR computers are capable of predicting NARMA nonlinear dynamics and Mackey-Glass chaotic dynamics.We demonstrated complexity reductions in multiple dimensions of the reservoir circuits, one of which is a significant reduction in artificial noise models, resulting in a single and necessary reset noise model being selected for the benchmark samples chosen in this work.Moreover, we proposed a theoretical reset noise model implementation that further reduces quantum resources by employing a fully classical ancillary system.While our reduction of entanglement scheme complexity may produce quantum circuits that are efficient to compute classically, this process is desirable when the learning task does not require quantum advantage.This is consistent with the ML principle of resource reduction.Furthermore, our QNIR framework is consistent with complex entanglement schemes, and therefore opens a path towards investigating quantum advantage for different tasks.
We recommend building on this work by analyzing quantum circuit symmetries and parameter importance, with the purpose of reducing the number of circuit optimization parameters.
Reducing quantum circuit complexity has positive implications for hardware efficiency, which is critical for the state-of-the-art quantum computers currently hindered by noise.Therefore, we recommend implementation on near-term quantum computers with error mitigation.

A. Metrics
The mean squared error (MSE) used as an optimization cost function is defined as

MSE = (1/T) Σ_t (y_t − ŷ_t)^2.

The normalized mean squared error (NMSE) used to evaluate prediction performance normalizes the MSE by the variance of the target sequence,

NMSE = (1/T) Σ_t (y_t − ŷ_t)^2 / σ^2(y).

The normalized root-mean-square error (NRMSE) is defined as

NRMSE = sqrt( (1/T) Σ_t (y_t − ŷ_t)^2 ) / σ(y),   (30)

where σ(y) is the sample standard deviation of the true values. The mean absolute scaled error (MASE) forecasting metric is defined as

MASE = mean(|e_j|) / ( (1/(T−1)) Σ_{t=2}^{T} |y_t − y_{t−1}| ),

where e_j = y_j − f_j is the true value minus the forecasted value. The denominator is the mean absolute error of the non-seasonal naive forecast. A MASE value less than 1 means the scored model has a lower mean absolute error than the naive forecast.
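NumPy implementations of these metrics might look as follows; the NMSE normalization by the target variance follows the definition written above and should be checked against the convention used in any work being compared against.

```python
import numpy as np

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def nmse(y, y_hat):
    # Normalized by the variance of the true values (one common convention)
    return mse(y, y_hat) / float(np.var(y))

def nrmse(y, y_hat):
    return float(np.sqrt(mse(y, y_hat)) / np.std(y))

def mase(y, y_hat):
    # Denominator: mean absolute error of the non-seasonal naive (persistence) forecast
    naive_mae = np.mean(np.abs(np.diff(y)))
    return float(np.mean(np.abs(y - y_hat)) / naive_mae)
```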
FIG. 7. Plot (a) is the input sequence for all NARMA tasks, Eq. 25. Plots (b-d) are QNIR training and prediction of the NARMA2, 5 and 10 maps, respectively. Training data sequences are between time step indices 20-80 and test data sequences between 81-100. Test set prediction performance metrics are in Table I.

FIG. 8. Memory functions for NARMA2, 5, and 10 sequences plotted as a function of the delay d in the input signal. The colored bands correspond to the standard deviations in the calculation.
FIG. 9. The input and target time series are defined as x(t − τ) and x(t), respectively (Eq. 27). Plots (a-b) are the MG19 input and 19-step-delay target time series and the prediction result, respectively. The same applies to plots (c-d) for MG25. Training data sequences are between time step indices 20-300 and test data sequences between 301-400. Test set prediction performance metrics are in Table II.

FIG. 10. Memory functions for the Mackey-Glass systems (MG19 and MG25) plotted as a function of the delay d in the input signal. The colored bands correspond to the standard deviations in the calculation.
This graphic shows the QNIR noise optimization scheme. The quantum model is trained and tested iteratively in a classical optimization loop, in which dual annealing or evolutionary optimization is used. The quantum reservoir circuits have a number of gate-associated noise channels, each of which has a single error probability parameter that is iteratively updated.

TABLE II. QNIR performance metrics are explicitly compared with a Naive model, which has a MASE of 1 by definition (see Appendix Section VI A). MSE values are the minimum optimization cost values.