Adaptive model selection in photonic reservoir computing by reinforcement learning

Photonic reservoir computing is an emergent technology toward beyond-Neumann computing. Although photonic reservoir computing provides superior performance in environments whose characteristics are coincident with the training datasets for the reservoir, the performance is significantly degraded if these characteristics deviate from the original knowledge used in the training phase. Here, we propose a scheme of adaptive model selection in photonic reservoir computing using reinforcement learning. In this scheme, a temporal waveform is generated by different dynamic source models that change over time. The system autonomously identifies the best source model for the task of time series prediction using photonic reservoir computing and reinforcement learning. We prepare two types of output weights for the source models, and the system adaptively selects the correct model using reinforcement learning, where the prediction errors are associated with rewards. We succeed in adaptive model selection when the source signal is temporally mixed, having originally been generated by two different dynamic system models, as well as when the signal is a mixture from the same model but with different parameter values. This study paves the way for autonomous behavior in photonic artificial intelligence and could lead to new applications in load forecasting and multi-objective control, where frequent environment changes are expected.


Introduction
Reservoir computing involves information processing based on recurrent neural networks [1]. This method is known to be suitable for temporal or sequential information processing, such as time series prediction [2] and speech recognition [3]. In reservoir computing, the input data to be processed are fed into a recurrent neural network, which is called a reservoir. The reservoir network produces a transient response when the input signal is injected. The output of reservoir computing is the weighted linear sum of the node states in the reservoir. The main characteristic of reservoir computing is that the input weights and the reservoir are fixed, being specified by the physical characteristics of the reservoir, while the output weights are trained. These characteristics significantly reduce the computational cost of learning compared with that of standard recurrent neural networks.
Nonlinear mapping of the input data into a high-dimensional space is required to achieve reservoir functionality for successful computation [4]. This functionality can be realized using other nonlinear dynamic systems instead of recurrent neural networks. Reservoir computing based on various types of nonlinear dynamic systems has been proposed [5][6][7][8][9]. Photonic implementation of reservoir computing is one example, where a semiconductor laser with a delayed feedback loop is used as a reservoir [10][11][12][13]. One of the advantages of photonic reservoir computing is that it enables fast information processing with low learning cost using established optoelectronic devices. It has been reported that speech recognition at a rate of 1.1 Gb/s can be achieved using photonic reservoir computing [12].
Reservoir computing can, however, only adapt to input signals that are used to train the output weights of the reservoir.
In other words, reservoir computing does not work well if the incoming signals do not correspond with the training datasets.
In reality, environmental conditions may change the characteristics of the observations, which could induce variations of the input that are different from the original knowledge used in the training phase. Additionally, the input signals could be generated by many different dynamic source models, with the source model switching dynamically in time or the signal being a mixture of different source models. It may be difficult to train the reservoir computing system to produce the correct outputs for all different models or arbitrary environmental conditions.
To address this issue, we propose a scheme of reservoir computing combined with reinforcement learning in this study. In this scheme, training is conducted with respect to individual input signals generated by a designated model. Hence, multiple output weights of the reservoir are obtained, corresponding to the different types of source signals in the training phase. In the task execution phase, one of the output weights of the reservoir is selected by reinforcement learning such that the minimum prediction error for the given input signals is achieved. This adaptive model selection scheme is expected to be useful for applications such as load forecasting [14], multi-objective control [15], and signal recovery in communication [16], where environmental changes or diverse types of input signals are expected; in such cases, the preparation of multiple output weights of the reservoir prior to execution, combined with dynamic model selection, would be highly effective.
Decision making using reinforcement learning is a machine learning scheme concerned with the problem of training an action policy to maximize the total reward [17]. The multi-armed bandit (MAB) problem is a fundamental problem in reinforcement learning, whose goal is to maximize the total reward when agents select one of multiple slot machines with unknown hit probabilities in finite trials [17]. The idea of adaptive model selection stems from associating the slot machines in the MAB problem with the trained output weights of the reservoir. Therefore, the strategy used to solve the MAB problem [18][19][20] could be effective in adaptive model selection. Furthermore, several methods of photonic decision making have been demonstrated with operation in the gigahertz regime by utilizing chaotic laser time series [21]. Notably, both reservoir computing and dynamic model selection can be performed on a photonic platform for ultrafast operation [22,23].
In this study, we numerically demonstrate adaptive model selection using decision making based on chaotic laser outputs in photonic reservoir computing with reinforcement learning. We consider a situation in which the input signal is generated by one of two dynamic models, specifically, the Lorenz model [24] or the Rössler model [25], and the input signal is switched in time between the two models to mimic environmental changes. We train the reservoir using the time series generated by either one of the two models and prepare two types of output weights for the reservoir corresponding to the two models. We perform time series prediction of the input signal using reservoir computing. Generally, if the output weights of a reservoir do not correspond to the characteristics of the actual input signals, for instance due to environmental changes, a larger prediction error is obtained. In reinforcement learning, action policies are trained based on rewards, and the prediction errors in reservoir computing are regarded as rewards in this study. The proposed scheme autonomously changes the output weights of the reservoir according to the given input signals to reduce the prediction error. We numerically demonstrate correct adaptive model selection for different configurations of the dynamic models.

Adaptive model selection based on decision making in photonic reservoir computing
We propose a scheme for adaptive model selection based on decision making in photonic reservoir computing. Figure 1 schematically illustrates the architecture of the proposed approach. The scheme comprises three parts: photonic reservoir computing, reinforcement learning, and generation of chaotic laser outputs. In this study, we numerically implement photonic reservoir computing and reinforcement learning. We use experimentally generated chaotic temporal waveforms of the laser outputs for reinforcement learning in the numerical simulations. The photonic reservoir computing system consists of a semiconductor laser with optical feedback (see the Methods section for details). In this scheme, chaotic time series prediction is numerically performed using photonic reservoir computing, where the signal to be predicted is generated using two dynamical models: the Lorenz [24] and Rössler [25] models. Considering the situation in which the source of the input signal changes over time, mimicking environmental changes, single-point prediction is performed using photonic reservoir computing. Two types of reservoir output weights are prepared, which are trained by chaotic time series generated separately using the Lorenz and Rössler models. Two predicted time series are generated based on the two output weights.
In the adaptive model selection, the prediction errors for the two output weights are utilized with the objective of determining which model should be used for time series prediction.

Figure 1. Schematic diagram of adaptive model selection using reservoir computing and reinforcement learning. The system comprises three parts: photonic reservoir computing, reinforcement learning, and chaotic laser system. LD is laser diode, PM is phase modulator, CIRC is optical circulator, ATT is optical attenuator, FC is optical fiber coupler, PD is photodetector, ISO is optical isolator, and OSC is digital oscilloscope.
The input chaotic time series is denoted by $u(n)$ (Fig. 1). The task of the reservoir is to conduct a single-point prediction of $u(n)$; that is, the reservoir computing predicts $u(n+1)$ when $u(n)$ is injected into the reservoir. Two types of output weights are trained separately using the time series from the two models and are represented as $w_1$ and $w_2$. The reservoir produces two predicted outputs, $p_1(n)$ and $p_2(n)$, using output weights $w_1$ and $w_2$, respectively.
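The two-readout arrangement can be illustrated with a minimal Python sketch. It assumes that each predicted output is the weighted linear sum of the reservoir node states and that the prediction error is the absolute difference from the next input value; the error metric, the function names, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
def readout(weights, node_states):
    """Weighted linear sum of reservoir node states for one output-weight set."""
    return sum(w * x for w, x in zip(weights, node_states))


def prediction_errors(w1, w2, node_states, next_input):
    """Return (e1, e2), the one-step-ahead errors of the two readouts.

    The absolute error is an assumed metric; the text does not specify it here.
    """
    p1 = readout(w1, node_states)  # prediction with the first output weights
    p2 = readout(w2, node_states)  # prediction with the second output weights
    return abs(p1 - next_input), abs(p2 - next_input)


# Toy example with three hypothetical node states.
states = [0.2, -0.5, 1.0]
w1 = [1.0, 0.0, 0.5]  # stands in for weights trained on one source model
w2 = [0.0, 1.0, 0.0]  # stands in for weights trained on the other source model
e1, e2 = prediction_errors(w1, w2, states, next_input=0.7)
```

In this toy case the first readout reproduces the next input, so `e1` is smaller than `e2`, and the first output weights are the ones the decision-making stage should favor.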

Adaptive model selection is performed by determining which of the two prediction errors, $e_1(n)$ or $e_2(n)$, is smaller, where $e_i(n)$ is the error of the prediction obtained with the $i$-th output weight at time step $n$.
The method of decision making based on chaotic laser output is employed to select one of the two output weights [21,23].
The smaller prediction error is determined by comparing $e_1(n)$ and $e_2(n)$, and the threshold $T(n)$ is changed accordingly. $T(n)$ is changed via the threshold adjuster $TA(n)$ and is defined as follows:

$$T(n) = k \lfloor TA(n) \rfloor, \tag{1}$$

where $\lfloor TA(n) \rfloor$ is the nearest integer to $TA(n)$ rounded toward 0. In this study, $\lfloor TA(n) \rfloor$ was assumed to take the values $-N, \ldots, -1, 0, 1, \ldots, N$, where $N$ is a natural number. Hence, the number of thresholds is $2N + 1$. The threshold number and $k$ in Eq. (1) determine the range of $T(n)$. The range of $T(n)$ is limited from $-kN$ to $kN$ by setting $T(n) = kN$ when $T(n) > kN$ and $T(n) = -kN$ when $T(n) < -kN$. $TA(n)$ is changed based on the relationship between the magnitudes of $e_1(n)$ and $e_2(n)$ as follows:

$$TA(n+1) = \begin{cases} -\Delta + \alpha\, TA(n) & \text{if } e_1(n) < e_2(n), \\ +\Delta + \alpha\, TA(n) & \text{if } e_1(n) > e_2(n), \end{cases} \tag{2}$$

where $\Delta$ is the threshold shift and $\alpha$ is referred to as the forgetting (memory) parameter [27,28]. A large value of $\alpha$ means that the dynamics of $TA(n)$ holds memory of the initial value of $TA(n)$. In this scheme, the sum of the hit probabilities of the two slot machines (models) is supposed to be fixed at 1, because one of the two models is always selected. Therefore, the threshold shift $\Delta$ is fixed at 1.
A temporal waveform of chaotic laser outputs used for decision making was experimentally obtained from a semiconductor laser with optical feedback [21]. The semiconductor laser was subjected to delayed optical feedback by using an external fiber reflector, inducing chaotic temporal waveforms in the intensity of the laser output [26]. The chaotic output was detected using a photodetector and sampled by a high-speed digital oscilloscope at a sampling interval of 10 ps. In this study, decision making is performed at a sampling interval of 50 ps, because it has been reported that this sampling interval yields the best performance owing to the existence of a negative correlation [21].
The vertical resolution of the digital oscilloscope was 8 bits, and the sampled data therefore had 8-bit resolution. In the decision making method, the chaotic data sampled by the oscilloscope are compared to $T(n)$. We thus limited the range of $T(n)$ to $-128 \leq T(n) \leq 128$. To determine the shift of $T(n)$, $N = 8$ and $k = 16$ were used in this study. The number of threshold levels was $2N + 1 = 17$.
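The threshold-based decision making described here can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the threshold is taken as k times the threshold adjuster truncated toward zero, with k = 16 and 8 levels on each side as stated in the text. The forgetting parameter value of 0.99, the handling of equal errors, the rule "select output 1 when the sample is at or above the threshold", and the uniform random integers standing in for the 8-bit chaotic laser samples are all assumptions.

```python
import math
import random

K = 16          # threshold step k (from the text)
N_LEVELS = 8    # N (from the text), giving 2N + 1 = 17 threshold levels
DELTA = 1.0     # threshold shift (from the text)
ALPHA = 0.99    # forgetting parameter: assumed value, not given in this section


def threshold(ta):
    """T(n) = k * trunc(TA(n)), limited to the range [-k*N, k*N]."""
    level = max(-N_LEVELS, min(N_LEVELS, math.trunc(ta)))
    return K * level


def update_adjuster(ta, e1, e2):
    """Lower TA when output 1 has the smaller error, raise it otherwise."""
    if e1 < e2:
        return -DELTA + ALPHA * ta
    if e1 > e2:
        return DELTA + ALPHA * ta
    return ALPHA * ta  # equal errors: assumed behavior (forgetting only)


def select(sample, ta):
    """Select output 1 if the sample is at or above T(n), else output 2."""
    return 1 if sample >= threshold(ta) else 2


def run(errors, rng):
    """Run the decision loop over a sequence of (e1, e2) pairs."""
    ta = 0.0
    choices = []
    for e1, e2 in errors:
        sample = rng.randint(-128, 127)  # stand-in for an 8-bit laser sample
        choices.append(select(sample, ta))
        ta = update_adjuster(ta, e1, e2)
    return choices, ta


# Output 2 always has the smaller error, so the threshold climbs to 128.
choices, final_ta = run([(1.0, 0.1)] * 50, random.Random(0))
```

Once the threshold reaches 128, no 8-bit sample can exceed it, so only output 2 is selected. This mirrors the saturation behavior described for the threshold in the text.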

Adaptive model selection between Rössler and Lorenz models
We numerically demonstrate adaptive model selection based on decision making in chaotic time series prediction. To generate a prediction target, we use two models, the Rössler and Lorenz models, which are well-known models that can produce chaotic behaviors (see the Methods section for details). A time series is generated using one of the two models, and the models are switched over time. Figure 2 shows the input signals produced by the two models. The first 500 points of the time series are generated by the Lorenz model, which is then switched to the Rössler model for the next 500 points.
After that, the model is periodically switched every 500 points.
Figures 3(c) and 3(f) depict the prediction errors $e_1(n)$ and $e_2(n)$, respectively. These figures are enlarged in the range $300 \leq n \leq 700$, which includes the switching of the time series from the Lorenz model to the Rössler model at $n = 500$. We find $e_2(n) < e_1(n)$ when $300 < n < 500$, where the prediction target is the Lorenz model. Meanwhile, $e_1(n) < e_2(n)$ when $500 < n < 700$, where the prediction target is the Rössler model. These results indicate that the predicted waveform generated using the output weight corresponding to the prediction target has the smaller error.
An example of adaptive model selection is provided in Fig. 4, where the prediction target is the Lorenz model. Figure 4(a) shows the difference between the two errors, $\Delta e(n) = e_1(n) - e_2(n)$. The relationship between the magnitudes of the errors can be determined from $\Delta e(n)$. A positive value of $\Delta e(n)$ indicates that $e_2(n) < e_1(n)$ in Fig. 4(a). The temporal evolution of the threshold $T(n)$ is shown in Fig. 4(b). $T(n)$ for decision making varies based on $\Delta e(n)$ and increases to 128 after fluctuating around 0 during the first few time steps. The predicted output is selected by comparing $T(n)$ with the chaotic laser output, and the selection result is shown in Fig. 4(c). When $T(n)$ fluctuates around 0 in Fig. 4(b), either $p_1(n)$ or $p_2(n)$ may be selected. After $T(n)$ reaches 128, only $p_2(n)$ can be selected. Thus, the predicted output corresponding to the target (the Lorenz model) is selected successfully. Figure 5 shows the temporal evolution of CMSR($n$), which increases quickly to 1 after the prediction begins. When the target model is switched at $n = 500$, 1,000, and 1,500, CMSR($n$) decreases to 0. After the switch, CMSR($n$) increases to 1 again. Therefore, the correct model is selected adaptively under model switching (i.e., environmental changes). In addition, we note that, in general situations, model switching may occur at random times. Cases of model selection at different switching times are presented in the Supplementary Information.
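The correct model selection rate can be illustrated with a short Python sketch. The definition used here is an assumption, since this section does not state it: CMSR at step n is taken as the fraction of independent trials in which the selected model matches the target model at that step. The function name and the toy data are hypothetical.

```python
def cmsr(selections, targets):
    """Fraction of trials selecting the target model at each time step.

    selections[t][n] is the model index (1 or 2) chosen in trial t at step n;
    targets[n] is the index of the correct model at step n.
    """
    n_trials = len(selections)
    return [
        sum(1 for trial in selections if trial[n] == targets[n]) / n_trials
        for n in range(len(targets))
    ]


# Two toy trials over three steps; model 2 is the correct choice throughout.
rates = cmsr([[1, 2, 2], [2, 2, 2]], [2, 2, 2])
```

With this definition, a value of 1 means that every trial selected the correct model at that step, and a drop toward 0 right after a switch reflects selections that still follow the previous model.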

Adaptive model selection with mixed time series from Rössler and Lorenz models
Adaptive model selection between the Rössler and Lorenz models is a simple case because the difference between the prediction errors $e_1(n)$ and $e_2(n)$ is large, as shown in Fig. 4. In this subsection, a more difficult case of adaptive model selection is described, in which a mixed time series is used for the prediction target, as shown in Fig. 6(a). The temporal evolution of the error difference $\Delta e(n)$, the threshold $T(n)$, and the selected sequence of the predicted output are summarized in Figs. 7(a), 7(b), and 7(c), respectively.
CMSR($n$) is calculated to examine the adaptation ability of model selection in the mixed time series. Figure 8(a) shows the temporal evolution of CMSR($n$), and an enlarged view is provided in Fig. 8(b). The target time series is shown in Fig. 6(b), which is obtained by alternating between the two mixed time series $u_1$ and $u_2$ every 500 points at $a = 0.8$. The red curve represents CMSR($n$) in the case of the mixed time series. The black curve is the same as that in Fig. 5 and is included for comparison; it corresponds to the case in which $u_1$ and $u_2$ are switched for $a = 1.0$. CMSR($n$) quickly increases to 1 in both curves after the prediction begins. When the model is switched at $n = 500$, 1,000, and 1,500, CMSR($n$) decreases to 0. However, CMSR($n$) quickly increases to 1 after the switch. Therefore, the correct model is selected adaptively under environmental changes. For the mixed time series case, the difference between $e_1(n)$ and $e_2(n)$ fluctuates around 0, as shown in Fig. 7(a). The fluctuation of $\Delta e(n)$ around 0 results in a slower increase of CMSR($n$). Nevertheless, CMSR($n$) reaches 1, and the correct model is selected successfully.
The possibility of model selection in the mixed time series is investigated while changing a.

Adaptive model selection between Rössler models with different parameter values
In the previous two cases, adaptive model selection between two different models (the Rössler and Lorenz models) was investigated. In this subsection, Rössler models with different parameter values are considered, where the parameter value change corresponds to model switching, as shown in Fig. 10(a). This situation of parameter switching is expected to be more difficult than the case of switching between models.
The dependence of model selection on the value of the Rössler parameter $b$ is investigated using different values of $b_1$, with $b_2$ fixed at 0.6. In this case, the target model is fixed to the Rössler model with $b = b_1$. Figure 12(a) shows CMSR($n$) as a function of $b_1$ at time steps $n = 100$ (black curve) and $n = 300$ (red curve). We focus on how the difference between $b_1$ and $b_2$ is related to the speed of adaptation in model selection. In Fig. 12(a), CMSR($n$) is small near $b_1 = b_2 = 0.6$.
CMSR($n$) increases and approaches 1 as the difference between $b_1$ and $b_2$ increases. Therefore, if the two parameter values are far apart, the correct model can be selected. In addition, the adaptation speed increases as the difference between $b_1$ and $b_2$ increases. The maximum Lyapunov exponent is positive in three regions: $0 < b < 0.04$, $0.12 < b < 0.26$, and $0.36 < b < 0.70$, except when $b = 0.51$ and 0.625, where periodic windows are observed. In Fig. 12(a), CMSR($n$) at $n = 100$ does not reach 1 in certain regions of $b_1$ (e.g., $b_1 = 0.2$). However, CMSR($n$) exceeds 0.99 at $n = 100$ when $0.04 \leq b_1 \leq 0.12$.

Conclusions
We proposed an adaptive model selection scheme using reinforcement learning for applications in photonic reservoir computing. Two types of time series were generated using the Rössler and Lorenz models and were exchanged over time to emulate dynamic environmental changes of the incoming signals. We prepared two types of output weights for the Rössler and Lorenz models prior to execution of the prediction task and identified one of the two models for accurate time series prediction using photonic reservoir computing. We succeeded in identifying the correct model adaptively using the prediction errors as rewards in reinforcement learning. The adaptive model selection was also achieved in the case of a mixed time series obtained from the Lorenz and Rössler models with different ratios. We also investigated the adaptive selection of Rössler models with different parameter values. The model selection became easier as the difference between the two parameter values increased. Although two models in reservoir computing were considered in the present study, a scalable architecture should be possible; indeed, our former work [29] demonstrated a solution for bandit problems with up to 64 arms using chaotic time series. We consider that constructing a single universal reservoir computing model that can deal with any possible input is most likely impossible; hence, dynamic and autonomous model selection will be a promising means of expanding the computing abilities of photonic artificial intelligence.
for reservoir computing, and the input signal is injected into the reservoir via feedback phase modulation [33]. The last term $\xi(t)$ on the right-hand side of Eq. (6) represents the effect of spontaneous emission noise. $\xi(t)$ is the normalized white Gaussian noise with the properties $\langle \xi(t) \rangle = 0$ and $\langle \xi(t_0)\, \xi(t) \rangle = \delta(t - t_0)$, where $\langle \cdot \rangle$ denotes the ensemble average and $\delta$ is the Dirac delta function.

Chaotic dynamical models for generating prediction targets
The prediction targets were generated by the Rössler and Lorenz models, which are well-known models that can generate deterministic chaos. The temporal dynamics of the two models are governed by the Rössler and Lorenz equations. In the Rössler model, parameter $b$ was set to 0.2 unless otherwise specified. Variables $x_R$ and $x_L$ were used for the prediction test. The time series of $x_R$ and $x_L$ were normalized with their variances so that they could not be identified based on knowledge of the amplitudes of the time series.
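As an illustration of how such prediction targets can be generated, the following Python sketch integrates the textbook forms of the Lorenz and Rössler equations with a fourth-order Runge-Kutta step and normalizes the x variable. The parameter values (sigma = 10, rho = 28, beta = 8/3 for Lorenz; a = 0.2, b = 0.2, c = 5.7 for Rössler), the step sizes, the initial conditions, and the subtraction of the mean before scaling are commonly used choices assumed here; the values actually used in the paper are those of its Table 1. Note that the Rössler coefficient `a` in this code is unrelated to the mixing ratio used elsewhere in the text.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (assumed standard parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rossler(state, a=0.2, b=0.2, c=5.7):
    """Right-hand side of the Rössler equations (assumed standard parameters)."""
    x, y, z = state
    return (-y - z, x + a * y, b + z * (x - c))


def rk4_step(f, state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (q1 + 2.0 * q2 + 2.0 * q3 + q4)
        for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4)
    )


def x_series(f, state, dt, n_points, stride, n_transient=1000):
    """Sample the x variable every `stride` steps after discarding a transient."""
    for _ in range(n_transient):
        state = rk4_step(f, state, dt)
    xs = []
    for _ in range(n_points):
        for _ in range(stride):
            state = rk4_step(f, state, dt)
        xs.append(state[0])
    return xs


def normalize(xs):
    """Shift to zero mean and scale to unit variance."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]


x_l = normalize(x_series(lorenz, (1.0, 1.0, 1.0), 0.01, 500, 5))
x_r = normalize(x_series(rossler, (1.0, 1.0, 0.0), 0.05, 500, 5))
```

After normalization both series have the same variance, so a selection stage cannot distinguish the two sources by amplitude alone.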
If the prediction error $e_1(n)$ is smaller (larger) than $e_2(n)$, then the threshold $T(n)$ is decreased (increased). The change in $T(n)$ increases the probability of selecting the predicted output with the smaller error. By repeating the change in $T(n)$ based on the comparison of $e_1(n)$ and $e_2(n)$, $T(n)$ becomes much smaller or larger than the range covered by the probability distribution of the chaotic laser output, and only one of $p_1(n)$ or $p_2(n)$ is selected. In this way, the appropriate output weight, $w_1$ or $w_2$, is determined for adaptive model selection.

Figure 2. Temporal waveform generated using the Lorenz and Rössler models. The first 500 points of the waveform are produced using the Lorenz model, and the model is switched every 500 points.

Figure 4. (a) Time series of the difference between the two prediction errors, $\Delta e(n) = e_1(n) - e_2(n)$. (b) Temporal dynamics of the threshold $T(n)$ for decision making. (c) Models selected by decision making in each step. The Lorenz model is selected for $n > 14$.

Figure 5. Correct model selection rate (CMSR) in adaptive model selection based on decision making in time series prediction. The models are switched between the Lorenz and Rössler models every 500 steps, as shown in Fig. 2.
To obtain the mixed time series, $a$ is fixed at 0.8, so that the ratio of the Lorenz model is larger than that of the Rössler model. Initially, the error difference $\Delta e(n)$ fluctuates around 0, as can be seen in Fig. 7(a), indicating that it is difficult to identify the correct model. However, the threshold reaches 128 at approximately $n > 35$, as shown in Fig. 7(b), although $\Delta e(n)$ fluctuates around 0. Only $p_2(n)$ is selected after the threshold reaches 128. In other words, correct model selection is achieved in the mixed time series since $p_2(n)$ corresponds to the Lorenz model, whose waveform is dominant in the input signal.

Figure 8. (a) Correct model selection rate (CMSR) in the time series prediction task. (b) Enlarged view of (a). The red curve represents the case in which the input signal is a mixed time series consisting of $u_1$ and $u_2$, as shown in Fig. 6(b). For comparison, the black curve represents the case in which the input signal is switched between the Lorenz and Rössler models every 500 steps, as shown in Fig. 2.

Figure 9. Correct model selection rate (CMSR) at $n = 300$ as a function of $a$. The correct selection is the Rössler model for $a < 0.5$ and the Lorenz model for $a \geq 0.5$.

Figure 10. (a) Schematic diagram of switching in the case of the Rössler model with different values of the parameter $b$. (b) Temporal waveform generated from the Rössler model with $b = b_1 = 0.2$ and $b = b_2 = 0.6$. In the first 500 points of the time series, $b_2$ is used, and switching between $b_2$ and $b_1$ is performed every 500 points.

Figure 11. Correct model selection rate (CMSR) in the time series prediction task. The target model is the Rössler model with parameter values $b_1$ and $b_2$, and the time series is shown in Fig. 10(b). The parameter $b$ is changed every 500 points.

Figure 12. (a) Correct model selection rate (CMSR) at $n = 100$ (black curve) and $n = 300$ (red curve) as a function of $b_1$ for the Rössler model. $b_2$ is fixed at 0.6. (b) Bifurcation diagram of the Rössler model as a function of the parameter $b$. Local maxima in a time series of $x_R$ are plotted in the bifurcation diagram. (c) The maximum Lyapunov exponent is plotted as a function of $b$.
A mixed time series is generated by the Rössler and Lorenz models, where the two kinds of time series are mixed with different ratios. The two mixed time series are given by $u_1 = a x_L + (1 - a) x_R$ and $u_2 = (1 - a) x_L + a x_R$, where $x_R$ and $x_L$ represent the time series generated by the Rössler and Lorenz models, respectively, and the coefficient $a$ is the ratio of the Lorenz model in the mixed time series. The mixed time series is shown in Fig. 6(b), which is obtained with $a$ fixed to 0.8 and $u_1$ and $u_2$ used alternately every 500 points. The time series generated by the Rössler and Lorenz models are used to train $w_1$ and $w_2$, respectively. The aim of this model selection using the mixed time series is to select the model with the larger ratio in the mixed time series, that is, the Lorenz model for $u_1$ and the Rössler model for $u_2$ at $a = 0.8$.
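The construction of the mixed prediction target can be sketched in Python as follows. The sketch assumes that the first mixed series weights the Lorenz series by the coefficient a and the Rössler series by 1 - a, that the second mixed series swaps the two weights, and that the target starts with the first mixed series and alternates every 500 points. The function names and the constant toy inputs are hypothetical.

```python
def mix_series(x_lorenz, x_rossler, a):
    """Return the two mixed series for mixing ratio a (weight of Lorenz in u1)."""
    u1 = [a * xl + (1.0 - a) * xr for xl, xr in zip(x_lorenz, x_rossler)]
    u2 = [(1.0 - a) * xl + a * xr for xl, xr in zip(x_lorenz, x_rossler)]
    return u1, u2


def alternate(u1, u2, block=500):
    """Build the target by switching between u1 and u2 every `block` points."""
    length = min(len(u1), len(u2))
    return [u1[n] if (n // block) % 2 == 0 else u2[n] for n in range(length)]


def dominant_model(n, a, block=500):
    """Name of the model with the larger ratio in the target at step n."""
    lorenz_ratio = a if (n // block) % 2 == 0 else 1.0 - a
    return "Lorenz" if lorenz_ratio >= 0.5 else "Rossler"


# Constant toy inputs make the mixing weights easy to read off.
x_l = [1.0] * 1000
x_r = [-1.0] * 1000
u1, u2 = mix_series(x_l, x_r, a=0.8)
target = alternate(u1, u2)
```

With a = 0.8, the first 500 points of `target` are dominated by the Lorenz series and the next 500 by the Rössler series, which is the labeling a correct-selection measure would be scored against.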
CMSR($n$) likewise exceeds 0.99 at $n = 100$ when $0.26 \leq b_1 \leq 0.36$, where the dynamics is periodic. Although CMSR($n$) at $n = 100$ does not reach 1 when $b_1 \geq 0.70$, it approaches 1 when $n = 300$. Therefore, the adaptation speed is slow if the temporal dynamics at both $b = b_1$ and $b = b_2$ are chaotic, and it is fast if the dynamics of the two target models are different (e.g., chaotic and periodic oscillations).

Table 1. Parameter values used in numerical simulations.