On-chip photonic decision maker using spontaneous mode switching in a ring laser

Efficient and accurate decision making is gaining increased importance with the rapid expansion of information communication technologies including artificial intelligence. Here, we propose and experimentally demonstrate an on-chip, integrated photonic decision maker based on a ring laser. The ring laser exhibits spontaneous switching between clockwise and counter-clockwise oscillatory dynamics; we utilize such nature to solve a multi-armed bandit problem. The spontaneous switching dynamics provides efficient exploration to find the accurate decision. On-line decision making is experimentally demonstrated including autonomous adaptation to an uncertain environment. This study paves the way for directly utilizing the fluctuating physics inherent in ring lasers, or integrated photonics technologies in general, for achieving or accelerating intelligent functionality.

algorithm; the physical dynamics themselves were not directly engineered, even though a variety of dynamical features are inherent in laser systems 28 . Furthermore, the previous studies used a long fiber-optic delay line to generate chaotic waveforms 26,27 ; such a delay line could lead to impractically large systems, inhibit stable operation, and prevent practical deployment.
Here, we propose a compact (<5 mm² area), on-chip photonic decision maker based on a ring laser structure. Unlike previous studies 26,27 , the laser structure can generate fast, complex, but controllable dynamics at a chip scale, without a long delay line. The origin of the dynamics is a spontaneous switching phenomenon, i.e., noise-induced mode-hopping 29,30 ; the phenomenon is used for exploring an optimal solution under uncertainty. We demonstrate that optimal decision making is efficiently achieved by opto-electronically controlling the spontaneous-switching dynamics.

Principle of Ring-Laser-Based Decision-Making
Ring laser dynamics and device structure. The device structure used for the decision maker is shown in Fig. 1a. A ring laser is coupled to adjacent waveguides that are integrated on the same chip in a GaAs/AlGaAs single quantum well structure. The resonator of the ring laser supports clockwise (CW) and counter-clockwise (CCW) propagating waves, and can exhibit various operating regimes, such as bidirectional operation and bistability, depending on the pump current 31,32 . Spontaneous switching between the CW and CCW modes is an interesting dynamic that appears in the transition from the stable bidirectional regime to the bistable regime. Spontaneous switching has been regarded as an obstacle for deterministic optical switching applications 33-35 . In this work, by contrast, it is deliberately exploited for decision making through feedback control of the CW and CCW modes, as discussed later.
The two waveguides with contact electrodes (denoted by PD i and BC i , i = 1, 2, as shown in Fig. 1a) are used for independent input/output control of the two modes in the ring laser: PD 1 and PD 2 are used as photodetectors to monitor the intensities of the CW and CCW modes, whereas BC 1 and BC 2 with current injections are used for introducing an asymmetry and changing the dynamics of the CW and CCW modes, respectively 30 . (See Methods section for details.) We note that a similar optoelectronic control method has been used for deterministic optical switching 31 and random number generation 36 . However, unlike the previous studies, we use this method for changing the statistical characteristics of the spontaneous switching dynamics, as demonstrated later in detail.

Principle of decision-making.
Here, we consider a two-armed bandit (TAB) problem, i.e., the task is to select the machine with the higher reward probability among two slot machines, denoted by SM 1 and SM 2 (Fig. 1b). We examine the TAB problem, the simplest MAB problem, so that we can validate the principle of the first ring-laser-based decision making. Meanwhile, the scalability of photonic decision making has been studied in the literature 25,27 , and these approaches could be applied to ring-laser-based device architectures. Our decision-making method is based on the tug-of-war (TOW) model, which exhibits highly efficient decision making compared to conventional algorithms 15,16 . Based on the model principle, we solve the TAB problem by repeating the following four steps: (i) Signal detection: The intensity levels of the CW and CCW outputs, denoted by I CW and I CCW , are detected by photodetectors PD 1 and PD 2 , respectively. (ii) Decision of the machine selection: If I CW is larger than I CCW , the decision is to select SM 1 . Otherwise, the decision is to choose SM 2 . (iii) Playing the selected machine. (iv) Learning and feedback: If a reward is provided by playing SM 1 or if a reward is not provided by playing SM 2 , the current (or voltage) applied to BC 1 is increased to facilitate lasing in the CW mode. Consequently, the probability of selecting SM 1 slightly increases in the next decision making. On the other hand, if a reward is provided by playing SM 2 or if a reward is not provided by playing SM 1 , the current applied to BC 2 is increased to facilitate lasing in the CCW mode.
Figure 1. Ring-laser-based decision-making. (a) Schematic of the ring laser device coupled to waveguides with contact electrodes BC i and PD i (i = 1, 2). The ring radius is 1 mm, and the waveguide width is approximately 2 μm. PD 1(2) was used as the photodetector to monitor the CW(CCW) mode intensity in the ring laser, whereas BC 1(2) was used for introducing an asymmetry and changing the mode dynamics. (b) Setup for the proof-of-concept experiment on ring-laser-based decision-making. The two PD signals are sent to a digital oscilloscope, and the current is applied to either BC 1 or BC 2 according to the results of the slot machine playing. In the experiment, the slot machines are numerically simulated in the embedded signal processing unit in the oscilloscope.
where K is a gain parameter. If C ≥ 0 at the t-th play, the current J = KC The amount of C(t) is updated in accordance with the results of slot machine playing as follows: and where α ∈ [0, 1] is the memory parameter (typically, α ≈ 0.99-0.999) 37 , and Δ is an incremental parameter (Δ = 1 in this study). Ω in Eq. (3) is determined based on the estimated reward probability P i for SM i (i = 1, 2) from the history of the betting results. P i is given by L i /S i , where S i is the total number of plays of SM i and L i is the number of wins when selecting SM i . Ω is then given as, The details of the derivation of Eq. (4) are given in ref. 15 .
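The learning and feedback step (iv) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the update form C(t+1) = αC(t) + ΔC, the sign convention for ΔC, the mapping from C to the bias currents, and the expression Ω = (P 1 + P 2 )/(2 − (P 1 + P 2 )) are assumptions taken from the commonly used tug-of-war formulation and should be checked against Eqs. (1)-(4). All function names are hypothetical.

```python
def omega(p1_hat, p2_hat):
    """Penalty weight. ASSUMED form: Omega = (P1 + P2) / (2 - (P1 + P2))."""
    total = p1_hat + p2_hat
    # Guard against division by zero when both estimates equal 1.
    return total / (2.0 - total) if total < 2.0 else 1.0


def update_c(c, machine, rewarded, alpha=0.99, delta=1.0, om=1.0):
    """ASSUMED TOW update: C(t+1) = alpha * C(t) + dC.

    A reward from SM1 or a miss on SM2 pushes C up (favoring the CW mode);
    a reward from SM2 or a miss on SM1 pushes C down (favoring the CCW mode).
    """
    if machine == 1:
        d_c = delta if rewarded else -om
    else:
        d_c = -delta if rewarded else om
    return alpha * c + d_c


def bias_currents(c, gain=1.0):
    """ASSUMED mapping from C to (J1, J2): only one contact is driven at a time."""
    if c >= 0:
        return gain * c, 0.0
    return 0.0, gain * (-c)
```

For example, starting from C = 0, a reward from SM 1 gives `update_c(0.0, 1, True)`, which moves C to +Δ and, under the assumed mapping, drives BC 1 only.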

Results
Optoelectronic control of spontaneous switching dynamics. In our ring laser, the spontaneous switching phenomenon used for the above decision-making method appears when the pump current J p exceeds ~1.3 times the laser threshold current J th . Figure 2a shows examples of the switching dynamics, where the CW and CCW intensities change stochastically due to internal laser noise. For convenience, we hereafter refer to the state of I CW > I CCW (I CCW > I CW ) as the CW (CCW) mode. A statistical analysis reveals that the mode switching is characterized by a characteristic time τ c ≈ 43 ns; on timescales longer than τ c , the switching process can be treated as a Poisson (random) process, and the duration time in the CW (CCW) mode obeys an exponential distribution (see Supplementary Fig. A1 for details). We refer to τ c as the correlation time of the switching process. When current J 1 to BC 1 increases with J 2 = 0, the duration time in the CW mode increases [ Fig
On-chip decision making: proof-of-concept demonstration. We conducted decision-making experiments based on the controllable dynamics in the ring laser by repeating the processes (i-iv) described in the previous section. In the experimental setup shown in Fig. 1b, the two machines SM 1 and SM 2 were emulated in a computer with the reward probabilities of (P 1 , P 2 ) = (0.7, 0.3). The gain K, step Δ, and memory parameter α were set to 1, 1, and 0.99, respectively. A machine is selected and played once, and the rewards dispensed from SM 1 and SM 2 are both assumed to be 1. The goal of the experiment is to confirm whether the ring-laser-based decision maker selects SM 1 (rather than SM 2 ), since SM 1 has the higher reward probability (P 1 > P 2 ). We assume the situation of zero prior knowledge, where the sum of the two hit probabilities is unknown, unlike ref. 26 .
The experimental results on the decision-making process are displayed in Fig. 3. At first, I CW and I CCW switch randomly when the number of plays t < 100 [ Fig. 3a(i)], suggesting exploration to find the best machine. The accumulated knowledge is used for estimating the reward probabilities and setting the Ω-value, and then the C-value is updated accordingly [ Fig. 3a(ii)]. The updated C-value affects the dynamics, and the dynamical state changes from the switching mode to the CW mode. Consequently, the best machine (SM 1 in this case) is selected. We repeated the decision-making experiment n T = 200 times and evaluated the correct decision rate (CDR), which is defined as the fraction of the n T trials in which the slot machine with the higher reward probability is selected at the t-th play 24 . As shown in Fig. 3b, the CDR monotonically increases and approaches 1, suggesting the achievement of correct decision making.
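The CDR evaluation can be illustrated with a small numerical stand-in for the experiment. In the sketch below the ring laser is replaced by a hypothetical sigmoid P CW (C) (the measured curve in Fig. 2b is not used), and the update rule and Ω expression are the assumed tug-of-war forms, so the numbers it produces are illustrative only and are not the reported results.

```python
import math
import random


def p_cw(c, width=5.0):
    """HYPOTHETICAL stand-in for the measured P_CW(C): a smooth sigmoid."""
    return 1.0 / (1.0 + math.exp(-c / width))


def run_trial(p=(0.7, 0.3), plays=200, alpha=0.99, delta=1.0, rng=random):
    """One trial; returns a list of booleans (True = better machine chosen)."""
    best = 0 if p[0] >= p[1] else 1
    c = 0.0
    wins, counts = [0, 0], [0, 0]
    correct = []
    for _ in range(plays):
        arm = 0 if rng.random() < p_cw(c) else 1      # steps (i)-(ii)
        rewarded = rng.random() < p[arm]              # step (iii)
        counts[arm] += 1
        wins[arm] += rewarded
        # Estimated reward probabilities L_i / S_i (0.5 before the first play).
        est = [wins[i] / counts[i] if counts[i] else 0.5 for i in range(2)]
        total = est[0] + est[1]
        om = total / (2.0 - total) if total < 2.0 else 1.0   # ASSUMED Omega
        sign = 1.0 if arm == 0 else -1.0
        c = alpha * c + sign * (delta if rewarded else -om)  # step (iv), ASSUMED form
        correct.append(arm == best)
    return correct


def correct_decision_rate(n_trials=200, plays=200, seed=0, **kwargs):
    """CDR(t): fraction of trials in which the better machine is chosen at play t."""
    rng = random.Random(seed)
    totals = [0] * plays
    for _ in range(n_trials):
        for t, ok in enumerate(run_trial(plays=plays, rng=rng, **kwargs)):
            totals[t] += ok
    return [x / n_trials for x in totals]
```

Calling `correct_decision_rate(n_trials=200, plays=200)` returns one CDR value per play, which can be plotted against t in the same way as Fig. 3b.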
We also conducted similar decision-making experiments with respect to different reward probabilities and parameters; we found that with appropriately tuned parameters (K and Δ), the decision-making performance could be comparable to that of existing decision-making algorithms such as a modified softmax 16 and upper confidence bound 1-tuned (UCB1-tuned) 38,39 . As shown in Fig. 4, the CDR of the ring-laser-based method can exceed those of the other methods in some cases.
www.nature.com/scientificreports
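For reference, the UCB1-tuned baseline mentioned above can be sketched as follows. This is a generic textbook-style implementation for Bernoulli rewards, written for illustration; it is not the code used in the comparison of Fig. 4, and the function name and defaults are hypothetical.

```python
import math
import random


def ucb1_tuned(p=(0.7, 0.3), plays=500, seed=0):
    """Play a Bernoulli bandit with UCB1-tuned; returns the list of chosen arms."""
    rng = random.Random(seed)
    k = len(p)
    n = [0] * k        # number of plays per arm
    s = [0.0] * k      # sum of rewards per arm
    s2 = [0.0] * k     # sum of squared rewards per arm
    choices = []
    for t in range(1, plays + 1):
        if t <= k:
            arm = t - 1                      # play every arm once first
        else:
            def index(j):
                mean = s[j] / n[j]
                # Variance estimate plus its own exploration term.
                var = s2[j] / n[j] - mean * mean + math.sqrt(2.0 * math.log(t) / n[j])
                return mean + math.sqrt(math.log(t) / n[j] * min(0.25, var))
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < p[arm] else 0.0
        n[arm] += 1
        s[arm] += reward
        s2[arm] += reward * reward
        choices.append(arm)
    return choices
```

The algorithm has no tunable parameter, which is why it serves as a convenient baseline.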

Discussion
Decision-making strategy and its control. In our decision-making method, the strategy for making good decisions is characterized by the probability of inducing the CW mode as a function of the control parameter C(t), denoted by P CW (C). As observed in Fig. 2b, P CW (C) of the ring laser has a plateau region in the range of around −21 ≤ C ≤ 12, where P CW (C) changes only moderately with C. The plateau region plays a role in exploration to estimate the reward probabilities (and hence an appropriate Ω-value), and can lead to a correct decision after many slot plays, as demonstrated in Fig. 4; however, it may also lead to slow convergence of the CDR. A better alternative strategy (i.e., a design of P CW (C)) satisfying both fast adaptation and decision accuracy can be estimated theoretically when prior knowledge of the sum of the reward probabilities, P 1 + P 2 , is available, such as when either of two events inevitably occurs with probabilities P 1 and P 2 = 1 − P 1 . Let us here assume that the value of P 1 + P 2 is known a priori and that Ω in Eq. (4) is a constant. For simplicity, we consider α = 1 and assume that the mode switching is random. Under these assumptions, we can treat the time evolution of C as a random walk. The random walk model gives an analytical expression for the CDR and suggests that a fast and correct decision is made when P CW (C) is close to 1 for C > 0 and close to 0 for C < 0, and varies steeply from 0 to 1 near C = 0. (See Sec. 2 of Supplementary Information.)
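The intuition behind the random walk picture can be checked with a short calculation of the mean step of C. The sketch below assumes the tug-of-war update (reward: ±Δ, miss: ∓Ω) with the constant Ω = (P 1 + P 2 )/(2 − (P 1 + P 2 )); both the update form and this expression are assumptions for illustration, and the function name is hypothetical.

```python
def expected_drift(p1, p2, delta=1.0):
    """Mean change of C per play, conditioned on which machine is played.

    ASSUMES Omega = (P1 + P2) / (2 - (P1 + P2)) and alpha = 1.
    Returns (drift when SM1 is played, drift when SM2 is played).
    """
    total = p1 + p2
    om = total / (2.0 - total)
    drift_sm1 = p1 * delta - (1.0 - p1) * om      # reward: +delta, miss: -Omega
    drift_sm2 = -p2 * delta + (1.0 - p2) * om     # reward: -delta, miss: +Omega
    return drift_sm1, drift_sm2


# For (P1, P2) = (0.7, 0.3) both drifts are 0.4: C moves toward the better
# machine on average regardless of which machine was actually played.
print(expected_drift(0.7, 0.3))
```

With Δ = 1, both conditional drifts reduce to (P 1 − P 2 )/(2 − (P 1 + P 2 )) under these assumptions, so the sign of C carries the information about the better machine, and a step-like P CW (C) converts that sign into a decision with the least delay.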
In an actual experiment, such a P CW (C) is effectively realized by modifying the relationship between the control parameter C and J 1(2) (Eq. 1) as follows: where K 1 and K 2 are chosen such that the plateau region of P CW (C) shown in Fig. 2b is reduced and the desirable P CW (C) results. Figure 5a shows P CW (C) with (K 1 , K 2 ) = (0, 0), (5, 9), and (13, 17), denoted as Types I, II, and III, respectively. As predicted by the random walk model, the CDR for Type III increases most quickly, and its convergence value is higher than those of the other types, regardless of the reward probabilities P 1 and P 2 (Fig. 5b,c). Thus, we conclude that the decision-making performance can be enhanced by changing the intrinsic characteristics (P CW (C)) of the physical device with appropriate mode control.
Decision-making rate. The rate of decision making, i.e., the number of decisions per unit time, depends in principle on the sampling rate of the CW and CCW signal detections. Thus, fast decision making is possible by increasing the sampling rate; however, sampling too rapidly may degrade the accuracy of the decision making because nearly identical signal levels will be observed owing to the finite timescale of the ring laser dynamics. It is therefore important to know how rapidly decisions can be made without degrading the performance. To address this question and gain insight into the effect of the switching dynamics on the decision-making performance, we numerically examine decision-making processes using standard ring laser model equations 32 . See Methods section for details of the modeling. Figure 6a shows the evolution of the CDR for various values of the sampling rate 1/τ sam when (P 1 , P 2 ) = (0.7, 0.3), where τ sam is the sampling time interval of the signal detections. The CDRs at the 30th play are shown as a function of τ sam in Fig. 6b. These numerical results clearly show that the decision-making performance (accuracy and adaptation) degrades when τ sam is much shorter than the correlation time τ c of the ring laser. Indeed, the autocorrelation of the switching signals sampled at τ sam ≪ τ c exhibits a positive value [see Supplementary Fig. A1(d)]. In decision making, the positive correlation may result in repetition of the same choice even when the choice is wrong. In contrast, when τ sam ≫ τ c , the correlation becomes close to zero, which enables exploration without repeating wrong choices. Accordingly, the sampling time interval (i.e., the inverse of the decision-making rate) can be shortened down to about the correlation time without degrading the performance. In principle, the correlation time can be made shorter, allowing faster decision making, by increasing the noise strength and thereby activating the mode-hopping phenomenon. In an actual experiment, this can be achieved by coupling the laser to an external amplified spontaneous emission noise source; the experimental verification will be an interesting future study.
Figure 4 (partial caption). The reward probabilities are set as (P 1 , P 2 ) = (0.6, 0.4). The CDR of UCB1-tuned is better than that of the ring-laser-based method for the first few tens of plays, but the CDR of the ring-laser-based method approaches 1 more quickly, even before the 100th play. In this experiment, K = 4, α = 0.99, and Δ = 1 were used. The parameters used in the modified softmax are similar to those in ref. 16 . UCB1-tuned is a parameter-free algorithm 38,39 . (b) The CDRs at the 100th play. Here, P 2 was set to be 1 − P 1 .
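The effect of the sampling interval on the correlation between successive samples can be illustrated with a toy model. The sketch below assumes a symmetric two-state (random telegraph) switching process whose autocorrelation decays as exp(−τ/τ c ); this is a simplification introduced here for illustration and may not coincide with the definition of τ c used in the Supplementary Information.

```python
import math
import random


def sampled_switching(tau_sam, tau_c, n=20000, seed=0):
    """Sample a symmetric two-state signal (+1 = CW mode, -1 = CCW mode).

    ASSUMES an autocorrelation exp(-tau / tau_c), so the state flips between
    consecutive samples with probability (1 - exp(-tau_sam / tau_c)) / 2.
    """
    rng = random.Random(seed)
    p_flip = 0.5 * (1.0 - math.exp(-tau_sam / tau_c))
    state = 1
    samples = []
    for _ in range(n):
        if rng.random() < p_flip:
            state = -state
        samples.append(state)
    return samples


def lag1_autocorr(x):
    """Normalized autocorrelation between consecutive samples."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var
```

With τ c = 43 ns, sampling every 4.3 ns yields strongly correlated consecutive samples, so a wrong choice tends to be repeated, whereas sampling every 430 ns yields nearly independent samples.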

Summary
In this study, we proposed and experimentally demonstrated on-chip photonic decision making by an integrated ring laser. Ring lasers exhibit statistical characteristics in their CW and CCW lasing that are optoelectronically controllable; we directly utilize such inherent spontaneous dynamics of ring lasers for decision-making functionalities. Correct decision making was successfully demonstrated with appropriate optoelectronic control of the dynamics, and it was found that the performance can be enhanced by changing the decision-making strategy through the statistical characteristics (P CW (C)). These results would open novel research perspectives of controlling complex dynamics based on environmental changes.
Figure 5 (partial caption). Types I, II, and III represent P CW (C) obtained for (K 1 , K 2 ) = (0, 0), (5, 9), and (13, 17), respectively. (b) Time evolution of CDR for each type. In this experiment, the reward probabilities were set as (P 1 , P 2 ) = (0.6, 0.4), and prior knowledge of P 1 + P 2 = 1 was assumed. As predicted by the random walk model, the CDR for Type III is superior to those of the other types. K = 1, α = 0.99, and Δ = 1. (c) The CDRs at the 100th play compared as a function of the given reward probability, where Type III outperforms the other cases.
Figure 6 (partial caption). (P 1 , P 2 ) = (0.7, 0.3). The CDRs were evaluated from the results of n T = 100 trials. (b) Comparison of CDR at the 30th play as a function of τ sam . The reward probabilities were set as (P 1 , P 2 ) = (0.7, 0.3) and (0.6, 0.4). In this simulation, the correlation time τ c was ≈13 ns, which is indicated by the dotted line. We can clearly observe that the CDRs are degraded in the regime where τ sam is smaller than τ c .
One interesting and important future study is to increase the decision-making rate by using faster and more complex switching dynamics. In addition to the above-mentioned method of increasing the noise strength, the use of a delayed feedback structure will be useful. Interestingly, semiconductor ring lasers can exhibit chaotic switching in the GHz regime under delayed feedback even with a short time delay 40,41 . The combination of noise-induced switching with delayed feedback instability is a promising research direction.
As for the ring laser structure, we emphasize that, in addition to miniaturization, it would be beneficial for the all-optical realization of decision-making devices because all photonic components required for decision making can be monolithically integrated on a chip. Instead of the optoelectronic control methods employed in the present study, it would be interesting to use an optical injection method because ring lasers subjected to optical injection enable low-power and ultrafast switching at picosecond time scales 33-35 .
Another interesting and important future study is to tackle larger-scale MAB problems. MAB problems can be solved based on a hierarchical TOW principle 25,27 . The decision-making based on the hierarchical principle can be achieved by using a number of independent two-choice decision-makers (for two-armed bandit problems) or using a time-division multiplexing scheme 27 . Compact ring lasers could offer a good experimental platform for implementing the hierarchical principle and addressing the MAB problems.
We believe that the combination of photonic integration technologies and competitive fluctuating dynamics, as demonstrated by the proposed ring laser, will shed light on a way toward novel photonic intelligent computing paradigms.

Methods
Device structure and operating regime. The ring laser device used in this study was fabricated in a graded-index separate-confinement-heterostructure (GRIN-SCH) single-quantum-well GaAs/Al x Ga 1−x As structure, the emission wavelength of which is designed to be 850 nm. The fabricated laser device was thermally controlled by a heat sink with an accuracy of 0.01 °C. The ring radius is 1 mm, and the waveguide width is 2 μm. In the actual device, multiple waveguides with independent electrical contacts are coupled to the ring at an angle to the cleaved facet. We used the waveguides with contacts, PD i and BC i (i = 1, 2), as shown in Fig. 1a. The CW and CCW intensity signals are detected with PD 1 and PD 2 in the waveguide, respectively, and sent to a digital oscilloscope (Tektronix TDS7154B, bandwidth 1.5 GHz, 20 GSample/s) via the bonding wires attached to PD 1 and PD 2 . Bias contacts BC 1 and BC 2 were used for the mode control inside the ring laser. Sending current to BC 1 and BC 2 reduces the absorption loss of the waveguide. Thus, the light coupled from the CCW(CW) mode in the ring to the waveguide is back-reflected at the BC 1(2) -side end of the waveguide and re-coupled to the ring in the CW(CCW) direction. In addition, BC 1(2) can enhance the spontaneous emission noise coupled to the CW(CCW) mode and, consequently, facilitate the laser operation in the CW(CCW) mode 31,36 . When J 1 = J 2 = 0 mA, the threshold current J th of the ring laser used in the experiment was estimated to be approximately 210 mA at 25 °C. The large threshold may partly be attributed to the non-optimal etching depth of the ring waveguide 32 . For J/ < .
Intensity adjustment. In the experiment, the PD couplings to the CW and CCW modes are not exactly equal to each other owing to imperfect device fabrication. In order to reduce the effect of the asymmetry of the PD couplings and appropriately evaluate the decision-making performance, the CW and CCW intensities, I CW and I CCW , were adjusted by adding constant biases so that the occurrence probability of each mode is calibrated to be around 0.5 when J 1 = J 2 = 0 mA. This approach allows easy tuning of both intensities; we also note that there is another, simpler way, which is to measure only one of I CW and I CCW and adjust the switching probability to 0.5 by the bias currents J 1 and J 2 without the intensity biases.
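One simple way to choose such a constant bias from recorded traces is sketched below. Using the median of the intensity difference is an assumption made here for illustration; the source does not state how the biases were actually determined, and the function names are hypothetical.

```python
import statistics


def calibrate_bias(i_cw, i_ccw):
    """Constant bias b such that P(I_CW + b > I_CCW) is approximately 0.5.

    ASSUMED procedure: b is the median of (I_CCW - I_CW) over traces
    recorded with J1 = J2 = 0 mA.
    """
    return statistics.median(b - a for a, b in zip(i_cw, i_ccw))


def cw_fraction(i_cw, i_ccw, bias=0.0):
    """Fraction of samples classified as the CW mode after adding the bias."""
    hits = sum(1 for a, b in zip(i_cw, i_ccw) if a + bias > b)
    return hits / len(i_cw)
```

After calibration, `cw_fraction(i_cw, i_ccw, calibrate_bias(i_cw, i_ccw))` should be close to 0.5 for the calibration data, so that the decision in step (ii) is unbiased when no control current is applied.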
Decision-making experiment. First, BC 1 and BC 2 were connected to a standard current source. Discrete-valued electrical currents were applied to BC 1 or BC 2 . Then, the CW and CCW optical intensity signals for different values of J 1 and J 2 were recorded by a digital oscilloscope. In the decision-making experiment, the slot machines were numerically simulated in the embedded signal processing unit in the oscilloscope using pseudorandom numbers. The decision is made immediately based on the sampling. The controllers BC 1 and BC 2 were also connected to a two-channel function generator (Tektronix AFG3152C), which reconfigures the oscillation dynamics of the ring laser in an on-line or real-time manner.
Rate-equation model for semiconductor ring laser. The numerical simulation was conducted by using a set of dimensionless semiclassical equations for the two slowly varying complex amplitudes of the CW and CCW waves, E 1 and E 2 32 . where α accounts for phase-amplitude coupling, s and c are the self- and cross-saturation coefficients, and k 1,2 represents the complex backscattering coefficients. We model internal optical noises as complex Gaussian noise satisfying 〈η i (t)〉 = 0 and 〈η i (t) η j *(t′)〉 = δ ij δ(t − t′), where μ is the dimensionless pumping power (μ = 1 at the laser threshold). In the above equations, t is the dimensionless time rescaled by the photon lifetime τ p . γ is the ratio of τ p to the carrier lifetime τ s . In Eq. (6), the asymmetric coupling caused by activating BC 1 and BC 2 is simply modeled as an asymmetric backreflection effect such that k 1 = β 1 k b and k 2 = β 2 k b , where k b denotes the backreflection coefficient when C = 0, and β 1,2 denotes a dimensionless asymmetry parameter, depending on C as follows: where C(t) is updated by Eq. (2). This is the simplest model of the asymmetric backscattering, although a real asymmetry may be introduced in a more complex way in the actual experiment.
We confirmed that, regardless of the details of the asymmetry model, control of the spontaneous switching can be achieved. A detailed investigation using a more realistic model is left for future work.
In this study, we set some of the parameters as follows: α = . In the decision-making simulation, we assume that the slot machines provide a reward without any time delay and use Eqs (2)-(4) and (6)-(9).
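A minimal numerical sketch of such a simulation is given below. It assumes the widely used two-mode semiconductor ring laser rate equations, dE 1,2 /dt = (1 + iα)[N(1 − s|E 1,2 |² − c|E 2,1 |²) − 1]E 1,2 − k 1,2 E 2,1 + noise and dN/dt = γ[μ − N − N(1 − s|E 1 |² − c|E 2 |²)|E 1 |² − N(1 − s|E 2 |² − c|E 1 |²)|E 2 |²], integrated with a simple Euler-Maruyama step. The equation form, the way the noise strength enters, and every parameter value in the code are assumptions chosen for illustration; they are not the values used in this study.

```python
import math
import random


def simulate_ring_laser(steps=5000, dt=0.01, mu=1.5, alpha=3.5, s=0.005, c=0.01,
                        gamma=0.002, kb=complex(0.001, 0.004), beta=(1.0, 1.0),
                        noise=1e-4, seed=0):
    """Euler-Maruyama integration of an ASSUMED two-mode ring laser model.

    All parameter values are hypothetical placeholders. Time is in units of
    the photon lifetime. Returns a list of (|E1|^2, |E2|^2) pairs.
    """
    rng = random.Random(seed)
    e1 = e2 = complex(0.1, 0.0)          # CW and CCW field amplitudes
    n = 1.0                              # carrier density (threshold value)
    k1, k2 = beta[0] * kb, beta[1] * kb  # asymmetric backscattering
    amp = math.sqrt(noise * dt / 2.0)    # noise amplitude per quadrature
    out = []
    for _ in range(steps):
        p1, p2 = abs(e1) ** 2, abs(e2) ** 2
        g1 = 1.0 - s * p1 - c * p2       # saturated gain factors
        g2 = 1.0 - s * p2 - c * p1
        de1 = (1 + 1j * alpha) * (n * g1 - 1.0) * e1 - k1 * e2
        de2 = (1 + 1j * alpha) * (n * g2 - 1.0) * e2 - k2 * e1
        dn = gamma * (mu - n - n * (g1 * p1 + g2 * p2))
        e1 += de1 * dt + amp * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        e2 += de2 * dt + amp * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        n += dn * dt
        out.append((abs(e1) ** 2, abs(e2) ** 2))
    return out
```

In a decision-making simulation, the two intensities would be compared every τ sam , and `beta` would be updated from C(t) after each play; the mapping from C to β 1,2 is not specified in this sketch.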