Abstract
Photonic neuromorphic computing is of particular interest due to its significant potential for ultrahigh computing speed and energy efficiency. The advantage of photonic computing hardware lies in its ultrawide bandwidth and parallel processing utilizing inherent parallelism. Here, we demonstrate a scalable onchip photonic implementation of a simplified recurrent neural network, called a reservoir computer, using an integrated coherent linear photonic processor. In contrast to previous approaches, both the input and recurrent weights are encoded in the spatiotemporal domain by photonic linear processing, which enables scalable and ultrafast computing beyond the input electrical bandwidth. As the device can process multiple wavelength inputs over the telecom Cband simultaneously, we can use ultrawide optical bandwidth (~5 terahertz) as a computational resource. Experiments for the standard benchmarks showed good performance for chaotic timeseries forecasting and image classification. The device is considered to be able to perform 21.12 tera multiplication–accumulation operations per second (MAC ∙ s^{−1}) for each wavelength and can reach petascale computation speed on a single photonic chip by using wavelength division multiplexing. Our results are challenging for conventional Turing–von Neumann machines, and they confirm the great potential of photonic neuromorphic processing towards petascale neuromorphic supercomputing on a photonic chip.
Similar content being viewed by others
Introduction
Nowadays, machine learning techniques are advancing at a tremendous speed^{1}, and their applications for artificial intelligence (AI) systems are penetrating society. This is motivating the development of specialpurpose AI hardware such as applicationspecific integrated circuits (ASICs) and fieldprogrammable gate arrays (FPGAs)^{2,3}, which provide much faster and more energyefficient computational resources. Recently, photonic implementations of artificial neural networks (ANNs) are attracting interest because they have great potential to reduce operational power, increase speed, and reduce latency beyond what is possible in electronic computing^{4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}. Optical circuits can perform a largescale multiplyaccumulate (MAC) operation—a dominant factor in ANN computation—with ultrahigh processing speed thanks to their ultrawide bandwidth (terahertz region) and inherent parallelism in space, time, phase, and wavelength domains. As this operation is executed simply by light propagation and interference, the principal energy consumption is very small. Although the basic operation of this alternative computation was originally proposed in the early 1970s^{22,23,24}, the research was suspended due to the rapid development of electronic largescaleintegration (LSI) technology and an AI winter. The revival of AI through deep learning technology has led to a rethinking of neuromorphic photonic systems. Recent progress in photonic integrated circuits^{25}, which were developed for telecom applications, enables us to implement photonic ANN compactly^{4}. This will boost the research of photonic ANN for alternative computational devices.
Coherent linear photonic processors^{4,26,27} are key engines for implementing such a computation system on compact optical chips. These processors are composed of an array of Mach–Zehnder interferometers (MZIs), which enables various types of optical topologies. As pioneered by Shen et al.,^{4} it is possible to map the mathematical description of a neural network onto a photonic chip with external nonlinear devices. Up to now, photonic circuits have been reported for various ANN models, such as fully connected multilayer perceptrons^{4,5,6}, spiking neural networks^{7}, convolutional neural networks^{8}, and recurrent neural networks, including reservoir computing (RC)^{9,10,11,12,13,14,15,16,17,18,19,20,21}.
Among them, RC^{28,29,30} is gaining attention for its affinity with optical elements and its excellent performance. The standard deep neural network (DNN) requires fine tuning of each weight through the use of the error back propagation algorithm^{31}, which in turn requires highly accurate and uniform largescale integration of tunable optical elements, which is a very challenging issue for fabrication. In addition, the training time of a photonic ANN is generally much longer than that with electrical devices due to the slow response of the phase shifters in the MZIs (on the order of milliseconds for thermooptic devices). Therefore, standard photonic ANNs are effective only at the inference stage. On the other hand, in the RC framework, only output weights are trained, which is carried out by using a linear regression scheme. The other weights are fixed randomly. Thus, there is no need for any fine tuning of the optical system under training. As the training time is determined by forward propagation in RC, it can be accelerated by using photonics. In spite of the simple training, DNNs based on photonics have shown excellent performance comparable to that of standard DNNs on a series of benchmark tasks, such as speech recognition^{20}, economic forecasting^{32}, action detection in movie data^{33}, and telecom signal compensation^{18,19,28}. Apart from optics, many physical implementations have been reported such as spintronic devices^{34}.
However, most conventional demonstrations have been limited to implementations of a middle recurrent layer called a “reservoir”. The scalability for chip integration is highly limited due to their simple architectures (less than 23 neurons in the reservoir^{14,15,16}). In addition, the bandwidth is limited by the electrical components used for the nonlinear activation functions and input–output (I/O) frontend, such as responselasers (gigahertz order)^{12,15} and spatial light modulators (hertz order)^{17}. As the slowest one determines the operating time, the potential bandwidth of light (>THz) is limited to the electrical bandwidth. As a result, the computation speed, the distinguishing feature of photonic RC, is still considered to be below or at the same level as that of commercial electrical hardware such as central processing units (CPUs) or graphics processing units (GPUs). Note that there has not been a clear discussion in the literature. Our estimations are discussed in “Discussion” section.
In this paper, we propose a scalable photonic implementation of an RC system based on coherent linear photonic processors driven by an optical pulse source. In contrast to previous approaches, both the input and reservoir weights are optically encoded in the spatiotemporal domain, which enables scalable integration on a compact chip. The ultrafast optical pulse source upconverts the input signals to the higher frequency region beyond the electrical bandwidth. In the linear photonic processor, we can operate the photonic signals beyond the electrical bandwidth limitation. Although the nonlinear activation functions are only implemented in the input and output, the complexvalued evolution in coherent systems ensures rich dynamics comparable to that of incoherent nonlinear systems as described in the ref. ^{13}. We also demonstrate parallel processing based on wavelength division multiplexing (WDM). This enables the use of ultrawide optical bandwidth (>THz) as a computational resource, which boosts the computation efficiency beyond the computation bottleneck remaining from the I/O bandwidth. Experiments with the standard benchmarks showed good performance for chaotic timeseries forecasting and image classifications with ultrafast processing speed of 17.1 ns per image. The device can achieve ten TMAC/s for each wavelength and can reach petascale computation speed on a single photonic chip by using WDM. Our results are challenging for conventional Turing–von Neumann machines, and they significantly advance photonic neuromorphic processing towards petascale neuromorphic supercomputing on a photonic chip.
Results
Basic operation principle
Figure 1a shows the proposed device architecture for photonic RC. In this system, the radiofrequency (RF) input signal u(t) with modulation time interval T is coded to the amplitude of the electromagnetic field of optical pulse, δ(t), by an optical Mach–Zehnder modulator (MZM). Then, the ultrafast optical pulse upconverts the RFsignal to the optical frequency, and this electrooptic conversion results in a sinusoidal nonlinearity. The modulated optical amplitude u′(t) is described as
where V_{o} is offset voltage, and γ and V_{π} are the characteristic voltages of the MZM. Then, the converted signals u′(t) are input to first stage of the linear optical circuits for the spatiotemporal inputmasking. In this part, they are split into Nbranches and then transmitted through different delay lines with a delay differential θ. The θ is set to satisfy the relationship θ = T/N, where N is the virtual node count in a single optical cavity^{11,12,35}. Then, they are weighted by optical cross connecting units. This means that the input signals spread along time and space division with complexvalued weights. In general, the masked response s_{l}(t) with continuous time representation is described as
where h_{l}(τ) is an impulse response to the lth output port for the input mask circuit. For the comparison with the digital mask operation^{11,12,35}, we consider a discretized time t(n,i) corresponding to each interval of duration θ and T: t′(n, i) = nT + iθ, where n ∈ Z and i ∈ [1; N]. By considering the ideal optical impulse with repeating time of T, the discretized expression of (1) is described as
where m_{l,i} ∈ [0; 1] and Ψ_{l,i}∈ [0; 2π] are the amplitude and phase delay from ith delay line to lth spatial. Thus, the input mask circuit acts as alloptical complexvalued spatiotemporal input weight generator. The matrix shape h_{l,I} and its programmability depend on the type of optical cross connects. More detailed explanations for the input mask circuit are described in the Supplementary Note 1 and Supplementary Fig. 1. The masked optical signals s_{l}(t) are input to the second stage of the optical processor. In this part, the integrated Larray of coherent cavities acts as spatially parallel delaybased optical reservoir with complexvalued evolution. As discussed in the refs. ^{36,37}, a parallel reservoir computer increase the virtual node count in the reservoir, which enhance the performance. The evolution equation of the complexvalued amplitude inside the lth reservoir cavity is given by
where T_{l} is the roundtrip time, α_{l} is the feedback gain that is reconfigured through a programmable optical attenuator inside the cavity, β_{l} is the transmission coefficient of the input fiber coupler, j is an imaginary unit, and φ_{1} is the phase detuning of the cavity. The subscript l denotes that the parameters are for lth cavity. The T_{l} is typically set to T_{l} = TN/(Nq), where 0 ≤ q < N. Then, the continuous time evolution in (2) can be approximated by the following discrete time evolution equations:
Thus, the parallel optical cavities act as parallel reservoir cavities. Although the nonlinear activation functions are only implemented in the input and output, the complexvalued evolution in coherent systems ensures rich dynamics comparable to that in incoherent nonlinear systems, as described in^{13}. The signals from the reservoir cavity are directly detected by photodiodes (PDs) installed after each cavity. Their dynamics are sampled by using an analogtodigital converter or oscilloscope with a sampling interval of θ′. Their discretized dynamic responses are considered as the squared norm of complexvalued virtual node response \(x_{\rm{l}}\left( {n,i} \right)^2\), where n′ ∈ [1; N′(=Nθ′/θ)] and N′ are the measurable nodes. The outputs y(n′) are obtained from weighted summation of \(x_l\left( {n\prime ,i} \right)^2\), which is described as:
where ω_{l,i} are trainable readout weights, which are determined by minimizing the mean square error using the Tikhonov regularization or standard gradient descent method. We can obtain the y(n′) value with time interval T, which is the same as the input one.
The equilibrium network architecture is also shown in Fig. 1a. As the proposed architecture can integrate the spatiotemporal RC system on a compact chip, it is more scalable than the conventional onchip integration^{13,14,15}. In contrast to previous digital preprocessing^{11,12,35}, we can set the short virtual node interval θ beyond the RF input sampling rates by setting a short delay difference and using an ultrafast pulse source. For instance, we can achieve the ultrawideband optical computing on a single photonic chip by employing the femtosecond (>THz) or attosecond pulse lasers (>PHz). However, in this condition, the measurement of the reservoir state and post processing are difficult to perform due to the output sampling rate θ′. Thus, it remains as the final bottleneck. Although the output bandwidth still remains bottleneck to access the ultrawide bandwidth of light, we can overcome it by using WDM and thus use the ultrawide optical bandwidth (>THz) as a computational resource to boost the computation efficiency much more. The details are described in subsection “Parallel data processing using WDM”.
Integrated coherent linear photonic processor
As the proposed system requires coherent interference for the MAC operation, we need to accurately maintain the phases and delays in the optics. In addition, the system needs functional optics, including couplers, phase shifters, variable optical attenuators (VOAs), and MZIs. Thus, it is very difficult to build this optics with bulk fiber optics. Here, we employ our photonic platform technology, called the planar lightwave circuit (PLC), to integrate such a complex system on a compact chip. The PLC is a silicabased waveguide technology that has excellent features for composing functional optical devices: low transmittance loss (~0.02 dB/cm), mass fabrication using standard wafer processes, excellent stability (>10year operation^{38}), many available lineup functional components, including interferometers and delay lines^{39}, and a wide operating bandwidth (visible^{40} to midinfrared wavelengths^{38,39}). Thanks to these features, PLC devices have already been installed as optical systems for optical fiber links and have also been utilized as advanced circuits for fundamental science such as a quantum photonics^{41} and an optical lattice clock^{42}. Here, we apply this technology to the photonic RC for the first time.
As a first demonstration, we integrated the above described linear optical system with N = 32 and L = 16 into the PLC. Thus, there are 512 virtual neurons in the circuit. Figure 1b show a schematic of the PLC layout for the input mask. For simplicity, the case for the RC with N = 8 is illustrated. The delay difference θ in the input mask circuit is set to ~8.3 ps (1/120 GHz) by adjusting the waveguide length. As this delay value corresponds to the optical path length of ~2.49 mm, it can be accurately implemented to the PLC by using a standard lithography technique (submicrometer accuracy). For the splitting and weighting of the signal, the MZIs are integrated in the mask circuits. Each MZI has two heaters to adjust its phase by using the thermosoptic effect. The MZIs apply the following unitary conversion:
where ξ_{1} and ξ_{2} are the phase shift of MZIs shown in the inset of Fig. 1b. By cascading the MZIs, we can realize various N × N optical topologies. Here, we illustrate three types of possible installations at the bottom of Fig. 1b. The simplest case is to use a mirror of a 1:N variable splitter as shown in Type I in Fig. 1b^{43}. This filter supports the arbitrary dense connection to the center output port with only an (N − 1) array of MZIs . However, the connection to the other spatial port becomes sparse. This means that the hl,I for l = L/2 is dense and fully programmable, but the other ports are sparse and not programmable. The dense connection can be realized by using an N(N − 1)/2 array of MZIs as shown in Type II in Fig. 1b, which is called a universal unitary operator^{44,45}. Although this architecture can realize dense and fully programmable connections to the output ports, only unitary conversion is supported. The arbitrary matrix connection is realized by combining 2N(N − 1) MZIs as shown in Type III in Fig. 1b^{4,46,47}. This network uses a physical instantiation of the singular value decomposition, which is a factorization of any matrix (M) as M = UΣV^{†}, where U is an N × N unitary matrix; Σ is an N × N diagonal, rectangular matrix of nonnegative real numbers; and V is an N × N unitary matrix. Here, two universal unitary circuits (U, V^{†}) are connected by a column of single MZIs that are used as variable attenuators implementing Σ. Although the densely connected input mask is preferred to achieve better RC performance, the dense unitary conversion requires 496 MZIs for N = 32, which is challenging for the first trial due to the complex wiring and operation. Thus, in this study we employed Type I, which only requires 31 MZIs.
Figure 1c show a schematic of reservoir coherent cavities. For simplicity, the case for RC with L = 4 is illustrated in this figure. A VOA, phase shifter, and variable coupler is installed in each cavity. A previous coherent cavitybased RC^{13} was implemented in an optical fiber ring with a length of over 200 m. In contrast, we can integrate the system on a compact optical chip thanks to the small θ value. The previous fiber cavity was unstable in terms of temperature and vibration, and thus required the huge thermal isolation and an external feedback operation to stabilize the device. On the other hand, our cavity can be compactly implemented on an optical chip by using silicabased optical waveguide technology. Moreover, it is highly stable and does not require any feedback operation thanks to the telecomgrade PLC technology. The cavity length is set to (6.0 + 0.4 × l) cm, where l is the cavity number. The corresponding roundtrip time is (~290 + 20 × l) ps. In the reservoir PLC, the coupling ratio to the passive cavity is set to 10/90, which corresponds to β in Eq. (11) is (0.1)^{1/2} = 0.316. The α_{l} ∈ [0, α_{l},_{max}] and φ ∈ [0, 2π] for each optical layer is also reconfigurable by tuning the MZIs on the RC chips. The achievable α_{l},_{max} value depends on the cavity number due to the loss increase from the intersection points, which is almost completely determined by \({\upalpha}_{{\rm{l,max}}} = 0.87  0.021 \times l\). This dependence could be reduced by optimizing the waveguide design. Figure 2 shows the appearance of our optical RC circuit fabricated by the PLC technique. The input mask and reservoir circuit were fabricated on the separate chips for the ease of characterization. These chips are connected by the optical fiber array. The footprint of the input and reservoir circuit were 41 × 46 and 28 × 47 mm^{2}, respectively.
Here, we performed experiments to examine the fabricated chip’s performance. First, we input the ASE light to the PLC for input mask and observed the transmittance spectrum to confirm the filtering performance. In this setup, two neighboring port were coupled to the center output port with the same optical power. Figure 3a shows the experimentally obtained filtering shape of the fabricated input mask circuit for the center output port (number 16) over the wavelength range of 1530–1565 nm. A magnified view is shown in Fig. 3b. For this experiment, the delay difference was set to θ as shown in Fig. 3e. As can be seen in Fig. 3a, flat and repeated wavelength dependence is obtained, suggesting the circuit can be used for an ultrawideband optical processor. The observed extinction ratio of 20 dB suggests the good performance of the fabricated MZIs. The observed wavelength duration of 120 GHz suggests accurate implementation of θ. By changing the interfering port as shown in Fig. 3f, we observed a change in the wavelength duration of 60 GHz as expected [Fig. 3c]. When we randomly set the input mask parameter [Fig. 3g], we can set the random filtering shape as shown in Fig. 3d. Thus, the fabricated circuits can be used for the optical mask circuit. Next, we examined the time domain response and the stability of the fabricated chip. For this experiment, we input the masked signal into the center coherent cavity (number 8) to check the overall characteristics of the RC system as shown in Fig. 3k. Figure 3i shows the reservoir response of cavity number 8 before and after the 3h operation. The input signal [Fig. 3h] and the output error [Fig. 3j] are also plotted. In this experiment, we randomly generated the mask function and α and φ were set to 0.5 and 0.1π, respectively. A continuous coherent pulse with repeating time of 512 ps [top of Fig. 3e] was input to the system The response of the center output port was observed with an oscilloscope. As shown in this figure, we can successfully generate the time domain response along the time domain thanks to the input mask and reservoir circuit [Fig. 3i]. As is well known, a coherent system is highly sensitive to the changes in the optical phase due to changes in the surrounding temperature or vibration. Thus, the previous coherent cavitybased RC requires a complex feedback system to stabilize the RC outputs^{13}. On the other hand, thanks to the solidstate circuit technology, the reservoir response for the fabricated circuit is stable over a period of three hours without any feedback operation, which is enough time to execute most RC tasks.
Chaotic timeseries expectation
We used the SantaFe timeseries prediction task^{48} to evaluate the RC performance. The aim of this task is to perform onestepahead prediction of chaotic data. The chaotic laser data were generated from a farinfrared laser. We used 2000 steps for training and 1000 steps for testing. For the training, we utilized the same teacher signal d(n) = u(n + 1). The training was done by standard Tikhonov regression. The amplitude of the Santa Fe time series was normalized so that the input signal u(n) of the Santa Fe time series ranged from 0 to 1.
First, we compared the bandwidth of input signal u(t) and reservoir output x(t)^{2} to confirm our concept described above. We used the center cavity (lane 8) for this experiment. The relative feedback phase was set to Δϕ_{l} = 0.1π. Figure 4a shows the measured bandwidth of input data u(t) of SantaFe chaotic timeseries and reservoir output x_{l}(t)^{2}. As can be seen in this figure, the bandwidth of the RF input is elongated thanks to the optical pulse modulation. Thus, we can execute RC processing beyond the input RF bandwidth. The observed maximum bandwidth was limited by the bandwidth of the oscilloscope (~20 GHz) in this experiment. By utilizing an electrical system with higher bandwidth, the signals can be upconverted to a much higher frequency region with the same optical circuits.
Next, we examined the performance on the chaotictimeseries expectation of our RC for a single cavity. Figure 4b shows the normalize mean square error (NMSE) as a function of feedback gain. For comparison, the simulation performance for the passive cavity and optoelectric reservoir with N = 16 are also plotted in the figure. For this simulation, we estimated the NMSE using Eqs. (3) and (5) with q = 1. The impulse response was set randomly to keep the following constraint: Σh_{l, i}^{2} = 1. The feedback gain was swept from zero to one, and the relative feedback phase was set to Δϕ_{l} = 0.1π, the same as the physical setup. The experimental results well agree with the simulation, suggesting the good performance of our constructed optical circuit. As shown in the figure, the NMSE highly depended on the spectral radius of the reservoir connection (i.e., feedback gain in this case). This behavior was also observed in previous RC systems, and it implies the enhanced memory capacity in the large feedback condition. The observed NMSE was comparable to the optoelectric RC^{11} in spite of the lack of the nonlinear element in the cavity.
Next, we investigated the performance using multiple cavities. The reservoir parameters were randomly set for each cavity. The measured reservoir responses of all the cavities are shown in Fig. 4c. Various responses were observed from each cavity. From these responses, we estimated the NMSE as a function of the number of parallel cavities as shown in Fig. 4d. The NMSE is monotonically reduced by adding the cavities, and NMSE of 0.06 is achieved, which is superior to that of a previous onchip RC^{15} thanks to the increment of virtual nodes. This result indicates that the virtual nodes obtained from the parallel cavities are more effective for the chaotic timeseries prediction task than those obtained from a single temporal sequence of a single cavity.
Image classification
We tested the image classification task to confirm the multidimensional data classification using multiple cavities. The dataset comprised handwrittendigits in the Modified National Institute of Standards and Technology (MNIST) database^{49}. The processing procedure for multidimensional inputs is shown in Fig. 5a. For preprocessing, the original 28 × 28 images were resized to 16 × 16 ones. To covert the temporal data, the 2 × 2 timesliding window was used for data input. Different digit data are multiplexed in different time slots. The sliding step was set to 2. Thus, the input image data were converted to a fourdimensional time series with 64 time steps. These signals were converted to optical data, and they were processed by the input mask and reservoir optical circuits. The detailed operation for the multidimensional input is described in the “Methods” section. The output signals from each reservoir cavity were acquired by multiple photodetectors. The signals were weighted and summed by using Eq. (6). For the training, we used 60,000 images from the training dataset of MNIST. As this requires largescale memory for Tikhonov regression, we optimized the readout weights in real time using the gradient descent method with the minibatch size of 200 and the learning rate of 5 × 10^{−3}. The training was executed in only one epoch. For the test phase, we validated the 10,000 data from the test dataset.
An example of the time domain reservoir responses (cavity number 8) is plotted in Fig. 5b. These responses indicate that we can process the input images at a speed of 17.1 ns per image, which is much faster than the standard RC using electrical hardware; RC with same architecture takes ~1 ms using an 8core 3.1GHz CPU. The training and test results are shown in Fig. 5c, d. As shown in Fig. 5b, the training accuracy almost monotonically increased, suggesting the success of the training. The test accuracies were improved by increasing the parallelism of the RC, and we achieved maximum test accuracy of 91.3% for the 16 parallel RC case. To the best of our knowledge, this is first experimental demonstration of image classification using onchip photonic RC. However, this value is still inferior to the accuracy for standard deep neural network models. As discussed in^{21}, around 16,000 reservoir nodes are required to achieve stateoftheart performance for the MNIST benchmark. As the node number of our device is 512, we need to construct 32 times larger circuits: e.g., the proposed optical circuit with N = L = 128 array is required, which needs the 128 × 128 optical crossconnects for the input mask and 128 array of 32cm delay lines for the reservoir circuit. The fabrication of these components are challenging. However, it could be possible by using stateoftheart manufacturing technology^{50,51,52}.
Parallel data processing using WDM
As described above, we can operate the photonic signals beyond the input RF limitation by using our device. The circuit also has potential for ultrawideband operation thanks to the flat passband over 1530–1570 nm (~5 THz) as shown in Fig. 3a. However, the remaining bottleneck from the output still limits the processing speed. Here, we demonstrate parallel processing based on WDM to boost the computation efficiency. Figure 6a shows the WDM processing with the proposed chip. In this scheme, the optical inputs with wavelength channel spacing of Δλ are multiplexed into a single optical transmission line and then input to the optical circuit. The input mask operation of our circuit is equilibrium to the frequency filtering of the upconverted signals. As shown in this figure, the filtering shape is repeated along the λaxis due to the circulating nature of the optical phase. This feature was experimentally confirmed as shown in Fig. 3a–d. As the repeating frequency is determined by the minimum delay difference in the filter, we can generate the same optical mask function when we set the channel spacing Δλ = cθ. Thus, we can use the full optical bandwidth of our device by multiplexing the multiple wavelength inputs. The feedback α_{l} can be kept constant by setting the optical paths length in the MZMs to the same length. Unfortunately, the relative phase and delay length are changed due to the group delay of the circuit. However, the impact on the performance is relatively smaller than that of the other hyper parameters in RC. The output signals from the optical circuits are demultiplexed and detected by the detectors. The reservoir responses for each wavelength channel are weighted and summed individually, which is considered as the wavelength parallel RC outputs. Thus, we can realize parallel RC on the same optical chip simultaneously. As a result, we can use the full optical bandwidth of our circuit (~5 THz) beyond the remaining bottleneck from the outputs bandwidth.
For the demonstration, we examined the parallel RC processing using two wavelength inputs. We used different datasets (MNIST and FashionMNIST datasets) for the classification task. The input spectrum is shown in Fig. 6b. The FashionMNIST data were modulated onto the wavelength of 1550 nm, and the MNIST data were modulated onto that of 1550.966 nm, corresponding to channel spacing of 120GHz to set the same optical mask shape. We employed only one cavity in this demonstration. The training results are shown in Fig. 6c. The training accuracy increased almost monotonically, which suggests the success of the proposed WDM approach. The observed test accuracies were 79.2% for the MNIST and 70.1% for the FashionMNIST datasets. The results for the MNIST are almost comparable to those for the singlewavelength experiment [the result for the single cavity in Fig. 5d] despite the change of wavelength. This suggests that the performance degradation in the WDM approach is small. Thus, we think that we can improve the accuracy to 91.3% by using the multiple cavities as shown in Fig. 5d. Although we only confirmed the feasibility for twowavelength input due to the limitations of the WDM experimental setup, the fabricated circuit can support 40 wavelength channels over the telecom Cband at least. The results confirm the potential of ultrawideband processing through WDM.
Discussion
Here, we discuss the computational efficiency of the proposed circuit. Although there are many indices for expressing the performance of a computing device, the multiplyaccumulate per second speed (MAC∙s^{−1}) is now widely considered to be a milestone in the photonic neuromorphic computation region^{9,53,54}. Thus, we discuss this index for the reservoir computer. Our photonic circuit can compute the input and reservoir layer propagation as described in Eqs. (3) and (5). The equilibrium algorithm for our photonic reservoir computer with multiple ring cavities for the case of onedimensional input (M = 1) is shown in Algorithm 1. The second term is input mask operation as described in Eq. (3). The first term and the summation of first and second terms are the reservoir operation as described as Eq. (5). Note that Ω_{ijl} in the Algorithm 1 is an interreservoir connection, which depends on the q value (delay length of each cavity). For ease of understanding, we describe the case for q = 1. The computational complexity of the photonic circuit does not depend on the q value. This algorithm requires the following MAC operations for each time step: 6 NL times multiplication for the reservoir operation (first term), 6N Lσ times multiplication for the mask operation, where σ is the nonzero density of h_{li} (second term), and (2 NL) times for summation of the first and second terms. Note that the processing is executed on the complex space, which requires two (six times) calculations for the sum (multiplex) operation. The average σ value for the FIR filter is considered as 0.5. Thus, the total MAC for each timestep in our device is considered as 11 NL. As the operation time of each time step for the optical circuit is determined by modulation time interval T, the MAC/s can be expressed as 11 L/θ. For the Mdimensional case, input mask operation becomes more complex, and it requires 0.2 (4M − 1) σNL times operations (see Supplementary Note 1). Thus, the MAC∙s^{−1} for our photonic RC can be described as follows:
As M is an inputdataoriented value, L/θ is the important value for the device. The L value is the limited to the size and device components of the optical circuit. The T (=Nθ) is determined by the optical pulse width. Note that the typical single delayline RC approach gives MAC∙s^{−1} of only 2 N/θ because it does not have an optical input mask and L = 1 with realvalued processing.
Algorithm 1 Photonic reservoir computing with multiple ring cavity.
for n = 0:n_{max} (time evolution).
for l = 1:L (loop for multiple cavity)
\({\left( \begin{array}{l}{x1(n,l)} \\ {x2(n,l)} \\ {x_3 (n,l)} \\ {\vdots} \\ {x_{N  1}(n,l)} \\ {xN(n,l)}\end{array} \right)} = {{\alpha} _l{\mathrm{exp}}(j\Delta \varphi _l)}{\left( \underbrace {\begin{array}{*{20}{l}}0 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 1 & 0 & 0 & \cdots & 0 & 0 & 0 \end{array}}_{{\Omega} _{{\mathrm{ijl}}}} \right)}{\left( \begin{array}{*{20}{l}} {x_1\left( {n  1,l} \right)} \\ {x_2\left( {n  1,l} \right)} \\ {x_3\left( {n  1,l} \right)} \\ \vdots \\ {x_{N  1}\left( {n  1,l} \right)} \\ {x_N\left( {n  2,l} \right)} \end{array} \right)} +\left(\begin{array}{l}{{h}_{l,1}}\\ {{h}_{l,2}}\\ {\vdots} \\ {{h}_{l,3}} \\ {{h}_{{l},{N1}}} \\ {{h}_{l,N}} \end{array} \right){{\mathrm{u}}^{\prime} (n)}\).
end
end
Figure 7 shows estimated MAC∙s^{−1} for each wavelength (MAC∙s^{−1}∙λ^{−1}) as a function of θ for various L values. Here, we assume M = 1 [Fig. 7a] and M = 4 [Fig. 7b], which are the values we used for the chaotic series expectation and image classification tasks in this work. In our circuit, the spatially distributed delay lines, optical cavities, and optical interferometers solve Eqs. (3) and (5) in parallel, which result in much faster computation speed beyond the input RFbandwidth. The performance reached 21.12 and 44.16 T MAC∙s^{−1}. for M = 1 and 4 for our fabricated circuit (θ = 12.5 ps, L = 16). The total power consumption of our system is estimated to be 1330 W: 20 W for the optical circuit; 20 W for the detector, including the post RF amplifier; 120 W for the optical pulse source, including the optical modulator and RF amplifier; 700 W for the oscilloscope, 180 W for the arbitrary waveform generator, and 290 W for our desktop PC. As the values were estimated from nominal values in the catalog spec sheet, the actual total power consumption is less than this estimation. Thus, the energy efficiency of our circuit [MAC s^{−1} per watt (MAC s^{−1} W^{−1})] can be estimated as 15.9 and 33.2 G MAC s^{−1} W^{−1} for M = 1 and 4. This value is for stateoftheart electric computational devices (the present best performance is 21.108 GMAC s^{−1} for MN3^{55} in June 2020). As can be seen, most of the power consumption originates from the electric devices (oscilloscope, AWG, and desktop PC); therefore, we can reduce the power consumption much more by constructing applicationspecific circuits. In addition, as demonstrated in this study, the WDM technique enables the optical circuit to share the parallel data inputs. Our circuit supports the Cband (1530–1570 nm, which has ~ 5THz bandwidth) with wavelengthspacing of 120 GHz. Thus, we can potentially use 40 wavelength channels in our circuit, which realizes petascale optical reservoir processing (0.845 and 1.77 PMAC s^{−1} for M = 1 and 4). These speeds are much higher than the theoretical operation speed of recent CPUs [~500 G MAC s^{−1} (16 MAC × 3 GHz × 10 core)] and GPUs [~6 T MAC s^{−1} (2 MAC × 1 GHz × 3000 core)]. This value is not far from that for current stateofthe art of supercomputers, which ranges from 1 to 100 PMAC/s. Therefore, our approach poses a challenge to conventional Turing–von Neumann machines, and it confirms the great potential of photonic neuromorphic processing.
Conclusion
In this paper, we demonstrated photonic RC based on coherent linear photonic processors. In contrast to previous approaches, both the input signals and reservoir weights are optically encoded in the spatiotemporal domain, which enables scalable integration on a compact chip. The ultrafast optical pulse source upconverts the input signals to a higher frequency region beyond the electrical bandwidth. We also demonstrated parallel processing based on the wavelength division multiplexing (WDM), which enables the use of ultrawide optical bandwidth (>THz) as a computational resource. As a result, the computation efficiency is boosted beyond the computation bottleneck remaining from the I/O bandwidth. Experiments for the standard benchmarks showed good performance for chaotic timeseries forecasting and image classifications with recordhigh processing speed of ~17.1 ns per image. The device can achieve 21.12 T MAC s^{−1} for each wavelength and can reach petascale computation speed on a single photonic chip by using WDM.
Methods
Experimental setup
Optical pulses with a pulse width of 30 ps and repeating time T of ~266.7 ps (1/120 GHz ×32), corresponding to p = 1, were generated by using a coherent laser light source with a 1550nm wavelength and an amplitude modulator with a 45GHz bandwidth. The input data was generated by using an arbitrary waveform generator with a 20GHz bandwidth and sampling rate of 60 GSa/s. The synthesizer and arbitrary waveform generator (AWG) were synchronized by using the same clock. The signal was amplified by an optical fiber amplifier to 10 dBm and input to the fabricated PLC. The mask condition and reservoir condition were randomly set by controlling the Mach–Zehnder modulators and phase shifters in the chip. The output signals were filtered by the wavelength filter. They were measure by using 50GHz bandwidth photodetectors, and the observed RF signals were amplified by using lownoise amplifier with a 40GHz bandwidth. The signals were acquired by using a digital storage oscilloscope with an 18GHz bandwidth and 60 GSa/s (DSO). Thus, the measurable node count decreased to θ′/θ =1/2 [θ′ ~ 16.7 ps, N’′= 16].
Multidimensional data processing
To process the multidimensional data such as image information, RC inputs often become timedependent vector u(t) = [u_{1}(t), u_{2}(t), …, u_{M}(t)]. To process the such type of data, (M × N)sized random matrix are usually employed for the input mask^{20,21}. In the previous work, it is easy to realize because the input mask was processed in the electric domain. However, there are no reports for input masking for photonic implementation. Here, we consider the method for input masking by using the same photonic convolutional filter. In our device, there are two ways to process the multidimensional inputs. The first way is simply inputs the signal to multiple input ports like the conventional onchip ANNs^{4}. However, this architecture requires multiple laser sources, which requires complex and expensive hardware update; e.g., for MNIST task, which is a standard image classification task, requires 28 inputs. It requires 28 laser inputs with optoelectric modulator array. In addition, when their relative phases and/or intensity are fluctuated, the optical output is also fluctuated due to interferometric condition changes. It leads the poor accuracy of the system. Thus, we adapt the second choice, that is time division multiplexing as shown in Fig. 1. This method does not require any hardware update. In this method, the multidimensional inputs are arranged in different time slot. To realize it, the radiofrequency (RF) input signal u(t) is set to following form.
where the modulation interval T′ is set to T′ = T/M. The repeating time of optical pulse is also set to T′. Then, the δ(t) is described as follows;
Then, the modulated optical pulse u′(n) for discretized time t(n,i) can be described as the orthonormal basis of input vector
To satisfy the synchronization between RFinput and optical pulse, N/M should be set to integer value. The reservoir processing and post processing are same as the way for onedimensional case. Thus we can process the multidimensional data by using the same optical circuit.
Training of readout
The parameter of RC is only readout weight ω, which forms a linear combination of the reservoir states. For the training, we collected the reservoir response x^{2} from each photodetector during training period, T_{train} Then, we obtained the N_{r} × T_{train} state matrix S, where N_{r} is the number of reservoir nodes (in our case, N_{r} = NL). The goal of the optimization is to find the ω in such a way that the actual output Y=ωS matches the desired output Y_{teacher} as close as possible in the leastsquares sense. This is a linear problem. For the offline case, the optimum ω is calculated by using the MoorePenrose pseudoinverse S^{†} of state matrix S:
where λ is the parameter for Ticknov regularization to avoid overfitting, and I is an N_{r}dimensional unity matrix. For the online training, we divided the training data into chunks (minibatches). The readout was updated by the following simple gradient descent method.
where ε is learning rate.
Data availability
The data for the current study are available from the corresponding author upon reasonable request.
References
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Merolla, P. A. et al. A million spikingneuron integrated circuit with a scalable communication network and interface. Science 345, 668–673 (2014).
Misra, J. & Saha, I. Artificial neural networks in hardware: a survey of two decades of progress. Neurocomputing 74, 239–255 (2010).
Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photonics 11, 441–446 (2017).
Lin, X. et al. Alloptical machine learning using diffractive deep neural networks. Science https://doi.org/10.1126/science.aat8084 (2018).
Hughes, T. W., Minkov, M., Shi, Y. & Fan, S. Training of photonic neural networks through in situ backpropagation and gradient measurement. Optica 5, 864–871 (2018).
Hamerly, R., Bernstein, L., Sludds, A., Soljačić, M. & Englund, D. Largescale optical neural networks based on photoelectric multiplication. Phys. Rev. X 9, 021032 (2019).
Feldmann, J., Youngblood, N., Wright, C. D., Bhaskaran, H. & Pernice, W. H. P. Alloptical spiking neurosynaptic networks with selflearning capabilities. Nature 569, 208–214 (2019).
Feldmann, J. et al. Parallel convolution processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).
Tait, A. N. et al. Neuromorphic photonic networks using silicon photonic weight banks. Sci. Rep. 7, 7430 (2017).
Larger, L. et al. Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing. Opt. Express 20, 3241 (2012).
Brunner, D., Soriano, M. C., Mirasso, C. R. & Fischer, I. Parallel photonic information processing at gigabyte per second data rates using transient states. Nat. Commun. 4, 1364 (2013).
Vinckier, Q. et al. Highperformance photonic reservoir computer based on a coherently driven passive cavity. Optica 2, 438 (2015).
Vandoorne, K. et al. Experimental demonstration of reservoir computing on a silicon photonics chip. Nat. Commun. 5, 3541 (2014).
Kuriki, Y., Nakayama, J., Takano, K. & Uchida, A. Impact of input mask signals on delaybased photonic reservoir computing with semiconductor lasers. Opt. Express 26, 5777–5788 (2018).
Harkhoe, K., Verschaffelt, G., Katumba, A., Bienstman, P. & Van der Sande, G. Demonstrating delaybased reservoir computing using a compact photonic integrated chip. Opt. Exp. 28, 3086 (2020).
Bueno, J. et al. Reinforcement learning in a largescale photonic recurrent neural network. Optica 5, 756–760 (2018).
Nakajima, M., Inubushi, M., Goh, T. & Hashimoto, T. Coherently driven ultrafast complexvalued photonic reservoir computing. In Proceedings of Conference on Lasers and ElectroOptics (CLEO) paper SM1C.4 (Optical Society of America, Washington D.C., 2018).
Argyris, A., Bueno, J. & Fischer, I. Photonic machine learning implementation for signal recovery in optical communications. Sci. Rep. 8, art. 8487 (2018).
Larger, L. et al. Highspeed photonic reservoir computing using a timedelay based architecture: million words per second classification. Phys. Rev. X 7, 11015 (2017).
Antonik, P., Marsal, N. & Rontani, D. Largescale spatiotemporal photonic reservoir computer for image classification. IEEE J. Sel. Top. Quantum Electron. 26, 1–12 (2020).
Goodman, J. Four decades of optical information processing. Opt. Photon. News 2122, 11–15 (1991).
Bocker, R. P. Matrix multiplication using incoherent optical techniques. Appl. Opt. 13, 1670–1676 (1974).
Psaltis, D. & Lin, S. Optoelectronic implementations of neural networks. IEEE Commun. Mag. 271112, 37–40 (1989).
Tomson, D. et al. Roadmap on silicon photonics. J. Opt. 18, 073003 (2016).
Harris, N. C. et al. Linear programmable nanophotonic processors. Optica 5, 1623–1631 (2018).
Carolan, J. et al. Universal linear optics. Science 349, 711–716 (2015).
Jaeger, H. & Haas, H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304, 78–79 (2004).
Maass, W., Natschläger, T. & Markram, H. “Realtime computing without stable states: a new framework for neural computation based on perturbations,”. Neural Comput. 14, 2531 (2002).
Lukoševičius, M. & Jaeger, H. Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3, 127 (2009).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Parallel Distributed Processing Vol. 1 (MIT, Cambridge, 1986).
The 2006/07 forecasting competition for neural networks & computational intelligence. http://www.neuralforecastingcompetition.com/NN3/ (2006).
Antonik, P., Marsal, N., Brunner, D. & Rontani, D. Human action recognition with a largescale braininspired photonic computer. Nat. Mach. Intell. 1, 530–537 (2019).
Tanaka, G. et al. “Recent advances in physical reservoir computing: a review. Neural Netw. 115, 100–123 (2019).
Appeltant, L. et al. Information processing using a single dynamical node as complex system. Nat. Commun. 2, 468 (2011).
Ortın, S. & Pesquera, L. Reservoir computing with an ensemble of timedelay reservoirs. Cogn. Comput. 9, 327–336 (2017).
Sugano, C., Kanno, K. & Uchida, A. Reservoir computing using multiple lasers with feedback on a photonic integrated circuit. IEEE IEEE J. Sel. Top. Quantum Electron. 26, 1500409 (2020).
Aratake, A. High reliability of silicabased 1 × 8 optical splitter modules for outside plant. J. Lightwave Technol. 34, 27 (2016).
Takahashi, H. High performance planar lightwave circuit devices for large capacity transmission. Opt. Exp. 19, B173 (2011).
Sakamoto, J., Goh, T., Katayose, S., Kasahara, R. & Hashimoto, T. Shapeoptimized multimode interference for a wideband visible light coupler. Opt. Commun. 443, 221 (2019).
Carolan, J. et al. Universal linear optics. Science 349, 711 (2015).
Akatsuka, T. et al. Optical frequency distribution using laser repeater stations with planar lightwave circuits. Opt. Exp. 28, 9186 (2020).
Sasayama, K., Okuno, M. & Habara, K. Coherent optical transversal filter using silicabased waveguides for highspeed signal processing. J. Lightwave Technol. 9, 1225 (1991).
Clements, W. R. et al. Optimal design for universal multiport interferometers. Optica 3, 1460–1465 (2016).
Reck, M., Zeilinger, A., Bernstein, H. J. & Bertani, P. Experimental realization of any discrete unitary operator. Phys. Rev. Lett. 73, 58–61 (1994).
Miller, D. A. B. Selfconfiguring universal linear optical component. Photon. Res. 1, 1–15 (2013).
Miller, D. A. B. Perfect optics with imperfect components. Optica 2, 747–750 (2015).
A. S. Weigend, A. S. & Gershenfeld, N. A. Time series prediction: forecasting the future and understanding the past. http://wwwpsych.stanford.edu/∼andreas/TimeSeries/SantaFe.html (1993).
LeCun, Y., Cortes, C. & Burges, C. J. C. The MNIST database of handwritten digits (1998).
Seok, T. J., Kwon, K., Henriksson, J., Luo, J. & Wu, M. C. Waferscale silicon photonic switches beyond die size limit. Optica 6, 490–494 (2019).
Kominato, T. et al. Extremely lowloss (0.3 dB/m) and long silicabased waveguides with large width and clothoid curve connection. In Proceeding of European Conference on Optical Communication (ECOC) paper TuI.4.3 (Optical Society of America, Washington D.C., 2004).
Lee, H., Chen, T., Li, J., Painter, O. & Vahala, K. J. Ultralowloss optical delay line on a silicon chip. Nat. Commun. 3, 867 (2012).
Nahmias, M. A. et al. Photonic multiplyaccumulate operations for neural networks. IEEE J. Sel. Top. Quantum Electron. 26, 7701518 (2019).
Totović, A., Dabos, G., Passalis, N., Tefas, A. & Pler, N. Femtojoule per MAC neuromorphic photonics: an energy and technology roadmap. J. Sel. Top. Quantum Electron. 26, 8800115 (2020).
Top500 Project, Green500 June 2020 https://www.top500.org/lists/green500/2020/06/ (2020).
Acknowledgements
The Authors are grateful to Takashi Goh, Masanobu Inubushi, Shiori Konisho, Soichi Oka, and Satoshi Shigematsu for useful discussions. We also thank Fumikatsu Sugimoto for technical assistance with the experiments.
Author information
Authors and Affiliations
Contributions
M.N. conceived the basic concept of the presented onchip reservoir computer. T.H. supervised the project. M.N. designed and characterized the optical circuits. M.N. and K.T. carried out the experiments for the reservoir computing. M.N. wrote the initial draft of the manuscript. All the author analyzed and discussed the results and contributed to write the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nakajima, M., Tanaka, K. & Hashimoto, T. Scalable reservoir computing on coherent linear photonic processor. Commun Phys 4, 20 (2021). https://doi.org/10.1038/s42005021005191
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s42005021005191
This article is cited by

Physical reservoir computing with emerging electronics
Nature Electronics (2024)

Highlyintegrable analogue reservoir circuits based on a simple cycle architecture
Scientific Reports (2024)

Emerging opportunities and challenges for the future of reservoir computing
Nature Communications (2024)

Integrated photonic neuromorphic computing: opportunities and challenges
Nature Reviews Electrical Engineering (2024)

The physics of optical computing
Nature Reviews Physics (2023)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.