Introduction

Over the past few years, global demand for artificial intelligence and fifth-generation communications has grown continuously, resulting in very large computing power and memory requirements. The slowing down, or even failure, of Moore’s law makes it increasingly difficult to improve computing performance and energy efficiency by relying on advanced semiconductor technology1,2. Moreover, the clock frequency of traditional electrical processors is generally limited to several GHz3, which can no longer meet the demand for ultrahigh-speed, low-latency processing of massive data. Matrix computation is one of the most widely used and indispensable tools of information processing in science and engineering4,5. Most signal processing, such as the discrete Fourier transform and the convolution operation, can be reduced to matrix computations. On the other hand, since the concept of artificial intelligence (AI) was first put forward in 19566, artificial neural networks (ANNs) have developed rapidly and have been widely used in various fields7. Because of the continuous, substantial increase in information capacity, general electronic processors appear incapable of executing high-complexity AI tasks in the foreseeable future1. To address this challenge, chips oriented to AI applications have emerged, such as neural network processing units (NPUs)8. At present, AI chips are widely used in almost every type of big data processing, in areas such as search, news, e-commerce, cloud computing, and the inverse design of functional devices9,10,11,12,13. Typically, deep-learning algorithms such as feedforward neural networks (FNNs), convolutional neural networks (CNNs) and spiking neural networks (SNNs) involve very large numbers of training parameters and, in particular, heavy matrix computations14.
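To make the link between common signal processing and matrix computation concrete, the short sketch below (purely illustrative; the array sizes are arbitrary) writes both the discrete Fourier transform and circular convolution explicitly as matrix-vector products:

```python
import numpy as np

# Minimal sketch: the DFT and circular convolution expressed as matrix-vector products.
N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)          # DFT matrix (unnormalized)
x = np.random.rand(N)
assert np.allclose(F @ x, np.fft.fft(x))              # the DFT is one MVM

h = np.random.rand(N)                                 # convolution kernel
C = h[(n[:, None] - n[None, :]) % N]                  # circulant matrix built from h
y_matrix = C @ x                                      # circular convolution as an MVM
y_fft = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)))
assert np.allclose(y_matrix, y_fft)
```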

Traditionally, matrix computation is performed by an electrical digital signal processor, whose speed and power consumption are fundamentally limited by the nature of electronic devices. Constrained by Moore’s law1,2, traditional electrical methods therefore struggle to achieve high-capacity and low-latency matrix information processing simultaneously. However, some applications, such as ultrafast neural networks15, require both large bandwidth and low latency; thus, a new medium for matrix computations and interconnects is urgently needed to implement high-performance, energy-efficient matrix computations. Optical devices can offer very large bandwidth and low power consumption16. Light has an ultrahigh carrier frequency of up to 100 THz and multiple degrees of freedom in its quantum state17,18, making optical computing one of the most competitive candidates for high-capacity and low-latency matrix information processing in the “More than Moore” era1. For example, a Fourier transform can be performed at the speed of light with a lens19. Motivated by these prospects, photonic matrix multiplication has developed rapidly in recent years and has been widely applied in photonic acceleration for optical signal processing20,21,22, AI and optical neural networks (ONNs)15,23,24. Many reviews of photonic acceleration have been published, mainly focusing on integrated photonic neuromorphic systems1,15,23,24,25,26,27,28, the blend of nanophotonics and machine learning29,30, reservoir computing31, and programmable nanophotonics21,22,32. However, photonic matrix multiplication, a fundamental and important part of photonic acceleration, has not been systematically reviewed. Here, we review the advances in photonic acceleration from the perspective of photonic matrix multiplication. We first discuss the methods and developmental milestones of photonic matrix multiplication and then review progress in the cutting-edge fields of optical signal processing and optical neural networks. Finally, a perspective on photonic matrix multiplication is discussed.

Matrix-vector multiplication

The methods for photonic matrix-vector multiplications (MVMs) mainly fall into three categories: the plane light conversion (PLC) method, the Mach–Zehnder interferometer (MZI) method and the wavelength division multiplexing (WDM) method. The detailed mechanisms of these MVMs can be found in ref. 33, which offers an accessible overview of the principles and development of photonic matrix computation. The first kind of optical MVM (PLC-MVM) is implemented by the diffraction of light in free space. Figure 1a shows a typical MVM configuration34,35. First, the input vector X, distributed along the x direction, is expanded and replicated along the y direction through a cylindrical lens or other optical elements. Then, a spatial diffraction plane with transmission matrix W adjusts each element independently. Finally, the x-direction beams are combined and summed in a similar way, and the output vector Y along the y direction is the product of the matrix W and the vector X, that is, Y = WX. The second MVM mainly consists of an MZI network (i.e., MZI-MVM). Figure 1b shows the configuration diagram, which is based mainly on rotation submatrix decomposition and singular value decomposition36. Calibration of the transmission matrix is more difficult here, since every matrix element is affected by multiple interdependent parameters. The third MVM (i.e., WDM-MVM) is an incoherent matrix computation method based on WDM technology. Figure 1c shows a typical diagram based on microring resonators (MRRs). The input vector X is loaded onto beams with different wavelengths, which pass through microrings that provide one-to-one adjustment of the transmission coefficients of W. The total output power vector is then given by Y = WX.
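As a minimal numerical illustration of the incoherent picture behind Y = WX, the sketch below models a WDM-MVM with ideal, lossless weighting; signed weights are emulated with the balanced-detection trick discussed later in the text. All sizes and values are invented for illustration:

```python
import numpy as np

# Idealized model of an incoherent WDM-MVM (a sketch, not a device model):
# each input element x_j rides on its own wavelength as an optical power,
# each microring sets a transmission coefficient, and a photodetector sums
# the powers. Signed weights use balanced detection: w = t_plus - t_minus.
rng = np.random.default_rng(0)
M, N = 4, 6                                  # output and input dimensions
x = rng.random(N)                            # input powers on N wavelengths (>= 0)
W = rng.uniform(-1, 1, size=(M, N))          # desired real-valued weight matrix

t_plus = np.clip(W, 0, None)                 # transmissions routed to the "+" detector
t_minus = np.clip(-W, 0, None)               # transmissions routed to the "-" detector
y = t_plus @ x - t_minus @ x                 # balanced photodetection per output row

assert np.allclose(y, W @ x)                 # the photocurrents realize Y = WX
```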

Fig. 1: Methods for matrix multiplication computation.
figure 1

a PLC-MVM. b MZI-MVM. c WDM-MVM

Photonic matrix multiplication has come a long way and has developed rapidly in recent years. Figure 2 summarizes the development history and milestones of photonic matrix computation. In the preliminary stage, only some fixed matrix computations were implemented by optical methods, such as the Fourier transform19. Thereafter, the first programmable MVMs were demonstrated with spatial optical elements based on single-plane light conversion (SPLC)34. For example, a fully parallel, incoherent optical method was employed to realize a discrete vector multiplier at high speed37, although updating the matrix at high frame rates was restricted by the spatial light modulators (SLMs) available at the time. Matrix multiplications involving optical array modulators, such as electro-optic modulators, directly driven LED arrays, and acousto-optic Bragg cells, were accomplished at faster frame rates34,38,39. Photorefractive crystals40,41,42 and nonlinear materials43 could also be applied to implement MVMs. In the SPLC-MVM method, only one dimension is used for the input/output vectors, so the vector scale (\(\propto N\)) is limited. A more powerful PLC-MVM for unitary spatial mode manipulation was demonstrated with multiplane light conversion (MPLC)44,45, in which the input/output vectors are distributed over the whole two-dimensional plane and the scale is proportional to \(N^{2}\). Afterwards, the MPLC technique was widely used in various fields, such as all-optical machine learning46,47,48, Laguerre-Gaussian or orbital angular momentum (OAM) mode sorters49,50, photonic Ising machines51,52, time-reversed optical waves53, optical logic operations54, optical encryption and perceptrons55,56, optical hybrids57 and neuromorphic optoelectronic computing58. Although MPLC can achieve ultralarge-scale MVMs, the devices are bulky, and the reprogramming speed for weight encoding is still limited. A compact and universal MVM is more practical, especially for integrated photonic applications. In 2017, Tang et al. first proposed an integrated reconfigurable unitary optical mode converter using multimode interference couplers, which shares a similar principle with MPLC59. It was then used for all-optical on-chip multi-input-multi-output (MIMO) mode demultiplexing60. In 2020, the integrated MPLC technique was further analyzed by Saygin et al. as a matrix decomposition method based on multichannel blocks61 and was later demonstrated experimentally on a silicon photonic chip62.

Fig. 2: Timeline of advances in photonic matrix computations and neuromorphic photonics.
figure 2

a MPLC-MVM. b MZI-MVM. c WDM-MVM

In 1994, Reck et al. proposed a recursive algorithm that can factorize any \(N\times N\) unitary matrix into a sequence of two-dimensional matrix transformations, which paved the way for photonic integrated computation based on MZIs36. Miller et al. then suggested that MZI networks could be self-configured to define functions with the aid of transparent detectors63,64,65,66. The MZI mesh was subsequently applied in an add-drop multiplexer for spatial modes66, universal linear optical components65, automatic MIMO systems64 and universal beam couplers63. In 2016, Clements et al. proposed a universal matrix framework based on an alternative arrangement of MZIs and phase shifters that requires only half the optical depth of the Reck design, significantly reducing the optical loss67. Ribeiro et al. experimentally demonstrated a 4 × 4-port universal optical linear circuit chip with an MZI mesh on an integration platform68. Thereafter, the applications of MZI-MVMs were further extended to ONNs3, light descramblers69, modular linear optical circuits70, optical CNNs71, equalizers72, digital-to-analog conversion (DAC)73, Ising machines74,75, mode analysis76 and complex-valued ONNs77.
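The essence of the Reck-type factorization can be illustrated with a real-valued analogue: an orthogonal matrix can be peeled apart into a cascade of 2 × 2 rotations, each playing the role of one MZI. The sketch below is illustrative rather than the published algorithm (the actual meshes use complex 2 × 2 unitaries with two phase shifters per MZI):

```python
import numpy as np

def embedded_rotation(n, i, c, s):
    """A 2x2 rotation on modes (i, i+1) embedded in an n x n identity; a
    real-valued stand-in for a lossless MZI."""
    G = np.eye(n)
    G[i, i], G[i, i + 1] = c, s
    G[i + 1, i], G[i + 1, i + 1] = -s, c
    return G

def reck_style_factorization(Q):
    """Successively null sub-diagonal elements of a real orthogonal Q with 2x2
    rotations until only a diagonal (of +/-1) remains."""
    n = Q.shape[0]
    R, rotations = Q.copy(), []
    for j in range(n - 1):                     # column being cleared
        for i in range(n - 1, j, -1):          # null R[i, j] against R[i-1, j]
            r = np.hypot(R[i - 1, j], R[i, j])
            if r < 1e-12:
                continue
            G = embedded_rotation(n, i - 1, R[i - 1, j] / r, R[i, j] / r)
            R = G @ R
            rotations.append(G)
    return rotations, R                        # G_k ... G_1 Q = R (diagonal)

# Reconstruct Q as a cascade of 2x2 blocks and a diagonal "phase screen".
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(5, 5)))
rots, D = reck_style_factorization(Q)
U = D.copy()
for G in reversed(rots):
    U = G.T @ U
assert np.allclose(U, Q)
```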

Generally, the footprint of an MZI exceeds 10,000 μm² per interferometer unit, which remains a bottleneck for further improving the computing density of the MZI mesh. The WDM-MVM based on microring arrays was proposed by Xu et al., who used compact microrings with diameters of only a few microns78,79. This approach encodes information on different optical wavelengths rather than spatial modes. Compared to other physical dimensions, the wavelength dimension offers the most abundant orthogonal channels in optics, up to hundreds of channels80,81. Silicon MRR arrays for matrix operations were first conceptualized by Xu and Soref in 201178 and were then demonstrated by Yang et al. using a 4 × 4 silicon microring modulator array, albeit with binary values of 0 and 1 only79. In 2014, Tait and colleagues proposed using MRR arrays as a matrix computation primitive for photonic neural networks82 and achieved continuous matrix values from -1 to 1 by continuously tuning the MRRs. The WDM-MVM was further used for photonic weight banks83,84,85,86, principal component analysis (PCA)87, independent component analysis (ICA)86, blind source separation (BSS)88, a TeraMAC neuromorphic photonic processor18, optical SNNs89, a TeraMAC photonic tensor core90, optical CNNs91,92,93, and photonic convolutional accelerators for ONNs16,94.

Table 1 summarizes the performance comparison of different photonic matrix multiplication methods. In general, the PLC-MVM method is coherent and can operate over the whole complex field. Its scale is very large: input vector sizes of 357 for SPLC-MVM48 and 490,000 (N = 700) for MPLC-MVM58 have been reported, and scales of up to \(10^{3}\) for SPLC-MVM and \(10^{6}\) for MPLC-MVM are readily reachable with SLMs58. However, the device size is quite large, and hence integrated counterparts have been pursued59,60,61. The MZI-MVM method is also coherent, but its scale is far smaller than that of the PLC-MVM method (N = 64 was reported by Lightmatter95). Its main advantage is that it can be integrated on a chip. The WDM-MVM method is more compact. Its scale is restricted by the number of wavelengths and can reach ~\(10^{2}\) with soliton crystal microcombs16, provided all the wavelengths are used for a single MVM. A balanced photodetector summing weighted signals allows for positive and negative weights82. WDM-MVM is incoherent and can be used for real-valued matrices. Among these methods, the assigned transmission matrices for SPLC-MVM and WDM-MVM can be written in directly, whereas algorithms are needed to load the transmission matrices for the MPLC-MVM and MZI-MVM methods. All these MVM methods have been widely applied in various fields. In the following, we review the detailed applications of MVMs in optical signal processing and photonic AI.

Table 1 Comparison of different photonic matrix multiplication methods

MVMs for optical signal processing

The photonic matrix multiplication network itself can be used as a general-purpose linear photonic circuit for photonic signal processing32. In recent years, MVMs have been developed into powerful tools for a variety of photonic signal processing tasks.

MPLC-MVMs

Benefiting from the large-scale computing capability of spatial planes, MPLC can realize very powerful matrix functions44. For example, Carpenter et al. realized the classification of 210 Hermite–Gaussian or Laguerre-Gaussian modes using only 7 phase planes of \(274\times 274\) pixels each49. A schematic diagram of the Laguerre-Gaussian mode sorter is shown in Fig. 3a. First, Gaussian beams from different positions were injected into the device and converted to different orthogonal Hermite–Gaussian modes by MPLC based on the wavefront matching method96. Then, a cylindrical lens pair converted the Hermite–Gaussian modes into Laguerre-Gaussian modes. The resulting super-multimode multiplexer and demultiplexer are of great significance for multimode optical communications. As shown in Fig. 3b, this powerful mode sorter was further used to create time-reversed waves, in which all classical linear physical dimensions of light were controlled simultaneously and independently53. By programming the SLM, the device can independently address the amplitude, phase, spatial mode, polarization and spectral/temporal degrees of freedom. Ninety spatial/polarization modes controlled over 4.4 THz at a resolution of ~15 GHz were demonstrated, covering a total of ~26,000 spatiospectral modes. A reprogrammable metahologram was further designed for optical encryption, as shown in Fig. 3c55. The encrypted information was divided into two matrices on two phase planes, and the enciphered message emerged only when the two planes matched.

Fig. 3: MPLC-MVMs for optical signal processing.
figure 3

a Laguerre-Gaussian mode sorter49. b Arbitrary vector spatiotemporal field generation53. c Optical encryption55. a Reprinted from ref. 49 with permission from Springer Nature: Nature Communications. b Reprinted from ref. 53 with permission from Springer Nature: Nature Communications. c Reprinted from ref. 55 with permission from Springer Nature: Nature Communications

Some other applications have also been demonstrated. The MPLC technique is a helpful tool for optimal transverse distance estimation, as shown in Fig. 4a97. The measurements were performed in two dimensions far beyond the Rayleigh limit over a large dynamic range. Several theoretical studies have also been carried out. For example, a scalable non-mode-selective Hermite–Gaussian mode multiplexer was proposed, as shown in Fig. 4b, in which a multiplexer for 256 Hermite–Gaussian modes was designed using only seven phase masks98. In Fig. 4c, Li et al. implemented linear-polarization-mode and Hermite–Gaussian-mode demultiplexing hybrids with similar methods99,100. Each input mode was converted to four fundamental modes with 90-degree phase differences located at nonoverlapping positions. The local light was uniformly mapped to fundamental modes with the same phase, which exactly overlapped with the output spots from the input modes. The complex amplitudes of the input modes could then be retrieved from the interference light intensities. Furthermore, an ultrabroadband polarization-insensitive optical hybrid using MPLC was experimentally verified57. As shown in Fig. 4d, 14 phase masks and a gold mirror were employed to realize the optical hybrid, and a measurement bandwidth of 390 nm was obtained.

Fig. 4: MPLC-MVMs for optical signal processing.
figure 4

a Transverse distance estimation97. b Nonmode selective Hermite–Gaussian mode multiplexer98. c Mode demultiplexing hybrid100. d Ultrabroadband polarization-insensitive optical hybrid57. a Reprinted with permission from ref. 97 © The Optical Society. b Reproduced from ref. 98 with the permission of Chinese Laser Press. c Reproduced from ref. 100 with the permission of Chinese Laser Press. d Reprinted from ref. 57 with the permission of IEEE Publishing

Integrated MPLC-MVM has also been successfully verified. In 2017, Tang et al. first theoretically proposed an integrated reconfigurable unitary MPLC-MVM using multimode interference couplers59. The schematic diagram is presented in Fig. 5a. The transmission matrix is decomposed into a series of programmable unitary diagonal matrices and fixed unitary diffractive matrices. In theory, an arbitrary unitary transmission matrix can be configured by tuning the unitary diagonal matrices, provided that enough phase planes are assigned. In 2018, the integrated MPLC-MVM was experimentally verified for reconfigurable all-optical on-chip MIMO three-mode demultiplexing60. Figure 5b shows the details of the three-channel MIMO demultiplexing chip. Furthermore, Saygin et al. developed a more universal architecture for integrated MPLC-MVM in 202061. In addition, a ten-port unitary optical processor has been experimentally demonstrated62. Figure 5c presents the device operating principle, in which the fixed unitary diffractive matrices are implemented using multiport directional couplers. This processor offers a new flexible and robust architecture for large-scale MVMs.
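The layered structure described above can be captured by a simple forward model in which programmable diagonal phase layers alternate with a fixed unitary mixing layer. In the sketch below, a DFT matrix stands in for the fixed diffractive/multiport-coupler layer (an assumption for illustration; in practice the phases would be optimized numerically against a target matrix):

```python
import numpy as np

def fixed_mixer(n):
    """A fixed unitary 'diffractive' mixing layer; a DFT matrix is used here as a
    stand-in for the real device's multiport coupler (an assumption)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n) / np.sqrt(n)

def mplc_unitary(phases):
    """Forward model: alternate programmable diagonal phase layers with the fixed
    mixing layer, U = D_L F ... D_1 F with D_k = diag(exp(i*phi_k))."""
    n = phases.shape[1]
    F = fixed_mixer(n)
    U = np.eye(n, dtype=complex)
    for phi in phases:
        U = np.diag(np.exp(1j * phi)) @ F @ U
    return U

# Example: 6 phase layers acting on 10 ports (illustrative sizes only).
U = mplc_unitary(np.random.uniform(0, 2 * np.pi, size=(6, 10)))
assert np.allclose(U.conj().T @ U, np.eye(10))   # the model is unitary by construction
```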

Fig. 5: Integrated MPLC-MVMs.
figure 5

a Schematic diagram59. b Three-channel MIMO demultiplexing chip60. c Ten-port unitary optical processor62. a Reprinted from ref. 59 with the permission of IEEE Publishing. b Reprinted with permission from ref. 60 © The Optical Society. c Reproduced from ref. 62 with permission of ACS publications

MZI-MVMs

The MZI-MVM, as an integrated photonic matrix computation method, is well suited to on-chip optical signal processing32,70. Based on the orthogonal matrix transformation, it can manipulate orthogonal spatial modes. Figure 6a shows a reconfigurable add-drop multiplexer for spatial modes sampled by a grating array66. It could extract a specified spatial mode from a light beam while leaving the other modes undisturbed, and it also allows a new signal to be loaded onto that mode. Similarly, as Fig. 6b shows, an MZI mesh based on the orthogonal matrix transformation was used as a 4 × 4-port universal linear circuit that can self-adapt to implement the desired functions68. The same structure, shown in Fig. 6c, could further automatically undo strong mixing between modes, acting as a mode descrambler69. Theoretical analyses of the initialization procedure, training and optical multiple-input multiple-output equalizers are discussed in detail in refs. 72,101,102. More generally, the MZI-based orthogonal matrix mesh was theoretically shown to be able to analyze and generate multiple modes using self-configuring methods76. The concept and architecture are presented in Fig. 6d, where, as an example, a square grating coupler array is illuminated by the input light. However, these self-configuring methods require many built-in optical power monitors, which introduce additional loss, and the number of monitors grows rapidly as the network is extended, making both the electronic layout and the iterative algorithm quite complex. In 2020, Zhou et al. proposed and experimentally demonstrated a general self-configuring method that requires no information about the inner structure103,104. Figure 6e shows an example of the iteration process, in which a switching matrix was self-configured from a random state. The training was performed with a numerical gradient algorithm inspired by deep learning3, which is practicable for a general “black box” system. A similar idea was applied to an all-in-one photonic polarization processor chip105,106. Other MZI meshes have also been reported for multipurpose silicon photonic signal processors, such as a hexagonal mesh107 and a square mesh108.
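The “black box” idea can be illustrated with a toy example: treat the mesh as a function from phase settings to externally measured output powers, estimate gradients by finite differences, and descend on a cost defined only at the external ports. Everything below (the two-MZI toy mesh, the switching target, the learning rate) is an illustrative assumption, not the published implementation:

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of an MZI: input phase shifter plus a tunable coupler."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
    return bs @ np.diag([np.exp(1j * theta), 1]) @ bs @ np.diag([np.exp(1j * phi), 1])

def forward(phases, x):
    """Toy 'black box': a cascade of MZIs on two modes driven by the phase vector."""
    U = np.eye(2, dtype=complex)
    for theta, phi in phases.reshape(-1, 2):
        U = mzi(theta, phi) @ U
    return U @ x

def cost(phases, probes, targets):
    """Cost defined only on measured output powers (no internal monitors)."""
    outs = np.array([np.abs(forward(phases, x))**2 for x in probes])
    return np.mean((outs - targets)**2)

# Finite-difference ("numerical gradient") descent toward a cross (switch) state.
rng = np.random.default_rng(0)
probes = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
targets = np.array([[0.0, 1.0], [1.0, 0.0]])
phases = rng.uniform(0, 2 * np.pi, 4)
lr, eps = 0.2, 1e-4
for _ in range(1000):
    grad = np.array([(cost(phases + eps * e, probes, targets)
                      - cost(phases - eps * e, probes, targets)) / (2 * eps)
                     for e in np.eye(len(phases))])
    phases -= lr * grad
print(cost(phases, probes, targets))   # typically approaches ~0 for random starts
```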

Fig. 6: MZI-MVMs for optical signal processing.
figure 6

a Add-drop mode multiplexer66. b 4 × 4-port universal linear circuit68. c Optical mode descrambler69. d Analysis and generation of multimode optical fields76. e Self-configured example103. a Reprinted with permission from ref. 66 © The Optical Society. b Reprinted with permission from ref. 68 © The Optical Society. c Reproduced from ref. 69 with permission of Springer Nature: Light: Science & Applications. d Reprinted with permission from ref. 76 © The Optical Society. e Reprinted from ref. 103 with the permission of ACS Publishing

WDM-MVMs

The WDM-MVM can be executed directly without any loading algorithms, benefiting from the one-to-one mapping between wavelengths and matrix elements. This correspondence also makes the WDM methods practicable for waveform shaping combined with frequency–time mapping109,110. As shown in Fig. 7a, b, a 1 × 8 MRR array was fabricated for on-chip programmable pulse shaping. The spectral shape and width could be tuned by changing the resonant wavelengths of the MRRs. A square-shaped transfer function is demonstrated in Fig. 7b, and other shapes, such as an isosceles triangle and a sawtooth triangle, were also verified. Furthermore, the MRR array can be used for MVM, provided that a sum operation over multiple wavelengths is performed; such devices are called “microring weight banks”, as shown in Fig. 7c83. A balanced photodetector (PD) yields the sum and difference of the weighted signals. The reconfigurability and channel-count scalability of the MRR weight banks were experimentally demonstrated in ref. 111, together with a comprehensive theoretical analysis112. Different methods of controlling large-scale MRR arrays for matrix computation were proposed and demonstrated in refs. 85,113,114. Afterwards, microring weight banks were applied to various signal processing tasks, such as fiber nonlinearity compensation115 and photonic PCA87. PCA aims to extract the principal components (PCs) solely from the statistics of the weighted-addition output. Figure 7d presents an experimental example of the obtained two-channel waveforms of the 1st and 2nd PCs, evidencing the effectiveness of photonic PCA. The weight bank was further used for photonic ICA to identify the underlying sources that form the basis of the observed data86. As shown in Fig. 7e, photonic ICA retrieved the corresponding independent components (ICs) from the received mixture waveforms. By combining photonic PCA and ICA, a two-step procedure for a complete photonic BSS pipeline was achieved88. BSS is a powerful technique for signal decomposition with minimal knowledge of either the source characteristics or the mixing process. Figure 7f gives an example of ICs retrieved from mixed radio-frequency waveforms with the BSS technique88.
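A hedged sketch of the statistical idea behind photonic PCA is given below: the principal directions are obtained purely from the covariance of the detected (mixed) waveforms, and each principal direction corresponds to one weight-bank setting. The sources, mixing matrix and scales are invented for illustration:

```python
import numpy as np

# Two latent sources mixed by an unknown channel; only the mixtures are observed.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 2000)
s1 = np.sin(2 * np.pi * 60 * t)
s2 = np.sign(np.sin(2 * np.pi * 13 * t))
A = np.array([[0.8, 0.3], [0.4, 0.9]])            # unknown mixing matrix
X = A @ np.vstack([s1, s2])                        # observed (photodetected) waveforms

C = np.cov(X)                                      # statistics of the observations only
eigvals, eigvecs = np.linalg.eigh(C)               # principal directions (ascending)
weights = eigvecs[:, ::-1].T                       # rows = weight-bank settings per PC
pcs = weights @ (X - X.mean(axis=1, keepdims=True))  # 1st and 2nd principal components
# (ICA would be applied as a second step to recover the independent sources,
#  as in the two-step BSS pipeline described above.)
```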

Fig. 7: WDM-MVMs for optical signal processing.
figure 7

a, b On-chip programmable pulse processor employing a cascaded MZI-MRR structure110. c Microring weight banks83. d Photonic principal component analysis87. e Photonic independent component analysis86. f Radio-frequency blind source separation88. a, b Reproduced from ref. 110 with permission of Springer Nature: Nature Photonics. c Reprinted with permission from ref. 83 © The Optical Society. d Reprinted with permission from ref. 87 © The Optical Society. e Reprinted with permission from ref. 86 © The Optical Society. f Reprinted with permission from ref. 88 © The Optical Society.

In comparison, coherent MVMs are usually applied in multimode signal processing. The MPLC method can handle massive numbers of modes, benefiting from its capability for large-scale matrix computation; its main limitations are that it is bulky and difficult to refresh quickly. The MZI method is easy to integrate, and the functions of the MZI mesh can be autoconfigured since the phase shifters operate faster. However, the scale of matrix computation is limited, and this method can work with only a few modes. Compared with the MZI method, the WDM-MVM method has a more compact footprint, and it is much easier to configure its transmission matrix and apply it to programmable pulse shaping, photonic PCA, ICA and BSS.

MVMs for optical neural networks

AI technology has been widely used in various electronics industries, for example, in deep-learning-based speech recognition and image processing. MVM, as the basic building block of ANNs, occupies most of the computing workload, over 80% for the GoogLeNet and OverFeat models116. Improving MVM performance is therefore one of the most effective means of ANN acceleration. Compared with electrical computing, optical computing is poor at data storage and flow control, and the low efficiency of optical nonlinearities limits its application to nonlinear computations117, such as activation functions. However, it has significant advantages in massively parallel computing through multiplexing strategies based on wavelength, mode and polarization17,90, with extremely high data modulation speeds of up to 100 GHz118,119. Hence, photonic networks are particularly good at MVM. The combination of optical computing and AI is expected to yield intelligent photonic processors and photonic accelerators120. In recent years, AI technology has also seen rapid development in the field of optics.

MPLC-MVMs

MPLC, as a supersized MVM method, is a natural platform for ONNs. In 2018, Lin et al. presented an all-optical diffractive deep neural network (D2NN) architecture to perform machine learning46. The schematic diagram is shown in Fig. 8a. Five phase-only transmission masks were used to classify images of handwritten digits and fashion products at the speed of light. A modified D2NN based on class-specific differential detection was then designed to improve the inference accuracy47. The information processing capacity of MPLC was recently discussed in detail by Kulce et al.121, who proved that the dimensionality of the all-optical solution space is linearly proportional to the number of phase planes. Although the D2NN may be difficult to train because of vanishing gradients, it has been suggested that this issue can be addressed by directly connecting the input and output with a learnable light shortcut, which offers a direct path for gradient backpropagation during training122. The MPLC-D2NN can be applied not only to image identification but also to optical logic operations54, OAM multiplexing and demultiplexing50, optical linear perceptrons56 and Ising machines52. As shown in Fig. 8b, optical logic functions were performed by a two-layer D2NN, with different logic operations output from different ports after training54. The incident wave was physically encoded at the input layer, and the compound metasurfaces (hidden layer) then scattered the encoded light into one of two small designated areas at the output layer, which indicated the output logic state. On this foundation, multiple logic gates can be cascaded to enable more complex or custom-defined functionalities. This universal design strategy holds potential for several applications, such as cryptographically secured wireless communication, real-time object recognition in surveillance systems, and intelligent wave shaping inside biological tissues. Figure 8c presents the coupling and separation of OAM modes with the D2NN; here, four phase masks of 256 × 256 pixels were designed to couple and separate four OAM modes. The optical machine-learning decryptor in Fig. 8d was realized with single-layer holographic perceptrons, which were trained to complete optical inference tasks56. This decryptor could perform optical inference for single keys or whole classes of keys through symmetric and asymmetric decryption. The decryptors could be nanoprinted on complementary metal-oxide–semiconductor (CMOS) chips by galvo-dithered two-photon nanolithography (GD-TPN) with an axial nanostepping of 10 nm. The high resolution of GD-TPN allowed a small feature size for the holographic perceptrons at near-infrared telecommunication wavelengths and a neuron density of >500 million neurons per square centimeter. MPLC was also applied in a spatial-photonic Ising machine. The principle of a photonic Ising machine with spatial light modulation is depicted in Fig. 8e51,123. The spins were encoded into binary optical phases of 0 and π at separate spatial points by an SLM, and intensity modulation on a second SLM set the spin interactions. Recurrent feedback from the far-field camera allowed the phase configuration to evolve toward the Ising ground state, providing novel hardware with an optics-enabled parallel architecture for large-scale optimization.
A photonic scheme for combinatorial optimization analogous to adiabatic quantum algorithms and classical annealing methods was further studied52. More recently, Ruan et al. experimentally evaluated the phase diagram of a high-dimensional spin-glass equilibrium system with 100 fully connected spins under gauge transformation124 and also proposed implementing an antiferromagnetic model through optoelectronic correlation computation with 40,000 spins for the number-partitioning problem125. Nonlinear activation functions for the D2NN were also demonstrated using laser-cooled atoms with electromagnetically induced transparency126. To obtain a more general and reconfigurable MPLC-based ONN, an optoelectronic fused computing framework based on optical diffraction was proposed, which supports several kinds of neural networks and maintains high model complexity with millions of neurons58. The principle of the basic diffractive processing unit (DPU) is presented in Fig. 9a, b. A digital micromirror device (DMD) and an SLM were combined to implement the input nodes, and a CMOS sensor was used to implement the optoelectronic neurons. The DPU consists of large-scale diffractive neurons and weighted optical interconnections, enabling the processing of large-scale visual signals such as images and videos. Three types of ONNs were configured, including the D2NN in Fig. 9c, the diffractive network in network (D-NIN-1) in Fig. 9d, and the diffractive recurrent neural network (D-RNN) in Fig. 9e.
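The linear part of each diffractive layer can be viewed as free-space propagation followed by an element-wise phase mask. A minimal numerical sketch is shown below (scalar diffraction via the angular-spectrum method; all parameter values, array sizes and function names are illustrative assumptions rather than the settings of the experiments above):

```python
import numpy as np

def propagate(field, wavelength, dx, z):
    """Free-space propagation of a scalar field by the angular-spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2        # (1/lambda)^2 - fx^2 - fy^2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.where(arg > 0, np.exp(1j * kz * z), 0.0)  # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

def d2nn_forward(field, phase_masks, wavelength, dx, z):
    """Linear part of a diffractive network: propagate, apply a phase-only mask,
    repeat, then detect the intensity at the output plane."""
    for phi in phase_masks:
        field = propagate(field, wavelength, dx, z)
        field = field * np.exp(1j * phi)
    field = propagate(field, wavelength, dx, z)
    return np.abs(field)**2

# Example with made-up parameters; in a trained D2NN the masks are learned.
n, wavelength, dx, z = 128, 750e-9, 400e-9, 40e-6
masks = [np.random.uniform(0, 2 * np.pi, (n, n)) for _ in range(5)]
image = np.zeros((n, n)); image[48:80, 48:80] = 1.0   # toy input amplitude pattern
intensity = d2nn_forward(image, masks, wavelength, dx, z)
```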

Fig. 8: MPLC-MVMs for ONNs.
figure 8

a Classification of images of handwritten digits and fashion products46. b Optical logic operations54. c OAM multiplexing and demultiplexing50. d Optical linear perceptrons56. e Photonic Ising machine51,123. a Reprinted by permission from AAAS46. b Reproduced from ref. 54 with permission of Springer Nature: Light: Science & Applications. c Reprinted from ref. 50 with the permission of IEEE Publishing. d Reproduced from ref. 56 with permission of Springer Nature: Light: Science & Applications. e Reproduced from ref. 123 with permission of De Gruyter Publishing

Fig. 9: Optoelectronic fused neural computing framework58.
figure 9

a DPU. b Programmable optoelectronic devices to implement the DPU. c–e Three different types of neural network architectures were constructed, including the D2NN, D-NIN-1, and D-RNN. a–e Reproduced from ref. 58 with permission of Springer Nature: Nature Photonics

MZI-MVMs

In contrast to the MPLC-MVM, the main advantage of the MZI-MVM is its potentially small size, which allows miniaturized ONN chips. In 2017, Shen et al. proposed a new architecture for a fully optical feedforward neural network, as shown in Fig. 10a3. The device, containing 56 programmable MZIs, demonstrated its utility for vowel recognition and improved the computational speed and power efficiency over advanced electronics for conventional deep learning tasks. Thereafter, an optical convolutional neural network was proposed: as shown in Fig. 10b, the optical delay lines were implemented with microrings, and the MVM was implemented efficiently in photonic circuits by an MZI mesh71. However, training these networks is quite difficult, and suitable training schemes soon followed. Hughes et al. introduced a highly efficient method for in situ training of an ONN; Fig. 10c presents a schematic illustration of the proposed method, which uses adjoint variable methods to derive the photonic analog of the backpropagation algorithm127. A genetic algorithm was also demonstrated as an efficient method for on-chip training of ONNs128. A similar mesh can be expanded to implement a complex-valued neural network77. As shown in Fig. 11a, the complex-valued ONN encodes information in both phase and magnitude with MZIs (marked in red). The reference light used for coherent detection is introduced by the MZI in green, the complex-valued weight matrix is implemented with the MZIs in blue, and on-chip coherent detection is implemented by the remaining black MZIs. The input preparation, weight multiplication and coherent detection are all integrated onto a single chip, which offers significantly enhanced computational speed and energy efficiency.

Fig. 10: MZI-MVMs for ONNs.
figure 10

a Optical feedforward neural network3. b Optical convolutional neural network71. c In situ training of an ONN127. a Reproduced from ref. 3 with permission of Springer Nature: Nature Photonics. b Reproduced from ref. 71 with the permission of the authors. c Reprinted with permission from ref. 127 © The Optical Society

Fig. 11: MZI-MVMs for complex-valued ONN and photonic Ising machines.
figure 11

a Complex-valued ONN with MZI mesh77. b, c Photonic recurrent Ising machines with MZI mesh74. b The principle of Ising machines and c the energy evolution as a function of time. a Reprinted from ref. 77 with permission from Springer Nature: Nature Communications. b, c Reprinted with permission from ref. 74 © The Optical Society

In addition to neural networks, efforts have also been made to unleash the potential of these photonic architectures by developing algorithms that optimally exploit the fundamental advantages of photonics. In 2020, Roques-Carmes, Shen and coworkers proposed the photonic recurrent Ising sampler (PRIS)75, a heuristic method tailored for parallel architectures that allows fast and efficient sampling from the distributions of arbitrary Ising problems. They later experimentally demonstrated the PRIS by combining electronics and silicon-on-insulator photonics74. Figure 11b presents the algorithm iteration of the PRIS. The spin state vector is encoded in the amplitudes of coherent optical signals at the input, and the transmission matrix of the MZI mesh depends on the problem-specific Ising coupling matrix. The output of the matrix multiplication is perturbed by Gaussian noise. After several algorithm steps, the energy shown in Fig. 11c approaches the ground state, yielding the optimization result for the specific Ising problem.
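A highly simplified sketch of the recurrent sampling loop is given below: the spin vector is repeatedly passed through a coupling-derived MVM, perturbed with Gaussian noise, and re-binarized, while the lowest Ising energy seen so far is tracked. This toy loop only conveys the flavor of the scheme; the actual PRIS uses a problem-dependent matrix transformation and calibrated noise statistics:

```python
import numpy as np

# Toy recurrent Ising heuristic in the spirit of the PRIS (all values illustrative).
rng = np.random.default_rng(0)
N = 16
J = rng.choice([-1.0, 1.0], size=(N, N))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)                  # symmetric couplings, no self-coupling

def ising_energy(s):
    return -0.5 * s @ J @ s               # E = -1/2 * s^T J s

s = rng.choice([-1.0, 1.0], size=N)
best_s, best_E = s.copy(), ising_energy(s)
for _ in range(2000):
    noise = 0.5 * np.sqrt(N) * rng.standard_normal(N)
    s = np.sign(J @ s + noise)            # noisy MVM followed by thresholding
    s[s == 0] = 1.0
    E = ising_energy(s)
    if E < best_E:
        best_s, best_E = s.copy(), E
print(best_E)                             # low-energy configuration found heuristically
```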

WDM-MVMs

In 2014, Tait and colleagues first proposed the use of MRR arrays as a matrix computation primitive for photonic neural networks82. This work introduced a scalable neural network architecture called “broadcast-and-weight” based on the WDM concept. In this architecture, as shown in Fig. 12a, the weights can be continuously tuned to achieve both positive and negative values, analogous to neural synaptic weights. In the same work82, Tait et al. also introduced a network design allowing scalable and cascadable ONNs by employing wavelength reuse, followed by an experimental demonstration in 201784, concurrently with other silicon photonic neuromorphic architectures3. This network architecture can be applied to construct both feedforward and recurrent neural networks. Microring weight banks were also employed for optical CNNs91,92,93. In CNNs, as shown in Fig. 12b, the input images are divided into small patches, and these patches are converted into small matrices for MVM operations. In 2019, an all-optical spiking neural network based on phase-change materials (PCMs) was experimentally demonstrated89. As shown in Fig. 12c, the input vectors were loaded onto beams with different wavelengths and weighted by PCMs. Moreover, the nonlinear activation function was implemented optically through a shift in the resonant wavelength of a microring when the summed power altered the state of the PCM. Figure 12d shows a photonic tensor core for neural networks using PCMs as the reconfiguration elements129. The input matrix was encoded by high-speed modulators, the kernel matrix was loaded into a photonic memory based on PCMs, and the weighted inputs were then summed incoherently by a photodetector.
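The patch-to-matrix conversion mentioned above is the standard im2col trick, which turns a convolution into a single MVM. A minimal sketch (cross-correlation form, as in most CNN implementations, with invented array sizes) is shown below; each row of the result reshapes to one 6 × 6 feature map:

```python
import numpy as np

def im2col(image, k):
    """Unfold k x k patches of a 2D image into columns (valid convolution)."""
    H, W = image.shape
    cols = []
    for r in range(H - k + 1):
        for c in range(W - k + 1):
            cols.append(image[r:r + k, c:c + k].ravel())
    return np.stack(cols, axis=1)          # shape (k*k, n_patches)

image = np.random.rand(8, 8)
kernels = np.random.rand(4, 3, 3)          # 4 kernels of size 3x3
Wmat = kernels.reshape(4, -1)              # flatten kernels into a (4, 9) weight matrix
patches = im2col(image, 3)                 # (9, 36)
feature_maps = Wmat @ patches              # (4, 36): the whole convolution as one MVM

# Spot-check against a direct patch-wise dot product.
assert np.isclose(feature_maps[0, 0], np.sum(kernels[0] * image[0:3, 0:3]))
```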

Fig. 12: WDM-MVMs for ONNs.
figure 12

a Optical broadcast-and-weight network showing parallels with the neural network mode82,84. b Optical convolutional neural network92. c All-optical spiking neurosynaptic networks89. d Photonic dot product engine for machine learning129. a Reprinted from ref. 82 with the permission of IEEE Publishing. b Reprinted from ref. 92 with the permission of IEEE Publishing. c Reprinted from ref. 89 with permission from Springer Nature: Nature. d Reproduced from ref. 129 with the permission of AIP Publishing

Recently, convolutional photonic processors with extremely high computing throughput were demonstrated by exploiting different dimensions of light. Feldmann et al. demonstrated a highly parallel convolutional processor using an integrated photonic tensor core, achieving \(10^{12}\) multiply-accumulate operations per second90. A conceptual illustration of the photonic architecture is shown in Fig. 13a. Highly parallel MVMs were performed using multiple groups of wavelengths generated from a soliton-based optical frequency comb. PCMs were applied as nonvolatile actuators, so the convolutional processing could be performed with extremely low power. Another photonic convolutional accelerator realized highly parallel computing by utilizing wavelength-and-time interleaving, as shown in Fig. 13b, achieving up to 10 trillion operations per second16. The input data vector was encoded in the intensity of light with an electro-optical Mach–Zehnder modulator (EOM), and the wavelength-dependent delay introduced by a single-mode fiber (SMF) was used to reshape the signals at different wavelengths. The convolution was performed at the speed of light by summing the powers at preassigned wavelengths after spectral shaping. These works suggest that photonics is coming of age and, in some cases, can begin to outperform electronic computation.
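The wavelength-and-time interleaving scheme can be understood with a toy model: each comb line carries the same data stream, scaled by one kernel weight and delayed by one sample, and the photodetector sums the powers, producing a sliding dot product. The sketch below is an idealized illustration (unit sample delays, no dispersion penalty), not a model of the actual experiment:

```python
import numpy as np

x = np.random.rand(32)          # input data stream (modulated intensity)
w = np.array([0.2, 0.5, 0.3])   # kernel encoded in the comb-line powers

summed = np.zeros(len(x) + len(w) - 1)
for k, wk in enumerate(w):
    # comb line k: weighted copy of the stream, delayed by k samples
    delayed = np.concatenate([np.zeros(k), wk * x, np.zeros(len(w) - 1 - k)])
    summed += delayed           # incoherent power summation at the photodetector

assert np.allclose(summed, np.convolve(x, w))   # the detector output is a convolution
```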

Fig. 13: WDM-MVMs for large-scale parallel computing in ONNs.
figure 13

a Parallel convolutional processor chip90. b Photonic convolutional accelerator16. a Reprinted from ref. 90 with permission from Springer Nature: Nature. b Reprinted from ref. 16 with permission from Springer Nature: Nature

Regarding neural networks, all three MVM methods can be used in the linear part of a network to achieve photonic acceleration. Among them, MPLC-based ONNs have the most powerful computing capacity and can solve classification problems with all-optical methods, but the refresh rate of the spatial planes is limited. MZI-based ONNs are reconfigurable for different tasks, but their scale is limited, and electronics-aided learning is needed for complex tasks. To date, WDM-based ONNs have reached larger scales than MZI-based ONNs, but they perform incoherent computations, in which differential detection is often carried out in tandem. Table 2 summarizes the performance comparison of state-of-the-art photonic AI accelerators with electronic hardware. In general, photonic computing has clear advantages in terms of signal rate, latency, power consumption and computing density, whereas its accuracy is generally lower than that of electrical computing.

Table 2 Comparison of different recently demonstrated photonic AI accelerators with electronic hardware

Discussion

Scalability and cascadability of ONNs

There is a huge gap between the number of weights of ANNs realized in electrical and in optical MVMs; for example, the weight parameters of ResNet-50, a popular and widely used deep learning architecture presented by Microsoft in 2016, already number 25 million130. To alleviate this issue, one direct and effective solution is to manufacture larger-scale photonic integrated circuit (PIC) chips. Indeed, Lightmatter Inc. released the record 64 × 64 MZI mesh integrated chip ‘Mars’ in 2020, which can perform 4096 MAC operations each time a new set of input vectors is fed in, corresponding to an estimated computing capacity of 8 TOPS95. Similar to electronic integrated circuits, PIC chips offer the potential for larger scale and higher integration density as manufacturing technologies improve. Furthermore, optical devices promise massive parallelism through WDM and mode division multiplexing (MDM)17,90, and these parallel operations can be performed in a single physical optical processing core90.

The scale-out issue can also be addressed by optimizing and improving optical components. For example, the number of neurons can be further expanded by utilizing spectrum reuse strategies in the WDM scheme82, and topologies such as neuron clusters, small-world neural networks, and interconnected SNN PICs have been proposed to build larger numbers of on-chip photonic neurons28. As MRR arrays become larger, the control technique becomes paramount; integrated photoconductive heaters enable control of large-scale silicon photonic MRR arrays without requiring additional components, complex tuning algorithms, or additional electrical I/Os131. Electro-optic modulators using lithium niobate or barium titanate integrated with silicon photonics offer high-speed phase modulation and low operating voltages, making these devices very attractive for PICs designed for photonic computing132. The maturity of state-of-the-art silicon nitride platforms has enabled low-loss waveguides (<1 dB/m), thereby reducing energy consumption and cost compared with current digital electronics, and has provided opportunities for the practical application of photonic accelerators alongside SOI and III–V PICs, especially as computation bandwidths and modulation rates continue to increase rapidly133. Challenges arise in scaling to larger matrices, since the phase shifters in the MZI mesh scheme typically consume 10 mW to 20 mW per unit for thermal tuning134, and the accumulated thermal power consumption of thousands of phase shifter units would undermine the competitiveness of a photonic accelerator. Nano-opto-electro-mechanical system (NOEMS) technology can be applied to replace traditional thermal phase shifters and reduce the power consumed in maintaining the state of the MZIs135. Compared to thermal phase shifters, the static power dissipation of NOEMS components is nearly zero because the mechanical displacements require only a small amount of energy to move the waveguide back and forth.
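To quantify why near-zero static power matters, a rough, illustrative estimate can be made (assuming a Clements-type \(N\times N\) mesh with two thermally tuned phase shifters per MZI, a common but not universal configuration, and the 10 mW to 20 mW per shifter quoted above):

\[
N_{\mathrm{MZI}}=\frac{N(N-1)}{2}\Big|_{N=64}=2016,\qquad
P_{\mathrm{static}}\approx 2\times 2016\times(10\ \mathrm{mW}\ \text{to}\ 20\ \mathrm{mW})\approx 40\ \mathrm{W}\ \text{to}\ 80\ \mathrm{W}.
\]

Such tens-of-watts static budgets at only moderate scale illustrate why low-static-power phase shifters become attractive as meshes grow.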

To form a scalable neural network, an optical neuron must be able to evoke at least an equivalent response in a downstream neuron82. To construct a cascadable neuron, the first step is to use an active amplifier, which provides energy gain in the optical or electrical domain136. The second step is to improve the efficiency of the optoelectronic devices, which can be achieved by enhancing the interaction between the active materials and the propagating waveguide mode (i.e., the light-matter interaction) with nanoscale devices and novel materials137,138,139,140. Hybrid integration technology is also important for combining low-loss passive silicon or silicon nitride waveguides with active amplifiers and lasers141,142,143. These promising technologies pave the way for cascadable photonic neurons.

Activation functions

MVMs and activation functions are the two basic elements of perceptrons94. Photonic MVMs show significant advantages in signal rate, latency, computing density and power consumption compared with electrical implementations, whereas photonic activation functions are still immature. The implementation of photonic neurons relies on the nonlinear response of optical devices. Based on the physical representation of signals inside a neuron, the techniques are divided into two primary categories: optical-electrical-optical (OEO) and all-optical activation functions. OEO neurons convert optical power into an electrical current and then back into the optical signal pathway. Their nonlinearities manifest in the electrical domain as well as during the EO conversion step, in which lasers144,145,146 or the saturation of modulators147,148 is exploited. Using foundry-compatible silicon-on-insulator (SOI) technology, OEO neurons were demonstrated by Tait et al. using a high-speed silicon MRR modulator147 and by Williamson et al. with a Mach–Zehnder-type modulator149. All-optical neurons depend on semiconductor carriers, reverse saturable absorption, or optical susceptibility, which can be found in a variety of materials150. All-optical neuron implementations are thought to be faster than OEO techniques and have been demonstrated using optical nonlinearities such as the carrier effect in MRRs151,152,153 and the alteration of a material state89,154. Generally, for different AI applications, the activation function needs to be chosen according to the particular task. Because optical nonlinearities are weak, resonant devices have been used to reduce the threshold and simultaneously enhance the phase sensitivity89,152. Huang et al. proposed using multiple coupled-cavity devices to optimize different activation functions for different machine-learning tasks152, followed by an experimental demonstration153. Microring resonators with PCMs have also been demonstrated as effective all-optical activation functions89. Active optical devices are likewise promising candidates for activation functions144,155,156,157. A reconfigurable photonic activation function was demonstrated using injection-locked Fabry–Perot semiconductor lasers155, and neuron-like excitable behavior in a micropillar laser with a saturable absorber was experimentally demonstrated by introducing optical perturbations144. Vertical-cavity surface-emitting lasers with embedded saturable absorbers have been employed as spiking neurons156,157, and semiconductor optical amplifiers have also been demonstrated as all-optical activation functions158,159,160,161.

Optoelectronic-hybrid AI

The activation function can be realized by either electronic or photonic methods. The optical activation function is still at a preliminary research stage, and no mature scheme exists because the efficiency of optical nonlinearities is rather low. Realizing an all-optical activation function with low loss and a strong nonlinear effect remains a key issue for fully optical networks. On the other hand, all-optical cascaded ONNs remain difficult to achieve because of the accumulated loss of optical networks. In fact, only ANNs with quite simple structures or without activation functions have been fully optical, such as SNNs with PCMs89, reservoir computing using optical amplifiers or passive silicon circuits31,162,163, and D2NNs with passive phase masks46,47,54. In contrast, most previous deep ANNs were implemented with optoelectronic-hybrid hardware3,16,18,58,90. Until all-optical ANNs mature, especially regarding optical nonlinearities and optical cascading, optoelectronic-hybrid AI is the more practical and more competitive candidate for deep ANNs. Therefore, the development of a highly efficient, dedicated optoelectronic-hybrid AI hardware chip system is one of the core research routes of photonic AI.

Photonic matrix multiplication has revealed great potential for optical signal processing and AI acceleration and can greatly reduce power consumption and signal delay. In the future, the photonic matrix core is expected to become more comprehensive and cover richer functions. Figure 14 shows a possible route for an optoelectronic-hybrid AI computing chip framework. It mainly contains three layers: the bottom hardware layer, the algorithm layer and the top application layer.

Fig. 14: Optoelectronic-hybrid AI.
figure 14

Schematic diagram of the optoelectronic-hybrid AI computing chip framework.

Electronic computing has become quite mature and has outstanding advantages in data storage and flow control, which photonic computing largely lacks. In contrast, the computing capacity and speed of photonics are superior to those of electronic computing and can be improved by several orders of magnitude23,164. By combining the advantages of electronic and photonic systems, performance in terms of power consumption, computing capacity, computing speed, etc., can be improved by orders of magnitude compared with traditional electronic methods3,16,58. The hardware layer is essentially the photonic AI hardware system built on photoelectric devices. The electronic part of the hardware layer handles data storage, data writing/reading, flow control and lightweight computations. The optical part executes the matrix computation operators, which take up most of the computing tasks58.

The algorithm layer is used to develop universal algorithm frameworks for the photonic AI hardware system, such as linear regression and gradient descent165, or computing models such as feedforward and convolutional neural networks7. These algorithms can be efficiently executed in the physical layer, and different algorithms can be combined with the photoelectric AI hardware depending on the type of problem. For example, linear regression is often used for prediction, and logistic regression is often used for binary classification165. Neural network algorithms are the most widely used machine-learning methods and can significantly improve deep learning on text, images, and voice7. In addition, based on the activation function, various logic computing functions can be developed as the basic units of an optoelectronic-hybrid digital computer166. The algorithm frameworks can draw on the mature AI algorithms of electronic computing but should be adjusted appropriately to account for the hardware differences.

The application layer is a user-oriented interface built on the entire AI hardware system and the algorithm frameworks. Users can develop various applications, such as channel equalization69,103, Google PageRank104, image recognition16,90, and voice recognition3. For example, the linear part of the optical computing core can be used directly for image sharpening and smoothing, as well as for all-optical signal processing (such as channel equalization)167. Neural network algorithms can be employed for image and voice recognition3,16,90. In addition, multiple algorithms can be combined to jointly address optimization and decision problems, such as NP-hard problems and high-speed tracking problems51,74,168. An optical computing system based on digital logic can also be built with all-optical or optoelectronic-hybrid logic computing functions166,169.

In summary, photonic matrix multiplication has been applied in many areas, such as optical signal processing in optical communications and AI acceleration. The numerous promising applications built on matrix multiplication provide a complementary opportunity to expand the domain of photonic accelerators. We have reviewed recent progress in photonic matrix multiplication across various methods and applications and have further discussed a perspective for photonic matrix multiplication, which might evolve into an easy-to-operate minicomputer for different photonic accelerator applications.