Space-efficient optical computing with an integrated chip diffractive neural network

Large-scale, highly integrated and low-power-consuming hardware is becoming progressively more important for realizing optical neural networks (ONNs) capable of advanced optical computing. Traditional experimental implementations need N² units such as Mach-Zehnder interferometers (MZIs) for an input dimension N to realize typical computing operations (convolutions and matrix multiplication), resulting in limited scalability and excessive power consumption. Here, we propose the integrated diffractive optical network for implementing parallel Fourier transforms, convolution operations and application-specific optical computing using two ultracompact diffractive cells (Fourier transform operation) and only N MZIs. The footprint and energy consumption scale linearly with the input data dimension, instead of the quadratic scaling of the traditional ONN framework. A ~10-fold reduction in both footprint and energy consumption, together with accuracy as high as that of previous MZI-based ONNs, was experimentally achieved for computations performed on the MNIST and Fashion-MNIST datasets. The integrated diffractive optical network (IDNN) chip demonstrates a promising avenue towards scalable, low-power optical computing chips for optical artificial intelligence.


Supplementary Note 1: Advantages of our integrated chip in footprint, power consumption and scalability.
The physical footprint of the IDNN depends on the size of the MZI modulators and of the diffraction regions. Neglecting the electrical control lines for tuning each MZI modulator, the total footprint of the IDNN can be estimated as

S_total = N·S_MZI + 2·S_DIFF,

where S_MZI is the area of a single MZI element, N is the input data dimension and S_DIFF is the area of a single diffractive region.
Assuming a 10 × 10 network, a conservative MZI length of 100 μm and a diffraction region of around 0.04 mm², we estimate that the entire chip would occupy 0.53 mm², while for the MZI-based OIU [1-3] the area of the whole chip is calculated to be around 5 mm².
Hence our proposed IDNN chip shows superior area efficiency when scaling the network dimension. Neural networks routinely have hundreds of millions of neurons with a huge weight matrix. It has been established that the original weight matrix can be partitioned into blocks of square circulant matrices, allowing the traditional fully connected weight matrix to be replaced with a sequence of smaller circulant matrices without sacrificing the effectiveness of the network [4]. Therefore, our IDNN chip can potentially realize hundreds of millions of neurons optically in a scalable and power-efficient manner.
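The circulant trick can be sketched numerically: a circulant matrix is diagonalized by the DFT, so a full matrix-vector product collapses to an elementwise product between spectra, exactly the operation pattern the IDNN provides. A minimal numpy check (the 8 × 8 size is arbitrary):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by vector x,
    using the diagonalization C = F^H diag(F c) F."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Check against an explicitly constructed circulant matrix.
rng = np.random.default_rng(0)
c = rng.standard_normal(8)
x = rng.standard_normal(8)
C = np.array([[c[(i - j) % 8] for j in range(8)] for i in range(8)])
assert np.allclose(circulant_matvec(c, x), C @ x)
```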
In addition, in our experiment, the resistance of each heater is about 350 Ω, and the average electrical power required for a 2π phase shift is 0.77 mW (2.2 mA). In the IDNN chip, only the MZI modulators need to be driven via heaters, and the number of employed modulators is N. As a result, the power consumption of our IDNN chip is significantly reduced compared with conventional on-chip ONNs, which typically need N² MZI modulators [1-3]. In Fig. S1a, we depict the footprint and power consumption as functions of the mode number in a traditional MZI-based network; both increase quadratically with the input data dimension. In our experiment, we designed a network with 10 modes, which has a hardware footprint of only 0.53 mm² and a power consumption of 17.5 mW, less than the traditional MZI-based design. Extended to 64 modes, our design needs only a 3.2 mm² area and 112 mW of power, compared with 213 mm² and 122.88 W for the traditional design. This means that our integrated photonic circuits can achieve higher densities of optical neurons and pave the way for super-large-scale programmable photonic neural networks.
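The quoted totals can be reproduced with a simple scaling model. The per-element figures below are back-calculated from the numbers in this note (0.045 mm² and 1.75 mW per IDNN modulator; 0.052 mm² and 30 mW per MZI in the N²-MZI mesh) and are therefore assumptions for illustration, not measured values; the model slightly underestimates the quoted 3.2 mm² at 64 modes because routing overhead is ignored.

```python
# Per-element figures back-calculated from the totals quoted above (assumptions):
S_MZI, S_DIFF = 0.045, 0.04            # mm^2 per MZI (incl. routing), per diffractive cell
P_MZI = 1.75                           # mW average drive power per IDNN modulator
S_MZI_TRAD, P_MZI_TRAD = 0.052, 30.0   # mm^2 and mW per MZI in an N^2-MZI mesh

def idnn(n):
    """Footprint (mm^2) and power (mW): N modulators plus two diffractive cells."""
    return n * S_MZI + 2 * S_DIFF, n * P_MZI

def trad(n):
    """Footprint (mm^2) and power (mW) of a conventional N^2-MZI mesh."""
    return n**2 * S_MZI_TRAD, n**2 * P_MZI_TRAD

area10, power10 = idnn(10)   # ~0.53 mm^2, 17.5 mW (linear scaling)
area64, power64 = idnn(64)   # ~3 mm^2, 112 mW, vs. 213 mm^2 / 122.88 W for trad(64)
```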
Supplementary Note 2: Loss analysis.
The level of loss sustained in a photonic matrix implementation limits its scalability.
This loss mainly arises from three parts: (a) the propagation loss in waveguides; (b) the photoelectric conversion loss in photodetectors; and (c) the extra electrical loss required to maintain the phases of the heaters. The propagation loss has two main contributions: waveguide scattering loss and loss arising from the diffractive cell. The measured waveguide propagation loss in our experiment is as low as 1.5 dB cm⁻¹ and the multi-mode interferometer (MMI) loss is 0.2 dB. Based on the principle of the overlap integral [5], the power received by the array waveguides is calculated to be 78% of the input power. A further 1 dB of the input power is lost to the mode mismatch between the slab and output array waveguides, which can be reduced by introducing mode converters between them. Fig. S1b shows the simulated optical loss for matrix sizes up to 64 × 64 using the experimental loss data, illustrating the strong scalability of our design.
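A rough loss budget using the figures above can be sketched as follows; the assumption that routing length grows linearly with the mode number (0.1 mm per mode) is a guess for illustration, not the design rule behind Fig. S1b.

```python
import numpy as np

# Loss figures quoted above; the length scaling with n is an assumption.
WG_LOSS_DB_PER_CM = 1.5
MMI_LOSS_DB = 0.2
DIFF_COUPLING = 0.78      # fraction of power captured by the array waveguides
MISMATCH_DB = 1.0         # slab-to-array mode mismatch

def total_loss_db(n, mm_per_mode=0.1):
    """Rough on-chip insertion loss (dB) for an n x n IDNN.
    Assumes routing length grows linearly with n (mm_per_mode is a guess)."""
    wg = WG_LOSS_DB_PER_CM * (n * mm_per_mode / 10.0)   # mm -> cm
    diff = 2 * (-10 * np.log10(DIFF_COUPLING))          # two diffractive cells
    return wg + MMI_LOSS_DB + diff + MISMATCH_DB
```

Each diffractive cell contributes about −10·log10(0.78) ≈ 1.08 dB, so the cell loss is fixed while only the waveguide term grows with the matrix size.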

Supplementary Note 3: Wave analysis.
The propagation of light in the slab waveguide region can be described by the following Fourier transform:

E(k₁) = F{E_in(x)} = ∫ E_in(x) exp(−ik₁x) dx,

where E_in(x) is the field at the entrance of the slab region. Based on the principle of superposition integration [6], we further calculate the coupling coefficient of each array waveguide with the incident light field E(x) imaged on the receiving surface:

A_l = ∫ E(x) E_a*(x − l·d_a) dx,

where A_l is the coupling coefficient of the l-th array waveguide, E_a is the distribution of the eigenmode at the entrance of the array waveguide, and d_a is the array waveguide spacing.
The output field distribution of the array waveguides can be expressed as

E_o(l) = A_l exp(ilΔφ_a),

where Δφ_a is the fixed phase difference of light propagating between adjacent array waveguides and E_o is the output field distribution of the array waveguides. This output field then enters the second slab waveguide region. The simulation of this region is the same as that of the first one, using the Fourier transform to image the output light field distribution E_o of the array waveguides onto the receiving surface of the output waveguides:

E_out² = |∫ F{E_o}(x) E₀*(x − n·d_a) dx|²,    (S5)

where E₀ is the eigenmode of the output waveguide, F{E_o} is the field distribution imaged onto the output array, and E_out² is the energy received by the output waveguide.
Following the diffraction equation (Eq. (S5)), we can simulate the computing results when light passes through our IDNN. Assuming light is input from the centre waveguide, the simulated electric field distribution from the waveguides along the k₁ axis is shown in Fig. S3c, which is a Gaussian distribution. These electric field distributions, with an added phase difference among the array waveguides, propagate through the second slab waveguide region, and the field distributions along the k₂ axis are depicted in Fig. S3d. One notices that the signal is retrieved after passing through the two diffractive cells (ODFT and OIDFT operations).
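The ODFT-OIDFT retrieval can be checked with an idealized unitary model (the real device additionally carries the Gaussian mode envelopes shown in Figs. S3c and d):

```python
import numpy as np

# Minimal numerical analogue of Fig. S3: light entering from the centre
# waveguide is transformed by the first diffractive cell (ODFT) and the
# second cell (OIDFT) retrieves the original signal.
x = np.zeros(10, dtype=complex)
x[5] = 1.0                            # input from the centre waveguide

k1 = np.fft.fft(x) / np.sqrt(10)      # field after the first cell (unitary DFT)
out = np.fft.ifft(k1) * np.sqrt(10)   # field after the second cell (inverse DFT)

assert np.allclose(out, x)            # the signal is retrieved
```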
We find that the core of the DFT operation is to maintain a suitable phase difference among the incident signals from the different waveguides before the diffractive cell. As shown in Figs. S4c and d, different amplitude distributions are produced if the input signals have different phase differences. The phase difference among the waveguides can be calculated from the number of waveguides [6]. The phase shift φ_{n−k} between input k and output n can be designed to satisfy the following relation by setting the positions of the input and output waveguides:

φ_{n−k} = (π/N)(n − k)².    (S7)

We add the following phase offset φ_k before input k, using a length adjustment or a phase shifter:

φ_k = −(π/N)k².    (S8)

Then the phase difference Δφ between the two paths to output n originating from inputs k+1 and k is derived as

Δφ = φ_{n−(k+1)} + φ_{k+1} − φ_{n−k} − φ_k = −2πn/N.    (S9)

Eq. (S9) indicates that the ODFT operation can be realized with the diffractive cell.

Fig. S4 | c, Amplitude of a wave propagating within the diffraction cell with a stable phase difference Δφ between two input paths; the output is the Fourier transform of the input signal. d, Amplitude of a wave propagating within the diffraction cell without the phase difference Δφ between two input paths; the output differs from the Fourier transform of the input signal.
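One consistent choice of per-path phases that yields the DFT kernel is a quadratic geometric phase π(n − k)²/N from the cell plus an input offset −πk²/N; this is an illustrative assumption matching the derivation above, and up to a common per-output phase it reproduces the standard DFT exactly:

```python
import numpy as np

def diffractive_dft(x):
    """DFT assembled from per-path phases: a quadratic cell phase
    pi*(n-k)^2/N plus an input offset -pi*k^2/N gives the kernel
    exp(-2i*pi*n*k/N) up to a common per-output phase exp(i*pi*n^2/N)."""
    N = len(x)
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    phase = np.pi * (n - k) ** 2 / N - np.pi * k ** 2 / N
    y = (np.exp(1j * phase) * x).sum(axis=1)
    return y * np.exp(-1j * np.pi * n.ravel() ** 2 / N)  # strip common output phase

rng = np.random.default_rng(1)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
assert np.allclose(diffractive_dft(x), np.fft.fft(x))
```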

Supplementary Note 4: Experimental calibration.
The chip calibration is composed of two parts: the calibration of the inner phases and of the outer phases. The transfer matrix of the MZI can be written as

T = ie^{iθ/2} [ e^{iφ}sin(θ/2)    cos(θ/2)
                e^{iφ}cos(θ/2)   −sin(θ/2) ],    (S10)

where θ is the angle of the inner phase shifter and φ is the angle of the outer phase shifter.
We first calibrate θ for T1 to T20 (Fig. S5c). Each phase shifter is controlled by a heater; when the power is switched on, the heater induces a refractive-index change which causes a relative phase difference between the two arms.
Here the calibration of θ is done by increasing the power supplied to the phase shifter while measuring the optical power at the corresponding output port. The power-phase relation can be represented as θ(I) = αI² + θ₀, where I is the current supplied to the heater, α is a constant related to the resistance and the material properties, and θ₀ is the initial phase difference between the two arms. The normalized output voltage can then be written as

V = V₀ + kV₀ cos²((αI² + θ₀)/2),

where k is the visibility of the fitting curve and V₀ is the maximum amplitude of the signal. The measured interference pattern in Fig. S6a shows the periodic change with increasing I².
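A calibration sweep of this kind can be fitted numerically; all parameter values below (α, θ₀, V₀, k) are made up for illustration and do not correspond to a measured device.

```python
import numpy as np
from scipy.optimize import curve_fit

# V(I) = V0 + k*V0*cos^2((alpha*I^2 + theta0)/2): the power-phase model above.
def model(I, alpha, theta0, V0, k):
    return V0 + k * V0 * np.cos(0.5 * (alpha * I**2 + theta0)) ** 2

# Synthetic sweep with made-up parameters and a little detector noise.
true = dict(alpha=1.3, theta0=0.4, V0=1.0, k=0.9)
I = np.linspace(0.0, 3.0, 200)
V = model(I, **true) + 0.005 * np.random.default_rng(2).standard_normal(I.size)

# Initial guess near the expected values keeps the fit on the right branch
# (the cos^2 model is periodic in theta0 and even in alpha).
popt, _ = curve_fit(model, I, V, p0=[1.25, 0.3, 1.0, 0.85])
alpha_fit, theta0_fit = popt[0], popt[1]
```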
We calibrate the outer phases φ for P1 to P10 by tuning until the interference intensity detected at Port0 is maximal. Based on the DFT relation (Eq. (S9)), the signal intensity at Port0 is the sum of the light intensities from all input ports (for output n = 0, Δφ = 0, so all channels interfere constructively). The relative phase of each channel can then be calibrated to achieve the ODFT operation. One notices that an additional phase compensation of θ/2 should be added to the phase term φ if the inner phase shifter changes θ, according to Eq. (S10). Therefore, the initial phase is a baseline: if θ changes, φ is compensated accordingly to maintain the accuracy of the calculation. The outer phases φ for P11 to P20 can be calibrated in the same way.

Supplementary Note 5: Mathematical elucidation of correlation algorithms
In the first task, pattern recognition of 1D sequences and 2D digit images, the goal is to obtain the similarity between two sequences or images. The cross-correlation, a commonly used metric of the similarity between two series, is defined as

(f ⋆ g)[n] = Σ_j f*[j] g[j + n],

where f and g are the two series and n and j are the position indices of the discrete sequences.
The mathematical calculation of the correlation is the same as that of the convolution, except that the signal is not reversed before the multiplication. The relation between correlation and convolution can therefore be expressed as

f ⋆ g = f*(−n) ∗ g,  so that  F{f ⋆ g} = F{f}* ⊙ F{g}.

Finally, the correlation is converted into a Hadamard product between the Fourier transforms of the input series f and the target series g. Since the Fourier transform converts real numbers to complex numbers, the Hadamard product in our operation is over complex numbers. In our chip, the complex numbers are conveniently encoded into amplitude and phase components with the MZI modulators, instead of real and imaginary parts.
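The correlation theorem used here is easy to verify numerically (circular correlation of two arbitrary real series):

```python
import numpy as np

def xcorr_fft(f, g):
    """Circular cross-correlation via the Hadamard product of spectra:
    corr = IFFT( conj(FFT(f)) * FFT(g) )."""
    return np.fft.ifft(np.conj(np.fft.fft(f)) * np.fft.fft(g))

def xcorr_direct(f, g):
    """Direct circular cross-correlation: corr[n] = sum_j conj(f[j]) g[j+n]."""
    N = len(f)
    return np.array([sum(np.conj(f[j]) * g[(j + n) % N] for j in range(N))
                     for n in range(N)])

rng = np.random.default_rng(3)
f, g = rng.standard_normal(16), rng.standard_normal(16)
assert np.allclose(xcorr_fft(f, g), xcorr_direct(f, g))
```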
According to the transfer function (Eq. (S10)), the output field after the MZI modulator can be expressed as

E_out = ie^{i(φ + θ/2)} sin(θ/2) E_in,

where the amplitude component is encoded into sin(θ/2) and the phase component into φ + θ/2. As a result, the correlation can be achieved by modulating the MZIs in both phase and amplitude to evaluate the similarity between two series.
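The amplitude/phase encoding can be sketched as follows; mzi_output implements the single-port transfer written above, and encode inverts it for a target complex value (both function names are ours for illustration, not part of the chip control software):

```python
import numpy as np

def mzi_output(theta, phi, E_in=1.0):
    """Upper-port output field of an MZI with inner phase theta and outer
    phase phi, following the transfer-matrix convention used above."""
    return 1j * np.exp(1j * theta / 2) * np.exp(1j * phi) * np.sin(theta / 2) * E_in

def encode(c):
    """Choose (theta, phi) so the MZI output equals complex value c (|c| <= 1)."""
    theta = 2 * np.arcsin(np.abs(c))            # amplitude -> sin(theta/2)
    phi = np.angle(c) - theta / 2 - np.pi / 2   # subtract the theta/2 and i offsets
    return theta, phi

c = 0.6 * np.exp(1j * 1.1)
theta, phi = encode(c)
assert np.allclose(mzi_output(theta, phi), c)
```

Note how the θ/2 compensation discussed in the calibration section appears explicitly in the phase setting.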

Supplementary Note 6: Image recognition
In the process of sequence recognition, we note that if the overall intensity of the sequence under test is relatively high, the strongest light may also be detected at the feature point (the maximum-intensity position). When the sequence under test is longer than the target sequence, sequences with overall strong intensity are more likely to appear.
To solve this problem, we further normalize the detected data by considering the intensities of both the sample and the target sequences. For 2D image recognition, we use 5 × 5 binarized matrices to represent digits. The correlation results for the digits (1, 4, 9) are shown in Fig. S9. Because the digits 4 and 9 are highly similar, their correlation result has a large peak at the centre of the matrix.
We further discuss the application of Fourier-based convolution in CNNs.
For a classical electronic network, implementing convolution in Fourier space offers no distinct acceleration for the small kernel sizes often used in CNNs. Fourier-based convolution has three main computing parts: the Fourier transform, the element-wise multiplication, and the inverse Fourier transform. In the optical domain, since the Fourier transform and the inverse Fourier transform can be implemented passively without resource consumption, Fourier-based convolution can still provide computational acceleration even for the small kernel sizes used in CNNs.
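The equivalence underlying this argument, small-kernel convolution as a pointwise (Hadamard) product in k-space, can be checked directly (the sizes below are arbitrary):

```python
import numpy as np
from numpy.fft import fft2, ifft2

def conv2d_fft(image, kernel):
    """2D circular convolution via a pointwise product in k-space. On chip,
    the two transforms would be performed passively; only the elementwise
    product needs active modulation."""
    K = np.zeros_like(image)
    kh, kw = kernel.shape
    K[:kh, :kw] = kernel                 # zero-pad the small kernel
    return np.real(ifft2(fft2(image) * fft2(K)))

rng = np.random.default_rng(4)
img = rng.standard_normal((8, 8))
ker = rng.standard_normal((3, 3))

# Compare with direct circular convolution at one output position.
direct = sum(ker[a, b] * img[(5 - a) % 8, (5 - b) % 8]
             for a in range(3) for b in range(3))
assert np.isclose(conv2d_fft(img, ker)[5, 5], direct)
```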

Supplementary Note 7: Iris flower
Combined with Eq. (S10), for amplitude modulation the additional phase compensation of θ/2 must be added to the phase term φ, whereas with phase-only modulation there is only one parameter, φ, that needs to be modulated. It is also meaningful if the classification can be achieved by utilizing only the phase channel of the modulators, because this is easier to realize in an experiment: only half of the heaters need to be driven.

Supplementary Note 8: MNIST dataset
The MNIST dataset, which includes 10 classes (digits 0 to 9), contains 60,000 images for training and 10,000 images for testing. The image data are first compressed to 16 inputs by converting the 28 × 28 grayscale images to k-space and extracting the low-frequency information.
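The k-space compression step can be sketched as below; the exact set of 16 low-frequency coefficients retained on chip is not specified here, so the centred 4 × 4 block is an assumption.

```python
import numpy as np

def compress_kspace(img28, n_keep=16):
    """Compress a 28x28 image to n_keep complex inputs by keeping the
    lowest-frequency 2D-FFT coefficients (a centred 4x4 block here; the
    exact selection used on chip is an assumption)."""
    k = np.fft.fftshift(np.fft.fft2(img28))   # DC moved to index (14, 14)
    side = int(np.sqrt(n_keep))
    lo, hi = 14 - side // 2, 14 + (side + 1) // 2
    low = k[lo:hi, lo:hi]
    # Complex values map onto the amplitude/phase settings of 16 modulators.
    return low.ravel()

x = compress_kspace(np.random.default_rng(5).random((28, 28)))
assert x.shape == (16,)
```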

Supplementary Note 9: Fashion-MNIST dataset
In this part, we test the classification performance of the IDNN framework with a more complicated image dataset, the Fashion-MNIST dataset, which includes 10 classes with each class containing 7,000 images (6,000 for training and 1,000 for testing).