Photonic-dispersion neural networks for inverse scattering problems

Inferring the properties of a scattering object by analyzing its optical far-field response within the framework of inverse problems is of great practical significance. However, it still faces major challenges when the parameter range grows and inevitable experimental noise is involved. Here, we propose a solution strategy combining robust neural-network-based algorithms with informative photonic dispersions to overcome these challenges for a class of inverse scattering problems: reconstructing grating profiles. Using two typical neural networks, a forward-mapping type and an inverse-mapping type, we reconstruct grating profiles whose geometric features span hundreds of nanometers with nanometric sensitivity within seconds. A forward-mapping neural network with a parameters-to-point architecture stands out in particular by accurately generating analytical photonic dispersions featuring sharp Fano-shaped spectra. Meanwhile, to implement the strategy experimentally, a Fourier-optics-based angle-resolved imaging spectroscopy with an all-fixed light path is developed to measure the dispersions in a single shot, acquiring adequate information. Our forward-mapping algorithm enables real-time comparisons between robust predictions and experimental data with actual noise, showing an excellent linear correlation (R2 > 0.982) with atomic force microscopy measurements. Our work provides a new strategy for reconstructing grating profiles in inverse scattering problems.


Simulation of ARS-measured dispersion patterns
The data set of dispersion patterns measured by Fourier-optics-based angle-resolved imaging spectroscopy (ARS) was generated with rigorous coupled-wave analysis (RCWA) simulations. With prior knowledge, the analyzed grating profile is modeled as an isosceles trapezoid using four geometric parameters: top line width w1, bottom line width w2, pitch a, and height h. During the simulation, the grating structures were approximately decomposed into several stair-like blocks. To simulate the full angular incidence of near-infrared (NIR) light, we calculated the reflectance for every incident angle and wavelength. When simulating complete dispersion patterns for the inverse-mapping neural network (NN), the incident angle was sampled at 1° intervals up to the maximum incident angle provided by the high numerical aperture of 0.95, and the wavelength was sampled at 3 nm intervals from 1.0 to 1.65 μm. When calculating the data set for the forward-mapping NN with a parameters-to-point architecture, the reflectance of points on dispersion patterns with random geometric parameters and coordinates was simulated and stored in the data set, labeled with the corresponding parameters.
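As a minimal sketch of how such a data set can be assembled (the RCWA solver itself is not shown, and the geometric parameter ranges below are illustrative assumptions, not the paper's values), the sampling grids described above can be written as:

```python
import numpy as np

# Angular and spectral sampling for the complete dispersion patterns:
# 1 deg steps up to the maximum angle set by NA = 0.95, 3 nm steps over the NIR band.
NA = 0.95
theta_max = np.degrees(np.arcsin(NA))                    # maximum incident angle, ~71.8 deg
angles = np.arange(0.0, np.floor(theta_max) + 1.0, 1.0)  # positive angles only (C2 symmetry)
wavelengths = np.arange(1000.0, 1651.0, 3.0)             # nm, 1.0 to ~1.65 um

# Each pattern is labeled with its 4 geometric parameters (w1, w2, a, h).
# The bounds below are placeholders for illustration only.
rng = np.random.default_rng(0)
labels = rng.uniform(low=[100.0, 150.0, 400.0, 100.0],
                     high=[300.0, 350.0, 800.0, 400.0],
                     size=(5, 4))                        # 5 example samples, in nm
```

A full data set would pair each row of `labels` with the RCWA-computed reflectance evaluated on the `angles` × `wavelengths` grid.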
Due to the C2 symmetry of the grating model, only positive incident angles were simulated. Additionally, the total power received by ARS at each angle θ contains contributions from both specular reflection and higher-order diffractions. Using the Laue condition to track every contributing order, the total received power can be written as

I(k) = I_0(k) ∑_{n=⌈(−k_{m,∥} − k_∥)/G⌉}^{⌊(k_{m,∥} − k_∥)/G⌋} R_n(k),   (S1)

where ⌈…⌉ and ⌊…⌋ represent the ceiling and floor operators, k is the wave vector, the subscript ∥ stands for the projection of the wave vector onto the grating plane, G = 2πâ/a is the reciprocal lattice vector of the grating, k_m is the wave vector incident at the maximum incident angle, I_0(k) is the intensity of the incident light, and R_n is the reflectance of the nth-order diffracted light. The reflectance, defined as I(k)/I_0(k), is calculated for every angle and wavelength to form the simulated ARS measurement. The simulation results for different grating parameters are finally collected in a data set labeled with their geometric parameters.

Figure S1: Inverse-mapping NN. a, Architecture of the inverse-mapping NN. The network contains 6 residual blocks and 2 fully connected layers, where each residual block consists of 2 convolutional layers, an in-to-out shortcut connection, and a max-pooling layer. The training loss is shown in the lower inset. b, Statistical histograms of reconstruction results using the inverse-mapping NN. The left column shows the deviations of the reconstruction results on the noise-free test set, the middle column the performance on a noisy test set (Gaussian noise, μ=0, σ=0.1), and the right column the performance of the NN after data set augmentation on the same noisy test set.
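The ceil/floor summation bounds can be checked numerically. The sketch below counts which diffraction orders n satisfy |k_∥ + nG| ≤ k_{m,∥} and therefore contribute to the received power; the example wavelengths, angles, and pitches are illustrative, not values from the paper:

```python
import numpy as np

def collected_orders(wavelength_nm, theta_deg, pitch_nm, NA=0.95):
    """Diffraction orders n whose in-plane wave vector k_par + n*G still
    falls inside the objective's acceptance cone (|k_par + n*G| <= k_m_par).
    These are exactly the ceil/floor bounds of the total-power sum."""
    k = 2 * np.pi / wavelength_nm              # free-space wave vector
    k_par = k * np.sin(np.radians(theta_deg))  # in-plane component of the incidence
    k_m_par = k * NA                           # maximum collectable in-plane wave vector
    G = 2 * np.pi / pitch_nm                   # reciprocal lattice vector magnitude
    n_min = int(np.ceil((-k_m_par - k_par) / G))
    n_max = int(np.floor((k_m_par - k_par) / G))
    return list(range(n_min, n_max + 1))

# 1310 nm light at 30 deg on a 700 nm pitch: only the specular order is collected
orders_a = collected_orders(1310.0, 30.0, 700.0)
# 1000 nm light at 60 deg on a 900 nm pitch: the -1st order also reaches the objective
orders_b = collected_orders(1000.0, 60.0, 900.0)
```

For subwavelength pitches in the NIR band, the sum typically collapses to the specular term alone, consistent with only low orders being propagating.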
A deep convolutional neural network with residual learning is trained to learn the inverse mapping from the data space to the model space for reconstructing grating profiles; its architecture is shown in Fig. S1. The network contains 6 residual blocks and 2 fully connected layers, where each residual block consists of 2 convolutional layers and an in-to-out shortcut connection. Here, the shortcut connections skip the convolutional layers to perform a simple identity mapping, whose outputs are added directly to the outputs of the stacked convolutional layers. The subsequent fully connected layers then map the extracted feature maps into a probability distribution of each geometric parameter as the final output. The network is trained on the data set by minimizing the cost function defined by Eq. S2, where the cross-entropy function characterizes the distance between the predicted probability distribution p of each geometric parameter and the ground-truth distribution δ(x − g), averaged over the whole training set; R_in is the input of the network, z is the output of the network, m is the size of the data set, n is the number of outputs, and q is the number of feature parameters in the model. The support of each probability distribution is not infinite; instead, a prior range is imposed, outside of which the probability is assumed to be zero. The predicted probability distribution is then discretized into q points with 1 nm intervals. Each element of the matrices g and r stands for the label of the training data and the discrete coordinate of the output, respectively.
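A minimal NumPy sketch of this discretized cross-entropy objective follows; the bin count, prior range, and batch shapes are illustrative assumptions (the actual model is the residual CNN described above), but the loss structure — softmax distributions per parameter scored against delta targets on a 1 nm grid — is as in the text:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def discretized_ce_loss(logits, labels_nm, coord_start_nm):
    """Cross entropy between predicted distributions over a discretized
    prior range (1 nm bins) and delta distributions at the ground-truth
    values. logits: (batch, n_params, n_bins); labels_nm: (batch, n_params)."""
    p = softmax(logits)
    idx = np.round(labels_nm - coord_start_nm).astype(int)  # bin index of each label
    batch, n_params, _ = logits.shape
    picked = p[np.arange(batch)[:, None], np.arange(n_params)[None, :], idx]
    return -np.log(picked).mean()

# Toy example: 2 parameters, each discretized into 100 bins starting at 300 nm
rng = np.random.default_rng(1)
logits = rng.normal(size=(4, 2, 100))
labels = rng.uniform(310.0, 390.0, size=(4, 2))
loss = discretized_ce_loss(logits, labels, 300.0)
```

Because the target is a delta distribution, the cross entropy reduces to the negative log-probability of the single correct bin for each parameter.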
The objective of the optimization is to narrow the difference between the prediction and the ground truth by iteratively tuning the parameters θ of the neural network, which can be described as

θ* = arg min_θ L(θ),

where L is the cost function of Eq. S2 and the parameters θ, including the kernels, weights, and biases, are initialized randomly from a normal distribution and optimized for 400 epochs by the Adam optimizer with batch size 1024. The data set contains 60,000 pairs of dispersion patterns for p- and s-polarized incident light. The initial learning rate is 0.001 and decays by a factor of 10 every 100 epochs. Training tricks such as dropout and l2 regularization are applied to the FC layers during training to prevent overfitting, with keep probability 0.8 and α=0.001, respectively. Batch normalization is introduced after each convolutional layer and before the activation to reduce the influence of the learning-rate setting and parameter initialization on training. The inverse-mapping NN performed remarkably well on a noise-free test set with 1,000 pairs of dispersion patterns for p- and s-polarized incident light, as shown in the first column of Fig. S1. However, when we verified its performance on a test set with Gaussian noise (μ=0, σ=0.1), the predictions deviated substantially from the labels, as shown in the middle column. To improve the robustness of the NN, we trained the inverse-mapping NN again on an augmented data set obtained by adding random Gaussian noise to the photonic dispersion patterns. After training, the NN's performance on the same noisy test set improved distinctly, as shown in the right column. This means that the inverse-mapping NN cannot give precise predictions from dispersion patterns with unexpected noise, but its robustness can be enhanced by data set augmentation with the corresponding type of noise. To achieve robust predictions from measured photonic dispersions with actual noise, we introduced several potential types of random noise to simulate actual measurements, including Gaussian noise, low-frequency noise, and Gaussian blur.
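A sketch of this noise augmentation, assuming dispersion patterns stored as 2D reflectance arrays; the amplitudes, periods, and blur widths below are illustrative choices, not the paper's exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def augment(pattern, rng):
    """Apply the three augmentation noise types: white Gaussian noise,
    low-frequency sine-like perturbations, and Gaussian blur."""
    h, w = pattern.shape
    out = pattern + rng.normal(0.0, 0.1, size=pattern.shape)   # Gaussian noise
    # Low-frequency noise: a few (2~5) horizontal/vertical sine-like waves
    for _ in range(rng.integers(2, 6)):
        amp = 0.05 * rng.random()
        period = rng.uniform(50.0, 200.0)
        phase = rng.uniform(0.0, 2 * np.pi)
        if rng.random() < 0.5:
            out += amp * np.sin(2 * np.pi * np.arange(h) / period + phase)[:, None]
        else:
            out += amp * np.sin(2 * np.pi * np.arange(w) / period + phase)[None, :]
    out = gaussian_filter(out, sigma=rng.uniform(0.0, 1.0))    # Gaussian blur
    return np.clip(out, 0.0, 1.0)                              # reflectance stays in [0, 1]

rng = np.random.default_rng(0)
band = rng.random((200, 51))   # stand-in for a simulated dispersion pattern
noisy = augment(band, rng)
```

Applying `augment` on the fly during training, rather than storing noisy copies, lets each epoch see a fresh noise realization of every pattern.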
These types of noise were generated and added to the dispersion bands during the training process.

Figure S2: Performance of the inverse-mapping NN on actual data. After data set augmentation, the inverse-mapping NN was applied to actual data (the data correspond to Fig. 5 of the manuscript). Since the introduced noises cannot cover all experimental noises, for some samples the predictions of the NN deviate considerably from the AFM data.

Figure S3: Generation performance of an NN with a traditional parameters-to-spectrum architecture. a, The NN has 4 hidden layers with 2,000 neurons per layer. Geometric parameters of the SOI grating are input to the NN, and the output is an array of reflectances that is further reshaped into a 2D dispersion pattern with 200×51 pixels. The NN was trained on a data set containing 60,000 dispersion patterns (about 14 GB) for s-polarized incident light and tested on another data set whose examples never participated in the training process. The mean squared error (MSE) was used as the cost function to characterize the difference between the generated patterns and the ground truths. The weight parameters of the NN were trained using an Adam optimizer with batch size 1024 for 500 epochs. The initial learning rate was set to 0.005 and was lowered by a factor of 10 every 100 epochs. b, Training loss. c and d, Comparison between simulated and generated photonic dispersions. Here, around 6×10^7 parameters were used to construct the NN, but its generation performance was unappealing: the NN was able to generate the thin-film-like features but failed on the Fano-shaped features.
We believe that with a more complex NN architecture and more NN parameters, the performance of the NN with a parameters-to-spectrum architecture would improve. However, the enormous number of parameters would then make the NN too cumbersome to use.

Figure S4: Generation performance of the NN with a parameters-to-point architecture. The data correspond to Fig. 2c-d of the manuscript. a, Generation performance for photonic dispersions with p/s-polarized light. b, Detailed comparison between slices of the generated and simulated dispersion patterns.

Figure S5: Visualization of the hybrid optimization algorithm. The proposed hybrid optimization algorithm combines a gradient descent algorithm with a greedy search algorithm; minimizing a cost function is its objective. Many cost functions are possible, and we choose the MSE for the reconstruction task. a, Several initial points (here 10) are chosen randomly in parameter space as the starting points. The corresponding dispersion patterns of these points are then generated by the forward-mapping NN, and the cost function between the generated and detected dispersions is obtained. Using the back-propagation algorithm, we calculate the gradients of the parameters for the gradient descent algorithm. After iterating steps b and c 70 times, the iteration is stopped prematurely, and the point with the minimum cost-function value is chosen as the candidate solution. d, The greedy search algorithm starts from the candidate solution to find the final solution.

Generation performance of NN with a parameters-to-point architecture
Table S1: Pseudocode of the optimization process. In the first step, the gradient descent algorithm starts from 10 initial points to approach the global minimum with step length α=0.001 and exponential decay coefficient β=0.99. After convergence, the parameters with the minimum cost function are selected as the candidate solution. In the second step, a greedy search starts from the candidate solution: we scan the pitch-height space and the w1-w2 space alternately to obtain the reconstruction results. Because the search range of our algorithm is an expanded space of radius 5 nm centered at the candidate point, the search process is very efficient and always converges within 2~3 iterations.
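The two-step process of Table S1 can be sketched on a toy quadratic cost standing in for the NN-generated-versus-measured MSE. The target parameters, search bounds, and finite-difference gradients below are stand-ins; in the real pipeline the gradients come from back propagation through the forward-mapping NN:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([220.0, 310.0, 700.0, 260.0])  # hypothetical (w1, w2, a, h), nm

def cost(p):
    # Stand-in for the MSE between generated and measured dispersions.
    return np.mean((p - target) ** 2)

def grad(p, eps=1e-3):
    # Central finite differences stand in for the NN's back-propagated gradients.
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (cost(p + d) - cost(p - d)) / (2 * eps)
    return g

# Step 1: gradient descent from 10 random starts, alpha=0.001, decay beta=0.99,
# stopped after 70 iterations as in Table S1.
alpha, beta = 0.001, 0.99
starts = rng.uniform([100, 200, 500, 100], [300, 400, 900, 400], size=(10, 4))
finals = []
for p in starts:
    lr = alpha
    for _ in range(70):
        p = p - lr * grad(p)
        lr *= beta
    finals.append(p)
candidate = min(finals, key=cost)

# Step 2: greedy search alternating pitch-height and w1-w2 scans,
# each over a radius-5 nm neighborhood of the current candidate.
for _ in range(3):
    for dims in ((2, 3), (0, 1)):
        best = candidate.copy()
        for da in np.arange(-5.0, 5.5, 0.5):
            for db in np.arange(-5.0, 5.5, 0.5):
                trial = candidate.copy()
                trial[dims[0]] += da
                trial[dims[1]] += db
                if cost(trial) < cost(best):
                    best = trial
        candidate = best
```

On the real, non-convex MSE landscape the multiple starts guard against poor local minima, and the greedy scan refines the candidate along the strongly coupled parameter pairs.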

Fig. S6: Statistical histograms of reconstruction results using the forward-mapping algorithm. a, Deviations on noise-free dispersions. b, Same as a but with Gaussian noise (μ=0, σ=0.2).

Figure S7: Statistical results of reconstruction from bands with additional noise (peak-like noise, Gaussian blur, and low-frequency noise). Peak-like noise is formed by generating a random number (2~5) of Gaussian peaks (A∈[-0.1, 0.1], σ∈[10, 30] pixels) at random locations on the simulated bands. Gaussian blur is performed by convolution with a 5×5-pixel Gaussian kernel. Low-frequency noise is an integral perturbation composed of a random number (2~5) of horizontal and vertical sine-like noise components.

Figure S8: Noise influence on the Fano region and the non-Fano region. a, Photonic bands with s- and p-polarization are divided into a Fano region (gray block) and a non-Fano region (yellow block). Identical noises were generated in the Fano and non-Fano regions, respectively, to compare their influence on the reconstruction results. b, Five types of noise are considered: white noise, low-frequency noise, peak-like noise, Gaussian blur, and biased Gaussian blur. c, Slices of the photonic bands. d, Reconstruction results. It is instructive to compare the influence of Gaussian blur and biased Gaussian blur. Large parameter deviations only occur when the Fano region is convolved with a biased Gaussian kernel. For a Gaussian kernel, the convolution only smooths the peak without changing its position, showing that the Fano-shaped dispersion is robust to perturbations of the peak amplitude. A biased Gaussian kernel, however, also shifts the peak position, which leads to a large variation in the MSE. Note that the locations of these peaks are determined by our well-calibrated spectrometer; a peak-position shift should therefore be viewed as a measurement error rather than noise.

Figure S9: Reconstruction results of experimental data. The data correspond to Fig. 5 of the manuscript. a, Comparisons of photonic dispersions measured by ARS, generated by the forward-mapping NN using the optimal parameters, and simulated by RCWA using the AFM-measured parameters. b, Further comparisons of slices.

Figure S10: Repeatability test of the optimization algorithm. Statistical results of reconstruction from the measured photonic band shown in Fig. 5(a) with 500 random initial values. The algorithm finds the same solution every time, with no more than 0.1 nm deviation.

Parameter space evolution with different acceptance angles
Figure S11: Parameter space evolution with different acceptance angles. a, The horizontal axis stands for the acceptance-angle range, and the vertical axis for the log MSE between two simulated photonic bands with a 10 nm variation in the geometric parameters. b, Parameter-space visualization for different acceptance-angle ranges; the z-axis of the parameter space is the log MSE. As a wider angle range is considered, the difference between the two photonic bands increases significantly. The minimal parameter variation between two neighboring points is 0.5 nm.

Figure S12: Parameter separation. a, MSE distribution on the pitch-height space and the w1-w2 space. b, By changing the azimuth from 0° to 20° and increasing the acceptance angle to 85°, the narrow axis of the canyon is gradually widened, which improves the convergence behavior of the algorithm from every direction. FmNN stands for forward-mapping NN, ImNN for inverse-mapping NN. Gray lines are simulated dispersion slices using the prediction results of the inverse-mapping NN; green lines are generated dispersion slices after fine tuning. Photonic bands were input into the algorithm to reconstruct the grating's profile, and iso-frequency contours reflect the symmetry of the 2D grating. c, Reconstruction results of the 2D grating.