Data-driven acceleration of photonic simulations

Designing modern photonic devices often involves traversing a large parameter space via an optimization procedure, gradient based or otherwise, and typically results in the designer performing electromagnetic simulations of a large number of correlated devices. In this paper, we investigate the possibility of accelerating electromagnetic simulations using the data collected from such correlated simulations. In particular, we present an approach to accelerate the Generalized Minimal Residual (GMRES) algorithm for the solution of frequency-domain Maxwell’s equations using two machine learning models (principal component analysis and a convolutional neural network). These data-driven models are trained to predict a subspace within which the solution of the frequency-domain Maxwell’s equations approximately lies. This subspace is then used for augmenting the Krylov subspace generated during the GMRES iterations, thus effectively reducing the size of the Krylov subspace and hence the number of iterations needed for solving Maxwell’s equations. By training the proposed models on a dataset of wavelength-splitting gratings, we show an order of magnitude reduction (~10–50) in the number of GMRES iterations required for solving frequency-domain Maxwell’s equations.


Introduction
Numerically solving Maxwell's equations is often required in a large number of design problems, ranging from integrated photonic devices to RF antennas and filters. Recent advances in computational capabilities have opened up the door for algorithmic design of electromagnetic devices for a number of applications 1-3 -performing the electromagnetic simulations are usually the primary bottleneck in such design algorithms. However, during such a design process, the electromagnetic simulations being performed are on correlated permittivity distributions (e.g. permittivity distributions generated at different steps of a gradient-based design algorithm). The availability of such data opens up the possibility of using data-driven approaches for accelerating electromagnetic simulations.
Accelerating simulations in a gradient-based optimization algorithm has been previously investigated in structural engineering designs via Krylov subspace recycling 4, 5 -the key idea in such approaches is to utilize simulations performed in an optimization trajectory to speed up future simulations in the same trajectory. However, these approaches become computationally infeasible if an attempt is made to exploit the full extent of the available simulation data (e.g. use simulations performed in a large number of correlated optimization trajectories). Data-driven methods for solving partial-differential equations have only recently been investigated, with demonstration of learning an 'optimal' finite-difference stencil 6 for time-domain simulations, or using neural networks for solving partial differential equations 7 .
In this paper, we investigate the possibility of accelerating finite difference frequency domain (FDFD) simulation of Maxwell's equations using data-driven models. Performing a FDFD simulation is equivalent to solving a large sparse system of linear equations, which is typically done using an iterative solver such as Generalized Minimal Residual (GMRES) algorithm 8 . Here we develop an accelerated solver (data-driven GMRES) by interfacing a machine learning model with GMRES. Since the simulation is still being performed with an iterative solver, it is guaranteed that the result of the simulation will be accuratethe performance of the machine learning model only affects how fast the solution is obtained. This is a major advantage of this approach over other data-driven attempts for solving Maxwell's equations 9, 10 that have been presented so far, in which case a misprediction by the model would result in an inaccurate simulation. Using wavelength-splitting gratings as an example, we show an order of magnitude reduction in the number of GMRES iterations required for solving frequency-domain Maxwell's equations.

Data-driven GMRES
In the frequency domain, Maxwell's equations can be reduced to a partial differential equation relating the electric field E(x) to its source J(x): where ω is the frequency of the simulation, and ε(x) is the permittivity distribution as a function of space. The finite difference frequency domain method 11 is a popular approach for numerically solving this partial differential equation -it discretizes this equation on the Yee grid with perfectly matched layers together and periodic boundary conditions used for terminating the simulation domain to obtain a system of linear equations, A f = b, where A is a sparse matrix describing the operator ∇ × ∇ × −ω 2 ε(x)/c 2 and b is a vector describing the source term −iω µ 0 J(x) on the Yee grid. For large-scale problems, this system of equation is typically solved via a Krylov subspace-based iterative method 8 . Krylov subspace methods have an advantage that they only access the matrix A via matrix-vector products, which can be performed very efficiently since A is a sparse matrix. The iterative algorithm that we focus on in this paper is GMRES 8 . In i th iteration of standard GMRES, the solution to A f = b is approximated by f i , where: . GMRES is a completely data-free algorithm -it only requires knowledge of the source vector b and the ability to multiply the matrix A with an arbitrary vector.
The GMRES iteration can be significantly accelerated if an estimate of the solution A −1 b is known. To this end, we train a data-driven model on simulations of correlated structures to predict a (low-dimensional) subspace V within which A −1 b is expected to lie. More specifically, the model predicts These vectors can then be used to augment the GMRES iterations (we refer to the augmented version of GMRES as data-driven GMRES throughout the paper) -the i th iteration of data-driven GMRES can be formulated as: whereÃ, andb are given by: where P ⊥ (Av 1 , Av 2 . . . Av N ) is the operator projecting a vector out of the space spanned by Av 1 , Av 2 . . . Av N . An efficient update algorithm for data-driven GMRES can be formulated in a manner identical to that formulated for Generalized Conjugate Residual with inner Orthogonalization and outer Truncation (GCROT) 5, 12 .

Results
We investigate two data-driven models to predict the vectors v 1 , v 2 . . . v N : principal component analysis and a convolutional neural network. The dataset that we use for training and evaluating these models comprises of a collection of 2D grating splitters [ Fig. 1(a)] which reflect an incident waveguide mode at λ = 1.4 µm and transmit an incident waveguide mode at λ = 1.55 µm. Throughout this paper, we focus on accelerating simulations of the grating splitters at λ = 1.4 µm -so as to train our data-driven models, we provide the dataset with the full simulations of the electric fields in the grating splitter at 1.4 µm [ Fig. 1(b)]. Additionally, we inuitively expect a well designed data-driven model to perform better if supplied with an approximation to the simulated field as input for predicting the subspace V . To this end, we provide the dataset with effective index simulations 13 of the electric fields in the grating splitter [ Fig. 1(b)]. The effective index simulations are very cheap to perform since they are equivalent to the solving Maxwell's equations in 1D, making them an attractive approximation to the simulated field that the data-driven model can use as an input.

Principal component analysis
The first data-driven model that we consider for accelerating FDFD simulations is using the principal components 14 computed from the simulated fields in the training dataset as v 1 , v 2 . . . v N . The first 5 principal components of the training dataset are shown in Fig. 2(a). The first two principal components appear like fields "reflected" from the grating region, whereas the higher order principal components capture fields that are either transmitted or scattered away from the grating devices. Note that the principal components are not necessarily solutions to Maxwell's equations for a grating structure, but provide an estimate of a basis on which the solutions of Maxwell's equations for the grating structures can be accurately represented. Fig. 2(b) shows the residual r i = ||A f i − b|| 2 /||b|| 2 as a function of the number of iterations i when using data-driven GMRES with v 1 , v 2 . . . v N being the first N principal components of the training dataset to simulate a structure from the validation dataset. We clearly see an order of magnitude speed up in convergence rate for GMRES when supplemented with ≥ 5 principal components. Note that a typical trajectory of data-driven GMRES shows a significant reduction in the residual in the first iteration. This corresponds to GMRES finding the most suitable vector minimizing the residual within the space spanned by b and the supplied principal components. Moreover, the residuals in the data-driven GMRES decrease more rapidly than in GMRES. This acceleration can be attributed to the fact that the Krylov subspace generated corresponds to the matrixÃ defined in Eq. 4 instead of A. Since A corresponds to a double derivative operator, its application is equivalent to convolving the electric field on the grid with a 3 × 3 filter. Therefore, A generates a Krylov subspace almost on a pixel by pixel basis 15 . On the other hand, application ofÃ to a vector is equivalent to application of A (which is a 3 × 3 filter) followed by the projection operator P ⊥ (Av 1 , Av 2 . . . Av N ) (which is a fully dense operator).Ã can thus generate a Krylov subspace that spans the entire simulation region within the first few iterations, leading to a larger decrease in the residual r i in data-driven GMRES as compared to GMRES.
Finally, we study the amount of training data we require to accelerate FDFD. In these calculations, the number of principal components computed equals the number of training samples used for their computations. Note that ∼ 200 training data-samples are sufficient to saturate the acceleration that the data-driven GMRES with principal components can provide over GMRES. This is indicative of the fact that generating training data for the purposes of accelerating GMRES need not necessarily be a compute intensive task. We note that the size of the smallest dataset needed to saturate the acceleration provided by data-driven GMRES is dependent on the specific class of problems being solved, and would likely be different for other devices.

Convolutional neural network
While using principal components achieves an order of magnitude speed up over GMRES, this approach predicts the same subspace V irrespective of the permittivity distribution being simulated. Intuitively, it might be expected that a data-driven model that specializes V to the permittivity distribution being simulated would unlock an even greater speed up over GMRES. To this end, we train a convolutional neural network that takes as input the permittivity distribution of the grating device as well as the effective index electric field and predicts the vectors v 1 , v 2 . . . v N [ Fig. 3(a)] that can be used in data-driven GMRES to simulate the permittivity distribution under consideration.
To train the convolutional neural network, we consider two loss functions: a projection loss function and a residual loss function. For the k th training example, the projection loss l (k) proj is defined by the square of length of the simulated field f (k) that is perpendicular to the space spanned by the vectors v 1 , v 2 . . . v N relative to the square of the length of f (k) :  Fig. 3(b) shows the performance of a convolutional neural network trained with the projection loss function for N = 5. From the data-driven GMRES convergence plots, we note that the data-driven GMRES supplied with the output of the convolutional neural network performs significantly better than GMRES. However, it performs comparably to data-driven GMRES supplied with the first N principal components even though the convolutional neural network specializes the vectors v 1 , v 2 . . . v N to the structure being simulated. In particular, we observe that the residual after one data-driven GMRES step r 1 is significantly lower while using the principal components as opposed to the convolutional neural network [histogram in Fig. 3(b)].
The residual loss function can remedy this issue. For the k th training example, the residual loss l (k) res is defined by: where A (k) is the sparse matrix corresponding to the operator ∇ × ∇ × −ω 2 ε (k) /c 2 for the k th training example. Note that, similar to the projection loss, 0 ≤ l . However, unlike the projection loss function, the residual loss function is an unsupervised loss function i.e. the simulated electric fields for the structure are not required for its computation. Fig. 3(c) shows the performance of the convolutional neural network trained with the residual loss function for N = 5. As can be seen from the data-driven GMRES convergence plots, data-driven GMRES performs significantly better than GMRES. Moreover, from the histogram of the residual r 1 obtained after one iteration of data-driven GMRES, we see that using the convolutional neural network trained on the residual loss function results in a much better initialization of GMRES as compared to when using principal components. This initialization allows data-driven GMRES with a convolutional neural network to perform comparably, if not better than, data-driven GMRES with principal components.
Finally, we note that the use of fields obtained with an effective index simulation as an input to the convolutional neural network significantly improves the performance of the convolutional neural network. Empirically, we observed that if the convolutional neural network was trained with the projection loss function but without the effective index fields as an input, the training loss would saturate at ∼ 0.2, indicating that on an average approximately 20% of the simulated fields f (k) lie outside the subspace V whereas with effective index fields as input, the training loss would saturate at a significantly lower value of ∼ 0.05 indicating that 95% of the simulated field f (k) lies in the subspace V .

Conclusion
In conclusion, we present a framework for accelerating finite difference frequency domain (FDFD) simulations of Maxwell's equations using data-driven models that can exploit simulations of correlated permittivity distributions. We analyze two data-driven models, based on principal component analysis and a convolutional neural network, to accelerate these simulations, and show that these models can unlock an orders of magnitude acceleration over data-free solver. Such data-driven methods 4/6 Figure 3. (a) Schematic of the CNN based data-driven GMRES -a convolutional neural network takes as input the permittivity and effective index field and produces as an output the vectors v 1 , v 2 . . . v N . These vectors are then supplied to the data-driven GMRES algorithm, which produces the full simulated field. Performance of the data-driven GMRES when the convolutional neural network is trained with (b) projection loss function and (c) residual loss function. In both cases, we show the resiudal r i as a function of number of iterations i for 5 randomly chosen samples from the evaluation dataset, and histogram of the distribution of initial residual r 1 over the evaluation dataset. Note that we perform this analysis for data-driven GMRES supplied with principal components (blue) and the output of the CNN (orange). N = 5 is assumed in all these calculations.
would likely be important in scenarios where a large number of simulations with similar permittivity distributions are performed e.g. during a gradient-based optimization of a photonic device.

Dataset
The grating splitters are designed to reflect an incident waveguide mode at λ = 1.4 um and transmit an incident waveguide mode at λ = 1.55 um using a gradient-based design technique similar to that used for grating couplers 3 . Different devices in the dataset are generated by seeding the optimization with a different initial structure. Note that all the devices generated at different stages of the optimization are part of the dataset. Consequently, the dataset not only has grating splitters that have a discrete permittivity distribution (i.e. have only two materials -silicon and silicon oxide), but also grating splitters that have a continuous distribution (i.e. permittivity of the grating splitter can assume any value between that of silicon oxide and silicon). Moreover, the dataset has poorly performing grating splitters (i.e. grating splitters generated at the initial steps of the optimization procedure) as well as well-performing grating splitters (i.e. grating splitters generated towards the end of the optimization procedure). Our dataset has a total of ∼ 30, 000 examples, which we split into a training data set (75%) and an evaluation dataset (25%).

Implementation of data-driven models
Both the data-driven models (PCA and CNN) were implemented using the python library Tensorflow 16 . Any complex inputs (e.g. effective index fields) to the CNN were fed as an image of depth 2 comprising of the real and imaginary parts of the complex input. We use the ADAM optimizer 17 with a batch size of 30 for training the convolutional neural network -it required ∼10,000 steps to train the network.