Abstract
Surrogate models for partial differential equations are widely used in the design of metamaterials to rapidly evaluate the behavior of composable components. However, the cost of training accurate surrogates by machine learning can rapidly increase with the number of variables. For photonic-device models, we find that this training becomes especially challenging as design regions grow larger than the optical wavelength. We present an active-learning algorithm that reduces the number of simulations required by more than an order of magnitude for an NN surrogate model of optical-surface components compared to uniform random samples. Results show that the surrogate evaluation is over two orders of magnitude faster than a direct solve, and we demonstrate how this can be exploited to accelerate large-scale engineering optimization.
Introduction
Designing metamaterials or composite materials, in which computational tools select composable components to recreate desired properties that are not present in the constituent materials, is a crucial task for a variety of areas of engineering (acoustics, mechanics, thermal/electronic transport, electromagnetism, and optics)^{1}. For example, in metalenses, the components are subwavelength scatterers on a surface, but the device diameter is often >10^{3} wavelengths^{2}. Applications of such optical structures include ultracompact sensors, imaging, and spectroscopy devices used in cell-phone cameras and in medical applications^{2}. As metamaterials become larger in scale and as manufacturing capabilities improve, there is a pressing need for scalable computational design tools.
In this work, we use surrogate models to rapidly evaluate the effect of each metamaterial component during device design^{3}, and machine learning is an attractive technique for building such models^{4,5,6,7}. However, in order to exploit improvements in nanomanufacturing capabilities, components have an increasing number of design parameters, and training the surrogate models (using brute-force numerical simulations) becomes increasingly expensive. The question then becomes: how can we obtain an accurate model from minimal training data? We present an active-learning approach—in which training points are selected based on an error measure—that can reduce the number of training points by more than an order of magnitude for a neural-network (NN) surrogate model of partial differential equations (PDEs). Further, we show how such a surrogate can be exploited to speed up large-scale engineering optimization by >100×. In particular, we apply our approach to the design of optical metasurfaces: large (10^{2}–10^{6} wavelengths λ) aperiodic nanopatterned (≪λ) structures that perform functions such as compact lensing^{8}.
Metasurface design can be performed by breaking the surface into unit cells with a few parameters each (Fig. 1) via domain-decomposition approximations^{3,9}, learning a surrogate model that predicts the transmitted optical field through each unit cell as a function of that cell’s parameters, and optimizing the total field (e.g., the focal intensity) as a function of the parameters of every unit cell^{3} (see “Results”). This makes metasurfaces an attractive application for machine learning because the surrogate unit-cell model is reused millions of times during the design process, amortizing the cost of training the model from expensive exact Maxwell solves sampling many unit-cell parameters. For modeling the effect of 1–4 unit-cell parameters, Chebyshev polynomial interpolation can be very effective^{3}, but it encounters an exponential curse of dimensionality with more parameters^{10,11}. In this paper, we find that an NN can be trained with orders of magnitude fewer Maxwell solves for the same accuracy with 10 parameters, even for the most challenging case of multilayer unit cells many wavelengths (>10λ) thick. Conversely, we show that subwavelength-diameter design regions (considered by several other authors^{4,5,6,7,12,13}) require orders of magnitude fewer training points for the same number of parameters (Fig. 2), corresponding to the physical intuition that wave propagation through subwavelength regions is effectively determined by a few effective-medium parameters^{14}, making such problems effectively low-dimensional. In contrast to typical machine-learning applications, constructing surrogate models for physical models such as Maxwell’s equations corresponds to interpolating smooth functions with no noise, which requires the approaches to training and active learning described in the “Results” section.
We believe that these methods greatly extend the reach of surrogate models for metamaterial optimization and other applications requiring moderate-accuracy high-dimensional smooth interpolation.
Recent work has demonstrated a wide variety of optical-metasurface design problems and algorithms. Different applications^{15} such as holograms^{16}, polarization-^{17,18}, wavelength-^{19}, depth-of-field-^{20}, or incident-angle-dependent functionality^{21} are useful for imaging or spectroscopy^{22,23}. Pestourie et al.^{3} introduced an optimization approach to metasurface design using a Chebyshev-polynomial surrogate model, which was subsequently extended to topology optimization (~10^{3} parameters per cell) with online Maxwell solvers^{24}. Metasurface modeling can also be composed with signal/image-processing stages for optimized end-to-end design^{25,26}. Previous work demonstrated NN surrogate models in optics for a few parameters^{27,28,29}, or with more parameters in deeply subwavelength design regions^{4,12}. As shown in Fig. 2, deeply subwavelength regions pose a vastly easier problem for NN training than parameters spread over larger diameters. Another approach involves generative design, again typically for subwavelength^{6,7} or wavelength-scale unit cells^{30}, in some cases in conjunction with larger-scale models^{5,12,13}. A generative model is essentially the inverse of a surrogate function: instead of going from geometric parameters to performance, it takes the desired performance as an input and produces the geometric structure; the mathematical challenge, however, appears to be closely related to that of surrogates.
Active learning is connected with the field of uncertainty quantification (UQ), because active learning consists of iteratively adding the most uncertain points to the training set (Figs. 3 and 4), and hence it requires a measure of uncertainty. Our approach to UQ is based on the NN-ensemble idea of ref. ^{31} due to its simplicity. There are many other approaches to UQ^{32}, but ref. ^{31} demonstrated performance and scalability advantages of the NN-ensemble approach, which is an instance of Bayesian deep learning^{32}. In contrast, Bayesian optimization relies on Gaussian processes, which scale poorly (~N^{3}, where N is the number of training samples)^{33,34}. The work presented here achieves training-time efficiency (an order-of-magnitude reduction in sample complexity), design-time efficiency (the actively learned surrogate model is at least two orders of magnitude faster than solving Maxwell’s equations), and realistic large-scale designs (via our optimization framework^{3}), all in one package.
Results
Metasurfaces and surrogate models
In this section, we present the NN surrogate model used in this paper, for which we adopt the metasurface design formulation from ref. ^{3}. The first step of this approach is to divide the metasurface into unit cells with a few geometric parameters p each. For example, Fig. 1 shows several possible unit cells: (Fig. 1a) a rectangular pillar (fin) etched into a 3D dielectric slab^{35} (two parameters); (Fig. 1b) an H-shaped unit cell (four parameters) in a dielectric slab^{4}; or (Fig. 1e) the multilayered 2D unit cell with ten holes of varying widths considered in this paper. As depicted in Fig. 1c, d, a metasurface consists of an array of these unit cells. The second step is to solve for the transmitted field (from an incident plane wave) independently for each unit cell using approximate boundary conditions^{3,24,35,36}, in our case a locally periodic approximation (LPA) based on the observation that optimal structures often have parameters that mostly vary slowly from one unit cell to the next (ref. ^{3} has a detailed section and a figure about this approximation (Sec. 2.1, Fig. 2); other approximate boundary conditions are also possible^{9}). For a subwavelength period, the LPA transmitted far field is entirely described by a single number—the complex transmission coefficient t(p). One can then compute the field anywhere above the metasurface by convolving these approximate transmitted fields with a known Green’s function—a near-to-far-field transformation^{37}. Finally, any desired function of the transmitted field, such as the focal-point intensity, can be optimized as a function of the geometric parameters of each unit cell^{3}.
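The pipeline above lends itself to a compact numerical sketch: once each unit cell's complex transmission t(p_k) is known, the field at a point above the metasurface is a weighted sum over cells with a Green's-function kernel, and the focal intensity is its squared magnitude. The scalar kernel and the geometry below are illustrative assumptions (a schematic 2D far-field kernel, not the exact near-to-far-field transformation of ref. ^{37}):

```python
import numpy as np

def focal_intensity(t, xs, focus, wavelength):
    """|E(focus)|^2 with E = sum_k G(focus, r_k) t_k.
    G is a schematic scalar 2D kernel (assumption), standing in for the
    exact near-to-far-field transformation."""
    k = 2 * np.pi / wavelength
    dist = np.hypot(xs - focus[0], focus[1])      # cell-to-focus distances
    G = np.exp(1j * k * dist) / np.sqrt(dist)     # outgoing cylindrical wave
    return abs(np.sum(G * t)) ** 2

# 100 unit cells of period 0.4 um; green wavelength 0.54 um; focus at (0, 60) um
wl, focus = 0.54, (0.0, 60.0)
xs = (np.arange(100) - 49.5) * 0.4
dist = np.hypot(xs - focus[0], focus[1])
t_focus = np.exp(-2j * np.pi / wl * dist)   # phases chosen to cancel propagation phase
t_flat = np.ones(100, dtype=complex)        # uniform surface, no phase profile
I_focus = focal_intensity(t_focus, xs, focus, wl)
I_flat = focal_intensity(t_flat, xs, focus, wl)
```

Choosing the per-cell phases to cancel the propagation phase to the focus raises the focal intensity well above that of a uniform surface, which is the kind of objective the optimization maximizes over the unit-cell parameters.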
In this way, optimizing an optical metasurface is built on top of evaluating the function t(p) (the transmission through a single unit cell as a function of its geometric parameters) thousands or even millions of times—once for every unit cell, for every step of the optimization process. It is possible to solve Maxwell’s equations online during the optimization process, allowing one to use thousands of parameters p per unit cell, but this requires substantial parallel computing clusters^{24}. Alternatively, one can solve Maxwell’s equations offline (before metasurface optimization) in order to fit t(p) to a surrogate model
\[\tilde{t}({\bf{p}})\approx t({\bf{p}}),\qquad (1)\]
which can subsequently be evaluated rapidly during metasurface optimization (perhaps for many different devices). For similar reasons, surrogate (or reduced-order) models are attractive for any design problem involving a composite of many components that can be modeled separately^{6,7,38}. The key challenge of the surrogate approach is to scale to larger numbers of design parameters, especially in non-subwavelength regions, as discussed in Fig. 2.
In this paper, the surrogate model for each of the real and imaginary parts of the complex transmission is an ensemble of J = 5 independent NNs with the same training data but different random batches^{39} at each training step. Each NN i is trained to output a prediction μ_{i}(p) and an error estimate σ_{i}(p) for every set of parameters p. To obtain these μ_{i} and σ_{i} from training data y(p) (from brute-force offline Maxwell solves), we minimize^{31}
\[\sum _{{\bf{p}}}\left[\frac{{\mathrm{log}}\,{\sigma }_{i}^{2}({\bf{p}})}{2}+\frac{{\left(y({\bf{p}})-{\mu }_{i}({\bf{p}})\right)}^{2}}{2{\sigma }_{i}^{2}({\bf{p}})}\right]\qquad (2)\]
over the parameters Θ_{i} of NN i. Equation (2) is motivated by problems in which y is sampled from a Gaussian distribution for each p, in which case μ_{i} and \({\sigma }_{i}^{2}\) can be interpreted as the mean and heteroskedastic variance, respectively^{31}. Although our exact function t(p) is smooth and noise-free, we find that Eq. (2) still works well to estimate the fitting error, as demonstrated in Fig. 5. Each NN is composed of an input layer with 13 nodes (10 nodes for the geometry parameterization—p ∈ [0, 1]^{10}—and 3 nodes for the one-hot encoding^{39} of the three frequencies of interest), three fully-connected hidden layers with 256 rectified linear units (ReLU^{39}) each, and a last layer containing one unit with a scaled hyperbolic-tangent activation function^{39} (for μ_{i}) and one unit with a softplus activation function^{39} (for σ_{i}). Given this ensemble of J NNs, the final prediction μ_{*} (for the real or imaginary part of t(p)) and its associated error estimate σ_{*} are combined as^{31}
\[{\mu }_{* }({\bf{p}})=\frac{1}{J}\mathop{\sum }\limits_{i = 1}^{J}{\mu }_{i}({\bf{p}}),\qquad (3)\]
\[{\sigma }_{* }^{2}({\bf{p}})=\frac{1}{J}\mathop{\sum }\limits_{i = 1}^{J}\left[{\sigma }_{i}^{2}({\bf{p}})+{\mu }_{i}^{2}({\bf{p}})\right]-{\mu }_{* }^{2}({\bf{p}}).\qquad (4)\]
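As a concrete sketch, the per-point loss and the deep-ensemble pooling of ref. ^{31} can be written in a few lines of numpy (framework-agnostic; the actual model is an ensemble of PyTorch NNs):

```python
import numpy as np

def gaussian_nll(y, mu, sigma2):
    """Per-point training loss of Eq. (2): log(sigma^2)/2 + (y - mu)^2/(2 sigma^2)."""
    return 0.5 * np.log(sigma2) + (y - mu) ** 2 / (2 * sigma2)

def pool_ensemble(mus, sigma2s):
    """Eqs. (3)-(4): mu_* = mean_i mu_i,
    sigma_*^2 = mean_i(sigma_i^2 + mu_i^2) - mu_*^2."""
    mus, sigma2s = np.asarray(mus), np.asarray(sigma2s)
    mu_star = mus.mean(axis=0)
    sigma2_star = (sigma2s + mus ** 2).mean(axis=0) - mu_star ** 2
    return mu_star, sigma2_star

# J = 5 members that agree on average but disagree slightly on the prediction:
mu_star, sigma2_star = pool_ensemble([0.9, 1.0, 1.1, 1.0, 1.0], [0.04] * 5)
# sigma2_star combines the within-member variance (0.04) with the spread of the means
```

The pooled variance is large either when individual members report large σ_{i} or when the members disagree, which is exactly the disagreement signal the active-learning algorithm exploits.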
Subwavelength is easier: effect of diameter
Before performing active learning, we first identify the regime where active learning can be most useful: unit-cell design volumes that are not small compared to the wavelength λ. Previous work on surrogate models^{4,5,6,7,12,13} demonstrated NN surrogates (trained with uniform random samples) for unit cells with ~10^{2} parameters. However, these NN models were limited to a regime where the unit-cell degrees of freedom lay within a subwavelength-diameter volume of the unit cell. To illustrate the effect of shrinking the design volume on NN training, we trained our surrogate model for three unit cells (Fig. 2b): the main unit cell of this study is 12.5λ deep; the small unit cell is a vertically scaled-down version of the main unit cell, only 1.5λ deep; and the smallest unit cell is a version of the small unit cell further scaled down (both vertically and horizontally) by 10×. Figure 2a shows that, for the same number of training points, the fractional errors (defined in “Methods”) on the test set for the small and smallest unit cells are, respectively, one and two orders of magnitude smaller than that of the main unit cell when using 1000 training points or more. (The surrogate output is the complex transmission \(\tilde{t}\).) That is, Fig. 2a shows that in the subwavelength-design regime, training the surrogate model is far easier than for larger design regions (>λ).
Physically, for extremely subwavelength volumes, wave propagation is accurately approximated by an averaged effective medium^{14}, so there are effectively only a few independent design parameters regardless of the number of geometric degrees of freedom. (Effective-medium theory, also called homogenization theory, arises from the fact that extremely subwavelength features affect waves only in an averaged sense, in the same way that light propagating through glass can be described by a refractive index rather than by explicitly modeling scattering from individual atoms.) Quantitatively, we find that the Hessian (second-derivative matrix) of the trained surrogate model in the smallest unit-cell case is dominated by only two singular values—consistent with a function that effectively has only two free parameters—with the remaining singular values more than 100× smaller in magnitude; for the other two cases, many more training points would be required to accurately resolve the smallest Hessian singular values. A unit cell with a large design-volume diameter (≫λ) is much harder to train because the dimensionality of the design parameters is effectively much larger.
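This Hessian diagnostic can be illustrated with a toy surrogate that, by construction, depends on 10 inputs only through two effective parameters (an assumption mimicking effective-medium behavior); a finite-difference Hessian then has only two significant singular values:

```python
import numpy as np

# Toy "effective-medium" response: 10 geometric inputs, but the output depends
# only on two effective parameters u = a.p and v = b.p (assumption of this sketch).
rng = np.random.default_rng(0)
a, b = rng.standard_normal(10), rng.standard_normal(10)

def f(p):
    u, v = a @ p, b @ p
    return np.sin(u) + 0.5 * v ** 2 + 0.1 * u * v

def hessian(f, p, h=1e-4):
    """Central-difference Hessian of a scalar function."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp, pm, mp, mm = (p.copy() for _ in range(4))
            pp[i] += h; pp[j] += h
            pm[i] += h; pm[j] -= h
            mp[i] -= h; mp[j] += h
            mm[i] -= h; mm[j] -= h
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * h * h)
    return H

s = np.linalg.svd(hessian(f, rng.random(10)), compute_uv=False)
# Only the first two singular values are significant; the rest are
# finite-difference noise, revealing the effective dimensionality of 2.
```

The same diagnostic applied to a trained surrogate distinguishes the effectively low-dimensional subwavelength regime from the genuinely high-dimensional ≫λ regime.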
Active-learning algorithm
Here, we present an online algorithm for choosing training points that is significantly better at reducing the error than choosing points from a uniform random distribution. As described below, we select new training points where the model’s estimated error σ_{*} is largest.
The online algorithm used to train each of the real and imaginary parts is outlined in Figs. 3 and 4. Initially, we choose n_{init} uniformly distributed random points \({{\bf{p}}}_{1},{{\bf{p}}}_{2},...,{{\bf{p}}}_{{n}_{{\rm{init}}}}\) to train a first iteration \({\tilde{t}}^{0}({\bf{p}})\) over 50 epochs^{39}. Then, given the model at iteration i, we evaluate \({\tilde{t}}^{i}({\bf{p}})\) (which is orders of magnitude faster than the Maxwell solver) at M × K points sampled uniformly at random and choose the K points with the largest \({\sigma }_{* }^{2}\). We perform the expensive Maxwell solves only for these K points and add the newly labeled data to the training set. We then train \({\tilde{t}}^{i+1}({\bf{p}})\) on the expanded training set, using \({\tilde{t}}^{i}\) as a warm start. We repeat this process T times.
Essentially, the method works because the error estimate σ_{*} is updated every time the model is retrained on an expanded dataset. In this way, the model tells us where it does poorly: to minimize Eq. (2), it must assign a large σ_{*} to parameters p where its prediction is inaccurate.
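A minimal, self-contained sketch of the loop in Fig. 4, with the Maxwell solver replaced by a toy 1-D function and the NN ensemble replaced by a bootstrap ensemble of polynomial fits (both assumptions, chosen only to keep the sketch runnable):

```python
import numpy as np

rng = np.random.default_rng(1)

def expensive_solve(p):
    """Stand-in for the offline Maxwell solve (toy 1-D function, assumption)."""
    return np.sin(8 * p) * np.exp(-p)

def fit_ensemble(px, py, J=5, deg=4):
    """Bootstrap ensemble of polynomial fits; the spread across members plays
    the role of sigma_* (classical fits replace the NNs to keep this tiny)."""
    models = []
    for _ in range(J):
        idx = rng.integers(0, len(px), len(px))   # bootstrap resample
        models.append(np.polyfit(px[idx], py[idx], deg))
    return models

def predict(models, p):
    preds = np.array([np.polyval(c, p) for c in models])
    return preds.mean(axis=0), preds.std(axis=0)  # mu_*, sigma_*

n_init, T, K, M = 12, 4, 4, 10
px = rng.random(n_init)                           # initial uniform random points
py = expensive_solve(px)
models = fit_ensemble(px, py)
for _ in range(T):
    cand = rng.random(M * K)                      # cheap candidate pool
    _, sigma = predict(models, cand)
    chosen = cand[np.argsort(sigma)[-K:]]         # K most uncertain candidates
    px = np.concatenate([px, chosen])
    py = np.concatenate([py, expensive_solve(chosen)])  # only K solves per step
    models = fit_ensemble(px, py)                 # retrain on expanded set
```

Each iteration performs M × K cheap surrogate evaluations but only K expensive solves, which is the source of the training-data savings.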
Active-learning results
Our algorithm achieves an order-of-magnitude reduction in training data.
We compared the fractional errors of an NN surrogate model trained using uniform random samples with an identical NN trained using an active-learning approach, in both cases modeling the complex transmission of a multilayer unit cell with ten independent parameters (Fig. 5, inset). In the notation of the algorithm in Fig. 4, the baseline corresponds to T = 0 with n_{init} equal to the total number of training points; this is no active learning at all, because all n_{init} points are chosen from a uniform random distribution. For active learning, n_{init} = 2000, M = 4, and we computed results for K = 500, 1000, 2000, 4000, 8000, 16,000, 32,000, 64,000, and 128,000. Although three orders of magnitude on the log–log plot is too little to determine whether the apparent linearity indicates a power law, Fig. 5 shows that the lower the desired fractional error, the greater the reduction in training cost compared to the baseline algorithm; the slope of the active-learning fractional error (−0.2) is about 30% steeper than that of the baseline (−0.15). The active-learning algorithm achieves a reasonable fractional error of 0.07 with 12× fewer points than the baseline, which corresponds to more than an order of magnitude saving in training data (i.e., far fewer expensive Maxwell solves). This advantage would presumably increase for a lower error tolerance, though computational costs prohibited us from collecting orders of magnitude more training data to explore this in detail. For comparison and completeness, Fig. 5 also shows fractional errors for Chebyshev interpolation (for the blue frequency only). Chebyshev interpolation has a much worse fractional error for a similar number of training points: it suffers from the curse of dimensionality—the number of training points grows exponentially with the number of variables. The two fractional errors shown correspond to three and four interpolation points per dimension, respectively.
In contrast, NNs are known to mitigate the curse of dimensionality^{40}.
Application to metamaterial design: we used both surrogate models to design a multiplexer—an optical device that focuses different wavelengths at different points in space. The actively learned surrogate model results in a design that much more closely matches a numerical validation than the baseline surrogate (Fig. 6). As explained in the “Results” section, we replace a Maxwell’s-equations solver with a surrogate model to rapidly compute the optical transmission through each unit cell; a similar surrogate approach could be used for optimizing many other complex physical systems. In the case of our two-dimensional unit cell, the surrogate model is two orders of magnitude faster than solving Maxwell’s equations with a finite-difference frequency-domain (FDFD) solver^{41}. The speed advantage of a surrogate model becomes drastically greater in three dimensions, where PDE solvers are much more costly while the cost of the surrogate model remains the same.
The surrogate model is evaluated millions of times during a metastructure optimization. We used the actively learned surrogate model and the baseline surrogate model (uniform random training samples), in both cases with 514,000 training points, and we optimized a ten-layer metastructure with 100 unit cells of period 400 nm for a multiplexer application—where three wavelengths (blue: 405 nm, green: 540 nm, and red: 810 nm) are focused on three different focal spots (−10 μm, 60 μm), (0, 60 μm), and (+10 μm, 60 μm), respectively. The diameter is 40 μm and the focal length is 60 μm, which corresponds to a numerical aperture of 0.3. Our optimization scheme tends to yield results robust to manufacturing errors^{3} for two reasons: first, we optimize for the worst case of the three focal-spot intensities, using an epigraph formulation^{3}; second, we compute the average intensity from an ensemble of surrogate models that can be thought of as a Gaussian distribution \(\tilde{t}({\bf{p}})={\mu }_{* }({\bf{p}})+{\sigma }_{* }({\bf{p}})\epsilon\) with \(\epsilon \sim {\mathcal{N}}(0,1)\), where μ_{*} and σ_{*} are defined in Eq. (3) and Eq. (4), respectively,
\[{\mathbb{E}}(I)={\mathbb{E}}\left({\left|\int G\,\tilde{t}\,\right|}^{2}\right)={\left|\int G\,{\mu }_{* }\right|}^{2}+{\left|\int G\,{\sigma }_{* }\right|}^{2},\]
where G is a Green’s function that generates the far field from the sources of the metastructure^{3}. The resulting optimized structure for the active-learning surrogate is shown in Fig. 6c.
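The derivation of this averaged intensity uses only \(\mathbb{E}(\epsilon)=0\) and \(\mathbb{E}(\epsilon^2)=1\), which can be sanity-checked by Monte Carlo on a toy discretization (the Green's-function weights, the cell count, and a single shared ε draw are all assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 7                                                      # toy number of unit cells
G = rng.standard_normal(K) + 1j * rng.standard_normal(K)   # Green's-function weights
mu = rng.standard_normal(K) + 1j * rng.standard_normal(K)  # ensemble means mu_*
sig = rng.random(K)                                        # ensemble stds sigma_*

# With t~ = mu + sig * eps and one shared eps ~ N(0,1), the discretized field
# sum_k G_k t~_k reduces to A + B*eps:
A, B = np.sum(G * mu), np.sum(G * sig)
eps = rng.standard_normal(200_000)
mc = np.mean(np.abs(A + B * eps) ** 2)     # Monte Carlo average intensity
closed = abs(A) ** 2 + abs(B) ** 2         # closed form: uses E(eps)=0, E(eps^2)=1
```

The cross term vanishes because \(\mathbb{E}(\epsilon)=0\), and the σ contribution carries \(\mathbb{E}(\epsilon^2)=1\), so the Monte Carlo estimate converges to the closed form.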
In order to compare the surrogate models, we validate the designs by computing the optimal unit-cell fields directly with a Maxwell solver instead of the surrogate model. This is computationally easy because it only needs to be done once for each of the 100 unit cells, rather than millions of times during the optimization. The focal lines—the field intensity along a line parallel to the two-dimensional metastructure and passing through the focal spots—resulting from the validation are exact solutions to Maxwell’s equations assuming the LPA (see “Results” section). Figure 6a, b shows the resulting focal lines for the active-learning and baseline surrogate models. A multiplexer application requires similar peak intensity for each of the focal spots, which is achieved using worst-case optimization^{3}. Figure 6a, b shows that the actively learned surrogate has ≈3× smaller error in the focal intensity than the baseline surrogate model. This shows not only that the active-learning surrogate is more accurate than the baseline surrogate for 514,000 training points, but also that its results are more robust: the optimization does not drive the parameters towards regions where the surrogate model is highly inaccurate. Note that we limited the design to a small overall diameter (100 unit cells) mainly to ease visualization (Fig. 6c), and we find that this design can already yield good focusing performance despite the small diameter. In earlier work, we demonstrated that our optimization framework is scalable to designs that are orders of magnitude larger^{42}. In principle, a manufacturing-uncertainty measure could also be incorporated into the metasurface design process via robust-optimization algorithms^{43}, but in practice metasurface designs are already typically robust enough to manufacture, especially since multiwavelength optimization is itself a form of robustness^{3}. Such robustness applies to any kind of error, including that of the machine-learning surrogate itself.
Previous work, such as ref. ^{44}—a different approach to active learning that does not quantify uncertainty—suggested iteratively adding the optimum design points to the training set (re-optimizing before each new set of training points is added). However, we did not find this approach beneficial in our case. In particular, we tried adding the data generated from LPA validations of the optimal design parameters, in addition to the points selected by our active-learning algorithm, at each training iteration, but we found that this actually destabilized the learning and resulted in designs qualitatively worse than the baseline. By exploiting validation points, the active learning of the surrogate seems to explore less of the landscape of the complex transmission function, and hence leads to poorer designs. Such exploitation–exploration tradeoffs are well known in the active-learning literature^{45}.
Discussion
In this paper, we present an active-learning algorithm for composite materials that reduces the training time of the surrogate model for a physical response by at least one order of magnitude. Simulation time is reduced by at least two orders of magnitude using the surrogate model compared to solving the PDEs numerically. While the domain-decomposition method used here is the LPA and the PDEs are Maxwell’s equations, the proposed approach is directly applicable to other domain-decomposition methods (e.g., the overlapping-domain approximation^{9}) and to other PDEs or ordinary differential equations^{46}.
We used an ensemble of NNs for interpolation in a regime that is seldom considered in the machine-learning literature: machine-learning models are mostly trained from noisy measurements, whereas here the data are obtained from smooth, noise-free functions. In this regime, it would be instructive to have a deeper understanding of the relationship between NNs and traditional approximation theory (e.g., with polynomials and rational functions^{10,11}). For example, the likelihood maximization of our method forces σ_{*} to zero wherever \(\tilde{t}({\bf{p}})=t({\bf{p}})\). Although this allows us to simultaneously obtain a prediction μ_{*} and an error estimate σ_{*}, there is a drawback: in the interpolation regime (when the surrogate fits the training data exactly), σ_{*} becomes identically zero even if the surrogate does not match the exact model away from the training points. In contrast, interpolation methods such as Chebyshev polynomials yield a meaningful measure of the interpolation error even for exact interpolation of the training data^{10,11}. In the future, we plan to separate the prediction model from the error-measure model using a meta-learner architecture^{47}, with the expectation that the meta-learner will produce a more accurate error measure and further reduce training time. We will also explore other ensembling methods that could improve the accuracy of our model^{48,49}. We believe that the methods presented in this paper will greatly extend the reach of surrogate-model-based optimization of composite materials and other applications requiring moderate-accuracy high-dimensional interpolation.
Methods
Training-data computation
The complex transmission coefficients were computed in parallel using an open-source FDFD solver for the Helmholtz equation^{50} on a 3.5 GHz 6-core Intel Xeon E5 processor. The material properties of the multilayered unit cells are silica (refractive index 1.45) in the substrate and air (refractive index 1) in the holes and in the background. In the main unit cell, the period is 400 nm, the height of the ten holes is fixed at 304 nm, their widths vary between 60 and 340 nm, and each hole is separated by 140 nm of substrate. In the small unit cell, the period is 400 nm, the height of the ten holes is 61 nm, their widths vary between 60 and 340 nm, and there is no separation between the holes. The smallest unit cell is the small unit cell shrunk ten times (period of 40 nm, ten holes of height 6.1 nm, and widths varying between 6 and 34 nm).
Metalens design problem
The complex transmission data are used to compute the scattered field off a multilayered metastructure with 100 unit cells as in ref. ^{3}. The metastructure was designed to focus three wavelengths (blue: 405 nm, green: 540 nm, and red: 810 nm) on three different focal spots (−10 μm, 60 μm), (0, 60 μm), and (+10 μm, 60 μm), respectively. The epigraph formulation of the worst-case optimization and the derivation of the adjoint method to get the gradient are detailed in ref. ^{3}. Any gradient-based optimization algorithm would work, but we used an algorithm based on conservative convex separable approximations^{51}. The average intensity is derived from the distribution of the surrogate model \(\tilde{t}({\bf{p}})={\mu }_{* }({\bf{p}})+{\sigma }_{* }({\bf{p}})\epsilon\) with \(\epsilon \sim {\mathcal{N}}(0,1)\) and the computation of the intensity based on the local field as in ref. ^{3},
\[{\mathbb{E}}(I({\bf{r}}))={\mathbb{E}}\left({\left|{\int}_{\Sigma }G({\bf{r}},{{\bf{r}}}^{\prime})\,\tilde{t}({\bf{p}}({{\bf{r}}}^{\prime}))\,{\mathrm{d}}{{\bf{r}}}^{\prime}\right|}^{2}\right)={\mathbb{E}}\left(\left(\int G\tilde{t}\right)\overline{\left(\int G\tilde{t}\right)}\right),\]
where the \(\bar{(\cdot )}\) notation denotes the complex conjugate, the notations \({\int}_{\Sigma }(\cdot )\,{\mathrm{d}}{{\bf{r}}}^{\prime}\) and \(G({\bf{r}},{{\bf{r}}}^{\prime})\) are simplified to ∫ and G, and the notation \({\bf{p}}({{\bf{r}}}^{\prime})\) is dropped for concision. From the linearity of expectation,
\[{\mathbb{E}}\left(\left(\int G({\mu }_{* }+{\sigma }_{* }\epsilon )\right)\overline{\left(\int G({\mu }_{* }+{\sigma }_{* }\epsilon )\right)}\right)={\left|\int G{\mu }_{* }\right|}^{2}+{\left|\int G{\sigma }_{* }\right|}^{2},\]
where we used that \({\mathbb{E}}(\epsilon )=0\) and \({\mathbb{E}}({\epsilon }^{2})=1\).
Active-learning architecture and training
The ensemble of NNs was implemented using PyTorch^{52} on a 3.5 GHz 6-core Intel Xeon E5 processor. We trained an ensemble of five NNs for each surrogate model. Each NN is composed of an input layer with 13 nodes (10 nodes for the geometry parameterization and 3 nodes for the one-hot encoding^{39} of the three frequencies of interest), three fully-connected hidden layers with 256 rectified linear units (ReLU^{39}) each, and a last layer containing one unit with a scaled hyperbolic-tangent activation function^{39} (for μ_{i}) and one unit with a softplus activation function^{39} (for σ_{i}). The cost function is the negative log-likelihood of a Gaussian, as in Eq. (2). The mean and the variance of the ensemble are the pooled mean and variance of Eqs. (3) and (4). The optimizer is Adam^{53}. The parameters are initialized using PyTorch’s default settings, i.e., sampled uniformly on a support inversely proportional to the square root of the number of input parameters. The initial learning rate is 0.001; after the tenth epoch, the learning rate is decayed by a factor of 0.99. Each iteration of the active-learning algorithm, as well as the baseline, was trained for 50 epochs. The choice of training points is detailed in the algorithm of Fig. 4. The quantitative evaluations were computed using the fractional error on a test set containing 2000 points chosen from a uniform random distribution. The fractional error FE between two vectors of complex values u_{estimate} and v_{true} is
\[{\rm{FE}}=\frac{\left|{u}_{{\rm{estimate}}}-{v}_{{\rm{true}}}\right|}{\left|{v}_{{\rm{true}}}\right|},\]
where ∣ ⋅ ∣ is the L2 norm for complex vectors.
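In numpy, this fractional error (the relative L2 error between complex vectors, assuming the standard definition FE = ∣u − v∣/∣v∣) is:

```python
import numpy as np

def fractional_error(u_estimate, v_true):
    """FE = |u - v| / |v|, with |.| the L2 norm for complex vectors
    (assumed to match the text's relative-error definition)."""
    return np.linalg.norm(u_estimate - v_true) / np.linalg.norm(v_true)

v = np.array([1 + 1j, 2 - 1j, 0.5j])
# fractional_error(v, v) is 0; a uniform 10% amplitude error gives FE = 0.1
```

Note that `np.linalg.norm` handles complex vectors correctly, summing the squared moduli.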
For 128,000 training points and the surrogate NN architecture described in this section, the time breakdown of active learning is 5.4k seconds for training and evaluation of the NNs and 27.8k seconds for the Maxwell simulations, on a 3.5 GHz 6-core Intel Xeon E5 processor. The Maxwell simulations are the most expensive part of the active-learning process, accounting for about 84% of the total computation time.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The code used for these findings is available upon reasonable request.
References
Kadic, M., Milton, G. W., van Hecke, M. & Wegener, M. 3D metamaterials. Nat. Rev. Phys. 1, 198–210 (2019).
Khorasaninejad, M. & Capasso, F. Metalenses: versatile multifunctional photonic components. Science 358, eaam8100 (2017).
Pestourie, R. et al. Inverse design of largearea metasurfaces. Opt. Express 26, 33732–33747 (2018).
An, S. et al. A deep learning approach for objectivedriven alldielectric metasurface design. ACS Photonics 6, 3196–3207 (2019).
Jiang, J. & Fan, J. A. Simulatorbased training of generative neural networks for the inverse design of metasurfaces. Nanophotonics 9, 1059–1069 (2020).
An, S. et al. Generative multifunctional metaatom and metasurface design networks. Preprint at https://arxiv.org/abs/1908.04851 (2019).
Jiang, J. et al. Freeform diffractive metagrating design based on generative adversarial networks. ACS Nano 13, 8872–8878 (2019).
Yu, N. et al. Light propagation with phase discontinuities: generalized laws of reflection and refraction. Science 334, 333–337 (2011).
Lin, Z. & Johnson, S. G. Overlapping domains for topology optimization of largearea metasurfaces. Opt. Express 27, 32445–32453 (2019).
Boyd, J. P. Chebyshev and Fourier Spectral Methods, 2nd edn. (Dover Publications, Inc., Mineola, 2001).
Trefethen, L. N. Approximation Theory and Approximation Practice, Extended edn. (SIAM, Philadelphia, 2019).
Jiang, J., Chen, M. & Fan, J. A. Deep neural networks for the evaluation and design of photonic devices. Preprint at https://arxiv.org/abs/2007.00084 (2020).
Ma, W., Cheng, F. & Liu, Y. Deeplearningenabled ondemand design of chiral metamaterials. ACS Nano 12, 6326–6334 (2018).
Holloway, C. L., Kuester, E. F. & Dienstfrey, A. Characterizing metasurfaces/metafilms: the connection between surface susceptibilities and effective material properties. IEEE Antennas Wirel. Propag. Lett. 10, 1507–1511 (2011).
Maguid, E. et al. Photonic spincontrolled multifunctional sharedaperture antenna array. Science 352, 1202–1206 (2016).
Sung, J., Lee, G.Y., Choi, C., Hong, J. & Lee, B. Singlelayer bifacial metasurface: Fullspace visible light control. Adv. Opt. Mater. 7, 1801748 (2019).
Arbabi, A., Horie, Y., Bagheri, M. & Faraon, A. Dielectric metasurfaces for complete control of phase and polarization with subwavelength spatial resolution and high transmission. Nat. Nanotechnol. 10, 937–943 (2015).
Mueller, J. B., Rubin, N. A., Devlin, R. C., Groever, B. & Capasso, F. Metasurface polarization optics: independent phase control of arbitrary orthogonal states of polarization. Phys. Rev. Lett. 118, 113901 (2017).
Ye, W. et al. Spin and wavelength multiplexed nonlinear metasurface holography. Nat. Commun. 7, 11930 (2016).
Bayati, E. et al. Inverse designed metalenses with extended depth of focus. ACS Photonics 7, 873–878 (2020).
Liu, W. et al. Metasurface enabled wide-angle Fourier lens. Adv. Mater. 30, 1706368 (2018).
Aieta, F., Kats, M. A., Genevet, P. & Capasso, F. Multiwavelength achromatic metasurfaces by dispersive phase compensation. Science 347, 1342–1345 (2015).
Zhou, Y. et al. Multilayer noninteracting dielectric metasurfaces for multiwavelength metaoptics. Nano Lett. 18, 7529–7537 (2018).
Lin, Z., Liu, V., Pestourie, R. & Johnson, S. G. Topology optimization of freeform large-area metasurfaces. Opt. Express 27, 15765–15775 (2019).
Sitzmann, V. et al. End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Trans. Graph. 37, 114 (2018).
Lin, Z. et al. End-to-end inverse design for inverse scattering via freeform metastructures. Preprint at https://arxiv.org/abs/2006.09145 (2020).
Liu, D., Tan, Y., Khoram, E. & Yu, Z. Training deep neural networks for the inverse design of nanophotonic structures. ACS Photonics 5, 1365–1369 (2018).
Malkiel, I. et al. Plasmonic nanostructure design and characterization via deep learning. Light Sci. Appl. 7, 1–8 (2018).
Peurifoy, J. et al. Nanophotonic particle simulation and inverse design using artificial neural networks. Sci. Adv. 4, eaar4206 (2018).
Liu, Z., Zhu, Z. & Cai, W. Topological encoding method for data-driven photonics inverse design. Opt. Express 28, 4825–4835 (2020).
Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Proc. 31st International Conference on Advances in Neural Information Processing Systems, 6402–6413 (NIPS, 2017).
Tagasovska, N. & Lopez-Paz, D. Single-model uncertainties for deep learning. In Proc. 33rd International Conference on Advances in Neural Information Processing Systems, 6417–6428 (NIPS, 2019).
Lookman, T., Balachandran, P. V., Xue, D. & Yuan, R. Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design. NPJ Comput. Mater. 5, 1–17 (2019).
Bassman, L. et al. Active learning for accelerated design of layered materials. NPJ Comput. Mater. 4, 1–9 (2018).
Khorasaninejad, M. et al. Metalenses at visible wavelengths: diffraction-limited focusing and subwavelength resolution imaging. Science 352, 1190–1194 (2016).
Yu, N. & Capasso, F. Flat optics with designer metasurfaces. Nat. Mater. 13, 139–150 (2014).
Harrington, R. F. Time-Harmonic Electromagnetic Fields, 2nd edn. (Wiley-IEEE Press, New York, 2001).
Mignolet, M. P., Przekop, A., Rizzi, S. A. & Spottswood, S. M. A review of indirect/nonintrusive reduced order modeling of nonlinear geometric structures. J. Sound Vib. 332, 2437–2460 (2013).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, Cambridge, 2016).
Cheridito, P., Jentzen, A. & Rossmannek, F. Efficient approximation of high-dimensional functions with deep neural networks. Preprint at https://arxiv.org/abs/1912.04310 (2019).
Champagne II, N. J., Berryman, J. G. & Buettner, H. M. FDFD: a 3D finite-difference frequency-domain code for electromagnetic induction tomography. J. Comput. Phys. 170, 830–848 (2001).
Pestourie, R. Assume Your Neighbor is Your Equal: Inverse Design in Nanophotonics. Ph.D. thesis, Harvard University (2020).
Molesky, S. et al. Inverse design in nanophotonics. Nat. Photonics 12, 659–670 (2018).
Chen, C.-T. & Gu, G. X. Generative deep neural networks for inverse materials design using backpropagation and active learning. Adv. Sci. 7, 1902607 (2020).
Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: a tutorial introduction. Preprint at https://arxiv.org/abs/1910.09457 (2019).
Rackauckas, C. et al. Universal differential equations for scientific machine learning. Preprint at https://arxiv.org/abs/2001.04385 (2020).
Chen, T., Navrátil, J., Iyengar, V. & Shanmugam, K. Confidence scoring using white-box meta-models with linear classifier probes. In The 22nd Int. Conf. on Artificial Intelligence and Statistics, 1467–1475 (AISTATS, 2019).
Maddox, W. J., Izmailov, P., Garipov, T., Vetrov, D. P. & Wilson, A. G. A simple baseline for Bayesian uncertainty in deep learning. In Proc. 33rd Int. Conf. Advances in Neural Information Processing Systems, 13153–13164 (NIPS, 2019).
Wilson, A. G. & Izmailov, P. Bayesian deep learning and a probabilistic perspective of generalization. Preprint at https://arxiv.org/abs/2002.08791 (2020).
Pestourie, R. FDFD Local Field. https://github.com/rpestourie/fdfd_local_field (2020).
Svanberg, K. A class of globally convergent optimization methods based on conservative convex separable approximations. SIAM J. Optim. 12, 555–573 (2002).
Paszke, A. et al. Automatic differentiation in PyTorch. In Proc. 31st International Conference on Advances in Neural Information Processing Systems Workshop on Automatic Differentiation (NIPS, 2017).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Acknowledgements
This work was supported in part by IBM Research, the MIT-IBM Watson AI Laboratory, the U.S. Army Research Office through the Institute for Soldier Nanotechnologies (under award W911NF-13-D-0001), and by the PAPPA program of DARPA MTO (under award HR00112090016).
Author information
Contributions
R.P., Y.M., P.D., and S.G.J. designed the study, contributed to the machinelearning approach, and analyzed results; R.P. led the code development, software implementation, and numerical experiments; R.P. and S.G.J. were responsible for the physical ideas and interpretation; T.V.N. assisted in designing and implementing the training. All authors contributed to the algorithmic ideas and writing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Pestourie, R., Mroueh, Y., Nguyen, T. V. et al. Active learning of deep surrogates for PDEs: application to metasurface design. npj Comput. Mater. 6, 164 (2020). https://doi.org/10.1038/s41524-020-00431-2
This article is cited by

Low-overhead distribution strategy for simulation and optimization of large-area metasurfaces. npj Computational Materials (2022).

Inverse design enables large-scale high-performance meta-optics reshaping virtual reality. Nature Communications (2022).

Julia Language in Computational Mechanics: A New Competitor. Archives of Computational Methods in Engineering (2022).

Intelligent designs in nanophotonics: from optimization towards inverse creation. PhotoniX (2021).

Automated stopping criterion for spectral measurements with active learning. npj Computational Materials (2021).