Fast multi-source nanophotonic simulations using augmented partial factorization

Lin, Ho-Chun; Wang, Zeyu; Hsu, Chia Wei

doi:10.1038/s43588-022-00370-6

Article
Open access
Published: 15 December 2022

Nature Computational Science volume 2, pages 815–822 (2022)

Abstract

Numerical solutions of Maxwell’s equations are indispensable for nanophotonics and electromagnetics but are constrained when it comes to large systems, especially multi-channel ones such as disordered media, aperiodic metasurfaces and densely packed photonic circuits where the many inputs require many large-scale simulations. Conventionally, before extracting the quantities of interest, Maxwell’s equations are first solved on every element of a discretization basis set that contains much more information than is typically needed. Furthermore, such simulations are often performed one input at a time, which can be slow and repetitive. Here we propose to bypass the full-basis solutions and directly compute the quantities of interest while also eliminating the repetition over inputs. We do so by augmenting the Maxwell operator with all the input source profiles and all the output projection profiles, followed by a single partial factorization that yields the entire generalized scattering matrix via the Schur complement, with no approximation beyond discretization. This method applies to any linear partial differential equation. Benchmarks show that this approach is 1,000–30,000,000 times faster than existing methods for two-dimensional systems with about 10,000,000 variables. As examples, we demonstrate simulations of entangled photon backscattering from disorder and high-numerical-aperture metalenses that are thousands of wavelengths wide.

Main

The interaction between light and nanostructured materials leads to rich properties. For small systems such as individual nano/microstructures and optical components, or for periodic systems such as photonic crystals and periodic metamaterials, one can readily solve Maxwell’s equations numerically to obtain predictions that agree quantitatively with experiments. However, the computational costs are typically too heavy for more complex systems such as disordered ones¹ that not only are large but also couple many incoming channels to many outgoing ones, requiring numerous simulations. The alternatives all have limitations: the Born approximation does not describe multiple scattering; radiative transport and diagrammatic methods can only compute some ensemble-averaged properties²; and coupled-mode theory requires systems with isolated resonances^3,4. For metasurfaces⁵, the widely used locally periodic approximation^5,6 is inaccurate whenever the cell-to-cell variation is large^7,8,9 and cannot describe nonlocal responses¹⁰ or metasurfaces that are not based on unit cells^11,12. Classical and quantum photonic circuits build on individual components that couple very few channels at a time, limiting the number of inputs and outputs. Examples beyond photonics also abound. A wide range of studies across different disciplines is currently out of reach because of computational limitations.

Regardless of the complexity of a system, its linear response is described exactly by an $M^{\prime} \times M$ generalized scattering matrix S that relates an arbitrary input vector v to the resulting output vector u via^13,14

$$u_{n}=\sum_{m=1}^{M} S_{nm} v_{m}. \tag{1}$$

The M columns of S correspond to M distinct inputs (Fig. 1a,b), which can be different incoming angles or beam profiles, different waveguide modes, different point dipole excitations, their superpositions or any other input of interest. Similarly, the vector u can contain any output of interest in the near field or far field.
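As a minimal sketch of equation (1), the snippet below applies a scattering matrix to an input vector and checks that the result equals the superposition of the single-channel responses. The matrix here is random and purely illustrative (an assumption, not data from this work); in practice S would come from a simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
M_out, M_in = 4, 3  # M' outputs and M inputs (illustrative sizes)

# Hypothetical scattering matrix standing in for a simulated one.
S = rng.standard_normal((M_out, M_in)) + 1j * rng.standard_normal((M_out, M_in))

# Column m of S is the output produced by exciting input channel m alone.
e1 = np.zeros(M_in)
e1[1] = 1.0
single_channel_output = S @ e1

# Any other input is a superposition of channels, so its output follows from equation (1).
v = np.array([0.5, -1.0j, 2.0])
u = S @ v
u_by_superposition = sum(v[m] * S[:, m] for m in range(M_in))
```

Once S is known, the response to a new wavefront therefore costs only a matrix-vector product rather than a new simulation.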

**Fig. 1: Generalized scattering matrix and augmented partial factorization (APF).**

Computing such a multi-input response typically requires M distinct solutions of Maxwell’s equations with the same structure given different source profiles. Time-domain methods¹⁵ are easy to parallelize but cannot leverage the multi-input property. Frequency-domain methods allow strategies for handling many inputs. After volume discretization onto a basis through finite element¹⁶ or finite difference¹⁷, Maxwell’s equations in the frequency domain become a system of linear equations Ax_m = b_m. The sparse matrix A is the Maxwell differential operator, the column vector b_m on the right-hand side specifies the mth input and the full-basis solution is contained in the column vector x_m. When solving for x_m = A⁻¹b_m using direct methods, the sparsity can be utilized via graph partitioning, and the resulting lower–upper (LU) factors can be reused among different inputs^18,19. However, M forward and backward substitutions are still needed, and the LU factors take up substantial memory. Iterative methods compute x_m = A⁻¹b_m by minimizing the residual²⁰, avoiding the LU factors. One can iterate multiple inputs together²¹ or construct preconditioners to be reused among different inputs^22,23, but the iterations still take ${{{\mathcal{O}}}}(M)$ time.
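The conventional direct-solver workflow described above can be sketched as follows: factorize A once, keep the LU factors, then perform one forward and backward substitution per input before projecting onto the outputs. This is an illustration using SciPy's SuperLU interface on a small lossy Helmholtz-like operator with Dirichlet edges; the grid size, permittivity, point-like sources and detectors, and the helper name `helmholtz_2d` are all assumptions made for the example, not the benchmark set-up of this work.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def helmholtz_2d(nx, ny, k0_dx, eps):
    """5-point finite-difference version of -∇² - k0² ε on an nx-by-ny grid (Dirichlet edges)."""
    def lap1d(n):
        return sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    lap = sp.kron(sp.identity(ny), lap1d(nx)) + sp.kron(lap1d(ny), sp.identity(nx))
    return (lap - k0_dx**2 * sp.diags(eps.ravel(), 0)).tocsc()


nx, ny, M = 12, 10, 5
rng = np.random.default_rng(1)
eps = 1.0 + rng.random((ny, nx)) + 0.5j  # absorption keeps this toy A well conditioned
A = helmholtz_2d(nx, ny, k0_dx=0.6, eps=eps)

# M point-like sources (columns of B) and M point-like detectors (rows of C).
n = nx * ny
src = rng.choice(n, size=M, replace=False)
det = rng.choice(n, size=M, replace=False)
B = sp.csc_matrix((np.ones(M), (src, np.arange(M))), shape=(n, M), dtype=complex)
C = sp.csr_matrix((np.ones(M), (np.arange(M), det)), shape=(M, n), dtype=complex)

lu = splu(A)            # one factorization; the LU factors stay in memory
B_dense = B.toarray()
X = np.column_stack([lu.solve(B_dense[:, m]) for m in range(M)])  # M substitutions
CX = C @ X              # project the full-basis solutions onto the outputs
```

The loop over `m` is the O(M) stage, and `X` holds the full-basis solutions even though only the M-by-M product `CX` is ultimately used.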

For homogeneous structures with small surface-to-volume ratio, the boundary element method²⁴ can efficiently discretize the interface between materials to reduce the size and the condition number of the matrix A, though its matrix A is no longer sparse. Instead of a surface mesh, the T-matrix method²⁵ uses vector spherical harmonics as basis functions, also resulting in a dense matrix A. The hierarchical structure of the dense matrix A can be utilized through the fast multipole method²⁶ within iterative solvers or through the ${{{\mathcal{H}}}}$-matrix method²⁷ within direct solvers, but the computing time still scales as ${{{\mathcal{O}}}}(M)$.

For systems with a closed boundary on the sides and inputs/outputs placed on the front and back surfaces, the recursive Green’s function (RGF) method²⁸ can obtain the full scattering matrix without looping over the inputs, which is useful for disordered systems¹. However, the RGF method works with dense Green’s function matrices and thus scales unfavourably with the system width W as ${{{\mathcal{O}}}}({W}^{3(d-1)})$ for computing time and ${{{\mathcal{O}}}}({W}^{2(d-1)})$ for memory usage in d dimensions. For layered geometries, the rigorous coupled-wave analysis (RCWA)²⁹ and the eigenmode expansion³⁰ methods use local eigenmodes to utilize the intralayer axial translational symmetry, which also results in dense matrices and the same scaling as the RGF method.

All these methods solve Maxwell’s equations on every element of the discretization basis set, typically one input at a time, after which the quantities of interest are extracted from the solutions. Doing so is intuitive but leads to unnecessary computations and repetitions. Here, we propose the augmented partial factorization (APF) method that directly computes the entire generalized scattering matrix of interest, bypassing the full-basis solutions and without repeating over the inputs. APF is general (applicable to any structure with any type of inputs and outputs, including to other linear partial differential equations), exact (no approximation beyond discretization), does not store large LU factors, scales well with the system size and fully utilizes the sparsities of the Maxwell operator, of the inputs and also of the outputs. These advantages lead to reduced memory usage and a speed-up of many orders of magnitude compared with existing methods (even those that specialize in a certain geometry), enabling full-wave simulations of massively multi-channel systems that were impossible in the past.

Results

Augmented partial factorization

Regardless of the discretization scheme (finite difference, finite element, boundary element, T-matrix, spectral methods, etc.), a frequency-domain simulation for the mth input reduces to computing x_m = A⁻¹b_m. Considering M inputs, the collective full-basis solutions are X = A⁻¹B where ${{{\bf{X}}}}=\left[{\bf{x}}_{1},\ldots ,{\bf{x}}_{M}\right]$ and ${{{\bf{B}}}}=\left[{\bf{b}}_{1},\ldots , {\bf{b}}_{M}\right]$. The full content of this dense and large matrix X is rarely needed. The needed quantities are encapsulated in the generalized scattering matrix S, which we can write as

$$\mathbf{S}=\mathbf{C}\mathbf{A}^{-1}\mathbf{B}-\mathbf{D}. \tag{2}$$

The matrix C projects the collective solutions X = A⁻¹B onto the $M^{\prime}$ outputs of interest (for example, sampling at the locations of interest, a conversion to propagating channels or a transformation from the near field to far field¹⁵). It is sparse since the projections only use part of the solutions, and it is very fat since the number $M^{\prime}$ of outputs of interest is generally far less than the number of discretization basis elements. The matrix ${{{\bf{D}}}}={{{\bf{C}}}}{{{{\bf{A}}}}}_{0}^{-1}{{{\bf{B}}}}-{{{{\bf{S}}}}}_{0}$ subtracts the baseline contribution from the incident field (Supplementary Fig. 1), where A₀ is the Maxwell operator of a reference system (for example, vacuum) for which the generalized scattering matrix S₀ is known. This ensures that S reduces to S₀ when A becomes A₀. Equation (2) has the same superficial structure as scattering matrices in quasi-normal coupled mode theory⁴ but is simpler and does not require the computation of quasi-normal modes (which is expensive for large systems).
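The statement that S reduces to S₀ can be verified in one line from the definitions above. Setting A = A₀ in equation (2) and inserting the definition of D gives

$$\mathbf{S}\big|_{\mathbf{A}=\mathbf{A}_{0}}=\mathbf{C}\mathbf{A}_{0}^{-1}\mathbf{B}-\left(\mathbf{C}\mathbf{A}_{0}^{-1}\mathbf{B}-\mathbf{S}_{0}\right)=\mathbf{S}_{0}.$$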

Given the generalized scattering matrix S, the response to other inputs can be obtained from superposition, as in equation (1). Time-dependent responses are given by Fourier transforming the frequency-domain response³¹.

Figure 1c,d illustrates equation (2) with a concrete example. Consider the transverse magnetic fields in two dimensions (2D) for a system periodic in y with a relative permittivity profile of ε_r(r) = ε_r(x, y). The Maxwell differential operator on the out-of-plane electric field E_z(r) at wavelength λ is $-\nabla^{2}-\left(2\pi /\lambda \right)^{2}\varepsilon_{\rm r}\left(\mathbf{r}\right)$, which becomes the matrix A when the volume is discretized with an outgoing boundary in the x direction. Then, the matrix A⁻¹ is the retarded Green’s function $G(\mathbf{r},\mathbf{r}^{\prime})$ of this system. A plane wave incident from the left, $\mathrm{e}^{i(k_{x}^{\rm in}x+k_{y}^{\rm in}y)}$, can be generated with a source proportional to $\delta (x)\mathrm{e}^{ik_{y}^{\rm in}y}$ on the front surface x = 0 where δ(x) is the Dirac delta function, and incident waves from the right can be similarly generated. These source profiles become the columns of the matrix B when discretized. The coefficients of different outgoing plane waves to the left can be obtained from projections proportional to $\delta (x)\mathrm{e}^{-ik_{y}^{\rm out}y}$, and similarly with outgoing waves to the right. They become the rows of the matrix C when discretized. In this particular example, D = I is the identity matrix, and equation (2) reduces to the discrete form of the Fisher–Lee relation in quantum transport³² (Supplementary Sects. 1 and 2 and Supplementary Fig. 1). We only show a few discretized pixels and a few angles in Fig. 1c,d to simplify the schematic. In reality, the numbers of pixels and input angles can readily exceed millions and thousands, respectively. Note that the matrices A, B and C are all sparse here.

Instead of solving for X = A⁻¹B as is conventionally done, we directly compute the generalized scattering matrix S = CA⁻¹B − D, which is orders of magnitude smaller. To do so, we build an augmented sparse matrix K as illustrated in Fig. 1e and then perform a partial factorization:

$$\mathbf{K}\equiv \left[\begin{array}{ll}\mathbf{A}&\mathbf{B}\\ \mathbf{C}&\mathbf{D}\end{array}\right]=\left[\begin{array}{ll}\mathbf{L}&\mathbf{0}\\ \mathbf{E}&\mathbf{I}\end{array}\right]\left[\begin{array}{ll}\mathbf{U}&\mathbf{F}\\ \mathbf{0}&\mathbf{H}\end{array}\right]. \tag{3}$$

The factorization is partial as it stops after factorizing the upper left block of K into A = LU. Such partial factorization can be carried out using established sparse linear solver packages such as MUMPS³³ and PARDISO³⁴. Notably, we do not use the LU factors, and the L and U in this APF formalism do not even need to be triangular. By equating the middle and the right-hand side of equation (3) block by block, we see that the matrix H, called the Schur complement³⁵, satisfies H = D − CA⁻¹B. Thus, we obtain the generalized scattering matrix via S = −H. In this way, a single factorization yields what conventional methods obtain from M separate simulations. Repetitions over inputs are no longer necessary. We name this approach augmented partial factorization (APF).
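The block-by-block comparison can be made concrete with a small dense sketch. Multiplying out the right-hand side of equation (3) gives A = LU, B = LF, C = EU and D = EF + H, so H = D − EF = D − CA⁻¹B. The code below performs Gaussian elimination on the first n pivots of K only and reads off the trailing block. It is an illustration under stated assumptions: the matrices are small, dense and random, no pivoting is used (the test matrix is made strongly diagonally dominant for that reason), and the function name is hypothetical. It is not the sparse multifrontal implementation used in this work.

```python
import numpy as np


def schur_by_partial_elimination(K, n):
    """Eliminate the first n pivots of K and return the trailing block H.

    No pivoting is performed, which assumes the leading block A can be
    factorized stably as is (true for the diagonally dominant A below).
    """
    K = K.astype(complex, copy=True)
    for j in range(n):
        K[j + 1:, j] /= K[j, j]                                    # column j of [L; E]
        K[j + 1:, j + 1:] -= np.outer(K[j + 1:, j], K[j, j + 1:])  # update remaining block
    return K[n:, n:]                                               # H = D - C A^{-1} B


rng = np.random.default_rng(0)
n, M_in, M_out = 30, 4, 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A += 2 * n * np.eye(n)  # strong diagonal so that elimination without pivoting is safe
B = rng.standard_normal((n, M_in)) + 1j * rng.standard_normal((n, M_in))
C = rng.standard_normal((M_out, n)) + 1j * rng.standard_normal((M_out, n))
D = np.eye(M_out, M_in)

K = np.block([[A, B], [C, D]])
S_apf = -schur_by_partial_elimination(K, n)   # S = -H
S_ref = C @ np.linalg.solve(A, B) - D         # equation (2) evaluated directly
```

The elimination stops after n steps, so the trailing M′-by-M block is never factorized, and the factors of A are only intermediate quantities that a sparse solver is free to discard.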

APF is as general as equation (2), applicable to any linear partial differential equation, in any dimension, under any discretization scheme, with any boundary condition, for any type of inputs generated using any scheme (such as equivalent source for arbitrary incident waves like waveguide modes^17,36, line source and point dipole source) and for any type of output projections. As a frequency-domain method, it works with arbitrary material dispersion, and the response at different frequencies can be computed independently. It is a full-wave method as precise as the underlying discretization.

APF avoids a slow loop over the M inputs or a slow evaluation of the dense Green’s function. The sparsity patterns of A, B and C are maintained in K and can all be utilized during the partial factorization. The matrices L and U are not as sparse as A, so their evaluation is slow, and their storage is the memory bottleneck for typical direct methods. Since APF does not compute the solution X, such LU factors are not needed and can be dropped during the factorization. This means that APF is better than conventional direct methods even when only one input (M = 1) is considered.

APF is more efficient than computing selected entries of the Green’s function A⁻¹ (ref. ³⁷), which does not utilize the structure of equation (2). While advanced algorithms have been developed to exploit the sparsity of the inputs and the outputs during forward and backward substitutions³⁸ or through domain decomposition³⁹, they still require an ${{{\mathcal{O}}}}(M)$ substitution stage, with a modest speed-up (a factor of 3 when M is several thousand) and no memory usage reduction. APF is simpler yet much more efficient as it obviates the forward and backward substitution steps and the need for LU factors.

In most scenarios, the matrix A contains more nonzero elements than the matrices B, C and S, and we find the computing time and memory usage of APF to scale as ${{{\mathcal{O}}}}({N}^{1.3})$ and ${{{\mathcal{O}}}}(N)$, respectively, in 2D (Supplementary Fig. 2), where N = nnz(K) is the number of nonzero elements in the matrix K and is almost independent of M. When B and/or C contain more nonzero elements than A, we can compress matrices B and C through a data-sparse representation to reduce their numbers of nonzero elements to below that of A. For example, a plane-wave source spans a large area, but one can superimpose multiple plane-wave sources with a Fourier transform to make them spatially localized^8,9 and then truncate them with negligible error (Supplementary Sect. 5 and Supplementary Figs. 3 and 4).

Our implementation of APF is described in the Methods section and Supplementary Sects. 2 and 3, with pseudocode shown in Supplementary Sect. 6.

Below, we consider two multi-channel systems while comparing the computing time, memory usage and accuracy of APF versus open-source electromagnetic solvers including a conventional finite-difference frequency-domain (FDFD) code named MaxwellFDFD using either (1) direct⁴⁰ or (2) iterative⁴¹ methods, (3) an RGF code⁴² and (4) an RCWA code named S4 (ref. ⁴³); see the Methods section for details. We do not include time-domain methods in the comparison since their iteration by time stepping is typically slower than an iterative frequency-domain solver²³. We consider transverse magnetic polarization, starting with systems small enough for these solvers, then with larger problems that only APF can tackle.

Large-scale disordered systems

Disordered systems are difficult to simulate given their large size-to-wavelength ratio, large number of channels, strong scattering and lack of symmetry. Here we consider one that is W = 500λ wide and L = 100λ thick, where λ is the free-space wavelength, consisting of 30,000 cylindrical scatterers (Fig. 2a), discretized into 11.6 million pixels with a periodic boundary condition in y. On each of the −x and +x sides, 2W/λ = 1,000 channels (plane waves with different angles) are necessary to specify the propagating components of an incident wavefront or outgoing wavefront at the Nyquist sampling rate (Supplementary Sect. 1A). So, we compute the scattering matrix with $M^{\prime} =$ 2,000 outputs and up to M = 2,000 inputs (including both sides).
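The channel count can be estimated with a short calculation. For a system periodic in y with period W, the transverse wave numbers are k_y = 2πa/W with integer a, and a plane wave propagates when |k_y| is below 2π/λ. The sketch below counts such orders using a strict inequality in the continuum limit, which is an assumption; the exact treatment at the cutoff and the discretized dispersion relation (Supplementary Sect. 1A) are not reproduced here, so the result should be read as approximately 2W/λ.

```python
import numpy as np


def propagating_orders(width_in_wavelengths):
    """Integer orders a with |k_y| = 2*pi*|a|/W strictly below k0 = 2*pi/lambda."""
    a_max = int(np.ceil(width_in_wavelengths)) - 1  # largest integer a with a < W/lambda
    return np.arange(-a_max, a_max + 1)


W_over_lambda = 500
orders = propagating_orders(W_over_lambda)
num_channels_per_side = len(orders)                       # close to 2 W / lambda
angles_deg = np.degrees(np.arcsin(orders / W_over_lambda))  # propagation angle of each channel
```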

**Fig. 2: Benchmarks on a large-scale disordered system.**

It takes APF 3.3 min and 10 GiB of memory to compute the full scattering matrix; the other methods take 3,300–110,000,000 min using 7.0–1,200 GiB of memory for the same computation (Fig. 2b,c). The computing times of APF (with its breakdown shown in Fig. 2d), RGF and RCWA are all independent of M, though APF is orders of magnitude faster. MaxwellFDFD takes ${{{\mathcal{O}}}}(M)$ time due to its loop over the inputs. Reusing the LU factors helps, but the M forward and backward substitutions take longer than factorization and become the bottleneck when M ≳ 10. Note that APF saves computing time and memory even in the single-input (M = 1) case.

The speed and memory advantage of APF grows further with the system size (Supplementary Fig. 5). Some of these solvers require more computing resources than we have access to, so their usage data (open symbols and grey-edged bars in Fig. 2b,c) are extrapolated based on smaller systems (Supplementary Fig. 5).

The relative ℓ²-norm error of APF due to numerical round-off is 10⁻¹² here and grows slowly with an ${{{\mathcal{O}}}}({N}^{1/2})$ scaling (Supplementary Fig. 6), while the iterative MaxwellFDFD method here has a relative ℓ² error of 10⁻⁶.

Above, the matrices B, C and S all have fewer nonzero elements than the matrix A even for the largest M at the Nyquist rate, so the APF computing time and memory usage are independent of M. Supplementary Sect. 9 and Supplementary Fig. 7 consider inputs and outputs placed in the interior of the disordered medium, where M can grow larger. There, we observe that the APF computing time and memory usage stay constant until $M^{\prime} M$ (the number of elements in S) grows beyond nnz(A) ≈ 5.8 × 10⁷, above which they scale as ${{{\mathcal{O}}}}(M^{\prime} M)$.
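The quoted threshold can be checked with simple arithmetic. The sketch below assumes that each row of A holds five nonzero elements, as for a five-point finite-difference stencil in 2D; that assumption reproduces nnz(A) ≈ 5.8 × 10⁷ from the 11.6 million pixels stated earlier, and shows that the surface-to-surface scattering matrix is well below the crossover.

```python
import math

pixels = 11_600_000        # number of pixels quoted in the text
nnz_per_row = 5            # assumption: five-point stencil for the 2D operator
nnz_A_estimate = pixels * nnz_per_row

M_in = M_out = 2000        # inputs and outputs at the Nyquist rate (both sides)
entries_in_S = M_in * M_out

below_crossover = entries_in_S < nnz_A_estimate
# Square S with as many entries as nnz(A): largest M with M*M <= nnz(A).
crossover_M = math.isqrt(nnz_A_estimate)
```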

It was recently predicted that entangled photon pairs remain partially correlated even after multiple scattering from a dynamic disordered medium⁴⁴. As an example, we demonstrate such two-photon coherent backscattering. Given a maximally entangled input state, the correlation between two photons reflected into directions θ_a and θ_b is⁴⁴

$$\overline{\varGamma_{ba}}=\overline{\langle \psi | :\hat{n}_{b}\,\hat{n}_{a}:| \psi \rangle }\propto \overline{\left\vert (r^{2})_{\theta_{b},-\theta_{a}}\right\vert^{2}}, \tag{4}$$

where $\left\vert \psi \right\rangle$ is the two-photon wave function, ${\hat{n}}_{a}$ is the photon number operator in the reflected direction θ_a, :(…): stands for normal ordering, r² is the square of the medium’s reflection matrix (that is, the scattering matrix with inputs and outputs on the same side) and the overbar indicates an ensemble average over disorder realizations. This requires the full reflection matrix with all incident angles and all outgoing angles, for many realizations, and the disordered medium must be wide (for angular resolution) and thick (to reach diffusive transport). Figure 3 shows the two-photon correlation function Γ_ba computed using APF before and after averaging over 4,000 disorder realizations for a system that is W = 700λ wide and L = 400λ thick, consisting of 56,000 cylindrical scatterers, with a transport mean free path of ℓ_t = 9.5λ. We find the correlation between photons reflected towards similar directions (∣θ_b − θ_a∣ ≲ 0.1λ/ℓ_t) to be enhanced by a factor of 2. This demonstrates the existence of two-photon coherent backscattering in disordered media.
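The post-processing implied by equation (4) can be sketched in a few lines once the reflection matrices are available. The code assumes that the angular channels are ordered symmetrically about normal incidence, so that reversing the column index maps θ_a to −θ_a, and it uses small random matrices as placeholders for simulated reflection matrices. Random placeholders will not reproduce the backscattering enhancement; they only demonstrate the indexing and the ensemble average.

```python
import numpy as np


def two_photon_correlation(r):
    """Return |(r^2)_{theta_b, -theta_a}|^2 as an array indexed by [b, a].

    Assumes channels are ordered by angle symmetrically about normal incidence,
    so column index M-1-a corresponds to the direction -theta_a.
    """
    r2 = r @ r
    return np.abs(r2[:, ::-1]) ** 2


rng = np.random.default_rng(0)
M, realizations = 6, 50
gamma_avg = np.zeros((M, M))
for _ in range(realizations):
    # Placeholder for the reflection matrix of one disorder realization.
    r = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2 * M)
    gamma_avg += two_photon_correlation(r) / realizations  # ensemble average (overbar)
```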

**Fig. 3: Two-photon coherent backscattering from disorder.**

Large-area metasurfaces

Metalenses are lenses made with metasurfaces⁴⁵. When the numerical aperture (NA) is high, metalenses need to generate large phase gradients, so the variation from one unit cell to the next must be large, and the locally periodic approximation (LPA)^5,6 fails. Full-wave simulation remains the gold standard. Here, we consider metalenses with a height of L = 0.6 μm and a width of W ≈ 1 mm, consisting of 4,178 unit cells of titanium dioxide ridges on a silica substrate (Fig. 4a), for a hyperbolic⁴⁶ phase profile with an NA of 0.86 and a quadratic⁴⁷ phase profile with an NA of 0.71 operating at wavelength λ = 532 nm (see Supplementary Sect. 10 and Supplementary Fig. 8 for details). Perfectly matched layers (PMLs) are placed on all sides, and the system is discretized with a grid size of Δx = λ/40 into over 11 million pixels. We compute the transmission matrix at the Nyquist sampling rate, with up to M = 2W/λ = 3,761 plane-wave inputs from the substrate side truncated within the width W of the metalens (only considering angles that propagate in air), and sampling the transmitted field across a width W_out = W + 40λ (to ensure that all the transmitted light is captured) projected onto $M^{\prime} =2W_{\rm out}/\lambda =$ 3,841 transmitted plane waves. Owing to the large aspect ratio of 1 mm to 0.6 μm, the number of nonzero elements in the matrices B and C is larger than that of A, so we compress B and C and denote this as APF-c (Supplementary Sect. 5).

**Fig. 4: Benchmarks on a large-area metasurface.**

It takes APF-c 1.3 min and 6.9 GiB of memory to compute this transmission matrix, while the other methods take 6,300–6,000,000 min using 22–600 GiB (Fig. 4b,c). Some of these values are extrapolated from smaller systems (Supplementary Fig. 9). Note that, even though RCWA is specialized for layered structures such as the metasurface considered here, the general-purpose APF-c still outperforms RCWA by 10,000 fold in speed and 87 fold in memory. The second-best solver here is MaxwellFDFD with the LU factors stored and reused, which takes 4,700 times longer while using 17 times more memory compared with APF-c.

The transmission matrix fully characterizes the metasurface’s response to any input. Here, we use it with angular spectrum propagation (Supplementary Sect. 12) to obtain the complete angle dependence of the exact transmitted profile (two profiles each shown in Fig. 5a,b; more shown in Supplementary Videos 1 and 2), the Strehl ratio and the transmission efficiency (Fig. 5c,d and Supplementary Sect. 13).

**Fig. 5: All-angle full-wave characterization of millimetre-wide metalenses.**

To quantify the accuracy of an approximation, we compute the relative ℓ²-norm error ${\left\Vert {\bf I}-{\bf I}_{0}\right\Vert }_{2}/{\left\Vert {\bf I}_{0}\right\Vert }_{2}$, with I₀ being a vector containing the intensity at the focal plane within ∣y∣ < W/2 calculated from APF without compression, and I from an approximation. We consider two LPA formalisms: a standard one using the unit cells’ propagating fields (LPA I) and one with the unit cells’ evanescent fields included (LPA II) (Supplementary Sect. 14). LPA leads to errors up to 366% depending on the incident angle, with the angle-averaged error between 18% and 37% (Fig. 5e,f). Meanwhile, the compression errors of APF-c here average below 0.01% (Fig. 5e,f) and can be made arbitrarily small (Supplementary Fig. 10).
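The error metric above is straightforward to evaluate. A minimal sketch follows, with made-up intensity vectors used only to show that the metric is zero for identical profiles and is not bounded by 100%, which is why values such as 366% can occur.

```python
import numpy as np


def relative_l2_error(I, I0):
    """Relative l2-norm error ||I - I0||_2 / ||I0||_2 of I with respect to the reference I0."""
    I = np.asarray(I, dtype=float)
    I0 = np.asarray(I0, dtype=float)
    return np.linalg.norm(I - I0) / np.linalg.norm(I0)


I0 = np.array([1.0, 2.0, 2.0])                        # illustrative reference intensities
err_identical = relative_l2_error(I0, I0)             # no error
err_doubled = relative_l2_error(2 * I0, I0)           # doubling every value gives 100%
err_large = relative_l2_error([1.0, 2.0, 14.0], I0)   # a single bad peak can exceed 100%
```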

Discussion

The APF method can enable a wide range of studies beyond the examples above. Full-wave simulations of imaging inside strongly scattering media⁴⁸ are now possible with APF. Inverse design using the adjoint method used to require 2M simulations given M inputs¹². With a suitable formulation, APF can consolidate the 2M simulations into a single or a few computations. Computing the thermal emission into a continuum⁴⁹ requires many simulations and can also be accelerated using APF. One may use APF to design classical and quantum photonic circuits with elements that couple numerous channels.

Beyond photonics, APF can be used for mapping the angle dependence of radar cross-sections, for microwave imaging⁵⁰, for full waveform inversion⁵¹ and controlled-source electromagnetic surveys³⁸ in geophysics and for quantum transport simulations⁵². More generally, APF can efficiently evaluate matrices of the form CA⁻¹B in numerical linear algebra, not limited to partial differential equations.

The present work performs partial factorization using MUMPS³³, for which the matrix K must be square. Therefore, we pad $M^{\prime} -M$ columns to matrix B or $M-M^{\prime}$ rows to matrix C, which is suboptimal when $M^{\prime} \gg M$ (for example, when computing the field profile across a large volume for a small number of inputs) or $M^{\prime} \ll M$. To efficiently handle these scenarios with APF, partial factorization that works with a rectangular K is desirable.
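The padding step can be illustrated with a small dense example. The sketch assumes that the padded columns and rows are filled with zeros (dummy channels), which is one simple choice and not necessarily what the implementation does; the function name is hypothetical. The check confirms that the block of the padded result corresponding to the real channels is unchanged.

```python
import numpy as np


def pad_to_square(B, C, D):
    """Zero-pad B, C and D so that the augmented matrix K becomes square."""
    M_out, M_in = D.shape
    M = max(M_in, M_out)
    B_pad = np.pad(B, ((0, 0), (0, M - M_in)))          # dummy input columns
    C_pad = np.pad(C, ((0, M - M_out), (0, 0)))         # dummy output rows
    D_pad = np.pad(D, ((0, M - M_out), (0, M - M_in)))
    return B_pad, C_pad, D_pad


rng = np.random.default_rng(0)
n, M_in, M_out = 20, 2, 5                       # more outputs than inputs (M' > M)
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, M_in))
C = rng.standard_normal((M_out, n))
D = np.zeros((M_out, M_in))

S = C @ np.linalg.solve(A, B) - D
B_pad, C_pad, D_pad = pad_to_square(B, C, D)
S_pad = C_pad @ np.linalg.solve(A, B_pad) - D_pad  # square, with dummy entries
```

The padded result contains max(M, M′)² entries, of which only M′M are of interest, which is the overhead referred to above when the two numbers differ greatly.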

As the number of channels and the LU factor size are both much larger in three dimensions (3D), the advantage of APF over existing methods can potentially be greater in 3D than in 2D. In 3D, the memory usage due to the LU factors is the bottleneck for direct methods. Future work could develop partial factorization schemes that minimize the temporary storage of such factors or even compute CA⁻¹B without triangular factors. The expected computing time and memory usage of APF in 3D follow those of factorizing the matrix A, which are $\mathcal{O}(N^{2})$ and $\mathcal{O}(N^{1.33})$, respectively, when using nested dissection ordering but could potentially be lowered by leveraging the low-rank property of the off-diagonal blocks⁵³. APF-c can naturally work with overlapping-domain distribution strategies^7,8,9. Multi-frontal parallelization can be used through existing packages such as MUMPS³³, and one may employ hardware accelerations with GPUs^9,54. For systems with a small surface-to-volume ratio, it is also possible to apply APF to the boundary element method or T-matrix method, using the $\mathcal{H}$-matrix technique²⁷ for fast factorization.

Methods

We implement APF under finite-difference discretization on the Yee grid in 2D (Supplementary Sect. 2) and compute the Schur complement using the MUMPS package³³ (version 5.4.1) with its built-in approximate minimum degree ordering. Outgoing boundaries are realized with PMLs⁵⁵. We order the input/output channels and/or pad additional channels so that the matrix K is symmetric (Supplementary Sect. 3).

We use the same discretization scheme, same grid size and same subpixel smoothing⁵⁶ for the APF, MaxwellFDFD and RGF benchmarks. Numerical dispersion is not important for the disordered media example in Fig. 2, so we use a relatively coarse resolution of 15 pixels per λ there. A finer resolution of 40 pixels per λ = 532 nm is used for the metasurface examples in Figs. 4 and 5 to have their transmission phase shifts accurate to within 0.1 rad (Supplementary Fig. 11).

In RGF⁴², the outgoing boundary in the longitudinal direction is implemented exactly through the retarded Green’s function of a semi-infinite discrete space²⁸. For APF and MaxwellFDFD, one λ of homogeneous space and 10 pixels of PML⁵⁵ are used to achieve an outgoing boundary with a sufficiently small discretization-induced reflection. The uniaxial PML is used in APF so that the matrix A is symmetric. The stretched-coordinate PML is used in MaxwellFDFD to lower the condition number⁵⁷.
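As a consistency check, the grid size of the disordered-medium benchmark can be reconstructed from the numbers given in this section. The sketch assumes that the homogeneous buffer and the PML are added on both x sides only, with the periodic y direction left unpadded; under that assumption the total agrees with the 11.6 million pixels quoted in the Results.

```python
pixels_per_wavelength = 15       # resolution used for the disordered medium
W_wavelengths = 500              # width (periodic in y)
L_wavelengths = 100              # thickness (x direction)
buffer_wavelengths = 1           # homogeneous space on each x side
pml_pixels = 10                  # PML thickness on each x side

ny = W_wavelengths * pixels_per_wavelength
nx = (L_wavelengths + 2 * buffer_wavelengths) * pixels_per_wavelength + 2 * pml_pixels
total_pixels = nx * ny
total_in_millions = total_pixels / 1e6
```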

For the MaxwellFDFD method with an iterative solver⁴¹, we use its default biconjugate gradient method with its default convergence criterion of relative ℓ² residual below 10⁻⁶. For the MaxwellFDFD method with a direct solver⁴⁰, we consider an unmodified version where the LU factors are not reused and a version modified to have the LU factors stored in memory and reused for the different inputs.

For the RCWA simulations, we use its default closed-form Fourier-transform formalism implemented in S4 (ref. ⁴³). For the example in Fig. 4, we use a single layer with five Fourier components per unit cell where the cell width is 239 nm (that is, 11 Fourier components per λ), which gives accuracy comparable to APF, MaxwellFDFD and RGF (Supplementary Fig. 11). For the example in Fig. 2, we use 15 layers per λ axially (the same as the discretization grid size used in the other methods) with 4.1 Fourier components per λ laterally (by scaling it down in proportion to the reduced spatial resolution in APF, MaxwellFDFD and RGF).
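The Fourier-component densities quoted above follow from simple proportions, as the short calculation below shows. It uses only numbers stated in this section; the rounding to 11 components per λ before rescaling is how the text's value of 4.1 is recovered.

```python
wavelength_nm = 532
cell_width_nm = 239
components_per_cell = 5

# Metasurface example: Fourier components per wavelength.
components_per_wavelength = components_per_cell * wavelength_nm / cell_width_nm

# Disordered example: scale by the ratio of spatial resolutions (15 vs 40 pixels per wavelength).
scaled_components = round(components_per_wavelength) * 15 / 40
```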

Note that the RGF⁴² and S4 (ref. ⁴³) codes do not support an outgoing boundary in the transverse y direction. The computing time and memory usage for RGF and S4 in Fig. 4 are extrapolated from simulations on smaller systems adopting a periodic transverse boundary (Supplementary Fig. 9). To simulate the example in Fig. 4 using RGF or S4, one needs to additionally implement PML in the y direction and to further increase the system width. Doing so will slightly increase their computing time and memory usage, which we disregard.

All the computing time and memory usage values are obtained from computations using a single core without parallelization on identical Intel Xeon Gold 6130 nodes on the USC Center for Advanced Research Computing’s Discovery cluster with 184 GiB of memory available per node.

Data availability

Numerical source data for Figs. 2, 4 and 5c–f are available with this manuscript in the Source Data section. Numerical source data for Figs. 3b–c and 5a,b are available on Zenodo⁵⁸. All data in this study are generated by running our code⁵⁹, MaxwellFDFD^40,41 and S4 (ref. ⁴³).

Code availability

We implement the APF method and the RGF method within our software Maxwell’s Equations Solver with Thousands of Inputs (MESTI). The code, documentation and examples are available on GitHub⁵⁹ under the GPL-3.0 license. MESTI supports both polarizations, all common boundary conditions, real or complex frequencies, with any permittivity profile and any list of input source profiles and output projection profiles (user specified or automatically built). The specific version of MESTI used to produce the results in this manuscript is also available on Zenodo⁶⁰.

References

1. Yılmaz, H., Hsu, C. W., Yamilov, A. & Cao, H. Transverse localization of transmission eigenchannels. Nat. Photon. 13, 352–358 (2019).
2. Carminati, R. & Schotland, J. C. Principles of Scattering and Transport of Light (Cambridge Univ. Press, 2021).
3. Zhou, M. et al. Inverse design of metasurfaces based on coupled-mode theory and adjoint optimization. ACS Photon. 8, 2265–2273 (2021).
4. Zhang, H. & Miller, O. D. Quasinormal coupled mode theory. Preprint at https://arxiv.org/abs/2010.08650 (2020).
5. Kamali, S. M., Arbabi, E., Arbabi, A. & Faraon, A. A review of dielectric optical metasurfaces for wavefront control. Nanophotonics 7, 1041–1068 (2018).
6. Pestourie, R. et al. Inverse design of large-area metasurfaces. Opt. Express 26, 33732–33747 (2018).
7. Lin, Z. & Johnson, S. G. Overlapping domains for topology optimization of large-area metasurfaces. Opt. Express 27, 32445–32453 (2019).
8. Torfeh, M. & Arbabi, A. Modeling metasurfaces using discrete-space impulse response technique. ACS Photon. 7, 941–950 (2020).
9. Skarda, J. et al. Low-overhead distribution strategy for simulation and optimization of large-area metasurfaces. NPJ Comput. Mater. 8, 78 (2022).
10. Li, S. & Hsu, C. W. Thickness bounds for nonlocal wide-field-of-view metalenses. Light: Science & Applications 11, 338 (2022).
11. Elsawy, M. M. R., Lanteri, S., Duvigneau, R., Fan, J. A. & Genevet, P. Numerical optimization methods for metasurfaces. Laser Photon. Rev. 14, 1900445 (2020).
12. Lin, Z., Roques-Carmes, C., Christiansen, R. E., Soljačić, M. & Johnson, S. G. Computational inverse design for ultra-compact single-piece metalenses free of chromatic and angular aberration. Appl. Phys. Lett. 118, 041104 (2021).
13. Popoff, S. M. et al. Measuring the transmission matrix in optics: an approach to the study and control of light propagation in disordered media. Phys. Rev. Lett. 104, 100601 (2010).
14. Rotter, S. & Gigan, S. Light fields in complex media: mesoscopic scattering meets wave control. Rev. Mod. Phys. 89, 015005 (2017).
15. Taflove, A. & Hagness, S. C. Computational Electrodynamics: the Finite-Difference Time-Domain Method, 3rd edn (Artech House, 2005).
16. Jin, J.-M. The Finite Element Method in Electromagnetics (Wiley–IEEE, 2014).
17. Rumpf, R. C. Simple implementation of arbitrarily shaped total-field/scattered-field regions in finite difference frequency-domain. Prog. Electromagn. Res. 36, 221–248 (2012).
18. Davis, T. A. Direct Methods for Sparse Linear Systems (Society for Industrial and Applied Mathematics, 2006).
19. Duff, I. S., Erisman, A. M. & Reid, J. K. Direct Methods for Sparse Matrices (Oxford Univ. Press, 2017).
20. Saad, Y. Iterative Methods for Sparse Linear Systems (Society for Industrial and Applied Mathematics, 2003).
21. Puzyrev, V. & Cela, J. M. A review of block Krylov subspace methods for multisource electromagnetic modelling. Geophys. J. Int. 202, 1241–1252 (2015).
22. Dolean, V., Jolivet, P. & Nataf, F. An Introduction to Domain Decomposition Methods: Algorithms, Theory, and Parallel Implementation (Society for Industrial and Applied Mathematics, 2015).
23. Osnabrugge, G., Leedumrongwatthanakun, S. & Vellekoop, I. M. A convergent Born series for solving the inhomogeneous Helmholtz equation in arbitrarily large media. J. Comput. Phys. 322, 113–124 (2016).
24. Gibson, W. C. The Method of Moments in Electromagnetics (Chapman and Hall/CRC, 2021).
25. Doicu, A., Wriedt, T. & Eremin, Y. A. Light Scattering by Systems of Particles: Null-Field Method with Discrete Sources–Theory and Programs (Springer, 2006).
26. Song, J., Lu, C.-C. & Chew, W. C. Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects. IEEE Trans. Antennas Propag. 45, 1488–1493 (1997).
27. Hackbusch, W. Hierarchical Matrices: Algorithms and Analysis (Springer, 2015).
28. Wimmer, M. Quantum Transport in Nanostructures: from Computational Concepts to Spintronics in Graphene and Magnetic Tunnel Junctions. PhD thesis, Univ. Regensburg (2009).
29. Li, L. in Gratings: Theory and Numeric Applications, 2nd edn (ed. Popov, E.) ch. 13 (Institut Fresnel, 2014).
30. Bienstman, P. Rigorous and Efficient Modelling of Wavelength Scale Photonic Components. PhD thesis, Ghent Univ. (2001).
31. Mounaix, M. et al. Spatiotemporal coherent control of light through a multiple scattering medium with the multispectral transmission matrix. Phys. Rev. Lett. 116, 253901 (2016).
32. Fisher, D. S. & Lee, P. A. Relation between conductivity and transmission matrix. Phys. Rev. B 23, 6851–6854 (1981).
33. Amestoy, P. R., Duff, I. S., Koster, J. S. & L’Excellent, J.-Y. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23, 15–41 (2001).
34. Petra, C. G., Schenk, O., Lubin, M. & Gäertner, K. An augmented incomplete factorization approach for computing the Schur complement in stochastic optimization. SIAM J. Sci. Comput. 36, C139–C162 (2014).
35. Zhang, F. The Schur Complement and Its Applications (Springer, 2015).
36. Oskooi, A. & Johnson, S. G. in Advances in FDTD Computational Electrodynamics: Photonics and Nanotechnology (eds Taflove, A., Oskooi, A. & Johnson, S. G.) ch. 4 (Artech House, 2013).
37. Amestoy, P. R., Duff, I. S., L’Excellent, J.-Y. & Rouet, F.-H. Parallel computation of entries of A⁻¹. SIAM J. Sci. Comput. 37, C268–C284 (2015).
38. Amestoy, P. R., de la Kethulle de Ryhove, S., L’Excellent, J.-Y., Moreau, G. & Shantsev, D. V. Efficient use of sparsity by direct solvers applied to 3D controlled-source EM problems. Comput. Geosci. 23, 1237–1258 (2019).
39. Hackbusch, W. & Drechsler, F. Partial evaluation of the discrete solution of elliptic boundary value problems. Comput. Vis. Sci. 15, 227–245 (2012).
40. Shin, W. MaxwellFDFD. GitHub https://github.com/wsshin/maxwellfdfd (2019).
41. Shin, W. FD3D. GitHub https://github.com/wsshin/fd3d (2015).
42. Hsu, C. W. RGF. GitHub https://github.com/chiaweihsu/RGF (2022).
43. Liu, V. & Fan, S. S⁴: a free electromagnetic solver for layered periodic structures. Comput. Phys. Commun. 183, 2233–2244 (2012).
44. Safadi, M. et al. Coherent backscattering of entangled photon pairs. Nature Physics https://doi.org/10.1038/s41567-022-01895-3 (in press).
45. Lalanne, P. & Chavel, P. Metalenses at visible wavelengths: past, present, perspectives. Laser Photon. Rev. 11, 1600295 (2017).
46. Aieta, F. et al. Aberration-free ultrathin flat lenses and axicons at telecom wavelengths based on plasmonic metasurfaces. Nano Lett. 12, 4932–4936 (2012).
47. Pu, M., Li, X., Guo, Y., Ma, X. & Luo, X. Nanoapertures with ordered rotations: symmetry transformation and wide-angle flat lensing. Opt. Express 25, 31471–31477 (2017).
48. Yoon, S. et al. Deep optical imaging within complex scattering media. Nat. Rev. Phys. 2, 141–158 (2020).
49. Yao, W., Verdugo, F., Christiansen, R. E. & Johnson, S. G. Trace formulation for photonic inverse design with incoherent sources. Struct. Multidisc. Optim. 65, 336 (2022).
50. Haynes, M., Stang, J. & Moghaddam, M. Real-time microwave imaging of differential temperature for thermal therapy monitoring. IEEE Trans. Biomed. Eng. 61, 1787–1797 (2014).
51. Virieux, J. & Operto, S. An overview of full-waveform inversion in exploration geophysics. Geophysics 74, WCC1–WCC26 (2009).
52. Groth, C. W., Wimmer, M., Akhmerov, A. R. & Waintal, X. Kwant: a software package for quantum transport. N. J. Phys. 16, 063065 (2014).
53. Amestoy, P. R., Buttari, A., L’Excellent, J.-Y. & Mary, T. On the complexity of the block low-rank multifrontal factorization. SIAM J. Sci. Comput. 39, A1710–A1740 (2017).
54. Hughes, T. W., Minkov, M., Liu, V., Yu, Z. & Fan, S. A perspective on the pathway toward full wave simulation of large area metalenses. Appl. Phys. Lett. 119, 150502 (2021).
55. Gedney, S. in Computational Electrodynamics: The Finite-Difference Time-Domain Method, 3rd edn (eds Taflove, A. & Hagness, S. C.) ch. 7 (Artech House, 2005).
56. Farjadpour, A. et al. Improving accuracy by subpixel smoothing in the finite-difference time domain. Opt. Lett. 31, 2972–2974 (2006).
57. Shin, W. & Fan, S. Choice of the perfectly matched layer boundary condition for frequency-domain Maxwell’s equations solvers. J. Comput. Phys. 231, 3406–3431 (2012).
58. Lin, H.-C., Wang, Z. & Hsu, C. W. Source data for ‘Fast multi-source nanophotonic simulations using augmented partial factorization’ [Data set]. Zenodo https://doi.org/10.5281/zenodo.7306089 (2022).
59. Lin, H.-C., Wang, Z. & Hsu, C. W. MESTI. GitHub https://github.com/complexphoton/MESTI.m (2022).
60. Lin, H.-C., Wang, Z. & Hsu, C. W. complexphoton/MESTI.m. Zenodo https://doi.org/10.5281/zenodo.7295995 (2022).

Acknowledgements

We thank Y. Bromberg, M. Safadi, A. Goetschy, C. Sideris, A. D. Stone, S. G. Johnson, S. Li and M. Torfeh for useful discussions. This work is supported by the National Science Foundation CAREER award ECCS-2146021 and the Sony Research Award Program. Computing resources are provided by the Center for Advanced Research Computing (CARC) at the University of Southern California.

Author information

Authors and Affiliations

Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA
Ho-Chun Lin, Zeyu Wang & Chia Wei Hsu

Contributions

H.-C.L. and C.W.H. performed the simulations and data analysis. C.W.H., H.-C.L. and Z.W. wrote the APF codes. C.W.H. developed the theory and supervised the research. All authors contributed to designing the systems, discussing the results and preparing the manuscript.

Corresponding author

Correspondence to Chia Wei Hsu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Jie Pan, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sects. 1–16, Figs. 1–11 and Table 1.

Peer Review File

Supplementary Video 1

Intensity profile of light transmitted through the millimetre-wide hyperbolic metalens as the incident angle varies. The profiles are normalized such that the incident flux is the same for all incident angles, and the colour bar is saturated near normal incidence to show the profiles at oblique incidence. The Strehl ratio and transmission efficiency are also shown.

Supplementary Video 2

Corresponding intensity profiles for the quadratic metalens. See the caption of Supplementary Video 1.

Source data

43588_2022_370_MOESM5_ESM.xls

Numerical source data for computing time and memory usage in Fig. 2.

43588_2022_370_MOESM6_ESM.xls

Numerical source data for computing time and memory usage in Fig. 4.

43588_2022_370_MOESM7_ESM.xls

Numerical source data for the Strehl ratio, transmission efficiency and relative error in Fig. 5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lin, HC., Wang, Z. & Hsu, C.W. Fast multi-source nanophotonic simulations using augmented partial factorization. Nat Comput Sci 2, 815–822 (2022). https://doi.org/10.1038/s43588-022-00370-6

Received: 29 June 2022
Accepted: 10 November 2022
Published: 15 December 2022
Issue Date: December 2022
DOI: https://doi.org/10.1038/s43588-022-00370-6

This article is cited by

Coherent backscattering of entangled photon pairs
- Mamoon Safadi
- Ohad Lib
- Yaron Bromberg
Nature Physics (2023)
Efficient simulators for multi-source nanophotonics
- Haitao Liu
Nature Computational Science (2022)