## Introduction

A multitude of parameters determine the performance of a photonic device, encompassing the optical properties of the constituent materials, structural geometry, and dimensions. Similarly, the choice of the best design to proceed to fabrication, integration, and system implementation needs to take into account many performance criteria. These not only include the primary functionality but also other metrics, such as insertion loss, the effect of temperature, and the influence on other system components (e.g., back reflections), to name a few. Susceptibility of manufacturing yield to the inherent variability of the fabrication processes is another important consideration.

Historically, conception of a new device relies on theoretical knowledge and physical intuition to identify the potential structure and design parameter range. The design parameter space is explored semi-analytically or numerically, and the relevant performance metrics are analyzed. This approach is constrained in scope by computational resources and limited to structures governed by only a few parameters and where the evaluation process can be decomposed into sequential steps. As the scope of nanophotonics broadens in complexity and application range1,2, this conventional approach poses increasing challenges. For example, in devices employing metamaterials3,4,5,6 or complex geometries generated by inverse design and topology optimization7,8,9,10,11,12, not only the number of design parameters vastly increase but they are often strongly interdependent. Sequential optimization is no longer applicable, and simultaneous optimization of multiple parameters is required.

Optimization tools such as genetic algorithm13,14,15, particle swarm16,17, and gradient-based optimization18,19,20,21 are now commonly used to search more efficiently for high-performance designs22. More recently, supervised machine-learning methods, such as the artificial neural network, have begun to enter the fray in speeding up the search process23,24. While all these approaches represent significant improvements to the design flow, they still suffer constitutive limitations: usually a single performance criterion is optimized; only a single or a handful of optimized designs are discovered; and the optimization process needs to be repeated for modified performance criteria. Furthermore, optimized designs in isolation reveal very little on the characteristics of the design space and the influence of the design parameters on the device performance. Consequently, careful balancing of different figures of merit becomes difficult. A global perspective on the design space of nanophotonic devices is presently missing.

We propose here a methodology based on machine-learning (ML) tools to map and characterize a multiparameter design space. As a first demonstration-of-concept, we analyze a vertical fiber grating coupler in the silicon-on-insulator (SOI) platform consisting of multiple segments whose dimensions need to be optimized simultaneously17,21,25,26,27,28,29. From an initial sparse set of optimized designs, unsupervised dimensionality reduction reveals a lower-dimensional design sub-space where good designs with high-performance reside. Since computational effort grows exponentially with dimensionality, this sub-space can be mapped faster by orders of magnitude than the original design space, enabling the efficient evaluation and visualization of an arbitrary number of performance criteria. The comprehensive characterization of the continuous region that includes all possible good design solutions highlights their performance as well as structural differences and limitations. This provides a clear understanding of the design space, making possible the discovery of superior designs based on the relative priorities for a particular application. Furthermore, this global perspective on the design space can be exploited to arrive at conclusive arguments as to whether certain features can be obtained with a given structure. With the considered grating geometry, an upper limit to the minimum feature size is identified. This design bottleneck revealed by dimensionality reduction inspired a new grating structure that incorporates subwavelength metamaterial and allows a minimum feature size greater than 100 nm without compromising the device performance. The composite design space of refractive index and segment dimensions involved here can be equally well characterized using the same global mapping procedure, indicating the general applicability of our optimization approach to a wide range of high-dimensional design problems. To the best of our knowledge, this is the first time such a global perspective is obtained by leveraging unsupervised machine-learning techniques for high-dimensional design problems in nanophotonics.

## Results

### Strategy for characterizing a multiparameter design space

In designing multiparameter devices, it is often difficult or impossible to obtain extensive information on device performance variation over the large parameter space due to constrains on computational resources. We tackle this problem by introducing the three-stage process illustrated in Fig. 1.

In the first stage, multiple iterations of an optimization algorithm are used to generate a sparse collection of different good designs, i.e., designs that optimize a primary performance criterion (Fig. 1a). Supervised ML techniques are exploited to speed up the search process by quickly providing promising design candidates as starting points for the optimizer (see the Methods section). In the second stage (Fig. 1b), dimensionality reduction is applied to analyze the relationship in the parameter space between these degenerate designs. The goal is to find a lower-dimensional sub-space where all good designs reside. This design sub-space is described by significantly fewer parameters compared with the original design space. In the last stage (Fig. 1c), we map the design sub-space by computing across it all required performance criteria and identifying a continuous region of good design solutions. Through this process, the sparse initial set of good designs efficiently leads to the identification and comprehensive characterization of the continuum of all good designs in the sub-space.

A vertical fiber grating coupler with five segments per period is taken as the study case. The considered grating structure is illustrated in Fig. 2. Although desirable, perfectly vertical emission makes the design a challenging problem due to the necessity to suppress the second-order diffraction that reflects back into the waveguide26. In a recent work by Watanabe et al.17, a single optimized design was generated using particle swarm optimization, providing a good fiber–chip coupling efficiency and a fairly low level of back reflections. Each period of the grating consists of a pillar of 220 nm in height and an L-shaped section with a partial etch to 110 nm17. The multiple segments in each period need to be simultaneously optimized, and therefore provide a good target to demonstrate our machine-learning-based design approach.

### Discovery of a sparse collection of good designs

In the first stage, an in-house optimizer launched from random starting points in the original parameter space is used to search for the initial sparse set of grating designs with state-of-the-art fiber coupling efficiency. A supervised machine-learning predictor is trained to determine the diffraction angle of these complex gratings without performing a first principles Bloch mode calculation. The predictor is used to rapidly screen out random start designs that do not radiate close to the vertical direction, thereby speeding up the search process by ~250% (see the Methods section). This algorithm achieves a wide coverage of the initial design space.

For the grating illustrated in Fig. 2, the dimensions [L1L5] define the five-dimensional design parameter space we explore in this work. As a primary optimization objective, we choose the coupling efficiency η of the diffracted TE-polarized light to a standard single-mode optical fiber (SMF-28) placed vertically on top of the grating. Low back reflection r is another important criterion in minimizing the impact of the grating to other upstream components in the system30,31. These considerations lead to the formulation of the optimization problem for the first stage represented in Fig. 1a as:

$$\mathop {{{\mathrm{maximize}}}}\limits_{{{L}}_1 \cdots {{L}}_5}\quad {\eta} \left( {L_1 \cdots L_5} \right) \\ {\mathrm{subject}}{\kern 2pt} {\mathrm{to}}\quad r \left ( {L_1 \cdots L_5} \right) < - {\!\!} {15} {\mathrm{dB}} \\ 400{\kern 2pt} {\mathrm{nm}} < {\Lambda} < {1}{\kern 2pt} {\mu} {\mathrm{m}};{\kern 2pt} L_{{i}} > {50}{\kern 2pt} {\mathrm{nm}}, i = 1 \cdots 5.$$
(1)

The optimization is guided by a single performance metric, the coupling efficiency η. Only solutions with η larger than 0.74 are retained, defined here as good designs. Back reflection r is not optimized, but simply constrained by rejecting design solutions with r > −15 dB. Additional constraints on the grating period Λ and the minimum feature size are included to confine the optimization to designs that are physically manufacturable. The wavelength of light is set at λ = 1550 nm. A highly efficient Fourier-type 2-D eigenmode expansion simulator32 is used to compute the device performance. As detailed in the next section, the optimization stage is halted after a sufficient number of good designs is collected. Each good design requires on average 1000 simulations to be identified (computational details in the Methods section).

### Sub-space identification through dimensionality reduction

In this second and key step, we study the relationship between the sparse set of good solutions obtained solving the optimization problem (1) through machine-learning dimensionality reduction. The goal is to transform a set of correlated variables into a smaller set of new uncorrelated variables that retain most of the original information. Here, the linear principal component analysis (PCA)33 is used obtaining a good level of accuracy (see PCA description in the Methods section).

We find that two principal components are sufficient to accurately represent the entire pool of good designs, each defined by five segment length in the original design space. That is, all good designs approximately lie on a 2D hyperplane—the reduced design sub-space. The rest of the design space can be excluded from further investigation. As we discuss in Supplementary Note 1, post-processing error analysis demonstrates that five good designs are sufficient for PCA to provide an accurate result. In order to verify convergence we collect here 45 good designs. The linear design sub-space is defined by two orthogonal basis vectors V1αβ and V2αβ. Any design k with dimensions Lk = [L1,k L5,k] can hence be written as

$${\mathbf{L}}_{\mathrm{k}} = \alpha _{\mathrm{k}}{\mathbf{V}}_{1{\mathrm{\alpha \beta }}} + \beta _{\mathrm{k}}{\mathbf{V}}_{2{\mathrm{\alpha \beta }}} + {\mathbf{C}}_{{\mathrm{\alpha \beta }}}.$$
(2)

Cαβ is a constant vector that defines the reference origin on the hyperplane. Two scalar coefficients αk and βk are thus sufficient to completely describe design k. Details of vector definitions are provided in the Methods section.

Now that the area of interest in the design space is limited to a 2D hyperplane, it becomes feasible to adopt a classical design approach and perform an exhaustive mapping of this sub-space, as illustrated in Fig. 1c. First, we generate a uniform grid of 60 × 60 points covering the αβ hyperplane. Therefore, 3600 sampling points are sufficient to provide a wealth of information. As a comparison, sampling with the same resolution in the original design space would require ~1.5 million points, increasing the computation time by about over 400 times (details in the Methods section). For each point [αk, βk], we obtain the corresponding dimensions [L1,kL5,k] in the original design space through Eq. (2) and compute the coupling efficiency η. The results are shown in Fig. 3a only for the designs with η > 0.7. Note that not all points on the αβ plane provide high coupling efficiency. A unit division in α or β corresponds to a Manhattan distance $$\mathop {\sum }\nolimits_{i = 1}^5 |L_{i,A} - L_{i,B}|$$ of 100 nm. The 60 × 60 grid covering the αβ hyperplane has a resolution of 5 nm in Manhattan distance. This exhaustive mapping results in the discovery of a large and well-defined region of degenerate designs with η > 0.74, highlighted by the black contour line in Fig. 3a. This region encloses a continuum of designs in addition to those discovered in stage 1. Remarkably, although all the good designs have similar coupling efficiencies ranging from η = 0.74 to η = 0.76, the actual structure of the gratings can vary quite significantly, as will be detailed in the next section. Without dimensionality reduction, there is no obvious way to discover these alternative designs with similar coupling efficiency, but potentially different properties in other aspects.

As the last step, we validate the PCA outcome by verifying that the projection on the low-dimensional design sub-space (the αβ hyperplane) is sufficient to represent the region of good grating designs. We generate two additional 2D hyperplanes Γ–Π and X–Π that are orthogonal to each other and to the αβ plane (details in the Methods section). They provide two different cuts through the αβ sub-space and their projections are shown in Fig. 3a with dashed black lines. We generate a uniform grid on Γ–Π and X–Π and simulate the coupling efficiency of the corresponding grating design. Coupling efficiencies η > 0.7 are plotted in Fig. 3b and c, respectively. The intersection with the α–β hyperplane is marked with a white dashed line. The axis uses the same scale as in Fig. 3a: a unit division on X, Γ, or Π corresponds to a Manhattan distance of 100 nm. When projected on Γ–Π and X–Π hyperplanes, the region of good designs essentially reduces to a thin stripe whose thickness depends on the range of accepted coupling efficiencies η. This confirms that this region has approximately a 2D geometry. Although it appears slightly curved (see the Γ–Π projection, Fig. 3b), it is still well approximated by the αβ hyperplane which has the advantage of being a simple linear structure.

### Characterization of the low-dimensional good design sub-space

The exhaustive mapping of the sub-space of all good designs can now be readily extended to other performance metrics beyond the primary criterion originally chosen as the optimization objective (the coupling efficiency). This provides the designer a complete picture of the device behavior, including the upper and lower limit of each performance metric. Informed trade-offs and identification of the best design that fits specific application needs are hence made possible.

Along with the coupling efficiency, we evaluate here three additional criteria throughout the sub-space, i.e., back reflections, minimum feature size, and tolerance to fabrication uncertainty, as presented in Figs. 3 and  4. All maps use the same axis scale, range, and sampling as the αβ plane, and the black contour line marks the region with η > 0.74 for reference. Three designs are selected for further examination. Their structural and performance parameters are listed in Table 1. Designs 1 and 2 are on the αβ plane (marked on Fig. 3a, d), while design 3 (marked in Fig. 3b, c) is the global optimum in both Γ–Π and X–Π projections (not exactly represented on the αβ plane). The design proposed in ref. 17, which was found through particle swarm optimization, also belongs to the sub-space of good solutions and its location on the αβ hyperplane is marked for reference. Despite the very different design parameter (especially for the L-shaped structure), all these gratings have a highly directional vertical emission and good overlap with the fiber mode, leading to a high coupling efficiency (η > 0.75 as listed in Table 1). On the other hand, the attainable back reflection differs markedly, from −21 dB for design 1 to −37 dB for design 2. The possibility to exhaustively map other metrics throughout the sub-space allows the designer to identify a design area with particularly low back reflections around design 2.

As an additional comparison of the performance for designs 1–3, Fig. 3e and f plots the two performance criteria η and r as a function of wavelength, now computed using 2D finite-difference-time-domain (FDTD) method as a cross-check. The results agree well with that predicted by the Fourier-type 2D simulator. All three designs have a 1-dB bandwidth larger than the telecommunication C band (1530 nm–1565 nm, green-shaded area) with design 2 slightly out-performing the other two. Regarding back reflections, the behavior of the three designs is remarkably different (Fig. 3f). Back reflections of design 2 near wavelength 1550 nm are very low, which is particularly important for coupling to a laser. However, a reflection of less than −30 dB can only be achieved within a 7-nm wavelength band. In contrast, the reflection of design 1 and 3 oscillates between −26 dB and −17 dB within the entire C band.

Minimum feature size determines the manufacturability of any nanophotonics device. Here, feature sizes can be easily retrieved exploiting the αβ hyperplane through a query process without performing any additional photonic simulations. For each point, the dimensions [L1L5] are computed with Eq. (2) and the shortest section is identified. Figure 4a shows the shortest segment among the five-dimensional parameters. It is immediately evident that a design with minimum feature size greater than 88 nm does not exist for this grating structure, even accepting a small penalty in the coupling efficiency. This essential information can be easily retrieved because of the global perspective that our method offers, and it would be difficult to obtain with a conventional optimization procedure. Clearly, the bottleneck is predominantly in either L1 or L2. This finding inspired an improved grating structure (described in the next section) that allows a minimum feature size greater than 100 nm while maintaining similar performance.

Another important aspect for nanophotonic devices is the robustness against unavoidable fabrication uncertainty. Here, we examine two sources of common dimensional variability: A width deviation δw for both shallow and deeply etched sections and etch depth deviation δe from the nominal 110 nm for the shallow-etched section. A good measure of the sensitivity of coupling efficiency and back reflections to variability is provided by a quantity denoted here as a degradation derivative, defined as the average of the two directional derivatives with respect to positive and negative values of δw or δe (definition in the Methods section). The computed values of the four degradation derivatives are shown in Fig. 4b–e, with a high value indicating a high sensitivity. For width deviations, the coupling efficiency has a particularly sensitive region close to design 1. On the contrary, back reflection is more sensitive to width deviations in the region close to design 2. This region has a large overlap with the region of minimum back reflection shown in Fig. 3d, making design 2 and surrounding designs high-performing when back reflection is considered, but with stringent fabrication requirements. In Supplementary Note 2, we provide a direct verification of these results through a polynomial chaos-based stochastic analysis22. Regarding sensitivity to etch depth variations, the entire region of good designs largely overlaps with a region of low sensitivity for both coupling efficiency and back reflection.

### Dimensionality-reduction generality and ML inspired geometry

The proposed design approach requires no physical assumptions on the device under study, enabling its application to other design problems with different types of input parameters and/or objectives. A straightforward demonstration of its generality is carried out by designing vertical grating couplers for the optical communication O-band, centered at 1310 nm (see Supplementary Note 4). We further demonstrate here this generality by investigating a new class of grating couplers utilizing subwavelength metamaterials3,4 to achieve designs with a minimum feature size greater than 100 nm in both the propagation and the transverse directions. This new grating geometry (see Fig. 5) is inspired by the global mapping of minimum feature size described in the previous section. The optimization now involves both dimensions and the effective material index used to represent the subwavelength segments. Dimensions and refractive index have different numerical magnitudes, but they both significantly impact the device performance when varied. Below, we show that the method presented in the previous sections can successfully map out the high-performance region of this new mixed design space.

The subwavelength metamaterial grating structure, schematically shown in Fig. 5a, is represented by a mixed set of four geometrical parameters and one material parameter $${\bar{\mathbf{L}}} = \left[ {L_1,L_2,L_3,L_4,n_{{\mathrm{swg}}}} \right]$$. These complex grating couplers can still be efficiently simulated using the 2D eigenmode expansion simulator. The formulation of the optimization problem remains the same as in the Eq. (1), but we additionally set 1.6 < nswg < 3. Good designs found by the optimization algorithm (η > 0.74) are used to perform both PCA and the corresponding error analysis (see Supplementary Note 1). Since the new design space includes parameters of different nature, before executing PCA it is essential to normalize the variables through their statistically estimated standard deviations. Performing the PCA analysis (stage 2) reveals that again only two principal components are sufficient to identify all the good designs within the original 5D design space (vectors are defined in the Methods section).

In Fig. 5b and c, we exhaustively map out the coupling efficiency and the back reflection over the 2D hyperplane (stage 3), discovering again a large continuous region with η > 0.74. The map covers the region where the coupling efficiency exceeds 0.7. The black contour encloses all devices with a minimum feature size greater than 100 nm in both propagation and transverse directions and maintains η ≥ 0.74. Three devices are marked on the maps for further investigation, where independent 2D-FDTD simulations provide full wavelength analysis in Fig. 5d, e (see Supplementary Note 3 for structural details and performance metrics). The results show that physically distinct devices with similar performances can be identified very efficiently. Delivering on such objectives would be very challenging by conventional optimization methods.

## Discussion

We have demonstrated a new approach for the design of complex photonics devices with a large number of parameters, case studied on a multiparameter vertical fiber grating coupler. Rather than generating a single optimized design solution, our methodology exploits the dimensionality reduction technique from the suite of machine-learning pattern recognition tools to identify within the large design space the lower-dimensional sub-space of good devices with high performance. This approach exponentially scales down the complexity of the problem, making it feasible to exhaustively map the continuous region of grating couplers with comparable fiber coupling efficiencies ($$\eta {\,}> {\,} 0.74$$ at 1550 nm). Significant differences emerge when different performance criteria are considered, such as back reflection, minimum feature size, and tolerance to fabrication uncertainty. Such a global perspective also reveals performance and structural limitations of the design geometry. In particular, we were able to conclude that good coupling efficiency and a minimum feature size greater than 88 nm could not be obtained simultaneously for this first structure. The analysis of this shortcoming inspired a new class of grating couplers that uses subwavelength metamaterial, achieving a minimum feature size greater than 100 nm while maintaining state-of-the-art coupling efficiency and back reflection.

Given the generality of our implementation, the presented methodology can be readily exploited to navigate and comprehend a wide range of high-dimensional design spaces that photonic designers often encounter. While it is demonstrated here for two different design problems in nanophotonics, applications to photonic circuits or even sub-systems can be foreseen. This design methodology opens up new avenues in photonic device analysis and design where other dimensionality reduction methods such as Kernel PCA34, Principal manifolds35, and Autoencoders36 can deal with navigating through even more complex design spaces. Indeed, automation and integration of dimensionality reduction within the design flow will provide a powerful platform potentially transforming how advanced photonic devices are discovered and investigated.

## Methods

### Grating coupler simulation

The simulation of coupling efficiency and back reflections for each design of the grating coupler is performed exploiting either a 2D vectorial Fourier eigenmode expansion simulator32 or a commercial 2D-FDTD solver. We consider a structure including silicon substrate, 2-μm buried oxide, 220-nm-thick silicon core and a silica upper cladding of 1.5 μm thickness. Silicon and silica refractive indices are 3.45 and 1.45 at λ = 1550 nm. The mode of an SMF-28 single-mode optical fiber vertically coupled on top of the grating is modeled with a Gaussian function with a mode field diameter of 10.4 μm (λ = 1550 nm). The fiber facet is assumed to be in direct contact with the top of the upper cladding and its longitudinal position along the grating is optimized for each design to maximize the coupling efficiency. The latter is calculated as the overlap integral between the simulated field diffracted upwards by the grating and the Gaussian function.

### Machine learning enhanced optimization

We implemented a random restart local-search algorithm to solve problem (1), although other search methods could also be used. For each initial random design, small perturbations are made until a better solution in terms of coupling efficiency is found and a line search is exploited to seek further improvement until convergence. The perturbation and line search process is repeated until no improvement is found in the perturbation stage.

Following an initial optimization round with random-restart where a small collection of good designs was obtained, we trained a supervised machine-learning model, specifically gradient boosted trees, to predict if the radiation angle is within 5° of vertical. Once trained, we used the predictor to sample random designs that are nearly vertical while rejecting other random designs. Only near vertical emitting designs proceed to the local search stage. The use of the predictor in the optimization led to ~250% increase in optimizer speed to find new designs meeting the coupling efficiency criteria η > 0.74.

The performance of the general purpose machine-learning predictor is comparable to that of a predictor based on the scalar grating equation relating the section lengths, effective indices and the radiation angle:

$$\mathop {\sum }\limits_{i = 1}^5 N_iL_i = c + a\,{\mathrm{sin}}\,\theta \cdot \mathop {\sum }\limits_{i = 1}^5 L_i,$$

where $$a,c$$ are constants related to the wavelength and the overcladding effective index, $$N_{{i}}$$ is the effective index of the ith section, $$L_{{i}}$$ is the length of the ith section, and $$\theta$$ is the radiation angle. Given the simulated data ($$\left( {\theta ,L_{{i}}} \right)$$, one can use linear regression to estimate all the constants and then use those to predict the radiation angle of the structure for any combination of the section lengths. Despite the fact that the general machine-learning predictor is not aware of this approximation, its predictions are comparable with those obtained using the scalar grating equation. Furthermore, this approach can be applied to predict other quantities that cannot be described by simple closed-form expressions.

### PCA

PCA is a dimensionality reduction technique that has been used widely and successfully across various engineering and science disciplines37,38,39,40 and is implemented in most scientific computing platforms (e.g., Matlab, R, Python, etc.). Consider m data points in an n-dimensional design space. This can be written as a centered data matrix $${\mathbf{L}} \in {\Bbb R}^{m \times n}$$, where the statistical average along each dimension is subtracted. PCA finds a sequence of best orthogonal linear projections (called principal components) that maximizes the corresponding variances. Mathematically, this is done by finding a set of vectors $${\mathbf{V}}_1,{\mathbf{V}}_2, \ldots ,{\mathbf{V}}_n \in {\Bbb R}^n$$ that are L2 normalized and orthogonal, $$\left\| {{\mathbf{V}}_i} \right\|_2 = 1,{\mathbf{V}}_i\, \bot\, {\mathbf{V}}_j$$, and for every $$k < n$$, minimize

$$\left\| {{\mathbf{LR}} - {\mathbf{L}}} {\left[ {{\mathbf{R}}_{\mathbf{k}},{\mathbf{0}}^{n \times (n - k)}} \right]} \right\|_2,$$

where, $${\mathbf{R}} = [{\mathbf{V}}_1,{\mathbf{V}}_2, \ldots ,{\mathbf{V}}_n] \in {\Bbb R}^{n \times n}$$ is the transformation matrix that is formed by using the PCA vectors as its columns, $${\mathbf{R}}_{\mathrm{k}} = [{\mathbf{V}}_1,{\mathbf{V}}_2, \ldots ,{\mathbf{V}}_{\mathrm{k}}] \in {\Bbb R}^{n \times k}$$ is similar but transforms to a lower-dimensional space consisting of only $${\mathrm{k}}$$ first components, and $$0^{n \times (n - k)}$$ is a null matrix used for padding. Such an exercise also returns the weights associated with PCA vectors from which the main principal components can be selected and the remaining can be dropped, with negligible amount of information lost. The original data set can now be approximately represented by $$k < n$$ effective parameters using a matrix product $${\mathbf{L}}^{{\mathrm{PCA}}} = {\mathbf{LR}}_k$$, where each row corresponds to a transformed representation of a single data point. The transformation back to the original n-dimensional space can be done using $${\mathbf{L}}^{{\mathrm{est}}} = {\mathbf{L}}\,{\mathbf{R}}_k{\mathbf{R}}_k^{\mathrm{T}}$$, which can be used to quantify approximation errors incurred by reducing the dimensionality.

### Hyperplanes definitions

The hyperplane approximation given in Eq. (2) and computed by PCA is defined by three 5D vectors V1, V2, and C, the latter being the reference origin within the hyperplane. The scaled vectors computed for the grating structure in Fig. 2 are: V1αβ = [−0.43, 3.78, −20.82, 44.77, −30.21] nm, V2αβ = [−25.80, 10.81, −37.69, −3.86, 21.93] nm and the origin Cαβ = [102,73,156, 243,156] nm.

For the two orthogonal hyperplanes Γ–Π and X–Π described in Fig. 3b and c, the vectors are found following linear algebra considerations. Γ–Π is defined enforcing the plane to pass through the two points (defined in the 5D design space) [77, 84, 115, 249, 171] nm (design 1 in Table 1) and [95, 83, 104, 336, 98] nm and to be orthogonal to the αβ hyperplane. The resulting vectors are V1ΓΠ = [9.45, −0.35, −6.13, 45.95, −38.35] nm, V2ΓΠ = [−22.48, 33.08, 9.87, −17.87, −28.78] nm, CΓΠ = [85, 84, 110, 289, 138] nm. X–Π is defined as orthogonal to both αβ and Γ–Π and passing through the point [82, 87, 111, 283, 139] nm (design 3 in Table 1). The vectors are V1×Π = [−25.92, 11.96, −44.16, 10.05, 12.61] nm, V2×Π = V2ΓΠ, and C  = [85, 84, 110, 284, 142] nm.

For the grating structure with subwavelength transverse metamaterial (Fig. 5a), the 5D vectors defining the 2D hyperplane are V1αβ = [13.37 nm, −3.34 nm, 17.21 nm, −19.5 nm, −0.03], V2αβ = [5.07 nm, 14.07, −7.33 nm, 2.53 nm, −0.076] and Cαβ = [272 nm, 71 nm, 247 nm, 120 nm, 2.63].

### Computational resources

In the proposed method, the largest fraction of the computational time is dominated by design simulations with a negligible overhead time for data processing. The initial optimization (stage 1) enhanced by the ML angle predictor required ~5000 photonic simulations to identify the initial five good designs used to find the reduced design sub-space through PCA. The exhaustive mapping was then performed by sampling 3600 points (design) arranged in a square grid in the sub-space and doing the corresponding 3600 photonic simulations to compute simultaneously coupling efficiency and back reflections. Other sampling strategies or sparser grids can be used to reduce the number of simulations. Likewise, a different simulation approach could be used to retrieve additional metrics within the same simulation. The computation of additional performance metrics may require additional simulations, for example, when fabrication tolerance is estimated through the degradation derivatives as reported in Fig. 4b–e. On the other hand, the computation of the minimum feature size does not require any additional simulation.

It is worth calculating the number of points of a grid with the same resolution used for the αβ hyperplane, but across the five grating dimensions and covering all designs with coupling efficiency η > 0.74 found in stage 1. The five dimensions of these good designs span a range of 60 nm, 27 nm, 86 nm, 138 nm, 118 nm, respectively. A grid with 5 -nm resolution across all segments and covering all good designs, including two extra points to confirm the boundaries per segment, results in 14 × 7 × 19 × 30 × 26 ≈ 1.5 ∙ 106 points (designs). Compared with mapping directly the original 5D design space, mapping the αβ hyperplane thus reduces the computation time of about 400 times. For higher dimensions, the reduction can be even more significant.

### Uncertainty model

For the investigation of design tolerance to fabrication uncertainty, we assume a width deviation δw for both shallow-etched and deeply etched sections

$$L_1^\prime = L_1 - \delta _w;\,L_2^\prime = L_2 + \delta _w;\,L_3^\prime = L_3 - \delta _w;\,L_4^\prime = L_4;\,L_5^\prime = L_5 + \delta _w$$

We define a degradation derivative that is computed from two directional derivatives, assuming that over-etch and under-etch are equally likely (positive and negative values of δw can in general affect the device performance differently). Calculating the common derivative would not be informative, as for locally optimized devices it would be close to zero. For coupling efficiency, we are interested in calculating

$$\alpha _{\mathrm{\eta }} = - \frac{1}{2}\left( {\frac{{\partial ^ + \eta }}{{\partial \delta _w^{}}} - \frac{{\partial ^ - \eta }}{{\partial \delta _w^{}}}} \right) \cong - \frac{1}{{2|{\mathrm{\Delta }}\delta _w|}}\left( {\eta ^ + + \eta ^ - - 2\eta _0} \right),$$

where ∂+ and ∂ are the derivatives computed for positive and negative values of δw. The minus sign ensures that a positive value of αη indicates a worse (lower) coupling efficiency. Derivatives are numerically computed considering a small width variation and simulating the coupling efficiencies η+ (when Δδw = 5 nm) and η (when Δδw = −5 nm). η0 is the coupling efficiency for δw = 0. Similarly, for back reflections

$$\alpha _{\mathrm{r}} = \frac{1}{2}\left( {\frac{{\partial ^ + \eta }}{{\partial \delta _w^{}}} - \frac{{\partial ^ - \eta }}{{\partial \delta _w^{}}}} \right) \cong \frac{1}{{2|\Delta \delta _w|}}\left( {\eta ^ + + \eta ^ - - 2\eta _0} \right).$$

Also in this case, a positive αr indicates worse (higher) back reflections. The degradation derivatives plotted in Figs. 4b and c are finally computed as

$$d = \left\{ {\begin{array}{*{20}{l}} {\alpha {\quad}\,{\mathrm{if}}\,\alpha{\,} > {\,} 0}\hfill \\ {0{\quad}\,{\mathrm{otherwise}}} \end{array}} \right.$$

When etch uncertainty is considered, δe represents the variability on the 110 nm etch depth in the fourth section of the grating. The same definitions apply for the degradation derivatives.