Machine Learning Inverse Problem for Topological Photonics

Topological concepts open many new horizons for photonic devices, from integrated optics to lasers. The complexity of large scale topological devices asks for an effective solution of the inverse problem: how best to engineer the topology for a specific application? We introduce a novel machine learning approach to the topological inverse problem. We train a neural network system with the band structure of the Aubry-Andre-Harper model and then adopt the network for solving the inverse problem. Our application is able to identify the parameters of a complex topological insulator in order to obtain protected edge states at target frequencies. One challenging aspect is handling the multivalued branches of the direct problem and discarding unphysical solutions. We overcome this problem by adopting a self-consistent method to only select physically relevant solutions. We demonstrate our technique in a realistic topological laser design and by resorting to the widely available open-source TensorFlow library. Our results are general and scalable to thousands of topological components. This new inverse design technique based on machine learning potentially extends the applications of topological photonics, for example, to frequency combs, quantum sources, neuromorphic computing and metrology.

The rapidly growing interest in topological photonics [1,2] is leading to the design of complex structures for the many applications of optical topological insulators. [3] One leading goal of topological photonics is photon transport protected from unwanted random scattering. This is achieved by realizing analogues of the quantum Hall effect [8][9][10] through magnetic-like Hamiltonians in photonic systems [11]. In the optical domain, topological insulators [12] have been implemented in modulated honeycomb lattices [11], in arrays of coupled optical-ring resonators [13] and optical quantum walks [14]. Geometryindependent topological structures have been proposed to obtain nonreciprocal single mode lasing [15][16][17][18] as well as systems with balanced gain and loss for parity-time symmetric structures with topological order [19,20]. Emulations of four-dimensional physics have also been reported [21,22].
One challenge in this field is to find an effective methodology for the inverse problem in which the target optical properties result from topological characteristics. Although various computational techniques are available, these require specific implementations tailored to the task at hand. Machine learning (ML) [4][5][6] has recently been proposed as an encompassing technology for dealing with greatly differing problems through a unified approach. ML techniques have shown a remarkable growth in sophistication and application scope in multiple fields [23][24][25]; ML offers exciting perspectives in topological photonics. ML is applied in two main classes of problems: (i) classification for categorizing information and (ii) regression to predict continuous values from supervised training. Unlike parametric regression -in which a best fit of the data is determined on the basis of a specific function -ML regression employs a neural network (NN) emulating the behaviour of the data on which it has been trained: "the NN learns the model".
In this paper, we employ ML regression for solving the inverse problem in topological photonics. We apply advanced ML techniques to design photonic topological insulators enabling innovative applications through custom tailoring of desired optical parameters. In our approach, we introduce a twist in order to ensure that only physically possible solutions are found. This twist is based on a self-consistent cycle in which a tentative solution obtained from the inverse problem NN is run through the direct problem NN in order to ensure that the solution obtained is indeed viable.
We consider one of the simplest structures that support non-trivial topological properties. In one-dimensional (1D) systems, synthetic magnetic fields occur by lattice modulation [26] of the optical structure. In the Aubry-Andre-Harper (AAH) model [27,28], identical sites -resonators, two-level atoms, waveguides, etc. -are centered at positions z n = d o n + ηδ H n , with n an integer label, d o the primary lattice period, η the modulation strength and δ H n = cos(2πβn+φ) the Harper modulation [28]. The parameter β is the frequency of the Harper modulation. Together, β and the phase shift φ furnish the topological properties by a "two-dimensional ancestor" mapping [29]. The 2D ancestor is characterized by the dependence of the dielectric function on the coordinate z and on the parameter φ, which acts as a periodic artificial coordinate. Hence the phase φ can be treated as a wave vector in a fictitious auxiliary direction [29]. For β = p/q with p > 0 and q > 0 integers, the lattice displays two commensurate periods with q sites z n in the unit-cell.  erly chosen parameters give rise to nontrivial topological phases with protected states at the border of the structure. These "edge-states" are hallmarks of topological insulators. The phase φ tunes edge-state eigenfrequency in the photonic band-gaps.
Our photonic topological insulator is an array of layers A of normalized thickness ξ = L A /d o , centered in z n , in an homogeneous bulk of material B. This kind of structure can be effectively modeled by the transfer matrix technique [20,30], as reported in Fig. 1a. In this figure A 0 and A n are the initial and final amplitudes of the right-travelling waves; while B 0 and B n are their equivalent for the left-travelling wave amplitudes. As detailed in Methods, we obtain the transfer matrix for the single period T (1) (ω, φ, ξ) with elements T 21 and T 22 . Fig. 1a shows the final wave amplitudes A n , B n by the n-fold repeated action of T (1) (ω, φ, ξ) on A 0 , B 0 . The dielectric constant profile -for the case β = 1/3 -is schematically illustrated in Fig. 1b.
As detailed in Methods and illustrated in Fig. 1d, enforcing boundary conditions at the left edge [32,33] and defining the function Q(ω, φ, ξ) enables one to establish the presence of edge states corresponding to poles ω t of the reflection coefficient. However, the function ω t = ω(χ, ξ) cannot be analytically inverted to express the geometrical parameters χ and ξ in terms of the variable ω t . Exploiting ML techniques we solve this inverse problem and design topological insulators with target edge modes.
The inverse problem in artificial NN theory -and therefore in ML -is widely discussed in numerical modelling, engineering and other fields [34][35][36]. Regression in ML optimizes a NN so that a given vector input (R n ) will result in a scalar (R) output, emulating the behaviour of the training data. A regressive NN is a configuration of computational layers such that a specific set of input nodes I is connected to a single output node, through a configurable set of N h hidden layers each containing n i nodes h ij , where i = 1, ...N h and j = 1, ...n i . Examples of such regressive NNs are shown in Figs. 2a,b. A generic node k + 1, j, shown in part c, receiving as inputs h ki , with i = 1, ..., n k , yields on output h k+1j = g ( l w k+1jkl h kl + b k+1j ), with g(x) being a nonlinear activation function, w k+1jkl the weight of h kl on h k+1j with a bias term b k+1j . Following accepted practice, our activation function is g(x) = tanh(x).
Optimization of the NN is performed by minimizing a cost function by a gradient descent method that updates weights and biases. In the initial state, weights w ijkl are selected from a truncated normal and biases are set to zero. Training applies this procedure to a data-set randomly split into two separate classes: (i) an actual training set and (ii) a validation set. The network is iteratively updated until the error on the validating dataset converges to a given rate.
The inverse topological problem at hand is to obtain the desired optical behaviour: a target edge-state at frequency ω t , which is an input to the design (Fig. 2a). ML techniques achieve this result by modeling the multidimensional nonlinear relationships among all the structure parameters ω t , χ, β, A , B and ξ. In our specific case, the data-set fixes A , B , β at the values A = 9, B = 4 and β = 1/3.
First we generate a data-set to train our NNs by numerically computing the complex roots of T (1) 12 (ω, χ, ξ) covering the region interest for parameters χ and ξ. The real part of these roots, shown in Fig 3a, represents the edge states dispersion. Interestingly the same data-set can be used both for the inverse and direct NN training phase, by suitably selecting the features and target fields. The inverse problem NN (Fig. 2a) targets a value χ = χ o , a topological parameter on the basis of features including ω t . For a direct problem (Fig. 2b) the mode frequency ω t would be the target of a network whose features include the topological parameters (χ, ξ).
The data-set contains various branches since there exist an edge state for each band gap (i,j) with j = 3, as results by Eq. (2) in Methods. Due to the folding of the Brillouin zones, the edge state frequency ω(χ, ξ) is then a multi-mode function, which we unfold by introducing a label m ± ij for each mode; here i = 1, ...∞ and j = 1, ...q, while the sign ± indicates modes in the positive/negative χ domain. In Fig. 3a, data points with different ij val-ues are identified with different colors and, solving the inverse problem is a matter of determining when these surfaces intercept a specific target value of the ω axis. Three outcomes are possible: a single value for χ and ξ when a monotonic mode surface is intercepted, no solution for values of ω laying between surfaces, and multiple solutions in other cases. This implies that the feature set (χ, ξ, m ± ij ) is insufficient. To tackle this problem we take into account the trend s ± = sgn (dω t /dχ) as an additional variable. The NNs with this enlarged feature set are illustrated in Figs. 2a,b.
In the terminology used in ML the mode index m ± ij and trend s ± labels are categorical features and lead to two possible courses of action for the actual implementation of the NNs used in our problem. One in which a single NN is constructed in a hybrid feature space with both continuous variables (real valued ξ's and χ's) and categorical features, as illustrated in Fig. 2b. Another course is to adopt multiple independent NNs, one NN for each mode and each trend. The single NN approach is hindered by the presence of discontinuities in the features domain, as evidenced in Fig. 3a, so we have chosen to use multiple independent NNs.
Moreover, when considering the solution provided by the inverse NNs, we identify a specific problem in the use of ML as they may furnish solutions that are not physical. An example of this issue is given in Fig. 3b wherefor a fixed band and a fixed ξ -the curve representing ω as a function of χ is shown together with its inverse (inset 3c). Inverting the function ω(χ), we consider an interval of values for ω spanning from its minimum ω min to the maximum ω max , but for the two branches of the inverse function χ(ω) -identified by colors in inset c -the range of ω is different. For example, for the red branch the maximal value of ω is ω max < ω max . When the target frequency is outside of this range, the NN produces an output outside of the physically acceptable range for χ. The inverse NN can furnish spurious non-physical solutions.
Our approach tackles this issue by a two-step selfconsistent cycle: (i) in the first stage a desired input ω t forms part of the feature set (ω t , m ± ij , s ± ) resulting in the output χ o of the inverse NN; this set is used as input (χ o , m ± ij , s ± ) to a direct problem network; (ii) in the second stage, the target of this direct network ω sc is compared with the input value ω t and χ o is retained as a solution of the inverse model if |ω sc − ω t | < δ with δ is a user-defined small positive quantity. The value of δ affects the model accuracy. A reasonable choice can be δ ∼ E max j (with j=I,D), i.e., the maximum value of the squared error functions for the inverse (I) and the direct (D) networks. The training dataset was generated with eleven sets of ξ ranging from 0.10 to 0.20 in steps of 0.01 and for each set χ spans −π to π with 997 equally spaced values. Results The results of applying the direct and inverse NNs, portrayed in Fig. 4a and b respectively, show that the proposed method gives accurate solutions matching the original data in the whole range of interest. Figure 4 clearly shows that our machine learning strategy solves the inverse topological design problem.

DISCUSSION
The inverse problem in topological design is solved by a supervised machine learning regression technique. We employ a self-consistent procedure to rule out unphys-ical solutions enabling tailored engineering of protected edge-states. We successfully tackle multivalued functions introducing categorical features, as the trend, which tags training data according to their gradient's sign. Discontinuous domains are effectively treated by adopting multiple independent neural networks each one specific to its domain. Our general method can be extensively applied -well beyond the example considered in this work -and may also be exploited for other physical systems in topological science, as polaritonics [37,38], quantum technologies and ultra-cold atoms [39,40]. The method is scalable to very complex structures involving hundreds of topological devices, as those recently considered for large scale synchronization [41], and frequency comb generation [42], eventually including non-hermitian systems [43,44]. Further applications include 2D and 3D topological systems [15] and quantum sources and simulations [21,22].

METHODS
TensorFlow -Tensorflow is Google's versatile opensource multiplatform dataflow library capable of efficiently performing machine learning tasks such as im-plementing neural networks. Multidimensional data arrays, referred to as "tensors" are executed on the basis of stateful dataflow graphs, hence the name TensorFlow. For our final code implementation Tensorflow version 1.3 with python API bindings was used.
The nature of our problem is such that there is a discontinuity in ξ = 0 which cannot be correctly handled by a single NN bridging this point; this is relevant to both the inverse and direct cases. Breaking up the data-set into two parts to be used for two separate NNs is the simplest solution to this problem.
Another interesting aspect is related to the fact that the feature set in our inverse and direct NNs contain both continuous and discrete variables. The discrete variables can either be treated as such or handled by constructing multiple NNs each relative to a specific value of the discrete variable. The trend variable which has two possible values is one such case as is the mode number. In our code we have implemented a flexible system which allows one to decide which discrete variables are to be included in each NN, the others being broken up into arrays of NNs one for each value of the variable. Once the bookkeeping issues have been tackled this generalized approach allows one to tailor the problem to the given data-set.
Transfer matrix -Given the stepped and periodic dielectric function of period D = qd o : in each layer the electric field can be represented as the superposition of a left-and a right-traveling wave. Applying the boundary conditions, the matrices M αγ = q γ + q α 2q γ 1 r αγ r αγ 1 with α, γ = A or B and r αγ = qγ −qα qγ +qα , describe the light propagation through the interfaces, having introduced q α = (ω/c) √ α , while the propagation within each layer A and B is given by: where s n = [z n+1 −z n −L A ]/d o are the normalized thicknesses of the B layers. From these we obtain the transfer matrix for the single period T (1) (ω), the matrix connecting the fields in the left side of the elementary cell to the ones in the right side: The quantity ρ = − 1 2 T rT (1) (ω, φ, ξ) allows one to locate bulk bands in the regions where ρ 2 1, and gaps where ρ 2 > 1. Alternatively, the amplitude |r ∞ (ω, φ, ξ)| 2 of the reflection coefficient of the structure [29] r ∞ (ω, φ, ξ) = e ik(ω)D − T where e ik(ω)D is an eigenvalue of the matrix T (1) (ω, φ, ξ), can also be used to locate the gaps of the system. Band structure of the unmodulated system -The unmodulated structure (η = 0) features stopbands at is the characteristic size ratio. Q(ω, φ, ξ) function-To determine the existence of the edge states one needs to specify the boundary conditions on each edge of the structure. For the left edge this condition is given by: where A 1 and B 1 are the amplitudes of the right and lefttravelling waves in the first layer of the structure. This condition can be reformulated as det(b 1 , a 1 ) = 0 with b 1 = ((q a − q b ), (q a + q b )) T and a 1 = (A 1 , B 1 ) T , and together with the eigenvalues λ ± and eigenvectors v ± = (T (1) 12 , λ ± − T (1) 11 ) of the transfer matrix T (1) it is possible to determine existence and dispersion of edge states.
Following [32,33] it can be in fact shown that a proportionality relation exists between the boundary vector b 1 and the eigenvectors v ± of the transfer matrix. So the condition for the existence of the edge states is given by det(b 1 , v ± ) = 0 in a gap where |λ ± | < 1. This entails searching for the zeros of the function F l,± = (q A − q B )(λ ± − T   Fig. 1c, this implies that edge states exist only in the gaps where |ρ| > 1 and Q(ω, φ, ξ) · ρ > 0. At the same time, edge states cannot exist in gaps where Q(ω, φ, ξ) does not change sign. Moreover, due to a bulk-boundary correspondence [45], the number of these edge modes is equal to the modulus of the associated topological invariant |ν ij |, given by the winding number of the reflection coefficient: i.e., the extra phase (divided by 2π) of r ∞ (ω, χ) when χ varies in the range (−π, π) with ω in the stop band [46].