Leveraging the unique properties of light to perform computations in novel ways is a subject with a long history1,2. Although an all-optical processor for universal computing seems to be a far-reaching goal, photonics can provide exciting opportunities for unconventional computing built on analog logic and novel information-processing configurations. What makes photonics an intriguing option for unconventional computing are attractive features such as intrinsically high speed and low energy consumption, the capability for massive parallelization, and long-range interactions. Nevertheless, a significant challenge in optical computing lies in the absence of appropriate computing paradigms, methods, and algorithms that harness the unique capabilities of this technology to develop efficient and application-specific photonic processors. In particular, matrix-vector multiplication is one of the most basic mathematical operations, lying at the core of various tasks ranging from optical convolution schemes3,4 and matrix eigenvalue solvers5 to novel optical memristors6,7 and optical artificial neural networks8,9. In the past decade, with rapid technological progress, there has been a resurgence of efforts devoted to developing programmable photonic integrated circuits that perform matrix-vector multiplication10,11,12. The utility of such a device as an energy-efficient photonic accelerator working in conjunction with electronic processors appears to be a distant possibility, considering the inherent difficulties associated with scaling and precision. However, there is no doubt that an on-chip programmable photonic matrix-vector multiplier can create exciting opportunities in classical and quantum computing through various applications ranging from quantum information and quantum transport simulations13,14,15 to optical signal processing16, neuromorphic computing17, and optical neural networks3,18,19, as well as putting forward a platform for rapid prototyping of linear multiport photonic devices20,21.

Indeed, the optical realization of arbitrary unitary operations has been known since the seminal paper of Reck et al.22, which originally concerned free-space optics but was successfully translated to photonic integrated circuits by Miller23,24,25. This architecture builds on breaking down a unitary matrix of any order into lower-dimensional unitary matrices, which ensures the existence of an optical realization through two fundamental building blocks: beam splitters (couplers) and phase shifters. Despite its generality, this method uses a pyramid-shaped array of Mach-Zehnder interferometers (MZI), which is impractical for larger implementations because the number of beam splitters grows quadratically with the number of ports. In turn, Clements et al.26 introduced an alternative, symmetric rectangular-shaped array, resulting in a device with half the total optical depth that is, consequently, more loss-tolerant. Such a rectangular array has proved robust enough to create photonic realizations of Haar-random matrices27. Further unitary realizations related to other mesh geometries have been explored in28,29, as well as topological photonic lattices with hexagonal-shaped arrays of MZIs30,31. In turn, free-space propagation setups have been devised based on multi-plane light conversion32,33 and diffractive surface layers34.

While the latter devices originally consisted of bulky optical components, the principle has recently been applied to on-chip structures as well. Recent studies have explored the use of particular transfer matrices (henceforth called F) alternating with phase-mask layers to obtain an arbitrary unitary transformation10,12,35,36,37. Pastor et al.12 consider wave propagation in multimode slab waveguides to implement a discrete Fourier transform (DFT) as the transformation F. They showed that an arbitrary transformation can be performed using \(6N+1\) phase layers and 6N DFT elements, where N is the number of ports. Tanomura et al.35 interleave the phase masks with multimode interference couplers connected by single-mode waveguides and use simulated-annealing optimization to argue for well-approximated conversions when \(M \approx N\). Fully functional unitary four-, eight-, ten-, and twelve-port devices have been proposed and manufactured38,39,40,41. Moreover, an alternative device using polarization and multiple wavelength degrees of freedom has been considered in42. Markowitz and Miri37 have explored similar structures and found rigorous numerical evidence that interleaved phase arrays and the discrete fractional Fourier transform (DFrFT) are indeed universal, whereas the use of Haar-random unitary matrices has been proved to lead to the desired universality10. A further architecture based on waveguide arrays with step-like propagation-constant profiles has been reported43. The interlacing architecture appears to exhibit interesting auto-calibrating properties, which make it resilient to fabrication errors44. Furthermore, we have recently shown that the interlaced structure can go beyond implementing unitaries to directly implementing arbitrary non-unitary operations when the diagonal matrices are relaxed to leave the unit circle in the complex domain45. In this sense, by utilizing both amplitude and phase modulations, one can realize a fully programmable device for arbitrary matrix operations45. This constitutes an important generalization of previous results and, by itself, highlights an advantage of the interlacing architecture over mesh geometries.

This manuscript discusses a broad class of universal on-chip photonic architectures based on a layered configuration of programmable phase-mask layers interlaced with a passive matrix F. We show that the interlaced architecture is far more flexible than previously recognized, since broad families of matrices F can serve as the fixed intervening operator. The phases are steered to reconstruct a unitary target matrix, provided that F has well-posed properties. Numerical evidence based on rigorous optimization algorithms reveals that universality is reached for dense matrices F, while a phase transition in the accuracy of reconstructed \(N \times N\) targets occurs at \(M=N+1\), with M the total number of phase-mask layers. Tests using the discrete Fourier transform (DFT) and the discrete fractional Fourier transform (DFrFT) confirm the latter claim, and Haar-random matrices also show outstanding convergence. To generalize the domain of valid intervening operators, a density criterion is derived so that matrices F can be classified according to their elements to ensure universality. To demonstrate this result, photonic lattices with uniform, nonuniform, and disordered coupling coefficients are considered as photonic realizations of the matrix F, and, using the proposed density criterion, it is shown that universality is reached for specific length intervals. Furthermore, we explore waveguide coupler meshes as an alternative intervening unit F and determine the minimum number of coupler layers required to guarantee the universality of the interlacing architecture.

Results

Architecture and mathematical foundation

Figure 1. Universal architecture scheme. The proposed architecture involves alternating layers of a unitary matrix F and diagonal phase-shift layers (PL) \(\{\phi _{n}^{(p)}\}\), with \(p=1,\ldots ,N+1\). The upper insets depict the modulus and argument of potential candidates for the unitary matrix F, here selected as the DFT, the DFrFT, and a random unitary matrix. The lower insets illustrate potential photonic implementations of the unitary matrix F.

Let us consider an arbitrary unitary matrix \(\mathscr {U}\in U(N)\), with U(N) the group of \(N\times N\) unitary matrices. Our goal is to implement a proper factorization of \(\mathscr {U}\) in terms of another unitary matrix F, to be defined below, and a set \(\{P_{k}\}_{k=1}^{M}\) of phase matrices \(P_{k}=e^{i D_{k}}\), with \(D_{k}=diag(\phi _{1}^{(k)},\ldots ,\phi _{N}^{(k)})\) a diagonal matrix and \(\phi _{n}^{(k)}\in (0,2\pi ]\), where \(n\in \{1,\ldots ,N\}\) and \(k\in \{1,\ldots ,M\}\), with \(M\in \mathbb {N}\). The factorization proposed here interleaves F with the phase matrices \(P_{k}\) through the relation

$$\begin{aligned} \mathscr {U}=FP_{M}F\ldots FP_{1}F, \end{aligned}$$
(1)

which is resourceful as it allows for optical implementations. The phase matrices can be implemented through layers of phase shifters (active optical elements), and the matrix F (a passive optical element) has to be selected so that arbitrary unitary target matrices \(\mathscr {U}_{t}\) can be reconstructed with minimal error by adequately tuning the phase shifters \(\phi _{n}^{(k)}\). If the latter is achieved, the universality property is said to be met. An arbitrary matrix \(\mathscr {U}\in U(N)\) requires \(N^{2}\) real parameters to be fully defined. Therefore, any factorization must involve at least the same number of parameters. For the device proposed in (1), we have MN free parameters in total and thus expect that \(M\ge N\) in order to achieve the desired universality. Although there are some cases where \(\mathscr {U}\) has a particular symmetry that reduces the number of parameters, we aim for the general case. The proposed architecture (1) and its optical implementation are sketched in Fig. 1. On the one hand, the mathematical structure of the passive unitary F (see top panels of Fig. 1) can be that of the discrete Fourier transform (DFT), the discrete fractional Fourier transform37, or simply a Haar-random matrix. On the other hand, the photonic implementation of F can be performed through different optical realizations involving a unitary wave evolution (see bottom panels of Fig. 1), such as waveguide arrays45,46, meshes of directional couplers26,28,29, and multimode interference (MMI) couplers12,47. In particular, MMI couplers have been shown to be suitable for representing the DFT matrix12,48, whereas waveguide arrays lead to simple representations of the DFrFT46.
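For concreteness, a minimal numerical sketch of the factorization in Eq. (1) is shown below (not the authors' code; the helper names and the QR-based Haar sampler are illustrative assumptions):

```python
import numpy as np

def haar_random_unitary(N, rng=np.random.default_rng(0)):
    """Draw an N x N unitary from the Haar measure via a QR decomposition."""
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2.0)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))  # absorb the phases of diag(R)

def build_unitary(F, phases):
    """Reconstruct U = F P_M F ... F P_1 F of Eq. (1).

    F      : (N, N) fixed unitary mixing layer.
    phases : (M, N) real array; row k holds the phases of the diagonal layer P_{k+1}.
    """
    U = F.copy()
    for phi in phases:                       # right to left: P_1, F, P_2, F, ..., P_M, F
        U = F @ (np.exp(1j * phi)[:, None] * U)
    return U

N, M = 6, 7                                  # M = N + 1 phase layers
F = haar_random_unitary(N)
phases = 2.0 * np.pi * np.random.default_rng(1).random((M, N))
U = build_unitary(F, phases)
assert np.allclose(U.conj().T @ U, np.eye(N))  # the product is unitary by construction
```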

Figure 2. Numerical universality test. (a) Architecture depiction (left column) and optimization objective function (right column) for 100 target matrices at various values of M and N. Black boxes denote any possible realization of the F matrix. (b) Multiple trials for \(N=8\) and \(M=9\): 250 random F matrices were considered, with 250 targets per matrix F. Shown is the distribution of the number of LMA runs needed to achieve a norm lower than the stopping norm of \(10^{-10}\), with a maximum of 50 iterations per run. (c) Norm (log\({}_{10}\)) as a function of the number of iterations for the run with the best norm, using 100 random matrices F, each with a single target matrix.

Here, we focus on architectures based on the first two solutions, the numerical analysis and universality of which are discussed below. Layered architectures akin to Eq. (1) have been numerically validated in previous works using different optical arrays and MZI meshes, where optimization algorithms such as gradient descent, stochastic gradient descent, simulated annealing, and basin hopping have been implemented; see, for instance,10,40,43,49. In order to demonstrate the universality of this device, we optimize the MN phases for a variety of randomly chosen target matrices \(\mathscr {U}_{t}\) generated in accordance with the Haar measure50. The objective function to be minimized, also called the error norm, is defined by

$$\begin{aligned} L = \frac{1}{N^2} ||\mathscr {U}-\mathscr {U}_{t}||^2 \end{aligned}$$
(2)

where \(\Vert \cdot \Vert\) stands for the Frobenius norm, \(\mathscr {U}_{t}\) is the target matrix being tested, and \(\mathscr {U}\) is the matrix reconstructed using the factorization (1). The Levenberg-Marquardt algorithm (LMA)51,52 is used to find the minimum of this function. For a given target \(\mathscr {U}_{t}\), the phases are randomly initialized between 0 and \(2\pi\). The optimization was performed in MATLAB. In Fig. 2a, the norm for 100 target unitary matrices is shown for various cases, fixing the default tolerance values to \(10^{-10}\). A phase transition occurs between \(M=N\) and \(M=N+1\) layers, which is unsurprising given that the number of free parameters, \(MN=N^{2}+N\), then exceeds the \(N^{2}\) parameters of an arbitrary unitary matrix by N. These jumps are larger than those reported by Tang et al.53 owing to their use of a probabilistic algorithm (simulated annealing) rather than a gradient-based one such as the LMA. A downside of gradient-based approaches is that they may require many runs with different starting conditions. We decrease this overhead by using a stopping criterion for the norm along with a maximum number of iterations for each LMA run. Using a maximum of 50 iterations per run, we find that we rarely need more than 100 runs in the case \(N=8\), \(M=9\) to achieve norms below \(10^{-10}\) (Fig. 2b). For systems with a lower number of ports N, the distribution skews towards lower values. To more confidently label choices of F that are not Haar-random as “bad” mixing layers, we set the maximum number of runs somewhat higher, to 250 or 500, as appropriate.
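A minimal sketch of this restart-and-optimize loop, assuming SciPy's Levenberg-Marquardt solver (least_squares with method='lm') in place of the MATLAB implementation used here, and with illustrative names such as fit_phases, may read:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x, F, U_target):
    """Stack Re/Im of (U - U_target)/N so that sum(residuals**2) equals L of Eq. (2)."""
    N = F.shape[0]
    U = F.copy()
    for phi in x.reshape(-1, N):
        U = F @ (np.exp(1j * phi)[:, None] * U)
    diff = (U - U_target) / N
    return np.concatenate([diff.real.ravel(), diff.imag.ravel()])

def fit_phases(F, U_target, M, n_runs=100, tol=1e-10, rng=np.random.default_rng(2)):
    """Repeated LM runs with random initial phases; returns (best L, best phases)."""
    N = F.shape[0]
    best_L, best_x = np.inf, None
    for _ in range(n_runs):
        x0 = 2.0 * np.pi * rng.random(M * N)
        sol = least_squares(residuals, x0, args=(F, U_target), method='lm',
                            max_nfev=50 * (M * N + 1))   # roughly 50 LM iterations
        L = 2.0 * sol.cost                               # cost = 0.5 * sum(residuals**2)
        if L < best_L:
            best_L, best_x = L, sol.x
        if best_L < tol:
            break
    return best_L, best_x
```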

The density criterion and the Goldilocks principle

Our preliminary numerical results suggest that Haar-random matrices F also possess the required properties to render the architecture in (1) universal. Random matrices drawn from the Haar measure are typically dense matrices with low sparsity, suggesting that density might be a criterion for classifying F matrices as good candidates. To test this idea, it is desirable to find a proper measure that quantifies the density of unitary matrices. Random matrix theory establishes robust criteria for studying random complex-valued matrices in the limit of large size N based on the singular value decomposition (SVD)54. On the one hand, in the current setup we focus on unitary matrices of relatively small size, as the number of ports in our architecture is not necessarily large enough to justify the asymptotic analysis of random matrix theory. On the other hand, the singular value decomposition is not particularly helpful when dealing with unitary matrices: for a complex-valued matrix A, the SVD requires the spectral properties of \(AA^{\dagger }\) and \(A^{\dagger }A\), which for unitary matrices are always equal to the identity matrix. For these reasons, the density criterion discussed below is better suited to the interlaced architectures constructed here.

To better understand the importance of dense matrices in our universal architecture, let us further inspect the factorization in (1). By fixing \(F=diag(e^{i \xi _{1}},\ldots , e^{i \xi _{N}})\) as a diagonal unitary matrix, with \(\xi _{j}\in (0,2\pi ]\) for \(j=1,\ldots ,N\), it is straightforward to notice that (1) reduces to a diagonal unitary matrix as well, which is far from representing a universal device. That is, diagonal F matrices can only reconstruct diagonal unitary matrices. This behavior extends to any of the \((N!)^{2}\) row and column permutations of the diagonal matrix F, as the factorized matrix \(\mathscr {U}\) acquires the same structure as the permuted F matrix. Now, let us consider the set of unitary matrices \(\{\mathscr {V}_{n_{j}}\}_{j=1}^{k}\), where \(\mathscr {V}_{n_{j}}\in U(n_{j})\) and \(\sum _{j=1}^{k}n_{j}=N\), such that \(F=diag(\mathscr {V}_{n_{1}},\ldots ,\mathscr {V}_{n_{k}})\) is an N-dimensional block-diagonal unitary matrix. This particular selection for F leads to a unitary operator with the same structure, lacking the required universality property. Although block-diagonal matrices F lose their structure when randomly rearranged, anti-block-diagonal matrices will always result in either block-diagonal or anti-block-diagonal matrices, neither of which is universal.

Thus, we have identified a particular class of badly performing interlacing matrices F, which are useful for tracing a suitable Goldilocks region where the passive F matrices have the required density property. To this end, let us first remark that any N-dimensional unitary matrix can be written as \(\mathscr {V}=(\vec {v}_{1},\ldots ,\vec {v}_{N})\), where \(\vec {v}_{j}\in \mathbb {C}^{N}\) are complex-valued column vectors forming an orthonormal set with respect to the Euclidean inner product in \(\mathbb {C}^{N}\), i.e., \(\vec {v}_{j_{1}}\cdot \vec {v}_{j_{2}}\equiv \vec {v}_{j_{1}}^{\dagger }\vec {v}_{j_{2}}=\delta _{j_{1},j_{2}}\). From the orthogonality condition, we can focus on the columns (or equivalently the rows) of \(\mathscr {V}\), whereas the normalization imposes a constraint on the elements across each column (row). Since the elements of \(\mathscr {V}\) are complex numbers, we alternatively work with the matrix \(\widetilde{\mathscr {V}}\), composed of the moduli of the elements of \(\mathscr {V}\). Let us define \(v_{p;q}\) as the q-th element of the p-th column vector \(\vec {v}_{p}\), with \(p,q=1,\ldots , N\), so that \(\widetilde{\mathscr {V}}_{p;q}=\vert v_{p;q}\vert\). From the unitarity of \(\mathscr {V}\), it follows that \(\sum _{q=1}^{N}\vert v_{p;q}\vert ^{2}=\sum _{p=1}^{N}\vert v_{p;q}\vert ^{2}=1\) for all p and q, so that we can focus on the density of either the columns or the rows of \(\widetilde{\mathscr {V}}\). Without loss of generality, we work with the columns. Since we are interested in how the elements are spread across each column, we compute the corresponding variance

$$\begin{aligned} S_{p}=\frac{1}{N}-\mu _{p}^{2}, \quad \mu _{p}=\frac{\sum _{q=1}^{N}\vert v_{p;q}\vert }{N}\ge \frac{1}{N}, \end{aligned}$$

for each column p.
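As a brief supporting step (added here for completeness), the normalization together with \(\vert v_{p;q}\vert \le 1\) and the Cauchy–Schwarz inequality bound the mean and, through it, the variance:

$$\begin{aligned} 1=\sum _{q=1}^{N}\vert v_{p;q}\vert ^{2}\le \sum _{q=1}^{N}\vert v_{p;q}\vert \le \sqrt{N}\left( \sum _{q=1}^{N}\vert v_{p;q}\vert ^{2}\right) ^{1/2}=\sqrt{N} \quad \Rightarrow \quad \frac{1}{N}\le \mu _{p}\le \frac{1}{\sqrt{N}} \quad \Rightarrow \quad S_{p}\in \left[ 0,\frac{N-1}{N^{2}}\right] . \end{aligned}$$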

The variance is therefore a bounded quantity in the closed interval \(S_{p}\in \left[ 0,\frac{N-1}{N^{2}}\right]\), where the lower bound corresponds to the case where all the elements of \(\vec {\widetilde{v}}_{p}\) are equal, i.e., \(\vec {\widetilde{v}}_{p}\) is maximally spread, while the upper bound corresponds to the case where \(\vec {\widetilde{v}}_{p}\) is one of the canonical basis vectors, \((0,\ldots ,1,\ldots ,0)^{T}\). Thus, a given column p of \(\mathscr {V}\) is said to be denser the closer its variance \(S_{p}\) is to 0, whereas the sparser the elements of column p, the closer \(S_{p}\) is to \((N-1)/N^{2}\). These are the key ideas we use henceforth to characterize density across the full matrix \(\mathscr {V}\). Let \(\mathscr {S}=\{S_{p}\}_{p=1}^{N}\) be the set of variances associated with the columns of \(\widetilde{\mathscr {V}}\). We define the mean \(\widetilde{\mu }\) and standard deviation \(\widetilde{\sigma }\) of the elements of \(\mathscr {S}\), so that the density of a given unitary matrix \(\mathscr {V}\) can be characterized by defining the point

$$\begin{aligned} \vec {R}:=(N\widetilde{\mu },N\widetilde{\sigma }). \end{aligned}$$
(3)

Since row permutation leaves \(S_{p}\) invariant and column permutation only permutes the index p, the quantities \(\widetilde{\mu }\) and \(\widetilde{\sigma }\), and consequently the point \(\vec {R}\), are permutation invariant.
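As an illustration, the density point \(\vec {R}\) can be evaluated numerically as in the following minimal sketch (illustrative helper names); the DFT and the identity matrix recover the two extremal cases discussed next.

```python
import numpy as np

def density_point(V):
    """Density point R = (N*mu_tilde, N*sigma_tilde) of Eq. (3) for a unitary matrix V."""
    N = V.shape[0]
    mu_cols = np.abs(V).sum(axis=0) / N        # mu_p of each column p
    S = 1.0 / N - mu_cols**2                    # column variances S_p
    return N * S.mean(), N * S.std()            # population mean/std; permutation invariant

N = 6
dft = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
print(density_point(dft))        # maximally dense case: ~(0.0, 0.0)
print(density_point(np.eye(N)))  # sparsest (diagonal) case: ((N-1)/N, 0) = (5/6, 0)
```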

There are two noteworthy extremal cases, namely the maximally dense and the diagonal (sparsest) unitary matrices. In the former case, \(\mathscr {V}\) is composed of column vectors whose variances vanish, \(S_{p}=0\) for all \(p=1,\ldots ,N\). The DFT matrix of dimension N is such an example, leading to \(\widetilde{\mu }=\widetilde{\sigma }=0\), which we consider the ideal case. In the latter case, the variance is maximal for each column, \(S_{p}=(N-1)/N^{2}\) for all \(p=1,\ldots ,N\), and the statistical information of the matrix reduces to \(\widetilde{\mu }=(N-1)/N^{2}\) and \(\widetilde{\sigma }=0\). We thus have two comparison points, from which we find the bounded interval \(N\widetilde{\mu }\in [0,(N-1)/N]\). Additional reference points can be traced out by taking block-diagonal matrices \(F=diag(\mathscr {V}_{n_{1}},\ldots ,\mathscr {V}_{n_{k}})\), with \(\mathscr {V}_{n_{j}}\in U(n_{j})\) unitary and maximally dense (DFT) matrices of dimension \(n_{j}\) for \(j=1,\ldots ,k\), \(1\le k\le N\), and \(\sum _{j=1}^{k}n_{j}=N\), so that badly performing matrices are generated (see the discussion above). In particular, let us consider the case \(k=2\), so that \(F=diag(\mathscr {V}_{n_{1}},\mathscr {V}_{n_{2}})\) with \(n_{1}+n_{2}=N\). One can assign the block sizes \(n_{1}=\ell\) and \(n_{2}=N-\ell\), with \(\ell =1,\ldots ,\lfloor N/2 \rfloor\) labeling the nonequivalent ways to define the two-block matrices F, which leads to the \(k=2\) reference points

$$\begin{aligned} \vec {R}_{k=2}=\left( \frac{2\ell (N-\ell )}{N^{2}},\frac{\sqrt{\ell (N-\ell )}(N-2\ell )}{N^{2}} \right) , \quad \ell =1,\ldots , \lfloor \frac{N}{2} \rfloor . \end{aligned}$$
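For completeness, this expression follows directly from the definitions above: the columns belonging to the DFT block of size \(\ell\) (respectively \(N-\ell\)) have \(\ell\) (respectively \(N-\ell\)) entries of modulus \(1/\sqrt{\ell }\) (respectively \(1/\sqrt{N-\ell }\)) and zeros elsewhere, so that

$$\begin{aligned} \mu _{p}={\left\{ \begin{array}{ll} \sqrt{\ell }/N, &{} p\le \ell , \\ \sqrt{N-\ell }/N, &{} p>\ell , \end{array}\right. } \qquad S_{p}=\frac{1}{N}-\mu _{p}^{2}={\left\{ \begin{array}{ll} \frac{1}{N}-\frac{\ell }{N^{2}}, &{} p\le \ell , \\ \frac{1}{N}-\frac{N-\ell }{N^{2}}, &{} p>\ell . \end{array}\right. } \end{aligned}$$

Taking the mean and the (two-valued) standard deviation of \(\{S_{p}\}_{p=1}^{N}\) then yields \(N\widetilde{\mu }=2\ell (N-\ell )/N^{2}\) and \(N\widetilde{\sigma }=\sqrt{\ell (N-\ell )}(N-2\ell )/N^{2}\), recovering \(\vec {R}_{k=2}\).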

Although further reference points exist for \(k\ge 3\), those points are farther from the ideal (maximally dense) case \(\vec {R}_{0}\equiv \vec {R}_{k=1}=(0,0)\) than those marked with \(k=2\), and are thus disregarded.

Figure 3. Density estimation and performance test. (a) Points \(\vec {R}\) associated with the density criterion for the sets of unitary matrices \(\{e^{A_{j}},e^{D_{j}}\}_{j=1}^{50}\) (left column) and \(\{e^{B_{j}},e^{C_{j}}\}_{j=1}^{50}\) (right column). The shaded blue area denotes the Goldilocks region where universality is expected for \(N=6\). In turn, the blue heat maps show the modulus of some particular choices of the unitary matrices \(F=e^{iX_{j}}\), with \(X\in \{A,B,C,D\}\). (b) Error norm (log\(_{10}\)) L in (2) for each unitary matrix under consideration, with fifty test targets per matrix. (c) Mean and standard deviation, \(N\widetilde{\mu }\) and \(N\widetilde{\sigma }\), respectively, related to the density estimation for each unitary matrix in (a). The horizontal blue and red lines denote the universality thresholds for \(N\widetilde{\mu }\) and \(N\widetilde{\sigma }\), respectively.

In this form, we can focus on the area spanned by the points marked between \(k=1\) and \(k=2\). Interestingly, for \(k=2\), the reference points with the smallest standard deviation are obtained at \(\ell =\lfloor N/2 \rfloor\), namely \(\vec {R}_{k=2,\ell =\frac{N}{2}}=\left( \frac{1}{2},0 \right)\) and \(\vec {R}_{k=2,\ell =\frac{N-1}{2}}=\left( \frac{1}{2}-\frac{1}{2N^{2}}, \frac{\sqrt{N^2-1}}{2N^2} \right)\) for even and odd N, respectively. Note that in the limit \(N \rightarrow \infty\), the mean converges to the non-vanishing value 1/2. In turn, the maximum standard deviation is determined by maximizing \(N\widetilde{\sigma }\) with respect to \(\ell\), from which one obtains the critical value \(\ell _{c}=\frac{N}{2\sqrt{2}}(\sqrt{2}-1)\) and the maximum \(N\widetilde{\sigma }\vert _{\ell _{c}}=1/4\). That is, the standard deviation is bounded to the interval \(N\widetilde{\sigma }\in [0,1/4]\), where the upper bound is independent of N; since \(\ell\) is an integer, the attained maximum is \(\max \left( N\widetilde{\sigma }\vert _{\lfloor \ell _{c} \rfloor }, N\widetilde{\sigma }\vert _{\lceil \ell _{c} \rceil } \right)\). The latter allows us to form a polygon with vertices at \(\vec {R}_{k=1}=(0,0)\) and \(\vec {R}_{k=2}\), the area of which is non-null and finite even for \(N\rightarrow \infty\) (see Fig. 3). We focus on unitary matrices whose vector \(\vec {R}\) lies inside this polygon while avoiding the vertices, as the latter are the known badly performing cases (with the exception of \(\vec {R}_{0}\)). We can go a step further and sharpen the prediction by reducing the area of the polygon, imposing thresholds on \(N\widetilde{\sigma }\) and \(N\widetilde{\mu }\). For the former, we already know that the standard deviation reaches its maximum value \(N\widetilde{\sigma }\vert _{k=2,\ell _{c}}=1/4\) for all N. We thus place the threshold at half of the maximum allowed standard deviation, i.e., \(N\widetilde{\sigma }_{th}=1/8\). For even N, the reference points \(\vec {R}_{k=2,\ell \le \ell _{\sigma }}\) lie above this threshold, with \(\ell _{\sigma }=\frac{N}{2}-\lceil \frac{N}{4}\sqrt{2-\sqrt{3}} \rceil\). Likewise, we fix the threshold for the mean at \(N\widetilde{\mu }_{th}=N\widetilde{\mu }\vert _{k=2,\ell =\ell _{\sigma }}=2\ell _{\sigma }(N-\ell _{\sigma })/N^{2}\).

Therefore, the Goldilocks region is defined as the intersection of the polygon spanned by the set of points \(\{\vec {R}_{0}\}\cup \{\vec {R}_{k=2}\}_{\ell =1}^{\lfloor \frac{N}{2} \rfloor }\) with the regions below the thresholds \(N\widetilde{\mu }_{th}\) and \(N\widetilde{\sigma }_{th}\). This region is illustrated in Fig. 3a by the blue-shaded area. Any unitary matrix F whose associated vector \(\vec {R}\) lies inside the Goldilocks region is said to fulfill the Goldilocks principle; i.e., the components of F have the statistical properties to be deemed dense enough to render the factorization in (1) universal.
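A simplified numerical check of the Goldilocks principle, keeping only the two threshold conditions (as in Fig. 3c, without the polygon test) and using the even-N expression for \(\ell _{\sigma }\) quoted above, may read as follows (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

def goldilocks_ok(V):
    """Return True if the density point of V lies below both universality thresholds."""
    N = V.shape[0]
    mu_cols = np.abs(V).sum(axis=0) / N
    S = 1.0 / N - mu_cols**2
    N_mu, N_sigma = N * S.mean(), N * S.std()

    N_sigma_th = 1.0 / 8.0                                          # half of the maximum 1/4
    l_sigma = N // 2 - int(np.ceil(N / 4.0 * np.sqrt(2.0 - np.sqrt(3.0))))
    N_mu_th = 2.0 * l_sigma * (N - l_sigma) / N**2                  # N*mu_th at l = l_sigma
    return (N_mu < N_mu_th) and (N_sigma < N_sigma_th)

N = 6
dft = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
print(goldilocks_ok(dft))        # True: the ideal, maximally dense case
print(goldilocks_ok(np.eye(N)))  # False: the sparsest (diagonal) case
```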

Random matrices and performance test

To test the performance of the Goldilocks principle, we generate sets of random \(6\times 6\) unitary matrices (not necessarily Haar random) and determine the corresponding points \(\vec {R}\) in the plane \((N\widetilde{\mu },N\widetilde{\sigma })\). To ensure control over the testing matrices, we generate random matrices using the decomposition \(F_{X}=e^{i X}\), with \(X^{\dagger }=X\) a Hermitian matrix in \(\mathbb {C}^{N\times N}\) to be defined. Furthermore, we introduce the four non-symmetric matrices \(X_A=(\vec {\chi }_{1;A},\vec {0},\vec {0},\vec {0},\vec {0},\vec {0})\), \(X_B=(\vec {\chi }_{1;B},\vec {\chi }_{2;B},\vec {0},\vec {0},\vec {0},\vec {0})\), \(X_C=(\vec {\chi }_{1;C},\vec {\chi }_{2;C},\vec {\chi }_{3;C},\vec {0},\vec {0},\vec {0})\), and \(X_D=(\vec {\chi }_{1;D},\vec {\chi }_{2;D},\vec {\chi }_{3;D},\vec {\chi }_{4;D},\vec {0},\vec {0})\), with \(\vec {0}\) the null column vector in \(\mathbb {C}^{N}\). The column vectors \(\vec {\chi }_{j;\wp }\) are composed of zeros in the first j entries and random numbers elsewhere, with \(\wp \in \{A,B,C,D\}\). We then consider the Hermitian construction \(X=X_{\wp }+X_{\wp }^{\dagger }\), with \(\wp \in \{A,B,C,D\}\). In this form, the number of random parameters increases in each case as additional non-null columns are included, rendering random unitary matrices defined by a higher number of random parameters. In other words, the density of the random matrices is expected to be higher for \(X_{D}\) than for \(X_{A}\), \(X_{B}\), and \(X_{C}\).
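A possible construction of these controlled test families is sketched below (the choice of complex Gaussian entries for the random numbers, as well as the helper name random_F, is an assumption, since the distribution is not specified above):

```python
import numpy as np
from scipy.linalg import expm

def random_F(N, n_cols, rng):
    """Test matrix F_X = exp(i(X + X^dagger)); n_cols = 1,...,4 mimics the families A,...,D."""
    X = np.zeros((N, N), dtype=complex)
    for j in range(1, n_cols + 1):       # column chi_{j}: zeros in the first j entries
        X[j:, j - 1] = rng.standard_normal(N - j) + 1j * rng.standard_normal(N - j)
    X = X + X.conj().T                   # Hermitian construction
    return expm(1j * X)                  # unitary by construction

rng = np.random.default_rng(3)
F_A = random_F(6, 1, rng)                # expected to be sparser
F_D = random_F(6, 4, rng)                # expected to be denser
```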

In this form, we establish a controlled benchmark for the Goldilocks principle and the subsequent universality of the matrix under consideration. Here, 50 target unitary matrices are considered for each testing unitary matrix \(F_{X_{j}}\), so that the relative error of the optimized targets from (1) and the corresponding vector \(\vec {R}_{X_{j}}\) can be analyzed for a broad number of cases. In particular, Fig. 3a depicts the reference points \(\vec {R}_{k}\) (filled circles) and the points \(\vec {R}_{X_{j}}\) associated with the sets of random unitary matrices \(F_{X_{j}}\) for \(X\in \{A,B,C,D\}\) and \(j\in \{1,\ldots ,50\}\). This allows determining which unitary matrices have points \(\vec {R}_{X_{j}}\) lying inside the Goldilocks region. One may notice that, among the random unitary matrices \(F_{A}\), only the points \(A_{4}\) and \(A_{7}\) are expected to fulfill the Goldilocks principle. In turn, we expect more well-behaved matrices \(F_{D}\), with only \(D_{9}\) lying outside the Goldilocks region. This is corroborated in Fig. 3b, where the optimization routine using the LMA has been implemented for each testing target matrix. Indeed, the error norms for the matrices \(A_{4}\) and \(A_{7}\) render values within the preestablished tolerance, as predicted in Fig. 3a. On the other hand, the numerical optimization also reveals that \(A_{10}\) and \(D_{9}\) are good candidates, even though the density criterion has ruled them out. These are examples of false-negative outcomes. As seen from Fig. 3b,c, this is usually the case for matrices F with \(\vec {R}_{F}\) lying in the vicinity of the Goldilocks region. Due to the existence of false-negative results, we deem the density criterion only a sufficient condition. Likewise, a similar analysis can be carried out for the testing matrices \(F_{B_{j}}\) and \(F_{C_{j}}\). To complement the analysis, and to better visualize and assess the Goldilocks region, we put forward an alternative representation in Fig. 3c, where we depict \(N\widetilde{\mu }\) and \(N\widetilde{\sigma }\) separately. Here, the Goldilocks principle is established if both quantities lie below their respective thresholds \(N\widetilde{\mu }_{th}\) and \(N\widetilde{\sigma }_{th}\). In this form, it is no longer required to draw the universality region, and one can assess the Goldilocks principle in a single simple plot.

As previously discussed, the computational time required to optimize (1) and test the corresponding universality for a given choice of F increases with the total number of ports N. In turn, the identification of the Goldilocks region and the associated vector \(\vec {R}\) for a specific matrix F enables a quick preselection of feasible matrices. Furthermore, the Goldilocks region spanned in the \(\vec {R}\)-space remains finite and non-null for \(N\rightarrow \infty\), making it a suitable measure for architectures with an arbitrary number of ports.

Photonic platform and feasible realizations

So far, the universality of the proposed architecture has been numerically established using different choices of the intervening matrix F as the passive mixing layer possessing the required density criterion. In the following, we discuss potential candidates for creating F matrices using photonic systems, with a specific emphasis on photonic lattices and meshes of directional couplers. Such systems can be readily implemented in silicon photonics, e.g., using buried silicon waveguides at the telecommunication wavelength of 1550 nm. The waveguide system comprises a silicon (Si) core surrounded by a silica (SiO2) cladding and substrate with refractive indices \(n_{\text {Si}}=3.47\) and \(n_{\text {SiO2}}=1.47\), respectively. The core dimensions are 500 nm in width and 220 nm in height. Such a geometry renders the fundamental quasi-TE01 mode with an effective mode index of \(n_{\text {mode}}^{(\text {eff})}=2.4456\), which is the operational mode used for the unitary devices discussed in the following. This applies to passive F matrix solutions based on both waveguide arrays and directional coupler meshes. For the active layers, phase shifters based on thermo-optical effects can be considered. These include solutions based on metal heaters55,56, which are widely implemented by open-access foundries and occupy an approximate area of 370 \(\mu\)m \(\times\) 30 \(\mu\)m. Alternatively, one can consider ultra-compact phase shifters based on phase-change materials (PCMs)7, achieving phase shifts of approximately \(\pi /11\) radians per micron of length57. Although the latter are not as broadly available in foundries as metal-heater solutions, they light the path toward dense programmable photonic chips.

Waveguide lattices

Waveguide arrays can be modeled with high precision using coupled-mode theory58. The latter takes into account the evanescent coupling between nearest-neighbor waveguides while neglecting farther neighbors due to their weak coupling. In this form, the effective Hamiltonian describing an array of N waveguides is a tridiagonal, symmetric matrix \(\mathbb {H}\) of dimension N. The wave evolution through the lattice is governed by the dynamical law \(i\frac{d}{dz}\vec {u}(z)=\mathbb {H}\cdot \vec {u}(z)\), where \(\vec {u}(z)\in \mathbb {C}^{N}\) collects the mode field amplitudes at each waveguide at the propagation distance z. Since \(\mathbb {H}\ne \mathbb {H}(z)\), the wave evolution is determined through the unitary evolution operator \(\mathbb {F}(z)=e^{-iz\mathbb {H}}\) as \(\vec {u}(z)=\mathbb {F}(z)\vec {u}(0)\).

Figure 4. Photonic platform and lattice universality. Sketch of the waveguide array associated with the \(J_{x}\) lattice (a), the homogeneous lattice (d), and the homogeneous lattice with disorder effects (g). Density criterion as a function of the lattice length \(\ell\) for \(N=10\) for the \(J_x\) (b), homogeneous (e), and disordered (h) lattices. Corresponding numerical performance test for \(N=10\) at the reference lengths \(\ell ^{(m)}_{j}\) and \(\ell ^{(M)}_{j}\) for the \(J_x\) (c), homogeneous (f), and disordered (i) lattices.

We can thus implement waveguide arrays in the universal architecture, provided they fulfill the desired universality. To this end, we can test the behavior of a given lattice evolution operator at specific lengths using the Goldilocks principle. In particular, we consider the photonic \(J_{x}\) lattice46, the homogeneous lattice59, and the disordered homogeneous lattice60 as the physical waveguide arrays under consideration, described by the respective Hamiltonians \(\mathbb {H}^{(J_{x})}\), \(\mathbb {H}^{(h)}\), and \(\mathbb {H}^{(h,d)}\). The matrix elements of the latter are explicitly given by

$$\begin{aligned} \begin{aligned}{}&\mathbb {H}^{(J_{x})}_{p,q}:=\kappa (p)\delta _{p+1,q}+\kappa (p-1)\delta _{p-1,q}, \\&\mathbb {H}^{(h)}_{p,q}:=\kappa _{0}\delta _{p+1,q}+\kappa _{0}\delta _{p-1,q}, \\&\mathbb {H}^{(h,d)}_{p,q}:=(\kappa _{0}+\Delta \kappa _{p})\delta _{p+1,q}+(\kappa _{0}+\Delta \kappa _{p-1})\delta _{p-1,q}, \end{aligned} \end{aligned}$$
(4)

with \(p,q\in \{1,\ldots ,N\}\). Here, \(\kappa (p)=\frac{\kappa _{0}}{2}\sqrt{(N-p)p}\) stands for the coupling parameter between nearest waveguide neighbors in the \(J_x\) lattice, \(\kappa _{0}\) is a design scaling factor, and \(\Delta \kappa _p\sim N(\mu ,\sigma )\) are random numbers drawn from the normal distribution \(N(\mu ,\sigma )\) characterizing the disorder effects. The coupled-waveguide implementation of each Hamiltonian is depicted in Fig. 4.
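The three candidate lattices can be assembled numerically as sketched below (a minimal illustration assuming \(\kappa _{0}=1\) and the convention \(\mathbb {F}(z)=e^{-iz\mathbb {H}}\) introduced above; the helper names are not from the original work):

```python
import numpy as np
from scipy.linalg import expm

def jx_hamiltonian(N, kappa0=1.0):
    """Tridiagonal J_x Hamiltonian with couplings kappa(p) = (kappa0/2) * sqrt((N-p)p)."""
    p = np.arange(1, N)
    k = 0.5 * kappa0 * np.sqrt((N - p) * p)
    return np.diag(k, 1) + np.diag(k, -1)

def homogeneous_hamiltonian(N, kappa0=1.0, disorder=0.0, rng=np.random.default_rng(4)):
    """Homogeneous lattice; disorder adds Delta kappa_p ~ Normal(0, disorder * kappa0)."""
    k = kappa0 + disorder * kappa0 * rng.standard_normal(N - 1)
    return np.diag(k, 1) + np.diag(k, -1)

def evolution_operator(H, z):
    """Unitary propagator F(z) = exp(-i z H) of the coupled-mode equations."""
    return expm(-1j * z * H)

# Example: density point of the J_x propagator at a candidate length.
N, z = 10, np.pi / 4
F = evolution_operator(jx_hamiltonian(N), z)
mu_cols = np.abs(F).sum(axis=0) / N
S = 1.0 / N - mu_cols**2
print(N * S.mean(), N * S.std())   # compare against the thresholds of Fig. 4b
```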

Table 1. Lattice lengths at the local minima \(\ell _{j}^{(m)}\) and local maxima \(\ell _{j}^{(M)}\) of the density criterion \(N\widetilde{\sigma }\) for the \(J_{x}\) and homogeneous lattices.

The corresponding unitary evolution operators are simply given by \(\mathbb {F}^{(J_{x})}(z)=e^{-iz\mathbb {H}^{(J_{x})}}\) and \(\mathbb {F}^{(h)}(z)=e^{-iz\mathbb {H}^{(h)}}\). Although both are functions of the lattice length z, the \(J_x\) lattice (Fig. 4a) has equidistant eigenvalues that render the evolution operator \(\mathbb {F}^{(J_{x})}(z)\) periodic in z, so that we can restrict ourselves to the interval \(z\in [0,2\pi )\). We first estimate the lengths that induce universality in our architecture, as depicted in Fig. 4b. There, we mark the particular lengths \(\ell _{j}^{(m)}\) and \(\ell _{j}^{(M)}\) denoting the local minima and maxima of the standard deviation \(N\widetilde{\sigma }\), the exact values of which have been determined numerically and are presented in Table 1. In turn, the thick black line in Fig. 4b denotes the lengths where both \(N\widetilde{\mu }\) and \(N\widetilde{\sigma }\) are below the universality thresholds, i.e., the lengths where universality is expected. Without any prior performance test, one can see that the lengths \(\ell ^{(m)}_{1}\), \(\ell ^{(M)}_{1}\), and \(\ell ^{(M)}_{2}\) may fail to provide the desired universality. Recall that our estimation criterion is only a sufficient condition and may rule out positive cases. Nevertheless, all the other marked points can be considered candidates for the matrix F in our architecture, as no false-positive cases are included. This is indeed verified in the performance test portrayed in Fig. 4c, where, for each point, we have used fifty randomly generated unitary matrices as targets. The latter confirms our predictions: the only badly performing length is found at \(\ell ^{(M)}_{1}\), reinforcing the fact that only two positive cases were discarded, while no false positives were included. In this form, we can confidently conclude that a universal architecture can be built using \(J_{x}\) lattices with lengths as small as \(\ell =\pi /4\) for \(N=10\).

We alternatively consider the homogeneous lattice, in which the waveguides are equally spaced (Fig. 4d). The eigenvalues are accordingly distributed as \(\lambda _{n}^{(h)}=2\kappa _{0}\cos (\frac{n\pi }{N+1})\), and no periodic behavior is expected. We thus focus on the interval \(\ell \in [0,4\pi ]\) for this particular lattice. The density criterion shown in Fig. 4e reveals that lattice lengths in the interval \(z/\kappa _{0}\in (2.4098,5.9595)\) are suitable for our universal architecture. In particular, note that the interval \(z/\kappa _{0}\in [\ell _{4}^{(m)},\ell _{6}^{(m)}]\) contains lengths for which \(N\widetilde{\mu }\) and \(N\widetilde{\sigma }\) remain mostly constant, with minor variations; the performance test in this interval is thus expected to be good. Inside the Goldilocks region, there is an isolated local minimum \(\ell _{2}^{(m)}\) associated with a shorter lattice length, which may be useful for reducing the size of the universal structure. Figure 4f displays the corresponding performance test, which supports the previous statements. As expected, the performance for lattices with lengths \(\ell _{1}^{(m)}\) and \(\ell _{8}^{(M)}\) is particularly poor. However, the length \(\ell _{1}^{(M)}\) shows generally good performance, with only two test targets displaying slightly higher errors than the other well-performing cases.

We additionally take into account the effects of disorder on the homogeneous lattice, which may be caused by impurities or imperfections during the manufacturing process, resulting in waveguides deviating from their ideal positions or sizes (see Fig. 4g). The defects considered here are such that the nearest-neighbor couplings deviate from the ideal homogeneous lattice at the level of twenty percent; i.e., the disorder couplings in (4) take values from the normal distribution \(\Delta \kappa _p\sim N(\mu =0,\sigma =0.2\kappa _{0})\). Although the lattice structure is modified by such disorder, the density estimation does not differ significantly from the ideal case, as shown in Fig. 4h–i.

Directional coupler mesh

Figure 5. Geometric array for the passive matrix F using power dividers (3-dB directional couplers). (a) Power-divider array composed of p layers, as defined in (5). Light-shaded and dark-shaded layers denote \(\mathscr {L}_{1}=\mathbb {I}_{5}\otimes \mathscr {T}_{0}\) and \(\mathscr {L}_{2}=\mathbb {I}_{1}\oplus (\mathbb {I}_{4}\otimes \mathscr {T}_{0})\oplus \mathbb {I}_{1}\), respectively. Density criterion (b) and error norm (log\(_{10} L\)) (c) of the mesh architecture in (a) as a function of the number of layers p.

Alternatively, the matrix F can be optically realized through proper mesh arrays of directional couplers. In particular, we consider a construction based on two-port passive elements acting as power dividers (3-dB directional couplers), each equivalent, up to a global phase, to the unitary matrix \(U(2)\ni \mathscr {T}_{0}=\frac{1}{\sqrt{2}}\left( \sigma _{0}-i\sigma _{1}\right)\), with \(\sigma _{j}\) the conventional Pauli matrices for \(j\in \{1,2,3\}\) and \(\sigma _{0}\) the \(2\times 2\) identity matrix. The latter can be used as a building block to construct other U(2) matrices61 as well as higher-dimensional unitary matrices U(N) through appropriate Kronecker products22. The silicon photonics platform discussed above allows the implementation of each 3-dB directional coupler through a coupling length of 29.35 \(\mu\)m and a waveguide separation of 630 nm. In this section, we consider the symmetric 10-port array portrayed in Fig. 5a, composed of power dividers \(\mathscr {T}_{0}\) interconnected through alternating layers \(\mathscr {L}_{1}\) and \(\mathscr {L}_{2}\). Here, each layer is described by the U(10) matrices \(\mathscr {L}_{1}=\mathbb {I}_{5}\otimes \mathscr {T}_{0}\) and \(\mathscr {L}_{2}=\mathbb {I}_{1}\oplus (\mathbb {I}_{4}\otimes \mathscr {T}_{0})\oplus \mathbb {I}_{1}\), with \(\otimes\) the Kronecker (direct) product, \(\oplus\) the matrix direct sum, and \(\mathbb {I}_{n}\) the \(n\times n\) identity matrix. The p-layered unitary matrix describing the power-divider array is thus given by

$$\begin{aligned} \mathbb {F}_{p}=\underbrace{\mathscr {L}_{\widetilde{p}}\ldots \mathscr {L}_{1}\mathscr {L}_{2}\mathscr {L}_{1}}_{p\text {-times}}, \quad \widetilde{p}= {\left\{ \begin{array}{ll} 1, &{} p\in \{1,3,5,7,9 \} \\ 2, &{} p\in \{2,4,6,8,10\} \end{array}\right. }, \end{aligned}$$
(5)

where we have truncated the maximum number of layers to ten.

It is not mandatory to truncate the number of layers, and the procedure can involve additional layers if needed. However, for practical physical implementations and to reduce the device footprint, we aim for devices with a minimal number of layers. To this end, we estimate the Goldilocks region in Fig. 5b as a function of the number of layers p, revealing that \(p=7,8,10\) are indeed good candidates. This is further corroborated by the LMA optimization results shown in Fig. 5c, which confirm that \(p=7,8,10\) layers render a well-performing F layer. Notably, the latter also indicates that \(p=6\) is a valid choice. The case \(p=6\) was originally deemed inadequate by the Goldilocks principle, but Fig. 5b shows that \(N\widetilde{\sigma }\) lies in the vicinity of the threshold, which, as discussed above, usually renders false-negative outcomes. Despite the latter, no false positives were detected during the analysis; that is, the Goldilocks principle did not flag as good-performing any cases with high error norms L. This is strictly necessary to avoid faulty designs in the final architecture.
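For reference, the p-layer mesh of Eq. (5) can be written down explicitly as in the following sketch (illustrative code; block_diag plays the role of the direct sum):

```python
import numpy as np
from scipy.linalg import block_diag

T0 = (np.eye(2) - 1j * np.array([[0, 1], [1, 0]])) / np.sqrt(2.0)  # (sigma_0 - i sigma_1)/sqrt(2)
L1 = np.kron(np.eye(5), T0)                                        # L_1 = I_5 (x) T_0
L2 = block_diag(np.eye(1), np.kron(np.eye(4), T0), np.eye(1))      # L_2 = I_1 (+) (I_4 (x) T_0) (+) I_1

def coupler_mesh(p):
    """F_p of Eq. (5): p alternating layers, starting with L_1 as the rightmost factor."""
    F = np.eye(10, dtype=complex)
    for k in range(1, p + 1):
        F = (L1 if k % 2 == 1 else L2) @ F
    return F

F7 = coupler_mesh(7)                         # one of the layer counts found to be universal
assert np.allclose(F7.conj().T @ F7, np.eye(10))
```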

Conclusions

We have introduced the design of a lossless universal photonic architecture based on a layered scheme of interlaced active phase-shifter layers and passive random matrices. Numerical results obtained from the LMA optimization revealed that Haar-random matrices F lead, in the vast majority of cases, to the desired universal architecture. Well-behaved matrices F show a phase transition in the error norm L of the reconstructed target at \(M=N+1\), with M the total number of phase-shifter layers; at this layer number, the error drops sharply to numerical-noise values. While this is not a proof that the factorization is exact, the error involved in the reconstruction lies in the numerical-error regime and is thus low enough to ensure that any unitary matrix is reconstructed with the desired accuracy.

Despite the accuracy of the LMA optimization, the computational time required for testing the universality of random matrices F scales with the total number of ports, which becomes impractical for particularly large architectures. Numerical evidence shows that denser matrices perform better than sparse ones, which usually involve relatively large errors. Therefore, a density criterion has been devised to classify the candidates for the matrix F used in the architecture. This criterion is built on prior knowledge of badly performing matrices, such as diagonal and block-diagonal matrices, which are analytically known to fail but serve as reference points in the search for good-performing matrices, such as the DFT. In this form, instead of performing a long optimization routine on a candidate F, we simply analyze the statistics of the moduli of its columns or rows, which provides information about its density. This allows defining a mapping \(\vec {R}:U(N)\rightarrow \mathbb {R}^{2}\), which renders a vector that estimates whether F is suitable for the architecture. We thus possess a tool to preselect matrices F beforehand, making the design process more practical than generating and testing several random matrices.

Our tests using randomly generated unitary matrices showed that matrices within the thresholds set by the density criterion led to the required universality. Thus, universality is not limited to a specific realization of F; as shown in the results, infinitely many unitary matrices can meet our requirements. This paves the way for more efficient construction and optimization of compact devices that are simultaneously resilient to random defects. In particular, the photonic Jx lattice was found suitable for this task at lengths different from the previously reported critical value \(\ell =\pi /2\)37. This defines intervals of the lattice length for which the architecture is universal, providing more flexibility in the manufacturing process, since one can allow for deviations in the lattice length. This is further supported in the context of homogeneous lattices, which are also suitable for our architecture and robust against disorder effects due to waveguide impurities or mismatched sizes. The latter was tested by introducing coupling deviations at the \(20\%\) level into the homogeneous lattice, for which the density estimation showed no significant difference in the universality performance over the lattice lengths considered. Further constructions for the F matrices are indeed allowed, and an alternative construction based on a layered array of power dividers was shown to be efficient for our purposes, the analysis of which allowed us to determine the minimum number of passive layers required for the architecture.

The universality of the interlaced architecture in (1) can be further assessed using the density criterion for optical implementations of the passive matrix F beyond the waveguide arrays and directional-coupler meshes discussed in this manuscript, as long as F is described by a unitary matrix. For instance, the DFT can be implemented with MMI couplers via the self-imaging property48. The MMI construction of the DFT has been used to develop an interlaced unitary architecture akin to (1); in that construction, it was analytically proved that \(6N+1\) phase layers are required when the DFT is used as the intervening layer12. Our results in Fig. 2 provide numerical evidence that such a construction can be realized with only \(N+1\) phase layers at the prescribed accuracy. Furthermore, a microwave implementation of the DFrFT has been theoretically and experimentally validated62, offering an alternative approach for the interlaced architecture. These observations facilitate the design of the passive layer, helping to reduce the overall architecture size, account for potential manufacturing errors, and diminish the device footprint, which is particularly useful when deploying more complex optical circuits.