Heavy tails and pruning in programmable photonic circuits for universal unitaries

Developing hardware for high-dimensional unitary operators plays a vital role in implementing quantum computations and deep learning accelerations. Programmable photonic circuits are singularly promising candidates for universal unitaries owing to intrinsic unitarity, ultrafast tunability and energy efficiency of photonic platforms. Nonetheless, when the scale of a photonic circuit increases, the effects of noise on the fidelity of quantum operators and deep learning weight matrices become more severe. Here we demonstrate a nontrivial stochastic nature of large-scale programmable photonic circuits—heavy-tailed distributions of rotation operators—that enables the development of high-fidelity universal unitaries through designed pruning of superfluous rotations. The power law and the Pareto principle for the conventional architecture of programmable photonic circuits are revealed with the presence of hub phase shifters, allowing for the application of network pruning to the design of photonic hardware. For the Clements design of programmable photonic circuits, we extract a universal architecture for pruning random unitary matrices and prove that “the bad is sometimes better to be removed” to achieve high fidelity and energy efficiency. This result lowers the hurdle for high fidelity in large-scale quantum computing and photonic deep learning accelerators.


Introduction
A unitary operation is an essential building block of quantum [1][2][3] and classical 4,5 linear systems because any linear operator can be decomposed into a set of unitary and diagonal operators 6 .With advances in quantum computations 3 and deep learning accelerators 7 , development of reconfigurable hardware for universal unitary operations has become a topic of intense study.A programmable photonic circuit is one of the most widely used platforms 8,9 for unitary operations in optical neural networks 4,10 , modal decoding 5 , and quantum computations [1][2][3] .
The fundamental strategy to realize universal unitary operators is to factorize the target operator of the n-degree unitary group U(n) into the diagonal operators and unitary operators of the lower-degree group, such as SU(2) 11 .These subsystems can be realized with conventional optical elements, such as beam splitters, Mach-Zehnder interferometers (MZIs), and phase shifters, which constitute a programmable photonic circuit with reconfigurable modulation.Although the mesh composed of these unit elements can perform universal unitary operations, the connectivity inside the mesh is nonunique and involves an optimal design issue for more compact and robust platforms [12][13][14][15] .To improve the original proposal-the Reck design 11 -for the mesh topology, recent approaches have successfully demonstrated advanced arrangements of two-channel subsystems-the Clements design 12 -and the advantages of utilizing multichannel building blocks-the Saygin design 13 .
When each channel of the mesh is assigned as a node, a photonic circuit can be interpreted as a graph network 16 regardless of design strategy.Accordingly, it is logical to seek inspiration from network science 17 to understand and improve the large-scale mesh topology of high-degree unitary groups, which should inherit intriguing features of complex networks.In this context, one promising issue is the degree distribution describing the differentiated importance of network nodes, which has been a hot topic through the concepts of heavy-tailed distributions, hub nodes, and scale-freeness [17][18][19][20][21] .When the multiple decomposition processes are applied to U(n) 11,12 , a natural question arises: Do every decomposition and corresponding optical element contribute equally to the designed unitary operation?The answer to this question is of fundamental and practical importance in quantum physics and photonics for devising more advanced hardware architecture applicable to universal quantum evolutions and deep learning accelerators, especially with large-scale photonic circuits.
In this paper, we reveal that some subsystems are more important than others in the common architecture of large-scale programmable photonic circuits.By applying various statistical models to programmable photonic circuits targeting universal unitaries, we verify that a type of unit rotation operator has a heavy-tailed distribution.This finding shows the presence of hub optical elements and the Pareto principle in photonic circuits, which enables the development of the pruning technique 22 for linear quantum or classical hardware.We demonstrate that the suggested hardware pruning for random unitaries allows for improved fidelity when the elements with noise above a specific threshold are removed.This result provides a novel design strategy for high fidelity and energy efficiency in large-scale quantum computations and photonic deep learning accelerators.

Rotation operators in programmable photonic circuits
Before applying the statistical analysis to large-scale programmable photonic circuits, we revisit the Clements design 12 , which is one of the most widely used architectures for universal unitaries.
Figure 1 shows a schematic of the photonic circuit for the n  n unitary matrix Un ∈ U(n) obtained from the Clements design.Both the Reck and Clements designs employ nulling the off-diagonal elements of Un by sequentially multiplying the programmable unit operations Tm l ∈ U(n) (1 ≤ m ≤ n -1, 1 ≤ l ≤ n, m and l are integers).Tm l leads to the SU(2) operation on the Bloch sphere defined for the m th and (m+1) th channels to set the off-diagonal element (l, m) or (m+1, l) to be zero.
To maximally cover the SU(2) group with Tm l , reconfigurable and independent control of the amplitude and phase differences between the m th and (m+1) th channels is necessary 8 .One of the most popular platforms for Tm l is to utilize two pairs of a stationary MZI and a tunable phase shifter in one arm 8,9,12 , which involves two adjustable parameters of θ ∈ [0, π/2] and φ ∈ [0, 2π) (Fig. 1a).While the phase shifts θ and φ correspond to tunable z-axis rotations on the Bloch sphere, the stationary MZIs constitute the -π/2 x-axis rotations (Fig. 1b,c).The unit operator then becomes where Ra m (ξ) is the ξ-rotation to the a-axis on the m-(m+1) Bloch sphere, and θ and φ are determined to satisfy nulling of the (l, m) or (m+1, l) element.The target unitary operator Un is reproduced with multiple Tm l operators and the remaining diagonal matrix Dn after nulling, as follows: , where Sn is the ordered sequence of {m, l} pairs determined by the nulling process 12 and Dn is realized with phase shifters (Fig. 1d).By newly defining Sn, the Clements design employs the highly symmetric arrangement of the MZIs (Fig. 1e), which decreases the device footprint by half and enhances robustness to optical losses compared to the Reck design (see Supplementary Note S1 for the detailed processes).
The reconfigurability for universal unitary operators is thus realized with the z-axis rotation Rz obtained from tunable phase shifts θ and φ.As programmable devices, the noise and power consumption of photonic circuits are determined by the performance of modulating optical refractive indices Δn in the phase shifters and the following changes of θ and φ, as ~LΔn, where L is the modulation length.Therefore, the statistical analysis of the two adjustable phases θ and φ is critical in examining the performance of large-scale programmable photonic circuits.

Heavy tails in rotations
Due to the highly symmetric form of the photonic circuit (Fig. 1e), at first glance, it may appear to be reasonable to predict that the building blocks Tm l in the circuit have equal importance.Under this presumption, the distributions of θ and φ should be statistically uniform for an ensemble of photonic circuits that generate random unitary operations uniformly distributed in U(n) 23 .
Furthermore, it may also seem reasonable to expect similar distributions for θ and φ, both of which perform z-axis rotations.
However, upon closer inspection, we reveal that those presumptions are invalid.Instead, there are differences in the contributions of individual building blocks as well as the rotation operators θ and φ.First, revisiting the nulling process of the Clements design 12 , we note that each off-diagonal element of Un undergoes differentiated transformations.For example, in nulling the 5  5 unitary matrices (Fig. 2a), nulling the (5,1) and (4,1) components results in the (T1 These disparate transformations of each matrix element do not guarantee nontrivial distributions of the phase shifts θ and φ.However, the decomposed form of the building block operation Tm l (θ, φ) = Rx m (-π/2)Rz m (-2θ)Rx m (-π/2)Rz m (-φ) results in the nontrivial distribution of θ, which is clearly distinct from that of φ.Figures 2b and 2c show the transformations of the initial states uniformly distributed in the polar (ξ) and azimuthal (η) axes of the Bloch sphere by respectively, where nonzero θ and φ also have uniformly distributed values in their ranges.Notably, the transformed states by Tm l (θ, φ=0) become nonuniform (Fig. 2b), in sharp contrast to the uniform distribution from Tm l (θ=0, φ) (Fig. 2c).Such a discrepancy originates from the difference between the pure z-axis rotation Rz m (-φ) and the transformed rotation Rx m (-π/2)Rz m (-2θ)Rx m (-π/2) and eventually leads to nonuniformity on the Bloch sphere for Tm(θ, φ) (Fig. 2d).We emphasize that the unequal contributions of each nulling (Fig. 2a) will accumulate the nonuniform distribution from the θ rotations, which leads to nontrivial statistics in the phase shift design.
To confirm this prediction, we investigate the statistics of θ and φ in realizing programmable photonic circuits that reproduce random unitary operations achieved by uniformly sampling the U(n) group with the Haar measure 23 .We calculate the probability density functions (PDFs) p(θ) and p(φ) and the complementary cumulative distribution functions (CCDFs) P(θ) and P(φ) for an ensemble of 100 Un realizations at each n.As expected from the uniform distribution with Tm l (θ=0, φ) (Fig. 2c), the distribution of p(φ) is trivially uniform (Supplementary Note S2).
One of the key findings of this work is the nonuniform distribution of θ. Figure 2e shows an example of the θ distribution for U128, which includes 8128 values.As shown in the linearized plots of the CCDF and PDF on the log-log scale, θ possesses a heavy-tailed distribution 17,19,21 , indicating a smaller decrease in p(θ) for increasing θ than that of the exponential distribution.For the quantitative analysis, we employ three representative heavy-tailed distribution models 19 power-law, power-law with an exponential cutoff, and log-normal distributions-and the exponential distribution model.The models are fitted with the θ data set of each realization of photonic circuits by utilizing analytical or numerical maximization of the model likelihoods 19,24 and the Kolmogorov-Smirnov test 25 for the models with lower bounds (see Methods for details).
Notably, all the heavy-tailed models provide good fits for large n, showing the consistent behaviours of their estimators for each realization, which is a critical condition for model consistency 21 .For example, the exponent α (Fig. 2f) and the lower bound θmin (Fig. 2g) in the power-law model P(θ) = (θ/θmin) -α+1 converge with increasing n, which demonstrates that the heavy-tailed distribution becomes more apparent in larger-scale programmable photonic circuits.
The average of the power-law exponents at n = 128 is αavg = 3.18 for 100 realizations (or 812,800 values of θ), while lower and upper limits are αmin = 2.75 and αmax = 3.78.Such consistency clearly proves the validity of the power-law model 20,21 for describing the distribution of the θ-rotation operators (Supplementary Notes S3-S5 for the results of the crossover heavy-tailed distributions and exponential distribution).We note that the averaged lower bound θmin = 0.08π with P(θmin) = 0.24 shows that most of the significant rotations 0.08π ≤ θ ≤ 0.50π come from ~24% of the building blocks, which illustrates the Pareto principle for large-scale programmable photonic circuits.

Hub units and pruning
The observed heavy-tailed distribution of θ rotation operators signifies that some building blocks Tm l are more critical than others.In realizing programmable photonic circuits for universal unitary operations (Fig. 1a), many phase shifters in the "body" of the distribution may be unnecessary because θ ~ 0. On the other hand, the "tail" phase shifters with large θ values operate as hub units.
Because such hub units deliver most of the necessary θ-rotations for realizing Un, we can envisage the application of the pruning technique in computer science 22 to photonic hardware.
Figure 3a shows the concept of pruning for programmable photonic circuits.The entire photonic circuit for Un includes n(n -1)/2 number of SU(2) building blocks and the same number of θ values.We define the set of sorted θ values for a given photonic circuit as Θn = {θr|1 ≤ r ≤ n(n -1)/2 for integer r, and θp ≤ θq for p ≤ q}, where θr with larger r represents a more important building block.The pruning of less important ones-body elements-for the photonic circuit is then defined by imposing θr = 0 for 1 ≤ r ≤ σ, where the integer σ determines the degree of pruning: σ = 0 for preserving the original circuit and σ = n(n -1)/2 for entirely removing θ rotations in the circuit.In the hardware implementation, pruning corresponds to leaving out the phase shifters for θ and preserving the symmetry in the MZI arms.
The refractive index modulation in the phase shifters is responsible for much of the energy consumption and noise generation in programmable photonic circuits 8,9 .Therefore, pruning of superfluous phase shifters allows for more energy-efficient and noise-tolerant photonic circuits for reconfigurable unitary operations, provided that the circuit after the pruning accurately reproduces unitary operations.To examine the performance of pruning in a practical situation, we prepare three control groups: one group with the pruning of more important building blocks-tail elements-with θr = 0 for n(n -1)/2 -σ + 1 ≤ r ≤ n(n -1)/2, and two groups with noisy elements.
For the noisy elements, we assume random noise from the phase shifter by assigning the noise δk to the k th original rotation as θk + δk, where δk = u[0,δ0] represents the uniform random distribution between 0 and δ0.For a fair comparison, we construct the groups of noisy elements by replacing the body-or tail-pruned elements in the pruning groups with noisy elements.
To characterize the precision of the operation of the circuits with pruning or noises, we define the fidelity that quantifies the metric between the original and defective operators 26 as follows (see Methods for the derivation): where Un O and Un D represent the original unitary matrix and its defective (pruned or noisy) one, respectively, and Tr(A) is the trace of the square matrix A. Figure 3c shows the fidelities of each photonic circuit with the pruning or noise as a function of the ratio of defective elements: 2σ/n(n -1) in the pruning groups.As expected, the fidelity is preserved much better when the body is pruned instead of the tail.More critical results are shown in comparison with the noisy circuits.
When the noise amplitude increases, removing a specific ratio of the "body" phase shifters can be better for higher fidelity than the noisy ones, whether the noise is imposed on body or tail elements.
Such a ratio, called the pruning threshold, increases with the noise level and scale of photonic circuits (Fig. 3d).This result states that there is a substantial restriction on the noise level in a large-scale programmable photonic circuit.If a phase shifter cannot meet this restriction, then it is better to remove the phase shifter to increase accuracy and decrease energy consumption for reconfigurability.

Universal architecture for pruning
Although the result shown in Fig. 3 demonstrates hub functionality and the advantage of pruning in realizing an individual unitary operator, it is insufficient to apply pruning to programmable photonic circuits for universal unitary operators.This is because the sorted set Θn for pruning varies with the form of a unitary operator.To apply the pruning method for universal unitaries with reconfigurability, it is necessary to construct an adaptable architecture for the pruning process.
Because the position of each building block for nulling a specific off-diagonal element is fixed in an n-degree photonic circuit, the averages of the phase rotations <θm,l> and <φm,l> are well-defined in hardware for random unitary operations that are uniformly sampled from U(n).
Figures 4a and 4b describe the universal architectures defined by <θm,l> and <φm,l>, respectively, for 100 Un realizations of n = 16 and n = 32 (see Supplementary Note S6 for n = 64).As expected from the distinct SU(2) operations from θ and φ (Fig. 2b,c), we observe a spatially inhomogeneous distribution of <θm,l> in contrast to that of <φm,l>.More specifically, the universal architectures show significant θ-rotation contributions from the building blocks near the boundary of the programmable photonic circuits.Such a consistent distribution allows for a universal sorted set <Θ>n = {<θm,l>r|1 ≤ r ≤ n(n -1)/2 for integer r and <θm,l>p ≤ <θm,l>q for p ≤ q} to develop a pruning process applicable to any unitary operations.
From this guideline, we again employ pruning and add noise to the body and tail elements to the set <Θ>n.As shown in Figs 4c and 4d, the general tendencies in Figs 3c and 3d are preserved; the tail is more important than the body, the bad is better to be removed, and pruning is more efficient for larger-scale photonic circuits.Although the minimum noise level increases, there is still a pruning threshold that guarantees the advantage of removing θ phase shifters, and this tendency is much more apparent in larger-scale programmable photonic circuits.Notably, the importance of protecting hub elements from noise becomes evident at the strong noise level (0 = 0.20π cases in Fig. 4c).

Discussion
Due to the mathematical generality of our study, the presented results should be universal for programmable photonic 8,9 or superconducting 27 processors for reconfigurable unitary operations when the unit SU(2) operation is nonuniform on the Bloch sphere and the target degree n is finite.
Notably, we observed excellent fitting with the power-law model, the crossover behaviours from exponential to heavy tails in the truncated power-law and log-normal models, and the evident failure of the exponential model nearly above the degree n ≥ 80.Therefore, the heavy-tailed features are evident at the scale near and beyond the present state-of-the-art degrees (n ~10 2 ) in deep learning accelerators 4,28,29 and noisy intermediate-scale quantum (NISQ) computers [30][31][32] .The suitable application of the demonstrated pruning method, which allows for leaving out a significant portion of electro-optic modulations in programmable photonic circuits, will become particularly beneficial for the next era of quantum computing and deep learning hardware.
The presence of heavy-tailed distributions in programmable photonic circuits inspires the extension of seminal achievements in probability theory and network science to wave physics.As shown in our study, the intriguing features related to heavy-tailed distributions are also demonstrated in wave platforms, such as the observed Pareto principle in wave physics and the critical role of hub elements in pruning and noise immunity.When the connectivity of integrated wave systems becomes more extensive and complex 33 , the concepts of complex networks will provide a foundation for novel design strategies in wave physics.
In conclusion, we demonstrated that some of the unit elements in a large-scale programmable photonic circuit are more important than others, exhibiting the heavy-tailed feature verified with conventional statistical models, i.e., the power-law, power-law with an exponential cutoff, and log-normal distributions, and the exponential distribution as a counterexample.The observed heavy-tailed distribution originates from nonuniform rotations on the Bloch sphere, which are ubiquitous in conventional SU(2) units for programmable photonic circuits.The result allows for a new design strategy-pruning-for high fidelity and energy efficiency, which offers intriguing insight into the design of large-scale photonic structures for classical and quantum applications.Further research on devising other forms of SU(2) units or units with higher degree for Un factorization is desirable to alter the observed heavy-tailed distributions.

Methods
Model fitting process.To analyse the θ distributions in an ensemble of programmable photonic circuit realizations, we employ multiple statistical models: power-law, power-law with an exponential cutoff, log-normal, and exponential distributions.Each model is defined by a set of model parameters {qs}.To calculate the model parameters for the fitting of a given data set {θ1, θ2, …, θM}, we employ an analytical or numerical calculation of the maximum likelihood estimators (MLEs) 24 .First, the probability of obtaining the data set from the statistical model with the given model parameters {qs} and the PDF p(θm;{qs}) is which is called the likelihood for the data and model.Because the employed statistical models have exponential forms, it is conventional to utilize the log-likelihood L: ( ) The fitting of the model to a given set of data, which requires the calculation of {qs}, then corresponds to the maximization of L with respect to {qs}.Therefore, the MLE is defined as Power-law distribution model.In analysing the heavy-tailed statistics of the θ-rotations, we mainly employ the power-law distribution model [17][18][19] , which supports the PDF and CCDF, as follows: where α and θmin are the exponent and lower bound of the power-law model, respectively.The model is defined in the range α > 1, and the model parameter set is {qs} = {α}.For a given data set, the log-likelihood becomes The MLE then leads to , as We calculate an array of  values using Eq. ( 9) for all the possible values of θmin, where each pair of  and θmin comprises a candidate power-law model.
Power-law model with an exponential cutoff.To obtain a thorough confirmation of the heavytailed statistics, we test crossover distributions between a power-law and an exponential distribution.First, we apply the power-law model with an exponential cutoff, which is the truncated version of the original power-law model.The PDF and CCDF of the model are 17,19 : ) (1 Although the MLE with the model parameters {qs} = {αc, c} leads to the following relations: ( ) we instead employ the numerical minimization of -L with the constraints αc ≥ 0 and c ≥ 0 due to the difficulty in handling the analytical derivative of the upper incomplete gamma function.We calculate the pairs of c and c for all the possible values of θc,min, where a set of c, c, and θc,min comprises a candidate for the model.

Log-normal distribution model.
To cover the intermediate regime between the power-law and exponential distributions 17 , we employ another crossover distribution: the log-normal distribution model.The PDF and CCDF of the model are 17,19 : where μ and σ are the mean and standard deviation of log(θ), respectively, and erf is the error function.With the model parameters {qs} = {μ, σ}, the log-likelihood and the MLE relation are shown in Eqs ( 16) and ( 17), respectively, as follows: ( ) We calculate an array of e values using Eq. ( 21) for all the possible values of θe,min, where each pair of e and θe,min comprises a candidate for the model.
We select θmin and the corresponding {qs} to minimize D, determining the optimum statistical model for each case of the power-law, power-law with an exponential cutoff, and exponential distribution models.
Fidelity for unitary matrices.We consider the n × n unitary matrix Un O and its defective one Un D , which could be nonunitary in general.The cost function or the square of the metric between the matrices is defined by 26 : where A(i,j) is the (i,j) matrix component and Tr(A) is the trace of the square matrix A. Because JU ≥ 0, we obtain the relationship: where equality is achieved with the minimum defect, as Un O = Un D .Because the left side of Eq. ( 24) is positive, the definition of fidelity is F(Un D , Un O ) in Eq. ( 2) in the main text.transformations; elements that are nulled earlier get fewer transformations.Such a discrepancy in the numbers of SU(2) transformations applied to the off-diagonal elements is general for any design strategies that adopt a series of nulling processes for the factorization of the target unitary operation, such as the Reck design 1 .However, the specific number of SU(2) transformations of each matrix element can differ depending on the nulling algorithm.

Note S1. Differentiated transformations of off-diagonals through nulling processes
As described in the Clements design 2 , nulling processes are applied only to the lower triangular off-diagonal elements because a unitary triangular matrix is diagonal.After the entire nulling process, we employ the relationship (Tm l ) † D = D'Tm l , where D is the diagonal matrix resulting from the nulling processes, and D' is its transformation with Tm l .This relation that is valid for Hermitian systems is used to derive Eq. ( 1) in the main text.

Fig. 1 .
Fig. 1.Programmable photonic circuits for universal unitary operators.a, Programmable photonic building block of Tm l composed of MZIs and phase shifters for the SU(2) operation between the m th and (m+1) th channels.Red and blue boxes represent the phase shifters for θ and φ, respectively.b,c, The rotation operators of b, Rx m (-π/2)Rz m (-φ) and c, Rx m (-π/2)Rz m (-2θ), described in Bloch spheres.Black and coloured solid lines indicate x-axis and z-axis rotations, respectively.b and c correspond to the parts indicated by blue and red arrows in a, respectively.d, Phase shifters for the diagonal components of Dn. e, Schematic diagram of the programmable photonic circuit for U16.The tunability of θ and φ allows for the programming of U16.

Fig. 2 .
Fig. 2. Heavy-tailed distributions in unitary photonic circuits.a,b, Two origins of the heavytailed distributions of the rotation operators: a, unequal transformations in the nulling process and b-d, nonuniform SU(2) rotations.a, An example of the nulling process for U5.Orange and green arrows denote the nulling of the off-diagonal elements with UT † and TU, respectively.Red and blue boxes indicate the rotating components for the nulling of the (5,1) and (4,1) components, respectively.b-d, Rotated states with b, T(θ, 0), c, T(0, φ), and d, T(θ, φ).Each point in b and c denotes the transformed state through the corresponding T applied to the uniformly random initial states on the Bloch sphere.The colours in the map in d depict the nonuniform density of the transformed states on the Bloch sphere.The initial states in b and c are obtained with 10 polar grids and 20 azimuthal grids (200 points), while 200 polar grids and 400 azimuthal grids (80,000

Fig. 3 .
Fig. 3. Pruning is often better than noise.a, The concept of pruning in programmable photonic circuits.The phase shifter 2θ of the building block is replaced with an ordinary waveguide, which

Fig. 4 .
Fig. 4. Universal architecture for pruning in reconfigurable unitaries.a,b, The averages of a, <θm,l> and b, <φm,l> for the photonic circuits of 100 Un realizations with n = 16 and n = 32.We set )where αc, c, and θc,min are the power-law exponent, cutoff exponent, and the lower bound of the model, respectively, and Γ(s,x) is the upper incomplete gamma function.The model is defined in the range of αc ≥ 0 and c ≥ 0. The log-likelihood for the data set {θ1, θ2, …, θM} is of utilizing the analytical MLE, we employ numerical minimization of -L with the constraint σ ≥ 0.Exponential distribution model.For the comparison with models other than heavy-tailed distributions, we test the exponential distribution model17,19,21 , which has the following PDF and the model parameter is {qs} = {e}.The log-likelihood and the MLE relation are e

Kolmogorov-
Smirnov test.In the power-law, power-law with an exponential cutoff, and exponential distribution models, we obtain multiple candidates for the models with different values of lower bounds θmin, θc,min, and θe,min, respectively.Each candidate of a model supports a distinct range of data for model validity and possesses different values of model parameters {qs}.To extract the optimum model among the candidates, we apply the Kolmogorov-Smirnov (KS) test19,25 .When the CDFs of the data set and the statistical model are S(θ) and P(θ; θmin, {qs}) for the lower bound parameter θmin, we define the maximum distance D between the data and model

Figure S1 describes the first 4
FigureS1describes the first 4 steps of nulling the off-diagonal elements in the 5-degree random unitary matrix U5.Each nulling process is achieved with Tm l , which has the SU(2) block matrix for the m th and (m+1) th channels (green squares in Tm l or (Tm l ) † ).The nulling processes are composed Figure S2shows an example of the φ distribution for U128, which is obtained from the uniform sampling of the U(128) group with the Haar measure3  .The ensemble includes 8128 values of φ,

Fig. S2 .
Fig. S2.Statistical distributions of φ-rotations.a,b, Distributions of φ described by its a, CCDF, and b, PDF.The range of φ is φ ∈ [0, 2π).All the scales of the axes are linear.

Fig. S6 .
Fig. S6.Universal architecture of programmable photonic circuits for U64.a,b, The averages a, <θm,l> and b, <φm,l> for the photonic circuits of 100 Un realizations with n = 64.We set the upper bound of the colormap in a to be 0.3π for better visibility.