All-optical information-processing capacity of diffractive surfaces

The precise engineering of materials and surfaces has been at the heart of some of the recent advances in optics and photonics. These advances related to the engineering of materials with new functionalities have also opened up exciting avenues for designing trainable surfaces that can perform computation and machine-learning tasks through light–matter interactions and diffraction. Here, we analyze the information-processing capacity of coherent optical networks formed by diffractive surfaces that are trained to perform an all-optical computational task between a given input and output field-of-view. We show that the dimensionality of the all-optical solution space covering the complex-valued transformations between the input and output fields-of-view is linearly proportional to the number of diffractive surfaces within the optical network, up to a limit that is dictated by the extent of the input and output fields-of-view. Deeper diffractive networks that are composed of larger numbers of trainable surfaces can cover a higher-dimensional subspace of the complex-valued linear transformations between a larger input field-of-view and a larger output field-of-view and exhibit depth advantages in terms of their statistical inference, learning, and generalization capabilities for different image classification tasks when compared with a single trainable diffractive surface. These analyses and conclusions are broadly applicable to various forms of diffractive surfaces, including, e.g., plasmonic and/or dielectric-based metasurfaces and flat optics, which can be used to form all-optical processors.

S1. Coefficient and basis vector generation algorithm for an optical network formed by two diffractive surfaces: Special case for $N_1 = N_2 = N_i = N_o = N$

Here we present a coefficient and basis vector generation algorithm, given in Table S1, specific to an optical network formed by two equally-sized diffractive surfaces and input/output fields-of-view, i.e., $N_1 = N_2 = N_i = N_o = N$. The presented algorithm is a special case of the algorithm reported in Table 1 of the main text, obtained when every chunk of the partition contains a single neuron. Hence, this analysis further confirms that the solution space created by a 2-layered diffractive network forms a $(2N-1)$-dimensional subspace of an $N_i N_o = N^2$-dimensional vector space.
For the special case of $N_1 = N_2 = N_i = N_o = N$, the transformation vector $a$, given by Equation 6 of the main text, can be written as:

$a = \sum_{p=1}^{2N-1} \alpha_p v_p$  (S1)

where $\alpha_p$ and $v_p$ are an arbitrary complex-valued coefficient and the corresponding basis vector, respectively. These are reported in the last column of Table S1, such that $v_p$ is the term that remains inside the parentheses and $\alpha_p$ is the associated coefficient. As can be seen from Table S1, the coefficient, $t_{1,n}$ or $t_{2,m}$, for $n, m \in \{1, 2, \dots, N\}$, can be chosen independently in each step. It is also straightforward to show that all the $N^2$-many vectors are consumed in the generation of the new basis vectors by computing the summations of the used vectors at each step. Also, without loss of generality, the transmittance index chosen in each step, and hence the corresponding coefficient, may change; this ends up with a different set of basis vectors $v_p$, but the dimensionality of the solution space does not change.
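As an aside, the dimensionality $2N-1$ in Equation S1 can also be observed numerically: since the attainable set of transformation vectors $a = \mathrm{vec}(H_3 T_2 H_2 T_1 H_1)$ is $(2N-1)$-dimensional, the Jacobian of the map $(t_1, t_2) \rightarrow a$ has rank $2N-1$ at a generic point. The following is a minimal MATLAB sketch of this check, using random complex stand-ins for the propagation matrices rather than the physically generated ones:

```matlab
% Sanity-check sketch: rank of the Jacobian of (t1, t2) -> a for K = 2.
% H1, H2, H3 are random complex stand-ins for the free-space matrices.
N  = 4;                                  % illustrative network size
H1 = randn(N) + 1j*randn(N);
H2 = randn(N) + 1j*randn(N);
H3 = randn(N) + 1j*randn(N);
t1 = randn(N,1) + 1j*randn(N,1);         % random complex transmittances
t2 = randn(N,1) + 1j*randn(N,1);
J  = zeros(N^2, 2*N);
for n = 1:N
    e = zeros(N,1); e(n) = 1;
    J(:,n)     = reshape(H3*diag(t2)*H2*diag(e)*H1, [], 1);   % da/dt1(n)
    J(:,N + n) = reshape(H3*diag(e)*H2*diag(t1)*H1, [], 1);   % da/dt2(n)
end
rank(J)   % expected: 2*N - 1 = 7; one dimension is lost to the scaling
          % ambiguity (t1, t2) -> (c*t1, t2/c), which leaves a unchanged
```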
Table S1. Coefficient and basis vector generation algorithm for a 2-layered diffractive network (K = 2) when $N_1 = N_2 = N_i = N_o = N$.

S2. Coefficient and basis vector generation algorithm for an optical network formed by three diffractive surfaces: Special case for $N_1 = N_2 = N_3 = N_i = N_o = N$

In the subsection of the main text titled "Analysis of an optical network formed by three or more diffractive surfaces", we showed that a network having three equally-sized diffractive layers and input/output fields-of-view can cover a $(3N-2)$-dimensional subspace of an $N^2$-dimensional vector space, where the proof was based on the addition of a third diffractive layer of size $N$ to an existing diffractive network with two surfaces, presenting a dimensionality of $(2N-1)$.
In this subsection, we demonstrate the same conclusion by directly employing a diffractive network that has three equally-sized diffractive surfaces, i.e., $N_1 = N_2 = N_3 = N_i = N_o = N$.
The input-output relation for a network that has three diffractive surfaces can be written as:

$o = H_4 T_3 H_3 T_2 H_2 T_1 H_1 i$  (S2)

where $H_k$ for $k \in \{1,2,3,4\}$ are $N \times N$ matrices which represent free-space propagation, $T_k$ for $k \in \{1,2,3\}$ are $N \times N$ matrices which represent the multiplicative effect of the $k$th diffractive surface, and $i$ and $o$ are length-$N$ vectors which represent the input and output fields-of-view, respectively. Details of these matrices and vectors can be found in the "Theoretical analysis of the information-processing capacity of diffractive surfaces" subsection of the main text. In this subsection, we show that $\mathrm{vec}(A) = a$ can be written as:

$a = \sum_{p=1}^{3N-2} \alpha_p v_p$  (S3)

where $\alpha_p$ and $v_p$ are an arbitrary complex-valued coefficient and the corresponding basis vector, respectively. We first perform the vectorization of $A = H_4 T_3 H_3 T_2 H_2 T_1 H_1$ as:

$a = \mathrm{vec}(A) = (H_1^T \otimes H_4)\,\mathrm{vec}(T_3 H_3 T_2 H_2 T_1)$  (S4)

In order to decouple the transmittance and free-space propagation parts in Equation S4, we define a matrix $\hat{H}$ of size $N^2 \times N^4$ from $H_2^T \otimes H_3$ as:

$\hat{H} = \begin{bmatrix} g_1 & z & \cdots & z \\ z & g_2 & \cdots & z \\ \vdots & \vdots & \ddots & \vdots \\ z & z & \cdots & g_{N^2} \end{bmatrix}$  (S5)

where $g_l$ is the $l$th row of $H_2^T \otimes H_3$ and $z$ is an all-zero row vector of size $1 \times N^2$. Therefore, in each row of $\hat{H}$, there are $N^4 - N^2$ many zeros. Also, let $t$ be a size $N^4 \times 1$ vector that is generated as:

$t = \begin{bmatrix} \gamma_1\, \mathrm{vec}(T_2) \\ \gamma_2\, \mathrm{vec}(T_2) \\ \vdots \\ \gamma_{N^2}\, \mathrm{vec}(T_2) \end{bmatrix}$  (S6)

where $\gamma_l = (T_1 \otimes T_3)[l, l]$. Then, Equation S4 can be written as:

$a = (H_1^T \otimes H_4)\,\hat{H}\,t$  (S7)

where $(H_1^T \otimes H_4)\hat{H}$ has rank $N^2$, since both $H_1^T \otimes H_4$ and $\hat{H}$ have rank $N^2$.
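Since the block structure of $\hat{H}$ and $t$ is easy to get wrong, the following MATLAB sketch verifies Equations S4-S7 numerically with random complex stand-ins for $H_1, \dots, H_4$ and the transmittance vectors; the block-diagonal arrangement of $\hat{H}$ used here is the one described above:

```matlab
% Sketch: verify vec(A) = kron(H1.',H4)*Hhat*t (Equations S4-S7) with random
% stand-ins. Hhat places the l-th row of kron(H2.',H3) in its l-th length-N^2
% block, and t stacks gamma_l*vec(T2) with gamma_l = kron(T1,T3)[l,l].
N  = 3;
rc = @(r,c) randn(r,c) + 1j*randn(r,c);
H1 = rc(N,N); H2 = rc(N,N); H3 = rc(N,N); H4 = rc(N,N);
T1 = diag(rc(N,1)); T2 = diag(rc(N,1)); T3 = diag(rc(N,1));
A  = H4*T3*H3*T2*H2*T1*H1;                 % full K = 3 transformation matrix
G     = kron(H2.', H3);                    % N^2 x N^2
gamma = diag(kron(T1, T3));                % gamma_l = (T1 kron T3)[l,l]
Hhat  = zeros(N^2, N^4);
t     = zeros(N^4, 1);
for l = 1:N^2
    blk = (l-1)*N^2 + (1:N^2);             % l-th length-N^2 block of columns
    Hhat(l, blk) = G(l,:);                 % N^4 - N^2 zeros remain in this row
    t(blk)       = gamma(l)*reshape(T2, [], 1);   % N nonzeros -> N^3 in total
end
norm(A(:) - kron(H1.', H4)*Hhat*t)         % expected: ~0 (numerical precision)
```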
There are also $N^3$-many nonzero elements of $t$, each of which takes the form $t_{1,n} t_{2,k} t_{3,m}$ for $n, k, m \in \{1, 2, \dots, N\}$. Therefore, $a$ can be written in the form:

$a = \sum_{q=1}^{N^3} \beta_q w_q$  (S8)

where $\beta_q = t_{1,n} t_{2,k} t_{3,m}$ and $w_q$ represents the corresponding column vector of $(H_1^T \otimes H_4)\hat{H}$. Analogous to the 2-layer case, the coefficients $\beta_q$ in Equation S8 cannot all be chosen arbitrarily, since each one includes multiplicative terms of $t_{1,n}$, $t_{2,k}$ and $t_{3,m}$. However, by extending the algorithm detailed in Section S1 to the current 3-layered network case, we can show that $a$ can be formed as:

$a = \sum_{p=1}^{3N-2} \alpha_p v_p$  (S9)

where $\alpha_p$ and $v_p$ are generated by following an algorithm which is the counterpart of the layer and neuron selection algorithm presented for the 2-layer case in Table 1 of the main text. Based on this algorithm, we randomly select a diffractive layer and a neuron on that selected layer at each step of the algorithm to form $\alpha_p$ and $v_p$ in Equation S9. We also present a special case of this algorithm in Table S2, where the layers are chosen in a regular, pre-determined order. By summing the used vectors in the 5th column of Table S2, it can be shown that all the $N^3$-many coefficients and vectors are used to generate the new coefficients. As a result, a set of vectors can be generated from a $(3N-2)$-dimensional subspace of an $N^2$-dimensional vector space. Also see Supplementary Section S4.3 and Supplementary Figure S3 for further analyses of the K = 3 case, as well as the numerical sketch following Table S2.

Table S2. Special case of the coefficient and basis vector generation algorithm for a 3-layered diffractive network (K = 3), where the layers are chosen in a regular, pre-determined order.
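As with the 2-layer case, a quick numerical sketch (random complex stand-ins for $H_1, \dots, H_4$; the size $N$ is illustrative) exposes the $(3N-2)$-dimensional nature of the attainable set through the rank of the Jacobian of $(t_1, t_2, t_3) \rightarrow a$:

```matlab
% Sketch: Jacobian rank of (t1, t2, t3) -> a = vec(H4*T3*H3*T2*H2*T1*H1).
N  = 4;
rc = @(r,c) randn(r,c) + 1j*randn(r,c);
H1 = rc(N,N); H2 = rc(N,N); H3 = rc(N,N); H4 = rc(N,N);
t1 = rc(N,1); t2 = rc(N,1); t3 = rc(N,1);
J  = zeros(N^2, 3*N);
for n = 1:N
    e = zeros(N,1); e(n) = 1;
    J(:,n)       = reshape(H4*diag(t3)*H3*diag(t2)*H2*diag(e)*H1, [], 1);
    J(:,N + n)   = reshape(H4*diag(t3)*H3*diag(e)*H2*diag(t1)*H1, [], 1);
    J(:,2*N + n) = reshape(H4*diag(e)*H3*diag(t2)*H2*diag(t1)*H1, [], 1);
end
rank(J)   % expected: 3*N - 2 = 10; two dimensions are lost to the two
          % independent inter-layer scaling ambiguities
```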

S3. Coefficient and basis vector generation algorithm for an optical network formed by K diffractive surfaces: Special case for $N_1 = N_2 = \cdots = N_K = N_i = N_o = N$
In the subsection of the main text titled "Analysis of an optical network formed by three or more diffractive surfaces", it was shown that each additional diffractive surface of size $N$ increases the dimensionality of the resulting transformation vector space by $N - 1$, unless the diffractive network reaches its capacity for a given pair of input-output fields-of-view. The previous two sections (S1 and S2) also confirmed the same conclusion. In this section, we independently confirm this statement by presenting a special case of the algorithm given in Table 1 of the main text.
Let us assume a diffractive network for which the dimensionality of the all-optical solution space is $(K-1)N - (K-2)$. This dimensionality can potentially be achieved by a network that has $K-1$ many independent diffractive layers, where each surface has $N$ trainable neurons, or through only one diffractive layer that has $(K-1)N - (K-2)$ neurons. Here, we assume that there is initially a single diffractive surface that has $N_1 = (K-1)N - (K-2)$ trainable neurons, and that a second diffractive surface with $N_2 = N$ neurons is then added to this diffractive network. Based on this assumption, we can rewrite Equations 3 and 4 of the main text as:

$A = H_3' T_2 H_2' T_1 H_1'$  (S10)

and

$a = (H_1'^T \otimes H_3')\,\hat{H}_2'\,t$  (S11)

where, in this case, $N_1 = (K-1)N - (K-2)$ and $N_2 = N$. Therefore, $a$ can be written as:

$a = \sum_{q=1}^{N_1 N_2} \beta_q w_q$  (S12)

where $w_q$ is the corresponding column vector of $(H_1'^T \otimes H_3')\hat{H}_2'$, which has rank $N^2$, and $\beta_q = t_{1,n} t_{2,m}$, where $t_{1,n}$ and $t_{2,m}$ are the trainable complex-valued transmission coefficients of the $n$th neuron of the 1st diffractive surface and the $m$th neuron of the 2nd diffractive surface, respectively, for $n \in \{1, 2, \dots, (K-1)N - (K-2)\}$ and $m \in \{1, 2, \dots, N\}$. Hence, we can show that $a$ can be written as:

$a = \sum_{p=1}^{KN-(K-1)} \alpha_p v_p$  (S13)

where $\alpha_p$ and $v_p$ are an arbitrary complex-valued coefficient and the corresponding basis vector, respectively. The algorithm that we use here is a special case of the algorithm given in the subsection of the main text titled "Coefficient and basis vector generation for an optical network formed by two diffractive surfaces", for the chunk partition in which the odd-indexed chunks have size 1 and the even-indexed chunks have size $N$. That is, after choosing one coefficient from $t_1$ and one coefficient from $t_2$ at the initialization step, we choose one coefficient from $t_2$ in the second step and one coefficient from $t_1$ in each of the following K steps. In each step, we use the previously chosen coefficients and the corresponding vectors in order to create the next basis vector. This procedure then continues until all the coefficients have been used.
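The induction step above can also be checked numerically. The sketch below (random complex stand-ins for the primed matrices; $K$ and $N$ are illustrative) confirms that the Jacobian of $(t_1, t_2) \rightarrow a$ for a first surface with $N_1 = (K-1)N - (K-2)$ neurons and a second surface with $N_2 = N$ neurons has rank $KN - (K-1)$:

```matlab
% Sketch: induction step of Section S3, with N1 = (K-1)*N-(K-2) and N2 = N.
K  = 3;  N = 3;
N1 = (K-1)*N - (K-2);                    % = 5
N2 = N;                                  % = 3
rc = @(r,c) randn(r,c) + 1j*randn(r,c);
H1 = rc(N1,N);  H2 = rc(N2,N1);  H3 = rc(N,N2);   % primed matrix stand-ins
t1 = rc(N1,1);  t2 = rc(N2,1);
J  = zeros(N^2, N1 + N2);
for n = 1:N1
    e = zeros(N1,1); e(n) = 1;
    J(:,n) = reshape(H3*diag(t2)*H2*diag(e)*H1, [], 1);      % da/dt1(n)
end
for m = 1:N2
    e = zeros(N2,1); e(m) = 1;
    J(:,N1 + m) = reshape(H3*diag(e)*H2*diag(t1)*H1, [], 1); % da/dt2(m)
end
rank(J)   % expected: N1 + N2 - 1 = K*N - (K-1) = 7 (below Ni*No = 9)
```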
We summarize these steps of the coefficient and basis vector generation algorithm in Table S3.

S4. Computation of the dimensionality (D) of the all-optical solution space for K = 1, 2 and 3
In order to calculate D for various diffractive network configurations, we used the symbolic toolbox of MATLAB to compute the rank of the diffraction-related matrices using their symbolic representations.
S4.1. 1-Layer Case (K = 1):

For K = 1, the vectorized transformation is $a = (H_1'^T \otimes H_2')\,\mathrm{vec}(T_1)$, and since $T_1$ is diagonal, only the $N_1$ columns of $H_1'^T \otimes H_2'$ that multiply the nonzero (diagonal) entries of $\mathrm{vec}(T_1)$ contribute to $a$. Therefore, we consider only those vectors in our computation. We define $H'$ as the matrix which is subject to the rank computation:

$H'[:,\, m+1] = (H_1'^T \otimes H_2')[:,\, m(N_1+1) + 1]$  (S14)

for $m \in \{0, 1, \dots, N_1 - 1\}$. Here, $(H_1'^T \otimes H_2')[:,\, m(N_1+1)+1]$ indicates the column associated with the $(m+1)$th neuron in the vectorized form. Hence, in 2D discrete space, a given $m$ corresponds to a certain neuron position with discrete indices $[x_1, y_1]$.

$H'[l, m]$ takes its values through the multiplication of the appropriate free-space impulse response functions from the associated input pixel (within $N_i$) to the $(m+1)$th neuron and from the $(m+1)$th neuron to the associated output pixel (within $N_o$). Thus, a given $l$ corresponds to a certain position at the input plane, $[x_i, y_i]$, paired with a certain position at the output plane, $[x_o, y_o]$. As a result, $H'[l, m]$ can be written as:

$H'[l, m] = h_{d_1}(x_i - x_1,\, y_i - y_1)\cdot h_{d_2}(x_o - x_1,\, y_o - y_1)$  (S15)

where $d_1 \neq d_2 \neq 0$ and $h_d(x, y)$ is the impulse response of free-space propagation, which can be written as:

$h_d(x, y) = \dfrac{d}{r^2}\left(\dfrac{1}{2\pi r} + \dfrac{1}{j\lambda}\right)\exp\left(\dfrac{j 2\pi r}{\lambda}\right)$

where $r = \sqrt{x^2 + y^2 + d^2}$ and $\lambda$ is the illumination wavelength.
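A minimal MATLAB sketch of this impulse response is given below; the vectorized helper form and its name are choices of this sketch:

```matlab
% Free-space impulse response h_d(x, y), as given above; x, y and d share
% the same length units and lambda is the illumination wavelength.
function h = h_free_space(x, y, d, lambda)
    r = sqrt(x.^2 + y.^2 + d.^2);              % distance to the point (x, y, d)
    h = (d ./ r.^2) .* (1 ./ (2*pi*r) + 1 ./ (1j*lambda)) ...
        .* exp(1j*2*pi*r ./ lambda);
end
```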
In MATLAB, we used various symbolic conversion schemes to confirm that each method ends up with the same rank. For a given $N_i = N_o = N$, $N_1$, $d_1$ and $d_2$ configuration, in the first four methods (Methods 1.a-1.d), we generated $H'$ numerically in double precision and then converted it to the corresponding symbolic matrix representation using one of the four conversion options of the sym command. In the second set of symbolic conversion schemes (Methods 2.a-2.d), in order to further increase the precision of our computation, we generated $r$ symbolically at the beginning as:

$r = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + d^2}$  (S16)

Then we generated $H'$ of Equation S15 using the symbolic $r$, which ended up with a symbolic $H'$ matrix. Note that, although the second set of methods has better accuracy in its symbolic representation, it requires more computational memory and time in generating the rank result. Therefore, in our rank computations, we used Method 1.a as the common method for all the diffractive network configurations reported in Supplementary Figures S1-S3. Besides Method 1.a, we also used at least one of the remaining seven methods for each diffractive network configuration to confirm that the resulting rank values agree with each other.

Figure S1 summarizes the resulting rank calculations for various K = 1 diffractive network configurations, all of which confirm $D = \min(N_1, N^2)$. The results where $D = N_1$, reported in Figure S1, indicate that all the columns of $H'$ are linearly independent, and therefore any subset of its columns is also linearly independent. This shows that the dimensionality of the solution space for $d_1 \neq d_2$ is a linear function of $N_1$ when $N_1 \leq N^2$, and that $N^2$ defines the upper limit of $D$ (also see Figure 2 of the main text). In Supplementary Section S5, we also show that the upper limit of the dimensionality of the all-optical solution space reduces to $N(N+1)/2$ when $d_1 = d_2$ for a single diffractive layer, K = 1.
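The sketch below mirrors this K = 1 procedure, with random complex stand-ins for $H_1'$ and $H_2'$ in place of the physically generated entries of Equation S15, and with the rational conversion flag 'r' standing in for Method 1.a:

```matlab
% Sketch: column selection of Equation S14 followed by a symbolic rank test.
Ni  = 3;  No = 3;  N1 = 5;
H1p = randn(N1,Ni) + 1j*randn(N1,Ni);   % stand-in: input FOV -> layer
H2p = randn(No,N1) + 1j*randn(No,N1);   % stand-in: layer -> output FOV
Kr  = kron(H1p.', H2p);                 % (Ni*No) x (N1^2) matrix
m   = 0:N1-1;
Hp  = Kr(:, m*(N1+1) + 1);              % keep one column per neuron (Eq. S14)
rank(sym(Hp, 'r'))                      % expected: min(N1, Ni*No) = 5
```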

S4.2. 2-Layer Case (K = 2):
For K = 2, we deal with the matrix $(H_1'^T \otimes H_3')\hat{H}_2'$ of Equation 4 of the main text. We first generated a matrix $H'$ from $(H_1'^T \otimes H_3')\hat{H}_2'$ such that the columns of $(H_1'^T \otimes H_3')\hat{H}_2'$ that correspond to the zero entries of $t$ are discarded. We then converted $H'$ into a symbolic matrix and applied the algorithm presented in Table 1 of the main text to the columns of $H'$. Here, the $m$th column of $H'$ is the vector that multiplies the coefficient $t_{1,n} t_{2,k}$ of $t$ of Equation 4 of the main text for a certain $(n, k)$ pair, i.e., there is a one-to-one relationship between a given $m$ and the associated $(n, k)$ pair. Therefore, a given $m$ indicates a certain neuron position in the first diffractive layer, $[x_1, y_1]$, paired with a certain neuron position in the second diffractive layer, $[x_2, y_2]$. Similar to the K = 1 case, the $l$th row of $H'$ corresponds to a certain set of input and output pixels as part of $N_i$ and $N_o$, respectively, and $H'[l, m]$ can be written as:

$H'[l, m] = h_{d_1}(x_i - x_1,\, y_i - y_1)\cdot h_{d_3}(x_o - x_2,\, y_o - y_2)\cdot h_{d_2}(x_1 - x_2,\, y_1 - y_2)$  (S17)

After generating $H'$ based on Equation S17, we converted it into its symbolic matrix representation, as described earlier for the 1-layer case, K = 1. We then applied the algorithm presented in Table 1 of the main text to generate the basis vectors and their coefficients. Note that, for each diffractive network configuration that we selected, we independently ran the same algorithm three times, with different random initializations, random selections of the neurons and random generations of the complex-valued transmission coefficients. In all of the rank results reported in Figure S2, these repeated simulations agreed with each other and gave the same rank, confirming $D = \min(N_1 + N_2 - 1, N^2)$. Also note that, unlike the $d_1 = d_2$ case for K = 1 (Section S5), different combinations of the $d_1$, $d_2$ and $d_3$ values for K = 2 do not change the results or the upper bound of D, as also confirmed in Figure S2.
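A self-contained numerical counterpart of this analysis is sketched below; the geometry (wavelength, pitch, distances, a 2x2 input/output field-of-view and 3x3 layers) is assumed purely for illustration, and h_free_space is the impulse-response sketch from Section S4.1. Here, D is read off as the rank of the Jacobian of $(t_1, t_2) \rightarrow a$, which also exposes the $N_i N_o$ ceiling:

```matlab
% Sketch: build H' of Equation S17 for a small assumed geometry and check
% D = min(N1 + N2 - 1, Ni*No) via the Jacobian rank at a random (t1, t2).
lambda = 1;  dx = 0.5*lambda;
d1 = 10*lambda;  d2 = 13*lambda;  d3 = 17*lambda;
pts = @(s) dx*((0:s-1) - (s-1)/2);                 % centered 1D coordinates
[xi, yi] = meshgrid(pts(2));  xi = xi(:);  yi = yi(:);   % 2x2 input FOV
[x1, y1] = meshgrid(pts(3));  x1 = x1(:);  y1 = y1(:);   % 3x3 first layer
x2 = x1;  y2 = y1;  xo = xi;  yo = yi;                   % same grids reused
Ni = numel(xi);  No = numel(xo);  N1 = numel(x1);  N2 = numel(x2);
Hp = zeros(Ni*No, N1*N2);                  % column (k-1)*N1+n <-> pair (n,k)
for q = 1:No
  for p = 1:Ni
    for k = 1:N2
      for n = 1:N1
        Hp((q-1)*Ni+p, (k-1)*N1+n) = ...
            h_free_space(xi(p)-x1(n), yi(p)-y1(n), d1, lambda) * ...
            h_free_space(x1(n)-x2(k), y1(n)-y2(k), d2, lambda) * ...
            h_free_space(xo(q)-x2(k), yo(q)-y2(k), d3, lambda);
      end
    end
  end
end
t1 = randn(N1,1) + 1j*randn(N1,1);
t2 = randn(N2,1) + 1j*randn(N2,1);
J  = [Hp*kron(t2, eye(N1)), Hp*kron(eye(N2), t1)];  % [da/dt1, da/dt2]
rank(J)   % expected: min(N1+N2-1, Ni*No) = min(17, 16) = 16 (cf. Figure S2)
```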

S4.3. 3-Layer Case (K = 3):
For the K = 3 case, we start with $(H_1^T \otimes H_4)\hat{H}$ of Equation S7. Then, we generate the matrix $H'$ by discarding the columns of $(H_1^T \otimes H_4)\hat{H}$ that correspond to the zero entries of $t$ of Equation S7. Here, the $m$th column of $H'$ is the vector that multiplies the coefficient $t_{1,n} t_{2,k} t_{3,p}$ for a certain $(n, k, p)$ triplet. Hence, there is a one-to-one relationship between a given $m$ and the pixel/neuron locations on the first, second and third diffractive layers, which are represented by $[x_1, y_1]$, $[x_2, y_2]$ and $[x_3, y_3]$, respectively. Similar to the K = 1 and K = 2 cases discussed in the earlier sections, a given row, $l$, corresponds to a certain set of input (from $N_i$) and output (from $N_o$) pixels, $[x_i, y_i]$ and $[x_o, y_o]$, respectively. Accordingly, $H'[l, m]$ can be written as:

$H'[l, m] = h_{d_1}(x_i - x_1,\, y_i - y_1)\cdot h_{d_4}(x_o - x_3,\, y_o - y_3)\cdot h_{d_2}(x_1 - x_2,\, y_1 - y_2)\cdot h_{d_3}(x_2 - x_3,\, y_2 - y_3)$  (S18)

We then applied a coefficient and basis vector generation algorithm similar to that of Table 1 of the main text, where we randomly select the diffractive layer and the neuron in each step of the algorithm to obtain the resulting coefficients and basis vectors. We then converted the resulting vectors into their symbolic representations, as discussed earlier for the K = 1 case, and computed the rank of the resulting symbolic matrix. For K = 3, the selection order of the 1st, 2nd and 3rd diffractive layers in consecutive steps, as well as the location/value of the chosen neuron at each step, may affect the computed rank. In particular, when $N_1 + N_2 + N_3$ is close to $N^2$, the probability of repeatedly achieving the upper bound of the dimensionality of the solution space, i.e., $\min(N_1 + N_2 + N_3 - 2, N^2)$, using random orders of selection decreases. On the other hand, when $N_1 + N_2 + N_3$ is considerably larger than $N^2$, the probability of repeatedly achieving this upper bound using random orders of selection significantly increases, due to the additional degrees of freedom in the diffractive system. In Figure S3, we present the computed ranks for different K = 3 diffractive network configurations; for each of the configurations considered in our simulations, we obtained at least one random selection of the diffractive layers and neurons that attained full rank, numerically confirming $D = \min(N_1 + N_2 + N_3 - 2, N^2)$.
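The K = 2 sketch above extends directly to K = 3: with the columns of $H'$ indexed by $(n, k, p)$ as $((p-1)N_2 + k - 1)N_1 + n$, the Jacobian columns are Kronecker-structured contractions of $H'$. The fragment below assumes such an Hp (of size $N_i N_o \times N_1 N_2 N_3$) has already been built from Equation S18, with t1, t2 and t3 as random complex transmittance vectors:

```matlab
% Sketch: Jacobian of (t1, t2, t3) -> a = Hp*kron(t3, kron(t2, t1)) for K = 3,
% assuming Hp (size Ni*No x N1*N2*N3) was generated from Equation S18.
t1 = randn(N1,1) + 1j*randn(N1,1);
t2 = randn(N2,1) + 1j*randn(N2,1);
t3 = randn(N3,1) + 1j*randn(N3,1);
J  = [Hp*kron(t3, kron(t2, eye(N1))), ...   % da/dt1
      Hp*kron(t3, kron(eye(N2), t1)), ...   % da/dt2
      Hp*kron(eye(N3), kron(t2, t1))];      % da/dt3
rank(J)   % expected: min(N1 + N2 + N3 - 2, Ni*No), cf. Figure S3
```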

S5. The Upper Bound of the Dimensionality (D) of the Solution Space Reduces to $N(N+1)/2$ for a Single Diffractive Layer (K = 1) when $d_1 = d_2$
For K = 1 and the special case of $d_1 = d_2 = d$, we can rewrite $H'[l, m]$ given by Equation S15 as:

$H''[l, m] = h_d(x_i - x_1,\, y_i - y_1)\cdot h_d(x_o - x_1,\, y_o - y_1)$  (S19)

To quantify the reduction in rank due to $d_1 = d_2$, among the $N^2$ entries of $H''[:, m]$, let us first consider the cases where $(x_i, y_i) \neq (x_o, y_o)$. For a given neuron $m$, assuming that $(x_i, y_i) \neq (x_o, y_o)$, the number of different entries that can be produced by Equation S19 becomes $\binom{N}{2}$, where $\binom{\cdot}{\cdot}$ indicates the combination operation and $N = N_i = N_o$. Stated differently, since $h_{d_1} = h_{d_2}$, the order of the selections from $(x_i, y_i)$ and $(x_o, y_o)$ does not matter, making the selection defined by a combination operation, i.e., $\binom{N}{2}$. In addition to these combinatorial entries, there are $N$ additional entries that represent the cases where $(x_i, y_i) = (x_o, y_o)$. Therefore, the total number of unique entries in a column, $H''[:, m]$, becomes:

$\binom{N}{2} + N = (N^2 + N)/2$  (S20)

This analysis proves that, for K = 1, the upper limit of the dimensionality (D) of the all-optical solution space for $d_1 = d_2$ reduces from $N^2$ to $(N^2 + N)/2$, due to the fact that $h_{d_1} = h_{d_2} = h_d$ in Equation S19.
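This reduction can be observed numerically. The sketch below assumes an illustrative geometry, a 2x2 input/output field-of-view ($N = 4$) and a 5x5 layer ($N_1 = 25$), chosen so that the upper bound rather than $N_1$ is the limiting factor; h_free_space is the impulse-response sketch from Section S4.1. It computes the rank of $H'$ both for $d_1 \neq d_2$ and for $d_1 = d_2$:

```matlab
% Sketch: rank of the K = 1 matrix H' (Equation S15), d1 ~= d2 vs. d1 = d2.
lambda = 1;  dx = 0.5*lambda;
pts = @(s) dx*((0:s-1) - (s-1)/2);
[xi, yi] = meshgrid(pts(2));  xi = xi(:);  yi = yi(:);   % N = 4 FOV pixels
[x1, y1] = meshgrid(pts(5));  x1 = x1(:);  y1 = y1(:);   % N1 = 25 neurons
xo = xi;  yo = yi;  N = numel(xi);  N1 = numel(x1);
for ds = {[10*lambda, 17*lambda], [10*lambda, 10*lambda]}   % d1 ~= d2, d1 = d2
  d  = ds{1};
  Hp = zeros(N^2, N1);
  for q = 1:N
    for p = 1:N
      for m = 1:N1
        Hp((q-1)*N+p, m) = h_free_space(xi(p)-x1(m), yi(p)-y1(m), d(1), lambda) ...
                         * h_free_space(xo(q)-x1(m), yo(q)-y1(m), d(2), lambda);
      end
    end
  end
  rank(Hp)   % expected: 16 = min(N1, N^2) when d1 ~= d2,
end          %           10 = N*(N+1)/2   when d1 = d2
```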
Note that, when $d_1 \neq d_2$, we have $h_{d_1} \neq h_{d_2}$, which directly implies that the combination operation in Equation S20 must be replaced with the permutation operation, $P(\cdot\,,\cdot)$, since the order of the selections from $(x_i, y_i)$ and $(x_o, y_o)$ matters (see Equation S15). Therefore, when $d_1 \neq d_2$, Equation S20 is replaced with:

$P(N, 2) + N = N^2$  (S21)

which confirms our analyses in the main text, reported in the subsection "Analysis of a single diffractive surface", as well as the results reported in Figure S1.

Figure S1: Computation of the dimensionality (D) of the all-optical solution space for K = 1 diffractive surface under various network configurations. The rank values are obtained using the symbolic toolbox of MATLAB from the $H'$ matrix, using Equation S15. The calculated rank values in each table obey the rule $D = \min(N_1, N^2)$. The results where $D = N_1$ indicate that all the columns of $H'$ are linearly independent, and therefore any subset of its columns is also linearly independent. Therefore, the dimensionality of the solution space for $d_1 \neq d_2$ is a linear function of $N_1$ when $N_1 \leq N^2$, and $N^2$ defines the upper limit of $D$ (see Figure 2 of the main text). In Supplementary Section S5, we also show that the upper limit of the dimensionality of the all-optical solution space reduces to $N(N+1)/2$ when $d_1 = d_2$ for a single diffractive layer, K = 1.

Figure S3: Computation of the dimensionality (D) of the all-optical solution space for K=3
diffractive surfaces under various network configurations. The rank values are obtained using the symbolic toolbox of MATLAB from the $H'$ matrix, based on Equation S18 and the algorithm discussed in Section S2 (also see Supplementary Section S4.3 for details). The presented results indicate that it is possible to obtain the maximum dimensionality of the solution space, numerically confirming that $D = \min(N_1 + N_2 + N_3 - 2, N^2)$.

Figure S4: Randomly selected samples from the 50K training set of the original CIFAR-10 dataset. The images are converted from RGB to gray-scale by selecting the Y channel of the corresponding YCrCb representations. There are in total 600 samples (60 per data class) depicted here; every 3 rows contain the samples of one data class, resulting in a total of 30 rows.

Figure S5: Randomly selected samples from the 10K testing set of the original CIFAR-10 dataset. The images are converted from RGB to gray-scale by selecting the Y channel of the corresponding YCrCb representations. There are in total 600 samples (60 per data class) depicted here; every 3 rows contain the samples of one data class, resulting in a total of 30 rows.