Universal linear intensity transformations using spatially incoherent diffractive processors

Under spatially coherent light, a diffractive optical network composed of structured surfaces can be designed to perform any arbitrary complex-valued linear transformation between its input and output fields-of-view (FOVs) if the total number (N) of optimizable phase-only diffractive features is ≥~2NiNo, where Ni and No refer to the number of useful pixels at the input and the output FOVs, respectively. Here we report the design of a spatially incoherent diffractive optical processor that can approximate any arbitrary linear transformation in time-averaged intensity between its input and output FOVs. Under spatially incoherent monochromatic light, the spatially varying intensity point spread function (H) of a diffractive network, corresponding to a given, arbitrarily-selected linear intensity transformation, can be written as H(m, n; m′, n′) = |h(m, n; m′, n′)|², where h is the spatially coherent point spread function of the same diffractive network, and (m, n) and (m′, n′) define the coordinates of the output and input FOVs, respectively. Using numerical simulations and deep learning, supervised through examples of input-output profiles, we demonstrate that a spatially incoherent diffractive network can be trained to all-optically perform any arbitrary linear intensity transformation between its input and output if N ≥ ~2NiNo. We also report the design of spatially incoherent diffractive networks for linear processing of intensity information at multiple illumination wavelengths, operating simultaneously. Finally, we numerically demonstrate a diffractive network design that performs all-optical classification of handwritten digits under spatially incoherent illumination, achieving a test accuracy of >95%. Spatially incoherent diffractive networks will be broadly useful for designing all-optical visual processors that can work under natural light.

Information processing with a diffractive network involves local modulation of the amplitude and/or the phase of the incident optical wave by structured surfaces containing diffractive neurons/features, each with a lateral size of ~λ/2, where λ is the wavelength of the spatially-coherent illumination light. The entire propagation of a spatially-coherent wave from the input plane to the output FOV comprises such optical modulations by K spatially-optimized diffractive surfaces, which in total contain N independent diffractive features (for example, evenly distributed over the K diffractive surfaces). These N diffractive features represent the complex-valued transmission coefficients, forming the independent degrees of freedom of the diffractive processor, which can be optimized to all-optically execute different tasks 22,26-30,35,36. It was shown that a spatially-coherent diffractive optical network could be trained to perform any arbitrary complex-valued linear transformation between its input and output FOVs if N ≥ NiNo, where Ni and No refer to the number of useful (diffraction-limited) pixels at the input and the output FOVs, respectively 21. For a phase-only diffractive network, where the transmission coefficients of the diffractive features of each structured surface only modulate the phase of the light, the requirement for universal linear transformations increases to N ≥ 2NiNo due to the reduced number of degrees of freedom that can be optimized independently.
For a given complex-valued linear transformation that a coherent diffractive network is designed to approximate, any arbitrary point on the input plane defined by (m′, n′) results in a unique complex-valued coherent point spread function (h) at the output FOV defined by (m, n). This 4-dimensional complex-valued function, h(m, n; m′, n′), that maps the input FOV onto the output FOV represents a spatially-varying coherent point spread function. Stated differently, unlike traditional spatially-invariant imaging systems, a coherent diffractive optical network provides a framework to approximate any arbitrary h(m, n; m′, n′) that corresponds to an arbitrarily-selected complex-valued linear transformation between its input and output FOVs. It was also shown that different/independent complex-valued linear transformations could be multiplexed in a single spatially-coherent diffractive processor by utilizing polarization and wavelength diversity 37,38.
All of these earlier studies on universal linear transformations implemented in free space through diffractive processors were based on spatially-coherent illumination. In this paper, we report the first demonstration of universal linear transformations in optical intensity performed under spatially-incoherent monochromatic illumination of an input FOV. We show that, under spatially-incoherent light, a diffractive optical processor can perform any arbitrary linear transformation of time-averaged intensities between its input and output FOVs. Our numerical analyses revealed that phase-only diffractive optical processors with a shallow architecture (for example, having a single trainable diffractive surface) are unable to accurately approximate an arbitrary intensity transformation, irrespective of the total number (N) of diffractive features available for optimization; on the contrary, phase-only diffractive optical processors with deeper architectures (one diffractive layer following another) can perform an arbitrary linear intensity transformation using spatially-incoherent illumination with a negligible error when N ≥ 2NiNo. These analyses and conclusions are important for all-optical information processing and visual computing systems that use spatially-incoherent light, such as natural scenes; this framework can also find applications in computational microscopy and incoherent imaging through point spread function engineering, among others.

Theoretical analysis
Spatially-coherent monochromatic diffractive optical networks can be characterized by a 4-dimensional complex-valued coherent impulse response function (i.e., the point spread function) that is spatially-varying, connecting the input and output FOVs: h(m, n; m′, n′). Stated differently, each arbitrarily-selected complex-valued linear transformation that is desired between the pixels of an input FOV and an output FOV results in a spatially-varying impulse response function h(m, n; m′, n′), where (m′, n′) and (m, n) define the input and output FOVs, respectively. Based on this definition, the complex-valued output field o_c(x, y) of a spatially-coherent diffractive processor is related to the complex-valued input field i_c(x′, y′) by:

o_c(x, y) = ∬ h_c(x, y; x′, y′) i_c(x′, y′) dx′ dy′. (1)

The subscript c indicates that the quantities are functions of the continuous spatial variables x, y, x′, y′, representing the transverse coordinates on the output and input planes. If these optical fields are sampled at an interval (δ) sufficiently small to preserve the spatial variations, satisfying the Nyquist criterion 39, one can write:

o(m, n) = Σ_{m′,n′} h(m, n; m′, n′) i(m′, n′), (2)

where m, n, m′, n′ refer to discrete indices such that o(m, n) = o_c(mδ, nδ) and i(m′, n′) = i_c(m′δ, n′δ). The instantaneous output intensity can be written as:

|o(m, n)|² = Σ_{m′,n′} Σ_{m′′,n′′} h(m, n; m′, n′) h*(m, n; m′′, n′′) √(I(m′, n′) I(m′′, n′′)) exp(jΔθ), (3)

where θ(·) is the phase function of the input field i, i.e., i = |i| exp(jθ) and I = |i|², and h* denotes the complex conjugate of h. The time-averaged output intensity can be written as:

O(m, n) = 〈|o(m, n)|²〉 = Σ_{m′,n′} Σ_{m′′,n′′} h(m, n; m′, n′) h*(m, n; m′′, n′′) √(I(m′, n′) I(m′′, n′′)) 〈exp(jΔθ)〉, (4)

where 〈•〉 denotes the time-average operation and Δθ = θ(m′, n′) − θ(m′′, n′′). Since the illumination light is spatially-incoherent, the phases at different spatial points of the input vary randomly over time and are independent of each other 40. Stated differently, for stationary objects/scenes that are uniformly illuminated with spatially-incoherent light, Δθ varies randomly between 0 and 2π over time, yielding 〈exp(jΔθ)〉 = 0 for (m′, n′) ≠ (m′′, n′′). As a result of this, under spatially-incoherent illumination, Eq. (4) can be written as:

O(m, n) = Σ_{m′,n′} H(m, n; m′, n′) I(m′, n′), (5)

where I = 〈|i|²〉 is the time-averaged input intensity and H(m, n; m′, n′) = |h(m, n; m′, n′)|² is the intensity impulse response of the diffractive optical processor under spatially-incoherent illumination. From now on, unless otherwise stated, we use the term optical 'intensity' to imply time-averaged intensity functions. Similarly, whenever all-optical linear transformation of intensity is mentioned, spatially-incoherent monochromatic illumination is implied unless stated otherwise.
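The relation above can be checked numerically: for any linear coherent mapping h, averaging the output intensities over many independent random input phase realizations converges to the product of H = |h|² (elementwise) with the input intensity. A minimal sketch, with all array sizes and values chosen for illustration rather than taken from the paper:

```python
import numpy as np

# Numerical check: for a linear coherent system h, averaging output
# intensities over independent random input phases converges to
# H @ I_in with H = |h|^2 elementwise. Sizes and values are illustrative.
rng = np.random.default_rng(0)
Ni, No, T = 16, 16, 20000            # input/output pixels, number of phase draws

h = rng.normal(size=(No, Ni)) + 1j * rng.normal(size=(No, Ni))  # coherent PSF matrix
I_in = rng.uniform(size=Ni)                                     # input intensity

acc = np.zeros(No)
for _ in range(T):
    theta = rng.uniform(0.0, 2 * np.pi, size=Ni)   # independent random phases
    field_in = np.sqrt(I_in) * np.exp(1j * theta)  # input field sqrt(I) * e^{j theta}
    acc += np.abs(h @ field_in) ** 2               # instantaneous output intensity
I_out = acc / T                                    # time-averaged output intensity

H = np.abs(h) ** 2                                 # intensity impulse response
assert np.allclose(I_out, H @ I_in, rtol=0.1)      # agreement improves as T grows
```

The cross terms of Eq. (3) average out because the random phases at different input pixels are independent, leaving only the diagonal terms that build H.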
We should emphasize that, while H(m, n; m′, n′) = |h(m, n; m′, n′)|² holds exactly, the time averages in Eq. (5) must be approximated numerically. For the numerical forward model corresponding to each input object, as will be detailed in the next section, we used a large number of random phase distributions at the input plane to approximate O(m, n) = 〈|o(m, n)|²〉 under spatially-incoherent illumination.

Numerical analysis
In this subsection, we numerically explore the design of diffractive optical processors that perform an arbitrary linear intensity transformation between the input and the output FOVs under spatially-incoherent illumination. We assume, as shown in Fig. 1a, N independent diffractive features (phase-only elements) that are distributed over K diffractive surfaces, each with N/K diffractive features, between the input and output planes. Following from Eq. (5), if we rearrange the pixel intensities of I(m′, n′) and O(m, n) as column vectors x and y, then we can write y = A′x, where A′ represents the linear intensity transformation performed by the diffractive optical network under spatially-incoherent illumination. The elements of A′ correspond to the elements of the intensity impulse response H(m, n; m′, n′); see Eq. (5). Note that all the elements of A′ are real and nonnegative since it represents a linear intensity transformation with H(m, n; m′, n′) = |h(m, n; m′, n′)|². Hence, in the context of arbitrary linear transformations in intensity, only real transformation matrices with nonnegative elements are considered.
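In this vectorized form, the incoherent forward mapping is simply a real, nonnegative matrix acting on vectorized pixel intensities. A small sketch of the bookkeeping (names and sizes are ours, used only for illustration):

```python
import numpy as np

# Vectorized form of the incoherent mapping: 8x8 intensity patterns are
# flattened to length-64 vectors so the transformation becomes y = A' x.
rng = np.random.default_rng(1)
A_prime = rng.uniform(size=(64, 64))        # real, nonnegative elements of H
x_img = rng.uniform(size=(8, 8))            # input intensity pattern (8x8 pixels)
x = x_img.reshape(-1)                       # column vector of input intensities
y = A_prime @ x                             # modeled all-optical intensity output
assert y.shape == (64,) and np.all(y >= 0)  # intensities stay nonnegative
```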
For our target linear transformation to be approximated by the spatially-incoherent diffractive processor, we initially selected an arbitrary matrix A, as shown in Fig. 1b. In the following numerical analysis, we optimize the N diffractive features of a phase-only diffractive processor so that A′ ≈ A under spatially-incoherent illumination. The size of A is chosen as No × Ni = 64 × 64, i.e., the number of pixels at both the input (Ni) and the output (No) FOVs is 8 × 8, arranged in a square grid. Each element of the matrix A is randomly sampled from a uniform probability distribution between 0 and 1, i.e., A[u, v] ~ U(0, 1), where A[u, v] is the element at the u-th row and v-th column of A, u = 1, …, No and v = 1, …, Ni.
For the deep learning-based optimization of the design of a phase-only diffractive processor to achieve A′ ≈ A, we followed two different data-driven supervised learning approaches: (1) the indirect approach and (2) the direct approach. In the indirect approach, instead of directly training the diffractive network to perform the linear intensity transformation A, we trained the network, under spatially-coherent illumination, to perform a complex-valued linear transformation Ã between the input and output FOVs such that |Ã[u, v]| = √(A[u, v]), which results in the intensity linear transformation A′[u, v] = |Ã[u, v]|² = A[u, v] under spatially-incoherent illumination. For the purpose of the training, we defined the phase of Ã[u, v] to be zero, i.e., Ã[u, v] = √(A[u, v]) exp(j0); however, any other phase distribution could also be used since the design space is not unique. Stated differently, in this indirect approach, we design a diffractive network that can achieve a spatially-coherent impulse response h(m, n; m′, n′), which ensures that the same design has a spatially-incoherent impulse response H(m, n; m′, n′) = |h(m, n; m′, n′)|² such that A′ ≈ A is satisfied under spatially-incoherent illumination. To achieve this goal, we used the relationship õ = Ã ĩ to generate a large set of input-target complex-valued optical field pairs (ĩ, õ), and used deep learning to optimize the phase values of the diffractive features by minimizing the mean squared error (MSE) loss between the target complex field õ and the complex field õ′ obtained by coherently propagating ĩ through the diffractive network (see the Methods section). In other words, the spatially-coherent design of a diffractive network is used here as a proxy for the design of a spatially-incoherent diffractive network that can achieve any arbitrary intensity linear transformation between its input and output FOVs.
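The construction of the indirect-design targets can be sketched in a few lines; the zero-phase choice below follows the text, while the sizes and random seed are illustrative:

```python
import numpy as np

# Indirect-design targets: given the nonnegative intensity transform A,
# the coherent training target is A_tilde with |A_tilde| = sqrt(A) and an
# arbitrary phase (zero here, as one valid choice).
rng = np.random.default_rng(2)
A = rng.uniform(size=(64, 64))
A_tilde = np.sqrt(A) * np.exp(1j * 0.0)

# The induced incoherent transform is the elementwise |A_tilde|^2, i.e., A:
assert np.allclose(np.abs(A_tilde) ** 2, A)

# One training pair: a random complex input field and its target output field.
i_field = rng.normal(size=64) + 1j * rng.normal(size=64)
o_target = A_tilde @ i_field
```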
In the second approach (termed the direct approach), we trained the diffractive network to perform the desired intensity linear transformation A between the input and the output FOVs by directly using the relationship y = Ax to generate a large set of input-target intensity pairs (x, y). Using this large training set of input/output intensity patterns, we optimized the transmission phase values of the diffractive layers using deep learning, by minimizing the MSE loss between the output pixel intensities of the diffractive processor y′ and the ground-truth intensities y (see the Methods section). During the training phase, the output intensity of the diffractive processor was simulated through the incoherent propagation of the input intensity patterns, x or I(m′, n′). To numerically simulate the spatially-incoherent propagation of I(m′, n′), we assumed the input optical field to be √I exp(jθ), where θ is a random 2D phase distribution, i.e., θ(m′, n′) ~ U(0, 2π) for each (m′, n′). This input field with the random phase distribution θ was coherently propagated through the diffractive surfaces to the output plane using the angular spectrum approach 22. We repeated this coherent wave propagation M times for every I, each time with a different random phase distribution θ(m′, n′), and averaged the resulting M output intensities. As M → ∞, the average intensity approaches the theoretical time-averaged output intensity for spatially-incoherent illumination, i.e., O(m, n) = 〈|o(m, n)|²〉. Due to the limited availability of computational resources, for the direct training (the second design approach) of the spatially-incoherent diffractive optical processors, we used M = M_train = 1000.
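The Monte-Carlo incoherent forward model described above can be written as a small function; the coherent propagator below is a stand-in matrix (not the paper's diffractive model), and the sizes are illustrative:

```python
import numpy as np

# Monte-Carlo incoherent forward model: propagate sqrt(I) * exp(j*theta)
# for M random phase maps and average the output intensities.
def incoherent_forward(coherent_prop, I_in, M, rng):
    acc = 0.0
    for _ in range(M):
        theta = rng.uniform(0.0, 2 * np.pi, size=I_in.shape)
        acc = acc + np.abs(coherent_prop(np.sqrt(I_in) * np.exp(1j * theta))) ** 2
    return acc / M                   # approaches the true average as M -> infinity

rng = np.random.default_rng(3)
h = rng.normal(size=(64, 64)) + 1j * rng.normal(size=(64, 64))  # stand-in propagator
I_in = rng.uniform(size=64)
I_out = incoherent_forward(lambda u: h @ u, I_in, M=2000, rng=rng)
assert np.allclose(I_out, (np.abs(h) ** 2) @ I_in, rtol=0.2)
```

The Monte-Carlo error of this estimator shrinks roughly as 1/√M, which is why a larger M was used for testing than for training.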
The diffractive models reported in Figs. 1-5 and 10 were trained using the indirect approach, while the ones in Figs. 6-9 were trained using the direct approach. All the diffractive networks reported in this work, after their training using either the direct or the indirect design approach, were evaluated and blindly tested through the incoherent propagation of input intensities with M_test = 20,000. Since the testing is computationally less cumbersome than the training, we used M_test = 20,000 ≫ M_train.
After the training phase, we tested the resulting diffractive processor designs using 20,000 test intensity patterns that were never used during training; the size of this testing intensity set (20,000) should not be confused with M_test = 20,000, since for each input intensity test pattern of this set, we used M_test = 20,000 random 2D phase patterns to compute the corresponding spatially-incoherent output intensity. In Fig. 1c, the approximation errors of eight different phase-only diffractive processors trained using the indirect approach, each with K = 5 diffractive layers, are reported as a function of N. The mean error (Fig. 1c) for each diffractive design was calculated at the output intensity patterns y′ with respect to the ground truth y = Ax, by averaging over the 20,000 test intensity patterns. Figure 1c reveals that the approximation error of the spatially-incoherent diffractive processors reaches a minimum level as N/(2NiNo) approaches 1 and stays at the same level for N ≥ 2NiNo.
To understand the impact of M_test on these approximation error calculations, we took the diffractive processor design # 1E shown in Fig. 1c (i.e., K = 5, N ≈ 2.1 × 2NiNo) and used different M_test values in the blind testing phase for evaluating the average test error on the same intensity test set composed of 20,000 patterns. As shown in Fig. 1d, the computed error values decrease as M_test increases, as expected. On the right y-axis of the same Figure 1d, we also show, as a function of M_test, the expectation value of the magnitude of the residual term (1/M_test) Σ_{t=1}^{M_test} exp(jΔθ_t), which quantifies how closely the finite ensemble average approximates 〈exp(jΔθ)〉 = 0. This comparison also highlights the fact that our choice of using M_test = 20,000 random 2D phase patterns to compute the spatially-incoherent output intensity patterns in the blind testing phase is an accurate approximation.
Next, we show in Fig. 2 the scaled intensity linear transformations, Â, that were approximated by five of the trained diffractive networks of Fig. 1c. Â is related to the physical transformation A′ by a scalar factor σ (see the 'Evaluation' subsection in the 'Methods' section), which compensates for diffraction efficiency-related optical losses. We also show the error matrix with respect to the target A, i.e., E = |A − Â|, and report the average of the error matrix elements in the table on the right; here |•| denotes the elementwise absolute value. As N increases, the diffractive networks' resulting matrices resemble the ground-truth target better and the approximation error decreases steadily; however, the improvement is most prominent as N approaches 2NiNo and stagnates beyond N ≈ 2NiNo.
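One standard way to fit such a loss-compensating scalar is least squares; the paper's exact definition of its scale factor is given in its Supplementary Information, so the sketch below is only an illustrative stand-in with made-up numbers:

```python
import numpy as np

# Least-squares fit of a scalar that compensates diffraction losses:
# sigma = <y_opt, y_gt> / <y_opt, y_opt>. Illustrative only; the paper's
# exact scale-factor definition is in its Supplementary Information.
rng = np.random.default_rng(7)
y_gt = rng.uniform(size=64)                       # ground-truth output intensity
y_opt = 0.01 * y_gt + 1e-4 * rng.normal(size=64)  # lossy, noisy optical output
sigma = (y_opt @ y_gt) / (y_opt @ y_opt)          # least-squares scale factor
assert np.allclose(sigma * y_opt, y_gt, atol=0.05)
```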
To provide visually more noticeable illustrations of the diffractive networks' all-optical intensity transformations under spatially-incoherent illumination, we used structured intensity patterns, such as the letters U, C, L, and A, as inputs to the diffractive networks (see Fig. 3). Because of the randomness of the elements of the intensity transformation matrix, the output pixel intensities also appear random (harder to compare visually against the ground truth). However, the reappearance of the letters after a numerical inversion, through the multiplication of the scaled output intensity ŷ by the inverse of the target transformation, A⁻¹, would indicate Â ≈ A and validate the correctness of the diffractive networks' approximations in a visually noticeable manner (see the 'Evaluation' subsection of the Methods section for the definition of ŷ). In the case of the diffractive network # 1A (K = 5, N = 5 × 38² ≈ 0.88 × 2NiNo), the result of such an inversion does not reveal any recognizable patterns, indicating the near-failure of the all-optical approximation of this design. However, such an inversion reveals the recognizable patterns (U, C, L, and A) as N approaches 2NiNo (design # 1B) and becomes identical to the inputs as N exceeds 2NiNo (e.g., design # 1C). These results show that for the K = 5 phase-only diffractive networks with a sufficiently large N ≥ ~2NiNo, we have Â ≈ A, indicating that these networks could faithfully approximate the target intensity linear transformation under spatially-incoherent illumination.
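The inversion-based visual check can be sketched as follows; here A is a random stand-in for the target transformation and the structured pattern is a simple placeholder for the U, C, L, A inputs:

```python
import numpy as np

# Visual-check sketch: push a structured pattern through the (assumed
# well-approximated) transform, then multiply by the inverse of the
# target A to recover it.
rng = np.random.default_rng(4)
A = rng.uniform(size=(64, 64))       # stand-in target intensity transformation
x = np.zeros(64)
x[::9] = 1.0                         # a simple structured test pattern
y_hat = A @ x                        # stands in for the scaled optical output
x_rec = np.linalg.inv(A) @ y_hat     # numerical inversion reveals the pattern
assert np.allclose(x_rec, x, atol=1e-6)
```

If the optical approximation were poor, y_hat would deviate from A x and the inversion would no longer reproduce the input pattern, which is exactly what distinguishes design # 1A from designs # 1B and # 1C.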
For computational imaging and sensing applications, such as microscopy, it is interesting to explore patterns of closely spaced lines and points. Motivated by this, we repeated the same procedures outlined in Fig. 3 for various intensity patterns consisting of closely separated line pairs and sets of points; the results are summarized in Fig. 4. The same conclusion drawn previously in Fig. 3 holds: for N ≥ ~2NiNo, we have Â ≈ A.
We also investigated the dependence of the all-optical approximation of intensity linear transformations on the number of diffractive layers K; see Fig. 5. The results of this analysis reveal that even with N ≈ 2 × 2NiNo, the K = 1 and K = 2 diffractive designs failed to approximate the target linear transformation despite having a large N, whereas the designs with K > 2 successfully approximated the target transformation under spatially-incoherent illumination. This confirms that the depth of the diffractive network design is a key architectural factor in the computational capacity of diffractive processors to perform arbitrary linear transformations 21,22,37,38.
Next, we present the blind testing results of the diffractive processors that were trained using the second design approach (i.e., the direct approach) to perform the same arbitrary intensity linear transformation as considered so far. In Fig. 6a, the approximation errors of eight different phase-only diffractive processors trained using the direct approach, each with K = 5 diffractive layers, are reported as a function of N. The mean error was calculated over the same 20,000 test intensity patterns used in Fig. 1c; for each test intensity pattern, the incoherent output intensity y′ was calculated using M_test = 20,000 (same as before). In these alternative diffractive designs, the approximation error of the diffractive processors reaches a minimum level as N/(2NiNo) approaches 1 and stays at the same level for N ≥ 2NiNo, the same conclusion that we reached for the indirect designs reported earlier.
However, compared with the previous designs that used the indirect approach, the minimum error level obtained using the direct approach is approximately three times higher. This can be attributed to the use of a relatively small M_train = 1000 during the training; these designs could be further improved by increasing M_train, at the cost of a longer training effort with more computational resources.
In Fig. 7, we show the scaled linear intensity transformations, Â, that were approximated by five of the trained diffractive networks of Fig. 6a. For each case, we also show the error matrix with respect to the target A, i.e., E = |A − Â|, and report the average of the error matrix elements in the table on the right. As N increases, the mean intensity transformation error decreases, except for design # 2B, which we believe is an outlier resulting from poor convergence. The relatively large error of design # 2B is due to the diffraction efficiency imbalance among the individual input pixels, as evident from the uneven magnitudes across the columns of Â. Similarly, the other designs of the direct approach reveal uneven magnitudes across the columns of Â, indicating some diffraction efficiency imbalance among the individual input pixels, albeit not as severe as in design # 2B. Despite such imperfections, these diffractive networks designed using the direct approach effectively learned the target intensity transformation, as evident from Figs. 8 and 9. Figure 8 reveals that, for all the designs, the multiplication of the output intensity patterns ŷ by the inverse of the target transformation, A⁻¹, brings back the patterns U, C, L, and A. Although the reconstruction quality is better for N ≈ 2NiNo and remains similar beyond N > 2NiNo, the improvement is not as sharp as it was with the indirect approach (see Fig. 8 vs. Fig. 3 and Fig. 9 vs. Fig. 4). In contrast to the diffractive networks designed using the indirect approach, here the diffractive networks with N < 2NiNo (e.g., design # 2A) succeeded in approximating the linear transformation to the extent of revealing recognizable patterns after a numerical inverse mapping. These same observations also hold for the intensity patterns that consist of closely spaced lines and points, as shown in Fig. 9.
Finally, we report in Fig. 10 the performance of a diffractive network (K = 5, N ≈ 2 × 2NiNo) trained using the indirect approach to approximate another arbitrary intensity linear transformation, defined by a non-invertible matrix. The target transformation A, the approximate all-optical transformation Â, and the error matrix E = |A − Â| are shown in Fig. 10a, revealing that the diffractive network design performed the target intensity transformation with negligible error. We also show the performance of this diffractive network design on test patterns (U, C, L, and A, as well as line pairs and points) in Fig. 10b. The all-optical outputs are practically identical to the ground-truth outputs, further confirming that we have Â ≈ A.
Another example of the all-optical approximation of an arbitrary intensity transformation (defined by a random permutation matrix) is also reported in Supplementary Figure S1.

Discussion
We demonstrated that phase-only diffractive networks under spatially-incoherent illumination can perform arbitrary linear transformations of optical intensity with a negligible error if N ≥ 2NiNo. The same conclusions would apply to complex-valued diffractive networks, where the phase and the amplitude of each diffractive feature could be independently optimized; in that case, the critical number of complex-valued diffractive features for approximating an arbitrary linear transformation of optical intensity would be halved to NiNo due to the increased degrees of freedom per diffractive layer. Because of the practical advantages of phase-only diffractive networks, without loss of generality, we limited our analyses in this work to phase-only modulation at each diffractive surface.
Our results suggest that the two different training approaches (indirect vs. direct design) converge differently. If N is comparable to or larger than 2NiNo, the indirect approach results in significantly better and faster convergence and an accurate approximation Â ≈ A; on the other hand, the direct design approach works better when N is considerably smaller than 2NiNo, even if its approximation error is larger. For example, although the designs # 2A and # 2B have higher errors than the design # 1A, the performances of the former on various test patterns are manifestly better, as compared in Figs. 3, 4, 8 and 9. These direct designs can be further improved in their approximation power by increasing M_train ≫ 1000 through a longer training phase, utilizing more computational resources.
An important advantage of the direct approach over the indirect one is that the former is compatible with data-driven design: it can be applied even if the only information available to the designer is sample data representing the target incoherent linear process, without a priori knowledge of the transformation matrix itself. By the same token, the direct approach also lends itself to data-driven optimization of incoherent diffractive processors for the all-optical linear approximation of some nonlinear processes. As a consequence, data-driven design of incoherent processors for other inference tasks, such as all-optical image classification under spatially-incoherent illumination, can be accomplished using the direct approach.
The failure of shallow diffractive networks to perform an arbitrary intensity transformation (see, e.g., the K = 1 and K = 2 designs shown in Fig. 5) indicates that shallow architectures with phase-only diffractive layers are unable to effectively balance the ballistic photons that are transmitted from the sample/input FOV over a low numerical aperture; as a result, the lower spatial frequencies of the input intensity patterns dominate the output intensity patterns of a shallow diffractive network, sacrificing the approximation accuracy. Therefore, shallow diffractive network architectures, even with large numbers of trainable diffractive features (N), fail to approximate an arbitrary intensity transformation, as shown in Fig. 5. Deeper architectures, on the other hand, utilize their trainable diffractive features more effectively by distributing them across several layers/surfaces, one following another, and mixing the propagating modes of the input FOV over a series of layers that are optimized using deep learning.
Spatially-incoherent diffractive processor designs can also be extended to temporally incoherent, broadband illumination light. In fact, the multiplexing of >100 arbitrary complex-valued linear transformations for complex optical fields was shown to be possible under spatially-coherent but broadband illumination 38. Following a similar multi-wavelength optimization process and the indirect design principles outlined earlier, one can design a diffractive network to simultaneously approximate a group of arbitrarily-selected linear intensity transformations (A_λ1, A_λ2, …, A_λB) under spatially-incoherent illumination, where each intensity transformation is assigned to a unique wavelength λ_b, b = 1, …, B. The success of such a spatially- and temporally-incoherent diffractive optical network in accurately performing all the target intensity transformations will require an increase in the number of trainable features within the diffractive volume, i.e., N ≥ B × 2NiNo would be needed for a phase-only diffractive network. Such diffractive processor designs that work under spatially- and temporally-incoherent light can be useful for a number of applications, including fluorescence and brightfield microscopy and the processing of natural scenes.

Methods
Model for the propagation of spatially-coherent light through a diffractive optical network
Propagation of spatially-coherent complex optical fields through a diffractive processor D{•} constitutes successive amplitude and/or phase modulation by the diffractive surfaces, each followed by coherent propagation through the free space separating consecutive diffractive surfaces. The diffractive features of a surface locally modulate the incident optical field u(x, y). For this paper, the trainable diffractive features are phase-only, i.e., only the phase, but not the amplitude, of the incident field is modulated by the diffractive surface. In other words, the field immediately after a surface is u(x, y) exp(jφ(x, y)), where the local phase change φ(x, y) induced by the surface is related to its height h(x, y) as:

φ(x, y) = 2π(n − 1) h(x, y) / λ. (6)

Here n is the refractive index of the diffractive surface material.
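The thin-element phase-height relation can be expressed directly in code; the wavelength and refractive index below are illustrative placeholders, not values taken from the paper:

```python
import numpy as np

# Thin-element phase-height relation: phi = 2*pi*(n-1)*h/lambda.
# Wavelength and refractive index are illustrative placeholders.
wavelength = 750e-9
n_index = 1.72

def height_to_phase(h):
    return 2 * np.pi * (n_index - 1) * h / wavelength

h_max = wavelength / (n_index - 1)   # height giving a full 2*pi modulation depth
assert np.isclose(height_to_phase(h_max), 2 * np.pi)
```

Note that h_max = λ/(n − 1) is exactly the height at which the phase modulation depth reaches 2π, consistent with the choice of h_max described in the Training details.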
Free-space propagation of an optical field between consecutive diffractive surfaces was modeled using the angular spectrum method 8, according to which the propagation of an optical field u(x, y) by a distance d can be computed as:

u(x, y; d) = ℱ⁻¹{ℱ{u(x, y)} · G(f_x, f_y; d)}, (7)

where ℱ (ℱ⁻¹) is the two-dimensional Fourier (inverse Fourier) transform and G(f_x, f_y; d) is the free-space transfer function for an axial propagation distance d:

G(f_x, f_y; d) = exp(j(2π/λ)d √(1 − (λf_x)² − (λf_y)²)) for (λf_x)² + (λf_y)² ≤ 1, and 0 otherwise, (8)

where λ is the wavelength of light.
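A minimal scalar angular-spectrum propagator following this recipe can be sketched as below; the grid size, sampling interval, and wavelength are illustrative, and evanescent components are simply zeroed out:

```python
import numpy as np

# Minimal scalar angular-spectrum propagator; evanescent components
# (where (lambda*fx)^2 + (lambda*fy)^2 > 1) are set to zero.
def angular_spectrum(u, dx, wavelength, d):
    n = u.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                   # spatial frequencies
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = (2 * np.pi / wavelength) * np.sqrt(np.maximum(arg, 0.0))
    G = np.exp(1j * kz * d) * (arg > 0)            # free-space transfer function
    return np.fft.ifft2(np.fft.fft2(u) * G)

g = np.exp(-((np.arange(64) - 32.0) ** 2) / 50.0)
u0 = np.outer(g, g).astype(complex)                # a Gaussian test field
u1 = angular_spectrum(u0, dx=1e-6, wavelength=750e-9, d=50e-6)
# |G| = 1 on the propagating components here, so total power is conserved:
assert np.isclose(np.sum(np.abs(u1) ** 2), np.sum(np.abs(u0) ** 2))
```

Because the transfer function has unit modulus over the propagating band, this propagator conserves power when no evanescent components are truncated, which makes it a convenient sanity check.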

Model for the propagation of spatially-incoherent light through a diffractive optical network
With spatially-incoherent light, the time-averaged output optical intensity O(x, y) of a diffractive network, for a given input intensity I(x, y), can be written as:

O(x, y) = 〈|D{√(I(x, y)) exp(jθ(x, y))}|²〉, (9)

where D{•} denotes the coherent propagation of the optical field through the diffractive processor as described in the preceding subsection, and 〈•〉 denotes the statistical average over all the realizations of the spatially-independent random process θ(x, y) representing the 2D phase of the input optical field, i.e., θ(x, y) ~ U(0, 2π) for all x, y 40.
As for the spatially-incoherent propagation of average intensity, it is only possible to approximate the true average (Eq. (9)) in practice, by averaging the output intensities over a finite number M of random input phase realizations. In the training phase of the direct training approach, incoherent propagation of intensities through the diffractive processors was simulated with M_train = 1000. However, in the blind testing phase, we used M_test = 20,000 while evaluating the diffractive processors once they were trained, irrespective of whether the indirect or the direct approach of training was used.

Diffractive network architecture
The heights h_k(x, y), k = 1, …, K, of the N diffractive features distributed over the K surfaces were optimized for designing the diffractive processors to perform the desired transformations.

Linear transformation matrix
In this paper, the input and the output of the diffractive networks have dimensions of 8 × 8 pixels each, so that Ni = No = 64 and the transformation matrices are of size 64 × 64.

Training details
The height h of the diffractive features at each layer was confined between zero and a maximum value h_max by using a latent variable h_latent. We chose h_max ≈ λ(n − 1)⁻¹ so that the corresponding phase modulation depth is 2π. The diffractive layers were optimized using the AdamW optimizer 41 for 50 epochs with a minibatch size of 8 and an initial learning rate of 10⁻³. The learning rate was decayed by a factor of 0.7 every five epochs. The latent variables were initialized randomly from the standard normal distribution N(0, 1). We evaluated the mean loss of the trained model on the validation set after the completion of each epoch and selected the trained model state at the end of the epoch corresponding to the lowest validation loss. These details were the same for both the indirect and the direct training approaches.
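The latent-variable height parametrization can be sketched as follows. The sigmoid squashing is our assumption (one common choice for constraining a free variable to a bounded interval); the paper does not specify its exact mapping here, and the array sizes are illustrative:

```python
import numpy as np

# Latent-variable height parametrization sketch: squashing a free latent
# variable into [0, h_max] keeps each feature height within bounds during
# optimization. The sigmoid mapping below is an assumption, not the
# paper's stated formula.
h_max = 1.0                                      # illustrative units
def latent_to_height(h_latent):
    return h_max / (1.0 + np.exp(-h_latent))     # sigmoid squashing to [0, h_max]

rng = np.random.default_rng(5)
h_latent = rng.standard_normal((5, 100, 100))    # K = 5 layers, init ~ N(0, 1)
h = latent_to_height(h_latent)
assert h.min() >= 0.0 and h.max() <= h_max
```

Optimizing the unconstrained latent variable (e.g., with AdamW) rather than the height itself avoids explicit projection steps while keeping every height physically realizable.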

Evaluation
The evaluation procedure was the same across all the trained diffractive networks, irrespective of whether the direct or the indirect approach was used to train them. To evaluate the trained diffractive networks, we generated a test set comprising 20,000 pairs of input and target intensity vectors (x, y = Ax). Note that these 20,000 test examples were generated using a random seed different from the ones used to generate the training and validation sets, to ensure that they were not represented during the training. For a given x, the corresponding input intensity pattern was incoherently propagated through the trained diffractive network (as in Eq. (9)) using M_test = 20,000 to compute the output intensity y′. The mean of the error between y′ and y over the 20,000 test examples was used to quantify the output error of the diffractive network for comparing different designs, as in Figs. 1 and 6.
To keep the connectivity between successive diffractive layers 22 the same across the trained diffractive networks with different N, the layer-to-layer separation was set as d = W, where W = √(N/K)·w is the width of each diffractive layer and w ≈ 0.53λ is the lateral size of each diffractive feature. The distances between the input FOV and layer 1 and between layer K and the output FOV were also set to d. The pixel size on both the input and output FOVs was ~2.13λ × 2.13λ, i.e., 4w × 4w.

Figures and Figure captions

Fig. 1: All-optical linear transformation of intensity performed by diffractive networks under spatially-incoherent illumination. (a) Schematic of a diffractive network formed by K = 5 diffractive surfaces that all-optically performs a linear transformation of intensity between the input and output FOVs. The N diffractive features are distributed evenly among the K = 5 surfaces. (b) An arbitrary No × Ni matrix A, representing the target intensity transformation to be performed all-optically by the diffractive network. Here Ni = 8² and No = 8² are the numbers of pixels at the input and the output FOVs of the diffractive network, respectively. (c) The expectation value of the MSE between the all-optical output intensity y′ and the ground-truth output intensity y, as a function of N for different diffractive networks trained using the indirect approach. To simulate the incoherent propagation of intensity for each test input, we used M_test = 20,000. (d) Dependence of the calculated output MSE on M_test, demonstrated for network # 1E of Fig. 1c.

Fig. 6: All-optical linear transformation of intensity under spatially-incoherent illumination, by diffractive networks trained using the direct approach. (a) The expectation value of the MSE between the all-optical output intensity y′ and the ground-truth output intensity y, as a function of N for different diffractive networks trained using the direct approach. (b) Dependence of the calculated output MSE on M_test, demonstrated for network # 2E of Fig. 6a. The right y-axis shows the expectation value of the magnitude of the residual term (1/M_test) Σ_{t=1}^{M_test} exp(jΔθ_t).
The models were implemented with Compute Unified Device Architecture (CUDA) version 11.3.1. Training and testing were done on GeForce RTX 3090 graphics processing units (GPUs) in workstations with 256 GB of random-access memory (RAM) and an Intel Core i9 central processing unit (CPU). The training time of the models varied with the training approach as well as the size of the models in terms of K and N. For example, the indirect training of the K = 5, N = 5 × 52² diffractive network model took around 5 hours, whereas with the direct approach, the training time for the K = 5, N = 5 × 52² model with M_train = 1000 was around 58 hours.
For the test patterns reported in Figs. 3, 4, 8, 9 and 10, we defined the scaled all-optical output intensity vector ŷ = σ y′ (see the Supplementary Information for details). For evaluating the intensity transformation A′ performed by the diffractive networks at the end of their training, we used Ni intensity vectors {x_v}, v = 1, …, Ni, where x_v[u] = 1 if u = v and 0 otherwise. In other words, each x_v is a one-hot intensity vector that probes the v-th column of A′.
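This one-hot probing procedure can be sketched as below; the forward map is a matrix stand-in for the trained diffractive network, with illustrative sizes:

```python
import numpy as np

# Probing the realized transform with one-hot input intensity vectors:
# feeding x_v with a single unit pixel returns the v-th column of A'.
# The forward map here is a matrix stand-in for the trained network.
rng = np.random.default_rng(6)
A_prime = rng.uniform(size=(64, 64))
forward = lambda x: A_prime @ x                       # stand-in diffractive network
cols = [forward(np.eye(64)[:, v]) for v in range(64)]
A_meas = np.stack(cols, axis=1)                       # columns recovered one by one
assert np.allclose(A_meas, A_prime)
```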