Polarization multiplexed diffractive computing: all-optical implementation of a group of linear transformations through a polarization-encoded diffractive network

Research on optical computing has recently attracted significant attention due to the transformative advances in machine learning. Among different approaches, diffractive optical networks composed of spatially-engineered transmissive surfaces have been demonstrated for all-optical statistical inference and performing arbitrary linear transformations using passive, free-space optical layers. Here, we introduce a polarization-multiplexed diffractive processor to all-optically perform multiple, arbitrarily-selected linear transformations through a single diffractive network trained using deep learning. In this framework, an array of pre-selected linear polarizers is positioned between trainable transmissive diffractive materials that are isotropic, and different target linear transformations (complex-valued) are uniquely assigned to different combinations of input/output polarization states. The transmission layers of this polarization-multiplexed diffractive network are trained and optimized via deep learning and error-backpropagation by using thousands of examples of the input/output fields corresponding to each one of the complex-valued linear transformations assigned to different input/output polarization combinations. Our results and analysis reveal that a single diffractive network can successfully approximate and all-optically implement a group of arbitrarily-selected target transformations with a negligible error when the number of trainable diffractive features/neurons (N) approaches \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_pN_iN_o$$\end{document}NpNiNo, where Ni and No represent the number of pixels at the input and output fields-of-view, respectively, and Np refers to the number of unique linear transformations assigned to different input/output polarization combinations. This polarization-multiplexed all-optical diffractive processor can find various applications in optical computing and polarization-based machine vision tasks.


Introduction
With the increasing global demand for machine learning and computing in general, using light to perform computation has been a rapidly growing focus area of optics and photonics [1][2][3][4][5] .The research on optical computing has a long history spanning decades of exciting research and development efforts  . Motiated by the massive success of artificial intelligence and deep learning, in specific, a myriad of new hardware designs for optical computing have been reported recently, including, e.g., on-chip integrated photonic circuits [16][17][18][19][20][21][22] , free-space optical platforms 23- 28 , and others [29][30][31] .Among these different optical computing systems, the integration of successive transmissive diffractive layers (forming an optical network) has been demonstrated for optical information processing, such as object classification 23,[32][33][34][35][36][37][38][39][40][41][42][43] , image reconstruction 38,44 , all-optical phase recovery and quantitative phase imaging 45 , and logic operations [46][47][48] .A diffractive network is trained using deep learning and error-backpropagation methods implemented in a digital computer, after which the resulting transmissive layers are fabricated to form a physical network that computes based on the diffraction of the input light through these spatially-engineered transmissive layers.Because the computational task is completed as the light passes through thin and passive optical elements, this approach is very fast, and the inference process does not consume power except for the illumination light.It is also scalable since an increase in the input field-of-view (FOV) can be handled by fabricating larger transmissive layers and/or deeper diffractive designs with more successive layers positioned one after another.Furthermore, both the phase and the amplitude information channels of the input scene/FOV can be processed by a diffractive optical network, without the need for phase retrieval or digitizing, vectorizing an image of the scene, which makes diffractive computing highly desirable for machine vision applications 38,44 .Harnessing light-matter interactions using engineered diffractive surfaces also enabled the inverse design of optical elements for e.g., spatially-controlled wavelength demultiplexing 49 , pulse engineering 50 , and orbital angular momentum multiplexing/demultiplexing 51,52 .It has also been shown that a diffractive network can be trained by optimizing its diffractive layers to perform an arbitrary complex-valued linear transformation between its input and output fields-of-view, demonstrating its computing capability for complexvalued matrix-vector operations at the speed of light propagation through a passive diffractive system.
All these results highlight the unique capabilities of diffractive networks to manipulate various physical properties of light, including e.g., its amplitude and phase distribution, spatial frequency, spectral bandwidth, orbital angular momentum, for performing specific computational tasks that are desired.As another important physical property of light, polarization specifies the geometrical orientation of electromagnetic wave oscillations.Utilizing the polarization state of light has played a pivotal role in numerous applications, including telecommunications [53][54][55] , imaging [56][57][58][59][60][61] , sensing [62][63][64] , computing 65 , and displays 66,67 .For example, polarization-division multiplexing (PDM) has been used in telecommunication systems to permit two channels of information to be simultaneously transmitted using orthogonal polarization states over a single wavelength 54,68 .
Here, we report the design of polarization-multiplexed diffractive optical networks to perform a group of arbitrary linear transformations using a common set of diffractive layers that are jointlyoptimized to all-optically perform each one of the target complex-valued linear transformations at a different combination of input/output polarization states.In our earlier work 69 , we showed that a diffractive optical network composed of spatially-engineered layers could all-optically perform an arbitrary complex-valued linear transformation between an input and output field-of-view with a negligible error when the number of trainable diffractive elements/neurons () approaches     , where   and   represent the number of pixels at the input and output FOVs, respectively.In this work, we use polarization multiplexing between the input and output FOVs of a diffractive network to increase the capacity of diffractive computing and all-optically perform a group of arbitrary linear transformations that are complex-valued.These polarization multiplexed diffractive network designs are not based on birefringent, anisotropic or polarization-sensitive materials; instead, our designs utilize standard diffractive surfaces where the phase and amplitude transmission coefficients of each trainable diffractive feature are independent of the polarization state of the input light.Using a network design solely based on standard isotropic diffractive materials makes our designs simpler in terms of material selection, fabrication and scale-up; however, it also makes the diffractive network insensitive to different polarization states, and therefore, polarization multiplexed all-optical computation of different transformations becomes impossible.To overcome this challenge, we used a non-trainable, pre-determined array of linear polarizers (at 0, 45, 90 and 135) within the diffractive network that acted as polarization seeds for the trainable isotropic diffractive layers to all-optically execute different linear transformations through input/output polarization multiplexing (see Fig. 1a).Stated differently, we used datadriven training and optimization of isotropic diffractive layers to encode different linear transformations into different input/output polarization combinations, and this encoding is made possible by the polarization mode diversity introduced by a non-trainable, pre-determined array of linear polarizers within the diffractive volume.
In our first implementation, we performed two different, arbitrarily-selected linear transformations (i.e.,   = 2) using a diffractive network composed of four transmissive layers that are jointly optimized using deep learning, where the first target linear transformation was assigned to x (0) linear input and x linear output polarization combination, and the second target linear transformation was assigned to y (90) linear input and y linear output polarization combination.For this case of   = 2, there are two different schemes (Fig. 1b) to all-optically access/implement the desired linear transformations: sequential (x and y input polarization states encode the input information sequentially, one after another) or simultaneous (x and y input polarizations encode the input information at the same time within the input FOV).Our numerical results (Figs. 2-5) reveal that one can successfully train a diffractive network under each one of these operation modes (sequential vs. simultaneous) to approximate the two target, arbitraryselected linear transformations with a negligible error when the number of trainable diffractive neurons  approaches       = 2    .
In our second implementation (Fig. 6), we performed four different, arbitrary linear transformations (i.e.,   = 4) using a diffractive network composed of eight transmissive layers that are jointly optimized using deep learning and examples of input/output fields corresponding to the selected complex-valued linear transformations (ground truth).In this case, the first target transformation was assigned to x linear input and 45 linear output polarization combination, the second target transformation was assigned to y linear input and 135 linear output polarization combination, the third target transformation was assigned to x linear input and 135 linear output polarization combination and finally the fourth target transformation was assigned to y linear input and 45 linear output polarization combination.Our analyses of this 4-channel polarization multiplexed diffractive system show that when  ≥       = 4    , all the target linear transformations can be successfully approximated, following a similar conclusion as in the first implementation case (  = 2).
Without the use of a non-trainable, pre-determined array of linear polarizers acting as polarization seeds within the network, none of these multiplexing results could be achieved using isotropic diffractive materials, no matter how they are trained or optimized, since they would normally perform the same transformation under different input polarization states.
Our results should not be confused with polarization multiplexed (or wavelength/illumination multiplexed) projection of a set of desired complex fields at the output of a metamaterial design; such multiplexed metamaterial systems do not implement an arbitrary matrix multiplication operation.Each input-output polarization combination in our diffractive design represents an alloptical implementation of a unique linear transformation between the input and output FOVs.Therefore, for each input-output polarization combination, infinitely many different target complex fields can be all-optically synthesized by the trained diffractive network in response to different input field distributions; and this capability accurately defines the corresponding complex-valued linear transformation at the output FOV for all the possible and infinitely many combinations of phase and amplitude distributions at the input FOV.
A polarization multiplexed diffractive network can perform an arbitrary set of target linear transformations using the same diffractive layers that all-optically implement a distinct complexvalued linear transformation at a selected input/output polarization combination.We believe that this unique framework will be valuable in developing high-throughput optical processors and polarization-based machine vision systems operating at different parts of the electromagnetic spectrum.Moreover, the presented diffractive computing platform and the underlying concepts can be used to develop polarization-aware optical information processing systems for e.g., detection, localization, and statistical inference of objects with unique polarization properties.

Results
Throughout this section, the terms "diffractive optical network," "diffractive network," and "diffractive processor" are interchangeably used.The schematic of our framework for 2-channel polarization multiplexed all-optical computing (  = 2) is shown in Fig. 1a.A polarizationencoded diffractive neural network, composed of 4 trainable diffractive layers, is trained to alloptically perform 2 distinct, complex-valued linear transformations between the input and output FOVs through 2 orthogonal polarization channels.The pre-determined polarizer array (which is treated as non-trainable) consists of multiple linear polarizer units with four different polarization directions: 0°, 45°, 90° and 135°, and it is located between the 2 nd and 3 rd trainable diffractive layers.More details about the architecture, optical forward model and training details of the polarization diffractive network can be found in the Methods section.
We use  and  to denote the complex-valued, vectorized versions of the 2D input and output complex fields located at the input and output FOVs of the diffractive network, respectively, as presented in Fig. 1a.Based on the scalar diffraction theory, here   and   represent the column vectors of the complex fields generated by sampling the x-polarized optical fields within the input and output FOVs, respectively, and vectorizing the resulting 2D matrices in a column-major order.Similar to   and   ,   and   are their counterparts generated by sampling the y-polarized optical fields within the input and output FOVs, respectively.Based on this notation, (  ,   ) and (   ,   ) can be considered to represent the input and output channels of our polarization multiplexed diffractive network, respectively.In our analyses, the number of pixels in the input and output FOVs are both taken as   =   = 8 2 = 64, such that each target linear transformation matrix has 64 2 complex-valued entries.
In this first implementation with   = 2, we randomly generated two complex-valued matrices   and   , each with a size of   ×   = 64 2 , to serve as two unique arbitrary linear transformations that we would like to all-optically implement using a single polarization diffractive network.Visualized in Fig. 2a with their amplitude and phase components, these two matrices are independently generated using different random seeds, and the difference between the two matrices can be found in Fig. S1.We also randomly generated two training sets of complex-valued vectors {  } and {  } with   = 64 as input fields, and constructed the corresponding sets of output field vectors {  } and {  } using   =     and   =     , respectively.For each one of these training sets, {  } and {  }, we used 55,000 randomly generated complex fields in our training process.A further increase in the size of this training dataset (to e.g., >100,000 randomly generated complex fields) could improve the transformation approximation accuracy of the trained diffractive networks, but would not change the general conclusions of this manuscript and therefore is left as future work.
Based on the given inputs of {  } and {  }, the ultimate goal of training our polarization multiplexed diffractive network is to simultaneously compute the all-optical output fields {  ′ } and {  ′ } to come close to the output ground truth (target) fields {  } and {  }; this way, the alloptical transformations   ′ and   ′ performed by the trained single diffractive system represent an accurate approximation to their ground truth (target) transformation matrices   and   .It should be emphasized that we are not aiming to train the diffractive network to implement the correct linear transformations for only a few input-output field pairs.Instead, despite the limited number of input/output field patterns used during the training process, our goal is to generalize to any pairs of (  ,   ) and (  ,   ) that satisfy   =     and   =     .More details about the training data generation can be found in Methods.
To form two unique diffractive information processing pipelines in the same diffractive network for performing the linear transformations given by   and   , as shown in Fig. 1a we matched the input fields and the diffractive output pairs, {(  ,   ′ )} and {(  ,   ′ )}, with the input and output polarization channels of our diffractive system, i.e.,   =   ,   =   ,   =   ′ and   =   ′ .That is to say, the   ′ transformation is performed by encoding the corresponding input field data   into the x-polarized optical field within the input FOV, using e.g., an x-aligned linear polarizer, and decoding (sampling) the x-polarized component of the field within the output FOV as the computed output field   ′ using e.g., an x-polarized analyzer.We denote this diffractive information processing channel as the channel ① in Fig. 1b.It is also a similar case for the   ′ transformation, except this time the y-polarization is employed at the input and output FOVs, and this diffractive information processing channel is denoted as the channel ②.With this polarization encoding scheme, there are potentially two modes to perform the data inference through the same diffractive network: (1) in two sequential, successive accesses to the diffractive system, each time feeding the input data using its assigned polarization channel, and obtaining the corresponding output (see Fig. 1b, left); and (2) in single access to the diffractive system, by feeding the input data of both of the two polarization channels in parallel, and obtaining the two corresponding outputs simultaneously (see Fig. 1b, right).We term the former and latter approaches as the "sequential polarization access" (SeqPA) mode and the "simultaneous polarization access" (SimPA) mode, respectively, and analyze below these two modes of operation separately.

2-channel polarization multiplexed all-optical diffractive computing using the sequential polarization access (SeqPA) mode
As shown in Fig. 1b, left, with the input data   and   being separately and sequentially fed into the polarization channels ① and ②, respectively, the all-optical computed outputs   ′ and   ′ are also collected successively using the same diffractive network hardware.By employing this SeqPA strategy, we trained polarization multiplexed diffractive networks with different numbers of trainable diffractive neurons, i.e., N = {32 2 , 44 2 , 64 2 , 92 2 , 128 2 , 180 2 , 256 2 }, all using the same training datasets {(  ,   )} and {(  ,   )} and the same number of epochs.To benchmark the performances of these multiplexed diffractive networks, for each transformation dataset and N, we also trained regular diffractive networks without the polarizer array or any polarization encoding/decoding at the input/output FOVs, which constitute our baseline.These regular diffractive networks, denoted as "No pol." in our analyses, are trained to approximate only one linear transformation (i.e., either   or   ), and therefore they are referred to as   = 1 (no polarization multiplexing).
Figures 2b-e present the quantitative comparison of the all-optical transformation results obtained using the trained diffractive networks described above.Three different metrics were used to quantify the transformation accuracy and generalization performance of these diffractive networks: (1) the normalized transformation mean-squared error (  Transformation ), (2) the cosine similarity () between the all-optical transforms and the target transforms, and (3) the meansquared error between the diffractive network output fields and their ground truth ( Output ).These performance metrics are reported in Figs.2b,c,d, as a function of the number of diffractive neurons (N) used in each design.Note that the transformation error of the polarization multiplexed diffractive systems is calculated per polarization channel.More details about the formulations of these performance metrics can be found in Methods.In Fig. 2b, it can be seen that the transformation errors of all the trained diffractive models monotonically decrease as N increases, which is expected due to the increased degrees of freedom in the diffractive processor.In the standard diffractive networks without polarization multiplexing (dash-dotted curves labeled with "No pol.  " or "No pol.  "), the transformation errors for implementing   or   are almost the same (which indicates that these randomly selected matrices,   and   , represent similar computational complexity; also see Fig. S1).The approximation errors of these standard diffractive networks, No pol.  and No pol.  , both approach to 0 as  approaches     = 64 2 ≈ 4.1.In the polarization multiplexed diffractive models (solid curves labeled with "SeqPA ①" or "SeqPA ②"), the transformation errors  Transformation for the two distinct transforms computed through the two polarization channels are also very close to each other for all values of N, demonstrating no bias toward any specific polarization channel or transform.The approximation errors of these polarization multiplexed models approach to 0 as  approaches       = 2    = 92 2 ≈8.2k.This finding indicates that compared with the baseline diffractive models that can only perform a single transform, performing two unique transforms using polarization multiplexing through the same diffractive model requires the number of trainable neurons  to double.This conclusion is further supported by the results of the other two performance metrics,  (Fig. 2c) and  Output (Fig. 2d) that both show the same trends as in Fig. 2b: for the baseline diffractive models  and  Output approach 1 and 0 as  approaches     , respectively, while for the polarization multiplexed diffractive models, the two metrics approach 1 and 0 as  approaches       = 2    .Apart from the metrics that are used to evaluate the transformation performance, we also report the output diffraction efficiencies () of these diffractive models in Fig. 2e, which reveal that compared with the baseline diffractive networks (No pol.), the diffraction efficiencies of the polarization multiplexed diffractive models trained using the SeqPA mode reach a similar level.
To further demonstrate the performance of our polarization multiplexed diffractive networks, in Fig. 3 we show examples of the ground truth transformation matrices (i.e.,   and   ) and their counterparts (i.e.,   ′ and   ′ ) resulting from the diffractive designs with  = {44 2 , 92 2 , 180 2 }, along with the amplitude and phase absolute errors.Exemplary complex-valued input-output fields from the same set of diffractive designs are also presented in Fig. 4. Figs. 3 and 4 reveal that for both of the polarization channels, when  ≥       = 2    , the all-optical transformation matrices and the output complex fields very well match their ground truth targets with negligible absolute errors, which are also in line with the observations made in Fig. 2.

2-channel polarization multiplexed all-optical diffractive computing using the simultaneous polarization access (SimPA) mode
As an alternative to the sequential polarization access (SeqPA) used earlier, we also explored the use of the simultaneous polarization access (SimPA) mode in our all-optical computing framework.As shown in Fig. 1b, right, in single access to the diffractive system, the input complex-valued data   and   are fed into the polarization channels ① and ②, respectively, and the all-optical diffractive outputs   ′ and   ′ are collected at the same time through two orthogonal polarization states at the output FOV.Before we trained a new polarization multiplexed diffractive network from scratch using the SimPA mode, we first took our earlier diffractive designs trained using the SeqPA mode and tested them directly using the SimPA mode by inputting both polarization channels ① and ② at the same time, deviating from their training scheme, which only used SeqPA.The results of blindly testing the SeqPA-trained diffractive networks under the SimPA mode are shown in Fig. S2, which reveals inference results with significantly higher values of  Transformation and  Output and decreased values of  , all of which indicate a performance degradation, when we operate a SeqPA-trained diffractive network using the SimPA mode.As shown in Fig. S3, this performance degradation is due to the "cross-talk" between the two transformation channels when both of the input polarization states are at the same time present, which was not considered during the SeqPA-based training process.These results highlight the necessity of training the diffractive system from scratch under the SimPA mode, so that the impact of this cross-talk can be taken into account and minimized during the iterative design process.
After training our diffractive models from scratch using the SimPA mode, we report their blind testing results in Fig. 5 using the solid curves labeled with "SimPA ①" and "SimPA ②".The results of the new diffractive designs trained using the SimPA mode demonstrate the success of all-optically performing two different linear transformations in parallel using polarization multiplexing.Our analysis (Fig. 5) also reveals the same conclusions discussed earlier for the models trained using the SeqPA mode: the all-optical transformation performance of polarization multiplexed diffractive networks very well match the ground truth, desired transformations as  approaches       = 2    .
We further compared the blind testing results of these two different modes of operation (SeqPA vs. SimPA) and performed a cross-talk field analysis (see Fig. S3).We found out that the amount of transformation cross-talk in the diffractive models trained using the SimPA mode (shown in the right column of Fig. S3c-d), is ~300-fold lower when compared with the amplitude values of the cross-talk observed in the diffractive designs trained using the SeqPA mode (shown in the left column of Fig. S3c-d).During the diffractive model training, these cross-talk fields are gradually eliminated (penalized) using the SimPA mode of operation to better approximate the ground truth fields.However, for the diffractive models trained under the SeqPA mode, such cross-talk fields are ignored (i.e., remain non-penalized during the training phase) since the SeqPA operation assumes successive access to the diffractive network, one input polarization state at a time.Stated differently, Seq-PA trained diffractive networks successfully approximate the target transformations only when they are tested under the same SeqPA mode of operation, and fail due to the field cross-talk when tested under the Sim-PA mode.

4-channel polarization multiplexed all-optical diffractive computing
So far, we have demonstrated to perform all-optical computing with 2-channel polarization multiplexing through a single diffractive network.To further exploit the polarization multiplexing capability of this diffractive computing framework, next, we explored a 4-channel polarization multiplexed design for performing 4 different arbitrarily-selected linear transformations through a single diffractive network (i.e.,   = 4).Figure 6 illustrates the schematics of this framework.As depicted in Fig. 6b, by sequentially connecting one of the two input polarization states with one of the two output polarization states, four transformation channels, ①, ②, ③ and ④, can be formed to all-optically perform 4 distinct complex-valued transforms using the same diffractive processor.Compared to the 2-channel polarization multiplexed system reported earlier, the polarization states for the output field sampling in this 4-channel system are selected to be 45° and 135° linear polarization.This design choice is made to balance out the diffraction efficiencies of the resulting 4 different linear transformations that are all-optically performed by the diffractive network.Stated differently, this design choice introduces symmetry to all the input/output polarization combinations that are each assigned to a different linear transformation.In Fig. 6a and b, we denote the two output fields corresponding to the linear polarization directions at 45° and 135° as   and   , respectively.
In the light of our earlier findings that point to the need for more diffractive neurons in the case of   = 2 when compared to   = 1, here we employed 8 successive trainable diffractive layers to increase our degrees of freedom for   = 4 design (see Fig. 6a).Also, compared to the earlier 2channel polarization multiplexed design, we included an additional linear polarizer array with the same configuration as before (with polarization orientations of 0°, 45°, 90° and 135°) to further enhance the spatial diversity of polarization modes within the diffractive processor.Same as the   = 2 diffractive designs, these linear polarizer arrays are pre-determined (i.e., non-trainable) and act as "polarization seeds" within the trained diffractive network.
Next, we generated random data to train and test our diffractive networks under   = 4.In addition to the two randomly-generated ground truth transforms   and   that were earlier used for the 2-channel models, we randomly generated two additional complex-valued transforms   and   and accordingly constructed the training and testing dataset consisting of the input and ground truth output fields.These four ground truth (target) transforms are visualized in Fig. 7a, and their differences can be found in Fig. S1.Following the training of the polarization multiplexed diffractive networks with different , their transformation performance for   = 4 is analyzed in Fig. 7b-d based on the same set of performance metrics that were used earlier.These results reveal that, when  approaches       = 4    = 16.4, the  Transformation and  Output of all the four diffractive transformations approach 0, while the  approaches 1, demonstrating that all the target linear transformations (  ,   ,   and   ) can be successfully approximated by a single diffractive processor with a negligible error if  ≥       .This is the same conclusion that was reached earlier for   = 2.
To further demonstrate the success of these 4-channel polarization-multiplexed diffractive systems, in Fig. S4 we present the ground truth transformation matrices (i.e.,   ,   ,   and   ) and their diffractive counterparts (i.e.,   ′ ,   ′ ,   ′ and   ′ ) designed with  = {14.3k,66.5k}, along with the amplitude and phase errors made in each case.Furthermore, exemplary complex-valued output fields achieved by these diffractive systems are also shown in Fig. S5, all of which confirm the success of the presented 4-channel polarization-multiplexed diffractive designs.Finally, we also analyzed the output diffraction efficiencies of these diffractive models, reported in Fig. 7e.The results show that, compared to their counterparts without polarization encoding (  = 1), the polarization multiplexed diffractive models with   = 4 turn out to be less power efficient (per transformation), with an efficiency decrease of ~6 dB at the output FOV.

Discussion
Our results and analysis demonstrated that, using polarization multiplexing in a single diffractive network, one can all-optically perform a group of complex-valued arbitrary linear transformations at the same output FOV of the diffractive network.In practical applications, these different transformations can cover, for example, various machine vision tasks, such as detection, classification, and localization of objects, which can be programmed into different input/output polarization states.Compared to employing multiple diffractive subsystems, each one dedicated to performing a single task, integrating multiple tasks within the same diffractive system offers advantages of speed, versatility, compactness, and cost-effectiveness.
In addition to polarization multiplexing, we should note that other degrees of freedom can be used to implement multiple computational tasks through a single diffractive network.For example, one can divide the input/output FOVs of the diffractive network into multiple regions, where each region is assigned to a unique computing task through spatial-division multiplexing.It is also possible to achieve wavelength-division multiplexing by assigning different wavelengths or spectral bands to independent computing tasks and employing dispersive elements in the diffractive computing system.In contrast to these other possible methods of information multiplexing, the polarization-based multiplexing that we reported here requires solely the addition of linear polarizers to a diffractive network without changing its architecture.Such polarizers are readily available (e.g., polarizing films), even integrated with the individual pixels of polarizationbased imaging systems 60 , and can be adapted to a wide range of wavelengths.Furthermore, polarization multiplexing can be flexibly coupled with other multiplexing methods (such as spectral and/or spatial multiplexing) to further increase the computing capacity of the diffractive network.
Unlike the diffractive layers, where the transmission coefficients are trained and optimized to alloptically perform the target transformations, the design and arrangement of the seed polarizer arrays between the diffractive layers are treated as hyperparameters that are pre-determined and non-trainable.Optimization of the topology of such polarizer seeds within the diffractive volume could further enhance the approximation power of polarization multiplexed diffractive networks, which is left as future work.
We would like to also emphasize that the reported polarization-multiplexed diffractive networks can be directly applied to 2D arrays of phase and amplitude input data.Compared to other optical computing systems operating based on e.g., integrated photonics, which require 1D inputs and phase recovery if the information is represented in the phase channel, the capability to directly process and analyze raw 2D complex fields makes our framework highly advantageous for visual computing tasks.On the other hand, unless spatial light modulators (SLMs) are employed as part of the diffractive system (see e.g., the supplementary information of Ref. 23 for a discussion on reconfigurable networks), each physically fabricated diffractive network is fixed and would need to be retrained and fabricated again as the target transformations change, which is a limitation of passive diffractive systems.
There are additional limitations of the presented diffractive computing framework.First, polarization multiplexed diffractive computing systems present lower diffraction efficiencies at their output FOV compared to regular diffractive networks without polarization multiplexing.This is especially the case for the 4-channel polarization multiplexed system (see Fig. 7e).Several remedies can be used to improve the output diffraction efficiency such as e.g., adding a diffractionefficiency-related penalty term to the training loss function, and/or restricting the diffractive layers to perform phase-only modulation.The efficacy of using these approaches in a regular diffractive network design (without polarization multiplexing) to improve the output diffraction efficiency has already been demonstrated in our earlier work 69 .To exemplify the performance of a phaseonly diffractive design and how it can be used to improve the output diffraction efficiency, we trained phase-only diffractive networks from scratch for the 4-channel polarization multiplexing case (  = 4) , the results of which are summarized in Fig. S6.This analysis revealed that phaseonly diffractive designs can achieve significantly better output diffraction efficiencies (improved on average by ~12dB), while still successfully approximating the target linear transformations (  ,   ,   and   ).As a trade-off, however, these phase-only diffractive designs also exhibit reduced degrees of freedom compared to their complex-valued counterparts.As a result of this, we observed that all the target linear transformations were successfully approximated by a single phase-only diffractive processor when  approached 2      = 8    .This 2-fold "threshold increase" in the number of diffractive features (i.e., 2      vs.       ) is a direct reflection of the reduced number of trainable transmission parameters per diffractive layer due to the phaseonly operation, which is a limitation of phase-only diffractive networks, despite their enhanced output diffraction efficiency.To further validate this conclusion, we also selected another set of 4 target linear transformations by changing the matrix elements to be real-valued, and used them as ground truth to train phase-only polarization multiplexed diffractive networks with   = 4.As shown in Fig. S7, our results reveal that these phase-only diffractive networks can successfully approximate the real-valued target linear transforms when  ≥       = 4    , demonstrating a similar approximation performance, with significantly higher output diffraction efficiency compared to their complex-valued diffractive counterparts.These findings emphasize the value of phase-only diffractive network designs as a photon-efficient solution in polarization multiplexed diffractive computing, also providing an important rationale for planning the diffractive neuron budget () for a given computational task.
Another practical concern that needs to be discussed is the potential fabrication and alignment errors, surface reflections and material absorption within the diffractive network, which may altogether limit the performance and accuracy of diffractive computing.In our previous work, 36 we demonstrated that the performance degradation of a diffractive network caused by some of these experimental factors can be compensated by incorporating them as random variables into the physical forward model of the diffractive network during the training process.Some of these errors can also be mitigated by selecting appropriate fabrication methods, e.g., high-precision lithography, and using less absorptive materials.Moreover, our previous results 23,38,44,49,50 showed that these uncontrolled physical errors and imperfections did not lead to a significant discrepancy between the experimental and numerical, expected results, indicating the correctness of the assumptions involved in our optical forward model and training procedures.
In conclusion, we introduced a diffractive network-based all-optical computing framework that can perform multiple complex-valued, arbitrary linear transformations using polarization multiplexing.This framework is very compact; for instance, the system depicted in Fig. 1 has a total length of only 20λ in depth, where λ is the illumination wavelength.Our results show that when the number of diffraction elements/neurons,  , in a given diffractive network design approaches       , a group of   arbitrarily-selected linear transforms can be all-optically computed at the output FOV of the network with negligible error.We believe that this polarization multiplexed diffractive computing framework can be used to build all-optical, passive processors that can execute multiple inference tasks in parallel.We further envision that artificially engineered materials with polarization manipulation capabilities (e.g., photonic crystals [70][71][72] and metamaterials 73,74 ) can also be combined with advanced diffractive surface fabrication techniques (e.g., high-precision 3D additive manufacturing and photolithography) to allow the use of our diffractive computing framework in different parts of the electromagnetic spectrum.

Materials and Methods
Forward model of the polarization multiplexed diffractive optical network.Using Jones calculus 75 , the complex-valued, polarization-multiplexed electrical field  at a spatial location (  ,   ,   ) can be represented as: .
In our implementation,  x and  y are computed in parallel throughout the entire diffractive system.Since the trainable diffractive layers are not polarization-sensitive, the complex-valued modulation generated by these thin diffractive layers is the same for the two orthogonal polarization states.
The diffractive layers are assumed to be thin optical modulation elements, where the  th feature on the  th diffractive layer at location (  ,   ,   ) represents a complex-valued transmission coefficient,   , given by:   (  ,   ,   ) =   (  ,   ,   )exp(  (  ,   ,   )) In Eq. 2,  and  denote the amplitude and phase coefficients, respectively.The amplitude and phase coefficients of the diffractive neurons,   and   ( ∈ {1, 2, ⋯ , }), are both trainable, with a permitted range of 0 to 1 and 0 to 2π, respectively.Before the training starts,   and   are randomly initialized with a uniform () distribution of [0, 1] and [0, 2π), respectively.For a phase-only diffractive design   = 1.The size of each diffractive neuron on the transmissive layers and the width of the pixels of the input/output fields are both chosen as /2.
The diffractive layers are connected to each other by free-space wave propagation, which is modeled through the Rayleigh-Sommerfeld diffraction equation 23,32 : , where    (, , , ) is the complex-valued field on the  th neuron of the  th layer at (, , ) with a wavelength of , which can be viewed as a secondary wave generated from the source at (  ,   ,   ); and  = √( −   ) 2 + ( −   ) 2 + ( −   ) 2 and  = √−1.For the  th layer ( ≥ 1, treating the input plane as the 0 th layer), the modulated optical field    at location (  ,   ,   ) with a polarization state of  ( ∈ {x, y}) is given by: where  denotes all the pixels on the previous diffractive layer.For all the diffractive networks trained in this paper, the axial distances  0 ,  1 , . . .,   are all chosen as 4.
When modeling the polarizer elements in our diffractive system, we used Jones matrices to represent the modulation of the complex field brought by the input polarizer, output analyzer, or the polarizer array at location (, , ), the process of which can be written as: out (, , ) =  linear (, , ) in (, , ) where  in and  out are the vectors denoting the input and output complex field before and after the polarization modulation, each containing two orthogonal components along the x and y directions, i.e.,  out (, , ) = [  out,x (, , )  out,y (, , ) ] and  in (, , ) = [  in,x (, , )  in,y (, , ) ].
linear (, , ) represents the Jones matrix of a linear polarizer element, which is given by:  linear (, , ) = [ cos 2 (, , ) cos(, , ) sin(, , ) sin(, , ) cos(, , ) sin 2 (, , ) ] ( 6), where (, , ) is the angle between the x-axis and the polarizing axis of the linear polarizer located at (, , ).For the non-trainable, pre-determined polarizer array that is composed of multiple square-shaped linear polarizers, we used in total 4 types of linear polarizer units with 4 different polarizing axis directions,  ={0, 0.25π, 0.5π, and 0.75π}.As illustrated in Fig. 1a, these 4 different types of linear polarizers are spatially binned to have a 2×2 period and repeated with 3 periods in each direction, extending into a square region.The side length of each linear polarizer array is 24.The residual space surrounding the polarizer array is filled with air, without any polarization modulation.For all the diffractive network designs presented in this paper, the axial distances (i.e.,   ,  1 and  2 ) between the pre-determined polarizer arrays and the adjacent diffractive layers in front of them are all empirically chosen as 0; stated differently, each linear polarizer array is attached to the isotropic diffractive layer in front of it.

Training loss function.
For training of our diffractive networks, we used the mean-squared-error (MSE) loss function, which is defined as: , where [•] denotes the average across the current batch,  stands for the c th polarization channel that is being accessed, and [] indexes the n th element of the vector.  and   ′ are the coefficients used to normalize the energy of the ground truth (target) field   and the diffractive-network output field   ′ , respectively, which are given by: =1 (9).
During the training of the diffractive networks using the SeqPA mode, each polarization channel of the diffractive network is accessed and evaluated cyclically based on the order of the channel number.For instance, for the 2-channel polarization multiplexed design illustrated in Fig. 1b, left, the access sequence during the training is set to be {①, ②, ①, ②, …}; for the 4-channel polarization multiplexed design illustrated in Fig. 6, the access sequence is {①, ②, ③, ④, ①, ②, ③, ④, …}.During the access of a certain polarization channel, the diffractive network is fed with one batch of the training input/output complex fields corresponding to the transformation matrix assigned to this channel, and then trained based on the average loss across this batch.Thus, the loss function for training the diffractive designs through the c th polarization channel using the SeqPA mode, ℒ Seq, , can be simply written as: ℒ Seq, = ℒ MSE, (10).
During the training of the diffractive networks using the SimPA mode, as illustrated in Fig. 1b, right, all the polarization channels of the diffractive network are accessed simultaneously, and the training data are fed into the channels at the same time.For this SimPA mode, the diffractive network is trained based on the loss averaged across the different polarization channels and complex-valued fields in the current batch, where the loss function ℒ Sim can be written as: =1 (11) .
Performance metrics used for the quantification of all-optical transformation errors.To quantitatively evaluate the transformation results of the polarization-multiplexed diffractive networks, four performance metrics were calculated per polarization channel of the diffractive designs using the testing data set: (1) the normalized transformation mean-squared error ( Transformation ), (2) the cosine similarity () between the all-optical transforms and the target transforms, (3) the normalized mean-squared error between the diffractive network output fields and their ground truth ( Output ) , and (4) the output diffraction efficiency ( ) .The transformation error for the c th polarization channel of the diffractive network,  Transformation, , is defined as: , where   is the vectorized version of the ground truth transformation matrix assigned to the c th polarization channel   , i.e.,   = vec(  ).  ′ are the vectorized version of   ′ , which is the alloptical transformation matrix computed using the optimized diffractive transmission coefficients.  is a scalar normalization coefficient used to eliminate the effect of diffraction-efficiency related scaling mismatch between   and   ′ , i.e., =1 (13).
The cosine similarity between the all-optical transform and their target transform for the c th polarization channel,   , is defined as: (14).
The normalized mean-squared error between the diffractive network outputs and their ground truth for the c th polarization channel,  O, , is defined using the same formula as in Eq. 7 (the loss function used during the training process), except for that [•] is calculated across the entire testing set.
The mean diffraction efficiency   for the c th polarization channel of the diffractive system is defined as: .
Training-related details.All the diffractive optical networks used in this work were simulated and trained using Python (v3.8.11) and TensorFlow (v2.6.0,Google Inc.).We selected Adam optimizer 76 for training all the models, and its parameters were taken as the default values in TensorFlow and kept identical in each model.The batch size and learning rate were set as 8 and 0.001, respectively.The training of the diffractive network models using the SimPA mode was performed with 50 epochs.For training the diffractive models using the SeqPA mode, the 2channel and 4-channel polarization multiplexed designs were trained for 100 and 200 epochs, respectively, so that equivalently 50 epochs are dedicated for training each polarization channel of these designs.The best models were selected based on the MSE loss calculated on the validation data set.For the training of our diffractive models, we used a desktop computer with a GeForce GTX 1080Ti graphical processing unit (GPU, Nvidia Inc.) and Intel® Core TM i7-8700 central processing unit (CPU, Intel Inc.) and 64 GB of RAM, running Windows 10 operating system (Microsoft Inc.).The typical time to train a diffractive network model using the SeqPA mode with 2 and 4 polarization channels is ~7 and ~14 hours, respectively.The training time for a diffractive model using the SimPA mode with 2 polarization channels is ~4 hours.) resulting from the trained diffractive networks as a function of the number of diffractive neurons .The solid curves are achieved by the polarization multiplexed diffractive systems trained using the SimPA mode, which are compared with the dashed curves achieved by the regular diffractive networks trained with the same set of  but without any polarization multiplexing.For the polarization multiplexed models, the results for the two polarization channels ① and ②, corresponding to transforms  1 ′ and  2 ′ , are shown in separate curves that are labeled with "SimPA ①" and "SimPA ②", respectively.For the regular models without polarization multiplexing, the results for all-optical approximation of   and   are shown in separate curves labeled with "No pol.  " and "No pol.  ", respectively.The space between the simulation data points is linearly interpolated.b, Same as (a) but the cosine similarity between the all-optical transforms and their ground truth is reported.c, Same as (a) but the mean-squared error between the diffractive network output fields and their ground truth is reported.d, Diffraction efficiency of the presented diffractive networks.6).a, Amplitude and phase of the arbitrarily generated matrices   ,   ,   and   , which serve as the ground truth (target) for the diffractive all-optical transformations.b, Curves representing the normalized meansquared error between the ground truth transformation matrices (  ,   ,   and   ) and the all-optical transforms ( 1 ′ ,  2 ′ ,  3 ′ and  4 ′ , examples shown in Fig. S4) resulting from the trained diffractive networks as a function of the number of diffractive neurons .The solid curves are achieved by the 4-channel polarization multiplexed diffractive systems, which are compared with the dashed curves achieved by the regular diffractive networks trained with the same set of  but without polarization multiplexing.For the polarization multiplexed models, the results for the four polarization channels ①, ②, ③ and ④ are shown in separate curves but jointly labeled with "Pol.①/②/③/④" due to the spatial overlap of these curves.For the regular diffractive models without polarization multiplexing, the results for all-optical approximation of   ,   ,   and   (individually) are shown in separate curves but jointly labeled with "No pol.  /  /  /  " due to the spatial overlap of these curves.The space between the simulation data points is linearly interpolated.c, Same as (b) but cosine similarity between the all-optical transforms and their ground truth is reported.d, Same as (b) but the mean-squared error between the diffractive network output fields and their ground truth is reported.e, Diffraction efficiency of the presented diffractive networks.

Figure 2 .
Figure 2. Diffractive all-optical transformation results for 2-channel polarization multiplexing using the sequential polarization access (SeqPA) mode.a, Amplitude and phase of the arbitrarily generated matrices   and   , which serve as the ground truth (target) for the diffractive all-optical transformations.b, Curves representing the normalized mean-squared error between the ground truth transformation matrices (  and   ) and the all-optical transforms ( 1 ′ and  2 ′ ) resulting from the trained diffractive networks as a function of the number of diffractive neurons .The solid curves are achieved by the polarization multiplexed diffractive networks trained using the SeqPA mode, which are compared with the dashed curves achieved by the regular diffractive networks trained with the same set of  but without any polarization multiplexing.For the polarization multiplexed models, the results for the two polarization channels ① and ②, corresponding to transforms  1 ′ and  2 ′ , are shown in separate curves that are labeled with "SeqPA

Figure 3 .
Figure 3. All-optical transformation matrices estimated by the 2-channel polarization multiplexed diffractive designs trained using the SeqPA mode with N = 44 2 , 92 2 and 180 2 , and their differences from the ground truth matrices.

Figure 6 .
Figure 6.Schematic of 4-channel polarization multiplexed all-optical diffractive computing framework for performing four unique linear transformations through a single diffractive network.a, Optical layout of the polarization-encoded diffractive network, where eight trained diffractive layers and two arrays of linear polarizers are jointly used to perform four distinct, complex-valued linear transformations between the input field  and the output field  by using polarization encoding/decoding at the input/output FOVs.b, Schematic for the operation of the 4channel polarization multiplexed all-optical computing framework, where the four polarization channels, ① , ② , ③ and ④ , are formed by sequentially connecting one of the two input polarization states with one of the two output polarization states.

Figure 7 .
Figure 7. Diffractive all-optical transformation results for 4-channel polarization multiplexing of four distinct arbitrary linear transforms (depicted in Fig.6).a, Amplitude and phase of the arbitrarily generated matrices   ,   ,   and   , which serve as the ground truth (target) for the diffractive all-optical transformations.b, Curves representing the normalized meansquared error between the ground truth transformation matrices (  ,   ,   and   ) and the all-