Terahertz pulse shaping using diffractive surfaces

Recent advances in deep learning have been providing non-intuitive solutions to various inverse problems in optics. At the intersection of machine learning and optics, diffractive networks merge wave-optics with deep learning to design task-specific elements to all-optically perform various tasks such as object classification and machine vision. Here, we present a diffractive network, which is used to shape an arbitrary broadband pulse into a desired optical waveform, forming a compact and passive pulse engineering system. We demonstrate the synthesis of various different pulses by designing diffractive layers that collectively engineer the temporal waveform of an input terahertz pulse. Our results demonstrate direct pulse shaping in terahertz spectrum, where the amplitude and phase of the input wavelengths are independently controlled through a passive diffractive device, without the need for an external pump. Furthermore, a physical transfer learning approach is presented to illustrate pulse-width tunability by replacing part of an existing network with newly trained diffractive layers, demonstrating its modularity. This learning-based diffractive pulse engineering framework can find broad applications in e.g., communications, ultra-fast imaging and spectroscopy. Diffractive networks have recently been discussed as an all-optical analogue for performing neural network operations. The authors present a method using deep learning-designed 3D-printed diffractive surfaces to engineer temporal waveforms and perform pulse shaping in the terahertz regime.

I nspired by neural interactions in human brain 1 , artificial neural networks and deep learning have been transformative in many fields, providing solutions to a variety of data processing problems, including for example image recognition 2 , natural language processing 3 and medical image analysis 4 . Datadriven training of deep neural networks has set the state-of-theart performance for various applications in e.g., optical microscopy 4-10 , holography [11][12][13][14][15][16] , and sensing [17][18][19][20] , among others. Beyond these applications, deep learning has also been utilized to solve inverse physical design problems arising in e.g., nanophotonics and plasmonics [21][22][23][24] . These advances cover a wide range of engineering applications and have motivated the development of new optical computing architectures [25][26][27][28][29][30][31] that aim to benefit from the low-latency, power-efficiency and parallelization capabilities of optics in the design of machine learning hardware. For example, Diffractive Deep Neural Networks (D 2 NN) 32 have been introduced as an optical machine learning framework that uses deep learning methods, e.g., stochastic gradient-descent and error-backpropagation, to train a set of diffractive layers for computing a given machine learning task as the light propagates through these layers. Early studies conducted on this framework showed its statistical inference capabilities, achieving >98% numerical blind testing 33,34 accuracy for the classification of the images of handwritten digits. Recently, the D 2 NN framework has also been extended to harness broadband radiation in order to design spatially-controlled wavelength de-multiplexing systems 24 ; however, this former work did not engineer the spectral phase values at different frequencies of the input radiation and therefore did not report any temporal wave control or pulse shaping.
In parallel to these recent advances at the intersection of optics and machine learning, there has been major progress in optical pulse shaping, including pulse compression for optical telecommunication 35 and pulse stretching for chirped pulse amplification 36 . Dynamic, customizable temporal waveform synthesis has been achieved using time [37][38][39] or frequency domain [40][41][42] modulation. Among different approaches, the Fourier-transform based configuration 43 , which relies on conventional optical components such as lenses to establish a mapping between the pixels of an optical modulation device and the spectral components of the input broadband light, is one of the most commonly employed techniques. In various forms of its implementation, the optical modulation device placed at the Fourier plane in between two gratings can be a dynamic component e.g., a spatial light modulator [44][45][46][47] , an acousto-optic modulator 48,49 , a movable mirror 50 or even a metasurface 51 , offering engineered dispersion and wavefront manipulation, tailored for different applications.
However, these earlier pulse shaping techniques have restricted utility at some parts of the electromagnetic spectrum, such as the terahertz band, due to the lack of advanced optical components that can provide spatio-temporal modulation and control of complex wavefronts, covering both a broad bandwidth and a high spectral resolution at these frequencies 52,53 . As a result, direct shaping of terahertz pulses by independent control of the spectral amplitude and phase of the input wavelengths has not been achieved to date; instead, the synthesis of terahertz pulses has been generally performed indirectly through the engineering of the optical-to-terahertz converters or shaping of the optical pulses that pump terahertz sources [54][55][56][57][58] . Previous work also demonstrated an active device using an external pump-induced inhomogeneous medium to shape input terahertz pulses 59 .
Here, we demonstrate the use of diffractive networks designed by deep learning to all-optically shape pulses by simultaneously controlling the relative phase and amplitude of each spectral component across a continuous and wide range of frequencies using only trainable diffractive layers, forming a small footprint, compact and passive pulse engineering system. This framework uses a deep learning-based physical design strategy to devise taskspecific diffractive systems that can shape various temporal waveforms of interest. Following the digital training stage in a computer, we fabricated the resulting diffractive layers (Fig. 1) and experimentally demonstrated the success of our pulse shaping diffractive networks by generating pulses with various temporal widths using a broadband terahertz pulse as input. Despite using passive diffractive layers, the presented pulse shaping networks offer temporal pulse-width tunability that is experimentally demonstrated by varying the inter-layer distances within a fabricated diffractive network. We also investigated a physical transfer learning approach to show the modularity of the design space provided by our framework. In addition to engineering terahertz pulses, the fundamental design approach that is presented here can be readily adapted to different parts of the electromagnetic spectrum for shaping pulses. We believe that this study extends the engineering and precise control of electromagnetic fields through deep learning-designed diffractive networks into time-domain shaping of pulses, further motivating the development of all-optical machine learning and information processing platforms that can better harness the 4D spatiotemporal information carried by light.

Results
Synthesis of arbitrary temporal waveforms. Synthesis of arbitrary temporal waveforms through small footprint and compact systems has been of great interest for various applications in e.g., tele-communications, ultra-fast imaging and spectroscopy, and it represents a challenging inverse design problem. Specifically, it requires accurate control of the complex-valued weights of the spectral components across a wide bandwidth and with high spectral resolution. We addressed this challenging inverse design problem through the training of diffractive networks as shown in Fig. 1c. The forward training model of our diffractive networks formulates the broadband light propagation using the angular spectrum representation of optical waves 24 . Based on the complex dispersion information of a diffractive material, the thickness of each diffractive feature (i.e., 'neuron') of a given diffractive layer is iteratively trained and optimized through the errorbackpropagation with respect to a target cost function (see the "Methods" section). After the convergence of this deep learningbased training in a computer, we fabricated the resulting diffractive layers (Fig. 1c) using a 3D-printer to physically form our pulse shaping network as shown in Fig. 1a. This diffractive network was then experimentally tested for its desired/targeted pulse shaping capability using a terahertz time-domain spectroscopy (THz-TDS) setup 60 that provides a noise equivalent bandwidth of 0.1-5 THz (Fig. 1b, d).
Each one of our pulse shaping diffractive networks consists of 4 trained layers that process the input terahertz pulse to synthesize a desired temporal waveform over an output aperture of 0.2 cm × 0.2 cm. Based on this system layout and a given input pulse profile to be shaped (Fig. 2b), we trained and fabricated diffractive networks that generate square pulses with different temporal widths. For example, Fig. 2a demonstrates the diffractive layers of a pulse shaping network that was trained to generate a 15.5 ps square pulse by processing the spectrum carried by the input terahertz pulse. Figure 2c demonstrates the time-domain amplitude of the output waveform numerically computed (blue) based on the trained diffractive layers and the corresponding experimentally measured temporal waveform (orange), along with the associated spectral amplitude and phase distributions. The carrier frequency of the desired temporal waveform at the output was a non-learnable, predetermined parameter set to be 0.35 THz to avoid water absorption bands in the terahertz regime (depicted by the red arrows in Fig. 2b). The numerically predicted output waveform (blue) in Fig. 2c indicates that a 4-layer diffractive network can synthesize a square temporal waveform with a pulse width of 15.69 ps without using any conventional optical components, in a compact architecture that spans approximately 250-times the carrier wavelength in the axial direction. The pulse width of the temporal waveform created by the 3D printed diffractive layers at the output aperture is measured as 15.52 ps, closely matching the numerically predicted result (15.69 ps). Similarly, a comparison of the output spectral amplitude profiles for the numerical and experimental results shows a good agreement in terms of the peak locations of the main and side lobes, as well as the relative amplitude carried by each spectral component. On the other hand, an examination of the unwrapped phase profiles (experimental vs. numerical) reveals that the 3D-fabricated, physical diffractive network could not exactly create the sharp phase transitions at the expected spectral locations, but rather generated smoothened transitions. This smoothening contributes to some of the differences observed between the experimentally measured and the numerically calculated time-domain waveforms (Fig. 2c). The power efficiency of this diffractive network was experimentally measured as 0.51% at the carrier frequency (f 0 = 355 GHz), quantified at the output aperture, normalized with respect to the input; here we should emphasize that >70% of the input optical power at the carrier frequency is in fact lost due to absorption within the 3D printed diffractive layers. Therefore, to create our diffractive layers, the selection of a different fabrication material with a much lower loss (e.g., polymers such as poly-methylpentene: TPX) [61][62][63] can significantly boost the overall efficiency of these diffractive pulse shaping networks. Other strategies to improve the power efficiency include increasing the output aperture size and introducing additional power-related penalty terms during the training phase of the diffractive network (see the "Discussion" section).
Supplementary Fig. 1 further illustrates another diffractive network that was designed to create a narrower square pulse at its output aperture. At the end of its deep learning-based training, the numerical forward model converged to the thickness profiles shown in Supplementary Fig. 1a in order to synthesize a 10.96 ps square pulse (blue) illustrated in Supplementary Fig. 1c. When the diffractive layers depicted in Supplementary Fig. 1a were 3D printed and experimentally tested using the setup shown in Fig. 1d, the output pulse waveform was measured to have a temporal width of 11.85 ps (orange curve in Supplementary  Fig. 1c), providing a good match to our numerical results, similar to the conclusions reported in Fig. 2.
Beyond fabrication artefacts and misalignments observed in the 3D-printed diffractive networks, the variation of the input terahertz pulse from experiment to experiment is one of the significant contributors for any mismatch between the numerical and experimental output waveforms. The deep learning-based design of the diffractive networks shown in Fig. 2 and Supplementary Fig. 1 relies on a known input terahertz pulse profile that is experimentally measured over the input aperture.
To be able to take into account uncontrolled variations of the input pulse profile from run to run, we used 5 different experimentally measured input pulse profiles (dashed curves in their numerically computed counterparts for the same diffractive network models (also see Supplementary Fig. 3 and Supplementary Fig. 4).
To shed more light onto this, next we normalized the experimentally measured spectral amplitude profiles depicted in Fig. 2c and Supplementary Fig. 1c, based on the ratio between the average spectral amplitudes carried by the input pulses used in the training phase and the input pulse measured at the experimental testing phase. This simple spectral normalization procedure nullifies the effect of input terahertz source variations from experiment to experiment and provides us an opportunity to better evaluate the accuracy of the complex-valued spectral filtering operation performed by the 3D-fabricated diffractive network. Supplementary Figs. 2c and 2d demonstrate the experimental spectral amplitudes and the corresponding temporal waveforms at the network output before and after this spectral normalization step for the diffractive networks shown in Fig. 2a and Supplementary Fig. 1a, respectively. Following the spectral normalization, the width of the square pulse created by the diffractive network in Supplementary Fig. 1a, for example, decreased from 11.85 ps to 10.49 ps, providing a better match to the 11.07 ps that is predicted by our numerical forward model ( Supplementary Fig. 2d). A similar improvement using spectral normalization was also observed for the diffractive network shown in Fig. 2a, almost perfectly matching its numerical counterpart in terms of the square pulse width, achieving 15.71 ps after the normalization step ( Supplementary Fig. 2c).
These results highlight that experiment-to-experiment variability of our input terahertz pulse profile causes it to deviate from the input pulse profiles used in the training phase of our diffractive network, creating some uncontrolled errors in the output pulse profile, which can be improved significantly after the spectral normalization step, as discussed above. To further explore the pulse shaping capabilities of diffractive networks, next we trained a set of generic diffractive networks that used/ assumed a flat input spectrum during their training in order to achieve a desired output waveform; stated differently, a generic diffractive network is trained using an input pulse where all the wavelengths have the same spectral amplitude and phase. To accurately demonstrate the pulse shaping behavior of these generic diffractive designs that were trained with flat spectra, we used spectral normalization based on the input pulse profile, experimentally measured at each run. For example, Supplementary Fig. 5a and 6 show the diffractive layers of a generic pulse shaping network model that was trained to create a 15.5 ps square pulse. Supplementary Fig. 5c reports the time-domain amplitude of the output waveform numerically computed (blue) based on these trained diffractive layers and the experimentally measured temporal waveform (orange) along with the corresponding spectral amplitude and phase distributions. The synthesized pulse shape by the 3D-printed diffractive network closely matches the numerically computed waveform using our forward model, despite the water absorption bands that appear in our experimental results, illustrated by the red arrows in Supplementary  Fig. 5b. The power efficiency at the carrier frequency (f 0 = 400 GHz) of this diffractive network was experimentally measured as 0.97%. Figure 3 further demonstrates three additional generic pulse shaping diffractive network models that were trained with a flat input spectrum and experimentally tested using our terahertz setup to achieve different square pulses, with pulse widths of 11.25 ps, 13.45 ps, and 16.69 ps, respectively, demonstrating a very good match to their numerical counterparts. The numerically computed peak frequencies for these three different diffractive networks were 399.4 GHz, 396.1 GHz, and 399.4 GHz, which were measured experimentally as 399.1 GHz, 402.2 GHz, and 401.8 GHz, respectively. As we move towards higher optical frequencies beyond 0.6 THz, the experimental spectral amplitude distributions start to deviate from their numerically predicted counterparts. Considering that the maximum material thickness in our model is~1 mm, at higher optical frequencies corresponding to wavelengths below~0.5 mm, the light may travel more than 2 wavelengths inside a diffractive feature (depending on the final trained model) which will then violate the thin modulation layer assumption in our forward model contributing to some of the experimental errors observed in Fig. 3. In addition, the size of each diffractive feature corresponding to a unique complex-valued modulation per neuron (see "Methods" section) was chosen to be 0.5 mm due to the limited lateral resolution of our 3D printer. Therefore, for higher frequencies, the light fields are modulated at each diffractive layer with 2D functions sampled at lower spatial rates, which, in return, partially limits the design capabilities of our diffractive networks at those smaller wavelengths of the pulse bandwidth. Furthermore, the uneven surface profile in 3D printing combined with thickness variations induced by fabrication imperfections contribute to some additional sources of experimental errors observed in our results.
To further demonstrate the design capabilities of our diffractive pulse shaping framework, in addition to the square pulses with various temporal widths reported earlier, we also trained three new diffractive network models that were designed to output (1) a chirped-Gaussian pulse ( Supplementary Fig. 7), (2) a sequence of positive and negative chirped Gaussian pulses, one following another ( Supplementary Fig. 8), and (3) a sequence of two chirpfree Gaussian pulses ( Supplementary Fig. 9). These results report a very good match, both in time and spectral domains, between the target, ground-truth pulse profiles and the corresponding output pulses synthesized by the trained diffractive networks, clearly demonstrating the versatile nature of the presented framework to synthesize arbitrary pulses, engineered through the deep learning-based design of diffractive surfaces.
Pulse width tunability. Next, we demonstrated the temporal width tunability of pulse shaping diffractive networks despite the passive nature of their layers. By changing the axial distance between successive diffractive layers by ΔZ, the temporal width and the peak frequency of the output waveform can be tuned without any further training or a change to the 3D printed diffractive layers. We demonstrated this pulse-width tunability using the 3D printed diffractive network depicted in Supplementary  Fig. 5, but a similar tunability also applies to the network models shown in Fig. 3. Since our diffractive networks used 30 mm layerto-layer distance in their design, we considered the ΔZ range to be between −10 mm to 20 mm; for instance, when ΔZ is taken as −10 mm, the axial distance between all the successive layers of the diffractive network is set to be 20 mm. Within this axial tuning range, Fig. 4a-h demonstrate the effect of changing this layer-to-layer distance of an already designed/trained diffractive network on the output waveform and its complex-valued spectrum. The results reveal that as the diffractive layers get closer to each other axially, i.e., a negative ΔZ, the pulse-width of the output waveform increases and the peak frequency decreases. For instance, when the axial distance between each diffractive layer of the design shown in Supplementary Fig. 5 is decreased by 5 mm (ΔZ = −5 mm) as shown in Fig. 4d, the peak of the spectral amplitude distribution shifts from 399.4 GHz to 349.1 GHz according to our numerical forward model. The pulse-width of the resulting square pulse at the output aperture was numerically found to be 17.59 ps suggesting a longer pulse compared to 15.56 ps synthesized by the original design, ΔZ = 0 mm (Fig. 4d). The experimentally measured pulse width with the same amount of axial change in the layer-to-layer distance of the diffractive network revealed a 17.56 ps pulse after the spectral normalization step, confirming the tunability of our pulse shaping diffractive network and also providing a very good match to our numerical results (Fig. 4).
When the layer-to-layer distance is increased, i.e., a positive ΔZ, the output square pulse gets narrower in time domain with an accompanying shift in the peak frequency toward higher values. Figure 4e demonstrates an example of this case with ΔZ = 5 mm, i.e., the distance between each diffractive layer is increased to 35 mm. In this case, the experimentally measured and numerically computed square pulses at the output plane have peak frequencies of 451.4 GHz and 453.1 GHz, with the corresponding pulse-widths of 14.3 ps and 13.97 ps, respectively, once again confirming the tunability of our pulse shaping diffractive networks and demonstrating a very good agreement between the numerical forward model and our experiments. As we further increase ΔZ beyond 10 mm (depicted in Fig. 4f), the time domain pulse continues to get narrower.
Modularity of diffractive pulse shaping network. To further explore methods to alter a given fabricated diffractive network and its output function, next we employed a Lego-like physical transfer learning approach to demonstrate pulse-width tunability by updating only part of a pre-trained network with newly trained and fabricated diffractive layers, showing the modularity of a diffractive pulse shaping network. For this aim, we took the pretrained network that experimentally synthesized a 15.57 ps square waveform, noted as the original design in Fig. 5a, and further trained only the last diffractive layer to synthesize a new desired output waveform, i.e., a 12.03 ps square pulse, by keeping the first three layers as they are (already fabricated). We experimentally validated this transfer learning approach as shown in Fig. 5b by removing the existing last diffractive layer and inserting a newly trained layer, fabricated using the same 3D printer. Numerical and experimental results revealed very good match to each other for the normalized output spectral amplitude over a wide frequency range, as well as for the normalized output field waveform, generating pulse-widths of 12.21 ps and 13.25 ps, respectively. Next we took an alternative approach: this time, the last two diffractive layers were replaced with new diffractive layers trained to generate 12.03 ps square pulses. As illustrated in Fig. 5c, with the addition of these two new diffractive layers to the already existing first two layers, the resulting new diffractive network successfully demonstrated the synthesis of 12.14 ps and 12.39 ps waveforms at the output aperture for the numerical and experimental waveforms, respectively. The peak frequency of the

Discussion
Our results reported in earlier sub-sections demonstrate direct pulse shaping in terahertz part of the spectrum, where a complex-valued spectral modulation function that is trained using deep learning directly acts on terahertz frequencies through a passive diffractive device, without the need for an external pump. The presented learning-based approach can shape any input terahertz pulse through diffraction and is fundamentally different from previous approaches that indirectly synthesize a desired terahertz pulse through optical-toterahertz converters or shaping of the optical pump that interacts with terahertz sources. This capability of direct pulse shaping in terahertz band enables new opportunities that could not be explored with indirect pulse shaping approaches. For example, precise engineering and synthesis of terahertz pulses with the state-of-the-art methods is either not possible or very hard and costly to achieve, including e.g., pulsed terahertz generation through quantum cascade lasers [64][65][66] , solid-state circuits 67,68 , and particle accelerators 69 . Furthermore, the presented deep learning-based framework is quite flexible and versatile that can be used to engineer terahertz pulses regardless of their polarization state, beam shape, beam quality or aberrations of the specific terahertz generation mechanism.
The intrinsic pulse-width tunability of a given diffractive network that is achieved by changing the axial layer-to-layer distance is an interesting feature that we demonstrated numerically and experimentally: Fig. 4a shows various pulse-widths obtained at seven different layer-to-layer distances using an existing network design. As the layer-to-layer distance of a diffractive network design increases, the temporal pulse-width at the output aperture gets smaller, without any further training or fabrication of new diffractive layers. This opens up the opportunity to synthesize new waveforms within a certain time window around the originally designed output pulse. In addition to that, an axial distance change between the existing layers of a diffractive network also shifts the center frequency of the output pulse as shown Fig. 4b. As the diffractive layers get closer to each other, we observed a red-shift in the center frequency. Another related aspect of this pulse shaping diffractive framework is its modularity to tune the output pulses using a physical transfer learning approach. By training a new layer (or layers) to replace part of an existing, pre-trained diffractive network model, on demand synthesis of new pulses can be achieved, as demonstrated in Fig. 5b-  how they can adapt to potential changes in the desired output pulse patterns. The presented pulse shaping framework has a compact design, with an axial length of approximately 250 × λ 0 , where λ 0 denotes the peak wavelength. Moreover, it does not utilize any conventional optical components such as spatial light modulators, which makes it ideal for pulse shaping in terahertz part of the spectrum, where high-resolution spatio-temporal modulation and control of complex wavefronts over a broad bandwidth represent a significant challenge. In addition to being compact and much simpler compared to previous demonstrations of pulse shaping in terahertz spectrum, our results present the implementation of direct pulse shaping in terahertz band, where the learned complex-valued spectral modulation function of the diffractive network directly acts on terahertz frequencies for pulse engineering. This capability enables new opportunities: when merged with appropriate fabrication methods and materials, the presented pulse shaping approach can be used to directly engineer terahertz pulses generated through quantum cascade lasers, solidstate circuits and particle accelerators. Another major advantage of this deep learning-based approach is that it is versatile and can be easily adapted to engineer terahertz pulses irrespective of their polarization state, beam quality, as well as spectral/spatial aberrations.
The experimentally measured power efficiency values reported in our manuscript are~1%. However, there are various design strategies that can increase power-efficiency in diffractive pulse shaping networks as detailed in table in Fig. 6 (also see the "Methods" section). The diffractive networks reported in Fig. 6 were trained to synthesize 15.5 ps square pulses at their output plane. As one can observe in Fig. 6, the power efficiency values of the resulting diffractive models can be increased by more than an order of magnitude by adjusting the training loss function, increasing the output aperture size and using low absorption materials. For example, as reported in the second column of table in Fig. 6, when the material absorption is ignored during the testing of a diffractive network model, a 2-fold wider output aperture (i.e., 4 mm) provides a significant improvement in the power efficiency of the pulse shaping networks, reaching 60.37 and 61% for two different network models. On the other hand, if the absorption of our 3D-printing material is taken into account as part of the optical forward model, one can reach an efficiency value of 17.84% by accordingly optimizing the training loss function and using a 4 mm output aperture (see Fig. 6).
By comparing the top and bottom efficiency values for a given training loss function and design strategy reported in Fig. 6, we clearly see that the 3D-printing material used in this work decreases the pulse shaping network efficiency 2-5 times, in different designs, compared to an ideal, non-absorbing optical material. As an alternative fabrication material for diffractive pulse shaping networks, one can consider low-absorption polymers 61-63 used in commercially available components designed for THz wavelengths, such as TPX, which exhibits a two ordersof-magnitude smaller absorption coefficient compared to the 3D printing material used in our work. There have been various fabrication processes developed for such low absorption polymers 70,71 , which can be used to precisely control the thickness of these low-loss polymers with a relatively high-resolution to manufacture pulse shaping diffractive networks with much lower material absorption. To even further improve the output efficiency of pulse shaping diffractive networks, anti-reflective (AR) coatings over diffractive surfaces can also be utilized to reduce back-reflections, similar to the AR-coated commercial lenses and other optical components.
In conclusion, we presented a modular pulse shaping network that synthesizes various pulse waveforms using deep learning.
Precise shaping of the spectral amplitude and phase profile of an arbitrary input pulse over a wide frequency range can be achieved using this platform, which will be transformative for various applications including e.g., communications, pulse compression, ultra-fast imaging and spectroscopy. In addition to direct engineering of terahertz pulses, the presented diffractive pulse shaping network can be utilized in different parts of the electromagnetic spectrum by using appropriate fabrication technologies and materials.

Methods
Terahertz setup. Figure 1 shows the schematic diagram of the terahertz timedomain spectroscopy (THz-TDS) setup that was used to measure the input and output pulse profiles reported in this work. A Ti:sapphire laser (Coherent Mira HP) is used to generate femtosecond optical pulses. The optical beam generated by the laser is split into two parts. One part of the beam is used to pump a high-power plasmonic photoconductive terahertz source to generate terahertz pulses 72 , which are collimated with off-axis parabolic mirrors and guided to a high-sensitivity plasmonic photoconductive terahertz detector 60 . The other part of the beam passes through an optical delay line (Newport IMS300LM) and is focused onto the terahertz detector. As a result, an ultrafast signal which is directly proportional to the incident terahertz field is generated within the terahertz detector. The signal is sampled with a 12.5 fs time-resolution over a 400 ps time-window by changing the time delay between the terahertz and optical probe pulses incident on the detector, amplified with a transimpedance pre-amplifier (Femto DHPCA-100), and acquired with a lock-in amplifier (Zurich Instruments MFLI). For each measurement, 10 time-domain traces are collected and averaged. The described THz-TDS setup provides a 90 dB signal-to-noise ratio over a 5 THz noise-equivalent-power bandwidth.
Each one of the pulse shaping diffractive networks consists of 4 trained layers that are separated by 3 cm as illustrated in Fig. 1. The diffractive layers, input and output apertures, were fabricated using a 3D Printer (Objet30 Pro, Stratasys Ltd.). The fabrication/preparation of each diffractive layer takes approximately 1.5-2 h. A square input aperture (0.8 cm) and an output aperture (0.2 cm) are placed 3 cm from the first diffractive layer and 10 cm from the last diffractive layer, respectively (Fig. 1c). The printed apertures were aluminum coated to prevent any light wave passing through the regions outside of the aperture. After the design and printing of the diffractive layers, they were placed at their corresponding locations inside a 3D printed holder that ensures robust alignment between the layers. During the pulse shaping experiments, the diffractive network was directly placed between the terahertz source and detector, coaxial with the terahertz input pulse emanating from the source (Fig. 1b,d). After the alignment of the diffractive network, the output pulse was measured and it was followed by the measurement of the reference input pulse which was acquired by placing the same terahertz detector at the input aperture, without any diffractive layers between the source and detector. For generic diffractive networks that were trained with flat input spectra, the measured output pulse spectrum is normalized with respect to the measured reference input pulse and its spectral amplitude is smoothened around water absorption lines shown in Figs. 3-5 and Supplementary Fig. 5. The measured pulse width at the network output is defined as the width of the time interval that the envelope of the pulse amplitude is at least 20% of its maximum (see e.g., Figs. 2-5).
Forward model. Our forward model considers the layers of a diffractive network as thin modulation elements that are connected to the next layer through free space propagation. The modulation of neurons at each layer can be modeled as: where M represents the complex transmission/reflection coefficient. The field amplitude, phase, wavelength, and diffractive layer number are denoted by A, ϕ, λ and n, respectively. Free space propagation between each layer is calculated based on the Rayleigh-Sommerfeld formulation of diffraction that models a diffractive feature as source of a secondary wave: and W n i ðx; y; z; λÞ is the secondary wave generated by the i th neuron on n th layer at location (x i , y i , z i ), respectively. Then, we can write the optical field at layer n, at point (x i , y i , z i ) as: Network training. During the training of a pulse shaping diffractive network, one of the 5 pulses measured at the input plane ( Supplementary Figs. 2a-b) were randomly selected as the input pulse at each iteration of the training model for the diffractive networks reported in Fig. 2 and Supplementary Fig. 1; for the generic diffractive network models reported in Figs. 3-5, Supplementary Fig. 5 and Supplementary Figs. 7-9, however, the input is modeled as a spectrally flat Gaussian beam with varying FWHM values over a wide frequency range ( Supplementary  Fig. 10) and with a uniform phase profile. The wave propagation is performed for N = 300 discrete frequencies that were uniformly sampled between 3 GHz and 1 THz. In our wave propagation through the diffractive layers, a 0.5 mm pixel (i.e., diffractive feature) size is assumed based on the lateral resolution of our 3D printer. While a pixel size of 0.5 mm can create all the propagating modes of free-space for frequencies below~300 GHz, they can only excite plane waves over a subset of the k-vectors supported by the free-space for the spectral components between 300 GHz and 1 THz 73 . Therefore, diffractive pulse shaping networks would in general benefit from higher resolution fabrication techniques with better lateral resolution to more accurately control and engineer the complex-valued spectral weights of a given desired pulse.
To calculate the Rayleigh-Sommerfeld integral more accurately, each pixel is oversampled twice so that all 4 elements have the same thickness values in that 2 × 2 grid. The thickness of each pixel, h, is composed of a base height (h base ) of 0.1 mm, which provides adequate mechanical stiffness to the fabricated diffractive layer and a trainable modulation height (h tr ) that is between 0 and 1 mm, i.e., To confine the modulation height between 0 and 1 mm, we defined h tr over an auxiliary training-related variable, h a , using: In its general form, the amplitude and phase modulation of each neuron of a given diffractive layer is a function of the layer thickness, incident wavelength, material extinction coefficient κ(λ) and refractive index n(λ), i.e., A n x; y; z; λ ð Þ¼exp À 2πκ λ ð Þh λ ð6Þ φ n x; y; z; λ ð Þ¼ 2πh n λ ð Þ À n air ð Þ λ ð7Þ The material refractive index n(λ) and the extinction coefficient κ(λ) are defined as the real and imaginary parts of the complex refractive index,ñ λ ð Þ ¼ n λ ð Þ þ jκðλÞ, determined by the dispersion of our 3D fabrication material 24 . Since we have relatively small variations in the extinction coefficient over the frequency band that we utilized in this work, we ignored the material absorption during the training and numerical simulations of diffractive layers.
After the wave propagation through diffractive layers, light goes through the output square aperture of 2 mm width, which is placed right in front of the hemisphere silicon lens which is 1.2 cm in diameter. Since the effective aperture of this Si lens was significantly restricted by the output aperture, it was modeled as a uniform slab with a refractive index of 3.4 and 0.5 cm thickness. After the propagation through the Si slab, the coherent integration of the optical waves incident on the active area of the detector was computed to obtain the spectral field amplitude and phase for each frequency. The power efficiency was defined as η f 0 ¼ I sensor;f 0 I input;f 0 for the peak/center frequency (f 0 ) of given diffractive network design, where I input;f 0 and I sensor;f 0 denote the power within the input and output apertures, respectively.
Our loss function (L) used during the training phase has three components: temporal loss term (L t ) which penalizes the mismatch between the target and the output time waveforms, the power loss term (L p ), and the power surrounding the detector region (L s ), i.e., To calculate the temporal loss, L t , first the output temporal waveform is reconstructed from the spectral field amplitude and phase on the detector area, and it is normalized. Then, the difference between the target temporal waveform and the reconstructed output waveform is integrated over time: where f target and f output denote the ground-truth, time-domain waveform and the synthesized waveform by the diffractive network model at a training iteration. For a given diffractive network model, f output is computed by propagating the input waves of all the spectral components from the input aperture to the output aperture. Next, the complex-valued wave fields of these different wavelength components are integrated over the sensitive area of the detector to obtain each complex-valued spectral coefficient at the output, which is followed by an inverse Fourier transform operation over the resulting vector. Alternatively, the error term between a target, time-domain pulse, f target , and the synthesized waveform by the diffractive network, f output , can directly be computed based on the complex-valued spectral coefficients without any inverse Fourier transform operation. However, in this case, since the error is defined based on the complex-valued target and output functions, two separate error functions must be computed for the real and imaginary parts of the spectral coefficients and these two losses must be combined to compute the final loss term. The analytical form of the square pulses used in this work can be written as: f target ðtÞ ¼ rectðbtÞ cosð2πf 0 tÞ, where f 0 and b represent the carrier frequency and the rectangular pulse-width, respectively. For the Gaussian pulses, however, the analytical form of the target waveform can be written as: where t 0,i , C i , p i and q i denote the time instant of the peak, magnitude, variance of the low-pass envelope and the instantaneous angular chirpiness, respectively. The number of desired pulses inside a targeted time-window is determined by n.
For the examples shown in Supplementary Figs. 7-9, the target time domain waveforms were created by setting these parameters to [n = 1, t 0 = 0, C 1 = 1, p 1 = 2.2 × 10 −22 , q 1 = 5.76 × 10 21 ]; [n = 2, t 0,1 = 0, t 0,2 = 27 ps, C 1 = 1, C 2 = 0.5, p 1 = p 2 = 1.38 × 10 −23 , q 1 = 6.25 × 10 22 q 2 = −6.25 × 10 22 ]; and [n = 2, t 0,1 = 0, t 0,2 = 19 ps, C 1 = 1, C 2 = 1, p 1 = p 2 = 4.58 × 10 −24 , q 1 = q 2 = 0], respectively. For the diffractive network designs shown in the last row of the table in Fig. 6, we used a power loss term, L p , defined as: where η = P ω I sensor P ω I input . I input and I sensor denote the power within the input and output apertures for a given wavelength, respectively. For the diffractive network designs shown in the last row of the table in Fig. 6, corresponding to 2 mm and 4 mm output apertures, η th was selected as 0.07 and 0.08, respectively. For the all remaining designs reported in the manuscript, the power loss term is defined as: where I target is the total power of the target waveform at a given wavelength within the input aperture, normalized with respect to the power of the input at the center frequency, f 0 . The last component of our loss function which represents the power surrounding the detector aperture is defined as: where I surround is the total power at a given wavelength within the 5 mm × 5 mm square region that is centered around the output aperture (excluding the output aperture, i.e., it only measures the signal surrounding the output aperture) and I output plane is the total power at a given wavelength within the output plane. The diffractive networks that synthesized 10.58 ps, 10.96 ps, 13.26 ps, 15.56 ps, 15.69 ps, and 17.94 ps square terahertz pulses were trained with α β ratios of 6500, 500, 4500, 1500, 750,000, and 2500, respectively. For the physical transfer learning approach, an α β ratio of 8500 was used. For Supplementary Figs. 7 and 9, we used an α β ratio of 1500, and for Supplementary Fig. 8, we used α β ¼ 15000. Figure 6 reports a series of diffractive optical network designs that are trained to create a square pulse of 15.5 ps at their output apertures, achieving different levels of power efficiencies. Among these pulse shaping diffractive network models, the α β ratio was adjusted depending on the size of the output aperture. Specifically, the diffractive networks targeting a 2 mm aperture at the output plane were trained with α β ¼ 1500, and this ratio was reduced to 136 for the diffractive pulse shaping systems with 4 mm wide output apertures. Finally, an α β ratio of 150 was used for the diffractive optical networks that were trained with the power efficiency loss term described in Eq. (10).
In our training, Adam optimizer is used as a standard error backpropagation method with a learning rate of 0.8 × 10 −3 for the pulses reported in Supplementary Figs. 7-9. For the diffractive networks synthesizing 10.96 ps and 15.69 ps square pulses, on the other hand, the learning rate was set to be 10 −3 . For the rest of the diffractive network models 10 −4 was used as the learning rate. All the trainable parameters were initialized as zero. Our designs used Python (v3.7.3) and TensorFlow (v1.15.0) on a computer that has Nvidia Titan RTX graphical processing unit, Intel Core i9 CPU and 128 GB of RAM with Windows 10 operating system. MATLAB 2016b is used to convert designed diffractive layers to a 3D printable (.stl) file format.

Data availability
All the data and methods needed to evaluate the conclusions in this work are present in the main text and the Supplementary Information. Any other relevant data are available from the authors upon reasonable request.

Code availability
The codes used in this work use standard libraries that are publicly available using TensorFlow.