Efficient full-path optical calculation of scalar and vector diffraction using the Bluestein method

Efficient calculation of the light diffraction in free space is of great significance for tracing electromagnetic field propagation and predicting the performance of optical systems such as microscopy, photolithography, and manipulation. However, existing calculation methods suffer from low computational efficiency and poor flexibility. Here, we present a fast and flexible calculation method for computing scalar and vector diffraction in the corresponding optical regimes using the Bluestein method. The computation time can be substantially reduced to the sub-second level, which is 105 faster than that achieved by the direct integration approach (~hours level) and 102 faster than that achieved by the fast Fourier transform method (~minutes level). The high efficiency facilitates the ultrafast evaluation of light propagation in diverse optical systems. Furthermore, the region of interest and the sampling numbers can be arbitrarily chosen, endowing the proposed method with superior flexibility. Based on these results, full-path calculation of a complex optical system is readily demonstrated and verified by experimental results, laying a foundation for real-time light field analysis for realistic optical implementation such as imaging, laser processing, and optical manipulation.


Introduction
Diffraction is a classic optical phenomenon accounting for the propagation of light waves. The efficient calculation of light diffraction is of significant value toward the real-time prediction of light fields in microscopy 1 , laser fabrication [2][3][4][5] , and optical manipulation 6,7 . The diffraction of electromagnetic (EM) waves can be cataloged into scalar diffraction and vector diffraction according to the validation of different approximation conditions. Scalar diffraction considers only the scalar amplitude of one transverse component of either the electric or the magnetic field with certain simplifications and approximations 8 . Scalar diffraction can yield sufficiently accurate results if the diffracting aperture and observing distance are both far larger than a wavelength, which is most valid for optical systems with a low numerical aperture (NA). For high-NA optical systems, polarization effects play a paramount role near the focal spot, and thus, vector diffraction must be adopted for light field tracing [9][10][11] . Although mathematical expressions for optical diffractions have been presented authoritatively for ages, fundamental breakthroughs have rarely been achieved in diffraction computations. The direct integration method was first used to calculate both scalar and vector diffraction [12][13][14] . However, the point-by-point calculation fashion renders the computation extremely tedious and inefficient. Fast Fourier transform (FFT)-based algorithms have been developed to perform fast calculations of light diffraction [15][16][17][18][19] . However, these methods can generate only the light field distribution within a fixed region of interest (ROI) and sampling numbers (i.e., resolution) determined by the intrinsic characteristic of the Fourier transform (FT), lacking flexibility in computing the desired local distribution with variable sampling intervals. Therefore, the versatile computation of optical diffraction in an efficient and flexible fashion is highly demanded for wide applications.
In addition, scalar and vector diffractions are separately analyzed in conventional studies because different integral formulas are needed for each case. However, in most practical apparatuses, scalar and vector diffractions coexist for different parts of the optical system. For example, in typical systems for optical microscopy, fabrication and manipulation, a monochromatic beam propagates over a long distance by passing optical elements such as focusing lenses, expanders, and collimators before entering an objective lens with a high NA. For the preceding part where the paraxial condition is valid, scalar diffraction is satisfactory for the light propagation evaluation. For the part behind the high-NA objective that meets the Debye approximation, vector diffraction is required for the accurate evaluation of the light propagation by taking into account each polarization component and non-paraxial propagation of light as well as apodization function of optical systems. Therefore, a facile and efficient method with the capacity for light propagation calculation along the entire optical path, which is termed full-path calculation, is highly desired for the comprehensive analysis of numerous realistic application scenarios.
Here, we propose an efficient full-path calculation method by exploring the mathematical similarities in scalar and vector diffraction. The scalar and vector diffraction are both expressed using the highly flexible Bluestein method. A fast light field evaluation over the entire optical path is achieved with arbitrarily defined ROIs and sampling numbers. This paper is organized as follows: first, the integral formulas for scalar and vector diffraction are revisited and deduced in FT forms. Second, the Bluestein method is utilized and reformed to completely supplant the FT in a more flexible fashion. Based on this, optical diffractions are evaluated with designated ROIs and sampling numbers. Third, representative examples are given for both scalar and vector diffraction to demonstrate the improvement in efficiency and flexibility. Finally, full-path light tracing of a laser holographic system is presented with unprecedented computation speed and agrees well with the experimental results, showcasing the superior ability of the Bluestein-based diffraction calculation. The proposed method holds great promise in the universal applications of optical microscopy, fabrication, and manipulation.

Scalar and vector diffraction integral in the form of a Fourier transform
For scalar diffraction, as shown in Fig. 1a, the electric field at a point (x, y, z) in the Cartesian coordinates can be obtained based on the Huygens-Fresnel principle and expressed by the Rayleigh-Sommerfeld diffraction integral 20 : is the distance between the source point and the observation point of interest. k = 2π/λ is the wavenumber. In the condition of the Fresnel approximation with a Fresnel number F ≥ 1, the third term and higher orders in the Taylor expression of r can be ignored, that is, . In the denominator of Eq. (1), r can be further approximated with only the first term (r ≈ z). Moreover, the paraxial approximation ensures cosθ ≈ 1. In this way, the complex electric field can be described by the Fresnel diffraction integral: which can be further rewritten as: Here, we define: Therefore, the integral Eq. (3) can be expressed in terms of the two-dimensional FT: here F represents the two-dimensional FT. Moreover, as with the other type of scalar diffraction, Fraunhofer diffraction in the far field can be expressed by Þ, which can be regarded as a special case of Fresnel diffraction passing through a converging lens. Therefore, scalar diffraction can be computed across the xy-plane using an FT-based approach.
Scalar diffraction can be used to effectively compute the complex amplitude distribution of many optical systems with a few approximations, as described above. However, it is known that the polarization components are changed due to large refractivity after passing through a high-NA non-paraxial system, and scalar diffraction is incapable of achieving proper results. The vectorial Debye diffraction integral, established by Richards and Wolf 21 , has to be adopted to analyze the complex EM field of each polarization component (Supplementary Information Section 1). The optical layout is shown in Fig. 1b.
Due to the refraction of the non-paraxial tight focusing system, the electric field components (polarization components e ! s and e ! p ) on the entrance pupil P e are transformed into a spherical reference surface P r ( e ! s , e ! th , and e ! r ). The transformation can be expressed in Cartesian coordinates as 20 : M is the transform matrix of the polarization from the entrance surface to the converging spherical surface. A 0 ffiffiffiffiffiffiffiffiffiffi cos θ p is the apodization factor accounting for the energy conservation. The propagation of the electric field from the reference surface P r to the imaging point p (x, y, z) near the focus is expressed by the Debye integral: The definition of k ! r can be found in Supplementary Information Section 1. By performing the integration over the planar surface P e instead of the surface P r (Supplementary Information Section 2): which can be rewritten in the form of an FT: In brief, both scalar diffraction and vector diffraction can be expressed by the FT. FFT algorithms in modern computer systems allow for fast and accurate calculations. The similarity between these two diffractions is obvious Fig. 1 Illustrative diagrams of scalar and vector diffraction. a Geometry for scalar diffraction calculation. b Geometry for vector diffraction calculation from a mathematical point of view: the vector diffraction integral is equivalent to the scalar Fraunhofer diffraction in the case of a low-NA optical system where 1/cosθ ≈ 1.
Although the FFT-based optical calculation is much faster than the direct integration method, it results in inevitable drawbacks: the resultant output field has a fixed transverse dimension and unchangeable sampling numbers determined by the dimension and sampling size of the input aperture for a given distance. The dimension of the output field is: where d is the distance between the input aperture and output plane. p s is the sampling size of the input aperture. The sampling numbers of the output plane are rigidly equivalent to those of the input aperture.
The restriction is brought about by the intrinsic characteristic of the FT and greatly limits the flexibility in light propagation calculations. For example, the input aperture must be enormously expanded with the aid of the zero-padding approach when a small portion of the output plane is required with high resolution, which inevitably leads to a large increment of the computation time.

Bluestein method to compute Fourier transform with arbitrary ROI and sampling resolution
Regarding mathematics, to achieve the required bandwidth and resolution in the frequency domain, the appropriate zero-padding operation is needed to extend the dimension of the original input sequence 15 . For most applications in laser manipulation and lithography, only a small fraction of the output field with high resolution is needed to obtain sufficient details, resulting in large amounts of zero-padding. This results in a severe waste of resources, as most of the results are discarded. The operation of the zero-padding inevitably increases the computation time and the demand for memory usage. Moreover, the resultant output region remains unchangeable, greatly limiting its potential in practical applications. Here, the Bluestein method is adopted to evaluate the scalar and vector diffraction calculations. The Bluestein method is an elegant method conceived by L. Bluestein 22 and further generalized by L. Rabiner et al. 23 that is capable of performing more general FTs at arbitrary frequencies as well as boosting the resolution over the full spectrum. The Bluestein method offers us a spectral zoom operation with high resolution and arbitrary bandwidth. This advantage is enabled by computing the z-transform along spiral contours in the z-plane for an input sequence (Supplementary Information Section 3 and Fig. S1). The computational complexity is O[(M + N) log 2 (M + N)], manifesting an FFT algorithm. The method is based on the z-transform along a spiral contour in the z-plane defined by A and W: M is the length of the transform. N is the length of input sequence. A specifies the complex starting point of the z-plane spiral contour of interest, and W specifies the complex scalar describing the complex ratio between points along the contour. Note that the case of A = 1, W = exp(−i2π/N), and M = N corresponds to the discrete Fourier transform (DFT), which computes the z-transform along the unit circle with a finite duration. More generally, the method can be used to calculate the DFT between an arbitrary starting point f 1 and ending point f 2 (i.e., the tuneable frequency bandwidth relative to the total frequency range f s ) with arbitrary sampling numbers M.
The practical implementations of the Bluestein method for enhanced DFT computation deserve additional comments. First, a 2D FT is needed for the computation of both scalar and vector diffraction. The Bluestein method should be adopted in both the column and the row dimensions to fulfill this requirement. Second, the Bluestein method internalizes padding of the input array with zeros at the tail. However, symmetric zero-padding around the input array is needed for the simulation of realistic optical systems. Third, an additional operation is needed to shift the zero-frequency component to the center of the array before and after the DFT to eliminate the high-frequency oscillation in the phase information. To address these issues, the definition of parameters A and W should be rearranged, and phase shifting factor P shift should be included at the end of the calculation (see Supplementary Information Section 3 and Figs. S2-S4).
By performing these adjustments, the Bluestein method can be developed as a fast approach for light diffraction calculation with superior flexibility: it allows for the selection of arbitrary segments in the imaging plane with arbitrary resolution, providing competitive efficiency and flexibility over direct integration and the FFT methods.
Fast numerical implementation of the Bluestein method in scalar Fresnel diffraction Figure 2 illustrates the scalar calculation with a paradigm of the converging spherical wave propagation, which is generated by a plane wave passing through a convex lens. The phase profile of the lens is shown in Fig. 2a, which is equivalent to the phase plate after being wrapped between 0 and 2π (Fig. 2b). The optical configuration is sketched in Fig. 2c, with the parameters λ = 800 nm, f = 600 mm, and D = 8.64 mm. Figure 2d, e shows the optical field distribution in the focal plane in terms of the intensity and phase. Figure 2f, g shows the cross-sectional intensity and phase distributions in the light propagation direction. The corresponding line plots of the intensity and phase are given in Fig. 2h-k. A comparison between the Bluestein method and traditional direct integration and FFT methods is also made, from which we can see excellent agreements. It is revealed that the Bluestein method can calculate the scalar light diffraction with high accuracy.
The Bluestein method has a superior advantage in the computation time cost over the direct integration and FFT methods. Due to the tedious point-by-point calculation method, the direct integration method is associated with two cycling loops, and the computation time increases drastically with the calculation points of the target plane (with a computational complexity of O (M 2 × N 2 )). For the case of the FFT method, a zero-padding operation is needed to fulfill the requirement for the preset target sampling numbers, resulting in a rapid increase in computation time with the sampling points. As shown in Fig. 2l, with the increase in the sampling points along one coordinate axis, the Bluestein method exhibits its obvious superiority compared with the other two methods. This advantage makes the method particularly applicable to scenarios where large sampling points are needed, such as high-resolution microscopy. For the case in Fig. 2d, e, where the sampling points in the entrance pupil and output field are the same (M = N = 1080) and the ROI is 0.2 × 0.2 mm, the computational cost is~13.7 h for the direct integration method, making it unsuitable for practical applications. For the FFT method, the computational cost is improved to 68 s, as shown in Fig. 2m. In comparison, the computation time is only 0.67 s using our proposed Bluestein method, which is 10 5 and 10 2 times less than those of the direct integration method and FFT method, respectively. The three-dimensional volumetric light field ( Supplementary Information Fig. S5) can be reconstructed using cross-sectional light fields by calculating the lateral planes layer by layer. As depicted in Fig. 2n, the computation time for the direct method is excessively long to obtain the volumetric light field (~85 days). It takes 2 h to calculate the cross-sectional light field in the longitudinal yz-plane. By using the FFT method, the computational cost is the same (2.8 h) for both the volumetric and cross-sectional light fields because the ROI cannot be tuned due to the intrinsic characteristic of the FT. Owing to the fast computation property of the Bluestein method, calculation of the 3D optical field can be accomplished in <100 s. The efficiency enhancement is on the same order as that in the lateral xy-plane. More examples of scalar diffraction are given in Supplementary Information Section 4 and Fig. S6.
In addition to the great improvement in computational efficiency, the Bluestein method has remarkable flexibility compared with the FFT method. That is, an arbitrary ROI can be defined with arbitrary resolution. This feature is illustrated by reconstructing a computer-generated hologram (CGH), as shown in Fig. 3. Evaluation of the light propagation after being modulated by a CGH is essential for predicting the performance of optical holographic tweezers 24 , holographic displays 25 , and laser holographic processing 26,27 . As shown in Fig. 3a, a CGH is generated by the weighted Gerchberg-Saxton (GSW) algorithm 28,29 . After FT by a converging FT lens, the designed pattern can be reconstructed (Fig. 3b). The process involves two scalar diffraction calculations: one is from the CGH to the FT lens, and the other is from the FT lens to the reconstruction plane.

Fast numerical calculation of the vectorial Debye diffraction
The vectorial nature of light is essential for optical systems with a high-NA aperture or specific polarization, such as radial and azimuthal polarizations 30,31 . Figure 4a illustrates the focusing of radially polarized light by a high-NA aplanatic objective (NA: 1.4). By using the proposed Bluestein method in the vectorial Debye-Wolf integral, the light field distribution near the focus can be rapidly calculated (insets of Fig. 4a). The results are consistent with those computed by direct integration and the FFT methods, as reflected by the line plots of the light intensity along the transverse and longitudinal directions in Fig. 4b, c.
The optical vortex generated by a spiral phase plate (Fig. 4d), in cooperation with circular polarization, plays a key role in super-resolution stimulated emission depletion microscopy 32 and nano-lithography 33 . A doughnutshaped focus profile with a dark center is used as the depletion beam to eliminate fluorescence or polymerization. Figure 4e, j shows the optical intensity profiles of the optical vortex in the lateral xy and longitudinal yz-planes, respectively. An engineered focus with a symmetric doughnut shape can be generated. Moreover, the light components in different polarizations can be obtained efficiently using our Bluestein method, as shown in Fig. 4f-i, k-m. It can be seen that all the light components have dark central intensities close to zero and the spiral phase. The light in the transverse polarizations is dominant over the longitudinal polarization. The Bluestein method also endows the vectorial calculation with high flexibility compared with the traditional FFT approach. Figure 4n, o shows the enlarged intensity profiles in the ROIs labeled in Fig. 4f, g, respectively. Another example of the usage of the Bluestein method for vector diffraction is shown in Supplementary Information Section 5 and Fig. S7. The optical information in the arbitrary ROIs can be investigated in detail without increasing the computational cost, making the Bluestein method advantageous in evaluating localized highresolution light distributions for the application of microscopy and photolithography.
For the computation time, the Bluestein method also exhibits great superiority. Here, we consider the calculation from the entrance pupil with~10 5 sampling points to the exit pupil with the same points in the xy-plane, and 100 layers along the optical axis are calculated for volumetric and cross-sectional light distributions in the yz-plane. As shown in Fig. 4p, the direct method requires 57.16 min to calculate the lateral light field. 95.3 h is needed for the volumetric 3D light field distribution, and 22.78 min is needed for the sliced yz-plane. An acceptable time (2.88 s) is needed for the FFT method to calculate the xy-plane. However, an impractical 280.4 s is needed to obtain the light distribution in the volumetric three dimensions and the two-dimensional yz-plane. In contrast, only 0.2 s is consumed by the Bluestein method for calculation in the xy-plane. Moreover, only 9.34 and 12.19 s is needed to achieve the 2D cross-sectional and 3D volumetric light fields. Note that the computation time increases much more quickly with the sampling numbers of the ROI for the direct method and FFT method than for the Bluestein method, e.g., more than 10 days are needed for the direct method and 126.5 s is needed for the FFT method to acquire transverse light distributions in the xy-plane when the number of sampling points increases to~10 6 (1080 × 1080), while only 1.78 s is needed for the Bluestein method, which is five orders of magnitude less than that needed for the direct method and 10 2 times less than that for the FFT method.
Full-path optical calculation with superior flexibility and efficiency As discussed above, both the scalar and vector diffraction can be efficiently calculated by the Bluestein method. Based on this, the full-path optical calculation and tracing can be performed with high flexibility and efficiency. Figure 5a illustrates a representative optical layout for laser holographic processing and holographic manipulation. This setup can be further adopted for two-photon scanning confocal microscopy. Here, a phase-only spatial light modulator (SLM, Holoeye Pluto NIR-II, resolution: 1920 × 1080) is used to modulate the wavefront of the laser by loading a predesigned CGH. A combination of a half-wave plate and polarization beam splitter is utilized to attenuate the laser power. A 4f configuration consisting of Lens 1 (f = 600 mm) and Lens 2 (f = 200 mm) is placed between the SLM and aplanatic objective (100×, NA: 1.4). It is a typical optical system involving both scalar diffraction and vector diffraction during light propagation. First, we simulate the multi-foci optical system, which can be used for holographic tweezers, laser parallel processing and data recording. Figure 5b is the corresponding CGH for the generation of a 9 × 9 multi-foci array. A linearly polarized femtosecond laser (800 nm, emitted from Chameleon Vision-S, Coherent) is modulated by the CGH. After the FT of Lens 1, a multi-foci array is generated (Fig. 5c). At the back of the objective, the phase and intensity distributions are retrieved as shown in Fig. 5d, e. The phase profile closely resembles the CGH, validating the accuracy of the Bluestein-enabled calculation of scalar diffraction. The light beam is slightly smaller than the size of the entrance pupil of the objective, ensuring that the phase-modulated beam can be fully transformed by the objective. In the focal plane of the objective, a diffraction-limited 9 × 9 multi-foci array is generated (Fig. 5f). The full-path calculation can be accomplished with high efficiency in <4 s. The experimentally measured multi-foci intensity (Fig. 5g) agrees well with the simulation. With the help of the highly flexible Bluestein method, a detailed analysis of a single focal spot is enabled, as shown in Fig. 5h, revealing that a Gaussian focus is generated with linear polarization. The light field in the longitudinal section can be readily computed, and the spatial uniformity can be investigated (Fig. 5i).
Another universal example is given in Fig. 5j-m, where a CGH is encoded on the SLM to generate a pattern as discussed in Fig. 3. By using the Bluestein full-path calculation method, the light field of the desired pattern can be simulated in the focal plane of the objective (Fig. 5j),

Discussion
The proposed Bluestein-based method provides a fundamental improvement in optical diffraction calculations. The advantages of the method lie in the following three aspects. First, the computation method for light diffraction is superfast, allowing for the real-time prediction of light field propagation for diverse implementations. Second, the method has great flexibility, without loss of accuracy and efficiency. The desired ROI can be freely chosen, and the sampling numbers can be arbitrarily tuned. Third, the method shows good universality. It suits all diffraction conditions, such as phase modulation, amplitude filtering, polarization conversion, and focusing transform. In particular, this method facilitates the simulation and optimal design of metasurfaces [34][35][36] , as exemplified in Supplementary Information Section 7 and Fig. S9. Both scalar and vector diffraction can be computed using this method, making this method promising for full-path propagation evaluation in broad applications of optical microscopy, lithography, and optical manipulation. In addition, the applicability of this method needs to be explicated for realistic implementations. First, some approximation conditions are assumed for both vector and scalar diffraction. For vector diffraction, the lens is assumed to obey Abbe's sine condition. For scalar diffraction, the Fresnel approximation should be valid. It is worth noting that Fraunhofer diffraction can also be implemented using the Bluestein method with slight modification. Second, it is important to take stringent precautions against aliasing effects. When the diffraction distance of scalar diffraction is too small or the focal shift of vector diffraction is too long, obvious aliasing is likely to occur because the sampling condition no longer satisfies the Nyquist sampling condition.
In summary, an efficient calculation method is developed to evaluate light diffraction with high flexibility and efficiency. First, a set of mathematical preliminaries is given to express the scalar and vector diffraction integrals in the form of an FT and then unified using the Bluestein method. Examples for both scalar and vector diffraction are demonstrated to reveal that the computational efficiency and flexibility are greatly improved. Calculation of the light field is realized at the sub-second time level compared with several minutes using the FFT method or hours using the direct integration method. Full-path light tracing is finally demonstrated using the Bluestein method. This method holds great potential not only in the fast prediction of numerous optical systems but also in the realm of signal processing for acoustic and other communication waves.

Computational environment
All the calculations are performed on a personal laptop with an Intel processor I5 2.50 GHz and 8 GB of memory, running the Windows 10 Professional operating system. The code is written, compiled and run in the MATLAB R2019a software. All the comparison studies on efficiency are performed in the same computational environment.

Laser holographic system
The laser holographic system consists of a Ti:sapphire femtosecond laser oscillator (Chameleon Vision-S, Coherent) with a central wavelength of 800 nm, a repetition rate of 80 MHz, and a pulse width of 75 fs. A phaseonly reflective liquid crystal SLM (Pluto NIR-2, Holoeye) is utilized for the phase modulation, which features a 1920 × 1080 resolution and a 8 μm pixel pitch. In the experiment, only the central portion of the SLM with 1080 × 1080 pixels is used to modulate the wavefront, while the other pixels are set to zero. A phase hologram pattern with 256 different shades of gray is loaded onto the SLM, corresponding to the phase modulation depth from 0 to 2π. A CCD camera is used to capture the light field distribution.