High-accuracy wavefront sensing by phase diversity technique with bisymmetric defocuses diversity phase

We investigate a specific diversity phase for phase diversity (PD) phase retrieval, which possesses higher accuracy than common PD, especially for large-scale and high-frequency wavefront sensing. The commonly used PD algorithm employs the image intensities of the focused plane and one defocused plane to build the error metric. Unlike the commonly used PD, we explore a bisymmetric defocuses diversity phase, which employs the image intensities of two symmetrical defocused planes to build the error metric. This kind of diversity phase, named PD-BD (bisymmetric defocuses phase diversity), is analysed with the Cramer-Rao lower bound (CRLB). Statistically, PD-BD shows smaller CRLBs than the commonly used PD, which indicates stronger capacity of phase retrieval. Numerical simulations also verify that PD-BD has higher accuracy of phase retrieval than the commonly used PD when dealing with large-scale and high-frequency wavefront aberrations. To further affirm that PD-BD possesses higher accuracy of wavefront sensing than PD, we also perform a simple verification experiment.

a specific diversity phase for PD phase retrieval. The main idea is to remove the zero-diversity phase and to add a symmetric defocus diversity phase. The bisymmetric defocuses diversity phase, named PD-BD (bisymmetric defocuses phase diversity), employs the image intensities of two symmetrical defocused planes to build the error metric. Obviously, compared with traditional PD, PD-BD introduces no additonal computational burden. PD-BD was first introduced by Lee et al. (1999) 24 , which was analyzed by Cramer-Rao lower bound. Their work hinted that PD-BD might possess a better performance for wavefront sensing. In general, PD-BD could be seen as a special case of the multi-image PD algorithm presented by Paxman et al. (1992) 20 , derived for an arbitrary number of diversity images. In our opinion, two bisymmetric defocus diversity phases can provide helpful constraint conditions for the reconstruction of the measured system aberration.

CRLB analysis and numerical simulations.
PD is a posteriori image-based aberration estimation technique. The smallest possible unbiased estimator variance for PD technique can be theoretically given by the Cramer-Rao lower bound (CRLB). CRLB could provide a lower bound on the mean-square error of an estimator, which can be used to theoretically evaluate the accuracy of wavefront sensing by phase diversity technique. The fundamental concept of CRLB is written symbolically as where  a k denotes an unbiased estimate of a k . The Var(·) operator denotes the variance of the  a k . The symbol [F(a)] stands for the Fisher information matrix of a. We use the same definition of CRLB as ref. 25 , of which the CRLB is denoted as the corresponding square root that bounds the rms error of an estimator. So the CRLB of PD technique on a Zernike parameter a k is expressed as The equality states that the estimated parameter vector a is bounded from below by the corresponding square root of diagonal elements of a Fisher information matrix inverse. Detailed expression of the Fisher information matrix F and descriptions for the CRLB of PD can be obtained in Refs 24,25 . We suppose a high signal-to-noise ratio (SNR) for the computation of CRLB. In this paper, we will consider the CRLBs of PD and PD-BD for the simulated object under the assumption of Gaussian noise.
We will carry on the comparisons between PD and PD-BD by CRLB analysis and numerical simulations. The pristine object used for CRLB analysis and numerical simulations is shown in Fig. 1. The object here is a high signal-to-noise ratio (SNR) object which will help to approach theoretical CRLB. In fact, it is unimportant to know accurate SNR when we focus on the comparison of CRLB between PD and PD-BD. The size of the Nyquist sampled object is 200 × 200 pixels and the intensity levels are quantized to 16 bits. The size of simulated pupil is 100 × 100 pixels. For numerical simulations, the regularization parameter γ is set to be 1.0 × 10 −6 . The optimization algorithm used here is a hybrid particle swarm optimization algorithm 19 , which can provide global search and avoid being trapped in local minimums. The amount of defocus used for PD is commonly set to be 0.5λ PV or 1.0λ PV (peak to valley). In the following, we will show that the amount of symmetrical defocuses used for PD-BD can be set as ±0.5λ PV, ±0.75λ PV and ±1.0λ PV, which possess similar performance. We denote PD with 0.5λ PV and 1.0λ PV defocus as PD-0.5 and PD-1.0, respectively. Similarly, PD-BD with ±0.5λ PV, ±0.75λ PV and ±1.0λ PV defocus are denoted as PD-0.5, PD-0.75 and PD-1.0, respectively. The measured wavefront aberration is denoted as φ set . The rms value of φ set is defined as follows: v set a ver T 2 1 2 where v is the two-dimensional position coordinate in the pupil plane. φ aver is the average value of φ set , N T is the total number of pixels within the pupil aperture. The accuracy of the reconstructed wavefront aberration φ is quantified with the rms phase error: We first compare the error metrics of PD-1.0 and PD-BD-1.0 with a astigmatism system aberration a 5 = 1.5 radian. The error metrics of PD-1.0 and PD-BD-1.0 then reduce to functions of one single variable a 5 , which are explicitly plotted in Fig. 2. We can see that PD-BD-1.0 provides a steeper gradient than PD-1.0 near the minimum extreme point a 5 = 1.5. Steeper gradients near the global minimum position mean a easier search for the measured phase with a higher accuracy.
Moreover, we consider a randomly generated aberration listed in Table 1. We show the CRLBs plotting of PD-0.5, PD-1.0, PD-BD-0.5, PD-BD-0.75 and PD-BD-1.0 in Fig. 3. PD-BDs provides smaller CRLBs than PDs for all Zernike parameters, which indicates stronger capacity of phase retrieval. It is reasonably expected that PD-BD posssesses a higher accuracy than PD. Corresponding simulation results for that aberration by PD and PD-BD are also shown in Table 1 and Fig. 4. The reconstruction accuracy of PD-0.5 is a little better than PD-1.0. Nevertheless, It can be seen that the reconstructed parameters by PD-BDs is apparently more accurate than that by PD-0.5. PD-BD-0.5, PD-BD-0.75 and PD-BD-1.0 provide similar high reconstruction accuracy. RMS errors obtained by PD-0.5 and PD-BD-0.5 are 0.0178λ and 0.0032λ, respectively. It is also displayed in Fig. 4 that the reconstructed phase by PD-BD-0.5 is obviously more accurate than that by PD-0.5.  To further testify the accuracy of PD-BD, we provide the statistical results based on 100 numerical cases. The measured 100 aberrations are randomly generated by setting a k ∈[−1, 1], k = 4, …, 15 (radians). We define CRLB i ( ) and RMSE(i) as the average CRLB and the RMS error for the i th numerical case, respectively. And the average CRLB and the average RMS error for the 100 numerical cases are defined as CRLB and RMSE, respectively.
i 1 100 We show CRLB and RMSE obtained by PD and PD-BD in Table 2. CRLB obatined by PD-BD-0.75 and PD-BD-1.0 are smaller than that by PDs, which indicates a higher accuracy. Although, CRLB obatined by PD-BD-0.5 is a  little greater than that by PD-1.0, RMSE obtained by PD-BD-0.5 is 0.0143λ, nearly two times smaller than that by PD-1.0. Overall, for the 100 numerical cases, PD-BD gains a higher accuracy than PD in 85% of all cases.
With the increasing of the scales of the measured wavefront aberrations, the accuracy of PD and PD-BD will drop gradually. Here, we will compare the performance of PD and PD-BD for wavefront aberrations of the same spatial frequency distribution, but with different scales. The coefficient vector listed in Table 1 is denoted as coe. The measured wavefront aberrations of different scales are generated from coe, of which each coefficient is multiplied by a same integer. Table 3 lists the RMS errors achieved by PD and PD-BD for wavefront aberrations of different scales. In general, it is can be seen that the accuracy of PD-BD is higher than PD for wavefront aberrations of different scales. It also seems that PD-BD with different amounts of defocus keep satisfying accuracy for wavefront aberrations of different scales.
At last, we will compare the performance of PD and PD-BD for high-frequency wavefront sensing. The measured high-frequency wavefront coefficient vector {a 4 , …, a 21 } is generated by adding random high-frequency parts to the relative low-frequency wavefront aberration listed in Table 1. And the coefficient vector {a 4 , …, a 28 } is also generated by adding random high-frequency parts to coefficient vector {a 4 , …, a 21 }. Reconstruction errors of each Zerinike coefficient by PD and PD-BD for coefficient vectors {a 4 , …, a 21 } and {a 4 , …, a 28 } are plotted in Fig. 5 and Fig. 6, respectively. PD-0.5 and PD-1.0 are obviously less accurate than that by PD-BDs. PD-BD-0.5, PD-BD-0.75 and PD-BD-1.0 provide similar high reconstruction accuracy, nearly five times higher than that by PD-0.5. With the addition of high-frequency parts of the measured wavefront aberration, the reconstruction accuracy of PD-0.5 drop gradually. For coefficient vectors {a 4 , …, a 15 }, {a 4 , …, a 21 } and {a 4 , …, a 28 }, RMS errors abtained by PD-0.5 for different high-frequency wavefront aberrations are 0.0178λ, 0.0282λ and 0.0305λ, respectively. By comparison, RMS errors abtained by PD-BD-0.5 for the same wavefront aberrations are 0.0032λ, 0.0052λ and 0.0054λ, respectively. Compared with PD, it seems that the accuracy of wavefront sensing by PD-BD is less sensitive to the number of Zernike polynomials used and the diversity amplitude, which is a notable advantage.  Table 3. RMS errors(λ) achieved by PD and PD-BD for wavefront aberrations of different scales. Experimental results. To further examine the accuracy of wavefront sensing with PD and PD-BD, we perform a simple verification experiment. The simplified block diagram of the optical system used here is shown in Fig. 7. The equivalent focal length F is 300 mm, the pupil diameter D is 10 mm, and the CCD pixel size is 5.2 μm. A common halogen lamp is selected as the light source. A fiber bundle is used as the extended object to be reconstructed. The measured wavefront aberration is randomly generated by a homemade deformed mirror with seven actuators. To obtain quasi-monochromatic images, a filter(632.8 nm) is located at the collimated beam. The regularization parameter γ is empirically set to be 5.0 × 10 −3 . We move the CCD by the distance ±d to get the plus defocused image and the minus defocused image. The corresponding peak-to-valley optical path difference Δ 26 is equal to 2 We choose d such that Δ = ±0.5λ, so the distance d here is 2.279mm. The measured object takes the area of 450 × 450 pixels and the exposure time is 20 ms. To get rid of the tilts aberrations induced by moving CCD, we match the plus defocused image and the minus defocused image by an image matching algorithm call speeded-up robust features (SURF) 27 . So, similar as our simulations, we ignore piston and tilt aberrations a 1 − a 3 . The computed parameter vecter is set as a = [a 4 ,a 5 , …, a 28 ].
To examine the accuracy of wavefront sensing with PD and PD-BD, we will compare the detailed clarity of the reconstructed object images. We denote the wavefront coefficients reconstructed by PD and PD-BD as PD-coe and PD-BD-coe, respectively. With the same input images, or the same formula for object reconstruction, we put PD-coe and PD-BD-coe into  Thus, the only difference for the two reconstructed object images are the wavefront coefficients reconstructed by PD and PD-BD. It can been seen in Fig. 8 that PD-coe and PD-BD-coe are quite different, especially for Zernike coefficients a 5 − a 15 . The big difference probably means one of them is much more precise than the other one for wavefront sensing. We show from left to right the minus defocused image, the focused image and the plus defocused image in Fig. 9(a). Figure 9(b) shows the wavefront aberration reconstructed by PD and object image reconstructed by Eq. (9). Figure 9(c) shows the wavefront aberration reconstructed by PD-BD and the object image reconstructed by Eq. (9). From the partial enlarged view of Fig. 9(b), apparent distortions for each single fiber can be found, which means there are still non-negligible and uncorrected system aberrations. While the object image reconstructed by PD-BD-coe possesses greater clarity than that by PD-coe for each single fiber. We think our experiment results have proved indirectly that PD-BD possesses higher accuracy of wavefront sensing than PD.

Discussion
For the example of a astigmatism system aberration a 5 = 1.5 radian, PD-BD provides a steeper gradient near the minimum extreme point than PD. A steep gradient of PD-BD near the minimum extreme point can lessen calculative burden and improve the accuracy of wavefront sensing. For the simulations of wavefront sensing with increasing scales of wavefront aberrations, the accuracy of PD and PD-BD drop gradually. Taken together, for different scales of wavefront aberrations, the accuracy of PD-BD is mainly higher than that of PD. With the addition of high-frequency parts of the measured wavefront aberration, the reconstruction accuracy of PD drop gradually. PD-BD could maintain essential reconstruction accuracy for high-frequency wavefront sensing. Compared with  PD, it seems that the accuracy of wavefront sensing by PD-BD is less sensitive to the number of Zernike polynomials used and the diversity amplitude, which is a notable advantage for engineering applications. For the simple verification experiment, we have affirmed the fact that PD-BD possesses a higher accuracy of wavefront sensing than PD.
We have statistically compared the CRLB of PD and PD-BD, which indicates PD-BD has a stronger capacity of phase retrieval than PD. In our opinion, the in-depth reason that PD-BD possesses a higher accuracy than PD is due to the shape of the error metric which results in better convergence in the optimization process. For actual engineering application, PD-BD employs two symmetric images of close SNR near the focal plane. This characteristic of PD-BD is favourable to balance necessary amplitude of defocus diversity and necessary SNR of the defocused image. In other words, PD-BD could better ensure the maximum likelihood estimation, which is the foundation of phase diversity technique.

Methods
In this section, compared with common PD, we briefly describe the bisymmetric defocuses phase diversity. It is supposed that the object is illuminated with non-coherent quasi-monochromatic light, and the imaging system is a linear shift-invariant system 28 . The intensity distribution of the image plane with Gaussian noise can be modeled by the following equation: where ⊗ stands for the convolution operation, r is a two-dimensional position vector in the image plane, o(r) is the object to be found, h(r) is the point-spread function (PSF) of the optical system, and n(r) is Gaussian noise. The point-spread function associated with the focused image is given by