Temporal phase unwrapping using deep learning

The multi-frequency temporal phase unwrapping (MF-TPU) method, as a classical phase unwrapping algorithm for fringe projection techniques, has the ability to eliminate the phase ambiguities even while measuring spatially isolated scenes or the objects with discontinuous surfaces. For the simplest and most efficient case in MF-TPU, two groups of phase-shifting fringe patterns with different frequencies are used: the high-frequency one is applied for 3D reconstruction of the tested object and the unit-frequency one is used to assist phase unwrapping for the wrapped phase with high frequency. The final measurement precision or sensitivity is determined by the number of fringes used within the high-frequency pattern, under the precondition that its absolute phase can be successfully recovered without any fringe order errors. However, due to the non-negligible noises and other error sources in actual measurement, the frequency of the high-frequency fringes is generally restricted to about 16, resulting in limited measurement accuracy. On the other hand, using additional intermediate sets of fringe patterns can unwrap the phase with higher frequency, but at the expense of a prolonged pattern sequence. With recent developments and advancements of machine learning for computer vision and computational imaging, it can be demonstrated in this work that deep learning techniques can automatically realize TPU through supervised learning, as called deep learning-based temporal phase unwrapping (DL-TPU), which can substantially improve the unwrapping reliability compared with MF-TPU even under different types of error sources, e.g., intensity noise, low fringe modulation, projector nonlinearity, and motion artifacts. Furthermore, as far as we know, our method was demonstrated experimentally that the high-frequency phase with 64 periods can be directly and reliably unwrapped from one unit-frequency phase using DL-TPU. These results highlight that challenging issues in optical metrology can be potentially overcome through machine learning, opening new avenues to design powerful and extremely accurate high-speed 3D imaging systems ubiquitous in nowadays science, industry, and multimedia.


Methods
Phase-shifting profilometry (PSP). In a typical FPP system, sinusoidal fringe-based FPP methods are more prevalent to a great variety of practical applications and can be generally divided into two main categories for phase extraction: Fourier transform profilometry (FTP) 28 and Phase-shifting profilometry (PSP) 29 . Numerous dynamic 3D measurement techniques have been developed based on FTP, which have the advantage to provide the phase map utilizing only a single high-frequency fringe pattern 16,30 . How, suffering from frequency band overlapping problem, these methods generally yield coarse wrapped phase with low quality which limits its measurement precision for dynamic 3D acquisition. In addition, not just limited to Fourier transform, the windowed Fourier transform (WFT) and the wavelet transform (WT) can also be applied for the phase retrieval and enhancing 3D measurement accuracy even in the case of complex surfaces and depth discontinuities 31 . Different from FTP, PSP can realize pixel-by-pixel phase measurements with higher accuracy unaffected by ambient light, but it needs to project at least three fringe patterns to obtain a phase map theoretically 29 . In this work, the standard 3-step phase-shifting fringe patterns with shift offset of 2π/3 are adopted and represented as π π = . + .
− I x y f x n ( , ) 0 5 0 5 cos(2 2 /3), represent fringe patterns to be projected, f is the frequency of fringe patterns. After projected onto the object surfaces, the deformed fringe patterns captured by the camera can be described as n c where A(x, y), B(x, y), and Φ(x, y) are the average intensity, the intensity modulation, and the phase distribution of the measured object. According to the least-squares algorithm, the wrapped phase φ(x, y) can be obtained as 32-34 : www.nature.com/scientificreports www.nature.com/scientificreports/ Due to the truncation effect of the arctangent function, the obtained phase φ(x, y) is wrapped within the range of (−π, π], and its relationship with Φ(x, y) is: x y k x y ( , ) ( , ) 2 ( , ), (4) where k(x, y) represents the fringe order of Φ(x, y), and its value range is from 0 to N − 1. N is the period number of the fringe patterns (i.e., N = f). In FPP, the core challenge for the absolute phase recovery is to obtain k(x, y) for each pixel in the phase map quickly and accurately.
Multi-frequency temporal phase unwrapping (MF-TPU). In temporal phase unwrapping (TPU), the wrapped phase φ(x, y) is unwrapped with the aid of one (or more) additional wrapped phase map with different frequency. For instance, two wrapped phases φ h (x, y) and φ l (x, y) are both retrieved from phase-shifting algorithms by using Eq. (3), ranging from −π to π. It is easy to find that the two absolute phases Φ h (x, y) and Φ l (x, y) corresponding to φ h (x, y) and φ l (x, y) have the following relationship: where f h and f l are the frequency of high-frequency fringes and low-frequency fringes. Based on the principle of MF-TPU, k h (x, y) can be calculated by the following formula: Since the fringe order k h (x, y) is integer, ranging from 0 to f h − 1, Eq. (6) can be adapted as h h l l h where Round() is the rounding operation. When f l is 1, there will be no phase ambiguity so that Φ l (x, y) is inherently an unwrapped phase. Theoretically, for MF-TPU, this single-period phase can be to directly assist phase unwrapping of φ h (x, y) with relatively higher frequency. However, the phase unwrapping capability of MF-TPU is greatly constrained due to the influence of noise in practice. Assuming phase errors in the wrapped phase maps φ h (x, y) and Φ l (x, y) are Δφ h (x, y) and Δφ l (x, y) respectively, from Eq. (6) we have: To avoid errors in determining the fringe orders, from Eqs. (7) and (9) we have: Notably, Eq. (11) defines the range of Δφ max where the absolute phase can be correctly recovered. Otherwise, error will occur in determining the exact k h (x, y). In MF-TPU, since the frequency of the low-frequency fringes is fixed to 1, it can be found from Eq. (11) that the higher the frequency of the high-frequency fringes, the narrower the range of Δφ max , and the worse the reliability of the phase unwrapping. Consequently, for a normal FPP system, MF-TPU can only reliably unwrap the phase with about 16 periods due to the non-negligible noises and other error sources in actual measurement. Thus, it generally exploits multiple (>2) sets of phases with different frequencies to hierarchically unwrap the wrapped phase step by step, and finally arrives at the absolute phase with high frequency instead of only using the phase with a single period. Obviously, MF-TPU, which consumes additional time for projecting patterns with intermediate frequencies, is not a good choice to realize high-speed, high-precision 3D shape measurement based on FPP.
www.nature.com/scientificreports www.nature.com/scientificreports/ Deep-learning based temporal phase unwrapping (DL-TPU). Aiming at this problem, we choose to use the deep neural networks (DNN) to overcome the limitations of MF-TPU, and the specific diagram of the proposed method is shown as in Fig. 1. The input data of the network are the two wrapped phases of the single period and high frequency, which is the same as the two-frequency TPU. To realize the highest unwrapping reliability, we adopt the residual network as the basic skeleton of our neural network 35 , which can speed up the convergence of deep networks and improve network performance by adding layers with considerable depth. Then, we introduce the multi-scale pooling layer to down-sampling the input tensors, which can compress and extract the main features of the tensors for reducing the computation complexity and preventing the over-fitting. Correspondingly, it is inconsistent for the tensors sizes in the different paths after the processing of the pooling layer. Therefore, upsampling blocks will be used to make the sizes of the tensors in the respective paths uniform (see Supplementary Section 1 for details) 36 . In summary, our network mainly consists of convolution layers, residual blocks, pooling layers, upsampling blocks, and concatenate layers. To maximize the efficiency of the model, after repeatedly adjusted the hyper-parameters of the network (number of layers and nodes), we found that in the whole network the number of residual blocks for each path should be set to 4, and the basical filter numbers of the convolution layers should be 50. The tensor data of each path in the network will be performed 1, 1/2, 1/4, and 1/8 downsampling operations by adopting pooling layers with different scales respectively, and then different numbers of upsampling blocks will be adopted to make the sizes of the tensors in the corresponding paths uniform. Besides, it has been found that implementing shortcuts between residual blocks contributes to making the convergence of the network more stable. Furthermore, to avoid over-fitting as the common problem of the deep neural network, L2 regularization is adopted in each convolution layer of residual blocks and upsampling blocks instead of all convolution layers of the proposed network, which can enhance the generalization ability of the network.
Although the purpose of building the network is to achieve phase unwrapping and obtain the absolute phase, there is no need to directly set the absolute phase as the network's label. Since Φ h (x, y) is simply the linear combination of k h (x, y) and φ h (x, y) according to Eq. (4), Φ h (x, y) can be obtained immediately if k h (x, y) is known. Once k h (x, y) is set as the output data of the network, the purpose of our network is to implement semantic segmentation 37 , which is a pixel-wise classification. It is easy to understand that the complexity of the network will be greatly reduced so that the loss of the network will converge faster and more stable, and the prediction accuracy of the network is effectively improved. Different from the traditional SPU and TPU that the phase unwrapping is performed by utilizing the phase information solely in the spatial or temporal domain, it should be noted that our proposed method based on deep neural network is able to learn feature extraction and data screening, thus can exploit the phase information in the spatial and temporal domain simultaneously, providing more degrees of freedom and possibilities to achieve significantly better unwrapping performance (refer to Supplementary Section 3 for details).  Figure 1. The diagram of the proposed method. The whole framework is composed of data process, deep neural network, and phase-to-height mapping. Data process is performed to extract phases and remove the background from fringe images according to Eq. (3) and Supplementary Eq. S1. Deep neural network, consisting of convolutional layers, pooling layers, residual blocks, upsampling blocks, and concatenate layer, is used to predict the period order map k h (x, y) from the input data (Φ l (x, y) and φ h (x, y)). Then, using Eq. (4), Φ h (x, y) is obtained and converted into 3D results after phase-to-height mapping.
www.nature.com/scientificreports www.nature.com/scientificreports/ Then, using Eq. (4), Φ h (x, y) is obtained and converted into 3D results after phase-to-height mapping. In preparation for phase-to-height mapping, the projection matrices of the camera and projector need to be obtained through system calibration 38,39 . Besides, in order to speed up the reconstruction, we suggest phase-to-height mapping to be implemented with a graphics processing unit 40 or several look-up tables 41

Results
Quantitative comparison with MF-TPU. In the first experiment, to verify the actual performance of the proposed DL-TPU, the trained DNN models for phase unwrapping with different high-frequency fringes are utilized to make predictions on the testing dataset (200 image pairs) (refer to Supplementary Section 2 for details), and MF-TPU is also implemented for comparison. In order to quantitatively analyze the accuracy of phase unwrapping for DL-TPU and MF-TPU, the phases with different high frequences are independently unwrapped by the two algorithms, and the average error rates for phase unwrapping on the testing dataset are calculated and plotted against f h in Fig. 2(a). It should be noted that these results are calculated only by comparing the differences between the obtained phases and the label's phases for each valid point from the testing dataset (refer to Supplementary Section 2 for identifying the valid points). The label's phases can be correctly acquired as the 'ground-truth' phase by exploiting multiple sets of phases with different frequencies to hierarchically unwrap the wrapped phase step by step. It can be seen from Fig. 2(a) that with the increase of f h the reconstructed phases of MF-TPU are completely obviated, with a substantial increase of phase unwrapping error rate from 0 to 12.71%. The result shows again that MF-TPU cannot successfully unwraps a phase map when f h ≥ 16 due to the non-negligible noises and other error sources in actual measurement. However, our approach always provides acceptable results, with more than 95% of all valid pixels being properly unwrapped.  www.nature.com/scientificreports www.nature.com/scientificreports/ that compared with MF-TPU our method can achieve much better unwrapping results and decrease the phase unwrapping errors by almost an order of magnitude.
In order to reflect the specific performance of DL-TPU and MF-TPU more intuitively, the 3D reconstruction results after phase unwrapping for a representative sample on the testing dataset are illustrated and compared in Fig. 2(b), and the phase unwrapping error rates can be obviously seen in the background. It can be found from Fig. 2(b) that our approach provides the smallest phase unwrapping errors and the significant improvement of phase measurement quality with the period number f h as expected. It can be further observed that the fringe order errors are mostly concentrated on the dark regions and object edges where the fringe quality is low. Different from MF-TPU, phase unwrapping errors caused by the low signal-to-noise ratio (SNR) region of phases is significantly reduced by using DL-TPU. For these low SNR region, the remaining phase errors have the characteristics of accumulation and can be easily further corrected by some compensation algorithm for fringe order errors [42][43][44] (refer to Supplementary Section 4 for details of these compensation algorithms). Consequently, the trained models can substantially decrease error points to provide better phase unwrapping results (even f h = 64) and lower error rates, which demonstrates the capability and reliability of DL-TPU for phase unwrapping.

Performance analysis under different types of phase errors. Intensity noise.
In the following series of experiments, we will further verify the superiority of DL-TPU in the presence of different types of phase errors. In high-speed 3D measurement, the quality of the fringe patterns is poorer than that of the static measurement because it is projected and captured with limited exposure time. To emulate the practical measurement conditions, we measure a standard ceramic plate using DL-TPU (f h = 32) but artificially adjust the camera's exposure time to 39 ms, 20 ms, 15 ms, and 10 ms. To better analyze and compare the reliability of the accuracy results for phase unwrapping, the absolute phase map obtained using the 12-step phase-shifting algorithm and combining with a highly redundant multi-frequency temporal phase unwrapping strategy (with different frequencies including 1, 8, 16, and 32) can serve as the reference phase. Next, the error rate of phase unwrapping and the variance of the phase error σ φ Δ h for different approaches are easily calculated by making a comparison between the unwrapped phase and the reference phase for each valid point.
Obviously, as the exposure time decreases, the quality of the phase measurement drops significantly presented in Fig. 3(a,b). Since the exposure time is a key factor affecting the speed and quality of phase measurement, the shorter the exposure time the algorithm can withstand, the faster the measurement can be achieved with six projection patterns in FFP. Therefore, a more robust phase unwrapping method is essential to eliminate the phase ambiguity introduced by reduced exposure times and make phase unwrapping correct. In Fig. 3(c,d), it can be found that DL-TPU can always provide higher success rate of phase unwrapping and lower phase error σ φ Δ h compared with MF-TPU, making it more appropriate for the high-speed 3D shape measurement applications.   www.nature.com/scientificreports www.nature.com/scientificreports/ Low fringe modulation. Another attractive attribute of DL-TPU is its good tolerance to noise that can significantly suppress phase unwrapping errors in low-fringe-modulation areas, which frequently appear in practical measurement for the surfaces of complex objects, like the tested object shown in Fig. 4(a,b). For the low-modulation logo region, conventional MF-TPU results provide spinous results teemed with significant delta-spike artifacts, as shown in Fig. 4(c). In contrast, the DNN approach successfully overcomes the low-SNR problem and produces smooth measurement results with negligible errors, as shown in Fig. 4(d). This experimental result confirms once again that DL-TPU can provide superior capability and stability of phase unwrapping for suppressing unwrapping errors caused by low fringe modulation.
Intensity nonlinearity. In this section, we test the proposed DL-TPU under different degrees of intensity gamma distortion. The gamma distortion, or so called intensity nonlinearity, is a common error source in FPP due to the nonlinear response of the commercial projector, introducing high-order harmonics to the projected fringe patterns. The intensity of the fringes with the gamma distortion can be expressed as where γ represents the nonlinearity parameter of projector that means the nonlinear response of the commercial projector. Then, we choose an industrial workpiece of metal as the measured object to validate the resistance of DL-TPU to the gamma distortion. A set of fringe patterns with different nonlinearity intensities, ranging from 0.5 to 1.5, are generated using Eq. (12) and projected onto the measured object in Fig. 5(a). It can be found from the 3D results shown in Fig. 5(b) that MF-TPU cannot provide acceptable phase unwrapping results even under low-level gamma distortions. On the contrary, DL-TPU is able to achieve a close to ideal phase unwrapping result even when γ is 0.8. It should be also noticed that, when γ is as low as 0.5 or as high as 1.5, both of the two   www.nature.com/scientificreports www.nature.com/scientificreports/ approaches can produce meaningful results since the phase errors artificially introduced is much larger than the "safe line" without triggering phase unwrapping errors, so that the success/error rate of unwrapping is about fifty-fifty. In Fig. 5(c-e), for phase unwrapping with different high frequencies (such as 32, 48 and 64) under different degrees of intensity gamma distortion, the statistics curves of phase unwrapping for MF-TPU are shown as the solid lines, and the results are significantly improved by using DL-TPU as shown by dashed lines. These www.nature.com/scientificreports www.nature.com/scientificreports/ results verify that our method can significantly reduce the fringe order errors of phase unwrapping and produce high-quality absolute phases even under a certain degree of gamma distortion in the FPP system. Application to high-speed 3D surface imaging. Finally, our system, which can project and capture the fringe images at the speed of 25 Hz, is applied to imaging some classical dynamic scenes for fast 3D reconstruction: objects with fast translation movement and rapid rotatory motion. In Fig. 6(a), a standard ceramic plate, fixed on precise displacement platform, is performed to periodic translational movement at the speed of 1.25 cm/s. In traditional MF-TPU, it is more much difficult to recovery the high-frequency absolute phase using only one unit-frequency phase in Fig. 6(c) due to the unavoidable noises in actual measurement. Therefore, to guarantee a stable phase unwrapping success rate for the high-frequency phase, three sets of phase-shifting fringe patterns, so-called MF-TPU (3f) in which the frequency of the second set of fringe patterns is 8, are used to achieve high-accuracy but inefficient phase unwrapping. When measuring dynamic scenes, the relative motion between the object and the phase-shifting fringe patterns sequentially projected will cause motion artifacts and thus introduce additional phase errors into the initial phase map which is non-negligible and becomes more severe because of projecting more patterns as presented in Fig. 6(c). However, without the assistance of additional patterns, it illustrates the reliability and efficiency of DL-TPU from Fig. 6(c) that the trained models can still achieve better phase unwrapping results. We try to take one cross-section on the 3D results of the ceramic plate to compare DL-TPU with MF-TPU and MF-TPU (3f). From the comparison results shown in Fig. 6(d), it can be found that our approach provides the highest unwrapping reliability and best noise-robustness compared with other methods.
And then, for measuring the rapid rotatory motion, the statue of David rotates in a counter-clockwise direction at the rotation rate of 3 rpm as shown in Fig. 6(b). Undoubtedly, in Fig. 6(e), the experiment yielded a result similar to that of the fast translational motion. It can be found from these results that the 3D profile information with high quality of the ceramic plate and the David statue are accurately acquired during the entire movement of the tested objects, again demonstrating the unwrapping stability of the proposed method for implementing high-precision, fast absolute 3D shape measurement.

Discussion
In this work, we have demonstrated that a trained deep neural network can greatly improve the ability of TPU with high-frequency fringes acquired by a common FPP system. This high-performance TPU (so-called DL-TPU) can be achieved based on a deep neural network after appropriate training. Compared with MF-TPU, DL-TPU can effectively recover the absolute phase from two wrapped phases with different frequencies by exploiting both spatial and temporal phase information in an integrated way. It can substantially improve the reliability of phase unwrapping even when high-frequency fringe patterns are used. We have further experimentally demonstrated for the first time, to our knowledge, that the high-frequency phase obtained from 64-period 3-step phase-shifting fringe patterns can be directly and reliably unwrapped from one unit-frequency phase, facilitating high-accuracy high-speed 3D surface imaging with use of only 6 projected patterns without exploring any prior information and geometric constraint. After that, various experiments have been designed to access the phase unwrapping capability of the proposed approach under the conditions of intensity noise, low fringe modulation, and intensity nonlinearity. Experimental results have verified that TPU using deep learning provides significantly improved unwrapping reliability to realize the absolute 3D measurement for objects with complex surfaces. Besides, for the applications to high-speed FPP, it has also been observed that the deep learning-based approach is much less affected by motion artifacts in dynamic measurement and can successfully reconstruct the surface profile of the moving and rotating objects at high speed. These results highlight that machine learning is able to potentially overcome challenging issues in optical metrology, and provides new possibilities and flexibilities to design more powerful high-speed FPP systems. Although the TPU and FPP have been the main focus of this research, we envisage that the similar deep learning framework might also be applicable to other 3D surface imaging modalities, including, e.g., stereo vision 45 , DIC 46 , spatial-temporal stereo 47 , spatial-temporal correlation 48 , among others.