Abstract
With advances in its scientific foundations and technological implementations, optical metrology has become a versatile problem-solving backbone in manufacturing, fundamental research, and engineering applications, such as quality control, nondestructive testing, experimental mechanics, and biomedicine. In recent years, deep learning, a subfield of machine learning, has emerged as a powerful tool for solving problems by learning from data, largely driven by the availability of massive datasets, enhanced computational power, fast data storage, and novel training algorithms for deep neural networks. It is currently attracting extensive attention for its use in the field of optical metrology. Unlike the traditional “physics-based” approach, deep-learning-enabled optical metrology is a “data-driven” approach, which has already provided alternative solutions to many challenging problems in this field, often with better performance. In this review, we present an overview of the current status and the latest progress of deep-learning technologies in the field of optical metrology. We first briefly introduce both traditional image-processing algorithms in optical metrology and the basic concepts of deep learning, followed by a comprehensive review of its applications in various optical metrology tasks, such as fringe denoising, phase retrieval, phase unwrapping, subset correlation, and error compensation. The open challenges faced by the current deep-learning approach in optical metrology are then discussed. Finally, directions for future research are outlined.
Introduction
Optical metrology is the science and technology of making measurements with the use of light as standards or information carriers^{1,2,3}. Light is characterized by its fundamental properties, namely, amplitude, phase, wavelength, direction, frequency, speed, polarization, and coherence. In optical metrology, these fundamental properties of light are ingeniously utilized as information carriers of a measurand, enabling a wide range of optical metrology tools for measuring an equally wide range of subjects^{4,5,6}. For example, optical interferometry takes advantage of the wavelength of light as a precise dividing marker of length. The speed of light defines the international standard of length, the meter, as the length traveled in vacuum during a time interval of 1/299,792,458 of a second^{7}. As a result, optical metrology is being increasingly adopted in applications where reliable data about the distance, displacement, dimensions, shape, roughness, surface properties, strain, and stress state of the object under test are required^{8,9,10}. Optical metrology is a broad and interdisciplinary field relating to diverse disciplines such as photomechanics, optical imaging, and computer vision. There is no strict boundary between these fields, and in fact, the term “optical metrology” is often used interchangeably with “optical measurement”, in which achieving higher precision, sensitivity, repeatability, and speed is always a priority^{11,12}.
There are a few inventions that revolutionized optical metrology. The first is the invention of the laser^{13,14}. The advent of laser interferometry can be traced back to experiments conducted independently in 1962 by Denisyuk^{15} and by Leith and Upatnieks^{16}, with the objective of marrying coherent light produced by lasers with Gabor’s holography method^{17}. The use of lasers as a light source in optical metrology marked the first time that such highly controlled light became available as a physical medium to measure the physical properties of samples, opening up new possibilities for optical metrology. The second revolution was initiated by the invention of the charge-coupled device (CCD) camera in 1969, which replaced earlier photographic emulsions by recording optical intensity signals from the measurand digitally^{8}. The use of the CCD camera as a recording device in optical metrology represented another important milestone: the compatibility of light with electricity, i.e., “light” can be converted into “electrical quantities (current, voltage, etc.)”. This means that the computational storage, access, analysis, and transmission of captured data are easily attainable, leading to the “digital transition” of optical metrology. Computer-based signal processing tools were introduced to automate the quantitative determination of optical metrology data, eliminating the inconvenience associated with the manual, labor-intensive, time-consuming evaluation of fringe patterns^{18,19,20}. Methods such as digital interferometry^{21}, digital holography^{22}, and digital image correlation (DIC)^{23} have by now become state of the art.
With the digital transition, image processing plays an essential role in optical metrology for the purpose of converting the observed measurements (generally displayed in the form of deformed fringe/speckle patterns) into the desired attributes (such as geometric coordinates, displacements, strain, refractive index, and others) of the object under study. This information-recovery process is similar to those of computer vision and computational imaging, presenting an inverse problem that is often ill-posed with respect to the existence, uniqueness, and stability of the solution^{24,25,26,27}. Tremendous progress has been achieved in terms of accurate mathematical modeling (statistical models of noise and the observational data)^{28}, regularization techniques^{29}, numerical methods, and their efficient implementations^{30}. In the field of optical metrology, however, the situation is quite different because optical measurements are frequently carried out in a highly controlled environment. Instead of explicitly interpreting optical metrology tasks from the perspective of solving inverse problems (based on a formal optimization framework), mainstream scientists in optical metrology prefer to bypass the ill-posedness and simplify the problem by means of active strategies, such as sample manipulation, system adjustment, and multiple acquisitions^{31}. A typical example is the phase-shifting technique^{32}, which sacrifices the time and effort of capturing multiple fringe patterns in exchange for a deterministic and straightforward solution.
Under such circumstances, the phase retrieval problem is well-posed or even overdetermined (when the number of phase-shifting steps is larger than 3), and employing more evolved algorithms, such as compressed sensing^{33} and non-convex (low-rank) regularization^{34}, seems redundant and unnecessary, especially as they fail to demonstrate clear advantages over classical ones in terms of accuracy, adaptability, speed, and, more importantly, ease of use. This leads to the key question and motivation of this review: whether machine learning can become a driving force in optical metrology that not only provides superior solutions to the growing new challenges but also tolerates imperfect measurement conditions, such as additive noise, phase-shifting errors, intensity nonlinearity, motion, and vibration, with the least effort.
In the past few years, we have indeed witnessed rapid progress in high-level artificial intelligence (AI), where deep representations based on convolutional and recurrent neural network models are learned directly from the captured data to solve many tasks in computer vision, computational imaging, and computer-aided diagnosis with unprecedented performance^{35,36,37}. The early framework for deep learning was established on artificial neural networks (ANNs) in the 1980s^{38}, yet only recently did the real impact of deep learning become significant, due to the advent of fast graphics processing units (GPUs) and the availability of large datasets^{39}. In particular, deep learning has revolutionized the computer vision community, introducing nontraditional and effective solutions to numerous challenging problems such as object detection and recognition^{40}, object segmentation^{41}, pedestrian detection^{42}, image super-resolution^{43}, as well as medical-image-related applications^{44}. Similarly, in computational imaging, deep learning has led to rapid growth in algorithms and methods for solving a variety of ill-posed inverse computational imaging problems^{45}, such as super-resolution microscopy^{46}, lensless phase imaging^{47}, computational ghost imaging^{48}, and imaging through scattering media^{49}. In this context, researchers in optical metrology have also made significant explorations with very promising results within just a few short years, as evidenced by the ever-increasing number of publications^{50,51,52,53,54,55}. Meanwhile, these research works are scattered rather than systematic, which gives us the second motivation of this review: to provide a comprehensive overview of their principles, implementations, advantages, applications, and challenges. It should be noted that optical metrology covers a wide range of methods and applications today.
It would be beyond the scope of this review to discuss all relevant technologies and trends. We, therefore, restrict our focus to phase/correlation measurement techniques, such as interferometry, holography, fringe projection, and DIC. Although phase retrieval and wavefield sensing technologies, such as defocus variation (Gerchberg–Saxton–Fienup-type methods^{56,57}), transport of intensity equation (TIE)^{58,59}, aperture modulation^{60}, ptychography^{61,62}, and wavefront sensing (e.g., Shack–Hartmann^{63}, Pyramid^{64}, and computational shear interferometry^{65}), have recently been introduced to optical metrology^{66,67,68}, they may be more appropriately placed in the field of “computational imaging”. The reader is referred to the earlier review by Barbastathis et al.^{45} for more detailed information on this topic. It is also worth mentioning that (passive) stereovision, which extracts depth information from stereo images, is an important branch of photogrammetry that has been extensively studied by the computer vision community. Although stereovision techniques do not strictly fall into the category of optical metrology, many ideas and algorithms in DIC and fringe projection were “borrowed” from stereovision, so they are also included in this review.
The remainder of this review is organized as follows. We start by summarizing the relevant foundations and image formation models of different optical metrology approaches, which are generally required as a priori knowledge in conventional optical metrology methods. Next, we present a general hierarchy of the image-processing algorithms that are most commonly used in conventional optical metrology in the “Image processing in optical metrology” section. After a brief introduction to the history and basic concepts of deep learning, we recapitulate the advantages of using deep learning in optical metrology tasks by interpreting the concept as an optimization problem. We then present a recollection of the deep-learning methods that have been proposed in optical metrology, suggesting the pervasive penetration of deep learning into almost all aspects of the image-processing hierarchy. The “Challenges” section discusses both technical and implementation challenges faced by the current deep-learning approach in optical metrology. In the “Future directions” section, we give our outlook on the prospects for deep learning in optical metrology. Finally, conclusions and closing remarks are given in the “Conclusions” section.
Image formation in optical metrology
Optical metrology methods often form images (e.g., fringe/speckle patterns) for processing. Thus, modeling the image formation process is essential for reconstructing the various quantities of interest. In most interferometric metrological methods, the image is formed by the coherent superposition of the object and reference beams. As a result, the raw intensity across the object is modulated by a harmonic function, resulting in the bright and dark contrasts known as fringe patterns. A typical fringe pattern can be written as^{18,19}

\(I(x,y) = A(x,y) + B(x,y)\cos \phi (x,y)\)

where (x, y) refers to the spatial coordinates along the horizontal and vertical directions, A(x, y) is the background intensity, B(x, y) is the fringe amplitude, and ϕ(x, y) is the phase distribution. In most cases, the phase is the primary quantity of the fringe pattern to be retrieved, as it is related to the final object quantities of interest, such as surface shape, mechanical displacement, 3D coordinates, and their derivatives. The related techniques include classical interferometry, photoelasticity, holographic interferometry, digital holography, etc. On a different note, fringe patterns can also be created non-interferometrically by the overlap of two periodic gratings, as in geometric moiré, or by the incoherent projection of structured patterns onto the object surface, as in fringe projection profilometry (FPP)/deflectometry. As summarized in Fig. 1, though the final fringe patterns obtained in all forms of fringe-based techniques discussed herein are similar in form, the physics behind the image formation process and the meanings of the fringe parameters are different. In DIC, the measured intensity images are speckle patterns of the specimen surface before and after deformation,

\(I_r(x,y) = I_d\left( {x + D_x(x,y),\,y + D_y(x,y)} \right)\)

where \(\left( {D_x(x,y),D_y(x,y)} \right)\) refers to the displacement vector field mapping from the undeformed/reference pattern I_{r}(x, y) to the deformed one I_{d}(x, y). It directly provides full-field displacements and strain distributions of the sample surface. The DIC technique can also be combined with binocular stereovision or stereophotogrammetry to recover the depth and out-of-plane deformation of the surface from the displacement field (so-called disparity) by exploiting the unique textures present in two or more images of the object taken from different viewpoints. The image formation processes for typical optical metrology methods are briefly described as follows.
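Before the individual techniques are detailed, the fringe intensity model above can be illustrated with a short numerical sketch. The background, modulation, carrier frequency, and Gaussian phase bump below are illustrative assumptions, not values from the text:

```python
import numpy as np

# Sketch of the fringe intensity model I = A + B*cos(phi).
# A: background intensity, B: fringe amplitude, phi: phase.
# Carrier frequency f0 and the Gaussian phase "bump" are
# illustrative choices, not taken from the text.
H, W = 256, 256
y, x = np.mgrid[0:H, 0:W]

A = 0.5 * np.ones((H, W))                  # background intensity A(x, y)
B = 0.4 * np.ones((H, W))                  # fringe amplitude B(x, y)
f0 = 1 / 16                                # carrier frequency (cycles/pixel)
bump = 6 * np.exp(-((x - W / 2) ** 2 + (y - H / 2) ** 2) / (2 * 40 ** 2))
phi = 2 * np.pi * f0 * x + bump            # carrier plus object phase

I = A + B * np.cos(phi)                    # the recorded fringe pattern
```

Displaying `I` shows straight carrier fringes that bend where the object phase varies, which is exactly the cue the fringe analysis methods discussed later exploit.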

(1)
Classical interferometry: In classical interferometry, the fringe pattern is formed by the superimposition of two smooth coherent wavefronts, one of which is typically a flat or spherical reference wavefront and the other a distorted wavefront formed and directed by optical components^{69,70} (Fig. 1a). The phase of the fringe pattern reflects the difference between the ideal reference wavefront and the object wavefront. Typical examples of classical interferometry include the use of configurations such as the Michelson, Fizeau, Twyman–Green, and Mach–Zehnder interferometers to characterize the surface, aberration, or roughness of optical components with high accuracy, of the order of a fraction of the wavelength.

(2)
Photoelasticity: Photoelasticity is a nondestructive, full-field optical metrology technique for measuring the stress developed in transparent objects under loading^{71,72}. Photoelasticity is based on an optomechanical property, so-called “double refraction” or “birefringence”, observed in many transparent polymers. Combined with two circular polarizers (a linear polarizer coupled with a quarter waveplate) and illuminated with a conventional light source, a loaded photoelastic sample (or a photoelastic coating applied to an ordinary sample) produces fringe patterns whose phases are associated with the difference between the principal stresses in a plane perpendicular to the light propagation direction^{73} (Fig. 1b).

(3)
Geometric moiré/Moiré interferometry: In optical metrology, the moiré technique is defined as the utilization of the moiré phenomenon to measure the shape, deformation, or displacements of surfaces^{74,75}. A moiré pattern is formed by the superposition of two periodic or quasi-periodic gratings. One of these gratings is called the reference grating, and the other, the object grating, is mounted or engraved on the surface under study and is subjected to distortions induced by surface changes. For in-plane displacement and strain measurements, moiré technology has evolved from low-sensitivity geometric moiré^{75,76,77} to high-sensitivity moiré interferometry^{75,78}. In moiré interferometry, two collimated coherent beams interfere to produce a high-frequency virtual reference grating, which interacts with the object grating to create a moiré pattern whose fringes represent subwavelength in-plane displacements per contour (Fig. 1c).

(4)
Holographic interferometry: Holography, invented by Gabor^{17} in the 1940s, is a technique that records an interference pattern and uses diffraction to reproduce a wavefront, resulting in a 3D image that retains the depth, parallax, and other properties of the original scene. The principle of holography can also be utilized as an optical metrology tool. In holographic interferometry, a wavefront is first stored in the hologram and later interferometrically compared with another, producing fringe patterns that yield quantitative information about the object states from which these two wavefronts derive^{79,80}. This comparison can be made in three different ways that constitute the basic approaches of holographic interferometry: real-time^{81}, double-exposure^{82}, and time-average holographic interferometry^{83,84} (Fig. 1d), allowing for both qualitative visualization and quantitative measurement of real-time deformation and perturbation, changes of state between two specific time points, and vibration mode and amplitude, respectively.

(5)
Digital holography: Digital holography utilizes a digital camera (CMOS or CCD) to record the hologram produced by the interference between a reference wave and an object wave emanating from the sample^{85,86} (Fig. 1e). Unlike classical interferometry, the sample need not be precisely in focus and can even be recorded without using any imaging lenses. Numerical propagation using the Fresnel transform or the angular spectrum algorithm enables digital refocusing at any depth of the sample without physically moving it. In addition, digital holography also provides an alternative and much simpler way to realize double-exposure^{87} and time-averaged holographic interferometry^{88,89}, with the additional benefits of quantitative evaluation of holographic interferograms and flexible phase-aberration compensation^{86,90}.

(6)
Electronic speckle pattern interferometry (ESPI): In ESPI, the tested object generally has an optically rough surface. When illuminated by a coherent laser beam, it creates a speckle pattern with random phase, amplitude, and intensity^{91,92}. If the object is displaced or deformed, the object-to-image distance will change, and the phase of the speckle pattern will change accordingly. In ESPI, two speckle patterns are acquired by double exposure, one each for the undeformed and deformed states, and the absolute difference between these two speckle patterns results in fringes superimposed on the speckle pattern, where each fringe contour normally represents a displacement of half a wavelength (Fig. 1f).

(7)
Electronic speckle shearing interferometry (shearography): Electronic speckle shearing interferometry, commonly known as shearography, is an optical measurement technique similar to ESPI. However, instead of using a separate known reference beam, shearography uses the test object itself as the reference, and the interference pattern is created by two sheared speckle fields originating from the light scattered by the surface of the object under test^{93,94}. In shearography, the phase encoded in the fringe pattern depicts the derivatives of the surface displacements, i.e., the strain developed on the object surface (Fig. 1g). Consequently, anomalies or defects on the surface of the object can be revealed more prominently, rendering shearography one of the most powerful tools for nondestructive testing applications.

(8)
Fringe projection profilometry/deflectometry: Fringe projection is a widely used non-interferometric optical metrology technique for measuring the topography of an object at a certain angle between the observation and projection directions^{95,96}. The sinusoidal pattern in fringe projection techniques is generally formed incoherently by a digital video projector and directly projected onto the object surface. The corresponding distorted fringe pattern is recorded by a digital camera. The average intensity and intensity modulation of the captured fringe pattern are associated with the surface reflectivity and ambient illumination, and the phase is associated with the surface height^{32} (Fig. 1h). Deflectometry is another structured light technique similar to FPP, but instead of being produced by a projector, similar types of fringe patterns are displayed on a planar screen and distorted by the reflective (mirror-like) test surface^{97,98}. The phase measured in deflectometry is directly sensitive to the surface slope (similar to shearography), so it is more effective for detecting shape defects^{99,100}.

(9)
Digital image correlation (DIC)/stereovision: DIC is another important non-interferometric optical metrology method that employs image correlation techniques for measuring the full-field shape, displacement, and strains of an object surface^{23,101,102}. Generally, the object surface should have a random intensity distribution (i.e., a random speckle pattern), which distorts together with the sample surface as a carrier of deformation information. Images of the object at different loadings are captured with one (2D-DIC)^{23} or two synchronized cameras (3D-DIC)^{103}, and then these images are analyzed with correlation-based matching (tracking or registration) to extract full-field displacement and strain distributions (Fig. 1i). Unlike 2D-DIC, which is limited to in-plane deformation measurement of nominally planar objects, 3D-DIC, also known as stereo-DIC, allows for the measurement of 3D displacements (both in-plane and out-of-plane) for both planar and curved surfaces^{104,105}. 3D-DIC is inspired by binocular stereovision or stereophotogrammetry in the computer vision community, which recovers 3D coordinates by finding the pixel correspondence (i.e., disparity) of unique features that exist in two or more images of the object taken from different points of view^{106,107}. Nevertheless, unlike DIC, in which the displacement vector can be along both the x and y directions, in stereophotogrammetry, after epipolar rectification, disparities between the images are along the x direction only^{108}.
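To close this section, the DIC image-formation relation between the reference and deformed speckle patterns can be illustrated with a toy example. A uniform integer displacement is assumed here purely for illustration; a real deformation field varies from pixel to pixel:

```python
import numpy as np

# Toy illustration of the DIC image-formation relation
# I_r(x, y) = I_d(x + Dx, y + Dy) for a uniform integer
# displacement (Dx, Dy); real deformation fields vary per pixel.
rng = np.random.default_rng(0)
I_r = rng.random((64, 64))                 # reference speckle pattern

Dx, Dy = 3, 5                              # assumed uniform displacement
I_d = np.roll(I_r, shift=(Dy, Dx), axis=(0, 1))   # "deformed" image

# Away from the wrapped borders, the relation holds exactly:
assert np.allclose(I_d[Dy:, Dx:], I_r[:-Dy, :-Dx])
```

The correlation-based matching discussed in the "Image processing in optical metrology" section amounts to inverting this relation, i.e., estimating (Dx, Dy) from the two images alone.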
Image processing in optical metrology
The elementary task of digital image processing in optical metrology can be defined as the conversion of the captured raw intensity image(s) into the desired object quantities, taking into account the physical model of the intensity distribution describing the image formation process. In most cases, image processing in optical metrology is not a one-step procedure, and a logical hierarchy of image-processing steps should be accomplished. As illustrated in Fig. 2, the image-processing hierarchy typically encompasses three main steps: preprocessing, analysis, and postprocessing, each of which includes a series of mapping functions that are cascaded to form a pipeline structure. For each operation, the corresponding f is an operator that transforms the image-like input into an output of corresponding (possibly resampled) spatial dimensions. Figure 3 shows the big picture of the image-processing hierarchy, with various types of algorithms distributed in different layers. Next, we will zoom in one level deeper on each of the hierarchical steps.
Preprocessing
The purpose of preprocessing is to assess the quality of the image data and improve it by suppressing or minimizing unwanted disturbances (noise, aliasing, geometric distortions, etc.) before the data are fed to the following image analysis stage. It takes place at the lowest level (the so-called iconic level) of image processing —the input and output of the corresponding mapping function(s) are both intensity images, i.e., \(f_{pre}:I \to I^\prime\). Representative image preprocessing algorithms in optical metrology include, but are not limited to:

Denoising: In optical metrology, noise in the captured raw intensity data has several sources, related to the electronic noise of photodetectors and to coherent noise (so-called speckle). Typical numerical approaches to noise reduction include the median filter^{109}, spin filter^{110}, anisotropic diffusion^{111}, coherence diffusion^{112}, wavelet transform^{113}, windowed Fourier transform (WFT)^{114,115}, block matching 3D (BM3D)^{116}, etc. For more detailed information and comparisons of these algorithms, the reader may refer to the reviews by Kulkarni and Rastogi^{117} and Bianco et al.^{118}.
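As a minimal sketch of the denoising step, the snippet below applies a 3×3 median filter (the first of the classical filters listed above) to a synthetic noisy fringe pattern. The fringe period, amplitude, and noise level are illustrative assumptions:

```python
import numpy as np

# 3x3 median filtering of a synthetic noisy fringe pattern, the
# simplest of the classical denoisers listed above. Fringe period,
# amplitude, and noise level are illustrative assumptions.
rng = np.random.default_rng(1)
cols = np.arange(256)
clean = np.tile(0.5 + 0.4 * np.cos(2 * np.pi * cols / 16), (256, 1))
noisy = clean + rng.normal(0.0, 0.05, clean.shape)

# 3x3 median: stack the 9 shifted copies (edge padding) and take
# the per-pixel median.
padded = np.pad(noisy, 1, mode="edge")
stack = np.stack([padded[i:i + 256, j:j + 256]
                  for i in range(3) for j in range(3)])
denoised = np.median(stack, axis=0)

# The filter reduces the RMS error with respect to the clean pattern.
rms = lambda e: float(np.sqrt(np.mean(e ** 2)))
assert rms(denoised - clean) < rms(noisy - clean)
```

The median is chosen here because it suppresses impulsive noise without blurring fringe edges as strongly as a mean filter would; the more advanced transform-domain methods cited above trade this simplicity for better fidelity on fine fringes.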

Enhancement: Image enhancement is a crucial preprocessing step in intensity-based fringe analysis approaches, such as fringe tracking or skeletonizing. Referring to the intensity model, the fringe pattern may still be disturbed by a locally varying background and intensity modulation after denoising. Several algorithms have been developed for fringe pattern enhancement, e.g., the adaptive filter^{119}, bidimensional empirical mode decomposition^{120,121}, and the dual-tree complex wavelet transform^{122}.

Color channel separation: Because a camera with a Bayer color sensor captures three monochromatic (red, green, and blue) images at once, color multiplexing techniques are often employed in optical metrology to speed up the image acquisition process^{123,124,125,126,127}. However, the separation of the three color channels is not straightforward due to the coupling and imbalance among them. Many crosstalk-matrix-based color channel calibration and leakage correction algorithms have been proposed to minimize such side effects^{128,129,130}.
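The crosstalk-matrix idea can be sketched as a linear decoupling problem: the captured RGB values are modeled as a 3×3 coupling matrix applied to the true channel intensities, and inverting the calibrated matrix separates the channels. The matrix entries below are illustrative, not calibrated values:

```python
import numpy as np

# Crosstalk-matrix-based channel correction as linear decoupling.
# C models how the true R/G/B intensities leak into the captured
# channels; its entries here are illustrative, not calibrated values.
C = np.array([[1.00, 0.15, 0.05],   # red picks up some green and blue
              [0.12, 1.00, 0.10],
              [0.03, 0.18, 1.00]])

rng = np.random.default_rng(2)
true_rgb = rng.random((100, 3))                 # true channel intensities
captured = true_rgb @ C.T                       # coupled camera measurement

recovered = captured @ np.linalg.inv(C).T       # leakage correction
assert np.allclose(recovered, true_rgb)
```

In practice, C is estimated by projecting or displaying pure red, green, and blue calibration patterns and measuring the response in each channel; noise and nonlinearity then limit how cleanly the inversion separates the channels.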

Image registration and rectification: Image registration and rectification are aimed at aligning two or more images of the same object to a reference or correcting image distortion due to lens aberration. In stereophotogrammetry, epipolar (stereo) rectification determines a reprojection of each image plane so that pairs of conjugate epipolar lines in both images become collinear and parallel to one of the image axes^{108}.

Interpolation: Image interpolation algorithms, such as nearest-neighbor, bilinear, bicubic^{109}, and nonlinear regression^{131}, are necessary when the measured intensity image is sampled on an insufficiently dense grid. In DIC, to reconstruct displacements with subpixel accuracy, the correlation criterion must be evaluated at non-integer-pixel locations^{132,133,134}. Therefore, image interpolation is also a key algorithm for DIC to infer subpixel gray values and gray-value gradients in many subpixel displacement registration algorithms, e.g., the Newton–Raphson method^{133,134,135}.
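A minimal sketch of sub-pixel gray-value interpolation is given below, using linear interpolation (the simplest of the schemes listed above) to sample a smooth fringe profile at half-pixel positions; the test signal and shift are illustrative choices:

```python
import numpy as np

# Linear sub-pixel gray-value interpolation (the simplest scheme
# listed above), sampling a fringe profile at half-pixel positions.
x = np.arange(64, dtype=float)
profile = np.cos(2 * np.pi * x / 16)            # smooth gray-value profile

x_sub = x[:-1] + 0.5                            # non-integer-pixel locations
interp = np.interp(x_sub, x, profile)           # linear interpolation

# Linear interpolation error is bounded by (h^2/8) * max|f''|,
# about 0.0193 here for period-16 fringes with h = 1 pixel.
expected = np.cos(2 * np.pi * x_sub / 16)
assert np.max(np.abs(interp - expected)) < 0.02
```

The error bound illustrates why DIC engines prefer bicubic or spline interpolation: higher-order schemes shrink this interpolation bias, which otherwise propagates directly into the subpixel displacement estimate.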

Extrapolation: Image extrapolation, especially fringe extrapolation, is often employed in Fourier transform (FT) fringe analysis methods to minimize the boundary artifacts induced by spectrum leakage. Schemes for the extrapolation of the fringe pattern beyond its borders have been reported, such as the soft-edged frequency filter^{136} and iterative FT^{137}.
Analysis
Image analysis is the core component of the image-processing architecture, extracting from the input images the key information-bearing parameter(s) that reflect the desired physical quantity being measured. In phase measurement techniques, image analysis refers to the reconstruction of phase information from the fringe-like modulated intensity distribution(s), i.e., \(f_{anal}:I \to \phi\).

Phase demodulation: The aim of phase demodulation, or more specifically, fringe analysis, is to obtain the wrapped phase map from the quasi-periodic fringe patterns. Various techniques for fringe analysis have been developed to meet different requirements in diverse applications, and they can be broadly classified into two categories:

Spatial phase demodulation: Spatial phase-demodulation methods are capable of estimating the phase distribution from a single fringe pattern. FT^{138,139}, WFT^{114,115,140}, and wavelet transform (WT)^{141} are classical methods for spatial carrier fringe analysis. For closed-fringe patterns without a carrier, alternative methods, such as the Hilbert spiral transform^{142,143}, regularized phase tracking (RPT)^{144,145}, and frequency-guided sequential demodulation^{146,147}, can be applied, provided that the cosinusoidal component of the fringe pattern can be extracted by the preprocessing algorithms of denoising, background removal, and fringe normalization. The interested reader may refer to the book by Servin et al.^{148} for further details.
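A minimal sketch of carrier-based FT fringe analysis follows: isolate the +1 carrier lobe in the 2D spectrum, shift it to the origin, and take the angle of the inverse transform to obtain the wrapped phase. The carrier frequency, filter width, and object phase here are illustrative assumptions:

```python
import numpy as np

# Takeda-style FT fringe analysis: band-pass the +1 carrier lobe,
# shift it to the spectrum origin, inverse-transform, take the angle.
# Carrier (16 cycles/image), filter half-width (8 bins), and the
# Gaussian object phase (3 rad peak) are illustrative assumptions.
N = 256
y, x = np.mgrid[0:N, 0:N]
phi_obj = 3 * np.exp(-((x - N / 2) ** 2 + (y - N / 2) ** 2) / (2 * 30 ** 2))
I = 0.5 + 0.4 * np.cos(2 * np.pi * 16 * x / N + phi_obj)

S = np.fft.fftshift(np.fft.fft2(I))
cx = N // 2 + 16                            # column of the +1 order lobe
mask = np.zeros((N, N))
mask[:, cx - 8:cx + 8] = 1                  # band-pass around the carrier
S_filt = np.roll(S * mask, -16, axis=1)     # move the lobe to the origin

wrapped = np.angle(np.fft.ifft2(np.fft.ifftshift(S_filt)))
# 'wrapped' approximates phi_obj modulo 2*pi, up to filtering artifacts.
```

The single-shot nature of this method is its appeal; the price is the band-pass filter, whose width must trade off noise rejection against the spatial bandwidth of the object phase.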

Temporal phase demodulation: Temporal phase-demodulation techniques detect the phase distribution from the temporal variation of fringe signals, as typified by heterodyne interferometry^{149} and phase-shifting techniques^{150}. Many phase-shifting algorithms were originally proposed for optical interferometry/holography and later adapted and extended to fringe projection, for example, the standard N-step phase-shifting algorithm^{151}, the Hariharan five-step algorithm^{21}, the 2 + 1 algorithm^{152}, etc. The interested reader may refer to the chapter “Phase shifting interferometry”^{153} of the book edited by Malacara^{4} and the review article by Zuo et al.^{32} for more details about phase-shifting techniques in the contexts of optical interferometry and FPP, respectively.
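The standard N-step algorithm can be sketched for N = 4 equally spaced shifts; the closed form below is one common convention (sign conventions vary between texts), with the fringe parameters chosen purely for illustration:

```python
import numpy as np

# Standard N-step phase-shifting (N = 4, shifts delta_n = 2*pi*n/N):
#   phi = -atan2( sum_n I_n sin(delta_n), sum_n I_n cos(delta_n) )
# for fringes I_n = A + B cos(phi + delta_n); sign conventions vary.
N = 4
x = np.linspace(0, 4 * np.pi, 512)
phi_true = np.angle(np.exp(1j * x))               # wrapped ground truth

deltas = 2 * np.pi * np.arange(N) / N
frames = [0.5 + 0.3 * np.cos(x + d) for d in deltas]

num = sum(I_n * np.sin(d) for I_n, d in zip(frames, deltas))
den = sum(I_n * np.cos(d) for I_n, d in zip(frames, deltas))
phi = -np.arctan2(num, den)                       # wrapped phase estimate

# Compare on the unit circle to ignore 2*pi ambiguities at +/-pi.
assert np.allclose(np.exp(1j * phi), np.exp(1j * phi_true))
```

Because the background A and modulation B cancel in the quotient, the result is insensitive to slowly varying illumination, which is the main reason phase shifting is preferred whenever multiple acquisitions are affordable.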


Phase unwrapping: No matter which phase-demodulation technique is used, the retrieved phase distribution is mathematically wrapped to the principal value of the arctangent function, ranging between −π and π. The result is what is known as a wrapped phase image, and phase unwrapping has to be performed to remove the 2π phase discontinuities. Phase unwrapping algorithms can be broadly classified into three categories:

Spatial phase unwrapping: Spatial phase unwrapping methods use only a single wrapped phase map to retrieve the corresponding unwrapped phase distribution, and the unwrapped phase of a given pixel is derived from the adjacent phase values. Representative methods include Goldstein’s method^{154}, the reliability-guided method^{155}, Flynn’s method^{156}, the minimal Lp-norm method^{157}, and the phase unwrapping max-flow/min-cut (PUMA) method^{158}. The interested reader may refer to the book by Ghiglia et al. for more technical details. There are also many reviews on the performance comparisons of different unwrapping algorithms for specific applications^{159,160,161}. Limited by the assumption of phase continuity, spatial phase unwrapping methods cannot fundamentally address the inherent fringe-order ambiguity problem when the phase difference between neighboring pixels is greater than π.
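The continuity assumption underlying all spatial methods is easiest to see in Itoh's one-dimensional rule, sketched below: integrate the wrapped differences of the wrapped phase, which works only when the true phase changes by less than π between neighboring samples (the test signal is an illustrative smooth ramp):

```python
import numpy as np

# Itoh's 1D unwrapping rule: integrate the wrapped differences of the
# wrapped phase. Valid only under the continuity assumption, i.e.,
# the true phase changes by less than pi between adjacent samples.
def unwrap_1d(wrapped):
    d = np.angle(np.exp(1j * np.diff(wrapped)))   # differences in (-pi, pi]
    return wrapped[0] + np.concatenate(([0.0], np.cumsum(d)))

x = np.linspace(0, 6 * np.pi, 200)                # smooth ramp over 3 fringes
wrapped = np.angle(np.exp(1j * x))                # wrapped observation
recovered = unwrap_1d(wrapped)
assert np.allclose(recovered, x)
```

The 2D algorithms cited above (Goldstein, reliability-guided, PUMA, etc.) are, in essence, strategies for choosing integration paths or weights so that violations of this assumption (noise, shadows, genuine discontinuities) corrupt as little of the map as possible.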

Temporal phase unwrapping: To remove the phase ambiguity, temporal phase unwrapping methods generally generate different or synthetic wavelengths by adjusting flexible system parameters (wavelength, angular separation of light sources, spatial frequency, orientation of the projected fringe patterns) step by step, so that the object can be covered by fringes with different periods. Representative temporal phase unwrapping algorithms include gray-code methods^{162,163}, multi-frequency (hierarchical) methods^{164,165,166}, multi-wavelength (heterodyne) methods^{167,168,169}, and number-theoretical methods^{170,171,172,173}. For more detailed information about these methods, the reader can refer to the comparative review by Zuo et al.^{174} The advantage of temporal phase unwrapping lies in the fact that the unwrapping is neighborhood-independent and proceeds along the time axis on each pixel itself, enabling an absolute evaluation of the mod-2π phase distribution.
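The pixel-wise character of temporal unwrapping can be sketched with a two-frequency hierarchical example: a unit-frequency phase map, which is unambiguous over the full range, fixes the fringe order of the high-frequency wrapped phase at every pixel independently. Noise-free phases and a frequency ratio of 16 are illustrative assumptions:

```python
import numpy as np

# Two-frequency hierarchical unwrapping (sketch): the unit-frequency
# phase (unambiguous over the full range) fixes the fringe order k of
# the high-frequency wrapped phase at every pixel independently.
# Noise-free phases and the frequency ratio of 16 are assumptions.
rng = np.random.default_rng(3)
phi_high = 16 * 2 * np.pi * rng.random(1000)      # true phase, up to 16 periods
phi_unit = phi_high / 16                          # unit-frequency phase

wrapped_high = np.mod(phi_high, 2 * np.pi)        # measured wrapped phase
k = np.round((16 * phi_unit - wrapped_high) / (2 * np.pi))   # fringe order
unwrapped = wrapped_high + 2 * np.pi * k
assert np.allclose(unwrapped, phi_high)
```

With noisy phase maps, the rounding step tolerates a unit-frequency phase error of only ±π/16 here, which is why practical hierarchical schemes unwrap through several intermediate frequencies rather than jumping directly to the highest one.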

Geometric phase unwrapping: Geometric phase unwrapping approaches can solve the phase ambiguity problem by exploiting the epipolar geometry of projector–camera systems. If the measurement volume can be predefined, depth constraints can be incorporated to preclude some phase ambiguities corresponding to the candidates falling out of the measurement range^{175,176,177,178,179,180,181,182,183,184,185}. Alternatively, an adaptive depthconstraint strategy can provide pixelwise depth constraint ranges according to the shape of the measured object^{186}. By introducing more cameras, tighter geometry constraints can be enforced so as to guarantee the unique correspondence and improve the unwrapping reliability^{185,187}.

In stereo-matching techniques, image analysis refers to determining (tracking or matching) the displacement vector of each pixel point between a pair of acquired images, i.e., \(f_{anal}:(I_r,I_d) \to (D_x,D_y)\). In the routine implementation for DIC and stereophotogrammetry, a region of interest (ROI) or subset in the reference image is specified at first. The ROI is further divided into an evenly spaced virtual grid. The similarity is evaluated at each point of the virtual grid to obtain the displacement between the reference subset and the target subset. A full-field displacement map can be obtained by sliding the subset over the searching area of the deformed image and obtaining the displacement at each location.

Subset correlation: In DIC, to quantitatively evaluate the similarity or difference between the selected reference subset and the target subset, several correlation criteria have been proposed, such as cross-correlation (CC), the sum of absolute difference (SAD), the sum of squared difference (SSD), the zero-mean normalized cross-correlation (ZNCC) criterion, the zero-mean normalized sum of squared difference (ZNSSD), and the parametric sum of squared difference (PSSD)^{188,189,190}. The subsequent matching procedure is realized by identifying the peak (or valley) position of the correlation coefficient distribution based on certain optimization algorithms. In stereophotogrammetry, non-parametric costs relying on the local ordering (i.e., Rank^{191}, Census^{192}, and Ordinal measures^{193}) of intensity values are more frequently used due to their robustness against radiometric changes and outliers, especially near object boundaries^{192,193,194}.
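As an illustration, the ZNCC criterion fits in a few lines; its zero-mean normalization makes the score invariant to linear (gain and offset) intensity changes between the reference and target subsets (a minimal sketch for single-channel subsets):

```python
import numpy as np

def zncc(ref_subset, tgt_subset):
    """Zero-mean normalized cross-correlation between two image subsets.

    Returns a value in [-1, 1]; 1 indicates a perfect match. Subtracting the
    means and normalizing makes the criterion insensitive to offset and gain
    changes in intensity (the robustness property mentioned above).
    """
    f = ref_subset - ref_subset.mean()
    g = tgt_subset - tgt_subset.mean()
    return float(np.sum(f * g) / np.sqrt(np.sum(f**2) * np.sum(g**2)))

rng = np.random.default_rng(0)
subset = rng.random((21, 21))
# An affine intensity change (gain and offset) leaves ZNCC at 1
assert np.isclose(zncc(subset, 1.7 * subset + 0.3), 1.0)
```

In a full DIC pipeline this score would be evaluated over a search area in the deformed image, and the location of its maximum taken as the integer-pixel displacement.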

Subpixel refinement: The subset correlation methods mentioned above can only provide integer-pixel displacements. To further improve the measurement resolution and accuracy, many subpixel refinement methods have been developed, including intensity interpolation (i.e., the coarse–fine search method)^{195,196}, correlation coefficient curve-fitting^{133,197}, the gradient-based method^{198,199}, the Newton–Raphson (NR) algorithm^{135,200,201}, and the inverse compositional Gauss–Newton (IC-GN) algorithm^{202,203,204}. Among these algorithms, NR and IC-GN are most commonly used for their high registration accuracy and effectiveness in handling high-order surface transformations. However, they suffer from high computational cost stemming from their iterative nonlinear optimization and repeated subpixel interpolation. Therefore, accurate initial guesses obtained by integer-pixel subset correlation methods are critical to ensure rapid convergence^{205} and reduce the computational cost^{206}. In stereovision, the matching algorithms can be classified into local^{207,208,209}, semi-global^{210}, and global methods^{211}. Local matching methods utilize the intensity information of a local subset centered at the pixel to be matched. Global matching methods take the result obtained by local matching methods as the initial value and then optimize the disparity by minimizing a predefined global energy function. Semi-global matching methods reduce the 2D global energy minimization problem to a 1D one, enabling faster and more efficient implementations of stereo matching.
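The correlation coefficient curve-fitting approach can be sketched in one dimension: a parabola fitted through the integer-pixel peak and its two neighbors gives a closed-form subpixel correction (a minimal illustration; practical DIC applies this in 2D or resorts to the iterative NR/IC-GN algorithms for higher accuracy):

```python
import numpy as np

def subpixel_peak_1d(corr, i):
    """Refine an integer-pixel correlation peak at index i by fitting a
    parabola through the peak and its two neighbors; the vertex of the
    parabola gives the subpixel peak location in closed form."""
    c_m, c_0, c_p = corr[i - 1], corr[i], corr[i + 1]
    delta = (c_m - c_p) / (2.0 * (c_m + c_p - 2.0 * c_0))
    return i + delta

# A correlation curve sampled from a parabola peaking at 5.3
x = np.arange(10, dtype=float)
corr = 1.0 - (x - 5.3) ** 2
assert np.isclose(subpixel_peak_1d(corr, int(np.argmax(corr))), 5.3)
```

The fit is exact for a truly parabolic peak, as in the example; for real correlation surfaces it is a cheap approximation whose bias motivates the more accurate iterative methods.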
Post-processing
In optical metrology, the main task of post-processing is to further refine the measured phase or retrieved displacement field, and finally transform it into the desired physical quantity of the measured object, i.e., the corresponding operator \(f_{post}:\phi/(D_x,D_y) \to q\), where q is the desired sample quantity.

Denoising: Instead of being applied to raw fringe patterns, image denoising can also be used as a post-processing algorithm to remove noise directly from the retrieved phase distribution. Various phase denoising algorithms have been proposed, such as least-squares (LS) fitting^{212}, the anisotropic average filter^{213}, WFT^{214}, total variation^{215}, and the non-local means filter^{216}.

Digital refocusing: The numerical reconstruction of propagating wavefronts by diffraction is a unique feature of digital holography. Since the hologram of the object may not be recorded in the in-focus plane, numerical diffraction or back-propagation algorithms (e.g., Fresnel diffraction and angular spectrum methods) should be used to obtain a focused image by performing plane-by-plane refocusing after the image acquisition^{217,218,219}.
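A minimal sketch of angular spectrum refocusing: the complex field is transformed to the spatial-frequency domain, multiplied by the free-space transfer function for a propagation distance z, and transformed back (monochromatic illumination and uniform sampling are assumed; evanescent components are simply discarded):

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a complex field over distance z with the angular spectrum
    method, as used for numerical refocusing of holograms (a minimal sketch;
    evanescent spatial frequencies are suppressed)."""
    n, m = field.shape
    fx = np.fft.fftfreq(m, d=dx)
    fy = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * z) * (arg > 0)  # free-space transfer function
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Propagating forward and then backward returns the original field
field = np.exp(1j * np.random.default_rng(1).random((64, 64)))
out = angular_spectrum_propagate(field, 0.5e-6, 2e-6, 50e-6)
back = angular_spectrum_propagate(out, 0.5e-6, 2e-6, -50e-6)
assert np.allclose(back, field, atol=1e-6)
```

In practice, the refocusing distance z is either known from the setup or found automatically by scanning z and maximizing a sharpness metric.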

Error compensation: There are various types of phase errors associated with optical metrology systems, such as phase-shifting error, intensity nonlinearity, and motion-induced error, which can be compensated with different types of post-processing algorithms^{60,220,221}. In digital holographic microscopy, the microscope objective induces additional phase curvature on the measured wavefront, which needs to be compensated in order to recover the phase information induced by the sample. Typical numerical phase-aberration compensation methods include double exposure^{222}, 2D spherical fitting^{223}, Zernike polynomial fitting^{224}, Fourier spectrum filtering^{225}, and principal component analysis (PCA)^{226}.
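As a simplified stand-in for the surface-fitting compensation methods above, the sketch below removes tilt and curvature by least-squares fitting a second-order polynomial background to the phase map and subtracting it (this assumes the sample-induced phase does not dominate the fit; real implementations fit only background regions or use Zernike bases):

```python
import numpy as np

def compensate_phase_curvature(phase):
    """Remove tilt and quadratic curvature from a phase map by least-squares
    fitting a second-order polynomial background and subtracting it (a
    simplified analogue of 2D spherical / Zernike polynomial fitting)."""
    n, m = phase.shape
    y, x = np.mgrid[0:n, 0:m].astype(float)
    # Design matrix with basis terms: 1, x, y, x^2, y^2, xy
    A = np.stack([np.ones_like(x), x, y, x**2, y**2, x * y],
                 axis=-1).reshape(-1, 6)
    coeffs, *_ = np.linalg.lstsq(A, phase.ravel(), rcond=None)
    background = (A @ coeffs).reshape(n, m)
    return phase - background

# A purely quadratic aberration (defocus plus tilt) is removed completely
yy, xx = np.mgrid[0:32, 0:32].astype(float)
aberration = 0.01 * ((xx - 16) ** 2 + (yy - 16) ** 2) + 0.2 * xx
residual = compensate_phase_curvature(aberration)
assert np.allclose(residual, 0.0, atol=1e-6)
```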

Quantity transformation: The final step of post-processing, and of the whole measurement chain, is to convert the phase or displacement field into the desired sample quantity, such as height, thickness, displacement, stress, strain, and 3D coordinates, based on sample parameters (e.g., refractive index, relative stress constant) or calibrated system parameters (e.g., sensitivity vector and camera (intrinsic, extrinsic) parameters). The optical setup should be carefully designed to optimize the sensitivity with respect to the measured quantity in order to achieve a successful and efficient measurement^{227,228}.
Finally, it should be mentioned that since optical metrology is a rapidly expanding field in both its scientific foundations and technological developments, the image-processing hierarchy used here cannot provide full coverage of all relevant methods and technologies. For example, phase retrieval and wavefield sensing technologies have shown great promise for inexpensive, vibration-tolerant, non-interferometric optical metrology of optical surfaces and systems^{66,67}. These methods constitute an important aspect of computational imaging as they often involve solving ill-posed inverse problems. There are also some optical metrology methods based on solving constrained optimization problems with added penalties and relaxations (e.g., RPT phase demodulation^{144,145} and minimal Lp-norm phase unwrapping methods^{157}), which may make pre- and post-processing unnecessary. For a detailed discussion on this topic, please refer to the subsection “Solving inverse optical metrology problems: issues and challenges”.
Brief introduction to deep learning
Deep learning is a subset of machine learning, which is defined as the use of specific algorithms that enable machines to automatically learn patterns from large amounts of historical data, and then utilize the uncovered patterns to make predictions about the future or make intelligent decisions under uncertainty^{229,230}. The key algorithm used in machine learning is the ANN, which exploits input data \({{{\mathbf{x}}}} \in {{{\mathcal{X}}}} \subseteq {\Bbb R}^n\) to predict an unknown output \({{{\mathbf{y}}}} \in {{{\mathcal{Y}}}}\). The tasks accomplished by the ANN can be broadly divided into classification tasks and regression tasks, depending on whether y is a discrete label or a continuous value. The objective of machine learning is then to find a mapping function \(f:{{{\mathbf{x}}}} \to {{{\mathbf{y}}}}\). The choice of such functions is given by the neural network models with additional parameters \({{{\mathbf{\theta }}}} \in \Theta\): i.e., \({{{\hat{\mathbf y}}}} = f\left( {{{{\mathbf{x}}}},{{{\mathbf{\theta }}}}} \right) \approx {{{\mathbf{y}}}}\). The goal of this section is to provide a brief introduction to deep learning, as a preparation for the discussion of its applications in optical metrology later on.
Artificial neural network (ANN)
Inspired by the biological neural network (Fig. 4a), ANNs are composed of interconnected computational units called artificial neurons. As illustrated in Fig. 4b, the simplest neural network following the above concept is the perceptron, which consists of a single artificial neuron^{231}. An artificial neuron takes a bias b and weight vector \({{{\mathbf{w}}}} = \left( {w_1,w_2, \cdots ,w_n} \right)^T\) as parameters \({{{\mathbf{\theta }}}} = \left( {b,w_1,w_2, \cdots ,w_n} \right)^T\) to map the input \({{{\mathbf{x}}}} = \left( {x_1,x_2, \cdots ,x_n} \right)^T\) to the output \(f_P\left( {{{\mathbf{x}}}} \right)\) through a nonlinear activation function σ as

\(f_P\left( {{{\mathbf{x}}}} \right) = \sigma \left( {{{{\mathbf{w}}}}^T{{{\mathbf{x}}}} + b} \right)\)
Typical choices for such activation functions are the sign function \(\sigma \left( x \right) = {{{\mathrm{sgn}}}}\left( x \right)\), sigmoid function \(\sigma \left( x \right) = \frac{1}{{1 + e^{ - x}}}\), hyperbolic tangent function \(\sigma \left( x \right) = \frac{{e^x - e^{ - x}}}{{e^x + e^{ - x}}}\), and rectified linear unit (ReLU) \(\sigma \left( x \right) = \max \left( {0,x} \right)\)^{232}. A single perceptron can only model a linear function, but because of the activation functions and in combination with other neurons, the modeling capabilities increase dramatically. It has already been shown that neural networks with neurons arranged in a single layer can approximate any continuous function f(x) on a compact subset of \({\Bbb R}^n\). A single-layer network, also called a single-layered perceptron (SLP), is represented as a linear combination of M individual neurons:

\(f_{SLP}\left( {{{\mathbf{x}}}} \right) = \mathop {\sum}\nolimits_{i = 1}^M {v_i\sigma \left( {{{{\mathbf{w}}}}_i^T{{{\mathbf{x}}}} + b_i} \right)}\)
where v_{i} is the combination weight of the ith neuron. We can further extend the mathematical specification of the SLP by stacking several single-layer networks into a multi-layered perceptron (MLP)^{233}. As the network goes deeper (i.e., as the number of layers increases), the number of free parameters increases, as does the capability of the network to represent highly nonlinear functions^{234}. We can formalize this mathematically by stacking several single-layer networks into a deep neural network (DNN) with N layers, i.e.

\(f_{DNN}\left( {{{\mathbf{x}}}} \right) = \left( {f_N \circ f_{N - 1} \circ \cdots \circ f_1} \right)\left( {{{\mathbf{x}}}} \right)\)
where the circle ◦ is the symbol for the composition of functions. The first layer is referred to as the input layer, the last as the output layer, and the layers in between the input and output are termed hidden layers. A neural network containing many hidden layers is referred to as “deep”, hence the term “deep learning”.
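The perceptron and the single-layered perceptron can be written directly from their definitions above (a minimal NumPy sketch; in practice the weights would be learned rather than hand-set):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def perceptron(x, w, b, activation=sigmoid):
    """Single artificial neuron: f_P(x) = sigma(w^T x + b)."""
    return activation(np.dot(w, x) + b)

def single_layer(x, W, b, v, activation=np.tanh):
    """Single-layered perceptron: a linear combination of M neurons,
    f_SLP(x) = sum_i v_i * sigma(w_i^T x + b_i)."""
    return np.dot(v, activation(W @ x + b))

x = np.array([0.5, -1.0])
w, b = np.array([2.0, 1.0]), 0.0
assert np.isclose(perceptron(x, w, b), 0.5)  # w.x + b = 0, sigmoid(0) = 0.5
assert np.isclose(single_layer(x, np.eye(2), np.zeros(2), np.ones(2), relu), 0.5)
```

Stacking several such layers, each feeding its output to the next, realizes the deep network composition described above.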
Neural network training
Having gained basic insights into neural networks and their topology, we still need to discuss how to train the neural network, i.e., how its parameters θ are actually determined. In this regard, we need to select the appropriate model topology for the problem to be solved and specify the various parameters associated with the model (known as “hyperparameters”). In addition, we need to define a function that assesses the quality of the network parameter set θ, the so-called loss function L, which quantifies the error between the predicted value \({{{\hat{\mathbf y}}}} = f_{{{\mathbf{\theta }}}}\left( {{{\mathbf{x}}}} \right)\) and the true observation y (label)^{235}.
Depending on the type of task accomplished by the network, the loss function can be divided into classification loss and regression loss. Commonly used classification loss functions include the hinge loss (\(L_{Hinge} = \mathop {\sum}\nolimits_{i = 1}^n {\max [0,1 - {{{\mathrm{sgn}}}}(y_i)\hat y_i]}\)) and the cross-entropy loss (\(L_{CE} = - \mathop {\sum}\nolimits_{i = 1}^n {[y_i\log \hat y_i + (1 - y_i)\log (1 - \hat y_i)]}\))^{236}. Since the optical metrology tasks involved in this review mainly belong to regression tasks, here we focus on the regression loss functions. The mean absolute error (MAE) loss (\(L_{MAE} = \frac{1}{n}\mathop {\sum}\nolimits_{i = 1}^n {\left| {y_i - \hat y_i} \right|}\)) and the mean squared error (MSE) loss (\(L_{MSE} = \frac{1}{n}\mathop {\sum}\nolimits_{i = 1}^n {(y_i - \hat y_i)^2}\)) are the two most commonly used loss functions, which are also known as the L1 loss and L2 loss, respectively. In image-processing tasks, MSE is usually converted into a peak signal-to-noise ratio (PSNR) metric: \(L_{PSNR} = 10\,{{{\mathrm{log}}}}_{10}\frac{{MAX^2}}{{L_{MSE}}}\), where MAX is the maximum pixel intensity value within the dynamic range of the raw image^{237}. Other variants of the L1 and L2 losses include RMSE, Euclidean loss, smooth L1, etc.^{238}. For natural images, the structural similarity (SSIM) index is a representative image fidelity measurement, which judges the structural similarity of two images based on three metrics (luminance, contrast, and structure): \(L_{SSIM} = l({{{\mathbf{y}}}},\widehat {{{\mathbf{y}}}})c({{{\mathbf{y}}}},\widehat {{{\mathbf{y}}}})s({{{\mathbf{y}}}},\widehat {{{\mathbf{y}}}})\)^{239}, where \(l({{{\mathbf{y}}}},\widehat {{{\mathbf{y}}}})\), \(c({{{\mathbf{y}}}},\widehat {{{\mathbf{y}}}})\), and \(s({{{\mathbf{y}}}},\widehat {{{\mathbf{y}}}})\) are the similarities of the local patch luminances, contrasts, and structures, respectively.
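The regression losses above are one-liners in code (a minimal sketch; MAX here is a parameter, commonly 255 for 8-bit images):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error (L1 loss)."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean squared error (L2 loss)."""
    return np.mean((y - y_hat) ** 2)

def psnr(y, y_hat, max_val=255.0):
    """Peak signal-to-noise ratio derived from the MSE, in decibels."""
    return 10.0 * np.log10(max_val**2 / mse(y, y_hat))

y = np.array([0.0, 1.0, 2.0])
y_hat = np.array([0.0, 2.0, 4.0])
assert np.isclose(mae(y, y_hat), 1.0)        # (0 + 1 + 2) / 3
assert np.isclose(mse(y, y_hat), 5.0 / 3.0)  # (0 + 1 + 4) / 3
```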
For more details about these loss functions, readers may refer to the article by Wang and Bovik^{240}. With the defined loss function, the objective behind the training process of ANNs can be formalized as an optimization problem^{241}

\(\widehat {{{\mathbf{\theta }}}} = \mathop {{{{\mathrm{arg}}}}\,{{{\mathrm{min}}}}}\limits_{{{\mathbf{\theta }}}} L\left( {f\left( {{{{\mathbf{x}}}},{{{\mathbf{\theta }}}}} \right),{{{\mathbf{y}}}}} \right)\)
The learning schemes can be broadly classified into three categories: supervised learning, semi-supervised learning, and unsupervised learning^{36,242,243,244}. Supervised learning dominates the majority of practical applications, in which a neural network model is optimized based on a large dataset of labeled data pairs (x, y), and the training process amounts to finding the model parameters \(\widehat {{{\mathbf{\theta }}}}\) that best predict the data based on the loss function \(L\left( {\widehat {{{\mathbf{y}}}},{{{\mathbf{y}}}}} \right)\). In unsupervised learning, training algorithms process input data x without corresponding labels y, and the underlying structure or distribution in the data has to be modeled based on the input itself. Semi-supervised learning sits between supervised and unsupervised learning, where a large amount of input data x is available but only some of the data are labeled. More detailed discussions about semi-supervised and unsupervised learning can be found in the “Future directions” section.
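Supervised learning as loss minimization can be illustrated with the simplest possible "network": a linear model fitted to labeled pairs by gradient descent on the MSE loss (a toy stand-in for backpropagation; the learning rate and iteration count are arbitrary choices for this example):

```python
import numpy as np

# Fit y = w*x + b to labeled pairs (x, y) by gradient descent on the MSE loss.
rng = np.random.default_rng(0)
x = rng.random(100)
y = 3.0 * x + 0.5                          # labels from a known linear model
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    y_hat = w * x + b                      # forward pass (prediction)
    grad_w = 2 * np.mean((y_hat - y) * x)  # dL/dw for the MSE loss
    grad_b = 2 * np.mean(y_hat - y)        # dL/db
    w -= lr * grad_w                       # parameter update
    b -= lr * grad_b
assert abs(w - 3.0) < 1e-3 and abs(b - 0.5) < 1e-3
```

Training a DNN follows exactly this pattern, except that the gradients of all layer parameters are obtained by the chain rule (backpropagation) and the updates are typically computed on mini-batches.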
From perceptron to deep learning
As summarized in Fig. 4, despite the overall upward trend, a broader look at the history of deep learning reveals three major waves of development. Concepts of machine learning and deep learning commenced with research into the artificial neural network, which originated from the simplified mathematical model of biological neurons established by McCulloch and Pitts in 1943^{245}. In 1958, Rosenblatt^{231} proposed the perceptron, which was the first ANN that allowed neurons to learn. The emergence of the perceptron marked the first peak of neural network development. However, a single-layer perceptron model can only solve linear classification problems and cannot solve even simple XOR and XNOR problems^{246}. These limitations caused a major dip in popularity and stagnated the development of neural networks for nearly two decades.
In 1986, Rumelhart et al.^{247} proposed the backpropagation (BP) algorithm for MLPs, which constantly updates the network parameters to minimize the network loss based on the chain rule. It effectively solved the problems of nonlinear classification and learning, leading neural networks into a second development phase and promoting a boom of “shallow learning”. Inspired by the mammalian visual cortex (stimulated in a restricted visual field)^{248}, LeCun et al.^{249} proposed the biologically inspired CNN model based on the BP algorithm in 1989, establishing the foundation of deep learning for modern computer vision. During this wave of development, various models and concepts such as the long short-term memory (LSTM) recurrent neural network (RNN) and distributed representation and processing were developed and continue to remain key components of various advanced applications of deep learning to this date. Adding more hidden layers to the network allows a deep architecture to be built, which can accomplish more complex mappings. However, training such a deep network is not trivial because once the errors are backpropagated to the first few layers, they become negligible (so-called gradient vanishing), making the learning process very slow or even causing it to fail^{250}. Moreover, the limited computational capacity of the hardware available at that time could not support training large-scale neural networks. As a result, deep learning suffered a second major roadblock.
In 2006, Hinton et al.^{251,252} proposed a training approach for the Deep Belief Network (DBN) (a composition of simple, unsupervised networks such as Deep Boltzmann Machines (DBMs)^{253} (Fig. 4f) or Restricted Boltzmann Machines (RBMs)^{254} (Fig. 4e)) based on graphical models, trying to overcome the gradient-vanishing problem. They gave the new name “deep learning” to multilayer neural-network-related learning methods^{251,252}. This milestone revolutionized the prospects of machine learning, leading neural networks into a third upsurge along with the improvement of computer hardware performance, the development of GPU acceleration technology, and the availability of massive labeled datasets.
In 2012, Krizhevsky et al.^{255} proposed a deep CNN architecture — AlexNet, which won the 2012 ImageNet competition, making the CNN^{249,256} the dominant framework for deep learning after more than 20 years of silence. Meanwhile, several new deep-learning network architectures and training approaches (e.g., ReLU^{232}, given by \(\sigma (x) = \max (0,x)\), and Dropout^{257}, which discards a small but random portion of the neurons during each iteration of training to prevent neurons from co-adapting to the same features) were developed to further combat gradient vanishing and ensure faster convergence. These factors have led to the explosive growth of deep learning and its applications in image analysis and computer-vision-related problems. Different from the CNN, the RNN is another popular type of DNN, inspired by the brain’s recurrent feedback system. It provides the network with additional “memory” capabilities for previous data, where the inputs of the hidden layer consist of not only the current input but also the output from the previous step, making it a framework specialized in processing sequential data^{258,259,260} (Fig. 4d). CNNs and RNNs usually operate on Euclidean data like images, videos, texts, etc. With the diversification of data, non-Euclidean graph-structured data, such as 3D point clouds and biological networks, are also being processed by deep learning. Graph neural networks (GNNs), where each node aggregates the feature vectors of its neighbors to compute its new feature vector (a recursive neighborhood aggregation scheme), are effective graph representation learning frameworks specifically for non-Euclidean data^{261,262}.
With increasing attention and effort from both academia and industry, different types of deep neural networks have been continuously proposed in recent years at an exponential pace, such as VGGNet^{263} (VGG means “Visual Geometry Group”), GoogLeNet^{264} (using “GoogLe” instead of “Google” is a tribute to LeNet, one of the earliest CNNs, developed by LeCun^{256}), R-CNN (regions with CNN features)^{265}, the generative adversarial network (GAN)^{266}, etc. In 2015, the emergence of the residual block (Fig. 4h), containing two convolutional layers activated by ReLU that allow the information (from the input or that learned in earlier layers) to penetrate further into the deeper layers, significantly reduced the vanishing-gradient problem as networks get deeper, making it possible to train large-scale CNNs efficiently^{267}. In 2016, the Google-owned AI company DeepMind shocked the world by beating Lee Sedol with its AlphaGo AI system, alerting the world to deep learning, a new breed of machine learning that promised to be smarter and more creative than before^{268}. For a more detailed description of the history and development of deep learning, readers can refer to the chronological review article by Schmidhuber^{39}.
Convolutional neural network (CNN)
In the subsection “Artificial neural network”, we discussed the simplest DNNs, so-called MLPs, which basically consist of multiple layers of neurons, each fully connected to those in the adjacent layers. Each neuron receives some inputs, which are multiplied by their weights, with nonlinearity applied via activation functions. In this subsection, we will discuss CNNs, which are considered an evolution of the MLP architecture developed to process data in single or multiple arrays, and are thus more appropriate for handling image-like input. Given the prevalence of CNNs in image processing and analysis tasks, here we briefly review some basic ideas and concepts widely used in CNNs. For a comprehensive introduction to CNNs, we refer readers to the excellent book by Goodfellow et al.^{36}.
The CNN follows the same pattern as the MLP: artificial neurons are stacked in hidden layers on top of each other; parameters are learned during network training with nonlinearity applied via activation functions; the loss \(L\left( {\widehat {{{\mathbf{y}}}},{{{\mathbf{y}}}}} \right)\) is calculated and backpropagated to update the network parameters. The major difference between them is that instead of regular fully connected layers, the CNN uses specialized convolution layers to model locality and abstraction (Fig. 5b). At each layer, the input image \({{{\mathbf{x}}}}_k\) (lexicographically ordered) is convolved with a set of convolutional filters \({{{\mathbf{W}}}}_k\) (note here \({{{\mathbf{W}}}}_k\) represents a block-Toeplitz convolution matrix) and biases \({{{\mathbf{b}}}}_k\) are added to generate a new image, which is subjected to an element-wise nonlinear activation function σ (normally the ReLU function \(\sigma (x) = \max (0,x)\)), and the same structure is repeated for each convolution layer k:

\({{{\mathbf{x}}}}_{k + 1} = \sigma \left( {{{{\mathbf{W}}}}_k{{{\mathbf{x}}}}_k + {{{\mathbf{b}}}}_k} \right)\)
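A single convolution layer followed by ReLU can be sketched directly (single channel, "valid" padding, no stride; note that CNN "convolution" is computed as cross-correlation, as below, rather than flipped convolution):

```python
import numpy as np

def conv2d_layer(image, kernel, bias=0.0):
    """One 'valid' 2D convolution layer followed by ReLU, the basic building
    block described above (single channel, unit stride, for clarity)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU activation

# A horizontal-gradient kernel responds to a vertical intensity edge
image = np.zeros((5, 5))
image[:, 3:] = 1.0
edge = conv2d_layer(image, np.array([[-1.0, 1.0]]))
assert edge.shape == (5, 4) and np.all(edge[:, 2] == 1.0)
```

In a real CNN such kernels are not hand-designed like this edge detector but learned from data, with many kernels per layer producing a stack of feature maps.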
The second key difference between CNNs and MLPs is the typical incorporation of pooling layers in CNNs, where the pixel values of neighborhoods are aggregated by applying a permutation-invariant function, such as the max or mean operation, to reduce the dimensionality of the convolutional layers and allow significant features to propagate downstream without being affected by neighboring pixels (Fig. 5c). The major advantage of such an architecture is that CNNs exploit spatial dependencies in the image and only consider a local neighborhood for each neuron, i.e., the network parameters are shared in such a way that the network performs convolution operations on images. In other words, the idea of a CNN is to take advantage of a pyramid structure to first identify features at the lowest level before passing these features to the next layer, which, in turn, creates features of a higher level. Since the local statistics of images are invariant to location, the model does not need to learn weights for the same feature occurring at different positions in an image, making the network equivariant with respect to translations of the input. This makes CNNs especially suitable for processing images captured in optical metrology, e.g., a fringe pattern consisting of a sinusoidal signal repeated over different image locations. In addition, it also drastically reduces the number of parameters (i.e., the number of weights no longer depends on the size of the input image) that need to be learned.
Figure 5a shows a CNN architecture for the image-classification task. Every layer of a CNN transforms the input volume to an output volume of neuron activations, eventually leading to the final fully connected layers, resulting in a mapping of the input data to a 1D feature vector. A typical CNN configuration consists of a sequence of convolution and pooling layers. After passing through a few pairs of convolutional and pooling layers, all the features of the image have been extracted and arranged into a long tube. At the end of the convolutional stream of the network, several fully connected layers (i.e., the regular neural network architecture, MLP, discussed in the previous subsection) are usually added to flatten the features into a vector, with which tasks such as classification can be performed. Starting with LeNet^{256}, developed in 1998 for recognizing handwritten characters with two convolutional layers, CNN architectures have since evolved into deeper CNNs like AlexNet^{255} (5 convolutional layers) and VGGNet^{263} (19 convolutional layers), and beyond to more advanced and super-deep networks like GoogLeNet^{264} and ResNet^{267}. These CNNs have been extremely successful in computer vision applications, such as object detection^{269}, action recognition^{270}, motion tracking^{271}, and pose estimation^{272}.
Fully convolutional network architectures for image processing
Conventionally, CNNs have been used for solving classification problems. Due to the presence of a parameter-rich fully connected layer at the end of the network, typical CNNs throw away spatial information and produce non-spatial outputs. However, for most image-processing tasks that we encountered earlier in the Section “Image processing in optical metrology”, the network must produce a full-resolution output with the same or even larger size compared with the input, which is commonly referred to as dense prediction (contrary to the single target category per image)^{273}. Specifically, fully convolutional network architectures without fully connected layers should be used for this purpose, which accept inputs of any size, are trained with a regression loss, and produce outputs of the corresponding dimensions^{273,274}. Here, we briefly review three representative network architectures with such features.

SRCNN: In a conventional CNN, the downsampling effect of pooling layers results in an output with a far lower resolution than the input. Thus, a relatively naive and straightforward solution is simply stacking several convolution layers while skipping pooling layers to preserve the input dimensions. Dong et al.^{275} first adopted this idea and proposed SRCNN for the image super-resolution task. SRCNN utilizes traditional upsampling algorithms to obtain interpolated low-resolution images and then refines them by learning an end-to-end mapping from interpolated coarse images to high-resolution images of the same dimension but with more details, as illustrated in Fig. 6a. Due to its simple idea and implementation, SRCNN has gradually become one of the most popular frameworks in image super-resolution^{276} and has been extended to many other tasks such as radar image enhancement^{277}, high-definition underwater image display^{278}, and computed tomography^{279}. One major disadvantage of SRCNN is the time and memory cost of maintaining the full resolution throughout the whole network, making SRCNN practical only for relatively shallow network structures.

FCN: The fully convolutional network (FCN) proposed by Long et al.^{273} is a popular strategy and baseline for semantic-segmentation tasks. The FCN is inspired by the fact that the fully connected layers in a classification CNN (Fig. 5) can also be viewed as convolutions with kernels that cover their entire input regions. As illustrated in Fig. 6b, the FCN uses an existing classification CNN as the encoder module of the network and replaces its fully connected layers with 1 × 1 convolution layers, followed by upsampling (deconvolution) layers as the decoding module, enabling the network to upsample the input feature maps and produce a pixel-wise output. In the FCN, skip connections combining (simply adding) information in fine layers and coarse layers enhance the localization capability of the network, allowing for the reconstruction of accurate fine details that respect the global structure. The FCN and its variants have achieved great success in the dense pixel prediction required by many advanced computer vision understanding tasks^{280}.

U-Net: Ronneberger et al.^{281} took the idea of the FCN one step further and proposed the U-Net architecture, which replaces the one-step upsampling part with a series of complementary upsampling convolution layers, resulting in a quasi-symmetrical encoder–decoder model architecture. As illustrated in Fig. 6c, the basic structure of U-Net consists of a contractive branch and an expansive branch, which enables multi-resolution analysis and general multi-scale image-to-image transforms. The contractive branch (encoder) downsamples the image using conventional strided convolution, producing a compressed feature representation of the input image. The expansive branch (decoder), complementary to the contractive branch, uses upsampling methods like transposed convolution to provide a processed result with the same size as the input. In addition, U-Net features skip connections that concatenate the matching resolution levels of the contractive branch and the expansive branch. Ronneberger’s U-Net is a breakthrough toward automatic image segmentation and has been successfully applied to many tasks that require image-to-image transforms^{282}.
Since feature extraction is only performed in a low-dimensional space, the computational and spatial complexity of the above encoder–decoder structured networks (FCN and U-Net) can be much reduced. Therefore, the encoder–decoder CNN structure has become the mainstream for image segmentation and reconstruction^{283}. The encoder is usually a classic CNN (AlexNet, VGG, ResNet, etc.) in which downsampling (pooling layers) is adopted to reduce the input dimension so as to generate low-resolution feature maps. The decoder tries to mirror the encoder to upsample these feature representations and restore the original size of the image. Thus, how to perform upsampling is of great importance. Although traditional upsampling methods, e.g., nearest neighbor, bilinear, and bicubic interpolations, are easy to implement, deep-learning-based upsampling methods, e.g., unpooling^{284}, transposed convolution^{273}, and subpixel convolution^{285}, have gradually become a trend. All these approaches can be combined with the models mentioned above to prevent the decrease in resolution and obtain a full-resolution image output.

Unpooling upsampling: Unpooling reverses max-pooling by remembering the locations of the maxima in the max-pooling layers and, in the unpooling layers, copying each value back to exactly that location, as shown in Fig. 7a.

Transposed convolution: The counterpart of the convolutional layers are the transposed convolution layers (sometimes misleadingly called deconvolution layers^{280}), which predict the possible input based on feature maps sized like the convolution output. Specifically, a transposed convolution increases the image resolution by expanding the image with inserted zeros and then performing a convolution, as shown in Fig. 7b.

Subpixel convolution: The subpixel layer performs upsampling by generating a plurality of channels by convolution and then reshaping them, as Fig. 7c shows. Within this layer, a convolution is first applied to produce an output with \(M^2\) times the number of channels, where M is the scaling factor. After that, a reshaping operation (a.k.a. shuffle) is performed to produce an output whose spatial size is M times larger than the original.
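The reshaping (shuffle) step can be sketched with NumPy, following the common convention that C·M² input channels are rearranged into C channels on an M-times-larger spatial grid (the preceding convolution that produces the channels is omitted here):

```python
import numpy as np

def pixel_shuffle(x, scale):
    """Subpixel (pixel-shuffle) reshaping: rearrange C*r^2 channels of an
    (C*r^2, H, W) tensor into an (C, H*r, W*r) tensor, so that each r x r
    output block interleaves one pixel from each group of input channels."""
    c_r2, h, w = x.shape
    r = scale
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16, dtype=float).reshape(4, 2, 2)  # 4 channels, each 2 x 2
y = pixel_shuffle(x, 2)
assert y.shape == (1, 4, 4)
# The top-left 2x2 output block interleaves pixel (0,0) of all four channels
assert np.allclose(y[0, :2, :2], [[0.0, 4.0], [8.0, 12.0]])
```

Because the upsampling weights live in the preceding convolution, this layer learns its own interpolation filters instead of using a fixed kernel.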
As discussed in the Section “Image processing in optical metrology”, despite their diversity, the image-processing algorithms used in optical metrology share a common characteristic—they can be regarded as a mapping operator that transforms the content of arbitrary-sized inputs into pixel-level outputs, which fits exactly with DNNs with a fully convolutional architecture. In principle, any fully convolutional network architectures presented here can be used for a similar purpose. By applying different types of training datasets, they can be trained for accomplishing different types of image-processing tasks that we encountered in optical metrology. This provides an alternative approach to process images such that the produced results resemble or even outperform conventional image-processing operators or their combinations. There are also many other potential desirable factors for such a substitution, e.g., accuracy, speed, generality, and simplicity. All these factors were crucial to enable the fast rise of deep learning in the field of optical metrology.
Invoking deep learning in optical metrology: principles and advantages
Let us return to optical metrology. It is essential that the image formation is properly understood in order to reconstruct the required geometrical or mechanical quantities of the sample, as we discussed in the section “Image formation in optical metrology”. In general, the relation between the observed images \({{{\mathbf{I}}}} \in {\Bbb R}^m\) (frame-stacked and lexicographically ordered, with dimension m × 1) and the desired sample parameter (or an information-bearing parameter that clearly reflects the desired sample quantity, e.g., a phase or displacement field) \({{{\mathbf{p}}}} \in {\Bbb R}^n\) (or \({\Bbb C}^n\)) can be described as
\({{{\mathbf{I}}}} = {{{\mathcal{N}}}}\left( {{{{\mathcal{A}}}}\left( {{{\mathbf{p}}}} \right)} \right)\)   (8)
where \({{{\mathcal{A}}}}\) is the (possibly nonlinear) forward measurement operator mapping from the parameter space to the image space, which is given by the physical laws governing the formation of the data, and \({{{\mathcal{N}}}}\) represents the effect of noise (not necessarily additive). This model is general enough to cover almost all image formation processes in optical metrology. However, this does not mean that p can be directly obtained from I. More specifically, we generally have to infer the cause (i.e., the shape, displacement, deformation, or stress of the surface) from the effect (i.e., the intensity at each pixel), which means that an inverse problem has to be solved.
Solving inverse optical metrology problems: issues and challenges
Given the forward model represented by Eq. (8), our task is to find the parameters by an approximate inverse of \({{{\mathcal{A}}}}\) (denoted as \(\tilde{{{\mathcal{A}}}}^{-1}\)) such that \(\widehat{{{\mathbf{p}}}} = \widehat{{{\mathcal{R}}}}\left( {{{\mathbf{I}}}} \right) = \tilde{{{\mathcal{A}}}}^{-1}\left( {{{\mathbf{I}}}} \right) \approx {{{\mathbf{p}}}}\). However, in practice, there are many problems involved in this process:

Unknown or mismatched forward model. The success of conventional optical metrology approaches relies heavily on precise pre-knowledge of the forward model \({{{\mathcal{A}}}}\), so they are often regarded as model-driven or knowledge-driven approaches. In practical applications, the forward model \({{{\mathcal{A}}}}\) is always an approximate description of reality, and extending it may be challenging due to a limited understanding of experimental perturbations (noise, aberrations, vibration, motion, nonlinearity, saturation, and temperature variations) and non-cooperative surfaces (shiny, translucent, coated, shielded, highly absorbent, or strongly scattering). These effects are either difficult to model or result in an overly complicated (even intractable) model with a large number of parameters.

Error accumulation and suboptimal solutions. As described in the section “Image processing in optical metrology”, “divide-and-conquer” is a common practice for solving complex problems with a sequence of cascaded image-processing algorithms to obtain the desired object parameter. For example, in FPP, the entire image-processing pipeline is generally divided into several sub-steps, i.e., image preprocessing, phase demodulation, phase unwrapping, and phase-to-height conversion. Although each subproblem or sub-step becomes simpler and easier to handle, the disadvantages are also apparent: error accumulation and suboptimal solutions, i.e., the aggregation of optimal solutions to the subproblems may not be equivalent to the globally optimal solution.

Ill-posedness of the inverse problem. In many computer vision and computational imaging tasks, such as image deblurring^{24}, sparse computed tomography^{25}, and imaging through scattering media^{27}, the difficulty in retrieving the desired information p from the observation I arises from the fact that the operator \({{{\mathcal{A}}}}\) is usually poorly conditioned, and the resulting inverse problem is ill-posed, as illustrated in Fig. 8a. Due to the similarly indirect measurement principle, there are also many important inverse problems in optical metrology that are ill-posed, among which phase demodulation from a single fringe pattern and phase unwrapping from a single wrapped phase distribution are the best known to specialists in optical metrology (Fig. 8b). The simplified model for the intensity distribution of fringe patterns (Eq. (1)) suggests that the observed intensity I results from the combination of several unknown components: the average intensity A(x, y), the intensity modulation B(x, y), and the desired phase function ϕ(x, y). Simply put, we do not have enough information to solve the corresponding inverse problem uniquely and stably.
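This non-uniqueness is easy to verify numerically: with only a single intensity sample per pixel, entirely different combinations of A, B, and ϕ reproduce the same observation exactly. The values below are hypothetical and chosen purely for illustration:

```python
import numpy as np

I_obs = 0.7  # one observed intensity value at a pixel

# Two different (A, B, phi) triples consistent with I = A + B*cos(phi)
A1, B1 = 0.5, 0.4
phi1 = np.arccos((I_obs - A1) / B1)

A2, B2 = 0.6, 0.25
phi2 = np.arccos((I_obs - A2) / B2)

# Both triples explain the single observation perfectly, yet phi1 != phi2
assert np.isclose(A1 + B1 * np.cos(phi1), I_obs)
assert np.isclose(A2 + B2 * np.cos(phi2), I_obs)
```

Infinitely many such triples exist (any A, B with |(I − A)/B| ≤ 1 admits a solution), which is precisely why a single fringe intensity cannot pin down the phase without further information.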
In the fields of computer vision and computational imaging, the classical approach to solving an ill-posed inverse problem is to reformulate the ill-posed original problem into a well-posed optimization problem by imposing certain prior assumptions about the solution p that help regularize its retrieval:
\(\widehat{{{\mathbf{p}}}} = \mathop{{{\mathrm{arg\,min}}}}\limits_{{{\mathbf{p}}}} \left\| {{{{\mathcal{A}}}}\left( {{{\mathbf{p}}}} \right) - {{{\mathbf{I}}}}} \right\|_2^2 + \gamma R\left( {{{\mathbf{p}}}} \right)\)   (9)
where \(\left\| \cdot \right\|_2\) indicates the Euclidean norm, R(p) is a regularization penalty function that incorporates the prior information about p, such as smoothness^{286} or sparsity in some basis^{287} or dictionary^{288}, and γ is a real positive (regularization) parameter that governs the weight given to the regularization against the need to fit the measurement; it should be selected carefully to make an admissible compromise between the prior knowledge and data fidelity. Such an optimization problem can be solved efficiently with a variety of algorithms^{289,290}, which provide theoretical guarantees on the recoverability and stability of the approximate solution to an inverse problem^{291}.
Instead of regularizing the numerical solution, in optical metrology we prefer to reformulate the original ill-posed problem into a well-posed and adequately stable one by actively controlling the image acquisition process, so as to systematically add more knowledge about the object under investigation into the evaluation process^{31}. Because optical measurements are frequently carried out in a highly controlled environment, such a solution is often more practical and effective. As illustrated in Fig. 8c, by acquiring additional multi-frequency phase-shifted patterns, absolute phase retrieval becomes a well-posed estimation or regression problem, and the simple standard (unconstrained, regularization-free) least-squares method of regression analysis provides a stable, precise, and efficient solution^{292,293}:
\(\widehat{{{\mathbf{p}}}} = \mathop{{{\mathrm{arg\,min}}}}\limits_{{{\mathbf{p}}}} \left\| {{{{\mathcal{A}}}}\left( {{{\mathbf{p}}}} \right) - {{{\mathbf{I}}}}} \right\|_2^2\)   (10)
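The well-posedness gained by acquiring multiple phase-shifted patterns can be illustrated with the classical N-step least-squares phase estimator, whose closed form is a four-quadrant arctangent of sine and cosine projections. This is a generic textbook sketch with a synthetic phase ramp, not the specific implementation of refs. 292 and 293:

```python
import numpy as np

def lsq_phase(images, deltas):
    """Least-squares wrapped-phase estimate from N phase-shifted fringes
    I_n = A + B*cos(phi + delta_n)."""
    num = -sum(I * np.sin(d) for I, d in zip(images, deltas))
    den = sum(I * np.cos(d) for I, d in zip(images, deltas))
    return np.arctan2(num, den)

# Synthetic example: 4-step phase shifting on a smooth phase ramp
phi = np.linspace(-3.0, 3.0, 256)          # "true" phase, within (-pi, pi)
deltas = 2 * np.pi * np.arange(4) / 4      # shifts 0, pi/2, pi, 3*pi/2
images = [0.5 + 0.4 * np.cos(phi + d) for d in deltas]

phi_hat = lsq_phase(images, deltas)
assert np.allclose(phi_hat, phi, atol=1e-8)
```

With N ≥ 3 intensity samples per pixel, the three unknowns A, B, and ϕ are overdetermined, and the estimator recovers the phase stably, in contrast to the single-fringe case.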
The situation can become very different when we step out of the laboratory and into the complicated environment of the real world^{294}. The active strategies mentioned above often impose stringent requirements on the measurement conditions and the object under test. For instance, high-sensitivity interferometric measurement generally needs a laboratory environment where the thermal-mechanical settings are carefully controlled to preserve beam path conditions and minimize external disturbances. Absolute 3D shape profilometry usually requires multiple fringe pattern projections, demanding that the measurement conditions remain invariant while sequential measurements are performed. However, harsh operating environments, where the object or the metrology system cannot be maintained in a steady state, may make such active strategies a luxury or even an unreasonable request. Under such conditions, conventional optical metrology approaches suffer from severe physical and technical limitations, such as a limited amount of data and uncertainties in the forward model.
To address these challenges, researchers have made great efforts to improve state-of-the-art methods from different aspects over the past few decades. For example, phase-shifting techniques were optimized from the perspective of signal processing to achieve high-precision, robust phase measurement while minimizing the impact of experimental perturbations^{32,153}. Single-shot spatial phase-demodulation methods have been explicitly formulated as a constrained optimization problem similar to Eq. (9), with an extra regularization term enforcing a priori knowledge about the recovered phase (spatially smooth, limited spectral extension, piecewise constant, etc.)^{140,148}. Multi-frequency temporal phase unwrapping techniques have been optimized by utilizing the inherent information redundancy in the average intensity and the intensity modulation of the fringe images, allowing for absolute phase retrieval with a reduced number of patterns^{32,295}. Geometric constraints were introduced in FPP to solve the phase ambiguity problem without additional image acquisition^{175,183}. Despite these extensive research efforts over decades, how to extract the absolute (unambiguous) phase information, with the highest possible accuracy, from the minimum number (preferably a single shot) of fringe patterns remains one of the most challenging open problems in optical metrology. Consequently, we look forward to innovations and breakthroughs in the principles and methods of optical metrology, which are of significant importance for its future development.
Solving inverse optical metrology problems via deep learning
As a “data-driven” technology that has emerged in recent years, deep learning has received increasing attention in the field of optical metrology and has made fruitful achievements in a very short time. Different from the conventional physical-model- and knowledge-driven approaches, in which the objective function (Eqs. (9) and (10)) is built based on the image formation model \({{{\mathcal{A}}}}\), in deep-learning approaches we create a set of true object parameters p and the corresponding raw measured data I, and establish their mapping relation \({{{\mathcal{R}}}}_\theta\) based on a deep neural network, with all network parameters θ learned from the dataset by solving the following optimization problem (Fig. 9):
\(\widehat{\theta} = \mathop{{{\mathrm{arg\,min}}}}\limits_{\theta \in \Theta} \mathop{\sum}\limits_n \left\| {{{{\mathcal{R}}}}_\theta \left( {{{{\mathbf{I}}}}_n} \right) - {{{\mathbf{p}}}}_n} \right\|_2^2 + \gamma R\left( \theta \right)\)   (11)
with \(\left\| \cdot \right\|_2^2\) being the L2-norm error (loss) function once again (different types of loss functions, discussed in the subsection “Neural network training”, can be specified depending on the type of training data) and R being a regularizer of the parameters to avoid overfitting. A key element in deep-learning approaches is to parameterize \({{{\mathcal{R}}}}_\theta\) by parameters \(\theta \in \Theta\). The “learning” process refers to finding an “optimal” set of network parameters from the given training data by minimizing Eq. (11) over all possible network parameters \(\theta \in \Theta\), and the “optimality” is quantified through the loss function that measures the quality of the learned \({{{\mathcal{R}}}}_\theta\). Different deep-learning approaches can be thought of as different ways to parameterize the reconstruction network \({{{\mathcal{R}}}}_\theta\). Different from conventional approaches, in which solving the optimization problem directly gives the final solution to the inverse problem for the current given input, in deep-learning-based approaches the optimization problem is phrased as finding a “reconstruction algorithm” \(\widehat{{{{\mathcal{R}}}}_\theta}\) satisfying the pseudo-inverse property \(\widehat{{{\mathbf{p}}}} = \widehat{{{{\mathcal{R}}}}_\theta}\left( {{{\mathbf{I}}}} \right) = \tilde{{{\mathcal{A}}}}^{-1}\left( {{{\mathbf{I}}}} \right) \approx {{{\mathbf{p}}}}\) from the prepared (previous) dataset, which is then used for the reconstruction of future inputs.
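The structure of this learning problem can be made concrete with a deliberately tiny example: gradient descent on a linear “network” R_θ(I) = WI with a weight-decay regularizer R(θ) = ‖W‖². This is an illustrative toy, not a real DNN; the synthetic data and hyperparameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: inputs I_n and targets p_n = W_true @ I_n
W_true = rng.normal(size=(3, 3))
I = rng.normal(size=(100, 3))
p = I @ W_true.T

# Minimize  mean_n ||W I_n - p_n||^2 + gamma * ||W||^2  by gradient descent
W = np.zeros((3, 3))
gamma, lr = 1e-4, 0.05
for _ in range(3000):
    residual = I @ W.T - p                              # data-fidelity term
    grad = 2 * residual.T @ I / len(I) + 2 * gamma * W  # + regularizer term
    W -= lr * grad

assert np.allclose(W, W_true, atol=1e-2)
```

Training a real network replaces the linear map with a deep nonlinear parameterization and plain gradient descent with stochastic variants, but the objective being minimized has exactly this “loss plus regularizer over a training set” shape.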
Most of the deep-learning techniques currently used in optical metrology belong to supervised learning, i.e., a matched dataset of ground-truth parameters p and corresponding measurements I should be created to train the network. Ideally, the dataset should be collected by physical experiments based on the same metrology system to account for all experimental conditions (which are usually difficult to describe fully with the forward image formation model). The ground truth can be obtained by measuring various samples that one is likely to encounter, employing the active strategies mentioned above without having to worry about the ill-posedness of the real problem. To be more precise, in deep-learning-based optical metrology approaches, the active strategies frequently used in conventional optical metrology are shifted from the actual measurement stage to the preparation (network training) stage. Although the situation faced during the preparation stage may differ from that in the actual measurement stage, the information obtained in the former can, in many cases, be transferred to the latter. What we should do during the training stage is to reproduce the sample (using representative test objects), the system (using the same measurement system), and the error sources (noise, vibration, background illumination) of the measurement stage, ensuring that the captured input data are as close as possible to those in the real measurement. On the other hand, we should make the remaining environmental variables as controllable as possible so that more active strategies (sample manipulation, illumination changes, multiple acquisitions) can be involved in the training stage to derive the ground truth corresponding to the captured data. Once the network is trained, we can strip out these ideal environmental variables and run the network under realistic experimental conditions.
For example, for an interferometric system working in a harsh environment or an FPP system designed for measuring dynamic objects, phase demodulation from a single fringe pattern is the most desirable choice. The inherent ill-posedness of this problem makes it a very good example for deep learning in this regard. In the training stage, we reproduce all the experimental conditions, except that we employ the multi-frame phase-shifting technique with large phase-shifting steps to obtain the ground truth for the training samples. Once the network is established, it can map just one single fringe pattern to the desired phase distribution, and thus can be used in harsh environments where a single-shot phase-demodulation technique must be applied. Note that in this example, all the training data are fully generated by experiments, so the reconstruction algorithm (inverse mapping) \(\widehat{{{{\mathcal{R}}}}_\theta}\) can in principle be established without knowledge of the forward model \({{{\mathcal{A}}}}\). Even so, since we have sufficient real-world training observations of the form (p, I), those experimental data can be expected to reflect the true \({{{\mathcal{A}}}}\) in a complete and realistic way.
It should be noted that there are also many cases in which the ground truth corresponding to the experimental data is inaccessible. In such cases, the matched dataset can be obtained by a “learning from simulation” scheme, i.e., simulating the forward operator (with knowledge of the forward image formation model \({{{\mathcal{A}}}}\)) on ideal sample parameters. However, due to the complexity of real experimental conditions, we typically only know an approximation of \({{{\mathcal{A}}}}\). Consequently, the inconsistency or uncertainty in the forward operator \({{{\mathcal{A}}}}\) may lead to compromised performance in real experiments (see the “Challenges” section for detailed discussions). On the other hand, partial knowledge of the forward model \({{{\mathcal{A}}}}\) can be leveraged and incorporated into the design of the deep neural network to alleviate the “black box” nature of conventional neural network architectures, which may reduce the amount of required training data and provide more accurate and reliable network reconstruction (see the “Future directions” section for more details).
Advantages of invoking deep learning in optical metrology
In light of the above discussions, we summarize the potential advantages that can be gained by using a deep-learning approach in optical metrology. Figure 10 shows the advantages of deep-learning techniques compared with traditional optical metrology algorithms, taking FPP as an example. One may have noticed that FPP has already appeared a few times, and in fact, it will appear several more. The reason is that FPP is currently one of the most promising and well-researched areas at the intersection of deep learning and optical metrology, offering a representative and convincing example of the use of deep learning in optical metrology.
(1) From “physics-model-driven” to “data-driven”: Deep learning subverts the conventional “physics-model-driven” paradigm and opens up the “data-driven” learning-based representation paradigm. The reconstruction algorithm (inverse mapping) \(\widehat{{{{\mathcal{R}}}}_\theta}\) can be learned from experimental data without resorting to pre-knowledge of the forward model \({{{\mathcal{A}}}}\). If the training data are collected under an environment that reproduces the real experimental conditions (including the metrology system, sample types, measurement environment, etc.), and the amount (diversity) of data is sufficient, the trained model \(\widehat{{{{\mathcal{R}}}}_\theta}\) should reflect the true \({{{\mathcal{A}}}}\) more precisely and comprehensively, and is expected to produce better reconstruction results than conventional physics-model-driven or knowledge-driven approaches. The “data-driven” learning-based paradigm also eliminates the need to design different processing flows for specific image-processing algorithms based on experience and pre-knowledge. By applying different types of training datasets, one specific class of neural network can be trained to perform various types of transformation for different tasks, significantly improving universality and reducing the complexity of solving new problems.
(2) From “divide-and-conquer” to “end-to-end learning”: In contrast to the traditional optical metrology approach that solves a sequence of tasks independently, deep learning allows for an “end-to-end” learning structure, where the neural network can learn the direct mapping relation between the raw image data and the desired sample parameters in one step, i.e., \(\widehat{{{\mathbf{p}}}} = \widehat{{{{\mathcal{R}}}}_\theta}\left( {{{\mathbf{I}}}} \right)\), as illustrated in Fig. 10b. Compared with the “divide-and-conquer” scheme, “end-to-end” learning makes it possible to solve multiple tasks jointly, with great potential to alleviate the total computational burden. Such an approach has the advantage of synergy: it enables sharing information (features) between parts of the network that perform different tasks, which is more likely to yield better overall performance than solving each task independently.
(3) From “solving ill-posed inverse problems” to “learning pseudo-inverse mappings”: Deep learning utilizes complex neural network structures and nonlinear activation functions to extract high-dimensional features of the sample data, remove irrelevant information, and finally establish a nonlinear pseudo-inverse mapping model that is sufficient to describe the entire measurement process. A major reason for the success of deep learning is the abundance of training data and its explicit agnosticism toward a priori knowledge of how such data are generated. Instead of handcrafting a regularization function or specifying a prior, deep learning can automatically learn it from the example data. Consequently, the learned prior R(θ) is tailored to the statistics of real experimental data and, in principle, provides stronger and more reasonable regularization for the inverse problem pertaining to a specific metrology system. As a result, the obstacle of “solving nonlinear ill-posed inverse problems” can be bypassed, and the pseudo-inverse mapping relation between the input and the desired output can be established directly.
The use of deep learning in optical metrology
Deep-learning-enabled image processing in optical metrology
Owing to the abovementioned advantages, deep learning has been gaining increasing attention in optical metrology, demonstrating promising performance in various optical metrology tasks and in many cases exceeding that of classic techniques. In this section, we review the existing research leveraging deep learning in optical metrology according to an architecture similar to that introduced in the section “Image processing in optical metrology”, as summarized in Fig. 11. The basic network types, loss functions, and data acquisition methods of some representative examples are listed in Table 1.

(1)
Preprocessing: Many early works applying deep learning to optical metrology focused on image preprocessing tasks, such as denoising and enhancement. This is mainly because successful use cases of deep learning for such preprocessing tasks can easily be found in the computer vision community. Many image preprocessing algorithms in optical metrology could receive a performance upgrade by simply re-engineering these existing neural network architectures for a similar kind of problem.

Denoising: Yan et al.^{55} constructed a CNN composed of 20 convolutional layers for fringe denoising (Fig. 12a). Simulated fringe patterns with artificial Gaussian noise were generated as the training dataset, and the corresponding noise-free versions were used as ground truth. Figure 12d, e shows the denoising results of WFT^{114} and the deep-learning-based method, showing that their method was free of the boundary artifacts of WFT and achieved comparable denoising performance in the central region. Jeon et al.^{296} proposed a fast speckle-noise reduction method based on U-Net, which showed robust and excellent denoising performance for digital holographic images. Hao et al.^{54} constructed a fast and flexible denoising convolutional neural network (FFDNet) for batch denoising of ESPI fringe images. Lin et al.^{297} developed a denoising CNN (DnCNN) for speckle-noise suppression in fringe patterns. Reyes-Figueroa and Rivera^{298} proposed a fringe-pattern filtering and normalization technique based on an autoencoder^{299}. The autoencoder was able to fine-tune the U-Net network parameters and reduce residual errors, thereby improving the stability and repeatability of the neural network. Since it is difficult to access noise-free ground-truth images under real experimental conditions, the training datasets of these deep-learning-based denoising methods are all generated by simulation.
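The training-data recipe shared by these denoising networks, i.e., simulated fringes with artificial Gaussian noise as input and the clean fringes as ground truth, is straightforward to sketch. The phase field and noise level below are arbitrary placeholders, not taken from any of the cited works:

```python
import numpy as np

def fringe_pair(shape=(64, 64), sigma=0.05, rng=None):
    """Return one (noisy input, noise-free ground truth) training pair."""
    rng = rng or np.random.default_rng()
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    phi = 0.3 * xx + 2.0 * np.sin(yy / 10.0)   # hypothetical phase field
    clean = 0.5 + 0.4 * np.cos(phi)            # fringe model I = A + B*cos(phi)
    noisy = clean + rng.normal(0.0, sigma, shape)
    return noisy, clean

noisy, clean = fringe_pair(rng=np.random.default_rng(1))
assert noisy.shape == clean.shape == (64, 64)
assert 0.0 < np.std(noisy - clean) < 0.1  # the residual is just the added noise
```

A denoising network is then trained to map `noisy` to `clean` (or, in residual-learning variants such as DnCNN, to the noise itself, `noisy - clean`); generating many such pairs with varied phase fields yields the simulated datasets described above.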

Color channel separation: Our group reported a single-shot 3D shape measurement approach with deep-learning-based color fringe projection profilometry that can automatically eliminate color crosstalk and channel imbalance^{300}. As shown in Fig. 13a, the network predicted the sine and cosine terms related to high-quality crosstalk-free phase information from the input 3-channel fringe images of different wavelengths. To get rid of color crosstalk and chromatic aberration, green monochromatic fringe patterns were projected, and only the green channel of the captured patterns was used to generate labels. Figure 13b–d shows the 3D reconstruction results of a David plaster model measured by the traditional color-coded method^{301} and by our method, showing that the deep-learning-based method yielded more accurate surface details. The quality of the 3D reconstruction was comparable to the ground truth (Fig. 13e) obtained by the non-composite (monochromatic) multi-frequency phase-shifting method^{174}. The deep-learning-based method was applied to dynamic 360° 3D digital modeling, demonstrating its potential in rapid reverse engineering and related industrial applications (Fig. 13f–i).

Enhancement: Shi et al.^{51} proposed a fringe-enhancement method based on deep learning, the flowchart of which is given in Fig. 14a. The captured fringe image and the corresponding enhanced one, obtained by the subtraction of two fringe patterns with a π relative phase shift, were used to establish the mapping between the raw fringe and the desired enhanced version. Figure 14b–d shows the 3D reconstruction results of a moving hand using the traditional FT method^{138} and the deep-learning method, suggesting that the deep-learning method outperformed FT in terms of detail preservation and SNR. Goy et al.^{302} proved that a DNN could recover an image with decent quality under low-photon conditions, and successfully applied their method to phase retrieval. Yu et al.^{303} proposed a fringe-enhancement method in which the fringe modulation was improved by deep learning, facilitating high-dynamic-range 3D shape measurement without resorting to conventional multi-exposure schemes.


(2)
Analysis: Image analysis is the most critical step in the image-processing architecture of optical metrology. Consequently, most deep-learning techniques applied to optical metrology are proposed to accomplish tasks associated with image analysis. For phase measurement techniques, deep learning is extensively explored for (both spatial and temporal) phase demodulation and (spatial, temporal, and geometric) phase unwrapping.

Phase demodulation:

Spatial phase retrieval: To address the contradiction between the measurement efficiency and accuracy of traditional phase retrieval methods, our group, for the first time, introduced deep learning into fringe pattern analysis, substantially enhancing the phase-demodulation accuracy from a single fringe pattern^{50}. As illustrated in Fig. 15a, the background image A was first predicted from the acquired fringe image I through CNN1. Then CNN2 was employed to realize the mapping from I and A to the numerator (sine) term M and the denominator (cosine) term D. Finally, the wrapped phase can be acquired by computing the arctangent of M/D. Figure 15b compares the phases retrieved by two representative traditional single-frame phase retrieval methods (FT^{138} and WFT^{114}) and the deep-learning method, revealing that our deep-learning-based single-frame phase retrieval method achieved the highest reconstruction quality, almost visually reproducing the ground-truth information obtained by the 12-step phase-shifting method. We have incorporated this deep-learning-based phase retrieval technique into the micro-Fourier transform profilometry (μFTP) technique to eliminate the need for additional uniform patterns, doubling the measurement speed and achieving an unprecedented 3D imaging frame rate of up to 20,000 Hz^{304}. Figure 15c shows the 3D measurement results of a rotating fan at different speeds (3000 and 5000 revolutions per minute (RPM)), suggesting that the 3D shape of the fan blades can be fully reconstructed without any visible motion-induced artifacts. Qiao et al.^{305} applied this deep-learning-based phase extraction technique to phase measuring deflectometry, and achieved single-shot, high-accuracy 3D shape measurement of specular surfaces. Some other network structures, such as the structured light CNN (SLCNN)^{306} and a deep convolutional GAN^{307}, were also adopted for single-frame phase retrieval.
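The final step of this pipeline, recovering the wrapped phase from the network-predicted numerator and denominator, is just a four-quadrant arctangent. A one-line numerical check, with ideal M and D standing in for the CNN2 outputs:

```python
import numpy as np

phi = np.linspace(-3.0, 3.0, 256)   # ground-truth phase, within (-pi, pi)
M, D = np.sin(phi), np.cos(phi)     # ideal stand-ins for the CNN2 outputs

wrapped = np.arctan2(M, D)          # phase = arctan(M/D), quadrant-aware
assert np.allclose(wrapped, phi)
```

Predicting M and D separately, rather than the phase directly, is what lets the arctangent resolve the full (−π, π] range; a plain `arctan(M/D)` would collapse it to (−π/2, π/2).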
In addition, deep learning can be applied to Fourier transform profilometry for automatic spectrum extraction: by identifying the carrier frequency components bearing the object information in the Fourier domain, higher phase retrieval accuracy can be achieved without human intervention^{308}. Wang et al.^{309} proposed an automatic holographic reconstruction framework (Y-Net) consisting of two symmetrical U-Nets, allowing for the simultaneous recovery of phase and intensity information from a single off-axis digital hologram. They also doubled the capability of Y-Net, extending it to the reconstruction of dual-wavelength complex amplitudes while overcoming the spectral overlapping issue in common-path dual-wavelength digital holography^{310}. Recently, our group used U-Net to realize aliasing-free phase retrieval from a dual-frequency composite fringe pattern^{311}. Compared with traditional Fourier transform profilometry, the deep-learning-enabled approach avoids the complexities associated with dual-frequency spectrum separation and extraction, allowing for higher-quality single-shot absolute 3D shape reconstruction.

Temporal phase retrieval: Wang et al.^{312} introduced a deep-learning scheme into the phase-shifting technique of FPP. As shown in Fig. 16a, by introducing a fully connected DNN, a link was established between three low- and unit-frequency phase-shifting fringe patterns and the high-quality absolute phases calculated from high-frequency fringe images, and thus the 3D measurement accuracy could be significantly enhanced. The three unit-frequency phase-shifting patterns were encoded in the three monochrome channels of a color image and projected by a 3LCD projector; the individual fringe patterns were then decoded and projected sequentially and rapidly^{313,314}. Consequently, the hardware system allowed for real-time 3D surface imaging of multiple objects at a speed of 25.6 fps. Zhang et al.^{315} developed a deep phase-shift network (DPSNet) based on a GAN, with which multi-step phase-shifting interferograms with accurate, arbitrary phase shifts for calculating high-quality phase information were predicted from a single interferogram. Besides random intensity noise, conventional phase-shifting algorithms are also sensitive to other experimental imperfections, such as phase-shifting errors, illumination fluctuations, intensity nonlinearity, lens defocusing, motion-induced artifacts, and detector saturation. Deep learning also provides a potential solution to eliminate, or at least partially alleviate, the impact of these error sources on phase measurement. For example, Li et al.^{316} proposed a deep-learning-based phase-shifting interferometric phase recovery approach; the constructed U-Net was capable of predicting an accurate wrapped phase map from two interferogram inputs with unknown phase shifts. Zhang et al.^{317} applied a CNN to extract a high-accuracy wrapped phase map from conventional 3-step phase-shifting fringe patterns.
In the training stage, low-modulation or saturated fringe patterns were used as the raw dataset, and the relation between these imperfect raw fringes and the high-quality, error-free unwrapped phase (obtained by 12-step phase-shifting algorithms) was established based on a CNN. Consequently, the deep-learning-based approach could accommodate both dark and reflective surfaces, and the related phase errors (noise and saturation) of the conventional three-step phase-shifting method were significantly suppressed, making it a promising approach for high-dynamic-range (HDR) 3D measurement of surfaces with large reflectivity variations (Fig. 16d–g). Wu et al.^{318} proposed a deep-learning-based phase-shifting approach to overcome the phase errors associated with intensity nonlinearity. Through a well-trained FCN, a distortion-free, high-quality phase map could be reconstructed conveniently and efficiently from raw phase-shifting fringe patterns with a strong gamma effect. Yang et al.^{319} constructed a three-to-three deep-learning framework (TreeNet) based on U-Net to compensate for the nonlinear effect in phase-shifting images, which effectively and robustly reduced the phase errors by about 90%. Recently, our group demonstrated that the non-sinusoidal errors (e.g., residual high-order harmonics in binary defocusing projection, intensity saturation, the gamma effect of projectors and cameras, and their coupling) in phase-shifting profilometry could be handled by an integrated deep-learning framework. A well-trained U-Net could effectively suppress the phase errors caused by different types of non-sinusoidal fringes with only a minimum of three fringe patterns as input^{320}.


Phase unwrapping:

Spatial phase unwrapping: Wang et al.^{321} proposed a one-step phase unwrapping approach based on deep learning. Various ideal (noise-free) continuous phase distributions and the corresponding wrapped phase maps with different types of noise (Gaussian, salt-and-pepper, or multiplicative) were simulated and used as the training dataset for a CNN based on U-Net. Upon completion of the training, the absolute phase can be predicted directly from a noisy wrapped phase map, as illustrated in Fig. 17a. Figure 17b–f shows comparisons of the phase unwrapping results obtained by the traditional least-squares (LS) method^{322} and the deep-learning-based method, demonstrating that deep learning can directly fulfill the complicated nonlinear phase unwrapping task in one step, with improved anti-noise and anti-aliasing ability. Spoorthi et al.^{323} developed a CNN-based phase unwrapping framework, PhaseNet. The fringe order (2π integer phase jumps) used for phase unwrapping can be obtained pixel by pixel through a semantic-segmentation-based deep-learning framework with an encoder-decoder structure. Recently, they developed an enhanced phase unwrapping framework, PhaseNet 2.0, which could directly map a noisy wrapped phase to a denoised absolute one^{324}. Zhang et al.^{325} transferred the task of phase unwrapping to a multi-class classification problem and generated fringe orders by feeding the wrapped phase into a convolutional segmentation network. Zhang et al.^{53} proposed a deep-learning-based approach for rapid 2D phase unwrapping, which demonstrated good denoising and unwrapping performance and outperformed the conventional path-dependent and path-independent methods. Kando et al.^{326} applied U-Net to achieve absolute phase prediction from a single interferogram, and the quality of the recovered phase was superior to that obtained by the conventional FT method, especially for closed-fringe patterns.
Li et al.^{327} proposed a deep-learning-based phase unwrapping strategy for closed fringe patterns. They compared four different network structures for phase unwrapping and found that an improved FCN architecture performed the best in terms of accuracy and speed. However, it should be mentioned that, similar to the case of fringe denoising, true absolute phase maps corresponding to real, experimentally obtained wrapped phase maps are generally quite hard to obtain in many interferometric techniques (which require sophisticated multi-wavelength illuminations and heterodyne operations). Therefore, the training datasets used in the above-mentioned deep-learning-based spatial phase unwrapping methods are generated by numerical simulation instead of real experiments. Moreover, since only a single wrapped phase map is used as input, these methods still suffer from the 2π ambiguity problem inherent in traditional phase measurement techniques.
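As a minimal illustration of how such simulated training data can be produced, the sketch below wraps a smooth absolute phase and derives the integer fringe orders that segmentation-style frameworks such as PhaseNet treat as per-pixel class labels; the Gaussian phase surface and image size are illustrative assumptions.

```python
import numpy as np

def wrap(phi):
    """Wrap a phase map into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

y, x = np.mgrid[-1:1:128j, -1:1:128j]
phi_abs = 30.0 * np.exp(-(x**2 + y**2) / 0.5)    # smooth absolute phase (rad)
phi_wrap = wrap(phi_abs)                          # network input
# Integer fringe orders: the per-pixel class labels a segmentation CNN learns.
k = np.round((phi_abs - phi_wrap) / (2 * np.pi)).astype(int)

# Unwrapping then reduces to phi_wrap + 2*pi*k.
print("fringe-order classes:", k.min(), "to", k.max())
```

In a real training pipeline, many such (wrapped phase, fringe-order map) pairs with added noise would be generated, and the network learns the mapping from the first to the second.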

Temporal phase unwrapping: Our group developed a deep-learning-based temporal phase unwrapping framework, as illustrated in Fig. 18a^{52}. The inputs of the network are a single-period (wrap-free) phase map and a high-frequency wrapped phase map, from which the constructed CNN could directly predict the fringe orders corresponding to the high-frequency phase to be unwrapped. Figure 18b–e shows the comparison between the traditional multi-frequency temporal phase unwrapping (MFTPU) method^{174} and the deep-learning-based approach for the 3D reconstructions obtained by unwrapping the wrapped phase maps using the (1–32) and (1–64) frequency combinations of fringe patterns, respectively. In comparison with MFTPU, the deep-learning-assisted method produced phase unwrapping results with higher accuracy and robustness, even in the presence of different types of error sources (low SNR, intensity nonlinearity, and object motion). Liu et al.^{328} further improved this approach by using a lightweight classification CNN to extract the fringe orders from a pair of low- and high-frequency phase maps, which saved a large amount of training time and made it possible to deploy the network on mobile devices. Li et al.^{329} proposed a deep-learning-based dual-wavelength phase unwrapping approach in which only a single-wavelength interferogram was used to predict another interferogram recorded at a different wavelength with a conditional GAN (CGAN). Though their approach still suffered from the phase ambiguity problem when measuring discontinuous surfaces or isolated objects, it provided an effective and promising solution to phase unwrapping and extended the measurement range of single-wavelength interferometry and holography techniques. Yao et al. designed FCNs incorporating residual layers to predict the fringe orders of wrapped phases from only two^{330} or even a single^{331} Gray-code image(s), significantly reducing the number of required images compared with the conventional Gray-code technique.
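The number-theoretical core of MFTPU that these networks learn to emulate can be sketched in a few lines; the fringe frequency of 32 and the noise-free ramp phase are illustrative assumptions.

```python
import numpy as np

def wrap(phi):
    """Wrap a phase map into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

f_high = 32                                  # high fringe frequency
x = np.linspace(0.0, 1.0, 2000)              # normalized fringe coordinate
phi_unit = 2 * np.pi * x                     # single-period (wrap-free) phase
phi_high_abs = f_high * phi_unit             # ground-truth high-frequency phase
phi_high_wrap = wrap(phi_high_abs)           # measured wrapped phase

# Fringe order from the two phase maps (the quantity the CNN predicts):
k = np.round((f_high * phi_unit - phi_high_wrap) / (2 * np.pi))
phi_unwrapped = phi_high_wrap + 2 * np.pi * k
```

In the noise-free case, the rounding recovers the fringe orders exactly; it is precisely when noise pushes the argument of the rounding across a half-integer boundary that the classical formula fails and the learned classifier can be more robust.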

Geometric phase unwrapping: Our group proposed a deep-learning-assisted geometric phase unwrapping approach for single-shot 3D surface measurement^{332}. The flowchart of this approach is shown in Fig. 19a. Two CNNs (CNN1 and CNN2) were constructed for phase retrieval and phase unwrapping, respectively. Based on a stereo camera system, dual-view single-shot fringe patterns, as well as the reference plane images, were fed into CNN2 to determine the fringe orders. With the predicted wrapped phases and fringe orders, the absolute phase map could be recovered. Figure 19b–e shows the comparison of 3D reconstructions obtained through different conventional geometric phase unwrapping methods^{175,179,186} and the deep-learning-based method, demonstrating that the deep-learning-based method can robustly unwrap the wrapped phases of dense fringe patterns within a larger measurement volume under the premise of single-frame projection. It should be mentioned that it is indeed a straightforward idea to establish the relationship between the fringe pattern and the corresponding absolute phase directly. However, since the validity of the deep-learning-based approach is largely dependent on the input data, when the input fringe itself is ambiguous, the network cannot always produce reliable phase unwrapping results. For example, in Yu’s work^{333}, when there exist large depth discontinuities and isolated objects, even with the assistance of deep learning, one fringe image is insufficient to eliminate the 2π phase ambiguity.

In DIC and stereophotogrammetry, image analysis aims to determine the displacement vector of each pixel point between a pair of acquired images. Recently, deep learning has also been extensively applied to stereo matching in order to achieve improved performance compared with traditional subset correlation and subpixel refinement methods.

Subset correlation: Zbontar and LeCun^{334} presented a deep-learning-based approach for estimating the disparity map from a rectified stereo image pair. A siamese-structured CNN was constructed to address the matching-cost computation problem by learning a similarity measure from small image patches. The output of the CNN was utilized to initialize the stereo matching cost, followed by several post-processing steps, as shown in Fig. 20a. Figure 20d–h shows the disparity images obtained by the traditional Census transform method^{335} and the deep-learning-based method, from which we can see that the deep-learning-based approach achieved a lower error rate and better prediction results. Luo et al.^{336} exploited a siamese CNN connected by a dot-product layer to speed up the calculation of the matching score and obtained improved matching performance. Recently, our group improved Luo’s network by introducing additional residual blocks and convolutional layers to the head of the neural network and replacing the original inner product with fully connected layers with shared weights^{337}. The improved network can extract a more accurate initial absolute disparity map from speckle image blocks after epipolar correction, and showed better matching capability than Luo’s network. Hartmann et al.^{338} constructed a CNN with five siamese branches to learn a matching function, which could directly predict a scalar similarity score from multiple image patches. It should be noted that the siamese CNN is one of the most widely used network structures in stereo-vision applications, and it has been frequently employed and continuously improved for subset correlation tasks^{339,340,341,342,343}. On a different note, Guo et al.^{344} improved the stacked 3D hourglass network to obtain the cost volume by group-wise correlation and then realized stereo matching. Besides conventional supervised learning approaches, unsupervised learning has also been introduced to subset correlation.
Zhou et al.^{345} proposed an unsupervised deep-learning framework for learning the stereo matching cost, using a left-right consistency check to guide the training process to converge to a stable state. Kim et al.^{346} constructed a semi-supervised network to estimate stereo confidence. First, the matching probability was calculated from the matching cost with residual networks. Then, the confidence measure was estimated based on a unified deep network. Finally, the confidence feature of the disparity map was extracted by synthesizing the results obtained by the two networks.
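For reference, the classical zero-normalized cross-correlation (ZNCC) criterion that these learned similarity measures replace can be sketched as follows; the synthetic speckle pattern, subset size, and search range are illustrative assumptions.

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation between two equal-size subsets."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

rng = np.random.default_rng(0)
left = rng.random((32, 64))                    # synthetic speckle image band
true_disp = 7
right = np.roll(left, -true_disp, axis=1)      # right view shifted by 7 px

# Exhaustive integer-disparity search for one subset of the left image.
half = 8
cy, cx = 16, 30
subset = left[cy - half:cy + half, cx - half:cx + half]
scores = [zncc(subset, right[cy - half:cy + half, cx - d - half:cx - d + half])
          for d in range(0, 15)]
best = int(np.argmax(scores))
print("estimated disparity:", best)            # expected: 7
```

A siamese network replaces the hand-crafted `zncc` score with a learned similarity function evaluated on feature embeddings of the two patches, while the surrounding winner-take-all search and post-processing remain conceptually the same.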

Subpixel refinement: Pang et al.^{347} proposed a cascade (two-stage) CNN architecture for subpixel stereo matching. Figure 21a shows the flowchart of their method. In the first stage, a detail-rich disparity image was obtained from the input stereo images through DispFulNet (“Ful” means full resolution), equipped with extra upsampling modules. Then, in the second stage, the initialized disparity was rectified and the residual signals across multiple scales were generated through the hourglass-structured DispResNet (“Res” means residual). By combining the outputs of the two stages, the final disparity with subpixel accuracy was obtained. Figure 21d–g shows the predicted disparity images and error distributions of the input stereo image pairs (Fig. 21b) obtained by DispFulNet and DispResNet. It can be seen from the experimental results that, after the second stage of optimization, the quality of the disparity was significantly improved. Based on different considerations, a large variety of network structures have been proposed for subpixel refinement, e.g., StereoNet^{348}, LGC-Net^{349}, DeepMVS^{350,351}, StereoDRNet^{352}, DeepPruner^{353}, LAF-Net^{354}, 3D CNN^{355}, MADNet^{356}, UnOS^{357}, the left-right comparative recurrent model^{358}, CNN-based disparity map optimization^{359}, the deep-learning-based fringe-image-assisted stereo matching method^{360}, and UltraStereo^{361}.
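The classical subpixel step that such cascades refine upon is a simple parabola fit through the correlation scores around the best integer disparity; a minimal sketch, with an assumed true peak location of 5.3, is:

```python
import numpy as np

def subpixel_peak(c_minus, c_best, c_plus):
    """Subpixel offset of the peak of a parabola through three samples."""
    denom = c_minus - 2.0 * c_best + c_plus
    return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom

# Sample a known parabola peaking at d = 5.3 to verify the closed form.
d = np.array([4.0, 5.0, 6.0])
cost = -(d - 5.3) ** 2                 # similarity score, maximum at 5.3
offset = subpixel_peak(cost[0], cost[1], cost[2])
print("refined disparity:", 5.0 + offset)      # expected: 5.3
```

The parabola fit is exact only when the cost curve is locally quadratic; learned refinement networks such as DispResNet instead regress a residual correction from image content, avoiding the "pixel-locking" bias of the analytic interpolation.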


(3)
Post-processing: Deep-learning techniques also play an important role in the final post-processing stage of the image-processing architecture of optical metrology. Examples of applying deep learning for post-processing are very diverse, including further optimization of the measurement results (e.g., phase denoising, error compensation, and refocusing) and converting the measured intermediate variables to the desired physical quantities (e.g., system calibration and phase-to-height mapping in FPP).

Denoising: Montrésor et al.^{362} proposed to use DnCNN for phase denoising. As illustrated in Fig. 22a, the sine and cosine components of the noisy phase map were fed into a DnCNN to produce the corresponding denoised versions, and the resultant phase information was calculated by the arctangent function. The phase was then fed back into and refined by the DnCNN again, and this process was repeated several times to achieve a better denoising performance. In order to generate more realistic training datasets via simulation, the additive amplitude-dependent speckle noise was carefully modeled by taking its non-Gaussian statistics, non-stationary properties, and correlation length into account. Figure 22b–e shows the comparison of the denoising results obtained by WFT^{114} and the deep-learning method, suggesting that DnCNN yielded a comparable standard deviation but a lower peak-to-valley phase error than WFT. Yan et al.^{363} proposed a CNN-based wrapped phase denoising method. By filtering the original numerator and denominator of the arctangent function, phase denoising can be achieved without tuning any parameters. They also presented a deep-learning-based phase denoising technique for digital holographic speckle pattern interferometry^{364}. Their approach could obtain an enhanced wrapped phase map by significantly suppressing the speckle noise, and outperformed traditional phase denoising methods when processing phases with steep spatial gradients.
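The sine/cosine trick above is easy to reproduce without any network: filtering sin(φ) and cos(φ) and recombining them via the arctangent denoises the wrapped phase without smearing its 2π jumps. The sketch below uses a plain box filter in place of DnCNN; the noise level, phase ramp, and filter size are illustrative assumptions.

```python
import numpy as np

def mean_filter(img, k=5):
    """Naive k x k box filter (borders left at zero and cropped later)."""
    out = np.zeros_like(img)
    h, w = img.shape
    r = k // 2
    for i in range(r, h - r):
        for j in range(r, w - r):
            out[i, j] = img[i - r:i + r + 1, j - r:j + r + 1].mean()
    return out

rng = np.random.default_rng(1)
y, x = np.mgrid[0:64, 0:64]
phi = np.angle(np.exp(1j * (0.4 * x)))               # clean wrapped ramp phase
noisy = np.angle(np.exp(1j * (0.4 * x + 0.3 * rng.standard_normal(phi.shape))))

s = mean_filter(np.sin(noisy))
c = mean_filter(np.cos(noisy))
denoised = np.arctan2(s, c)                          # recombined wrapped phase

crop = (slice(8, -8), slice(8, -8))                  # discard filter borders
err_before = np.abs(np.angle(np.exp(1j * (noisy - phi))))[crop].mean()
err_after = np.abs(np.angle(np.exp(1j * (denoised - phi))))[crop].mean()
print(f"mean |phase error|: {err_before:.3f} -> {err_after:.3f} rad")
```

Filtering the wrapped phase directly would average values across the ±π boundary and corrupt the jumps; working on the sine and cosine components avoids this, which is exactly why the DnCNN scheme adopts the same decomposition.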

Digital refocusing: Ren et al.^{365} proposed the holographic reconstruction network (HRNet) to deal with the holographic reconstruction problem, which could perform automatic digital refocusing without employing any prior knowledge. Figure 23a shows the schematic of their deep-learning workflow, where a hologram input (the first block) was fed into HRNet, and the reconstructed image (the third block) corresponding to the specific input was directly predicted. A typical lens-free Mach-Zehnder interferometer was constructed to acquire the training input images, and the traditional convolution method^{366}, PCA aberration compensation^{226}, manual artifact removal, and phase unwrapping^{367} were successively employed to obtain the corresponding label images. Figure 23b–f shows the results of refocusing and hologram reconstruction with different methods, proving that the images predicted by HRNet were precisely in focus and noise-free, whereas significant noise and artifacts remained in the reconstruction results obtained by the traditional convolution and angular spectrum methods^{368}. Alternatively, the autofocusing problem in DH can be recast as a regression problem, with the focal distance being a continuous response corresponding to a digital hologram. Ren et al.^{369} constructed a CNN to achieve nonparametric autofocusing for digital holography, which could accurately predict the focal distance without knowing the physical parameters of the optical imaging system. Lee et al.^{370} constructed a CNN-based estimator combined with the discrete Fourier transform (DFT) to realize automatic focusing in off-axis digital holography. Their method can automatically determine the object-to-image distance rapidly and effectively, and a sharp in-focus image of the object can be reconstructed accurately.
Shimobaba et al.^{371} used a regression-based CNN for holographic reconstruction, which could directly predict the sample depth position with millimeter accuracy from the power spectrum of the hologram. Jaferzadeh et al.^{372} proposed a regression-layer-topped CNN to determine the optimal focus position for the numerical reconstruction of micro-sized objects, which can be extended to the study of biological samples such as cancer cells. Pitkäaho et al.^{373} constructed CNNs based on AlexNet and VGG16 to learn defocus distances from a large number of holograms. The well-trained network can determine the high-accuracy in-focus position of a new hologram without resorting to conventional numerical propagation algorithms.
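The conventional numerical propagation that these networks sidestep, the angular spectrum method, can be sketched as follows; the wavelength, pixel pitch, propagation distance, and square aperture are illustrative assumptions. Propagating forward and then backward by the same distance should recover the original field, which is how digital refocusing works once the correct distance is known.

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Propagate a complex field by distance z via the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))   # evanescent cut-off
    H = np.exp(1j * kz * z)                          # free-space transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)

field = np.zeros((256, 256), dtype=complex)
field[96:160, 96:160] = 1.0                          # square aperture "object"
defocused = angular_spectrum(field, 633e-9, 5e-6, 0.01)
refocused = angular_spectrum(defocused, 633e-9, 5e-6, -0.01)
print("refocus residual:", np.abs(refocused - field).max())
```

In classical autofocusing, this propagation is repeated over candidate distances while a sharpness metric is maximized; the regression CNNs above predict the distance in a single pass instead.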

Error compensation: Nguyen et al.^{374} proposed a phase-aberration compensation framework combining a CNN and Zernike polynomial fitting, as illustrated in Fig. 24a. The unwrapped phase aberration map of the hologram was fed into a CNN with the U-Net structure to detect the background regions, which were then sent to the Zernike polynomial fitting^{375} to determine the conjugated phase aberration. The PCA method^{226} was used for training data collection and preparation. Figure 24b–e shows the phase-aberration compensation results of PCA and the deep-learning method, showing that the phase aberrations were completely eliminated by the deep-learning technique, while they were still visible in the phase results obtained by the PCA method. In addition, the deep-learning-based technique was fully automatic, and its robustness and accuracy were shown to be superior to those of PCA. Lv et al.^{376} used a DNN to compensate for projector-distortion-induced measurement errors in an FPP system. By learning the mapping between the 3D coordinates of the object and the corresponding distortion-induced error distribution, the distortion errors of the original test 3D data could be accurately predicted. Aguenounon et al.^{377} leveraged a DNN with a double U-Net structure to provide single-snapshot imaging of optical properties with the additional function of real-time profile correction. The real-time visualization of the resulting profile-corrected optical property (absorption and reduced scattering) maps has the potential to be deployed to guide surgeons.

Quantity transformation: Li et al.^{378} proposed an accurate phase-height mapping approach for fringe projection based on a “shallow” (three-layer) BP neural network. The flowchart of their method is shown in Fig. 25a, where the camera image coordinates (X_{ci}, Y_{ci}) and the corresponding horizontal coordinates X_{pi} of the projector image were fed into the network to predict the desired 3D information (X_{i}, Y_{i}, Z_{i}). To obtain the training data, a standard calibration board with circle marks fixed on a high-precision displacement stage was captured at different Z-direction positions. From the captured images, the coordinates (X_{ci}, Y_{ci}) of the mark centers were extracted with subpixel accuracy using a conventional circle-center detection algorithm^{379}, and the horizontal coordinate X_{pi} of the corresponding projector image for each mark center was calculated from the absolute phase value. Figure 25b shows the 3D reconstruction result of a standard stair sample predicted by the neural network. Figure 25c, d shows the error distributions of the measurement results obtained by the traditional phase-height conversion method^{380} and the neural network, showing that the learning-based method was insensitive to fringe intensity nonlinearity and could recover the 3D shape of a workpiece with high accuracy.
Endtoend learning in optical metrology
As mentioned earlier, “divide and conquer” is a core idea for solving complex optical metrology problems by breaking the whole image-processing pipeline into several modules or substeps. On a different note, deep learning enables direct mapping between the original input and the desired output, and the whole process can be trained as a whole, in an end-to-end fashion. Although somewhat brute-force, such a straightforward treatment has been extensively used in deep learning, and gradually introduced to many subfields of optical metrology, e.g., FPP and DIC.

From fringe to 3D shape: In FPP, the image-processing pipeline generally consists of preprocessing, phase demodulation, phase unwrapping, and phase-to-height conversion. Deep learning provides a viable and efficient way to reconsider the whole problem from a holistic perspective, taking human intervention out of the loop and solving the “fringe to 3D shape” problem in a purely data-driven manner. Based on this idea, Nguyen et al.^{381} proposed an end-to-end neural network to directly perform the mapping from a fringe pattern to its corresponding 3D shape, the flowchart of which is shown in Fig. 26a. Three different deep CNNs, including an FCN, an autoencoder^{299}, and U-Net, were trained on datasets obtained by the conventional multi-frequency phase-shifting profilometry method. Figure 26b, c gives an input and its corresponding ground-truth 3D shape. Figure 26c shows the best 3D reconstruction results predicted by the three networks, with a depth measurement accuracy of 2 mm. Van der Jeught et al.^{382} presented an SRCNN-based DNN to directly extract absolute height information from a single fringe image. Through simulated fringe and depth image pairs, the trained network was able to obtain high-accuracy, full-field depth information from a single fringe pattern. Recently, they compared the effect of different loss functions (MAE, MSE, and SSIM) on a modified U-Net for mapping a fringe image to the corresponding depth, and designed a new mixed gradient loss function that yielded higher-quality 3D reconstructions than the conventional ones^{383}. Machineni et al.^{384} constructed a CNN with multi-resolution similarity assessment to directly reconstruct the object’s shape from the corresponding deformed fringe image. Their proposed method can achieve promising results under various challenging conditions such as low SNR, low fringe density, and high dynamic range.
Zheng et al.^{385} utilized the calibration matrix of a real-world FPP system to construct its “digital twin”, which provided the abundant simulation data (fringe patterns and corresponding depth maps) required for model training. The trained U-Net could then be deployed on the real-world FPP system to extract the 3D geometry encoded in the fringe pattern in one step. Similarly, Wang et al.^{386} constructed a virtual FPP system for training-dataset generation. A modified loss function based on the SSIM index was employed, providing improved performance in terms of measurement accuracy and detail preservation.

From stereo images to disparity: Deep learning can also be applied to DIC and stereophotogrammetry to bypass all intermediate image-processing steps in the pipeline for displacement and 3D reconstruction. Mayer et al.^{387} presented end-to-end networks for the estimation of disparity (DispNet) and optical flow (FlowNet). In DispNet, a 1D correlation was proposed along the disparity line, corresponding to the stereo cost volume. In addition, they also offered a large synthetic dataset, Scene Flow^{388}, for training large-scale stereo matching networks. Kendall et al.^{389} established an end-to-end Geometry and Context Network (GC-Net) mapping a rectified pair of stereo images to disparity maps with subpixel accuracy (Fig. 27a). Stereo images were fed into the network to directly output the disparity images of the two perspectives. Figure 27b–d shows the test results on Scene Flow, where Fig. 27b is the left input, Fig. 27c is the disparity predicted by deep learning, and Fig. 27d is the ground truth. Experimental results show that the end-to-end learning method produced high-resolution disparity images and could tolerate large occlusions. Chang et al.^{390} developed a pyramid stereo matching network (PSMNet) to enhance the matching accuracy by using 3D-CNN-based spatial pyramid pooling and multiple hourglass networks. Zhang et al.^{391} proposed a cost-aggregation network incorporating the local guided filter and semi-global-matching-based cost aggregation, achieving higher matching quality as well as better network generalization. Recently, our group proposed an end-to-end speckle correlation strategy for 3D shape measurement, where a multi-scale residual subnetwork was utilized to obtain feature maps of stereo speckle images, and the 4D cost volume was constructed at one-fourth of the original resolution^{392}. In addition, a saliency detection network was integrated to generate a pixel-wise mask to exclude the shadow-noised regions.
Nguyen et al.^{393} used three U-Net-based networks to convert a single speckle image into its corresponding 3D information. It should be mentioned that stereophotogrammetry is a representative field to which deep learning has been extensively applied. Many other end-to-end deep-learning structures directly mapping stereo images to disparity have been proposed, such as hybrid CNN-CRF models^{394}, DeMoN (CNN-based)^{395}, MVSNet (CNN-based)^{396}, CNN-based disparity estimation through feature constancy^{397}, SegStereo^{398}, EdgeStereo^{399}, stereo matching with an explicit cost-aggregation architecture^{400}, HyperDepth^{401}, practical deep stereo (PDS)^{402}, RNN-based stereo matching^{403,404}, and unsupervised learning^{405,406,407,408,409}. For DIC, Boukhtache et al.^{410} presented an enhanced FlowNet (so-called StrainNet) to predict displacement and strain fields from pairs of deformed and reference images of a flat speckled surface. Their experimental results demonstrated the feasibility of the deep-learning approach for accurate, pixel-wise, subpixel measurement of full displacement fields. Min et al.^{411} proposed a 3D-CNN-based strain measurement method, which allowed simultaneous characterization in the spatial and temporal domains from surface images obtained during a tensile test of BeCu thin film. Rezaie et al.^{412} compared the performance of the conventional DIC method and their deep-learning method based on U-Net for detecting cracks in stone masonry wall images, showing that the learning-based method could detect most visible cracks and better preserve the crack geometry.
It should be mentioned that deep learning has also been widely adopted in many other fields of optical metrology, not just phase or correlation measurement techniques. However, due to space limitations, it is not possible to describe or discuss all of them here. Examples include, but are not limited to, time-of-flight (ToF) imaging^{413,414,415,416,417,418}, photometric stereo^{419,420,421,422,423,424,425}, wavefront sensing^{426,427,428,429}, aberration characterization^{430}, and fiber-optic imaging^{431,432,433,434,435}.
After reviewing hundreds of recent works leveraging deep learning for different optical metrology tasks, readers may still be interested to know how to apply these new data-driven approaches to their own problems or projects. To help the reader, we present a step-by-step guide to applying deep learning to optical metrology in the Supplementary Information, taking phase demodulation from a single fringe pattern as an example. We explain how to build a DNN with fully convolutional network architectures and train it with an experimentally collected training dataset. We also distribute the source code and the corresponding datasets for this example. Based on this example, we demonstrate that a well-trained DNN can accomplish the phase-demodulation task in an accurate and efficient manner, using only a single fringe pattern as input. Thus, it is capable of combining the single-frame strength of spatial phase demodulation methods with the high measurement accuracy of temporal phase demodulation methods. The interested reader may refer to the Supplementary Information for the step-by-step tutorial.
Deep learning in optical metrology: challenges
Our review in the last section shows that the deep-learning solutions in optical metrology are straightforward, yet have led to improved performance compared with the state of the art. In this section, we shift our attention to some challenges in the use of deep learning in optical metrology, which require further attention and careful consideration:

High cost of collecting and labeling experimental training data: Most of the deep-learning techniques reviewed belong to supervised learning, which requires a large amount of labeled data to train the network. To account for real experimental conditions, deep-learning approaches can benefit from large amounts of experimental training data. Since these data serve as ground truth and must be of sufficiently high accuracy, they are usually expensive to collect^{436}. In addition, since optical metrology systems are highly customized, training data collected by one system may not be suitable for another system of the same type. This may explain why there are far fewer publicly available datasets in the field of optical metrology (especially compared with the computer vision community). Without such public benchmark datasets, it is difficult to make a fair and standardized comparison between different algorithms. Although some emerging machine learning approaches, such as transfer learning^{437}, few-shot learning^{438}, unsupervised learning^{244}, and weakly supervised learning^{439}, can decrease the reliance on the amount of data to some extent, their performance is so far not comparable to that of supervised learning with large amounts of data.

Ground truth inaccessible for experimental data: In many areas of optical metrology, e.g., fringe or phase denoising, it is infeasible or even impossible to get the actual ground truth of the experimental data. As discussed in previous sections, generating a training dataset by simulating the forward image formation process can bypass this difficulty^{362,385}, often at the price of compromised actual performance when the knowledge of the forward image formation model \(\mathcal{A}\) is imprecise or the simulated dataset fails to reflect the real experimental system realistically and comprehensively. An alternative approach to this issue is to create a “quasi-experimental” dataset by collecting experimental raw data and then using conventional state-of-the-art solutions to get the corresponding labels^{308,309,310}. Essentially, the network is trained to “duplicate” the approximate inverse operator \(\tilde{\mathcal{A}}^{-1}\) corresponding to the conventional algorithm that is used to generate the labels. After training, the network is able to emulate the conventional reconstruction algorithm, \(\widehat{\mathcal{R}}_\theta(\mathbf{I}) \approx \tilde{\mathcal{A}}^{-1}(\mathbf{I})\), but an improvement in performance over the conventional approaches becomes an unreasonable expectation.

Empiricism in network design and training: So far, there is no standard paradigm for selecting an appropriate DNN architecture, because doing so requires a comprehensive understanding of the topology, training methods, and other parameters. In practice, we usually determine the network structure by evaluating different available candidate models, or by comparing similar task-specific models trained with different hyperparameter settings (network layers, neural units, and activation functions) on a specific validation dataset^{440}. However, the overwhelming number of deep-learning models often limits one to evaluating only a few of the most trustworthy models, which may lead to suboptimal results. Therefore, one should learn how to quickly and efficiently narrow down the range of available models to find those most likely to perform best on a specific type of problem. In addition, training a DNN is generally laborious and time-consuming, and becomes even worse with repetitive adjustments of the network architecture or hyperparameters to prevent overfitting and convergence issues.

Lack of generalization ability after specific sample training: The generalization ability of deep-learning approaches is closely related to the size and diversity of the training samples. Generally, deep-learning architectures used in optical metrology are highly specialized to a specific domain, and they should be applied with extreme care and caution when solving problems that do not pertain to the same domain. Thus, we cannot ignore the risk that, when a never-before-experienced input differs even slightly from what was encountered at the training stage, the mapping \(\widehat{\mathcal{R}}_\theta\) established by deep networks may quickly stop making sense^{441}. This is quite different from traditional optical metrology solutions, in which the reliability of the reconstruction can be secured for diverse types of samples as long as “the forward model \(\mathcal{A}\) is accurate” and “the corresponding reconstruction algorithm \(\tilde{\mathcal{A}}^{-1}\) is effective”.

“Deep learning in computer vision” ≠ “Deep learning in optical metrology”: Deep learning is essentially the process of using computers to help us find the underlying patterns within the training dataset. Since information cannot be “born out of nothing”, DNNs cannot always produce a provably correct solution. Compared with many computer vision tasks, optical metrology is more concerned with accuracy, reliability, repeatability, and traceability^{442}. For example, surface defect inspection is an indispensable quality-control procedure in manufacturing processes^{443}. When using deep learning for optical metrological inspection, one may face the risk that a defect in an industrial component is “smoothed out” and left undetected by an overfitted DNN at the inspection stage, which would make the entire production run defective. The success of deep learning depends on the “common” features learned and extracted from the training samples, which may lead to unsatisfactory results when facing “rare samples”.

“Deep learning” lacks the ability of “deep understanding”: The “black box” nature of DNNs, arguably one of their most well-known disadvantages, prevents us from knowing how a neural network generates the expected results from specific inputs after learning from a large amount of training data. For example, when we feed a fringe pattern into a neural network and it outputs a poor phase image, it is not easy to comprehend what made it arrive at such a prediction. Interpretability is critical in optical metrology because it ensures the traceability of mistakes. Consequently, most researchers in the optical metrology community use deep-learning approaches in a pragmatic fashion, without being able to explain why they provide good results or to trace the logical bases and apply modifications in the case of underperformance.
Deep learning in optical metrology: future directions
Although the above challenges have not been adequately addressed, optical metrology is now surfing the wave of deep learning, following a trend similar to that experienced in many other fields. This field is still young, but deep learning is expected to play an increasingly prominent role in the future development of optical metrology, especially with the evolution of computer science and AI technology.

Hybrid, composite, and automated learning: It must be admitted that, at this stage, deep-learning methods for optical metrology are still limited to some elementary techniques. There is further untapped potential, as a number of the latest innovations in deep learning can be directly introduced into the context of optical metrology. (1) Hybrid learning methods, such as semi-supervised^{242}, unsupervised^{244}, and self-supervised learning^{444}, are capable of extracting valuable insights from unlabeled data, which is extremely attractive as the availability of ground-truth or labeled data in optical metrology is very limited. For example, GANs utilize two networks, a generator and a discriminator, that compete to deceive each other during the training process so as to generate the final prediction without specific labels^{266}. In stereo vision, network models trained by unsupervised methods have been shown to produce better disparity prediction results in real scenes^{345}. (2) Composite learning approaches attempt to combine different models pretrained on a similar task to produce a composite model with improved performance^{437}, or to search for the optimal network architecture for a certain dataset in a reinforcement-learning environment^{445}. They are premised on the idea that a singular model, even a very large one, cannot outperform a compositional model with several small models/components, each delegated to specialize in part of the task. As optical metrology tasks become more and more complicated, composite learning can deconstruct one huge task into several simpler, single-function components and make them work together, or against each other, producing a more comprehensive and powerful model. (3) Automated machine learning (AutoML) approaches, such as Google AutoML^{446} and Azure AutoML^{447}, have been developed to execute the tedious modeling tasks once performed by professional scientists^{440,448}.
AutoML burns through an enormous number of candidate models and their associated hyperparameters on the raw input data to decide which model is best suited to it. Consequently, AutoML is expected to permit even "citizen" AI scientists with a background in optical metrology to build streamlined use cases by relying only on their domain expertise, offering practitioners a competitive advantage with minimal investment.
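The self-supervised idea in (1) can be illustrated with a toy photometric loss for stereo: warp the right view with a predicted disparity and penalize disagreement with the left view, so that no ground-truth disparity is ever needed. The images, disparity values, and per-scanline interpolation below are illustrative assumptions for a rectified pair, not the method of ref. 345.

```python
import numpy as np

def photometric_loss(left, right, disparity):
    """Self-supervised stereo loss: resample the right view at x - d(x)
    and compare with the left view; no ground-truth disparity required."""
    h, w = left.shape
    xs = np.arange(w)
    warped = np.empty_like(left)
    for r in range(h):
        # 1-D linear interpolation per scanline (borders are clamped)
        warped[r] = np.interp(xs - disparity[r], xs, right[r])
    return np.mean(np.abs(left - warped))

# toy rectified pair: the right view is the left view shifted by 4 pixels
x = np.arange(64)
left = np.tile(np.sin(2 * np.pi * x / 32), (8, 1))
right = np.tile(np.sin(2 * np.pi * (x + 4) / 32), (8, 1))
d_true = np.full((8, 64), 4.0)   # correct disparity -> low loss
d_zero = np.zeros((8, 64))       # wrong disparity -> high loss
```

In an unsupervised stereo network, `disparity` would be the network output, and this loss (plus smoothness terms) would be back-propagated in place of a supervised disparity error.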

Physics-informed deep learning: Unlike traditional physics-model-based optical metrology methods, in which the domain knowledge is carefully engineered into the solution, most current deep-learning-based optical metrology methods do not benefit much from such prior knowledge but instead learn the solution from scratch using massive training data. In contrast, if the physical laws governing image formation (the knowledge about the forward image-formation model \({{{\mathcal{A}}}}\)) are known, even partially, they should be naturally incorporated into the DNN model so that the training data and network parameters are not wasted on "learning the physics". For example, in fringe analysis, inspired by conventional phase-shifting techniques, Feng et al.^{50} proposed to learn the sine and cosine components of the fringe pattern, from which the wrapped phase can be calculated by the arctangent function (Fig. 28c, d). This method shows a significant performance gain over directly using an end-to-end network structure^{50} (Fig. 28a, b). Goy et al.^{302} suggested a method for low-photon-count phase retrieval in which the noisy input image is first converted into an approximant. As the approximant obtained from prior knowledge is much closer to the final prediction than the raw low-photon image, the phase-reconstruction accuracy achieved by deep learning can be improved significantly. Wang et al.^{449} incorporated the diffraction model of numerical propagation into a DNN for phase retrieval. By minimizing the difference between the actual input image and the predicted input image, the DNN learns to reconstruct the phase that best matches the measurements without any ground-truth data.
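A minimal sketch of the physics-informed output parameterization described above: instead of regressing the phase directly, the network predicts a sine-like numerator M and a cosine-like denominator D, and the wrapped phase follows from the arctangent, exactly as in phase-shifting analysis. Here ideal M and D stand in for network outputs; the ramp used for demonstration is an assumption.

```python
import numpy as np

def wrapped_phase(M, D):
    """Physics-informed head: combine the predicted numerator M (sine term)
    and denominator D (cosine term) via the arctangent, as in phase-shifting."""
    return np.arctan2(M, D)  # wrapped phase in (-pi, pi]

# ideal "network outputs" for a phase ramp already inside (-pi, pi)
phi = np.linspace(-3.0, 3.0, 200)       # ground-truth phase
M, D = np.sin(phi), np.cos(phi)         # what the network is trained to predict
phi_hat = wrapped_phase(M, D)           # recovered wrapped phase
```

Because the arctangent step is fixed and exact, the network only has to learn the smooth mapping from fringe intensity to M and D; the phase "physics" is never re-learned.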

Interpretable deep learning: As highlighted in the previous sections, most researchers in optical metrology use deep-learning approaches intuitively, without being able to explain why they produce such "good" results. This can be very problematic in high-stakes settings such as industrial inspection, quality control, and medical diagnosis, where the decisions of algorithms must be explainable or accountability is required. Academics in deep learning are acutely aware of this interpretability problem, and there have been several developments in recent years for visualizing the features and representations learned by DNNs^{284}. On the other hand, because it is often applied to high-risk scenarios, optical metrology poses some of the most significant deep-learning challenges: we are dealing with unknown, uncertain, ambiguous, incomplete, noisy, inaccurate, and missing datasets in high-dimensional spaces. The unexplainability and incomprehensibility of deep learning also imply that its predictions are at risk of failure. Figure 29 illustrates one such example, in which a well-trained deep-learning model for stereo phase unwrapping fails when there is depth ambiguity in a certain perspective^{332}. Therefore, explainability will become a key strength of deep-learning techniques for interpreting and explaining models, which would significantly expand the usefulness of deep-learning methods in optical metrology.
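One widely used family of visualization tools mentioned above, gradient (saliency) maps, can be sketched with finite differences: rank the input elements by how strongly they influence the model output. The tiny linear "model" below is an illustrative assumption standing in for a trained DNN.

```python
import numpy as np

def saliency_map(f, x, eps=1e-4):
    """Finite-difference saliency: |df/dx_i| per input element, showing
    which inputs the model's output is most sensitive to."""
    s = np.zeros(x.size)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp.flat[i] += eps
        xm.flat[i] -= eps
        s[i] = abs(f(xp) - f(xm)) / (2 * eps)  # central difference
    return s.reshape(x.shape)

# toy "model": a fixed linear scorer; its saliency is exactly |weights|
w = np.array([0.0, 3.0, -1.0, 0.5])
model = lambda x: float(x @ w)
x0 = np.ones(4)
```

For a real network the same quantity is obtained by backpropagation in one pass; inspecting which pixels carry the largest saliency is one way to sanity-check what a metrology model has actually learned.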

Uncertainty quantification: Characterizing the uncertainty of deep-learning solutions can help make better decisions and take precautions against erroneous predictions, which is essential for many optical metrology tasks^{450}. However, most deep-learning methods reviewed in this work cannot provide uncertainty estimates. In recent years, Bayesian deep learning has emerged as a unified probabilistic framework that tightly integrates deep learning with Bayesian models^{451}. By using a GAN training framework to estimate a posterior distribution of images fitting a given measurement dataset (or estimation statistics derived from the posterior), Bayesian convolutional neural networks (BNNs) can quantify the reliability of predictions through two predictive uncertainties, namely model uncertainty and data uncertainty, akin to epistemic and aleatoric uncertainty in Bayesian analysis, respectively^{452}. This approach is expected to be adopted in optical metrology applications, e.g., fringe-pattern analysis, to give pixel-wise variance estimates and data-uncertainty evaluation (Fig. 30)^{453}. The latter further allows assessment of the randomness of predictions stemming from data imperfections, including noise, incompleteness of the training data, and other experimental perturbations. Incorporating similar uncertainty quantification into other deep-learning-based optical metrology methods, especially when the ground truth is unavailable, is an interesting direction for future research.
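Model uncertainty of the kind Bayesian networks report can be approximated cheaply with Monte Carlo dropout: keep dropout active at test time, run repeated stochastic forward passes, and read off the predictive mean and variance. The one-layer toy "network" below is an illustrative assumption, not the BNN of ref. 452.

```python
import numpy as np

def mc_dropout_predict(x, w, n_samples=500, p_drop=0.5, seed=0):
    """Monte Carlo dropout: sample random dropout masks at test time and
    return the predictive mean and variance (a proxy for model uncertainty)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        mask = rng.random(w.shape) >= p_drop          # drop each weight w.p. p_drop
        preds.append(x @ (w * mask) / (1.0 - p_drop))  # inverted-dropout scaling
    preds = np.asarray(preds)
    return preds.mean(), preds.var()

x = np.ones(4)
w = np.full(4, 0.5)        # a deterministic pass would predict x @ w = 2.0
mean, var = mc_dropout_predict(x, w)
```

A large variance flags inputs on which the model is unsure, which is exactly the per-pixel reliability signal one would like alongside a phase or depth map.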

Guiding the metrology system design: Most current work using deep learning in optical metrology only considers how to reconstruct the measured data as a post-processing algorithm, while ignoring how the image data should be formed in the first place. However, an important feature of optical metrology methods is their active nature, especially with respect to the way the illumination is manipulated. For example, in FPP, the structure of the illumination is modulated systematically across the object surface to deliver high accuracy and robustness in establishing the triangulation. The design of the illumination coding strategy is crucial to improving the measurement accuracy and removing the ambiguity of the depth reconstruction with a minimum number of image acquisitions. However, this problem has long been tackled using heuristics such as composite coding, frequency multiplexing, and color multiplexing, which do not guarantee optimality (in terms of facilitating the recovery of the desired information). Deep learning provides a mechanism to optimize the system design in a more principled way. By integrating the image-formation model (with trainable parameters controlling the image acquisition) into the reconstruction network, the system design and the reconstruction algorithm (i.e., both \({{{\mathcal{A}}}}\) and the corresponding \(\widehat {{{{\mathcal{R}}}}_\theta }\)) can be jointly optimized with the training data^{454}. This allows us to determine which type of system design yields the best results for a particular deep-learning-driven task. Such an idea has been successfully demonstrated in designing optimal illumination patterns for computational microscopes^{455,456,457}. We hope that this "joint optimization" network can effectively bridge the gap between how images should be acquired and how they should be post-processed by deep learning, and that it will be widely adopted in the design of optical metrology systems, such as the fringe-pattern design in FPP (Fig. 31) and the speckle-pattern design in DIC.
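The "joint optimization" idea can be sketched as a toy differentiable pipeline: a forward model with a trainable illumination gain a (subject to sensor saturation and fixed noise) feeds a reconstruction with a trainable gain r, and both are fitted to training scenes by gradient descent. All quantities (the scenes, noise level, clipping model, and the single-gain reconstruction) are illustrative assumptions, not the pipeline of refs. 454-457.

```python
import numpy as np

rng = np.random.default_rng(1)
scenes = rng.random((64, 8))                       # toy training scenes in [0, 1)
noise = 0.05 * rng.standard_normal(scenes.shape)   # fixed sensor-noise draws

def loss(theta):
    a, r = theta
    # forward model A: illumination gain, sensor saturation, additive noise
    measured = np.clip(a * scenes, 0.0, 1.0) + noise
    # reconstruction R: a single trainable gain
    return np.mean((r * measured - scenes) ** 2)

# jointly optimize illumination (a) and reconstruction (r) by
# finite-difference gradient descent
theta = np.array([0.2, 1.0])
for _ in range(3000):
    grad = np.array([
        (loss(theta + d) - loss(theta - d)) / 2e-4
        for d in (np.array([1e-4, 0.0]), np.array([0.0, 1e-4]))
    ])
    theta -= 0.2 * grad
```

The descent drives the illumination gain upward to fill the sensor's dynamic range (improving SNR) while stopping short of heavy saturation, a trade-off that fixed heuristic codings cannot discover automatically.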

Both "deep" and "in-depth": Should we use deep learning or traditional optical metrology algorithms? This is a tough question to answer because it depends heavily on the problem to be solved. Considering the "no free lunch" theorem, the choice between deep-learning and traditional algorithms should be made rationally. For problems where traditional methods based on physics models, if implemented properly, can deliver straightforward and more than satisfactory solutions, there is no need to use deep learning. However, such "unnecessariness" may not be easy to recognize. Even when deep learning is functionally effective, we should keep in mind that "how well deep learning can do" generally depends on "how reliable the training data we can provide are." For example, although the popular "learning from simulation" scheme used in optical metrology eliminates the dependence on huge amounts of labeled experimental data, the inconsistency between the image-formation model and the actual experimental conditions leads to the additional challenge of "domain adaptation". Therefore, our personal view is that deep learning does not (at least at the current stage) make our research easier. On the contrary, it raises the threshold for optical metrology research, because researchers not only need to use and understand deep learning deeply but also need to carry out "in-depth" research on traditional algorithms, so as to make an impartial and objective assessment between deep learning and traditional optical metrology algorithms (Fig. 32).
Conclusions
A brief summary of this review indicates that there has been significant interest in the advancement of optical metrology technologies using deep-learning architectures. The rapid development of deep-learning technology has led to a paradigm shift from physics- and knowledge-based modeling to data-driven learning for solving a wide range of optical metrology tasks. In general, deep learning is particularly advantageous for many problems in optical metrology whose physical models are complicated and whose acquired information is limited, e.g., in harsh environments and many challenging applications. Strong empirical and experimental evidence suggests that using problem-specific deep-learning models outperforms conventional knowledge- or physical-model-based approaches.
Despite the promising, in many cases quite impressive, results that have been reported in the literature, potential problems and challenges remain. For model training, we need to acquire large amounts of labeled experimental data, which, even when feasible, is laborious and requires professional expertise. We are still looking for the theoretical groundwork that would clearly explain the mechanisms of, and ways toward, the optimal selection of network structure and training algorithm for a specific task, or that would profoundly explain why a particular network structure or algorithm is or is not effective for a given task. Furthermore, deep-learning approaches have often been regarded as "black boxes"; in optical metrology, accountability is essential, and its absence can cause severe consequences. Combining Bayesian statistics with deep neural networks to obtain quantitative uncertainty estimates allows us to assess when a network yields unreliable predictions. A synergy of physics-based models, which describe the a priori knowledge of the image formation, and data-driven models, which learn a regularizer from the experimental data, can bring our domain expertise into deep learning to provide more physically plausible solutions to specific optical metrology problems. Leveraging these emerging technologies in the application of deep-learning methods to optical metrology could promote and accelerate the recognition and acceptance of deep learning in more application areas. These are among the most critical issues that will continue to attract the interest of deep-learning research in the optical metrology community in the years to come.
In summary, although deep-learning techniques can bring substantial improvements over traditional methods for different optical metrology tasks, the field is still at an early stage of development. Many researchers remain skeptical and maintain a wait-and-see attitude towards applications involving industrial inspection, medical care, etc. Shall we accept deep learning as the key problem-solving tool? Or should we reject such a black-box solution? These are controversial issues in the optical metrology community today. On the bright side, deep learning has promoted an exciting trend and fostered expectations of the transformative potential it may bring to the optical metrology community. However, we should not overestimate its power by regarding it as a silver bullet for every challenge encountered in the future development of optical metrology. In practice, we should assess whether the large amounts of data and computational resources required to apply deep learning to a particular task are worthwhile, especially when other conventional algorithms may yield comparable performance with lower complexity and higher interpretability. We envisage that deep learning will not replace traditional technologies within the field of optical metrology in the years to come, but will instead form a cooperative and complementary relationship with them, which may eventually become a symbiotic one.
Change history
27 March 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41377-022-00757-0
References
Gåsvik, K. J. Optical Metrology, 3rd edn. (Wiley, 2002).
Yoshizawa, T. Handbook of Optical Metrology: Principles and Applications, 2nd edn. (CRC Press, 2017).
Sirohi, R. S. Introduction to Optical Metrology (CRC Press, 2016).
Malacara, D. Optical Shop Testing, 3rd edn. (John Wiley & Sons, 2007).
Harding, K. Handbook of Optical Dimensional Metrology (CRC Press, 2013).
Chen, Z. G. & Segev, M. Highlighting photonics: looking into the next decade. eLight 1, 2 (2021).
Kleppner, D. On the matter of the meter. Phys. Today 54, 11–12 (2001).
Kulkarni, R. & Rastogi, P. Optical measurement techniques—a push for digitization. Opt. Lasers Eng. 87, 1–17 (2016).
Chen, F., Brown, G. M. & Song, M. M. Overview of 3D shape measurement using optical methods. Optical Eng. 39, 10–22 (2000).
Blais, F. Review of 20 years of range sensor development. J. Electron. Imaging 13, 231–243 (2004).
Rastogi, P. Digital Optical Measurement Techniques and Applications (Artech House, 2015).
Osten, W. Optical metrology: the long and unstoppable way to become an outstanding measuring tool. In Proceedings of SPIE 10834, Speckle 2018: VII International Conference on Speckle Metrology. 1083402 (SPIE, Janów Podlaski, Poland, 2018).
Wyant, J. C. & Creath, K. Recent advances in interferometric optical testing. Laser Focus 21, 118–132 (1985).
Takeda, M. & Kujawinska, M. Lasers revolutionized optical metrology. https://spie.org/news/spieprofessionalmagazinearchive/2010october/lasersrevolutionizedopticalmetrology?SSO=1 (2010).
Denisyuk, Y. N. On the reflection of optical properties of an object in a wave field of light scattered by it. Dokl. Akad. Nauk SSSR 144, 1275–1278 (1962).
Leith, E. N. & Upatnieks, J. Reconstructed wavefronts and communication theory. J. Optical Soc. Am. 52, 1123–1130 (1962).
Gabor, D. A new microscopic principle. Nature 161, 777–778 (1948).
Reid, G. T. Automatic fringe pattern analysis: a review. Opt. Lasers Eng. 7, 37–68 (1986).
Rajshekhar, G. & Rastogi, P. Fringe analysis: premise and perspectives. Opt. Lasers Eng. 50, iii–x (2012).
Rastogi, P. & Hack, E. Phase Estimation in Optical Interferometry (CRC Press, 2015).
Hariharan, P., Oreb, B. F. & Eiju, T. Digital phase-shifting interferometry: a simple error-compensating phase calculation algorithm. Appl. Opt. 26, 2504–2506 (1987).
Schnars, U. & Jüptner, W. Digital Holography: Digital Hologram Recording, Numerical Reconstruction, and Related Techniques (Springer Science & Business Media, 2005).
Pan, B. et al. Two-dimensional digital image correlation for in-plane displacement and strain measurement: a review. Meas. Sci. Technol. 20, 062001 (2009).
Raskar, R., Agrawal, A. & Tumblin, J. Coded exposure photography: motion deblurring using fluttered shutter. ACM Trans. Graph. 25, 795–804 (2006).
Ritschl, L. et al. Improved total variation-based CT image reconstruction applied to clinical data. Phys. Med. Biol. 56, 1545 (2011).
Edgar, M. P., Gibson, G. M. & Padgett, M. J. Principles and prospects for single-pixel imaging. Nat. Photonics 13, 13–20 (2019).
Katz, O. et al. Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations. Nat. Photonics 8, 784–790 (2014).
Stuart, A. M. Inverse problems: a Bayesian perspective. Acta Numerica 19, 451–559 (2010).
Osher, S. et al. An iterative regularization method for total variation-based image restoration. Multiscale Modeling Simul. 4, 460–489 (2005).
Goldstein, T. & Osher, S. The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2, 323–343 (2009).
Osten, W. What optical metrology can do for experimental mechanics? Appl. Mech. Mater. 70, 1–20 (2011).
Zuo, C. et al. Phase shifting algorithms for fringe projection profilometry: a review. Opt. Lasers Eng. 109, 23–59 (2018).
Baraniuk, R. G. Compressive sensing [lecture notes]. IEEE Signal Process. Mag. 24, 118–121 (2007).
Zibulevsky, M. & Elad, M. L1-L2 optimization in signal and image processing. IEEE Signal Process. Mag. 27, 76–88 (2010).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press Cambridge, 2016).
Chang, X. Y., Bian, L. H. & Zhang, J. Large-scale phase retrieval. eLight 1, 4 (2021).
Fukushima, K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193–202 (1980).
Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).
Baccouche, M. et al. Sequential deep learning for human action recognition. In Proceedings of the 2nd International Workshop on Human Behavior Understanding. 29–39 (Springer, Amsterdam, 2011).
Charles, R. Q. et al. PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 77–85 (IEEE, Honolulu, 2017).
Ouyang, W. L. & Wang, X. G. Joint deep learning for pedestrian detection. In Proceedings of 2013 IEEE International Conference on Computer Vision. 2056–2063 (IEEE, Sydney, NSW, 2013).
Dong, C. et al. Learning a deep convolutional network for image super-resolution. In Proceedings of 13th European Conference on Computer Vision. 184–199 (Springer, Zurich, 2014).
Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).
Barbastathis, G., Ozcan, A. & Situ, G. On the use of deep learning for computational imaging. Optica 6, 921–943 (2019).
Wang, H. D. et al. Deep learning enables cross-modality super-resolution in fluorescence microscopy. Nat. Methods 16, 103–110 (2019).
Rivenson, Y. et al. Phase recovery and holographic image reconstruction using deep learning in neural networks. Light: Sci. Appl. 7, 17141 (2018).
Wang, F. et al. Learning from simulation: an end-to-end deep-learning approach for computational ghost imaging. Opt. Express 27, 25560–25572 (2019).
Li, S. et al. Imaging through glass diffusers using densely connected convolutional networks. Optica 5, 803–813 (2018).
Feng, S. J. et al. Fringe pattern analysis using deep learning. Adv. Photonics 1, 025001 (2019).
Shi, J. S. et al. Label enhanced and patch based deep learning for phase retrieval from single frame fringe pattern in fringe projection 3D measurement. Opt. Express 27, 28929–28943 (2019).
Yin, W. et al. Temporal phase unwrapping using deep learning. Sci. Rep. 9, 20175 (2019).
Zhang, T. et al. Rapid and robust two-dimensional phase unwrapping via deep learning. Opt. Express 27, 23173–23185 (2019).
Hao, F. G. et al. Batch denoising of ESPI fringe patterns based on convolutional neural network. Appl. Opt. 58, 3338–3346 (2019).
Yan, K. T. et al. Fringe pattern denoising based on deep learning. Opt. Commun. 437, 148–152 (2019).
Gerchberg, R. W. & Saxton, W. O. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 35, 237–246 (1972).
Fienup, J. R. Phase retrieval algorithms: a comparison. Appl. Opt. 21, 2758–2769 (1982).
Teague, M. R. Deterministic phase retrieval: a Green’s function solution. J. Optical Soc. Am. 73, 1434–1441 (1983).
Zuo, C. et al. Transport of intensity equation: a tutorial. Opt. Lasers Eng. 135, 106187 (2020).
Zhang, F. C., Pedrini, G. & Osten, W. Phase retrieval of arbitrary complexvalued fields through apertureplane modulation. Phys. Rev. A 75, 043805 (2007).
Faulkner, H. M. L. & Rodenburg, J. M. Movable aperture lensless transmission microscopy: a novel phase retrieval algorithm. Phys. Rev. Lett. 93, 023903 (2004).
Zheng, G. N. et al. Concept, implementations and applications of Fourier ptychography. Nat. Rev. Phys. 3, 207–223 (2021).
Platt, B. C. & Shack, R. History and principles of Shack-Hartmann wavefront sensing. J. Refractive Surg. 17, S573–S577 (2001).
Ragazzoni, R. Pupil plane wavefront sensing with an oscillating prism. J. Mod. Opt. 43, 289–293 (1996).
Falldorf, C., von Kopylow, C. & Bergmann, R. B. Wave field sensing by means of computational shear interferometry. J. Optical Soc. Am. A 30, 1905–1912 (2013).
Fienup, J. R. Phase retrieval for optical metrology: past, present and future. in Optical Fabrication and Testing (eds Reinhard, V.) 2017. OW2B1 (Optical Society of America, 2017).
Claus, D. et al. Dual wavelength optical metrology using ptychography. J. Opt. 15, 035702 (2013).
Falldorf, C., Agour, M. & Bergmann, R. B. Digital holography and quantitative phase contrast imaging using computational shear interferometry. Optical Eng. 54, 024110 (2015).
Creath, K. V Phase-measurement interferometry techniques. Prog. Opt. 26, 349–393 (1988).
Hariharan, P. Basics of Interferometry, 2nd edn. (Elsevier, 2007).
Aben, H. & Guillemet, C. Integrated photoelasticity. in Photoelasticity of Glass (eds Aben, H. & Guillemet, C.) 86–101 (Springer, 1993).
Asundi, A. Phase shifting in photoelasticity. Exp. Tech. 17, 19–23 (1993).
Ramesh, K. & Lewis, G. Digital photoelasticity: advanced techniques and applications. Appl. Mech. Rev. 55, B69–B71 (2002).
Sciammarella, C. A. The moiré method—a review. Exp. Mech. 22, 418–433 (1982).
Post, D., Han, B. & Ifju, P. High Sensitivity Moiré: Experimental Analysis for Mechanics and Materials. (Springer Science & Business Media, 2012).
Durelli, A. J. & Parks, V. J. Moiré Analysis of Strain (Prentice Hall, 1970).
Chiang, F. P. Moiré methods of strain analysis. Exp. Mech. 19, 290–308 (1979).
Post, D., Han, B. & Ifju, P. Moiré interferometry. in High Sensitivity Moiré: Experimental Analysis for Mechanics and Materials (eds Post, D., Han, B. & Ifju, P.) 135–226 (Springer, 1994).
Rastogi, P. K. Holographic Interferometry: Principles and Methods (Springer-Verlag, 1994).
Kreis, T. Handbook of Holographic Interferometry: Optical and Digital Methods (John Wiley & Sons, 2004).
Hariharan, P., Oreb, B. F. & Brown, N. Real-time holographic interferometry: a microcomputer system for the measurement of vector displacements. Appl. Opt. 22, 876–880 (1983).
Heflinger, L. O., Wuerker, R. F. & Brooks, R. E. Holographic interferometry. J. Appl. Phys. 37, 642–649 (1966).
Khanna, S. M. & Tonndorf, J. Tympanic membrane vibrations in cats studied by time-averaged holography. J. Acoustical Soc. Am. 51, 1904–1920 (1972).
Tonndorf, J. & Khanna, S. M. Tympanic-membrane vibrations in human cadaver ears studied by time-averaged holography. J. Acoustical Soc. Am. 52, 1221–1233 (1972).
Schnars, U. et al. Digital holography. in Digital Holography and Wavefront Sensing: Principles, Techniques and Applications 2nd edn. (eds Schnars, U. et al.) 39–68 (Springer, 2015).
Cuche, E., Bevilacqua, F. & Depeursinge, C. Digital holography for quantitative phasecontrast imaging. Opt. Lett. 24, 291–293 (1999).
Xu, L. et al. Studies of digital microscopic holography with applications to microstructure testing. Appl. Opt. 40, 5046–5051 (2001).
Picart, P. et al. Time-averaged digital holography. Opt. Lett. 28, 1900–1902 (2003).
Singh, V. R. et al. Dynamic characterization of MEMS diaphragm using time averaged in-line digital holography. Opt. Commun. 280, 285–290 (2007).
Colomb, T. et al. Automatic procedure for aberration compensation in digital holographic microscopy and applications to specimen shape compensation. Appl. Opt. 45, 851–863 (2006).
Løkberg, O. J. Electronic speckle pattern interferometry. in Optical Metrology (ed. Soares, O. D. D.) 542–572 (Springer, 1987).
Rastogi, P. K. Digital Speckle Pattern Interferometry and Related Techniques (Wiley, 2001).
Hung, Y. Y. Shearography: a new optical method for strain measurement and nondestructive testing. Optical Eng. 21, 213391 (1982).
Hung, Y. Y. & Ho, H. P. Shearography: an optical measurement technique and applications. Mater. Sci. Eng.: R: Rep. 49, 61–87 (2005).
Gorthi, S. S. & Rastogi, P. Fringe projection techniques: whither we are? Opt. Lasers Eng. 48, 133–140 (2010).
Geng, J. Structured-light 3D surface imaging: a tutorial. Adv. Opt. Photonics 3, 128–160 (2011).
Knauer, M. C., Kaminski, J. & Hausler, G. Phase measuring deflectometry: a new approach to measure specular freeform surfaces. In Proceedings of SPIE 5457, Optical Metrology in Production Engineering. 366–376 (SPIE, Strasbourg, 2004).
Huang, L. et al. Review of phase measuring deflectometry. Opt. Lasers Eng. 107, 247–257 (2018).
Zhang, Z. H. et al. Three-dimensional shape measurements of specular objects using phase-measuring deflectometry. Sensors 17, 2835 (2017).
Xu, Y. J., Gao, F. & Jiang, X. Q. A brief review of the technological advancements of phase measuring deflectometry. PhotoniX 1, 14 (2020).
Chu, T. C., Ranson, W. F. & Sutton, M. A. Applications of digitalimagecorrelation techniques to experimental mechanics. Exp. Mech. 25, 232–244 (1985).
Schreier, H., Orteu, J. J. & Sutton, M. A. Image Correlation for Shape, Motion and Deformation Measurements: Basic Concepts, Theory and Applications (Springer, 2009).
Verhulp, E., van Rietbergen, B. & Huiskes, R. A three-dimensional digital image correlation technique for strain measurements in microstructures. J. Biomech. 37, 1313–1320 (2004).
Sutton, M. A. et al. The effect of out-of-plane motion on 2D and 3D digital image correlation measurements. Opt. Lasers Eng. 46, 746–757 (2008).
Pan, B. Digital image correlation for surface deformation measurement: historical developments, recent advances and future goals. Meas. Sci. Technol. 29, 082001 (2018).
Marr, D. & Poggio, T. A computational theory of human stereo vision. Philos. Trans. R. Soc. B: Biol. Sci. 204, 301–328 (1979).
Luhmann, T. et al. Close-Range Photogrammetry and 3D Imaging, 2nd edn. (De Gruyter, 2014).
Fusiello, A., Trucco, E. & Verri, A. A compact algorithm for rectification of stereo pairs. Mach. Vis. Appl. 12, 16–22 (2000).
Pitas, I. Digital Image Processing Algorithms and Applications (Wiley, 2000).
Yu, Q. F. et al. Spin filtering with curve windows for interferometric fringe patterns. Appl. Opt. 41, 2650–2654 (2002).
Tang, C. et al. Second-order oriented partial-differential equations for denoising in electronic-speckle-pattern interferometry fringes. Opt. Lett. 33, 2179–2181 (2008).
Wang, H. X. et al. Fringe pattern denoising using coherence-enhancing diffusion. Opt. Lett. 34, 1141–1143 (2009).
Kaufmann, G. H. & Galizzi, G. E. Speckle noise reduction in television holography fringes using wavelet thresholding. Optical Eng. 35, 9–14 (1996).
Kemao, Q. Windowed Fourier transform for fringe pattern analysis. Appl. Opt. 43, 2695–2702 (2004).
Kemao, Q. Two-dimensional windowed Fourier transform for fringe pattern analysis: principles, applications and implementations. Opt. Lasers Eng. 45, 304–317 (2007).
Bianco, V. et al. Quasi noise-free digital holography. Light: Sci. Appl. 5, e16142 (2016).
Kulkarni, R. & Rastogi, P. Fringe denoising algorithms: a review. Opt. Lasers Eng. https://doi.org/10.1016/j.optlaseng.2020.106190 (2020).
Bianco, V. et al. Strategies for reducing speckle noise in digital holography. Light: Sci. Appl. 7, 48 (2018).
Zhi, H. & Johansson, R. B. Adaptive filter for enhancement of fringe patterns. Opt. Lasers Eng. 15, 241–251 (1991).
Trusiak, M., Patorski, K. & Wielgus, M. Adaptive enhancement of optical fringe patterns by selective reconstruction using FABEMD algorithm and Hilbert spiral transform. Opt. Express 20, 23463–23479 (2012).
Wang, C. X., Qian, K. M. & Da, F. P. Automatic fringe enhancement with novel bidimensional sinusoidsassisted empirical mode decomposition. Opt. Express 25, 24299–24311 (2017).
Hsung, T. C., Lun, D. P. K. & Ng, W. W. L. Efficient fringe image enhancement based on dual-tree complex wavelet transform. Appl. Opt. 50, 3973–3986 (2011).
Awatsuji, Y. et al. Single-shot phase-shifting color digital holography. In IEEE Lasers and Electro-Optics Society Annual Meeting Conference Proceedings. 84–85 (IEEE, Lake Buena Vista, FL, 2007).
Zhang, Z. H. Review of single-shot 3D shape measurement by phase calculation-based fringe projection techniques. Opt. Lasers Eng. 50, 1097–1106 (2012).
Phillips, Z. F., Chen, M. & Waller, L. Single-shot quantitative phase microscopy with color-multiplexed differential phase contrast (cDPC). PLoS ONE 12, e0171228 (2017).
Sun, J. S. et al. Single-shot quantitative phase microscopy based on color-multiplexed Fourier ptychography. Opt. Lett. 43, 3365–3368 (2018).
Fan, Y. et al. Single-shot isotropic quantitative phase microscopy based on color-multiplexed differential phase contrast. APL Photonics 4, 121301 (2019).
Zhang, Z. H., Towers, C. E. & Towers, D. P. Time efficient color fringe projection system for 3D shape and color using optimum 3-frequency selection. Opt. Express 14, 6444–6455 (2006).
Zhang, Y. B. et al. Color calibration and fusion of lens-free and mobile-phone microscopy images for high-resolution and accurate color reproduction. Sci. Rep. 6, 27811 (2016).
Lee, W. et al. Single-exposure quantitative phase imaging in color-coded LED microscopy. Opt. Express 25, 8398–8411 (2017).
Schemm, J. B. & Vest, C. M. Fringe pattern recognition and interpolation using nonlinear regression analysis. Appl. Opt. 22, 2850–2853 (1983).
Schreier, H. W., Braasch, J. R. & Sutton, M. A. Systematic errors in digital image correlation caused by intensity interpolation. Optical Eng. 39, 2915–2921 (2000).
Bing, P. et al. Performance of subpixel registration algorithms in digital image correlation. Meas. Sci. Technol. 17, 1615 (2006).
Pan, B. et al. Study on subset size selection in digital image correlation for speckle patterns. Opt. Express 16, 7037–7048 (2008).
Bruck, H. et al. Digital image correlation using Newton-Raphson method of partial differential correction. Exp. Mech. 29, 261–267 (1989).
Massig, J. H. & Heppner, J. Fringe-pattern analysis with high accuracy by use of the Fourier-transform method: theory and experimental tests. Appl. Opt. 40, 2081–2088 (2001).
Roddier, C. & Roddier, F. Interferogram analysis using Fourier transform techniques. Appl. Opt. 26, 1668–1673 (1987).
Takeda, M., Ina, H. & Kobayashi, S. Fourier-transform method of fringe-pattern analysis for computer-based topography and interferometry. J. Optical Soc. Am. 72, 156–160 (1982).
Su, X. Y. & Chen, W. J. Fourier transform profilometry: a review. Opt. Lasers Eng. 35, 263–284 (2001).
Kemao, Q. Windowed Fringe Pattern Analysis (SPIE Press, 2013).
Zhong, J. G. & Weng, J. W. Spatial carrier-fringe pattern analysis by means of wavelet transform: wavelet transform profilometry. Appl. Opt. 43, 4993–4998 (2004).
Larkin, K. G., Bone, D. J. & Oldfield, M. A. Natural demodulation of two-dimensional fringe patterns. I. General background of the spiral phase quadrature transform. J. Optical Soc. Am. A 18, 1862–1870 (2001).
Trusiak, M., Wielgus, M. & Patorski, K. Advanced processing of optical fringe patterns by automated selective reconstruction and enhanced fast empirical mode decomposition. Opt. Lasers Eng. 52, 230–240 (2014).
Servin, M., Marroquin, J. L. & Cuevas, F. J. Demodulation of a single interferogram by use of a two-dimensional regularized phase-tracking technique. Appl. Opt. 36, 4540–4548 (1997).
Servin, M., Marroquin, J. L. & Quiroga, J. A. Regularized quadrature and phase tracking from a single closed-fringe interferogram. J. Optical Soc. Am. A 21, 411–419 (2004).
Kemao, Q. & Soon, S. H. Sequential demodulation of a single fringe pattern guided by local frequencies. Opt. Lett. 32, 127–129 (2007).
Wang, H. X. & Kemao, Q. Frequency guided methods for demodulation of a single fringe pattern. Opt. Express 17, 15118–15127 (2009).
Servin, M., Quiroga, J. A. & Padilla, J. M. Fringe Pattern Analysis for Optical Metrology: Theory, Algorithms, and Applications (Wiley-VCH, 2014).
Massie, N. A., Nelson, R. D. & Holly, S. High-performance real-time heterodyne interferometry. Appl. Opt. 18, 1797–1803 (1979).
Bruning, J. H. et al. Digital wavefront measuring interferometer for testing optical surfaces and lenses. Appl. Opt. 13, 2693–2703 (1974).
Srinivasan, V., Liu, H. C. & Halioua, M. Automated phase-measuring profilometry of 3D diffuse objects. Appl. Opt. 23, 3105–3108 (1984).
Wizinowich, P. L. Phase shifting interferometry in the presence of vibration: a new algorithm and system. Appl. Opt. 29, 3271–3279 (1990).
Schreiber, H. & Bruning, J. H. Phase shifting interferometry. in Optical Shop Testing, 3rd edn. (ed. Malacara, D.) 547–666 (Wiley, 2007).
Goldstein, R. M., Zebker, H. A. & Werner, C. L. Satellite radar interferometry: two-dimensional phase unwrapping. Radio Sci. 23, 713–720 (1988).
Su, X. Y. & Chen, W. J. Reliability-guided phase unwrapping algorithm: a review. Opt. Lasers Eng. 42, 245–261 (2004).
Flynn, T. J. Two-dimensional phase unwrapping with minimum weighted discontinuity. J. Optical Soc. Am. A 14, 2692–2701 (1997).
Ghiglia, D. C. & Romero, L. A. Minimum L^{p}-norm two-dimensional phase unwrapping. J. Optical Soc. Am. A 13, 1999–2013 (1996).
Bioucas-Dias, J. M. & Valadao, G. Phase unwrapping via graph cuts. IEEE Trans. Image Process. 16, 698–709 (2007).
Zappa, E. & Busca, G. Comparison of eight unwrapping algorithms applied to Fourier-transform profilometry. Opt. Lasers Eng. 46, 106–116 (2008).
Zebker, H. A. & Lu, Y. P. Phase unwrapping algorithms for radar interferometry: residue-cut, least-squares, and synthesis algorithms. J. Optical Soc. Am. A 15, 586–598 (1998).
Zhao, M. et al. Quality-guided phase unwrapping technique: comparison of quality maps and guiding strategies. Appl. Opt. 50, 6214–6224 (2011).
Sansoni, G. et al. Three-dimensional imaging based on Gray-code light projection: characterization of the measuring algorithm and development of a measuring system for industrial applications. Appl. Opt. 36, 4463–4472 (1997).
Sansoni, G., Carocci, M. & Rodella, R. Three-dimensional vision based on a combination of gray-code and phase-shift light projection: analysis and compensation of the systematic errors. Appl. Opt. 38, 6565–6573 (1999).
Huntley, J. M. & Saldner, H. Temporal phase-unwrapping algorithm for automated interferogram analysis. Appl. Opt. 32, 3047–3052 (1993).
Zhao, H., Chen, W. Y. & Tan, Y. S. Phase-unwrapping algorithm for the measurement of three-dimensional object shapes. Appl. Opt. 33, 4497–4500 (1994).
Saldner, H. O. & Huntley, J. M. Temporal phase unwrapping: application to surface profiling of discontinuous objects. Appl. Opt. 36, 2770–2775 (1997).
Cheng, Y. Y. & Wyant, J. C. Two-wavelength phase shifting interferometry. Appl. Opt. 23, 4539–4543 (1984).
Creath, K., Cheng, Y. Y. & Wyant, J. C. Contouring aspheric surfaces using two-wavelength phase-shifting interferometry. Opt. Acta: Int. J. Opt. 32, 1455–1464 (1985).
Towers, C. E., Towers, D. P. & Jones, J. D. C. Optimum frequency selection in multifrequency interferometry. Opt. Lett. 28, 887–889 (2003).
Gushov, V. I. & Solodkin, Y. N. Automatic processing of fringe patterns in integer interferometers. Opt. Lasers Eng. 14, 311–324 (1991).
Takeda, M. et al. Frequency-multiplex Fourier-transform profilometry: a single-shot three-dimensional shape measurement of objects with large height discontinuities and/or surface isolations. Appl. Opt. 36, 5347–5354 (1997).
Zhong, J. G. & Wang, M. Phase unwrapping by lookup table method: application to phase map with singular points. Optical Eng. 38, 2075–2080 (1999).
Burke, J. et al. Reverse engineering by fringe projection. In Proceedings of SPIE 4778, Interferometry XI: Applications. 312–324 (SPIE, Seattle, WA, 2002).
Zuo, C. et al. Temporal phase unwrapping algorithms for fringe projection profilometry: a comparative review. Opt. Lasers Eng. 85, 84–103 (2016).
Tao, T. Y. et al. Real-time 3D shape measurement with composite phase-shifting fringes and multi-view system. Opt. Express 24, 20253–20269 (2016).
Liu, X. R. & Kofman, J. Background and amplitude encoded fringe patterns for 3D surface-shape measurement. Opt. Lasers Eng. 94, 63–69 (2017).
Weise, T., Leibe, B. & Van Gool, L. Fast 3D scanning with automatic motion compensation. In Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition. 1–8 (IEEE, Minneapolis, MN, 2007).
Zuo, C. et al. Micro Fourier transform profilometry (μFTP): 3D shape measurement at 10,000 frames per second. Opt. Lasers Eng. 102, 70–91 (2018).
An, Y. T., Hyun, J. S. & Zhang, S. Pixel-wise absolute phase unwrapping using geometric constraints of structured light system. Opt. Express 24, 18445–18459 (2016).
Li, Z. W. et al. Multi-view phase shifting: a full-resolution and high-speed 3D measurement framework for arbitrary shape dynamic objects. Opt. Lett. 38, 1389–1391 (2013).
Bräuer-Burchardt, C. et al. High-speed three-dimensional measurements with a fringe projection-based optical sensor. Optical Eng. 53, 112213 (2014).
Garcia, R. R. & Zakhor, A. Consistent stereo-assisted absolute phase unwrapping methods for structured light systems. IEEE J. Sel. Top. Signal Process. 6, 411–424 (2012).
Jiang, C. F., Li, B. W. & Zhang, S. Pixel-by-pixel absolute phase retrieval using three phase-shifted fringe patterns without markers. Opt. Lasers Eng. 91, 232–241 (2017).
Liu, X. R. & Kofman, J. High-frequency background modulation fringe patterns based on a fringe-wavelength geometry-constraint model for 3D surface-shape measurement. Opt. Express 25, 16618–16628 (2017).
Tao, T. Y. et al. High-precision real-time 3D shape measurement using a bi-frequency scheme and multi-view system. Appl. Opt. 56, 3646–3653 (2017).
Tao, T. Y. et al. High-speed real-time 3D shape measurement based on adaptive depth constraint. Opt. Express 26, 22440–22456 (2018).
Cai, Z. W. et al. Light-field-based absolute phase unwrapping. Opt. Lett. 43, 5717–5720 (2018).
Pan, B., Xie, H. M. & Wang, Z. Y. Equivalence of digital image correlation criteria for pattern matching. Appl. Opt. 49, 5501–5509 (2010).
Gruen, A. W. Adaptive least squares correlation: a powerful image matching technique. J. Photogramm. Remote Sens. Cartogr. 14, 175–187 (1985).
Altunbasak, Y., Mersereau, R. M. & Patti, A. J. A fast parametric motion estimation algorithm with illumination and lens distortion correction. IEEE Trans. Image Process. 12, 395–408 (2003).
Gutman, S. On optimal guidance for homing missiles. J. Guidance Control 2, 296–300 (1979).
Zabih, R. & Woodfill, J. Nonparametric local transforms for computing visual correspondence. In Proceedings of the 3rd European Conference on Computer Vision. 151–158 (Springer, Stockholm, 1994).
Bhat, D. N. & Nayar, S. K. Ordinal measures for image correspondence. IEEE Trans. Pattern Anal. Mach. Intell. 20, 415–423 (1998).
Sara, R. & Bajcsy, R. On occluding contour artifacts in stereo vision. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 852–857 (IEEE, San Juan, PR, 1997).
Sutton, M. A. et al. Effects of subpixel image restoration on digital correlation error estimates. Optical Eng. 27, 271070 (1988).
Zhang, D., Zhang, X. & Cheng, G. Compression strain measurement by digital speckle correlation. Exp. Mech. 39, 62–65 (1999).
Hung, P. C. & Voloshin, A. Inplane strain measurement by digital image correlation. J. Braz. Soc. Mech. Sci. Eng. 25, 215–221 (2003).
Davis, C. Q. & Freeman, D. M. Statistics of subpixel registration algorithms based on spatiotemporal gradients or block matching. Optical Eng. 37, 1290–1298 (1998).
Zhou, P. & Goodson, K. E. Subpixel displacement and deformation gradient measurement using digital image/speckle correlation. Optical Eng. 40, 1613–1620 (2001).
Press, W. H. et al. Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd edn. (Cambridge University Press, 1992).
Chapra, S. C. & Canale, R. P. Numerical Methods for Engineers (McGraw-Hill Higher Education, 2011).
Baker, S. & Matthews, I. Equivalence and efficiency of image alignment algorithms. In Proceedings of 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 1 (IEEE, Kauai, HI, 2001).
Baker, S. & Matthews, I. Lucas-Kanade 20 years on: a unifying framework. Int. J. Computer Vis. 56, 221–255 (2004).
Pan, B., Li, K. & Tong, W. Fast, robust and accurate digital image correlation calculation without redundant computations. Exp. Mech. 53, 1277–1289 (2013).
Pan, B. & Li, K. A fast digital image correlation method for deformation measurement. Opt. Lasers Eng. 49, 841–847 (2011).
Zhang, L. Q. et al. High accuracy digital image correlation powered by GPU-based parallel computing. Opt. Lasers Eng. 69, 7–12 (2015).
Konolige, K. Small vision systems: hardware and implementation. in Robotics Research: The Eighth International Symposium (eds Shirai, Y. & Hirose, S.) 203–212 (Springer, 1998).
Hirschmüller, H., Innocent, P. R. & Garibaldi, J. Real-time correlation-based stereo vision with reduced border errors. Int. J. Computer Vis. 47, 229–246 (2002).
Scharstein, D. & Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Computer Vis. 47, 7–42 (2002).
Hirschmuller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30, 328–341 (2008).
Boykov, Y., Veksler, O. & Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001).
Hong, C. K., Ryu, H. S. & Lim, H. C. Least-squares fitting of the phase map obtained in phase-shifting electronic speckle pattern interferometry. Opt. Lett. 20, 931–933 (1995).
Aebischer, H. A. & Waldner, S. A simple and effective method for filtering speckle-interferometric phase fringe patterns. Opt. Commun. 162, 205–210 (1999).
Yatabe, K. & Oikawa, Y. Convex optimization-based windowed Fourier filtering with multiple windows for wrapped-phase denoising. Appl. Opt. 55, 4632–4641 (2016).
Huang, H. Y. H. et al. Path-independent phase unwrapping using phase gradient and total-variation (TV) denoising. Opt. Express 20, 14075–14089 (2012).
Chen, R. P. et al. Interferometric phase denoising by pyramid nonlocal means filter. IEEE Geosci. Remote Sens. Lett. 10, 826–830 (2013).
Langehanenberg, P. et al. Autofocusing in digital holographic phase contrast microscopy on pure phase objects for live cell imaging. Appl. Opt. 47, D176–D182 (2008).
Gao, P. et al. Autofocusing of digital holographic microscopy based on off-axis illuminations. Opt. Lett. 37, 3630–3632 (2012).
Dubois, F. et al. Focus plane detection criteria in digital holography microscopy by amplitude analysis. Opt. Express 14, 5895–5908 (2006).
Pan, B. et al. Phase error analysis and compensation for nonsinusoidal waveforms in phase-shifting digital fringe projection profilometry. Opt. Lett. 34, 416–418 (2009).
Feng, S. J. et al. Robust dynamic 3D measurements with motion-compensated phase-shifting profilometry. Opt. Lasers Eng. 103, 127–138 (2018).
Ferraro, P. et al. Compensation of the inherent wave front curvature in digital holographic coherent microscopy for quantitative phase-contrast imaging. Appl. Opt. 42, 1938–1946 (2003).
Di, J. L. et al. Phase aberration compensation of digital holographic microscopy based on least squares surface fitting. Opt. Commun. 282, 3873–3877 (2009).
Miccio, L. et al. Direct full compensation of the aberrations in quantitative phase microscopy of thin objects by a single digital hologram. Appl. Phys. Lett. 90, 041104 (2007).
Colomb, T. et al. Total aberrations compensation in digital holographic microscopy with a reference conjugated hologram. Opt. Express 14, 4300–4306 (2006).
Zuo, C. et al. Phase aberration compensation in digital holographic microscopy based on principal component analysis. Opt. Lett. 38, 1724–1726 (2013).
Martínez, A. et al. Analysis of optical configurations for ESPI. Opt. Lasers Eng. 46, 48–54 (2008).
Wang, Y. J. & Zhang, S. Optimal fringe angle selection for digital fringe projection technique. Appl. Opt. 52, 7094–7098 (2013).
Michie, D., Spiegelhalter, D. J. & Taylor, C. C. Machine Learning, Neural and Statistical Classification (Ellis Horwood, 1994).
Zhang, X. D. Machine learning. in A Matrix Algebra Approach to Artificial Intelligence (ed. Zhang, X. D.) 223–440 (Springer, 2020).
Rosenblatt, F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958).
Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning. 807–814 (ACM, Haifa, 2010).
Gardner, M. W. & Dorling, S. R. Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos. Environ. 32, 2627–2636 (1998).
Sussillo, D. Random walks: training very deep nonlinear feedforward networks with smart initialization. Preprint at https://arxiv.org/abs/1412.6558v2 (2014).
Kraus, M., Feuerriegel, S. & Oztekin, A. Deep learning in business analytics and operations research: models, applications and managerial implications. Eur. J. Operational Res. 281, 628–641 (2020).
Zhang, Z. L. & Sabuncu, M. R. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. 8792–8802 (ACM, Montréal, 2018).
Korhonen, J. & You, J. Y. Peak signal-to-noise ratio revisited: is simple beautiful? In Proceedings of the 4th International Workshop on Quality of Multimedia Experience. 37–38 (IEEE, Melbourne, VIC, 2012).
Girshick, R. Fast R-CNN. In Proceedings of 2015 IEEE International Conference on Computer Vision. 1440–1448 (IEEE, Santiago, 2015).
Wang, Z. et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Wang, Z. & Bovik, A. C. Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag. 26, 98–117 (2009).
Wang, J. J. et al. Deep learning for smart manufacturing: methods and applications. J. Manuf. Syst. 48, 144–156 (2018).
Kingma, D. P. et al. Semi-supervised learning with deep generative models. In Proceedings of the 27th International Conference on Neural Information Processing Systems. 3581–3589 (ACM, Montreal, 2014).
Hinton, G. E. et al. The “wake-sleep” algorithm for unsupervised neural networks. Science 268, 1158–1161 (1995).
Bengio, Y. et al. Deep generative stochastic networks trainable by backprop. In Proceedings of the 31st International Conference on Machine Learning. 226–234 (JMLR, Beijing, 2014).
McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophysics 5, 115–133 (1943).
Minsky, M. & Papert, S. A. Perceptrons: An Introduction to Computational Geometry (The MIT Press, 1969).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962).
LeCun, Y. et al. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989).
Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 6, 107–116 (1998).
Hinton, G. E., Osindero, S. & Teh, Y. W. A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006).
Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Hinton, G. E. & Sejnowski, T. J. Learning and relearning in Boltzmann machines. in Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations (eds Rumelhart, D. E. & McClelland, J. L.) 282–317 (MIT Press, 1986).
Smolensky, P. Information processing in dynamical systems: foundations of harmony theory. in Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations (eds Rumelhart, D. E. & McClelland, J. L.) 194–281 (MIT Press, 1986).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems. 1097–1105 (ACM, Lake Tahoe, Nevada, 2012).
LeCun, Y. et al. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Hinton, G. E. et al. Improving neural networks by preventing co-adaptation of feature detectors. Preprint at https://arxiv.org/abs/1207.0580 (2012).
Windhorst, U. On the role of recurrent inhibitory feedback in motor control. Prog. Neurobiol. 49, 517–587 (1996).
Elman, J. L. Finding structure in time. Cogn. Sci. 14, 179–211 (1990).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Zhou, J. et al. Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020).
Xu, K. et al. How powerful are graph neural networks? In Proceedings of the 7th International Conference on Learning Representations. (OpenReview, New Orleans, LA, 2019).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations. (San Diego, CA, 2015).
Szegedy, C. et al. Going deeper with convolutions. In Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. 1–9 (IEEE, Boston, MA, 2015).
Girshick, R. et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. 580–587 (IEEE, Columbus, OH, 2014).
Goodfellow, I. J. et al. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems. 2672–2680 (ACM, Montreal, 2014).
He, K. M. et al. Deep residual learning for image recognition. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 770–778 (IEEE, Las Vegas, NV, 2016).
Chen, J. X. The evolution of computing: AlphaGo. Comput. Sci. Eng. 18, 4–7 (2016).
Ouyang, W. L. et al. DeepID-Net: object detection with deformable part based convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1320–1334 (2017).
Lin, L. et al. A deep structured model with radius–margin bound for 3D human activity recognition. Int. J. Computer Vis. 118, 256–273 (2016).
Doulamis, N. & Voulodimos, A. FAST-MDL: fast adaptive supervised training of multi-layered deep learning models for consistent object tracking and classification. In Proceedings of 2016 IEEE International Conference on Imaging Systems and Techniques (IST). 318–323 (IEEE, Chania, 2016).
Toshev, A. & Szegedy, C. DeepPose: human pose estimation via deep neural networks. In Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. 1653–1660 (IEEE, Columbus, OH, 2014).
Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. 3431–3440 (IEEE, Boston, MA, 2015).
Chen, Q. F., Xu, J. & Koltun, V. Fast image processing with fully-convolutional networks. In Proceedings of 2017 IEEE International Conference on Computer Vision. 2516–2525 (IEEE, Venice, 2017).
Dong, C. et al. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2015).
Wang, Z. H., Chen, J. & Hoi, S. C. H. Deep learning for image super-resolution: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43, 3365–3387 (2021).
Dai, Y. P. et al. SRCNN-based enhanced imaging for low frequency radar. In 2018 Progress in Electromagnetics Research Symposium (PIERS-Toyama). 366–370 (IEEE, Toyama, 2018).
Li, Y. J. et al. Underwater image high definition display using the multilayer perceptron and color feature-based SRCNN. IEEE Access 7, 83721–83728 (2019).
Umehara, K., Ota, J. & Ishida, T. Application of super-resolution convolutional neural network for enhancing image resolution in chest CT. J. Digital Imaging 31, 441–450 (2018).
Noh, H., Hong, S. & Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of 2015 IEEE International Conference on Computer Vision. 1520–1528 (IEEE, Santiago, 2015).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. 234–241 (Springer, Munich, 2015).
Falk, T. et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat. Methods 16, 67–70 (2019).
Badrinarayanan, V., Kendall, A. & Cipolla, R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017).
Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the 13th European Conference on Computer Vision. 818–833 (Springer, Zurich, 2014).
Shi, W. Z. et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 1874–1883 (IEEE, Las Vegas, NV, 2016).
Bell, J. B. Review of Solutions of Ill-Posed Problems by A. N. Tikhonov & V. Y. Arsenin. Math. Comput. 32, 1320–1322 (1978).
Figueiredo, M. A. T. & Nowak, R. D. A bound optimization approach to wavelet-based image deconvolution. In IEEE International Conference on Image Processing 2005. II-782 (IEEE, Genova, Italy, 2005).
Mairal, J. et al. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning. 689–696 (ACM, Montreal, Quebec, 2009).
Daubechies, I., Defrise, M. & De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004).
Boyd, S. et al. Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers (Now Publishers Inc, 2011).
Candès, E. J., Romberg, J. K. & Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59, 1207–1223 (2006).
Greivenkamp, J. E. Generalized data reduction for heterodyne interferometry. Optical Eng. 23, 234350 (1984).
Morgan, C. J. Least-squares estimation in phase-measurement interferometry. Opt. Lett. 7, 368–370 (1982).
Osten, W. Optical metrology: from the laboratory to the real world. in Computational Optical Sensing and Imaging. JW2B.4 (Optical Society of America, 2013).
Van der Jeught, S. & Dirckx, J. J. J. Real-time structured light profilometry: a review. Opt. Lasers Eng. 87, 18–31 (2016).
Jeon, W. et al. Speckle noise reduction for digital holographic images using multi-scale convolutional neural networks. Opt. Lett. 43, 4240–4243 (2018).