Quantitative Phase and Intensity Microscopy Using Snapshot White Light Wavefront Sensing

Phase imaging techniques are an invaluable tool in microscopy for quickly examining thin transparent specimens. Existing techniques fall into two camps: simple and inexpensive methods that produce only qualitative phase information (e.g. phase contrast microscopy, DIC), and significantly more elaborate and expensive quantitative methods. Here we demonstrate a low-cost, easy-to-implement microscopy setup for quantitative imaging of phase and bright field amplitude using collimated white light illumination.


S.1 Theory
In this section we show how to extend previous speckle-pattern tracking phase retrieval techniques,1-9 using two separate but consistent models (from physical optics and ray optics, respectively), to derive a concise image formation model for our coded wavefront sensor.
In this Supplementary, we preserve the 3rd approximation, but re-formulate the 1st approximation by considering the small diffraction of the field f_0(r − λzρ), and relax the 2nd approximation to a general field f_0(r) = A(r) exp[jφ(r)], where A(r) is the sample amplitude that we would also like to recover. We now discuss our reformulations of the work of Wang et al.7

Sample diffraction
Consider f_z(r), the diffraction field of f_0(r) at distance z. Since the scalar field f_0(r) = A(r) exp[jφ(r)] is the product of an amplitude and a phase exponential, by Eq. (S.3) we can decompose f_z(r) into its amplitude and phase contributions, where Â_z(ρ) is the Fourier transform of A_z(r), and A_z(r) and g_z(r) are the diffraction fields of A(r) and g(r) = exp[jφ(r)], respectively. We now discuss how to compute A_z(r) (the amplitude diffraction field) and g_z(r) (the phase diffraction field) in turn.
Amplitude diffraction field. Under the Fresnel diffraction assumption, denoting ∗ as convolution and neglecting the obliquity factor,11 the diffraction field A_z(r) can be simplified by preserving only the 0th order of A(r − r′), i.e. A(r − r′) ≈ A(r) (see Fig. S.1 for justification), and by neglecting the constant phase terms, where we utilize the Fourier transform pair formula (Eq. (S.56)) found in the Appendix. Consequently we arrive at Eq. (S.7), where we neglect all constant factors, including 1/√π, and Â(ρ) denotes the Fourier transform of A(r). Equation (S.7) is intuitive: a zero-phase amplitude field should preserve its amplitude and remain at null phase over a sufficiently short propagation distance z.

Phase diffraction field. Similarly, by the Fresnel diffraction formula, the diffraction field of g(r) is computed as Eq. (S.8). A Taylor expansion of φ(r − r′) around r up to second order suggests the approximation Eq. (S.9); with it, Eq. (S.8) simplifies to Eq. (S.10), where C(φ) is a phase function of φ(r). Consequently we obtain Eq. (S.11), where we neglect the constant factor 1/√π. Equation (S.11) indicates that, though starting with a uniform amplitude, the diffraction field of the phase exponential g(r) undergoes a small amplitude change due to the local wavefront curvature ∇²φ(r).
Total diffraction field. Given the results of Eq. (S.7) and Eq. (S.11), we can calculate f_z(r) in Eq. (S.4), where we preserve only the 0th order of the terms in ∇²φ(r) and C(φ(r)), i.e. assuming zero frequency shift (ρ = 0) for these two minor terms. Recalling the 3rd approximation of Wang et al.7 mentioned previously, Eq. (S.12) can be further simplified to Eq. (S.15).
[Fig. S.1, from left to right: the original square-rooted bright field image, i.e. the amplitude A(r), normalized between 0 and 1; one of the many numerically shifted images, at the Nyquist frequency ρ (the largest shift); the difference map over ρ; the mean relative error for each shift. In most areas the difference between A(r) and A(r − λzρ) is small, so the approximation A(r) ≈ A(r − λzρ) is valid within the Nyquist frequency ρ.]
Based on Eq. (S.15), we can compute the total diffraction field f_z(r − λzρ) as Eq. (S.17), where again we preserve only the 0th order of ∇²φ(r) and C(φ(r)).

Final result
The derived Eq. (S.17) leads to the final formula for Eq. (S.3), where we neglect all the constant phase terms in Eq. (S.17). The intensity image received on the sensor is then given by Eq. (S.19), where we approximate the denominator with a first-order Taylor expansion on the grounds that λz/(2π)|∇²φ| ≪ 1, and a reference image I_0(r) = |p_z(r)|² is taken under collimated illumination with φ_0(r) = constant. Equation (S.19) can be re-arranged into Eq. (S.20), where the first approximation comes from the amplitude image approximation (Fig. S.1), and the second approximation ignores the higher-order terms, given that |λz/(2π) ∇²φ_0| ≪ 1. For a simple validation: at wavelength λ = 500 nm, a 100 mm radius paraxial spherical wavefront satisfies |λz/(2π) ∇²φ_0| = 0.015 ≪ 1 for a sensor-mask distance z = 1.5 mm, indicating a practical engineering tolerance for un-collimated illumination beams. As a result, we can safely conclude from Eq. (S.20) that, for practical setups, the reference illumination does not affect subsequent measurements as long as both the reference image and the measurement image are taken under the same illumination configuration.
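To make Eq. (S.20) concrete, the following is a minimal numerical sketch of the forward model: the reference image is sampled backward along the wavefront slopes, then modulated by the sample intensity and the curvature (caustic) factor. The function name, the finite-difference operators, and the first-order inverse warp are our own illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def forward_model(I0, A2, phi, wavelength, z, pixel):
    """Sketch of Eq. (S.20): I(r + lz/(2pi) grad phi) = |A|^2 (1 - lz/(2pi) lap phi) I0(r).
    Equivalently (to first order), I(r) samples I0 at r - lz/(2pi) grad phi."""
    k = wavelength * z / (2.0 * np.pi)
    gy, gx = np.gradient(phi, pixel)                       # wavefront slopes
    lap = np.gradient(gy, pixel, axis=0) + np.gradient(gx, pixel, axis=1)
    yy, xx = np.mgrid[0:I0.shape[0], 0:I0.shape[1]].astype(float)
    # inverse warp: sample I0 at r - k*grad(phi), displacement expressed in pixels
    coords = np.array([yy - k * gy / pixel, xx - k * gx / pixel])
    warped = map_coordinates(I0, coords, order=1, mode='nearest')
    return A2 * (1.0 - k * lap) * warped
```

For a flat wavefront the model reduces to a pure intensity modulation of the reference image, which is a quick sanity check on any implementation.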
Compared to the original formula in Wang et al.7, the more general Eq. (S.20) contains an additional multiplication by the sample intensity |A(r)|², and a higher-order intensity change due to the wavefront curvature ∇²φ. Equation (S.20) indicates that, when the intensity change from the wavefront curvature is neglected, the specimen intensity |A(r)|² and phase φ(r) are separately encoded in the intensity term and in the coordinate term. Simultaneous intensity and phase reconstruction is possible via a numerical optimization algorithm that will be discussed in Section S.2.

Ray optics interpretation
Damberg and Heidrich12 proposed a warping-based phase optimization image formation model for computational lens design, whose core idea is that the energy conservation law holds in each locally differentiable area, so that a ray-optics formula can be derived for caustic image formation from a freeform lens. Here we show how their idea can be elegantly revived as a ray-optics perspective on our work. Their (approximated) image formation model at wavelength λ, modified to take the modulation-mask diffraction into consideration and written in our notation at the calibration phase, is Eq. (S.22) (refer to Eq. (S.61) in the Appendix for a short re-derivation), where J(r) is the diffraction pattern (of the modulation mask) at the sensor plane, containing the contribution from the background illumination; that is, J(r) already accounts for mask diffraction. And φ_0(r) is the initial wavefront from the background illumination. At the measurement phase, when there are an unknown intensity |A(r)|² and phase φ(r), the measurement image becomes Eq. (S.23). With the introduction of the auxiliary variable r′ = r + λz/(2π)∇φ_0 and Eq. (S.22), Eq. (S.23) can be re-phrased, where c = λz/(2π)|∇²φ| ≪ 1 validates the approximations. Under collimated illumination, φ_0 = constant, so r′ = r and I_0(r) = J(r), and the model reduces accordingly.

Spatial wavefront resolution analysis
Given the main result Eq. (S.20), we would like to know the maximum spatial wavefront resolution it can resolve. As mentioned before, our sensor is a slope-tracking sensor. When speckles overlap with each other, the wavefront retrieval becomes ambiguous for the solver. To avoid this scenario, the warped coordinate in the measurement image I should be monotonically increasing or decreasing; mathematically, the Jacobian of the warp r + λz/(2π)∇φ must remain positive. This conclusion is consistent with the small wavefront curvature assumption λz/(2π)·|∇²φ| ≪ 1 used when deriving Eq. (S.20) from Eq. (S.19). A similar analysis in terms of Fourier harmonic functions was done in Berto et al.8, yet without specifying the small wavefront curvature assumption, and hence their bound (‖∇²OPD‖ ≤ 1/z) is not as tight as ours (‖∇²OPD‖ ≪ 1/z). Please refer to the main text for our measurement of the actual bound, and the resultant wavefront resolution reduction.
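As a quick numerical check of this bound and of the 0.015 example above, a small sketch (assuming the per-axis curvature convention that reproduces the text's number; the function names are our own):

```python
import numpy as np

def curvature_metric(wavelength, z, R):
    """|lambda z/(2 pi) * lap phi| for a paraxial spherical wavefront of radius R.
    Assumes lap phi = (2 pi / wavelength) / R per axis, matching the 0.015 example;
    the expression simplifies to z / R."""
    lap_phi = (2.0 * np.pi / wavelength) / R
    return wavelength * z / (2.0 * np.pi) * lap_phi

def curvature_limit(wavelength, z):
    """Hard monotonicity limit |lap phi| < 2 pi/(lambda z); the derivation
    additionally requires the curvature to be well below this limit."""
    return 2.0 * np.pi / (wavelength * z)
```

For the quoted configuration the spherical-wavefront curvature sits far below the monotonicity limit, consistent with the claimed engineering tolerance.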

Connection to the Transport-of-Intensity Equation (TIE)
The well-known Transport-of-Intensity Equation (TIE), a.k.a. the irradiance transport equation, has been a standard approach for deterministic phase retrieval since its discovery in the early 1980s.10,13-21 Here we show that the famous TIE can be derived from the principle revealed here, i.e. from Eq. (S.20). Performing a Taylor expansion of I(r + λz/(2π)∇φ) around r, preserving only the first order, and re-arranging the terms yields Eq. (S.27). In traditional TIE setups there is no mask, and hence I_0(r) = 1. To see this more clearly, let the image captured at the original mask plane be I_1(r) = |A(r)|² and the second image captured at the original sensor plane be I_2(r) = I(r); then Eq. (S.27) can be reformulated as Eq. (S.28). Letting I_1(r) ≈ I_2(r) ≈ Ī(r), we arrive at the standard form of TIE as z → 0, justifying the finite-difference approximation. The derivation from Eq. (S.20) to TIE (i.e. Eq. (S.28)) shows that TIE is a special case of our model; however, our model is more powerful than TIE in a number of ways:
• A concise theoretical model for two separated planes, which can be placed far apart (e.g. z = 1 mm), whereas TIE is only valid at one particular transversal plane and suffers from the finite-difference approximation of the z-axis derivative, so that the distance control has to be careful and precise, for example at the µm scale. This allows our sensor to outperform in terms of sensitivity and ease of setup.
• Our model contains an image warping operation, which preserves nonlinearity and can be linearized (if desired) to match the linear formulations in TIE.
• Our model extends the degrees of freedom of classical TIE systems to allow a custom-made modulation mask (reflected as a customizable reference image I_0(r), which in TIE is usually uniform), thus making single-shot measurements possible.
• Based on our model, a noise-robust yet efficient algorithm (one that initially ignores the wavefront curvature) can be derived with modern optimization frameworks to estimate the unknowns. Please see Section S.2 for more numerical and algorithmic details.
• Our model allows for broadband illumination because it is formulated in terms of optical path differences, while TIE in principle requires coherent light.

Connection to previous works and their models
In the absence of light-ray attenuation, previous works in speckle-pattern tracking can be grouped into different special cases of Eq. (S.20), as summarized in Table S.1:

Table S.1. Image formation models in speckle-pattern tracking:
• Flow-tracking: I(r + λz/(2π)∇φ) = I_0(r)
• Amplitude-contained flow-tracking: I(r + λz/(2π)∇φ) = |A(r)|² I_0(r) [4,27], or equivalently I(r) = |A(r)|² I_0(r − λz/(2π)∇φ)
• Eq. (S.20): I(r + λz/(2π)∇φ) = |A(r)|² (1 − λz/(2π)∇²φ) I_0(r) [12]

We now describe each model, followed by a short discussion.

Flow-tracking model
If we drop both the amplitude term and the wavefront curvature term, from Eq. (S.20) one has I(r + λz/(2π)∇φ) = I_0(r), which we recognize as the famous optical flow formulation in computer vision,28,29 or the previous tracking-based speckle-pattern phase reconstruction techniques.1,2,7,8,22,23 We now show that lateral shearing interferometry wavefront sensing24-26 is one special variant of this linear displacement model, as follows. For lateral shearing interferometers the mask is usually a specially designed grating; for simplicity we only discuss the orthogonal sinusoidal grating of Bon et al.25 (other gratings can be handled in a similar way). With a spatial frequency ω/2, the mask transmission function is p(r) = cos(ωx/2) cos(ωy/2). The intensity image under collimated illumination then decomposes into a background term plus the wavefront-slope-encoded terms (Eq. (S.31)).

By interferogram analysis,30,31 the wavefront slopes ∇φ are encoded in I_slopes(r); note the assumption that the above terms are separable (non-overlapping) in the Fourier domain. Considering only I_slopes(r) in the Fourier domain, its spectrum involves D_x(ρ) and D_y(ρ), the Fourier transforms of d_x(r) and d_y(r), respectively, which are the linearly scaled wavefront slopes. This permits one to selectively filter only the spectral components D_x(ρ_x − ω/(2π), ρ_y) and D_y(ρ_x, ρ_y − ω/(2π)) on the carrier frequency ω, shift them down to the origin to remove the carrier, and obtain D_x(ρ) and D_y(ρ), the spectra of d_x(r) and d_y(r). A Poisson integration is then imposed on the obtained wavefront slopes to recover the desired phase φ(r). As a result, with a specifically designed mask, lateral shearing interferometers are able to resolve phase information directly in the Fourier domain, and thus a reference image need not be taken prior to phase measurements.
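The carrier-frequency demodulation described above can be sketched in 1D: shift the carrier band to baseband and low-pass filter to recover the slope-encoded envelope. The signal construction, frequencies, and function name below are illustrative assumptions, not the actual grating parameters.

```python
import numpy as np

def demodulate(signal, carrier_freq, x, cutoff):
    """Shift the band at carrier_freq to baseband, then low-pass filter (1D sketch)."""
    base = signal * np.exp(-2j * np.pi * carrier_freq * x)   # move carrier to DC
    S = np.fft.fft(base)
    f = np.fft.fftfreq(x.size, d=x[1] - x[0])
    S[np.abs(f) > cutoff] = 0.0                              # keep only the baseband
    return 2.0 * np.real(np.fft.ifft(S))                     # factor 2: one sideband kept
```

With exact FFT-bin frequencies the slowly varying envelope is recovered essentially to machine precision.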

Amplitude-contained flow-tracking model
If we drop only the wavefront curvature term, from Eq. (S.20) we have I(r + λz/(2π)∇φ) = |A(r)|² I_0(r), which we recognize as the model typically employed in X-ray phase imaging applications.4 Another commonly seen formulation expands the measurement into the wavefront-slope-encoded terms plus higher-order terms (Eq. (S.36)), where we neglect higher-order tangled terms, including the multiplicative cross terms between x and y. In the Fourier domain, where Â(ρ) denotes the Fourier transform of |A(r)|² and ∗ denotes convolution, the central low-frequency spectrum encodes the sample amplitude spectrum Â(ρ), which can be extracted first to disentangle the wavefront slopes spectrum F{I_slopes(r)} from F{I(r)}. Note the numerical reconstruction is performed in the Fourier domain, and can potentially be improved by the dual approach in the spatial domain.32

Comparison with Zanette et al.'s model
Finally, note the model in Zanette et al.5, which modifies Eq. (S.35) by incorporating a dark field D(r) that represents contributions from higher-order phase, e.g. from ∇²φ (Eq. (S.38)), where Ī_0 is the mean value of I_0(r) and ∆I_0(r) = I_0(r) − Ī_0. From Eq. (S.20) we know that part of their D(r) originates from the caustic term 1 − λz∇²φ/(2π), which contains the wavefront curvature ∇²φ and represents sample diffraction under a distorted wavefront, i.e. part of the refraction. Therefore, the dark field D(r) in Eq. (S.38) can be further refined to contain only scattering terms. Eq. (S.20) thus provides new insights for incorporating a dark field term into the image formation model.

Decoupling amplitude, phase, and caustic effects
Each contributing term in the main result Eq. (S.20) can be interpreted as follows; notice that the formula enables the separation between the sample amplitude and the caustic effects caused by defocusing. To recall, our task is: knowing a nominal wavelength λ and a pre-calibrated z, given the image pair I_0(r) and I(r), simultaneously solve for the sample amplitude A(r) due to absorption and the phase φ(r) due to refraction by the sample shape. To do this, we devise an alternating optimization scheme, i.e. updating amplitude and phase in alternation. For the amplitude update, since the contribution from defocusing caused by the local wavefront curvature ∇²φ is usually small, we absorb the caustic term into the amplitude unknown. Consequently, the amplitude estimation reduces to a variant of the classical Rudin-Osher-Fatemi (ROF) denoising problem,33 which can be efficiently solved by modern splitting algorithms. Given the previous amplitude estimate, the phase update simplifies to an optical flow problem, solved by a variant of the solver in Wang et al.7 with new priors for microscopy images. Upon convergence, the caustic term is deduced from the estimated intensity term. To summarize, the final problem is to solve for a quasi amplitude term Ã(r) and phase φ(r) from the reference I_0(r) and measurement I(r) based on Eq. (S.40); after obtaining Ã(r) and φ(r), the pure amplitude term is calculated by removing the caustic factor 1 − λz/(2π)∇²φ from Ã(r). Regularization and proper optimization techniques improve solution plausibility, and will be discussed in Section S.2, where optimization frameworks and efficient numerical algorithms are proposed to solve the above equations numerically.
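The amplitude decoupling step can be sketched numerically: divide the quasi amplitude by the caustic factor computed from the estimated phase. The finite-difference Laplacian and function name below are our own illustrative choices.

```python
import numpy as np

def pure_amplitude(a_tilde, phi, wavelength, z, pixel):
    """Remove the caustic factor (1 - lambda z/(2 pi) lap phi) from the quasi amplitude."""
    k = wavelength * z / (2.0 * np.pi)
    gy, gx = np.gradient(phi, pixel)
    lap = np.gradient(gy, pixel, axis=0) + np.gradient(gx, pixel, axis=1)
    return a_tilde / (1.0 - k * lap)
```

Because the curvature term is small in practical setups, this division is a mild, well-conditioned correction.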

S.2 Optimization
We now discuss how to discretize and solve Eq. (S.40) numerically, which involves solving an optimization problem in terms of linear algebra. In the following, vectors and matrices are denoted as bold lowercase and capital letters, respectively. Absorbing λz/(2π) into the phase and discretizing Eq. (S.40) yields the joint optimization problem Eq. (S.42), where ⊙ denotes the Hadamard product, K = [∇; ∇²] is a concatenated matrix of the gradient and Laplacian operators, and a and φ are the intensity (including the caustic effect) and the phase that we would like to recover. Equation (S.42) is a non-convex and highly ill-posed problem. To see this, let the total number of pixels be N: we have 2N unknowns (intensity and phase) to estimate, but only N equations from Eq. (S.20). The two regularizers, the phase prior Γ_phase(φ) and the intensity prior Γ_intensity(a), are therefore introduced to reduce the ill-posedness, but the estimation remains hard.
To tackle this problem, we devise an alternating algorithm that alternately solves for intensity and phase, i.e. solves for one while the other is fixed. Algorithm S.1 shows this procedure. In practice we found that just a few alternating iterations (< 5) are sufficient for satisfactory convergence. We now discuss each update step in more detail.
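The alternating procedure of Algorithm S.1 can be skeletonized as below. The two inner solvers are assumed supplied (they correspond to the intensity and phase updates detailed next); the function signatures are our own sketch, not the paper's code.

```python
import numpy as np

def alternating_solver(I0, I, update_a, update_phi, n_alt=3):
    """Sketch of Algorithm S.1: alternate intensity and phase updates."""
    a = np.ones_like(I)        # intensity (with absorbed caustic term)
    phi = np.zeros_like(I)     # phase
    for _ in range(n_alt):
        a = update_a(I0, I, phi)       # intensity update: ROF-type, phi fixed
        phi = update_phi(I0, I, a)     # phase update: optical-flow-type, a fixed
    return a, phi
```

Any pair of inner solvers with these signatures can be plugged in, which is convenient for testing each update in isolation.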

Intensity update
We recognize the intensity update step as a variant of the classical ROF denoising problem:33 minimize over a the data term ‖a ⊙ i_0(r) − i(r + ∇φ^K)‖² plus the intensity prior (Eq. (S.43)). Equation (S.43) is convex but non-differentiable. By introducing a slack variable b = ∇a that represents the image gradient, via a de-coupling diagonalization (though not strictly equivalent, in practice we found this formulation easier to cast as a solver and to converge), and denoting i_warp^K = i(r + ∇φ^K)/i_0(r), the original objective in Eq. (S.43) can be split and approximated subject to b = ∇a, and solved with the iterations:
a-update: solve the quadratic subproblem for a^{k+1};
b-update: b^{k+1} ← prox_{g/µ}(∇a^{k+1} + η^k);
η-update: η^{k+1} ← η^k + ∇a^{k+1} − b^{k+1}.
a-update. This step amounts to a straightforward Poisson solve, which we perform in the spectral domain assuming symmetric boundary conditions, exploiting the Discrete Cosine Transform (DCT), denoted F_DCT. Using Fast Fourier Transform (FFT) algorithms for all DCT operations, the a-update can be carried out efficiently in parallel.
b-update. This step is an element-wise estimation whose solution is readily obtained by the so-called shrinkage operator, which is embarrassingly parallel, with sign(·) denoting the element-wise signum function.
After obtaining a^K, a median filter (3 × 3 window) is applied to further suppress speckle noise. In practice we found this median filtering significant for better performance.
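As a self-contained illustration of the a/b/η iteration, here is a toy 1D ROF solver; a dense matrix solve stands in for the DCT-based Poisson step, and all names and parameters are our own assumptions rather than the full 2D solver.

```python
import numpy as np

def shrink(v, t):
    """Soft-thresholding: the proximal operator of t*||.||_1, elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def rof_1d(f, lam=1.0, mu=1.0, iters=200):
    """Toy 1D ROF via ADMM with slack b = grad a, mirroring the a/b/eta updates."""
    n = f.size
    D = np.diff(np.eye(n), axis=0)              # forward-difference operator
    A = np.eye(n) + mu * D.T @ D                # normal matrix of the a-subproblem
    a = f.copy()
    b = np.zeros(n - 1)
    eta = np.zeros(n - 1)
    for _ in range(iters):
        a = np.linalg.solve(A, f + mu * D.T @ (b - eta))   # a-update (Poisson-type)
        b = shrink(D @ a + eta, lam / mu)                  # b-update (shrinkage)
        eta = eta + D @ a - b                              # eta-update (dual ascent)
    return a
```

A constant signal is a fixed point of the iteration, which is a useful unit test for any ROF implementation.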

Phase update
For the phase update, the digital image correlation method36 is typically used, which in computer vision is known as a variant of the famous Lucas-Kanade method29 for optical flow computation. Here we modify and improve our previously proposed wavefront solver to fit the assumptions of typical microscopy samples; the wavefront solver presented here is thus a variant of Wang et al.7 Recall the optimization problem for updating the phase, where α > 0 and β > 0 are weighting parameters. Note the data-fitting term (ℓ2-norm) is non-convex, and one of the prior terms (ℓ1-norm) is convex but non-smooth and non-differentiable. Since the phase shifts are usually small, we linearize i(r + ∇φ) around r, which yields Eq. (S.48). To handle the boundary conditions (which may introduce reconstruction artifacts in conventional phase-from-slope techniques), we add a selection matrix M to include the unknown boundary values as additional variables to be optimized.37 In linear algebra notation, denoting g_t = i(r) − a^{K+1} ⊙ i_0(r), Eq. (S.48) reads as Eq. (S.49). Equation (S.49) is convex but non-differentiable. By introducing a slack variable w = ∇φ that represents the phase gradient, the original objective in Eq. (S.49) can be split, subject to w = ∇φ.
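The linearization above is the same mechanism as a single Lucas-Kanade step. For intuition, here is a minimal global-shift version (our own simplification; the actual solver estimates a full phase map with priors and boundary handling):

```python
import numpy as np

def lk_shift(I, T):
    """One Lucas-Kanade step: estimate a single global shift d such that
    I(r) ~ T(r + d), via the linearization T(r + d) ~ T + grad(T) . d."""
    gy, gx = np.gradient(T)                       # image gradients of the template
    e = (I - T).ravel()                           # temporal difference
    J = np.stack([gy.ravel(), gx.ravel()], axis=1)
    d, *_ = np.linalg.lstsq(J, e, rcond=None)     # least-squares normal equations
    return d                                      # (dy, dx) in pixels
```

Sub-pixel shifts of a smooth pattern are recovered accurately in one step, which is why small phase gradients can be estimated from a single linearization.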

S.3 Additional Results
In this section additional results are presented.

Solver performance
In Fig. S.2 we show solver convergence in terms of alternating iterations. The two inner loops (amplitude update and phase update) each run for 10 iterations. Our solver converges within a few iterations; in practice we chose 3 alternating iterations.

Comparison with classical solver
Here we present a numerical comparison between the classical algorithm and our proposed one, using synthetic data.

Classical solver implementation
Recall that classical speckle-pattern tracking algorithms1,2,6,8,22,23 are able to simultaneously reconstruct (in our notation) a bright field image |A(r)|², a phase image φ(r), and a dark-field image D(r) from a single raw speckle-pattern measurement I(r) and a pre-calibrated reference image I_0(r). For a fair comparison, we implement the baseline classical speckle-pattern tracking algorithm as follows.
Bright field image. The bright field image is computed as6 |A(r)|² ≈ µ_w[I(r)] / µ_w[I_0(r)], where µ_w(·) denotes the mean over a window w. The window size is chosen as 3 × 3 to preserve maximum detail while suppressing noise. In practice we found that median filtering |A(r)|² decreases the phase image accuracy, so no post-processing is applied after obtaining the bright field image.
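The local-mean ratio can be sketched with a uniform filter (window size 3, matching the text); the function name is our own.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def brightfield(I, I0, w=3):
    """Classical bright-field estimate: ratio of local means over a w x w window."""
    return uniform_filter(I, w) / uniform_filter(I0, w)
```

Since the filter is linear, a uniform transmission factor is recovered exactly regardless of the speckle pattern.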
Phase image. To reconstruct the phase shifts, the classical algorithm tracks the image pair I(r) and |A(r)|² I_0(r) using the digital image correlation (DIC) method36 to get the wavefront slopes w(r), and then integrates the slopes to get the final wavefront φ(r). Here we employ the imregdemons MATLAB function, as suggested by Berto et al.8, to compute the slopes w(r), and then apply a Poisson solver (with pre-zero-padding and a symmetric boundary condition assumption) to obtain the final phase image. Note this approach could degrade the phase reconstruction: instead of solving for φ ∈ R^N directly, it solves for intermediate wavefront slopes ∇φ ∈ R^{2N} and then integrates back to R^N, during which accuracy loss can occur.

Comparison using synthetic data
Multiple 512 × 512 diffraction patterns are simulated by the angular spectrum method with filtering.11,40 The sensor and mask pixel sizes are set to 6.45 µm, the distance between mask and sensor is 1.5 mm, and the wavelength is set to 550 nm.
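The angular spectrum simulation can be sketched as follows, with the stated pixel size and wavelength; the evanescent-wave cutoff implements the "with filtering" step, and the function name is our own.

```python
import numpy as np

def angular_spectrum(u0, wavelength, z, pixel):
    """Angular spectrum propagation of a scalar field u0 over distance z."""
    H, W = u0.shape
    fy = np.fft.fftfreq(H, d=pixel)
    fx = np.fft.fftfreq(W, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    Hf = np.exp(1j * kz * z) * (arg > 0)        # filter out evanescent waves
    return np.fft.ifft2(np.fft.fft2(u0) * Hf)
```

For propagating components the transfer function is a pure phase, so the method conserves energy, which makes a plane-wave input a convenient sanity check.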
Reconstruction comparison between classical algorithm and ours are shown in Fig. S.3 (amplitude) and Fig. S.4 (phase), with the absolute error shown and mean value calculated as well. For amplitude reconstruction, both algorithms have the difficulty reconstructing the sharp edges, but for smooth amplitude changes the reconstructions are successful, and the differences between the two algorithms are subtle. However, for phase reconstruction, specifically for the spherical wavefront and the lenslets wavefront (large distorted wavefronts at the present of local curvature), reconstruction error of our algorithm is smaller than the classical one, which underestimates the wavefront distortion. This is probably due to the fact that, from the N given equations (i.e. the measurement image), classical algorithm suffers an accuracy loss when first solving the intermediate wavefront slopes (of size 2N) and then integrate to get the final phase (of size N), while our algorithm solves for the phase (of size N) directly from the given N equations. But the difference between the two is not significant.

Model evaluation
It is also possible to compute a pseudo dark field image from the image pair I_0(r) and I(r), as follows.
Dark field image. The dark field image is related to the decrease of speckle-pattern contrast. It can be computed as6 D(r) ≈ σ_w[I(r + λz/(2π)∇φ)] / σ_w[|A(r)|² I_0(r)], where σ_w(·) denotes the standard deviation over a window w, chosen as 3 × 3. Fig. S.5 shows an example of the reconstruction of the amplitude A(r), phase φ(r), and dark field scattering image D(r) from the image pair I_0(r) and I(r), as well as the caustic term computed from Eq. (S.20). As the results show, the scattering signal is very weak, and the local wavefront curvature is small enough to be negligible in the tracking model. By contrast, for visible light microscopy applications, the wavefront Laplacian term is significant in TIE imaging because the usual distance between defocused intensity measurements (z ≈ 20 µm) is much smaller than that in speckle-pattern tracking techniques (z ≈ 1.5 mm).
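The contrast-ratio computation can be sketched with windowed standard deviations; the small epsilon guarding against zero local variance and the function names are our own additions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_std(img, w=3):
    """Standard deviation over a w x w window, via local first and second moments."""
    m = uniform_filter(img, w)
    m2 = uniform_filter(img * img, w)
    return np.sqrt(np.maximum(m2 - m * m, 0.0))

def darkfield(I_meas, A2I0, w=3):
    """Pseudo dark field: ratio of local speckle contrasts (std over w x w windows)."""
    return local_std(I_meas, w) / (local_std(A2I0, w) + 1e-12)
```

When the measurement and the amplitude-modulated reference carry the same speckle contrast, the ratio is close to one everywhere, indicating negligible scattering.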

Additional experimental reconstructions
In Fig. S.6 we show further reconstruction results on laboratory-acquired raw data.

Appendix
Fourier transform of a complex Gaussian function
Given a value a ∈ R, the Fourier transform of f(r) = exp(j a r²/2) in the dual domain ρ is given by Eq. (S.56).
[Figure caption: A full reconstruction example from experimental data, where the amplitude, phase, dark field, and caustic term are recovered from the raw speckle image data. The dark field image is not obvious, and the caustic effect is small enough to be neglected, as is done in most other speckle-pattern tracking techniques. Scale bars are 10 µm.]

Computational caustic image formulation
To be consistent with the paper, here we replicate the derivation from Damberg and Heidrich12 for reference in Eq. (S.22). Let r and r′ be the coordinates at the mask plane and the sensor plane, respectively. For a single ray, geometry at wavelength λ gives Eq. (S.57), where we employ the small-angle approximation sin θ ≈ θ since θ = (λ/2π)|∇φ(r)| ≪ 1. Since the irradiance energy is conserved within each local differential area, there is a simple relationship between the intensity I(r′) at the sensor plane and the intensity J(r) at the mask plane, where the approximation is valid because (λz/2π)|∇²φ(r)| ≪ 1. By Eq. (S.57), we finally arrive at Eq. (S.61).