Self-supervised learning of hologram reconstruction using physics consistency

Existing applications of deep learning in computational imaging and microscopy mostly depend on supervised learning, requiring large-scale, diverse and labelled training data. The acquisition and preparation of such training image datasets is often laborious and costly, leading to limited generalization to new sample types. Here we report a self-supervised learning model, termed GedankenNet, that eliminates the need for labelled or experimental training data, and demonstrate its effectiveness and superior generalization on hologram reconstruction tasks. Without prior knowledge about the sample types, the self-supervised learning model was trained using a physics-consistency loss and artificial random images synthetically generated without any experiments or resemblance to real-world samples. After its self-supervised training, GedankenNet successfully generalized to experimental holograms of unseen biological samples, reconstructing the phase and amplitude images of different types of object from experimentally acquired holograms. Without access to experimental data, knowledge of real samples or their spatial features, GedankenNet achieved complex-valued image reconstructions consistent with the wave equation in free space. The GedankenNet framework also shows resilience to random, unknown perturbations in the physical forward model, including changes in the hologram distances, pixel size and illumination wavelength. This self-supervised learning of image reconstruction creates new opportunities for solving inverse problems in holography, microscopy and computational imaging.

Deep learning models for microscopic imaging and holography aim to decrease reliance on labelled experimental training data, which can introduce biases, be time-consuming and costly to prepare, and may lack real-world diversity. Huang et al. develop a physics-driven self-supervised model that eliminates the need for labelled or experimental training data, demonstrating superior generalization on the reconstruction of experimental holograms of various samples.

In these existing approaches, supervised learning models were utilized, demanding large-scale, high-quality and diverse training datasets (from various sources and types of objects) with annotations and/or ground truth experimental images. For microscopic imaging and holography, in general, such labeled training data can be acquired through classical algorithms that are treated as the ground truth image reconstruction method 38,39,43,47,48,51, or through registered image pairs (input vs. ground truth) acquired by different imaging modalities 8,17,18,25. These supervised learning methods require significant labor, time and cost to acquire, align and pre-process the training images, and can potentially introduce inference bias, resulting in limited generalization to new types of objects never seen during training.
Generally speaking, existing supervised learning models demonstrated on microscopic imaging and holography tasks are highly dependent on training image datasets acquired through experiments, which show variations due to the optical hardware, types of specimens and imaging (sample preparation) protocols. Though there have been efforts utilizing unsupervised learning 60-66 and self-supervised learning 16,67-69 to alleviate the reliance on large-scale experimental training data, the need for experimental measurements or sample labels with the same or similar features as the testing samples of interest is not entirely eliminated. Using labeled simulated data for network training is another possible solution; however, generating simulated data distributions that accurately represent the experimental sample distributions can be complicated and requires prior knowledge of the sample features and/or some initial measurements with the imaging set-up of interest 6,10,70-73. For example, supervised learning-based deep neural networks for hologram reconstruction tasks demonstrated decent internal generalization to new samples of the same type as in the training dataset, while their external generalization to different sample types or imaging hardware was limited 38,46,51.
A common practice to enhance the imaging performance of a supervised model is to apply transfer learning 51,68,74-76, which fine-tunes the trained model on a subset of data from the new target distribution.
However, the features learned through supervised transfer learning using a limited training data distribution (e.g., specific types of samples) do not necessarily advance external generalization to other types of samples, considering that the sample features and imaging set-up may differ significantly in the blind testing phase. Furthermore, transfer learning requires additional labor and time to collect fresh data from the new testing data distribution and fine-tune the pre-trained model, which may bring practical challenges in different applications.
In addition, deep learning-based solutions for inverse problems in computational imaging generally lack the incorporation of explicit physical models in the training phase; this, in turn, limits the compatibility of the network's inference with the physical laws that govern light-matter interactions and wave propagation. Recent studies demonstrated physics-informed neural networks (PINNs) 69,77-82, where a physical loss was formulated to train the network in an unsupervised manner to solve partial differential equations. However, PINN-based methods that can match (or come close to) the performance of supervised learning methods have not yet been reported for solving inverse problems in computational imaging with successful generalization to new types of samples.
Here we demonstrate the first self-supervised learning (SSL)-based deep neural network for hologram reconstruction, which is trained without any experimental data or prior knowledge of the types or spatial features of the samples. We term it GedankenNet, as the self-supervised training of our network model is based on randomly generated artificial images with no connection or resemblance to real samples at the micro- or macro-scale; therefore, the spatial frequencies and features of these images do not represent any real-world samples and are not related to any experimental set-up. As illustrated in Fig. 1(a), the self-supervised learning scheme of GedankenNet adopts a physics-consistency loss between the input synthetic holograms of random, artificial objects and the numerically predicted holograms calculated from the GedankenNet output complex fields, without any reference to or use of the ground truth object fields during the learning process. After its training, the self-supervised GedankenNet directly generalizes to experimental holograms of various types of samples, even though it never saw any experimental data or used any information regarding the real samples. When blindly tested on experimental holograms of human tissue sections (lung, prostate and salivary gland tissue) and Pap smears, GedankenNet achieved better image reconstruction accuracy compared to supervised learning models trained on the same datasets. We further demonstrated that GedankenNet can be widely applied to other training datasets, including simulated and experimental datasets, and achieves superior generalization to unseen data distributions over supervised learning-based models. Since GedankenNet's self-supervised learning is based on a physics-consistency loss, its inference and the resulting output complex fields are compatible with Maxwell's equations and accurately reflect wave propagation in free space.
By testing GedankenNet with experimental input holograms captured at shifted (unknown) axial positions, we showed that GedankenNet does not hallucinate: the object field at the sample plane can be accurately retrieved through wave propagation of the GedankenNet output field, without the need for retraining or fine-tuning its parameters. These results indicate that, in addition to generalizing to experimental holograms of unseen sample types without seeing any experimental data or real object features, GedankenNet also implicitly acquired the physical information of wave propagation in free space and gained robustness towards defocused holograms or changes in the pixel size through the same self-supervised learning process. Furthermore, for phase-only objects (such as thin label-free samples), the GedankenNet framework also exhibits resilience to random, unknown perturbations in the imaging system, including arbitrary shifts of the sample-to-sensor distances and unknown changes in the illumination wavelength, all of which make its generalization even broader without the need for any experimental data or ground truth labels.
The success of GedankenNet eliminates three major challenges in existing deep learning-based holographic imaging approaches: (1) the need for large-scale, diverse and labeled training data; (2) the limited generalization to unseen sample types or shifted input data distributions; and (3) the lack of an interpretable connection and compatibility between the physical laws/models and the trained deep neural network. This work introduces a promising and powerful alternative to a wide variety of supervised learning-based methods that are currently applied in various microscopy, holography and computational imaging tasks.

Results
The hologram reconstruction task can, in general, be formulated as an inverse problem 42:

$$\hat{o} = \arg\min_{o}\; \mathcal{L}(y, H(o)) + \mathcal{R}(o),$$

where y ∈ ℝ^{Mn²} represents the M vectorized measured holograms, each of dimension n × n, and o ∈ ℂ^{n²} is the vectorized object complex field. H(·) is the forward imaging model, L(·) is the loss function and R(·) is the regularization term. Under spatially and temporally coherent illumination of a thin sample, H(·) can be simplified as

$$H(o) = S(Ao) + \varepsilon,$$

where A ∈ ℂ^{n²×n²} is the free-space transformation matrix 44,83, ε ∈ ℝ^{n²} represents random detection noise and S(·) refers to the (opto-electronic) sensor-array sampling function, which records the intensity of the optical field.
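The forward model described above (free-space propagation followed by intensity-only detection and additive noise) can be sketched numerically with the angular spectrum method referenced later in the text. This is an illustrative implementation under simplifying assumptions (square n × n grid, monochromatic plane-wave illumination); the function names and parameter values are our own, not the paper's.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_size, z):
    """Propagate a complex field by a distance z in free space (angular spectrum)."""
    n = field.shape[-1]
    fx = np.fft.fftfreq(n, d=pixel_size)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    # Squared axial spatial frequency; evanescent components (arg < 0) are suppressed.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

def forward_hologram(obj_field, wavelength, pixel_size, z, noise_std=0.0):
    """y = S(Ao) + eps: intensity of the propagated field plus detection noise."""
    rng = np.random.default_rng(0)
    propagated = angular_spectrum_propagate(obj_field, wavelength, pixel_size, z)
    intensity = np.abs(propagated) ** 2           # sensor records intensity only
    return intensity + noise_std * rng.standard_normal(intensity.shape)
```

Because the sensor discards the phase of the propagated field, inverting this model is the classic phase retrieval problem that the rest of the section addresses.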
Different schemes for solving holographic imaging inverse problems are summarized in Fig. 1. Existing methods for generalizable hologram reconstruction can be mainly classified into two categories, as shown in Fig. 1(a): (1) iterative phase retrieval algorithms based on the physical forward model and iterative error reduction; and (2) supervised deep learning-based inference methods that learn from training image pairs of input holograms y and ground truth object fields o. Similar to the iterative phase recovery algorithms listed under (1), deep neural networks were also used to provide iterative approximations to the object field from a batch of hologram(s); however, these network models were iteratively optimized for each hologram batch separately, and cannot generalize to reconstruct holograms of other objects once they are optimized 69,78,80 (see Supplementary Note 2 and Extended Data Fig. 3).
Different from existing learning-based approaches, instead of directly comparing the output complex fields (ô) with the ground truth object complex fields (o), GedankenNet infers the predicted holograms ŷ from its output complex fields ô using a deterministic physical forward model, and directly compares ŷ with y. Without the need to know the ground truth object fields o, this forward-model-in-the-loop cycle establishes a physics-consistency loss for gradient back-propagation and network parameter updates, defined as

$$\mathcal{L}_{\text{physics-consistency}} = \alpha\, \mathcal{L}_{\text{FDMAE}}(y, \hat{y}) + \beta\, \mathcal{L}_{\text{MSE}}(y, \hat{y}),$$

where L_FDMAE and L_MSE are the Fourier domain mean absolute error (FDMAE) and the mean square error (MSE), respectively, calculated between the input holograms y and the predicted holograms ŷ; α and β refer to the corresponding weights of each term (see the Methods section for the training and implementation details). The network architecture of GedankenNet is also detailed in the Methods section and Extended Data Fig. 1.
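The physics-consistency loss above can be sketched as follows; the relative weights and FFT normalization are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def physics_consistency_loss(y_meas, y_pred, alpha=1.0, beta=1.0):
    """alpha * FDMAE + beta * MSE between measured and predicted holograms.

    FDMAE: mean absolute error between the 2D Fourier transforms of the
    holograms; MSE: the usual spatial-domain mean squared error.
    (alpha and beta are placeholder weights, an assumption of this sketch.)
    """
    Y_meas = np.fft.fft2(y_meas, norm="ortho")
    Y_pred = np.fft.fft2(y_pred, norm="ortho")
    fdmae = np.mean(np.abs(Y_meas - Y_pred))      # Fourier-domain MAE
    mse = np.mean((y_meas - y_pred) ** 2)         # spatial-domain MSE
    return alpha * fdmae + beta * mse
```

In training, `y_pred` would be obtained by pushing the network's output field through the deterministic forward model, so the gradient flows back through both terms without ever touching a ground truth object field.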
As emphasized in Fig. 1, GedankenNet eliminates the need for experimental, labeled training data and thus presents unique advantages over existing methods. The training dataset of GedankenNet only consists of artificial holograms generated from random images (with no connection or resemblance to real-world samples), which serve as the amplitude and phase channels of the object field (see the Methods section and Fig. 1(b)). After its self-supervised training using artificial images, without any experimental data or real-world specimens, GedankenNet can be directly used to reconstruct experimental holograms of various microscopic specimens, including, e.g., densely connected tissue samples and Pap smears. This is in stark contrast to existing supervised learning methods, which exhibit limited external generalization to unseen data distributions and new sample types. Furthermore, compared with classical iterative phase retrieval algorithms, GedankenNet (after its one-time training is complete) provides significantly faster reconstructions in a single forward inference, without the need for numerical iterations, transfer learning or fine-tuning of its parameters on new testing samples.
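The dataset construction described here, where random images serve as the amplitude and phase channels of a synthetic object propagated to multiple hologram planes, can be sketched as below. A plain random-number generator stands in for the `randimage`-based images of the Methods, and the grid size, wavelength, pixel pitch and amplitude offset are illustrative placeholders.

```python
import numpy as np

def make_synthetic_example(n=128, z_list=(300e-6, 375e-6),
                           wavelength=530e-9, pixel=1e-6, eps=0.1, seed=0):
    """Build one (holograms, object field) training pair from random images.

    Two independent random images act as the amplitude and phase channels;
    a small constant `eps` keeps the amplitude strictly positive so the
    phase is everywhere well defined. All parameter values are assumptions.
    """
    rng = np.random.default_rng(seed)
    amp = rng.random((n, n)) + eps            # amplitude channel, > 0
    phs = rng.random((n, n)) * 2 * np.pi      # phase channel in [0, 2*pi)
    obj = amp * np.exp(1j * phs)

    # Angular spectrum transfer function shared across planes.
    fx = np.fft.fftfreq(n, d=pixel)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    OBJ = np.fft.fft2(obj)

    holograms = []
    for z in z_list:                          # one hologram per axial plane
        H = np.exp(1j * kz * z) * (arg > 0)
        field = np.fft.ifft2(OBJ * H)
        holograms.append(np.abs(field) ** 2)  # sensor records intensity only
    return np.stack(holograms), obj
```

For a phase-only training example, as used later for GedankenNet-Phase, one would simply set `amp` to 1 everywhere and keep only the random phase channel.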
To demonstrate these unique features of GedankenNet, we trained a series of self-supervised network models that take multiple input holograms (M ranging from 2 to 7), following the training process introduced in Fig. 1. Each GedankenNet model for a different M value was trained using artificial holograms generated from random synthetic images at M different planes with designated sample-to-sensor distances z_i, i = 1, 2, ⋯, M. In the blind testing phase illustrated in Fig. 2(a), M experimental holograms of human lung tissue sections were captured by a lens-free in-line holographic microscope (see Extended Data Fig. 1(b) and the Methods section for experimental details). We tested all the self-supervised GedankenNet models on 94 non-overlapping fields-of-view (FOVs) of tissue sections and quantified the image reconstruction quality in terms of the amplitude and phase structural similarity index measure (SSIM) values with respect to the ground truth object fields (see Fig. 2(b)). The ground truth fields were retrieved by the multi-height phase retrieval (MHPR) algorithm 84-86 using 8 raw holograms of each FOV. Our results indicate that all the GedankenNet models were able to reconstruct the sample fields with high fidelity, even though they were trained using random, artificial images without any experimental data (Fig. 2(c)).
Additionally, Fig. 2 demonstrates that the reconstruction quality of the GedankenNet models increased with an increasing number of input holograms M, which points to a general trade-off between image reconstruction quality and system throughput; depending on the level of reconstruction quality desired and the imaging application needs, M can be selected/optimized accordingly. In addition to the number of input holograms, we investigated the relationship between the sample-to-sensor distances and the reconstruction quality of GedankenNet; see Extended Data Fig. 2 and Supplementary Note 1. Due to the reduced signal-to-noise ratio (SNR) of the experimental in-line holograms acquired at large sample-to-sensor (axial) distances, GedankenNet models trained with larger sample-to-sensor distances exhibit a relatively reduced reconstruction quality compared with the GedankenNet models trained with smaller axial distances.
We also compared the generalization performance of self-supervised GedankenNet models against other supervised learning models and iterative phase recovery algorithms using experimental holograms of various types of human tissue sections and Pap smears; see Fig. 3.
Though it only saw artificial holograms of random images in the training phase, GedankenNet (M = 2) was able to directly generalize to experimental holograms of Pap smears and human lung, salivary gland and prostate tissue sections. For comparison, we trained two supervised learning models using the same artificial image dataset: the Fourier Imager Network (FIN) 48 and a modified U-Net 87 architecture (see the Methods section). These supervised models were tested on the same experimental holograms to analyze their external generalization performance. Compared to these supervised learning methods, GedankenNet exhibited superior external generalization on all four types of samples (lung, salivary gland and prostate tissue sections and Pap smears), scoring higher enhanced correlation coefficient (ECC) values (see the Methods section). A second comparative analysis was performed against a classical iterative phase recovery method, MHPR 84-86: GedankenNet inferred the object fields with less noise and higher image fidelity compared to MHPR (M = 2) using the same input holograms (see Fig. 3(a,c)). In addition, we compared the GedankenNet image reconstruction results against deep image prior (DIP) based approaches 69,78,80,88, also confirming its superior performance (see Extended Data Fig. 3 and Supplementary Note 2).
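The ECC metric used in these comparisons is defined in the paper's Methods; a common normalized-correlation form for complex fields, which we assume here for illustration, is the magnitude of the normalized inner product between the reconstructed and ground truth fields.

```python
import numpy as np

def ecc(field_a, field_b):
    """Assumed ECC sketch: |<a, b>| / (||a|| * ||b||) for complex fields.

    Equals 1.0 for a perfect match and is invariant to a global phase
    offset, which is a trivial ambiguity in holographic reconstruction.
    """
    a = field_a.ravel()
    b = field_b.ravel()
    # np.vdot conjugates its first argument, giving the Hermitian inner product.
    return np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))
```

The global-phase invariance matters in practice: two reconstructions differing only by a constant phase factor describe the same physical object and should score identically.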
The inference time of each of these hologram reconstruction algorithms is summarized in Extended Data Table 1, which indicates that GedankenNet accelerated the image reconstruction process by ~128-fold compared to MHPR (M = 2). These holographic imaging experiments and the resulting analyses successfully demonstrate GedankenNet's unparalleled generalization to experimental holograms of unknown, new types of samples without any prior knowledge about the samples or the use of experimental training data or labels.
GedankenNet's strong external generalization stems from its self-supervised learning scheme that employs the physics-consistency loss, which is further validated by the additional comparisons we performed between self-supervised and supervised learning schemes; see Extended Data Fig. 4 and Supplementary Note 3. In this analysis, we compared GedankenNet and the supervised learning model FIN, trained with the same artificial hologram datasets generated from random synthetic images (Extended Data Fig. 4a) or natural images from the COCO dataset (Extended Data Fig. 4b). The blind testing of these models used experimental holograms of Pap smear samples and lung tissue sections.
The results of this comparison (summarized in Extended Data Fig. 4) reveal that the self-supervised learning scheme consistently achieved better reconstruction accuracy and enhanced ECC scores over the supervised learning scheme, further highlighting the superior external generalization of GedankenNet to experimental holograms of new types of samples.
In addition to GedankenNet's superior external generalization (from artificial random images to experimental holographic data), this framework can also be applied to other training datasets. To showcase this, we trained three GedankenNet models using (i) the artificial hologram dataset generated from random images, same as before; (ii) a new artificial hologram dataset generated from a natural image dataset (COCO) 89; and (iii) an experimental hologram dataset of human tissue sections (see Methods for dataset preparation); the corresponding results are reported in Fig. 4.

In addition to the analyses reported in Fig. 5, where the input holograms were defocused by an axial distance of Δz, we report in Extended Data Fig. 6 and Supplementary Note 5 the resilience of GedankenNet models to input holograms acquired with pixel pitches different from the training pixel pitch. This robustness of GedankenNet, and its external generalization performance reported in the earlier analyses, can be further improved using some object priors, such as the assumption of phase-only objects. Different from the earlier models, here the GedankenNet framework uses phase-only artificial random complex fields (with unit amplitude) during its training. Apart from this phase-only object assumption, there is no prior knowledge about the sample types to be imaged; the self-supervised model was trained using the same physics-consistency loss and artificial phase-only random objects that were synthetically generated without any experiments or resemblance to real-world samples. This new model, which we term GedankenNet-Phase, exhibits enhanced adaptability to random, unknown perturbations in the forward optical model. Following a similar experimental-data-free training protocol as used in the earlier GedankenNet models, GedankenNet-Phase was trained on M = 2 simulated holograms of artificial phase-only objects; however, these holograms were virtually located at independent random axial positions between 275 μm and 400 μm during its training. Collectively, these analyses highlight the superior external generalization of the self-supervised learning scheme for new types/classes of samples never seen before (see Extended Data Fig. 4).

In general, the residual errors that stochastically occur during the network training would be non-physical errors that are incompatible with the wave equation, e.g., noise-like errors that do not follow wave propagation. In contrast to traditional structural loss functions that penalize these types of residual errors based on the statistics of the sample type of interest (which requires experimental data and/or knowledge about the samples and their features), the physics-consistency loss function that we used focuses on physical inconsistencies; this is at the heart of the superior external generalization of the GedankenNet framework, since such physical errors are universal, regardless of the type of sample or its physical properties or features. Furthermore, this physics-consistency loss benefits from multiple hologram planes (i.e., M ≥ 2) so that it can also filter out twin-image-related artifacts that would normally appear in conventional in-line hologram reconstruction methods due to the lack of direct phase information; stated differently, an artificial twin image superimposed onto the complex-valued true image of the sample would be penalized by our physics-consistency loss, since it creates physical inconsistencies on at least M − 1 hologram planes as a result of the wave propagation step for M ≥ 2 planes. In addition, the large degrees of freedom provided by the artificially synthesized image datasets, with random phase and amplitude channels, also contribute to the effectiveness of the GedankenNet framework, as highlighted in the Results section, Extended Data Fig. 5 and Supplementary Note 4.
Limited by the optical system, the experimental holographic imaging process applies a low-pass filter to the ground truth object fields. Furthermore, the recurrent spatial features within the same type of samples further reduce the diversity of experimental datasets. Thus, adopting simulated holograms of random, artificial image datasets presents a more effective solution when access to large amounts of experimental data is impractical (see Fig. 4 and Extended Data Fig. 5). In addition, GedankenNet exhibits superior generalization to unseen data distributions compared with supervised models, and achieves better holographic image reconstruction for unseen, new types of samples (see, e.g., Figs. 3-4).
During the training phase of GedankenNet, the physical forward model is given to the network as part of the Gedankenexperiments. However, perturbations, i.e., mismatches between the a priori forward model and the a posteriori model in the experiments, could impact the performance of the learned GedankenNet. These sources of perturbation often include: (1) the measurement noise ε; (2) the modeling error of the sampling function S(·); and (3) the error of the transfer function A. The first two sources can come from a combination of factors, e.g., thermal and shot noise, sensor nonlinearity, aberrations, etc., and can be properly handled by setting regularization terms in the loss function, e.g., the total variation (TV) loss 90. The last source of perturbation may result from the assumptions made when establishing the forward model and from errors in the key parameters of the holographic imaging system, e.g., the sample-to-sensor distances, the illumination wavelength, the pixel pitch, etc.
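As one concrete example of such a regularization term, an anisotropic total variation penalty can be sketched as follows; the paper's exact TV variant (isotropic vs. anisotropic, weighting) is not specified here, so this form is an assumption.

```python
import numpy as np

def total_variation(img):
    """Anisotropic TV: sum of absolute forward differences along both axes.

    Penalizes noise-like, rapidly varying residuals while leaving smooth
    (piecewise-constant) image content largely unpenalized.
    """
    dy = np.abs(np.diff(img, axis=-2)).sum()  # vertical differences
    dx = np.abs(np.diff(img, axis=-1)).sum()  # horizontal differences
    return dx + dy
```

Added to a data-fidelity loss with a small weight, such a term suppresses high-frequency noise contributions from the first two perturbation sources without requiring any knowledge of the sample.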
Through the self-supervised learning process, GedankenNet intrinsically acquired robustness to various types of random perturbations in the physical forward model (see, e.g., Fig. 5 and Extended Data Fig. 6) and implicitly learned the physics of wave propagation. Furthermore, we demonstrated that the GedankenNet framework can adapt to more complicated random, unknown perturbations, as shown with GedankenNet-Phase and GedankenNet-Phaseλ, which used artificial phase-only random training images that were synthetically generated without any experiments or resemblance to real-world samples. These observations align with earlier reports, which showed self-supervised models to be more robust to adversarial attacks and input image/data corruptions, and to exceed the performance of fully supervised models on near-distribution outliers 91,92.
In summary, GedankenNet overcomes important limitations of existing deep learning models in holographic microscopy by creating experimental-data-free, generalizable and physics-compatible deep learning models. GedankenNet further opens up new opportunities for other microscopy imaging modalities and various computational inverse imaging problems, and could facilitate a diverse set of applications for deep learning-based holography and microscopy techniques.

The artificial holograms used in this work for training were simulated either from random images or from natural images (COCO dataset). Random images (with no connection or resemblance to real-world samples) were generated using the Python package randimage, which colors the pixels along a path found from a random gray-valued image to generate an artificial RGB image. We then mapped the generated random RGB images to grayscale. Two independent images randomly selected from the dataset served as the amplitude and phase of the complex object field, and a small constant was added to the amplitude channel to avoid zero transmission and undefined phase issues. For the artificial random phase-only object fields, only the phase image was selected, and the amplitude was set to 1 everywhere. The object field was then propagated by the given sample-to-sensor distances using the angular spectrum approach 83, and the intensity of the resulting complex field was calculated.

In each SPAF block, the input tensor is transformed into the frequency domain through a 2D fast Fourier transform (FFT), truncated to a half window size k and multiplied by learnable weights, yielding the truncated frequency components F′ ∈ ℂ^{c×(2k+1)×(2k+1)}. The resulting tensor F′ is then transformed back into the spatial domain through an inverse 2D FFT. The same pyramid-like setting of the half window size k as in Ref. 81 was applied here, such that k decreases for deeper SPAF blocks. This pyramid-like setting maps the high-frequency information of the holographic diffraction patterns to low-frequency regions in the first few layers and passes this low-frequency information to subsequent layers with a smaller window size, which better utilizes the spatial features at multiple scales while considerably reducing the model size, avoiding potential overfitting and generalization issues.
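A minimal sketch of this SPAF-style spectral operation (FFT, centered low-frequency window truncation, learnable-weight multiplication, inverse FFT) is shown below; the exact weight structure and mixing used in the paper may differ, so this is an illustrative approximation.

```python
import numpy as np

def spaf_spectral_mix(x, weights, k):
    """SPAF-style spectral layer sketch.

    x:       (c, n, n) input tensor (real or complex)
    weights: (c, 2k+1, 2k+1) complex array standing in for learned parameters
    k:       half window size; only the centered (2k+1)x(2k+1) spectrum is kept
    """
    c, n, _ = x.shape
    X = np.fft.fft2(x)
    Xs = np.fft.fftshift(X, axes=(-2, -1))
    lo, hi = n // 2 - k, n // 2 + k + 1          # centered window indices
    Xs_new = np.zeros_like(Xs)
    # Keep and re-weight only the low-frequency window; the rest is zeroed.
    Xs_new[:, lo:hi, lo:hi] = Xs[:, lo:hi, lo:hi] * weights
    return np.fft.ifft2(np.fft.ifftshift(Xs_new, axes=(-2, -1)))
```

Shrinking `k` in deeper blocks reproduces the pyramid-like behavior described above: early layers see a wide spectral window, later layers operate on progressively smaller low-frequency regions, which keeps the parameter count small.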

Figures and figure legends
The architecture of GedankenNet was extended for two additional models reported in the Results section, namely GedankenNet-Phase and GedankenNet-Phaseλ, as shown in Extended Data Fig. 8(a). Similar to GedankenNet, these models use a sequence of M holograms concatenated as the input image with M channels but, instead of outputting real and imaginary parts, GedankenNet-Phase and GedankenNet-Phaseλ generate phase-only output images. The dynamic SPAF (dSPAF) modules 96 inside GedankenNet-Phase and GedankenNet-Phaseλ exploit a shallow U-Net to dynamically generate the weights for each input tensor, enabling autofocusing and adaptation to unknown shifts/changes in the illumination wavelength. The dense links provide an efficient flow of information from the input layer to the output layer, so that every output tensor of a dSPAF group is appended and fed to the subsequent dSPAF groups, resulting in an economical and powerful network architecture.

Algorithm implementation
GedankenNet, GedankenNet-Phase and GedankenNet-Phaseλ share the implementation details described below. To avoid trivial ambiguities in phase retrieval 99-101, the GedankenNet output was normalized by its complex mean; for GedankenNet-Phase and GedankenNet-Phaseλ, the corresponding mean was subtracted from the outputs.
All the trainable parameters in GedankenNet were optimized using the Adam optimizer 102 .
The learning rate follows a cosine annealing schedule with an initial rate of 0.002. All the models went through ~0.75 million batches (equivalent to ~7.5 epochs), and the best model was preserved with the minimal validation loss. The training takes ~48 hours for an M = 2 model on a computer equipped with an i9-12900F CPU, 64 GB of RAM and an RTX 3090 graphics card. The inference time measurement (Extended Data Table 1) was done on the same machine with GPU acceleration, using a test batch size of 20 for GedankenNet and 12 for both GedankenNet-Phase and GedankenNet-Phaseλ.
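The cosine annealing schedule mentioned above can be sketched in plain Python; the minimum rate and the single-cycle form are assumptions (deep learning frameworks expose the same formula, e.g., CosineAnnealingLR in PyTorch).

```python
import math

def cosine_annealing_lr(step, total_steps, lr_init=0.002, lr_min=0.0):
    """Cosine-annealed learning rate, starting at the stated initial rate 0.002.

    Decays smoothly from lr_init at step 0 to lr_min at total_steps;
    lr_min = 0 and the single-cycle shape are assumptions of this sketch.
    """
    t = min(step, total_steps) / total_steps          # progress in [0, 1]
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * t))
```

Compared with step decay, the smooth cosine curve avoids abrupt learning-rate drops, which tends to stabilize the late phase of training.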
The supervised FIN adopted the same architecture and parameters as in Ref. 48. The U-Net architecture employed four convolutional blocks in the down-sampling and up-sampling paths separately, and each block contained two convolutional layers with batch normalization and ReLU activation. The input feature maps of the first convolutional block had 64 channels, and each block in the down-sampling path doubled the number of channels. The supervised FIN and U-Net 87 models adopted the same loss function as in Ref. 48. The same Adam optimizer and learning rate were applied to the supervised learning models. DIP adopted a U-Net architecture, an Adam optimizer and the loss function used in Ref. 80.

Image reconstruction evaluation metrics
(a): (1) iterative phase retrieval algorithms based on the physical forward model and iterative error-reduction; (2) supervised deep learning-based inference methods that learn from training image pairs of input holograms  and the ground truth object fields .Similar to the iterative phase recovery algorithms listed under (1), deep hologram dataset generated from random images, same as before; (ii) a new artificial hologram dataset generated from a natural image dataset (COCO) 89 ; (iii) an experimental hologram dataset of human tissue sections (see Methods for dataset preparation).Each one of these training datasets had ~100 K training image pairs with  = 2,  1 = 300 μm and  2 = 375 μm.As shown in Fig. 4, these three individually trained GedankenNet models were tested on four testing datasets, including artificial holograms of (1) random synthetic images and (2) natural images as well as experimental holograms of (3) lung tissue sections and (4) Pap smears.Our results reveal that all the self-supervised GedankenNet models showed very good reconstruction quality for both internal and external generalization; see Fig. 4(a-b).When trained using the experimental holograms of lung tissue sections, the supervised hologram reconstruction model FIN (solid red bar) scored higher ECC values ( value of 7.5 × 10 −38 ) than the GedankenNet (solid blue bar) on the same testing set of the lung tissue sections.However, when it comes to external generalization, as shown in Fig. 4(b), GedankenNet (the blue shadow bar) achieved superior imaging performance ( value of 8.5 × 10 −10 ) compared to FIN (the red shadow bar) on natural images (from COCO dataset).One can also notice the overfitting of the supervised model (FIN) by the large performance gap observed between its internal and external generalization performance shown with the red bars in Fig. 
4(b).On the contrary, the self-supervised GedankenNet trained with artificial random images (the blue bars) showed very good generalization performance for both test datasets covering natural macro-scale images as well as micro-scale tissue images.To further illustrate the relationship between the training dataset composed of artificial random images and the generalization performance of GedankenNet, we compared the standard GedankenNet model reported earlier (Fig. 3) against a new GedankenNet model trained on artificial random complex fields with correlated amplitude and phase channels.As summarized in Extended Data Figure 5 and Supplementary Note 4, this new GedankenNet model trained with correlated amplitude and phase images generalized relatively worse on the same external test datasets compared to the original GedankenNet model that did not use any correlation between the amplitude and phase channels.These results further confirm that the large data variations in the phase and amplitude channels of the artificially generated random training images greatly contribute to the superior generalization of GedankenNet models.Besides its generalization to unseen testing data distributions and experimental holograms, the inference of GedankenNet is also compatible with the wave equation.To demonstrate this, we tested the GedankenNet model (trained with the artificial hologram dataset generated from random synthetic images) on experimental holograms captured at shifted unknown axial positions  1 ′ ≅  1 + Δ   2 ′ ≅  2 + Δ, where  1 ,  2 were the training axial positions and Δ is the unknown axial shift amount.The same model as in Fig. 3 was used for this analysis and blindly tested on lung tissue sections (i.e., external generalization).Due to the unknown axial defocus distance (Δ), the direct output fields of GedankenNet do not match well with the ground truth, indicated by the orange curve in Fig. 
5(a). However, since GedankenNet was trained with the physics-consistency loss, its output fields are compatible with the wave equation in free space. Thus, the object fields at the sample plane can be accurately retrieved from the GedankenNet output fields by performing wave propagation over the corresponding axial defocus distance. After propagating the output fields of GedankenNet by −Δz using the angular spectrum approach, the propagated fields (blue curve) matched the ground truth fields very well across a large range of axial defocus values Δz. These results are important because (1) they once again demonstrate the success of GedankenNet in generalizing to experimental holograms even though it was trained only on artificial holograms of random synthetic images; and (2) the physics-consistency-based self-supervised training of GedankenNet encoded the wave equation into its inference process, so that instead of hallucinating non-physical random optical fields when tested with defocused holograms, GedankenNet outputs correct (physically consistent) defocused complex fields. In this sense, GedankenNet not only exhibits superior external generalization (from experiment- and data-free training to experimental holograms), but also generalizes very well to defocused experimental holograms. To the best of our knowledge, these features have not been demonstrated before for any hologram reconstruction neural network in the literature.
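The refocusing step described above can be sketched with a minimal angular spectrum propagator. This is an illustrative implementation only; the default wavelength and pixel pitch below are placeholder values, not the exact acquisition parameters of our setup:

```python
import numpy as np

def angular_spectrum_propagate(field, dz, wavelength=530e-9, pixel_pitch=0.5e-6):
    """Propagate a complex field by an axial distance dz (meters)
    using the angular spectrum method; evanescent components are suppressed."""
    h, w = field.shape
    fx = np.fft.fftfreq(w, d=pixel_pitch)  # spatial frequencies (1/m)
    fy = np.fft.fftfreq(h, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # axial wavenumber
    transfer = np.exp(1j * kz * dz) * (arg > 0)  # free-space transfer function
    return np.fft.ifft2(np.fft.fft2(field) * transfer)
```

Propagating a defocused network output field by −Δz with such an operator recovers the in-focus object field, which is the refocusing test summarized by the blue curve in Fig. 5(a).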

Figure 5(b) reports another example of GedankenNet's superior external generalization: the resilience of GedankenNet models to experimental input holograms with various pixel pitches that are larger than the training pixel pitch. In these results, GedankenNet simultaneously performed hologram reconstruction and pixel super-resolution without any fine-tuning or retraining of its model, which utilized artificial random training images at only a single pixel pitch (see Extended Data Figure 6). To further explore the robustness of GedankenNet to variations in other physical features of the acquired holograms, in Supplementary Note 6 and Extended Data Figure 7 we studied the impact of the signal-to-noise ratio (SNR) of the input holograms on the image reconstruction quality and compared GedankenNet's blind inference results against supervised models, further confirming its superiority and robustness to different sources of noise or perturbations.
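For SNR analyses of this kind, noisy input holograms can be synthesized from the free-space forward model described in the Methods. The sketch below is a minimal, illustrative version: the background constant c, the wavelength and the pixel pitch are placeholder values, and the noise is scaled to a requested SNR rather than matched to any particular sensor:

```python
import numpy as np

def simulate_hologram(amplitude, phase, dz, snr_db=30.0,
                      wavelength=530e-9, pixel_pitch=0.5e-6, c=0.1):
    """Simulate an in-line hologram A = |FSP{(a + c) * exp(j*phi); dz} + n|,
    where n is complex white Gaussian noise scaled to the requested SNR."""
    field = (amplitude + c) * np.exp(1j * phase)
    h, w = field.shape
    fx = np.fft.fftfreq(w, d=pixel_pitch)
    fy = np.fft.fftfreq(h, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    transfer = np.exp(2j * np.pi * np.sqrt(np.maximum(arg, 0.0)) * dz) * (arg > 0)
    prop = np.fft.ifft2(np.fft.fft2(field) * transfer)  # free-space propagation
    sig_power = np.mean(np.abs(prop) ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    n = np.sqrt(noise_power / 2) * (np.random.randn(h, w) + 1j * np.random.randn(h, w))
    return np.abs(prop + n)  # intensity-detector-style amplitude measurement
```

Sweeping `snr_db` over a range of values produces test holograms of varying quality for probing the robustness of a trained reconstruction model.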
We also incorporated random axial positions into the training, aiming to achieve both hologram reconstruction and autofocusing using self-supervised learning based on the same physics-consistency loss (see the Methods and Extended Data Figure 8 for details on the training process and the architecture of GedankenNet-Phase). After its training with artificial random data generated without any experiments or resemblance to real-world samples, Fig. 6 demonstrates the experimental hologram reconstruction and autofocusing performance of GedankenNet-Phase on unstained (label-free) human kidney tissue sections. Figure 6(a) visualizes GedankenNet-Phase outputs corresponding to M = 2 input holograms independently captured at arbitrary and unknown axial positions within [300, 400] μm, and Fig. 6(b) quantitatively evaluates the reconstruction quality of GedankenNet-Phase in terms of the phase root mean squared error (RMSE) with respect to the ground truth over a test set of 98 unique fields of view (FOVs). These results reveal that GedankenNet-Phase successfully achieved both autofocusing and hologram reconstruction within its training axial distance range. As expected, the reconstruction quality drops for z₁ = z₂, since this corresponds to M = 1, deviating from the training of GedankenNet-Phase, which used M = 2 random input holograms. The concept of GedankenNet-Phase can be further expanded to bring additional resilience to its reconstructions and achieve broader external generalization for other types of perturbations in the physical forward model, such as unknown changes or shifts/drifts in the illumination wavelength. To showcase this, we created an additional GedankenNet-Phase model and trained it on M = 2 simulated holograms of artificial phase-only random fields, which were illuminated and propagated with random illumination wavelengths between 520 nm and 540 nm (more details are provided in Supplementary Note 7). Apart from the phase-only assumption, the training of this model did not involve any experiments, data resembling
real-world samples, or other prior knowledge about the samples. After its training with artificial random data, GedankenNet-Phase generalized to experimental data of unstained human kidney tissue sections and achieved stable reconstruction performance on holograms acquired at various illumination wavelengths, ranging from 500 nm to 560 nm, without knowing the illumination wavelength (see Extended Data Figure 9 and Supplementary Note 7). These results and analyses demonstrate the superior robustness of the GedankenNet framework to various sources of perturbations in the physical forward model, while also maintaining its broad external generalization performance.

Discussion

In this work, we demonstrated GedankenNet, a self-supervised hologram reconstruction neural network that eliminates the dependence on labeled and experimental training data and achieves better generalization to unseen data distributions than existing methods. Based on its self-supervised learning scheme and the physics-consistency loss function, GedankenNet is able to implicitly learn the physics of wave propagation and perform hologram reconstruction tasks without any experimental data or prior knowledge of the samples. Stated differently, the training of GedankenNet involves Gedankenexperiments (thought experiments) without any experimental data or prior knowledge about real-world samples; after its training, GedankenNet successfully generalizes to experimental holograms and shows superior reconstruction quality for external generalization compared with supervised learning-based network models. We also demonstrated that GedankenNet outputs are compatible with the wave equation, and that it does not hallucinate artificial (non-physical) output fields when defocused holograms are provided as input. These results present an additional degree of successful generalization (beyond experiment- and data-free training to experimental holograms) since during the self-supervised training of
GedankenNet we always used Δz = 0. Compared with existing supervised learning methods, GedankenNet has several unique advantages. It eliminates the dependence on labeled experimental training data in computational microscopy, which often come from other imaging modalities or classical algorithms and therefore inevitably introduce biases into the external generalization performance of the trained network. The self-supervised learning scheme of GedankenNet also considerably relieves the cost and labor of collecting and preparing large-scale microscopic image datasets. For the inverse problem of hologram reconstruction, the reported physics-consistency loss that we used in self-supervised learning outperforms the traditional structural loss functions commonly employed in supervised learning, since the latter often overfit to specific image features that appear in the training dataset, resulting in generalization errors, especially

Figure 1. Diagrams of GedankenNet and other existing methods for solving holographic

Figure 3. External generalization of GedankenNet on human tissue sections and Pap smears,

Figure 4. Generalization of GedankenNet trained with different training datasets to various

Figure 5. Compatibility of GedankenNet output images with the wave equation in free-space.

Figure 6. Autofocusing performance of GedankenNet-Phase for experimental holograms of

The resulting holograms were cropped into 512 × 512 patches. Each of the two datasets (either from random images or COCO natural images) used ~100K images for training and a set of 100 images for validation and testing, which were excluded from training. All models in this work used the amplitude of the measured fields as the inputs. Given a randomly selected amplitude (a) image and phase (φ) image, the simulated hologram A(x, y; z) at axial position z is generated by free-space propagation (FSP):

A(x, y; z) = |FSP{(a + c) ⊙ e^(jφ); z} + n|

where c ∈ ℝ stands for the added small constant, ⊙ represents element-wise multiplication, and n ∈ ℂ^(H×W) is the additive white Gaussian noise. For phase-only objects, the simulated holograms can be expressed as:

A(x, y; z) = |FSP{e^(jφ); z} + n|

One iteration is completed after all eight holograms have been used. The algorithm generally converges after 100 iterations. Input–target pairs of 512 × 512 pixels were cropped from the super-resolved holograms and their corresponding retrieved ground truth fields, forming the experimental hologram datasets. Standard data augmentation techniques were applied, including random rotations by 0, ±45 and ±90 degrees and random vertical and horizontal flipping. The multi-height experimental hologram dataset of tissue sections contains ~100K input–target pairs of stained human lung, prostate, salivary gland, kidney, liver and esophagus tissue sections. A subset of the lung, prostate and salivary gland slides from new patients, together with Pap smears, was excluded from the training dataset and used as testing datasets, containing 94, 49, 49 and 47 unique FOVs, respectively. The holograms of the unstained (label-free) kidney tissue thin sections (~3-4 µm thick) were used as our phase-only object test dataset, containing 98 unique FOVs.

Network architecture

A sequence of M holograms is concatenated as the input image with M channels, and the real and imaginary parts of the object
complex field are generated at the output of GedankenNet. GedankenNet contains a series of spatial-Fourier transformation (SPAF) blocks and a large-scale residual connection, in addition to two 1 × 1 convolution layers at the head and the tail of the network (see Extended Data Fig. 1(a)). In each SPAF block, input tensors pass through two recursive SPAF modules with residual connections, which share the same parameters, before entering the PReLU (parametric rectified linear unit) activation layer⁹⁵. The PReLU activation function with respect to an input value x ∈ ℝ is defined as:

PReLU(x) = max(0, x) + a · min(0, x)

where a ∈ ℝ is a learnable parameter. Another residual connection passes the input tensor after the PReLU layer. The SPAF module consists of a 3 × 3 convolution layer and a branch performing a linear transformation in the Fourier domain (Extended Data Fig. 1(a)). The input tensor with C channels to the SPAF module is first transformed into the frequency domain by a 2D FFT and truncated by a window with a half size k to filter out higher-frequency components. The linear transformation in the frequency domain is realized through pixel-wise multiplication with a trainable weight tensor W ∈ ℝ^(C×(2k+1)×(2k+1)), i.e.,

X′_{c,u,v} = W_{c,u,v} · Σ_{i=1}^{C} X_{i,u,v},   u, v = 0, ±1, …, ±k,   c = 1, …, C
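A schematic NumPy sketch of the SPAF module's Fourier-domain branch follows. The channel-first tensor layout, the centered frequency window and the summation over input channels are illustrative assumptions, not the exact implementation; the parallel 3 × 3 convolution branch is passed in precomputed:

```python
import numpy as np

def spaf_fourier_branch(x, W, k):
    """Fourier-domain branch of a SPAF module (illustrative sketch).

    x: input tensor of shape (C, H, W_img); W: trainable weights of shape
    (C, 2k+1, 2k+1). Frequencies outside the centered (2k+1) x (2k+1)
    window are truncated, acting as a low-frequency filter.
    """
    C, H, W_img = x.shape
    X = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    cy, cx = H // 2, W_img // 2
    win = X[:, cy - k:cy + k + 1, cx - k:cx + k + 1]
    # Pixel-wise linear transform: X'_{c,u,v} = W_{c,u,v} * sum_i X_{i,u,v}
    out_win = W * win.sum(axis=0, keepdims=True)
    # Zero-pad back to the full spectrum and inverse transform.
    Y = np.zeros_like(X)
    Y[:, cy - k:cy + k + 1, cx - k:cx + k + 1] = out_win
    y = np.fft.ifft2(np.fft.ifftshift(Y, axes=(-2, -1)), axes=(-2, -1))
    return y.real

def spaf_module(x, W, conv_branch_out, k):
    """Combine the Fourier branch with the parallel 3x3 convolution branch."""
    return spaf_fourier_branch(x, W, k) + conv_branch_out
```

In the trained network these weights are learned jointly with the convolution branch; here they are plain arrays so the data flow of the module can be inspected in isolation.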