Introduction

Optical fibers are well known for transmitting information from otherwise inaccessible areas. Because of their small size and flexibility, fiber-optic imaging systems (FOISs)1 have become an indispensable tool in clinical practice and biological research, for example in the early detection of gastrointestinal cancers2,3,4,5 and the visualization of neuronal activities in freely moving animals6,7,8,9,10. Most common FOISs are based on optical fiber bundles or multimode fibers (MMFs). An optical fiber bundle consists of thousands of closely spaced cores in a shared cladding. Each core acts as a single-pixel detector to sample the object6,11,12,13,14. Due to the loss of information in the cladding, the output images from an optical fiber bundle suffer from the honeycomb effect15. On the other hand, an MMF supports thousands of optical modes in a single core. Because of the mode coupling in MMFs, object images are scrambled into speckle patterns. Supervised deep learning16,17 has been successfully implemented in both cases to reconstruct high-quality images18,19,20,21. A Convolutional Neural Network (CNN) can “learn” an image reconstruction mapping from numerous pairs of fiber output images and input object images. Despite its success, supervised deep learning imposes a heavy burden on FOISs. The collection of paired fiber outputs and input objects in the calibration step involves time-consuming and demanding alignment of the FOIS. In particular, re-calibration is required for any system variation, which is infeasible in practical applications.

Unsupervised deep learning circumvents these hurdles by using unpaired training image data. Since the deep learning model has to uncover the hidden mapping between two image domains without paired images, image reconstruction using unsupervised deep learning is considered a challenging task. Recently, it has been demonstrated that if the two image domains are similar in the high-dimensional space, “generator” CNNs and “discriminator” CNNs can compete in adversarial games to find a “natural” translation between the two image domains22,23. To achieve this similarity in the high-dimensional space, the FOIS should provide point-to-point transmission between the input object and the fiber output with high sampling density. Unfortunately, neither optical fiber bundles nor MMFs meet these requirements. Although optical fiber bundles can directly convey the images of the objects, they have limited sampling densities (~0.1 mode/µm²). As more sampling points, i.e., more cores, are added, the core-to-core crosstalk becomes stronger and degrades the point-to-point transmission fidelity24. On the other hand, the input-output relationship in MMFs deviates far from a point-to-point transmission due to multimode interference. The recently proposed glass-air Anderson localizing optical fibers (GALOFs)25,26,27,28,29,30,31,32,33 provide a promising alternative. With a disordered arrangement of air holes embedded in a silica matrix, GALOFs achieve local confinement of light and high sampling densities (~10 modes/µm²) simultaneously34 due to transverse Anderson localization (TAL)35,36. Moreover, the TAL-supported modes are insensitive to external perturbations37 or wavelength shifts38, as opposed to the modes in optical fiber bundles24,39,40 or MMFs41,42,43,44,45. Therefore, robust full-color image transport can be achieved.

Here, we demonstrate unsupervised full-color high-fidelity image reconstruction through a meter-long GALOF in both transmission and reflection modes. We show that a simple histogram equalization step is adequate to preliminarily reveal the hidden objects in the GALOF outputs, owing to the densely distributed TAL modes. The objects’ fine details can be further recovered by utilizing unpaired image-to-image translation22,23. Unsupervised image reconstruction significantly simplifies the calibration step, where the object images only need to be collected once. The GALOF-based FOIS is therefore flexible towards different conditions. As a remarkable example, we show the system’s consistent imaging performance within a working distance of at least 4 mm, with a simple one-step re-calibration that only requires GALOF outputs. Moreover, due to the robustness of the TAL-supported modes, high image quality is preserved even under substantial mechanical bending (~60° bend angle). Finally, we show that the cross-domain generalizability to unseen objects can be enhanced by increasing the objects’ diversity.

Results

Principles

When propagating through a GALOF (Fig. 1a), the imaging information of the object is encoded by the TAL-supported modes. The light confinement provided by the TAL results in a nearly point-to-point transmission from the GALOF’s input facet to the output facet. Due to the different mode losses, the GALOF output pattern is an unevenly weighted superposition of the TAL-supported modes. Reconstructing the object from the output pattern involves standardizing all the TAL-supported modes and solving an inverse imaging problem. This is a challenging task since the TAL-supported modes have a very high mode density. Instead, we tackle this problem by standardizing the pixels of the GALOF’s output images. In the calibration step (Fig. 1a), we collect 1000 fiber output images and, separately, 1000 unpaired reference images of objects (“Methods”). Notably, there is no one-to-one correspondence between these unpaired data sets. Before standardization, we register the 1000 fiber outputs to an arbitrarily chosen fiber output (“Methods”) to compensate for the image drift caused by mechanical instability during the experiments. As a result, each pixel in the fiber output has 1000 recorded values. Since a large area of the samples is scanned across the 1000 acquisitions, each pixel should have captured the comprehensive statistical features of the objects. Statistically speaking, all these pixels should have the same Probability Mass Function (PMF) as the pixels in the reference objects, despite coming from different unknown objects. Therefore, we perform histogram equalization (“Methods”) on each pixel of the fiber output image for each RGB (red, green, and blue) channel (Fig. 1b). We calculate the Cumulative Distribution Function (CDF) from the PMF of each pixel and look for the pixel value in the reference objects that has the same CDF. In this way, we generate a Look-Up Table (LUT) for each pixel to transform its values. Within the 0–255 range, defective pixels, whose maximum value is less than 10 or whose Standard Deviation (STD) is less than 2, are set to zero; for example, the blue channel in Fig. 1b is set to zero. Next, we perform image inpainting (“Methods”) on each processed image. For each RGB channel, the pixels whose values are less than 10 are filled in by interpolating inward from their surroundings. Fuzzy objects are recovered after inpainting.
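As an illustration of the defective-pixel criterion described above, the following Python/NumPy sketch flags such pixels in a stack of registered GALOF outputs (the array layout, function name, and thresholds are written out here only for illustration; this is not the code used in this work):

```python
import numpy as np

def defective_pixel_mask(stack, max_thresh=10, std_thresh=2):
    """Flag defective pixels in one RGB channel of the registered GALOF outputs.

    stack: array of shape (1000, N, N) holding that channel's 1000 registered
    output images with 8-bit values in 0-255 (assumed layout).
    """
    pixel_max = stack.max(axis=0)   # per-pixel maximum over the 1000 images
    pixel_std = stack.std(axis=0)   # per-pixel standard deviation
    return (pixel_max < max_thresh) | (pixel_std < std_thresh)

# Pixels flagged by this mask are set to zero during histogram equalization
# and filled in afterwards by the inpainting step.
```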

Fig. 1: The calibration process of the unsupervised full-color image reconstruction.
figure 1

a Unknown objects are placed in front of the GALOF input facet at some working distance. To obtain the objects’ statistical information, we separately collect unpaired reference objects. The calibration process consists of several steps. First, the GALOF outputs are registered according to an arbitrarily chosen fiber output image. b The histogram equalization step. The PMF of each pixel in the registered images from (a) is transformed to resemble the PMF of the pixels in the reference objects. For each RGB channel, a LUT is created by matching the pixel values with the same CDF. “Bad” pixels with maxima less than 10 or STDs less than 2 are set to zero (e.g., the blue channel in (b)), producing the histogram-equalized images in (a). Then, inpainting is performed on each image to fill in those bad pixels (the inpainted images in (a)). c A Restore-CycleGAN recovers the image details. Two U-Net generators G1 and G2 translate between the inpainted images and the reference object images, whereas two PatchGAN discriminators D1 and D2 distinguish the “real” images in the target domain from the “fake” generated images. Both the generators and the discriminators are optimized through the least square adversarial losses LLSGAN. The generators are also updated through the identity mapping loss Lidentity and the cycle-consistent loss Lcycle. Lidentity requires an identical output if the input is in the target domain, while Lcycle requires an unchanged image if the image goes through a full cycle. After learning, high-quality images can be reconstructed by G1

Finally, the reference objects are used again to further enhance the imaging quality of the 1000 fuzzy objects. We utilize our recently proposed image restoration cycle-consistent adversarial network (Restore-CycleGAN)23 (Fig. 1c). As shown in our previous work, the Restore-CycleGAN outperforms the original CycleGAN22 in extracting global information. In the Restore-CycleGAN, a U-Net46 works as a generator G1 to transform the fuzzy images into high-quality images, while a PatchGAN47 works as a discriminator D1 to distinguish the “real” reference objects from the “fake” ones produced by the generator G1. The generator G1 and the discriminator D1 compete in an adversarial game through the least square adversarial loss LLSGAN. In this adversarial game, G1 is rewarded if it successfully “fools” D1, whereas D1 is rewarded if it differentiates the “real” from the “fake”. G1 is also optimized through the identity mapping loss Lidentity, which requires a reference object to remain identical after passing through G1. Similarly, there is another pair of generator G2 and discriminator D2 working in the opposite direction. To enforce cycle consistency, a cycle-consistent loss Lcycle is adopted, which requires an unaltered output if an image goes through the two generators successively. The details of the network architectures and the training processes can be found in the “Methods”. After training, only the generator G1 is used. Therefore, unsupervised image reconstruction is achieved without paired training data. During the test, a fiber output image goes through four steps: (1) registration to the arbitrarily chosen fiber output image; (2) pixel-value transformation using the LUTs; (3) inpainting; and (4) quality enhancement through the generator G1. The set of reference objects is only needed to re-calibrate the system in special cases, such as a change of working distance (Fig. 1a).
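For clarity, the test-time chain can be summarized by the following schematic Python sketch; `register_to_anchor`, `apply_luts`, and `inpaint_defective` are hypothetical helper names standing in for the registration, LUT, and inpainting routines described in the “Methods”, and `G1` denotes the trained generator:

```python
def reconstruct(fiber_output, anchor, luts, defective_mask, G1):
    """Schematic reconstruction of one object image from a single GALOF output.

    fiber_output   : raw RGB GALOF output image
    anchor         : arbitrarily chosen fiber output used as the registration reference
    luts           : per-pixel, per-channel look-up tables from the calibration step
    defective_mask : boolean mask of pixels to be filled by inpainting
    G1             : trained Restore-CycleGAN generator (fuzzy image -> high quality)
    """
    aligned = register_to_anchor(fiber_output, anchor)    # (1) compensate for image drift
    equalized = apply_luts(aligned, luts)                  # (2) per-pixel histogram equalization
    fuzzy = inpaint_defective(equalized, defective_mask)   # (3) fill in defective pixels
    return G1(fuzzy)                                       # (4) detail recovery by the generator
```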

High fidelity

We perform the calibration processes for six different biological objects: human red blood cells, frog blood cells, human eosinophils, human cancerous stomach tissues, human bronchogenic carcinoma tissues, and human sarcoma of uterus tissues. All calibrations are performed using the same straight GALOF with a working distance of 0 mm. As shown in Fig. 2a, the data of the first four objects (human red blood cells, frog blood cells, human eosinophils, and human cancerous stomach tissues) are collected in the transmission mode, whereas those of the last two (human bronchogenic carcinoma tissues and human sarcoma of uterus tissues) are collected in the reflection mode. For each case, we separately collect 1000 object images and their GALOF outputs to evaluate the performance. The reconstruction time per image is about 1.6 s. Figure 2a shows examples of the objects’ reference images, the GALOF output images, and the results after each reconstruction step (excluding the registration step). Although the raw GALOF outputs are unrecognizable, they preserve the local information of the objects well in all RGB channels. This becomes clear after the histogram equalization is applied to the pixels of the registered images. After the inpainting step, fuzzy images of the objects start to show up. Finally, Restore-CycleGANs further recover the fine details. The high fidelity of the reconstructions is quantitatively demonstrated in Fig. 2b, where we plot the mean absolute errors (MAEs) and STDs of the 1000 reconstructions. In all six cases, the MAEs are below 0.035 (out of a maximum of ~1). In addition, we conduct a detailed analysis of the entire imaging pre-processing and reconstruction process, which reveals the cooperative impact of the pre-processing and the Restore-CycleGAN (Supplementary Information: Step-by-step analysis of unsupervised image reconstruction).
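The MAE values quoted here can be read as the per-pixel mean absolute difference between a reconstruction and its ground truth, with intensities normalized to [0, 1]; a minimal sketch of this metric, under that assumption, is:

```python
import numpy as np

def mean_absolute_error(recon, truth):
    """Per-pixel MAE between a reconstruction and its ground truth,
    both assumed to be arrays normalized to the range [0, 1]."""
    return float(np.mean(np.abs(recon.astype(np.float64) - truth.astype(np.float64))))

# The mean and standard deviation of this metric over the 1000 test
# reconstructions give the MAE and STD values plotted in Fig. 2b.
```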

Fig. 2: Test results of unsupervised full-color image reconstruction on different types of biological objects.
figure 2

a Sample images of the objects, the intermediate outputs of the reconstructions (excluding the registration step), and the final reconstructed images. The objects are human red blood cells, frog blood cells, human eosinophils, human cancerous stomach tissues, human bronchogenic carcinoma tissues, and human sarcoma of uterus tissues. The reconstruction process is calibrated and tested using the images collected through a straight GALOF with 0 mm working distance. The GALOF-based imaging system is in the transmission mode for the first four cases (the red brackets) and in the reflection mode for the last two cases (the blue brackets). b MAEs and STDs of the reconstructions are evaluated on 1000 objects for each type of biological object

Robustness

To test the robustness of the unsupervised fiber-optic imaging, we bend the GALOF with a central angle of 60° (Fig. 3a). The output images are reconstructed using the LUTs and the Restore-CycleGAN calibrated on a straight GALOF. Both the calibration and test stages use the same type of human red blood cell samples at a working distance of 0 mm. As illustrated in Fig. 3b, high-fidelity reconstructions are achieved despite the large-angle fiber bending. We repeat the image reconstruction for 1000 GALOF outputs and calculate their MAE and STD with respect to the input objects (Fig. 3c). The values are similar to the test results obtained with a straight fiber. This consistency of the GALOF outputs is attributed to the excellent light confinement of the TAL-supported modes.

Fig. 3: Robustness of the unsupervised image reconstruction under mechanical deformation.
figure 3

a After calibrating the LUTs and Restore-CycleGAN on human red blood cell samples through a straight GALOF with 0 mm working distance, we bend the GALOF with a central angle of 60° and test the image reconstruction. b Sample images of the objects, the corresponding GALOF outputs, and the reconstructions. c MAE averaged over 1000 reconstructions. The error bar stands for STD

Flexible working distance

Benefiting from the unsupervised image reconstruction, the reference objects only need to be collected once. When the working distance varies, we re-calibrate the LUTs and the Restore-CycleGAN using the same reference objects collected at a working distance of 0 mm. Figure 4 shows the test results on human red blood cells at working distances of 1–6 mm in steps of 1 mm. Due to the loss of high-frequency information over distance, the processed images after inpainting show only blurry profiles of the objects (Fig. 4a). Nevertheless, the Restore-CycleGANs can still recover the images of the objects with fine details. High-fidelity reconstructions are preserved up to a working distance of at least 4 mm. At larger working distances, the processed images after inpainting lose more information and show significantly degraded image quality, resulting in false blood-cell reconstructions by the Restore-CycleGANs, such as those obtained at the 6 mm working distance in Fig. 4a. To quantify the imaging performance, we calculate the MAEs and STDs of 1000 pairs of reconstructions and ground truths at each working distance (Fig. 4b). The image quality does not degrade rapidly with increasing working distance. Therefore, our unsupervised image reconstruction approach enables flexible working distances through a simple re-calibration procedure. Imaging at various working distances serves as a showcase of the flexible re-calibration brought by unsupervised image reconstruction. The flexibility is further demonstrated in the Supplementary Information, where we conduct numerical investigations of the imaging performance under other extreme conditions, including low-light, high-noise, and uneven illuminations (Supplementary Information: “Imaging under low-light conditions”, “Imaging under high-noise levels”, and “Imaging under uneven illuminations”). The simulations demonstrate that a simple re-calibration can enable high-fidelity imaging even under low-light illumination of 5% visibility, illumination with additional Gaussian noise (zero mean and a variance of 50), or Gaussian-distributed uneven illumination.

Fig. 4: Test results of unsupervised full-color image reconstruction with different working distances.
figure 4

The reconstruction process is calibrated and tested with a straight GALOF at each working distance. a Sample images of the objects (human red blood cells), the outputs of each reconstruction step (excluding the registration step), and the final reconstructed images. b MAEs and STDs of the reconstructions evaluated on 1000 objects at each working distance

Cross-domain generalizability

In the results above, the calibration and test are conducted on the same type of biological objects. Yet, FOISs are expected to perform high-fidelity imaging on unseen objects in real-world applications. To enhance cross-domain generalizability, it is necessary to enrich the statistical information of the objects used for image reconstruction. For this purpose, we include image data generated from various types of biological samples (Fig. 5a), namely human red blood cells, frog blood cells, human eosinophils, and human cancerous stomach tissues. For each object type, we collect 300 reference object images and 300 GALOF output images from a straight GALOF with 0 mm working distance in the transmission mode. These two sets of images are unpaired and uncorrelated. We follow the same calibration procedure demonstrated in Fig. 1. After obtaining the LUTs and the Restore-CycleGAN, we test the unsupervised image reconstruction on GALOF outputs from unseen objects, i.e., bird blood cells. Figure 5b shows sample object images, the corresponding GALOF outputs, and the processed images after each reconstruction step (excluding the registration step). The profiles and orientations of the bird blood cells can be clearly identified, despite slightly degraded image quality. This is reflected in a higher MAE over the 300 reconstructed images (Fig. 5c). The increase in MAE originates from the limited data size and object diversity, which could be addressed by further enriching the training data.

Fig. 5: Cross-domain generalizability on unseen objects.
figure 5

a In the calibration, both the GALOF outputs and the reference objects contain mixed images from four different types of samples: human red blood cells, frog blood cells, human eosinophils, and human cancerous stomach tissues. We collect 300 unpaired and uncorrelated images of each type of sample from the GALOF outputs and the reference objects. Following the same procedure (Fig. 1), we obtain the LUTs for histogram equalization and the Restore-CycleGAN. Then we test the image reconstruction on GALOF outputs from unseen objects, i.e., bird blood cells. b Sample images of the bird blood cells, the corresponding GALOF outputs, and the outputs of the reconstruction. c MAE and STD averaged over 300 reconstructions

Discussion

Robust full-color high-fidelity image transport using unsupervised learning is achieved through the combined effects of the GALOF’s properties, the pre-processing steps (registration, standardization, and inpainting), and the Restore-CycleGAN. First, the high-density localized modes result in a point-to-point transmission of the object with a high sampling ratio, which makes the inverse imaging problem well suited to unsupervised learning. In addition, the TAL-supported modes have flat responses at different wavelengths38, enabling full-color imaging. In contrast, an optical fiber bundle designed to transmit images at one wavelength may not be suitable for another wavelength13. Moreover, the robustness of the TAL-supported modes keeps the imaging performance stable under large fiber deformations, whereas a translation of a few millimeters at one end of a meter-long optical fiber bundle or MMF tends to significantly degrade the output image24,41,42,43. Second, the pre-processing steps help bring the two image domains of the GALOF outputs and the objects closer, enabling the Restore-CycleGAN to find a “natural” translation23, since unsupervised image-to-image translation often fails when the transformation between the two image domains is extreme22 (Supplementary Information: Step-by-step analysis of unsupervised image reconstruction). Third, the Restore-CycleGAN produces the high-fidelity reconstructions from the pre-processed images. Its crucial role is even more significant under high-noise conditions (Supplementary Information: Imaging under high-noise levels).

Free from the requirement of strictly paired training images, unsupervised image reconstruction streamlines the system design and calibration process, facilitates simpler and faster data acquisition, and establishes GALOF-based FOISs as a flexible and efficient imaging platform for practical applications in various circumstances. Without the constraint of paired data, the reference object images in our system can be reused repeatedly, with only the GALOF outputs needed for re-calibration when the system changes. For example, a wide range of working distances is desirable in endoscopy applications to reduce penetration damage. System re-calibration by supervised learning is impractical, since it requires collecting paired images for every change of working distance. As shown in this work, unsupervised learning enables simple re-calibration of the system to acquire high-quality images up to a working distance of at least 4 mm. Moreover, the amount of data needed for calibration is dramatically reduced. In our experiments, we acquire only 1000 GALOF outputs and 1000 reference object images for one calibration. In contrast, supervised learning typically requires tens of thousands of paired images to train a CNN.

Further improvements can be made in the GALOF fabrication and the unsupervised image reconstruction process. Currently, there are many defective pixels in the GALOF outputs, which lead to a loss of information. Future work can be devoted to investigating methods of eliminating these defective pixels. Moreover, the geometrical parameters of the GALOF can also be improved. The GALOF used in this work has an air-filling fraction of ~28%. In contrast, an air-filling fraction of ~50% has been shown to be favorable for reducing the localization lengths and improving the spatial resolution48,49. On the other hand, there is still much room for improvement in cross-domain generalizability. We expect that a larger and more diversified image dataset would enhance the image reconstruction of unseen objects in future work. Furthermore, while the Restore-CycleGAN takes only about 70 ms to reconstruct an image, the steps preceding it add a significant amount of time, bringing the total reconstruction time to about 1.6 s per image. Future studies can focus on accelerating the steps preceding the Restore-CycleGAN to realize real-time imaging.

Future studies could also investigate the behavior of unsupervised-learning-based fiber imaging under extreme conditions, both experimentally and numerically, to develop system enhancement solutions. As detailed in the Supplementary Information, our unsupervised-learning-based fiber imaging method maintains high-quality imaging capabilities under low-light, high-noise, or uneven illuminations, demonstrating significant resistance to these challenging conditions. However, we also show that our imaging method fails beyond certain critical thresholds, such as a 6 mm working distance, extremely low-light illumination of 2% visibility, or high-level Gaussian noise with a variance of 100, due to significant alterations in the statistical features of the GALOF’s outputs. This raises the question of how to quantitatively monitor the entire imaging process and evaluate the reconstruction fidelity. In Supplementary Information: “Confidence metric for image reconstruction”, we propose the correlation coefficient between the images before and after processing by the Restore-CycleGAN as a confidence metric for alerting to model failures. This correlation coefficient conforms well to the image fidelity. Despite this progress, it remains an open question whether the limitations of unsupervised-learning-based fiber imaging have been fully understood and whether a more suitable confidence metric could be employed. Consequently, further systematic experimental and numerical investigations are necessary to uncover better answers.

In conclusion, we achieve unsupervised image reconstruction through a meter-long GALOF based on its unique property of point-to-point transmission with high sampling densities. Full-color high-fidelity image transport is demonstrated on different types of biological samples in both transmission and reflection modes. The image quality is preserved when the GALOF is substantially bent at an angle of 60°. Enabled by unsupervised image reconstruction, the GALOF-based FOIS adapts flexibly to different circumstances. High image quality is maintained within a working distance of at least 4 mm using a much-simplified re-calibration. Increased cross-domain generalizability to unseen objects is also demonstrated by diversifying the calibration objects. Based on these results, we see GALOF-based FOISs as promising candidates for the next generation of FOISs.

Materials and methods

Experimental setup

In both the transmission mode and the reflection mode (Fig. 6a, b), we use a quartz halogen lamp as the light source (wavelength: ~400 nm to ~2000 nm). A lens, L1, is placed in front of the lamp to collimate the light. In the transmission mode, the collimated light illuminates the object from behind. The object image is relayed by a 10× microscope objective (MO1) (infinity-corrected, NA = 0.3) and a tube lens L2 (f = 200 mm). The magnified image is then sent to two arms by a beam splitter BS1. In the reference arm, the image is further magnified by a 20× microscope objective (MO2) (NA = 0.75) and a tube lens L3 (f = 200 mm), and then collected by the CCD1 camera (Manta G-145C). In the imaging arm, the object image is delivered through the GALOF. The GALOF is fabricated by the stack-and-draw method. It has a disordered structure with a diameter of ~278 µm and an air-hole-filling fraction of ~28.5%. A segment of ~80 cm is used in the experiment. The GALOF output is magnified by a combination of a 20× microscope objective (MO3) (NA = 0.75) and a tube lens L4 (f = 200 mm) before being collected by the CCD2 camera (Manta G-145C). In the reflection mode, the illumination light is coupled into the back aperture of MO1 by a beam splitter (BS2) and focused onto the object. We place a mirror M as a highly reflective substrate behind the object, without contact. Similar to the transmission mode, the reflected object image is magnified and sent to the two arms. For both the transmission and reflection modes, the reference arm and the imaging arm collect images separately during calibration. They operate synchronously only during the test, to evaluate the system’s imaging performance.

Fig. 6: Schematic of the GALOF-based imaging systems.
figure 6

a Experimental setup in the transmission mode. b Experimental setup in the reflection mode. L1: collimating lens. L2, L3, L4: tube lenses. MO1, MO2, MO3: microscope objectives. BS1, BS2: beam splitters. M: reflective mirror. In the transmission mode, the white light illuminates the biological object from the back. In the reflection mode, the sample illumination and the image acquisition use the same objective, MO1. In both modes, the object image is relayed by MO1 and L2 and split by BS1 into two copies. One copy propagates through the imaging arm (GALOF-MO3-L4-CCD2) and the other through the reference arm (MO2-L3-CCD1). The two arms operate separately to collect unpaired images for calibration, while both arms operate synchronously to collect paired images for the test

GALOF outputs registration and inpainting

We first convert all the GALOF outputs to grayscale images. To find the transformation for registration, we use the MATLAB “imregtform” function with the monomodal configuration and a translation geometric transformation. Inpainting is based on the MATLAB “regionfill” function, which interpolates inward from the values of the pixels on the outer boundary of the filled regions.
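For readers working outside MATLAB, a rough Python analogue of these two operations can be sketched with scikit-image (an illustration of the approach only, not the code used in this work): translation estimation by phase cross-correlation plays the role of “imregtform”, and biharmonic inpainting plays the role of “regionfill”.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.color import rgb2gray
from skimage.registration import phase_cross_correlation
from skimage.restoration import inpaint_biharmonic

def register_output(output_rgb, anchor_rgb):
    """Estimate a pure-translation offset on grayscale versions of the images
    and apply it to every RGB channel (analogue of the imregtform step)."""
    offset, _, _ = phase_cross_correlation(rgb2gray(anchor_rgb), rgb2gray(output_rgb))
    return np.stack(
        [nd_shift(output_rgb[..., c], offset, order=1) for c in range(3)], axis=-1
    )

def fill_defective(image_rgb, defective_mask):
    """Interpolate inward over the defective-pixel regions, one channel at a time
    (analogue of the regionfill step)."""
    return np.stack(
        [inpaint_biharmonic(image_rgb[..., c], defective_mask) for c in range(3)], axis=-1
    )
```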

Histogram equalization

The detailed workflow of the histogram equalization step is illustrated in Fig. 7. For each RGB channel, every pixel in the 1000 registered GALOF outputs has 1000 values varying between 0 and 255 (Fig. 7a). We calculate the PMF of each pixel by counting the occurrences of each pixel value among the 1000 values. With the dimensions of one GALOF output being N × N (N = 420), there are N × N PMFs (probability versus pixel value). For the 1000 reference objects, we treat all the pixels equally and calculate one reference PMF from the N × N × 1000 pixel values. We convert each PMF to a CDF by summing the probabilities of the pixel values smaller than or equal to a specific value (Fig. 7b). The N × N CDFs of the GALOF output pixels are then compared with the reference CDF individually. For each GALOF pixel, the pixel values are mapped to new values, e.g., 8 to 73, by matching the cumulative probabilities in the GALOF pixel’s CDF with those in the reference CDF. In this way, we obtain N × N LUTs. Consequently, the post-processed PMFs produced by the LUTs resemble the reference PMF. Finally, the outputs of the histogram equalization step are generated by transforming the pixel values using the LUTs. It is noteworthy that, for any pixel with a maximum value of less than 10 or an STD of less than 2 among its 1000 values (ranging from 0 to 255), we simply map all of its values to 0. These pixels are filled in during the inpainting step.
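A minimal NumPy sketch of the LUT generation for a single pixel and channel is given below (the array names are illustrative: `pixel_values` holds that pixel’s 1000 registered values, and `reference_values` holds the pooled N × N × 1000 reference-object values):

```python
import numpy as np

def build_lut(pixel_values, reference_values, levels=256):
    """Build a 256-entry look-up table mapping one GALOF pixel's values onto
    reference-object values with matching cumulative probability."""
    pmf_pixel = np.bincount(pixel_values.ravel(), minlength=levels) / pixel_values.size
    pmf_ref = np.bincount(reference_values.ravel(), minlength=levels) / reference_values.size
    cdf_pixel = np.cumsum(pmf_pixel)
    cdf_ref = np.cumsum(pmf_ref)
    # For every possible input value, find the reference value with the same CDF
    lut = np.searchsorted(cdf_ref, cdf_pixel)
    return np.clip(lut, 0, levels - 1).astype(np.uint8)

# Defective pixels (maximum < 10 or STD < 2 over the 1000 values) bypass this
# step: their values are mapped to 0 and recovered later by inpainting.
```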

Fig. 7: Principle of the histogram equalization.
figure 7

a Schematic of the histogram equalization step. N × N PMFs (probability versus pixel value) of the N × N pixels in the GALOF output are generated from the 1000 registered GALOF outputs. They are compared with the reference PMF generated from the 1000 reference objects to produce N × N LUTs. The pixel values in the 1000 registered GALOF outputs are then transformed according to their corresponding LUTs. b Detailed workflow of a LUT generation. The PMF of one pixel and the reference PMF are converted to CDFs. A LUT is generated by matching the pixel values with the same cumulative probability

Restore-CycleGAN

The architectures of the generator and discriminator networks in the Restore-CycleGAN are shown in Fig. 8a, b. The generator is a U-Net with skip connections. The input image has a size of 420 × 420. It passes through a series of convolutional layers down to a bottleneck and then through a series of transposed convolutional layers up to the final output. The discriminator is a PatchGAN, which examines patches of an input image and decides whether they come from real images in the target domain or from fake images produced by the generator. The detailed operations in the convolutional and transposed convolutional layers are shown in Fig. 8c. The numbers of filters in the layers of the generator are 64-128-256-512-512-512-512-512-512-512-512-512-256-128-64-3. The numbers of filters in the layers of the discriminator are 64-128-256-512-512-1.
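As an illustration, a PatchGAN discriminator with the stated filter counts (64-128-256-512-512-1) could be written in PyTorch as sketched below; the kernel size, strides, and placement of instance normalization are common PatchGAN choices assumed here for concreteness, not values taken from Fig. 8:

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN sketch with filter counts 64-128-256-512-512-1.
    Kernel size 4, stride 2 for the downsampling layers, and stride 1 for the
    last two layers are assumptions, not specifications from the paper."""

    def __init__(self, in_channels=3):
        super().__init__()

        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, 64, stride=2, norm=False),
            *block(64, 128, stride=2),
            *block(128, 256, stride=2),
            *block(256, 512, stride=2),
            *block(512, 512, stride=1),
            # final single-filter convolution yields the patch-wise real/fake map
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.model(x)
```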

Fig. 8: The generator and discriminator architectures.
figure 8

a The generator architecture in Restore-CycleGAN. b The discriminator architecture in Restore-CycleGAN. c The detailed explanation of the arrows in (a) and (b). Conv convolution, T-Conv transpose convolution, InstanceNorm instance normalization, s stride

The weights in the generators and the discriminators are initialized from a Gaussian distribution with zero mean and a standard deviation of 0.02. For translation between domain x and domain y, there are two generator-discriminator pairs: \({G}_{x\to y}\) and \({D}_{y}\) in the direction from x to y, and \({G}_{y\to x}\) and \({D}_{x}\) in the opposite direction. The loss function of the generator \({G}_{x\to y}\) can be written as:

$$\begin{array}{c}{{L}}_{{G}_{x\to y}}={{E}}_{x}[{({D}_{y}({G}_{x\to y}(x))-1)}^{2}]\\ \,+{\alpha }_{1}{{E}}_{y}[{\Vert {G}_{x\to y}({G}_{y\to x}(y))-y\Vert }_{1}]\\ \,+{\alpha }_{1}{{E}}_{x}[{\Vert {G}_{y\to x}({G}_{x\to y}(x))-x\Vert }_{1}]\\ \,+{\alpha }_{2}{{E}}_{y}[{\Vert {G}_{x\to y}(y)-y\Vert }_{1}]\end{array}$$
(1)

The four terms in Eq. (1) are the least square adversarial loss \({ {\mathcal L} }_{{\rm{LSGAN}}}\), the cycle-consistent losses \({ {\mathcal L} }_{{\rm{cycle}}}\) in both directions, and the identity mapping loss \({ {\mathcal L} }_{{\rm{identity}}}\), respectively. \({\alpha }_{1}\) and \({\alpha }_{2}\) are the weights controlling the balance among the losses. We use \({\alpha }_{1}=10\) and \({\alpha }_{2}=5\). The weights in \({D}_{y}\) and \({G}_{y\to x}\) are fixed when we train \({G}_{x\to y}\). The loss function of \({D}_{y}\) is the least square adversarial loss \({ {\mathcal L} }_{{\rm{LSGAN}}}\):

$${{L}}_{{D}_{y}}={{E}}_{y}[{({D}_{y}(y)-1)}^{2}]+{{E}}_{x}[{D}_{y}{({G}_{x\to y}(x))}^{2}]$$
(2)

The weights in \({G}_{x\to y}\) are fixed when we train \({D}_{y}\). The real images y used to train \({D}_{y}\) are randomly selected from all the images in the target domain, whereas the fake images \({G}_{x\to y}(x)\) are randomly selected from a pool of 50 fake images. The pool is randomly updated with newly generated fake images. The loss of the discriminators is halved during training. The loss functions of \({G}_{y\to x}\) and \({D}_{x}\) can be written analogously. We train the discriminators and generators for 100 epochs with a batch size of 1. We use the Adam optimizer with a learning rate of 0.0002 and a first-moment exponential decay rate of β1 = 0.5. The training takes ~40 h on a dual-GPU (GeForce GTX 1080 Ti) desktop.
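For concreteness, the losses of Eqs. (1) and (2) and the fake-image pool could be expressed in PyTorch roughly as follows (a sketch under the stated hyperparameters, not the exact training code):

```python
import random
import torch

alpha1, alpha2 = 10.0, 5.0  # weights of the cycle-consistent and identity losses

def generator_loss(G_xy, G_yx, D_y, x, y):
    """Eq. (1): least-square adversarial + cycle-consistent (both directions) + identity."""
    fake_y = G_xy(x)
    adv = ((D_y(fake_y) - 1) ** 2).mean()                  # LSGAN term
    cycle = (G_xy(G_yx(y)) - y).abs().mean() \
          + (G_yx(fake_y) - x).abs().mean()                # cycle-consistent terms
    identity = (G_xy(y) - y).abs().mean()                  # identity mapping term
    return adv + alpha1 * cycle + alpha2 * identity

def discriminator_loss(D_y, real_y, fake_y):
    """Eq. (2); the result is halved during training as described above."""
    loss = ((D_y(real_y) - 1) ** 2).mean() + (D_y(fake_y.detach()) ** 2).mean()
    return 0.5 * loss

class FakeImagePool:
    """Pool of 50 previously generated fake images used to update the discriminator."""

    def __init__(self, size=50):
        self.size, self.images = size, []

    def query(self, fake):
        fake = fake.detach()
        if len(self.images) < self.size:
            self.images.append(fake)
            return fake
        if random.random() < 0.5:            # randomly swap the new fake into the pool
            idx = random.randrange(self.size)
            old, self.images[idx] = self.images[idx], fake
            return old
        return fake
```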