Incorporating the image formation process into deep learning improves network performance

We present Richardson–Lucy network (RLN), a fast and lightweight deep learning method for three-dimensional fluorescence microscopy deconvolution. RLN combines the traditional Richardson–Lucy iteration with a fully convolutional network structure, establishing a connection to the image formation process and thereby improving network performance. Containing only roughly 16,000 parameters, RLN enables four- to 50-fold faster processing than purely data-driven networks with many more parameters. By visual and quantitative analysis, we show that RLN provides better deconvolution, better generalizability and fewer artifacts than other networks, especially along the axial dimension. RLN outperforms classic Richardson–Lucy deconvolution on volumes contaminated with severe out-of-focus fluorescence or noise and provides four- to sixfold faster reconstructions of large, cleared-tissue datasets than classic multi-view pipelines. We demonstrate RLN's performance on cells, tissues and embryos imaged with widefield, light-sheet, confocal and super-resolution microscopy.

The learning parameters associated with the estimates x_k and H_k in the k-th layer (deconvolution iteration) have the same size as the input image y and the blurring kernel H, respectively. Although the computational burden is never specified in the paper, we suspect that the large size of these parameters, combined with the necessary convolutions, very likely contributes to a long training time, perhaps infeasibly long for 3D applications. Another concern is the application of Deep-URL in tasks that require a spatially varying PSF, as Deep-URL outputs only a single PSF. In addition, Deep-URL was demonstrated only on simulated data; its performance on experimental data is unknown. RLN differs from Deep-URL in the following respects: (1) for most fluorescence microscopes, the point spread function can be measured or modelled, so RLN does not need to predict the blur kernel explicitly, which simplifies the network architecture; (2) RLN was motivated by our improved Richardson–Lucy algorithm, i.e., using an unmatched back projector5, and can achieve resolution-limited results with only one deconvolution iteration, so there is no need to specify or tune a deconvolution iteration number K; (3) RLN uses many small 3 × 3 × 3 convolution kernels to perform learning, thereby rapidly and effectively performing 3D deconvolution.
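For reference, the Richardson–Lucy update with an unmatched back projector (reference 5) that motivates RLN can be written in a few lines. The sketch below is ours, not code from either paper; the function name, the use of scipy.ndimage.convolve, and the eps guard are illustrative choices.

```python
import numpy as np
from scipy.ndimage import convolve

def rl_update(estimate, raw, psf, bp_kernel, eps=1e-6):
    """One Richardson-Lucy iteration with an (optionally unmatched) back projector.

    psf       -- forward-projection kernel (the measured or modelled PSF).
    bp_kernel -- back-projection kernel; the flipped PSF recovers classic RL,
                 while an unmatched kernel (e.g., Wiener-Butterworth) can reach
                 resolution-limited results in far fewer iterations.
    """
    fp = convolve(estimate, psf, mode="nearest")         # forward projection (FP)
    ratio = raw / np.maximum(fp, eps)                    # division (DV) step
    update = convolve(ratio, bp_kernel, mode="nearest")  # back projection (BP)
    return estimate * update                             # multiplicative RL update
```

Passing bp_kernel = psf[::-1, ::-1, ::-1] recovers the classic matched iteration; RLN instead replaces both fixed kernels with stacks of learned 3 × 3 × 3 convolutions.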
In summary, we emphasize that the architecture of RLN is unique compared to USRNet and Deep-URL, and that this unique architecture provides notable advantages, including extension to 3D imaging, the absence of a need to specify an explicit deconvolution iteration number in the network design, and considerably less computational burden than the other two networks, despite handling larger (3D) data. In addition, RLN can handle images with spatially variant blurring (Extended Data Fig. 4g), which would challenge USRNet and Deep-URL, since USRNet requires a fixed blurring kernel as input and Deep-URL predicts only a single blurring kernel. (A schematic sketch of RLN's three-part topology follows the figure captions below.)

Figure SN1.1, The overall architecture of USRNet, with the data module, prior module, and hyper-parameter module emphasized. In addition to the typical parameters required for deep learning (e.g., learning rate, batch size, decay rate), this architecture also requires the following tunable parameters (red): the deconvolution iteration number for constructing the data and prior modules, the blurring kernel, the noise level, and the scale factor. k = 1, 2, … represent intermediate iterations.

Figure SN1.2, The architecture of Deep-URL for model-aware blind deconvolution. Given a blurred image y and initial estimates of the clean image x_0 and blurring kernel H_0, the model updates x_k and H_k. The deconvolution iteration number (red) must be chosen, as it determines how many layers are needed. K = 1, 2, … represent intermediate iterations.

Figure SN1.3, The architecture of RLN for 3D deconvolution, consisting of three parts: downscale estimation starting from the average-pooled input image, H1; original-scale estimation starting from the original-scale input image, H2; and merging/fine-tuning, H3. H1 and H2 are inspired by the RL deconvolution update formula, mimicking the unmatched forward/back projector steps. H1 is used to increase the field of view and decrease the computational burden; H2 provides information to assist H1. RLN needs only the learning-based parameters typical of any data-driven network; there are no model parameters to adjust.
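A minimal PyTorch sketch of the H1/H2/H3 topology just described. This is our schematic reading, not the released RLN implementation; the class names, channel widths, layer depths, and pooling factor are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RLBranch(nn.Module):
    """FP -> DV -> BP -> multiplicative update, built from small 3x3x3 convolutions.
    A schematic reading of H1/H2; depths and channel widths are illustrative."""
    def __init__(self, ch=4):
        super().__init__()
        self.fp = nn.Sequential(nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
                                nn.Conv3d(ch, 1, 3, padding=1))  # learned forward projector
        self.bp = nn.Sequential(nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
                                nn.Conv3d(ch, 1, 3, padding=1))  # learned back projector

    def forward(self, x, eps=1e-6):
        fp = self.fp(x)            # forward projection of the current estimate
        dv = x / (fp.abs() + eps)  # division step, guarded against zeros
        return x * self.bp(dv)     # back projection, then multiplicative update

class RLNSketch(nn.Module):
    """Three-part topology: H1 (downscale), H2 (original scale), H3 (merge)."""
    def __init__(self):
        super().__init__()
        self.h1 = RLBranch()  # operates on the average-pooled input
        self.h2 = RLBranch()  # operates on the original-scale input
        self.h3 = nn.Sequential(nn.Conv3d(2, 4, 3, padding=1), nn.ReLU(),
                                nn.Conv3d(4, 1, 3, padding=1))  # merge/fine-tune

    def forward(self, x):
        low = F.avg_pool3d(x, 2)  # H1 input: widen the field of view cheaply
        h1 = F.interpolate(self.h1(low), size=x.shape[2:],
                           mode="trilinear", align_corners=False)
        h2 = self.h2(x)
        return self.h3(torch.cat([h1, h2], dim=1))  # H3 merges both estimates
```

For example, RLNSketch()(torch.rand(1, 1, 32, 64, 64)) returns a volume of the input's size; all learning sits in small 3 × 3 × 3 kernels, consistent with the roughly 16,000-parameter count quoted above.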

Supplementary Note 2: RLN's interpretability and generalizability
Interpretability refers to the extent of a human's ability to understand a model, but it appears difficult to reach consensus on the exact meaning of the term6,7. For example, some researchers explore post-hoc explanations for models, while others try to explore the interplay between the internal components and machinery of a model. RLN was motivated by, and is based on, Richardson–Lucy deconvolution (RLD) with unmatched back projectors5. As the convolution operation plays a key role in both classic deconvolution algorithms and convolutional neural networks (CNNs), RLN uses the convolutional layers of a CNN to replace the traditional convolution operation, solving the unmatched forward/backward projector design problem by combining data-driven training with an architecture that mimics the underlying RLD model. We visualized the intermediate outputs of each RLN module (Figure SN2.1), finding that the H1 and H2 modules display both data-driven learning and behavior characteristic of RLD, while H3 is purely data-driven. As shown, the FP submodule provides gradually smoother features, the DV step enhances dim signals, the final results of BP provide the update factor, and resolution is enhanced after the update step. It is difficult to explain what the first few layers of BP (images with red borders) are doing, perhaps because there is no obvious link to RLD, but it appears that the last layers of BP in both H1 and H2 provide update factors that enhance contrast for fine features (e.g., edges).
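Inspections like Figure SN2.1 can be reproduced by recording each submodule's output during a forward pass. A minimal sketch using PyTorch forward hooks against the RLNSketch stand-in defined above; the helper name and the module paths are ours.

```python
import torch

def capture_intermediates(model, volume):
    """Run one forward pass and record every submodule's output, so the FP and
    BP stages can be visualized as in Figure SN2.1. (The DV division is a
    tensor operation, not a module, so it must be recomputed from FP's output.)"""
    records, hooks = {}, []
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            hooks.append(module.register_forward_hook(
                lambda mod, inp, out, name=name: records.__setitem__(name, out.detach())))
    with torch.no_grad():
        model(volume)
    for handle in hooks:
        handle.remove()
    return records

# Example: inspect the last back-projection layer of the H1 branch.
# acts = capture_intermediates(RLNSketch(), torch.rand(1, 1, 32, 64, 64))
# print(acts["h1.bp.2"].shape)
```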
Despite efforts to explain why networks do or do not generalize, there still does not appear to be a completely satisfactory explanation8,9. RLN's generalization capability may be at least partially explained by the following: (1) although the network does not directly learn the PSF kernel used in the forward and backward projection steps, the summed effect of the convolutional layers mimics an effective PSF analogous to that used in RLD, perhaps explaining the robustness to different types of data (as shown in the figures tabulated at the end of this note; a delta-impulse probe of this idea is sketched after the figure caption below); (2) the RLD formula embedded in the network structure acts to regularize training, helping to guide non-content-based feature learning; (3) the number of learning parameters in RLN is much smaller than in CARE and RCAN, which might reduce over-fitting.

Figure SN2.1, Intermediate outputs of RLN, illustrating where RLN does and does not agree with RLD. H1 and H2 display both data-driven behavior and more interpretable output analogous to RLD. The FP submodule provides gradually smoother features, the DV step enhances dim signals (yellow arrows), the final results of BP provide the update factor, and resolution is enhanced after the update step. The first few layers of BP are not easy to understand because of the learning-based characteristic of convolutional layers (indicated by the red borders), but the last layer of BP provides an update factor that enhances contrast for fine features. H3 is based purely on data-driven learning and is used to merge the outputs of the H1 and H2 parts. Scale bar: 10 pixels.
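To make point (1) concrete, the composite kernel formed by a stack of small convolutions can be probed by pushing a delta impulse through it. A rough sketch against the RLNSketch stand-in above; because of the ReLU and division nonlinearities this is only a qualitative, locally linear picture, and the function name and probe size are arbitrary choices of ours.

```python
import torch

def effective_kernel(conv_stack, size=33):
    """Impulse response of a stack of 3x3x3 convolutions: a proxy for the
    'effective PSF' that the summed layers mimic."""
    delta = torch.zeros(1, 1, size, size, size)
    delta[0, 0, size // 2, size // 2, size // 2] = 1.0  # centered delta impulse
    with torch.no_grad():
        response = conv_stack(delta)
    return response[0, 0]

# Example: probe the learned back-projection stack of the H1 branch.
# kernel = effective_kernel(RLNSketch().h1.bp)
```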
Supplementary Fig. 1, The simulated training data generation process. Simulated training ground truth (GT) consists of dots, solid spheres, and ellipsoidal surfaces. The noiseless raw data are generated by convolving the ground truth with the PSF (enlarged at left for clarity). Scale bars: 5 μm. (A toy version of this generation-and-evaluation pipeline is sketched after the table at the end of this note.)

Supplementary Fig. 3 (fragment), … showing that the RLN prediction is closer to ground truth than RLN-a. g) SSIM and PSNR values of RLD, RLN, and RLN-a as a function of different noise types. In all cases, SSIM and PSNR decrease as noise increases, and RLN and RLN-a show considerably higher SSIM and PSNR than RLD at all noise levels. Individual values (open circles), means, and standard deviations from N = 9 volumes are shown; yellow arrows highlight an example structure better resolved by RLN than by RLN-a and RLD. Scale bars: 3 μm.

Supplementary Fig. 5, RLN performance on simulated bead samples, comparing RLD, conventional testing, and generalization. a) Raw image, ground truth (GT), RLD result, and RLN predictions in lateral (top) and axial (bottom) views. The model for generalization was trained with phantom objects consisting of mixed dots, solid spheres, and ellipsoidal surfaces, whereas the model for the conventional test used the same type of training data as the test data (simulated beads). Although RLN always outperforms RLD, the generalization result slightly distorts and sharpens bead shapes compared to the ground truth and the conventional testing result (green and yellow arrows). Scale bar: 5 μm. b) Normalized intensity of raw, RLD, and RLN predictions (y axis) vs. normalized intensity of ground truth (x axis) taken along the red line shown in the lateral view in a). The scattered red dots indicate pixel intensities, the solid blue line is the linear fit to the data, and the insets display the fitting equation and the square of the correlation coefficient (R^2). c) As in b), but for the magenta line shown in the axial view in a). These data suggest that RLN's linearity is better than RLD's.

Supplementary Fig. 7, Four-color lateral and axial maximum intensity projections and Fourier spectra of a fixed U2OS cell. Images were acquired by widefield microscopy; raw input, RLD, and RLN predictions based on a model trained on the synthetic mixed structures are compared. See also Fig. 5a-c. Red: mitochondria immunolabeled with anti-Tomm20 primary antibody and donkey α-rabbit-Alexa-488 secondary; green: actin stained with phalloidin-Alexa Fluor 647; blue: tubulin immunolabeled with mouse α-tubulin primary and goat α-mouse-Alexa-568 secondary; yellow: nuclei stained with DAPI. Images and Fourier spectra in axial and lateral views indicate that RLN better recovers resolution than RLD. Scale bars: 20 μm.

Figure references for RLN's demonstrated capabilities:
Generalization ability on simulated data: Fig. 1d, Fig. 1f, Supplementary Fig. 3g
Deconvolution ability on biological samples: Fig. 2b
Generalization ability on biological samples: …
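A toy version of the simulation-and-evaluation pipeline from Supplementary Figs. 1 and 3g. Everything here is illustrative: a Gaussian blur stands in for the real PSF, ellipsoidal surfaces are omitted, and the volume size, object counts, and Poisson photon budget are arbitrary choices of ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)

def make_phantom(shape=(64, 64, 64), n_dots=30, n_spheres=5):
    """Toy ground truth in the spirit of Supplementary Fig. 1:
    random dots plus solid spheres (ellipsoidal surfaces omitted)."""
    gt = np.zeros(shape, dtype=np.float32)
    for _ in range(n_dots):
        z, y, x = (rng.integers(0, s) for s in shape)
        gt[z, y, x] = rng.uniform(0.5, 1.0)
    zz, yy, xx = np.indices(shape)
    for _ in range(n_spheres):
        c = rng.integers(8, 56, size=3)
        r = rng.integers(3, 7)
        gt[(zz - c[0]) ** 2 + (yy - c[1]) ** 2 + (xx - c[2]) ** 2 <= r ** 2] = 1.0
    return gt

gt = make_phantom()
blurred = gaussian_filter(gt, sigma=(2.0, 1.0, 1.0))        # anisotropic blur as a stand-in PSF
noisy = rng.poisson(blurred * 50).astype(np.float32) / 50.0  # Poisson (shot) noise

print("SSIM:", structural_similarity(gt, noisy, data_range=1.0))
print("PSNR:", peak_signal_noise_ratio(gt, noisy, data_range=1.0))
```

Sweeping the noise level and comparing each method's output against the ground truth with these same metrics yields curves like those in Supplementary Fig. 3g.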