A fast blind zero-shot denoiser

Image noise is a common problem in light microscopy. This is particularly true in real-time live-cell imaging applications in which long-term cell viability necessitates low-light conditions. Modern denoisers are typically trained on a representative dataset, sometimes consisting of just unpaired noisy shots. However, when data are acquired in real time to track dynamic cellular processes, it is not always practical or economical to generate these training sets. Recently, denoisers have emerged that allow us to denoise single images without a training set or knowledge about the underlying noise. But such methods are currently too slow to be integrated into imaging pipelines that require rapid, real-time hardware feedback. Here we present Noise2Fast, which can overcome these limitations. Noise2Fast uses a novel downsampling technique we refer to as 'checkerboard downsampling'. This allows us to train on a discrete 4-image training set, while convergence can be monitored using the original noisy image. We show that Noise2Fast is faster than all similar methods, with only a small drop in accuracy compared to the gold standard. We integrate Noise2Fast into real-time multi-modal imaging applications and demonstrate its broad applicability to diverse imaging and analysis pipelines.


Main
Image noise is the random fluctuation of color or intensity values that is inherent to image acquisition. It usually presents as a hazy shroud that obscures an otherwise clear visual signal. Image denoising methods try to fix this by removing noise after the fact, usually by exploiting the innate structure and pattern of the underlying signal and leveraging it against the apparent stochasticity of the noise. Denoising is particularly important in live-cell imaging applications, where a balance between the conflicting considerations of resolution, phototoxicity and throughput can force experimenters to accept a considerable amount of noise as necessary to achieving their goals.
Many techniques focus on explicitly modelling noise based on an understanding of its origin; for example, it is known that confocal microscopy is mainly subject to a combination of Gaussian- and Poisson-distributed noise [1]. However, with the advent of deep learning, such explicit models can be avoided by instead training a neural network to map noisy images to their clean counterparts, as in DnCNN [2], or even just by training it to map noisy pairs of images to one another, as in Noise2Noise [3]. But trained methods like these cannot be expected to perform well on image types that were not well represented in the training set. In cases where we do not have access to such training data, alternatives must be considered.
One such alternative is Noise2Void [4]. Noise2Void denoises images using a masking procedure wherein the neural network learns to fill in pixel gaps in the noisy image. The network's failure to learn the noise causes it to denoise the underlying image. Although it was trained on entire datasets of images with similar noise levels in the original paper, Noise2Void can be adapted to denoise single noisy images based purely on the information contained within that image, appealing to no outside information or pretrained weights. This basic process was improved and generalized in Noise2Self [5] and further refined in Self2Self [6] to achieve single-image denoising results that are competitive with traditional fully trained methods. Recently, an approach called Recorrupted2Recorrupted (R2R) [7], based on corrupting the input image into pairs of new noisy realizations, has emerged and achieved better results than Self2Self on real-world noise. However, all viable single-image denoisers to date require a considerable amount of time to run, making them impractical for use on high-resolution microscopy images in time-sensitive situations.
To alleviate this, we propose Noise2Fast. Noise2Fast is similar to the masking-based methods in that the network is blind to many of the input pixels during training. Our method is inspired by a recently published approach called Neighbor2Neighbor [8], in which the neural network learns a mapping between adjacent pixels. We tune our method for speed by using a discrete four-image training set obtained through an unusual form of downsampling we refer to as "checkerboard downsampling", and we train a fairly small neural network on this discrete training set, validating against the original full-sized image to determine convergence. Noise2Fast is faster than all compared methods, and is more accurate than all tested methods except Self2Self, which takes well over 100 times as long to denoise a single image.

Theoretical Background
Consider a 2D image x ∈ ℝ^{m×n} composed of both signal and noise, s, n ∈ ℝ^{m×n}. That is to say,

x = s + n. (1)

Denoising is concerned with the inverse problem of inferring s from x (or, equivalently, inferring n and then solving for s). A neural network attempts to solve this problem by finding a function f_θ : ℝ^{m×n} → ℝ^{m×n} (parameterized by the network weights θ) such that

f_θ(x) ≈ s. (2)

The most intuitive way to train such a network is by using pairs of noisy/clean images and having the network learn a mapping from one to the other. Noise2Noise takes an alternate approach by training the network to learn a mapping between different noisy shots of the same image, allowing for training in the absence of clean ground-truth data. Specifically, given two noisy realizations of the same underlying signal, s + n_1 and s + n_2, Noise2Noise attempts to learn the mapping

f_θ(s + n_1) → s + n_2. (3)

However, if we assume mean-zero noise and choose a sensible loss function [3], the network may fail to actually learn the noise n_2, leaving us with

f_θ(s + n_1) ≈ s, (4)

and we will have denoised the image as a result. Although elegant, this method still requires pairs of noisy images to train on.
Recently, interest has grown in methods that can denoise single noisy images, without this added requirement.To fully understand these methods, we need to adopt a different perspective of how neural networks denoise images.
Here, we take the view of Krull et al. [4], based on the concept of receptive fields. The receptive field of a fully convolutional neural network (FCN) is the set of input pixels that were taken into consideration for a given output pixel prediction. For example, in our above scenario suppose (i, j) ∈ N_{≤m} × N_{≤n} are the co-ordinates of some pixel in the output image f_θ(x). Then the receptive field of that pixel is the set of indices RF(i, j) ⊆ N_{≤m} × N_{≤n} such that f_θ(x)(i, j) depends only upon the value of x|_{RF(i,j)} (typically this will be a small square patch of the image x). We can then view the neural network as a mapping from the input image along some receptive field to its corresponding output pixel, with the goal of finding θ such that

f_θ(x|_{RF(i,j)}) ≈ s(i, j) (6)

for every (i, j) ∈ N_{≤m} × N_{≤n}. The question, though, is how to train these networks without any actual training data other than the noisy image itself. Blind-spot methods approach this by excluding the center pixel from the receptive field (either by removing/replacing it [5, 4] or ignoring it altogether using partial convolutions [6]), and training the network to recover this center pixel from its surroundings. More specifically, they train the network to learn the mapping

f_θ(x|_{RF(i,j)∖{(i,j)}}) → x(i, j). (7)

However, just as in Noise2Noise, the network fails to learn the noise, leaving us with

f_θ(x|_{RF(i,j)∖{(i,j)}}) ≈ s(i, j).

Excluding the center pixel is crucial, and ensures that the network does not just learn the identity. However, a side effect of this is that the neural network does not give proper weight to the pixel itself when computing the output, which is unfortunate, since the pixel itself is always going to be the best individual predictor of its denoised value.
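As a concrete illustration of the masking step, here is a minimal NumPy sketch in the spirit of Noise2Void's replace-with-neighbour strategy (the function name and window size are our own choices, not the authors' code):

```python
import numpy as np

def mask_random_pixels(img, n_masked=64, rng=None):
    """Replace a few randomly chosen pixels with a random neighbour from
    a 5x5 window, returning the masked image and the masked coordinates.
    A blind-spot network is trained to predict the original values at
    exactly these coordinates, so it can never just learn the identity."""
    rng = np.random.default_rng() if rng is None else rng
    masked = img.copy()
    h, w = img.shape
    ys = rng.integers(0, h, n_masked)
    xs = rng.integers(0, w, n_masked)
    for y, x in zip(ys, xs):
        dy, dx = 0, 0
        while dy == 0 and dx == 0:          # exclude the pixel itself
            dy, dx = rng.integers(-2, 3, 2)
        masked[y, x] = img[np.clip(y + dy, 0, h - 1), np.clip(x + dx, 0, w - 1)]
    return masked, ys, xs
```

The loss is then computed only at the masked coordinates, so the network learns to recover each centre pixel from its surroundings rather than copying it.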
Our method takes a related, but slightly different, approach. Instead of masking the input image, we explicitly divide the input image in two, using a simple downsampling method that we refer to here as "checkerboard downsampling". This process is easier to visualize than to explain (see Figure 1): we take our input image x and split it into two smaller images composed of the even pixels (where i + j is even) and the odd pixels (where i + j is odd), respectively, compressing them into the two following m × (n/2) images

x_even(i, j) = x(i, 2j + (i mod 2)), (8)
x_odd(i, j) = x(i, 2j + ((i + 1) mod 2)). (9)

We can call these the "up" checkerboard downsamples, since they involve shifting everything up one pixel to close the image. This type of downsampling is highly susceptible to aliasing; however, the goal here isn't visual clarity, it's to preserve the original noise model of our image as much as possible. Now suppose we train our neural network to learn the mapping

f_θ(x_even) → x_odd. (10)

Figure 1. Checkerboard downsampling illustrated. We take our initial image, remove one half of all pixels in a checkerboard pattern, and shift the remaining pixels to fill in the gaps left behind.
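For illustration, the "up" checkerboard downsampling described above can be written in a few lines of NumPy (our sketch, not the published implementation; assumes the image width n is even):

```python
import numpy as np

def checkerboard_downsample(x):
    """Split x (m x n, n even) into the two m x n/2 'up' checkerboard
    downsamples: the even pixels (i + j even) and odd pixels (i + j odd),
    with the surviving pixels shifted to close the gaps."""
    m, n = x.shape
    i = np.arange(m)[:, None]            # row indices, shape (m, 1)
    j = np.arange(n // 2)[None, :]       # compressed column indices
    x_even = x[i, 2 * j + (i % 2)]       # x_even(i, j) = x(i, 2j + (i mod 2))
    x_odd = x[i, 2 * j + ((i + 1) % 2)]  # the complementary pixels
    return x_even, x_odd
```

Applying the same operation to the transposed image (and transposing back) yields the analogous "left" downsamples.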
We can rewrite this as

f_θ(s_even + n_even) → s_even + n_odd + (s_odd − s_even). (11)

Notice that this is analogous to Noise2Noise (3), except for the addition of the (s_odd − s_even) term. However, for every (i, j) ∈ N_{≤m} × N_{≤n}, s_odd(i, j) and s_even(i, j) are adjacent pixels in the original image signal; it is therefore reasonable to expect this term to be very small in all but the most highly dynamic regions. Indeed, in our testing, we found that even if we cheat and subtract out the term using known ground-truth values, there was no measurable gain in denoising performance. We therefore claim that, for most natural images,

s_odd − s_even ≈ 0. (12)

Then, analogous with Noise2Noise (3, 4), training our network as outlined in (10) should, in effect, find weights θ such that

f_θ(x_even) ≈ s_even. (13)

However, in our experiments we have witnessed a much stronger result than this. In particular, we observe that a network trained as in (10) will not just learn to denoise the downsampled image, but the entire image as a whole. That is,

f_θ(x) ≈ s. (14)

To explain this phenomenon, we return to the receptive-field-based perspective of (6). In this case, our network is trained to learn the mapping

f_θ(x_even|_{RF(i,j)}) → x_odd(i, j). (15)

It is known, and is often exploited by denoising algorithms, that single images contain significant internal redundancy in the form of recurrent patches [9]. It is also known, and is crucial to some super-resolution methods, that single images have a certain degree of self-similarity, and hence these patches also recur across scales [10, 11]. This across-scale patch recurrence implies a level of commonality between the set of downsampled receptive fields {x_even|_{RF(i,j)}} and the set of full-resolution receptive fields {x|_{RF(i,j)}}. Hence, a neural network trained to learn (10) may be applicable to the overarching denoising task. Our method uses this basic principle to generate a small training set of four image pairs (each one produced by a form of checkerboard downsampling). This compact training set allows for rapid network convergence, and hence quick single-image denoising results that were previously unattainable with such a high degree of accuracy.

Contribution and Significance
We present a training scheme and accompanying neural network for blind single-shot denoising. Our method uses only information contained within the single noisy input image to train its weights. Our two main contributions are as follows:
• A novel denoising method which combines an unusual downsampling method with Neighbor2Neighbor.
Our method uses checkerboard downsampling to generate a small fixed dataset for rapid convergence. We then apply our network trained on this smaller dataset to denoise the larger input image. We also use the larger input image to validate our network and determine when it has reached maximum accuracy, which is necessary because Noise2Noise-based methods can start to overfit at a certain point, after which performance drops. To our knowledge, this procedure as a whole is novel; in particular, we are unaware of another denoising method that makes use of this unusual form of downsampling, nor are we aware of any other single-image blind denoisers that attempt to build a small fixed training set of images. We are also unaware of any other method that successfully employs a one-size-fits-all validation strategy to avoid overfitting, among methods that are susceptible to this.
• High accuracy and significant speed gains over existing methods. Our method is tailored specifically for speed; using a small four-image dataset ensures rapid convergence, while the theoretical underpinnings of our method ensure its accuracy. In terms of PSNR and SSIM, the only tested method more accurate than the one we propose here is Self2Self, which is 300-700 times slower: for example, it requires an average of over 3 hours to denoise a 512×512 image that our method can denoise in under 45 seconds on an RTX 5000 mobile GPU. This large time investment makes it impractical for on-demand usage in larger image screens wherein denoising is but one step in a much bigger process.
Single-shot blind denoisers are a valuable tool for their convenience and their broad applicability. However, thus far, the only such tools that achieve accuracy comparable to trained methods are Self2Self and possibly R2R [7], both of which require a massive amount of time to run on modern hardware. Here, we present a much faster alternative, with only a small drop in accuracy.
In particular, the speed of our tool makes it usable in smart microscopy pipelines (e.g. [12, 13, 14]) in which the microscope captures an image and then responds to information contained within that captured image (e.g. zooms in on any cells undergoing mitosis in a given field of view); denoising is typically the first step in such pipelines, ensuring better classification by the downstream neural networks and/or manual analysis. In such pipelines, time is of the essence, since there is only a small window in which to do analysis, and gathering training data for traditional training-based denoising methods can be expensive both in terms of cost and the delays it imposes on research. Additionally, our tool allows for the processing of very large datasets, such as 3D time-lapse videos of cells, with a compute overhead hundreds of times smaller.

Related Work
Methods that require a training set: The first attempt to apply Convolutional Neural Networks (CNNs) to the task of denoising was in [15]. This was heavily refined in both the works of Mao et al. [16] and Zhang et al. [2] (DnCNN) to achieve performance that is still competitive today. Zhang et al. later released FFDNet [17], a denoising CNN designed with speed in mind which, similar to our method, also uses downsampling, although in a different manner and to an entirely different end (see [18]).
The main benefit of trained methods, beyond their outstanding performance, is that they don't require either implicit or explicit modeling of the type and structure of the noise; they can simply be trained on noisy/clean pairs of images from the desired domain. However, their reliance on noisy/clean image pairs, either real or synthetically generated, can be considered a limitation in situations where we do not have access to ground-truth images to train on.
To overcome this limitation, Noise2Noise was developed [3].Noise2Noise can be trained exclusively on pairs of noisy images without any access to ground truth data.For this reason, it is especially useful in biological imaging where it is often the case that trade-offs dictate that ground truth data can't ever be obtained.
However, paired noisy images aren't always easy to obtain, so there was interest in developing methods that could be trained on unpaired sets of noisy images from some desired domain. The first method capable of this without having sensitive hyperparameters was Noise2Void [4]. Noise2Void works by training the network to learn a mapping from the noisy image back to itself, masking the center of each receptive field so as to avoid learning the identity.
This basic model of masking the input is known as a blind-spot network, and was heavily refined and expanded upon in [19] and much more recently applied in BP-AIDE [20] in a manner that is specifically tailored to Gaussian-Poisson noise. A retooled version of BP-AIDE with much faster inference time is demonstrated in [21].
A recently developed alternative to blind-spot networks is Neighbor2Neighbor [8], which underlies the method we present in this paper. Neighbor2Neighbor learns to map adjacent pixels in the image to one another, with the idea being that, except in the most highly dynamic regions of the image, adjacent pixels tend to have a similar underlying signal.
Ultimately, all methods listed in this section require a representative training set of noisy images to train on before being applied. Although we can fairly easily extend some of them to apply to single noisy images without any additional outside information (which we do in our comparisons), in the next section we describe methods that were specifically developed with this task in mind.
Single-image methods: The first method that directly applied itself to the task of single-image denoising is Noise2Self [5]. Noise2Self is a method very similar to Noise2Void that achieves slightly better performance, and includes a very thorough mathematical justification for the principles underlying the success of masking-based denoising techniques.
Self2Self [6] was the first single-image method whose performance approaches fully trained methods. Self2Self is a blind-spot method; however, instead of replacing masked pixels, it ignores them altogether by using partial convolutions [22]. Self2Self also introduces the innovative step of adding dropout and averaging across multiple runs of the same image. However, this comes at a high computational cost, at least under modern hardware constraints.
A very recently published single-image denoiser is R2R [7], which achieves even better single-image denoising results than Self2Self on real-world images. R2R is quite different from the blind-spot network approaches in that it corrupts the single noisy image into noisy image pairs, and then applies a Noise2Noise-like network.
All of the above methods make few assumptions and run out-of-the-box on most single noisy images (with the possible exception of R2R [7], which we have not tested, as it is quite recent and there is no publicly available code yet). In the next section, we present single-image methods that are not quite as generically applicable.
Single image methods with sensitive hyperparameters: NL-means [23] is one of the easiest and most intuitive non-learning-based ways to denoise single images. It denoises by taking the weighted average of all pixels in an image based on how similar we would expect each pixel to be to the target (determined by comparing small square patches centered at those pixels). NL-means is, however, highly sensitive to a filtering parameter that must be specified by the user for optimal performance.
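A minimal, deliberately unoptimized NumPy sketch of the NL-means idea, with `h` standing in for the sensitive filtering parameter mentioned above (our illustration, not the reference implementation):

```python
import numpy as np

def nl_means(img, patch=3, window=7, h=10.0):
    """Each output pixel is a weighted average of pixels in a search
    window, weighted by the similarity of the small square patches
    around them. `h` controls how quickly weights decay with patch
    distance and must be tuned for good results."""
    pad = patch // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    m, n = img.shape
    half = window // 2
    out = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            p = padded[i:i + patch, j:j + patch]        # reference patch
            num = den = 0.0
            for a in range(max(0, i - half), min(m, i + half + 1)):
                for b in range(max(0, j - half), min(n, j + half + 1)):
                    q = padded[a:a + patch, b:b + patch]
                    w = np.exp(-np.sum((p - q) ** 2) / (h * h))
                    num += w * img[a, b]
                    den += w
            out[i, j] = num / den
    return out
```

Real implementations restrict and vectorize the search heavily; the weighting principle, and the sensitivity to `h`, are the same.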
A similar method is BM3D [24]. Since its introduction, BM3D has been one of the gold standards for pure Gaussian noise. It works by unfolding the image into interleaved square patches, clustering those patches based on similarity, and then filtering them before reconstructing the image. BM3D, however, is not blind and takes, as a parameter, an estimate of the standard deviation of the underlying noise. Moreover, BM3D does not perform well (and was not designed to perform well) on Poisson noise.
A much more recent learning-based method is Deep Image Prior (DIP) [25]. DIP works by taking a neural network with randomly initialized weights and training it to reconstruct the noisy image. Similar to Noise2Noise, it will fail to learn the underlying noise (at least at first) and instead learn to denoise the signal. DIP is highly sensitive to the number of iterations and will quickly overfit if trained too long; for this reason, it isn't completely practical as a blind denoiser. For our experiments, we force it to be blind by using a fixed iteration number; however, the results it attains are far below what a non-blind version of this algorithm can reach.

Results
We demonstrate the speed and accuracy of our method on both simulated Gaussian noise and on real-world microscopy data that is subject to Gaussian-Poisson noise. We also benchmark our method on the BSD68 dataset [26] with synthetic Gaussian noise added. See Methods for details on datasets and compared algorithms.
The benchmarking of reference datasets was carried out using a single laptop GPU (RTX 5000 mobile GPU) to better approximate the modest (although still powerful) computational capabilities of the average end user.However, because of the massive amount of time required to test Self2Self on 68 images under these constraints, we rely on their previously published accuracy for comparison and estimate time per image using a random sample of 5 images for this dataset only.On all other datasets (Set12 and Confocal), we run Self2Self on the entire set to obtain accuracy.
On synthetic Gaussian noise our method outperforms everything except Self2Self, which beats us by an average of about 0.7 dB PSNR across Set12 and BSD68 (see Figure 3a). We did not test our method against the very recent R2R [7] because there is no publicly available code for it yet; however, we note that its authors state that their method takes about 30 minutes to run on a 512 × 512 × 3 image using unstated hardware. We therefore feel comfortable saying that Noise2Fast is faster than all competing methods, by a significant margin in every case except DIP3000 (which is far less accurate than ours).
We also tested our method on confocal microscopy images, where again our method is considerably faster (300 times) although slightly less accurate than Self2Self. Visual comparison of the results (see Figure 3c) indicates that Noise2Fast smooths the image less than the other methods, creating a more textured look. All methods performed very similarly on the Confocal microscopy dataset, except for DIP3000, which likely needed more iterations to converge, and Neighbor2Neighbor, which does not seem suited to single-image denoising (nor was it ever intended to be; we adapted it for single-image denoising and included it in our benchmarks simply because it is the method most comparable to our own).
Because 'speed' is just a reflection of the maximum number of iterations we allow each method to run (a parameter we borrow from their published code where possible), we also compared the accuracy of each method if we set the maximum number of iterations so that each program only runs for as long as Noise2Fast takes to fully denoise the image (see Figure 3d).In this case, it is easy to see that no competitor even approaches the accuracy Noise2Fast can achieve in such a short amount of time.
Next, we determined the performance of Noise2Fast on larger image datasets of both fixed and live cells acquired on our imaging systems. For this, MDA-MB 231 cells were fixed and either stained for Actin and DNA or endogenously tagged with H33B-mScarlet and mNeon-ACTB. The performance was compared using two different imaging modalities: epifluorescence for the fixed cells and resonance-scanning confocal microscopy for the live cells. Based on the linearity of the intensity measurements of our imaging system, our results indicate that we can achieve relatively clear images while exposing our samples to 400 times less light (see Figure 4). Although S2S achieves similar, if not slightly better, results, its processing time was significantly longer: S2S required 596 core-days versus 0.7 for our method to process the video in Fig. 4b on a Tesla V100.

Conclusion
We proposed Noise2Fast, a blind single-image denoiser that rapidly converges to accurate results using only the input image to train on. Our key innovation is building a small discrete training set based on checkerboard downsampling that enables our network to converge quickly; training progress can be monitored using the original noisy image for validation. The accuracy of our method surpasses all but one tested blind single-image denoising method, namely Self2Self; however, Self2Self takes well over 100 times longer to run and is therefore impractical in situations where fast results are desired, such as high-throughput and automated-microscopy-based pipelines. To this end, we hope our very fast and accurate method will integrate well with other AI-based smart microscopy pipelines, such as genome-wide single-cell phenotypic screens [14], where denoising will be the initial step in improving the ease and speed of downstream analysis. We also hope that the speed and generality of our method will make it attractive as a quick, near real-time denoising solution applicable to large live-cell volumetric datasets. Additionally, we believe the observed superiority of checkerboard downsampling over traditional 2 × 2 downsampling is noteworthy, and its implications for full-dataset-based denoising methods such as Neighbor2Neighbor might be a worthwhile subject of future research.

Noise2Fast Implementation Details
Here, we outline the specifics of our neural network and training scheme, giving the implementation details of the process outlined earlier.
For our neural network, we use a simple feed-forward architecture, which we explain briefly here and illustrate in Figure 2a. We start by performing two 32-channel 3×3 convolutions with ReLU activation. We repeat this step three more times, each time doubling the number of channels. In the final step, we perform a 1×1 convolution followed by sigmoid activation.
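Read literally, that description can be sketched in PyTorch as follows (a sketch under our reading of the text; the channel widths beyond the first 32 and the padding choice are our assumptions, not published code):

```python
import torch.nn as nn

def make_noise2fast_net(in_ch=1):
    """Simple feed-forward denoiser: pairs of 3x3 convolutions with ReLU,
    doubling the channel count at each of four stages, finished by a 1x1
    convolution with sigmoid so outputs land in (0, 1)."""
    layers, ch = [], in_ch
    for width in (32, 64, 128, 256):     # two 3x3 convs per stage
        layers += [nn.Conv2d(ch, width, 3, padding=1), nn.ReLU(),
                   nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
        ch = width
    layers += [nn.Conv2d(ch, in_ch, 1), nn.Sigmoid()]   # 1x1 conv + sigmoid
    return nn.Sequential(*layers)
```

Because the network is fully convolutional, the same weights can be trained on the small downsampled images and then applied to the full-sized noisy image.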
In our initial testing we found that this much simpler architecture outperformed the classical U-net architecture used in the original Noise2Noise paper [3]. Although the results aren't that sensitive to the number of hidden layers, we do find a noticeable, albeit small, drop in performance as we add more to our current model. A possible reason for this is that it causes our network to overfit the data much too quickly. This architecture is similar in its simplicity to DnCNN, one major difference being our lack of batch normalization.

The main novelty of our method is how we train it. Consider a 2D image x ∈ ℝ^{m×n}. Recall from the theoretical background that we can divide our image in two by using checkerboard downsampling. By taking the even or odd pixels and squeezing them up to fill in the spaces, as depicted in Figure 1, we can generate two downsampled m × (n/2) images

x_even(i, j) = x(i, 2j + (i mod 2)),
x_odd(i, j) = x(i, 2j + ((i + 1) mod 2)).

We can call these the "up" checkerboard downsamples. Notice that we can also squeeze the pixels left instead, generating two (m/2) × n images and giving us the "left" checkerboard downsamples. Using these, we construct the following four-pair training set (see Figure 2b for an overview of our training scheme):

Input → Target
x_even (up) → x_odd (up)
x_odd (up) → x_even (up)
x_even (left) → x_odd (left)
x_odd (left) → x_even (left)

We feed this training data one-by-one into our neural network (batch size = 1). At each iteration we compute the binary cross-entropy (BCE) loss between the target and the output of our neural network, and adjust our weights using the Adam optimizer [27] with the learning rate set to 0.001. For validation, after each epoch we run the original full-sized noisy image through our network and compute the mean squared error (MSE) between the noisy input image and the "denoised" output of our network.
We validate this way because our initial testing on images with known ground truths showed that the MSE between the denoised image and the ground-truth image plateaued at roughly the same time as the MSE between the denoised image and the original noisy image. After this point, results start to get worse, much like the way DIP starts to overfit at a certain point; however, in our case we can use this validation protocol to prevent that without introducing a sensitive, case-dependent parameter. If one hundred epochs have passed without any improvement to the best validation score, we terminate the program and output the average of the last one hundred validation outputs as the denoised image. One unusual feature of our training scheme is our usage of a BCE loss function, usually reserved for classification-based tasks. Our main motivation for using this loss function is to deal with class imbalance: when an image contains only a small object on a black background, the network will sometimes just learn to map everything to black. We found that using BCE loss fixed this in all cases we could find, without affecting overall performance as compared to MSE loss. We also tried binary focal loss [28], but the results were not as good.
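Putting these pieces together, the training and validation scheme might look as follows in PyTorch (a hedged sketch of our reading, with function names of our own; it assumes a fully convolutional `net` ending in a sigmoid, a noisy image scaled to [0, 1] with even dimensions, and it omits details of the published code):

```python
import torch
import torch.nn.functional as F

def train_noise2fast(net, noisy, max_epochs=3000, patience=100, lr=1e-3):
    """Train `net` on the four checkerboard pairs with BCE loss and Adam,
    validate each epoch with MSE against the full noisy image, stop after
    `patience` epochs without improvement, and return the average of the
    last `patience` validation outputs as the denoised image."""
    def split(x):                                # 'up' checkerboard downsamples
        m, n = x.shape
        i = torch.arange(m)[:, None]
        j = torch.arange(n // 2)[None, :]
        return x[i, 2 * j + i % 2], x[i, 2 * j + (i + 1) % 2]

    e_up, o_up = split(noisy)
    e_lf, o_lf = (t.T for t in split(noisy.T))   # 'left' downsamples
    pairs = [(e_up, o_up), (o_up, e_up), (e_lf, o_lf), (o_lf, e_lf)]
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    best, since_best, recent = float("inf"), 0, []
    for _ in range(max_epochs):
        for inp, tgt in pairs:                   # batch size 1
            opt.zero_grad()
            out = net(inp[None, None])           # add batch/channel dims
            F.binary_cross_entropy(out, tgt[None, None]).backward()
            opt.step()
        with torch.no_grad():                    # validate on the full image
            den = net(noisy[None, None])[0, 0]
        val = F.mse_loss(den, noisy).item()
        recent = (recent + [den])[-patience:]
        if val < best:
            best, since_best = val, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return torch.stack(recent).mean(0)
```

Note that the "up" pairs have shape m × n/2 while the "left" pairs have shape m/2 × n; a fully convolutional network accepts both, as well as the full-sized validation image.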

Compared Datasets
For blind Gaussian denoising we use the grayscale BSD68 [26] dataset, as was used in [4] and a multitude of other denoising papers. BSD68 consists of 68 clear 481×321 photographs to which we add synthetic Gaussian noise. However, to show the effect of spatial resolution on speed and performance, we additionally tested the methods on Set12, which contains a mixture of 256×256 and 512×512 images.
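The synthetic corruption used for these benchmarks is simply additive zero-mean Gaussian noise; a minimal sketch (note that, as described under Compared Methods, we do not clip the result to [0, 255]):

```python
import numpy as np

def add_gaussian_noise(clean, sigma=25.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`
    (in 8-bit intensity units) to a clean image. The result is NOT
    clipped to [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    return clean.astype(np.float64) + rng.normal(0.0, sigma, clean.shape)
```

sigma = 25 is a common benchmark noise level for grayscale denoising experiments.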
For performance on real-world confocal microscopy, we used a subset of the confocal microscopy images in the Fluorescent Microscopy Dataset (FMD) [1] that we refer to as "Confocal". This dataset contains, among other things, images of biological materials such as cells, zebrafish, and mouse brain tissues acquired using commercial confocal microscopes. As described in their paper, ground-truth values are estimated by averaging together all 60000 noisy images in a given set.

Compared Methods
We compare denoising and speed performance against five other blind single-image denoisers: Noise2Self [5] (N2S), Noise2Void [4] (N2V), Self2Self [6] (S2S), Neighbor2Neighbor [8] (Ne2Ne) and Deep Image Prior [25] (DIP). Not all of these methods were originally designed for single-image denoising. We describe how we configured each of these methods in turn; we adhere to published code as much as possible.
Self2Self: For S2S we use the default published settings of 150000 iterations and a learning rate of 1e-4. We standardize our images differently than S2S and some of these other methods; for example, we do not clip our input noisy data to [0, 255] at any point. To account for this difference, we have rewritten the dataloaders for Self2Self and other methods to ensure consistency of comparison.
Noise2Self: For N2S the only change we make from their published single shot denoising notebook is to increase the number of iterations from 500 to 20000, as we found that 500 iterations wasn't nearly enough to achieve good results on these datasets.
Noise2Void: For N2V we found that their ImageJ plugin worked much better than their GitHub code for single-image denoising. We therefore used the ImageJ version for benchmarking purposes, which is why our results on this method deviate so much from previous publications. We used a patch size of 64 with 100 epochs and 100 steps per epoch, a batch size of 16 per step, and a neighborhood radius of 5.
DIP: If we fix the maximum number of iterations, DIP becomes a blind denoiser. However, as noted in [6], it performs better as a non-blind denoiser. For comparison purposes, however, we set the maximum number of iterations at 3000, as the authors of DIP have done in their example code on GitHub, and call this DIP3000. This turns it into a blind single-shot denoiser, fully comparable in scope to our method.
Neighbor2Neighbor: For Neighbor2Neighbor we used the adaptation of the code found here: https://github.com/neeraj3029/Ne2Ne-Image-Denoising. We adapted the script to single-image denoising and attempted in good faith to optimize for the task as best we could; however, we found that the results were inconsistent. We believe that this method is probably best suited to datasets, as the authors intended, and not single images. We include these results only to illustrate the need to change Neighbor2Neighbor in order to achieve fast and accurate single-image denoising results, as we have done in this paper. We do not believe our results are a fair illustration of the power of Neighbor2Neighbor when applied to the tasks it was designed for, and we have therefore excluded it from our visual illustrations. We used a learning rate of 0.0003 and trained for 100 epochs, as suggested in their paper for synthetic datasets.

Fluorescence Microscopy Images
For fixed immunofluorescence microscopy, RPE-1 cells were fixed with 4% paraformaldehyde at room temperature for 10 min. The cells were then blocked with a blocking buffer (5% BSA and 0.5% Triton X-100 in PBS) for 30 min. Cells were washed with PBS and subsequently incubated with phalloidin-Alexa488 (Molecular Probes) and DAPI in blocking solution for 1 hour. After a final wash with PBS, the coverslips were mounted on glass slides by inverting them onto mounting solution (ProLong Gold antifade; Molecular Probes). For the fixed imaging in Figure 4A, single Z slices of cells were imaged using a Nikon Ti2E/CREST X-Light V2 LFOV25 spinning disk confocal microscope in widefield mode with a 60×/1.4 NA oil-immersion Plan-Apochromat lambda objective. The microscope was outfitted with a Photometrics Prime95B 25 mm FOV ultrahigh-sensitivity sCMOS camera, and images were captured with no binning using the full 25 mm diagonal FOV at 1608 px by 1608 px with a bit depth of 16 bits. After capture, 500 px by 500 px areas were cropped and used as our input dataset. For live imaging in Figure 4B, endogenously tagged MDA-MB 231 cells were seeded in Nunc Lab-Tek Chamber Slides and imaged on a Nikon Ti2E/A1R-HD25 scanning confocal microscope with temperature and CO2 control, using a 40×/1.15 NA water-immersion Apochromat lambda S objective. High-speed image acquisition was carried out with the resonance scan head with 2× averaging at 1024 px by 1024 px. Full volumes of cells were captured (Z total = 20 µm, Z interval = 0.5 µm) every 5 minutes for 24 hours. Images were denoised as individual Z-slices and max projected. All are displayed with auto-scaled LUTs.

Ablation Study
For our ablation study, we compare three different refinements of the model. First, we replace our unusual checkerboard downsampling with the more conventional downsampling used in Neighbor2Neighbor: we divide the image into 2 × 2 blocks and create four images consisting of all the top-left, top-right, bottom-left and bottom-right pixels, respectively. This has the advantage of preserving the proportions of the original image as well as the structure of distances between pixels, but despite this it does not perform as well (Table 1 - Quad). Second, we replace our feed-forward neural network with a U-Net architecture, the standard network used in Self2Self and Noise2Void. Again, performance drops (Table 1 - Unet). Finally, using known ground-truth values, we manually subtract out the s_odd − s_even term in Eq. (12) and show that this has virtually no impact on our denoising results; hence this term does not significantly affect our algorithm (Table 1).
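The 2 × 2 block ("Quad") downsample described above can be sketched as follows. This illustrates the conventional scheme compared against in the ablation, not the checkerboard downsampling Noise2Fast itself uses; the function name is our own.

```python
import numpy as np

def quad_downsample(img):
    """Split an H×W image into four half-resolution sub-images.

    Each 2×2 block contributes one pixel to each sub-image: all top-left
    pixels form one image, all top-right pixels another, and so on.
    Illustrative sketch of the Neighbor2Neighbor-style block downsample.
    """
    h, w = img.shape[:2]
    img = img[: h - h % 2, : w - w % 2]  # crop so dimensions are even
    tl = img[0::2, 0::2]  # top-left pixel of every 2×2 block
    tr = img[0::2, 1::2]  # top-right
    bl = img[1::2, 0::2]  # bottom-left
    br = img[1::2, 1::2]  # bottom-right
    return tl, tr, bl, br
```

Each sub-image has half the height and width of the original, so the aspect ratio (and the relative spacing of neighboring pixels) is preserved, which is the advantage noted above.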

Figure 2 .
Figure 2. a) Our simple feed-forward CNN architecture. Inputs can be multi-channel; however, for best results outputs are always single-channel (for RGB images we predict each channel separately). b) Overview of our training scheme. Our neural network learns mappings between pairs of checkerboard-downsampled images, each generated from a different group of pixels.

Figure 3 .
Figure 3. a) Accuracy and per-image time required to denoise on an RTX 5000 mobile GPU, for each dataset using each method. b) Graph of speed (in images per second) versus accuracy (PSNR) for each method on each dataset. c) Visual comparison of each method on the starfish image from Set12 and on BPAE cells from the Confocal dataset. d) Performance reached by each method by the time Noise2Fast has completed its denoising.

Figure 4 .
Figure 4. Performance of Noise2Fast on our own microscopy images. a) Comparison of Noise2Fast and Self2Self on epifluorescence images of actin and nuclei in RPE-1 cells, with corresponding line intensity profiles. b) Comparison of live confocal imaging of endogenously tagged nuclei (H3-3B-mScarlet) and actin (mNeon-ACTB) in MDA-MB 231 cells. Bars = 10 µm.