## Main

New FISH-based imaging methods are continuously being developed to gain insights into cellular processes, for example, by resolving the subcellular localization of single RNA molecules1,2 or subnuclear 3D arrangement of DNA regions3,4. Classically, single-molecule FISH (smFISH) has been used to visualize individual mRNA molecules for single genes in small samples1,2. New methods that employ probe amplification, probe multiplexing, or barcodes are driving the fields of spatial transcriptomics and spatial genomics, enabling the subcellular visualization of thousands of genes with single-molecule sensitivity in complex tissues5,6,7,8,9,10, as well as entire chromosomes with high resolution at nanometer scale3.

Extracting information from smFISH, spatial transcriptomics, or spatial genomics images relies on the precise detection of diffraction-limited spots. Important properties of spot-detection software include accuracy and speed of detection, as well as accessibility to researchers. Recently, scalability to large datasets has become important because the detection of subtle transcriptional changes relies on the analysis of thousands of smFISH images11,12, increasingly large samples in the terabyte range are being imaged13, and spatial-transcriptomics methods are being applied to increasingly large samples, with many rounds of sequential hybridization and imaging (Fig. 1 and Supplementary Notes). Several methods are available; however, none of the commonly used packages allows interactive parameter tuning, which makes their application tedious. They also do not scale to large datasets because they lack out-of-core processing capabilities for large images, have no straightforward path to automation and distribution for large sets of smaller images, and have longer processing times1,14,15,16,17,18. To overcome these restrictions, we developed RS-FISH, which uses an extension of Radial Symmetry19 (RS) to robustly and quickly identify single-molecule spots in 3D with high precision (Fig. 1a). RS-FISH can be run as an interactive, scriptable Fiji plugin20, as a command-line tool, and as a cluster and cloud-distributable package for large volumes or for datasets consisting of thousands of images (Fig. 1g,h).

RS is an efficient, non-iterative alternative to accurate point localization using Gaussian fitting that was developed for localizing 2D circular objects by computing the intersection point of image gradients (Fig. 1a)19. We first derived a 3D version of the RS method, similar to the work of Liu et al.21 (Methods), that additionally extends to higher dimensions, which has potential for spatiotemporal localization of blinking 3D spots. Second, we extended RS to support axis-aligned, ellipsoid objects without the need for scaling the image21, enabling RS-FISH to account for typical anisotropy in 3D microscopy datasets that results from different pixel sizes and point spread functions in the lateral (x,y) compared with the axial (z) dimensions (Fig. 1d and Methods). Third, the computation speed of RS allowed us to combine RS with robust outlier removal using random sample consensus22 (RS-RANSAC) to identify sets of image gradients that support the same ellipsoid object given a specific error for the gradient intersection point (Methods). This allows RS-FISH to identify sets of pixels that support a user-defined localization error for individual spots (Fig. 1a), discriminate close detections (Fig. 1b and Supplementary Fig. SN7.1), and ignore outlier pixels that disturb localization (for example, dead or hot camera pixels).

RS-FISH first generates a set of seed points by thresholding the difference-of-Gaussian (DoG)23 filtered image to identify potential locations of diffraction-limited spots; the DoG parameters need to be adjusted to the average size (sigma) and intensity (threshold) of the spots. Next, image gradients are extracted from local pixel patches around each spot, which are optionally corrected for non-uniform fluorescence backgrounds. Before RS localization, gradients are rescaled along the axial dimension to correct for dataset anisotropy using an anisotropy factor that depends on pixel spacing, resolution, and point spread function. The anisotropy factor can be computed from the microscopy image itself and does not change as long as acquisition parameters are held constant (Fig. 1d and Methods). Optionally, RS-RANSAC can be run in multi-consensus mode, which performs additional rounds of RANSAC filtering in order to distinguish spots that were too close to one another for the DoG detector to separate them during seed point generation (Fig. 1b, Supplementary Figs. SN7.1 and SN7.2, and Methods). Finally, to avoid potentially redundant detections, spots are, by default, filtered to be at least 0.5 pixels apart from each other. Each spot’s associated intensity value is, by default, computed using linear interpolation at the spot’s sub-pixel location or can be refined by fitting a Gaussian to the subset of pixels that support the spot as identified by RS-RANSAC.
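The seed-point step can be illustrated with a minimal 2D sketch. This is not the RS-FISH implementation (which is written in Java on ImgLib2); the function names, the 3-sigma kernel truncation, the clamped borders, and the `ratio=1.6` between the two Gaussian widths are assumptions made for the example. It only shows how sigma and threshold jointly decide which pixels become seeds.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur of a 2D list of lists; borders are clamped."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    # Blur along x first, then along y.
    rows = [[sum(kernel[k + r] * row[min(max(x + k, 0), w - 1)]
                 for k in range(-r, r + 1)) for x in range(w)] for row in image]
    return [[sum(kernel[k + r] * rows[min(max(y + k, 0), h - 1)][x]
                 for k in range(-r, r + 1)) for x in range(w)] for y in range(h)]

def dog_seeds(image, sigma, threshold, ratio=1.6):
    """Return (y, x) pixels that are strict local DoG maxima above threshold."""
    narrow, wide = blur(image, sigma), blur(image, sigma * ratio)
    h, w = len(image), len(image[0])
    dog = [[narrow[y][x] - wide[y][x] for x in range(w)] for y in range(h)]
    seeds = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            value = dog[y][x]
            if value <= threshold:
                continue
            neighbors = [dog[y + dy][x + dx]
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0)]
            if all(value > n for n in neighbors):
                seeds.append((y, x))
    return seeds
```

Raising `threshold` discards dim maxima, while `sigma` should roughly match the spot width so that the DoG response peaks at the spot centers.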

RS-FISH pixel operations are implemented in ImgLib2 (ref. 24), and RS fitting and RS-RANSAC are implemented using the image transformation framework mpicbg25. All operations can be executed in blocks, allowing straightforward parallelization, and compute effort scales linearly with the size of the data up to the petabyte range (Methods). Importantly, RS-FISH’s parameters can be interactively tuned on small and large datasets using the Fiji plugin (Supplementary Fig. SN8.1). Once the right set of parameters is identified on a representative example image, RS-FISH can be run and macro-scripted in Fiji, or can be executed in a scriptable mode for straightforward parallel execution on compute clusters or cloud services (for example, Amazon Web Services (AWS)) using Apache Spark, for which we provide example scripts, including resaving into the N5 (Zarr compatible) file format (Fig. 1g and Supplementary Notes). The results are saved as a CSV file, or they can be transferred to the region-of-interest (ROI) manager for downstream analysis in Fiji (Fig. 1e and Supplementary Notes). A mask filtering tool can classify detections on the basis of a binary mask, for example a cytoplasm or nuclear mask (Supplementary Fig. SN12.1). The saved point clouds can be overlaid onto the images using Fiji20 or BigDataViewer26 for interactive visual inspection of even very large datasets (Fig. 1h and Supplementary Video 1).

To validate and benchmark RS-FISH, we performed quantitative comparisons against FISH-quant14, Big-FISH18, AIRLOCALIZE17, Starfish16, and deepBlink15 using (1) simulated smFISH images with varying noise levels to assess detection performance, (2) simulated images of spot pairs that are close to one another to assess performance on dense datasets, (3) real smFISH Caenorhabditis elegans embryo datasets for runtime measurements, (4) real smFISH cell datasets with varying noise levels, and (5) large lightsheet datasets13. We show that RS-FISH is on par with the best methods in terms of detection performance. Notably, it provides high detection accuracy and low localization error (Fig. 2a–c and Supplementary Fig. SN4.1–SN7.2) while running 3.8–7.1 times faster than established methods (Fig. 2d and Supplementary Notes). We additionally compare localization error and detection accuracy across different noise levels (Fig. 2e,f). RS-FISH shows superior detection accuracy, especially in the presence of very high noise. The localization error is very good in low-noise scenarios and slightly increases for higher noise levels, which is partially explained by having to localize more spots that other methods do not detect. We provide example images of each noise class tested in Figure 2e,f as guidance for users to estimate the expected localization quality. We highlight that RS-FISH can easily be parallelized on the cloud by running smFISH extraction on 4,010 C. elegans image stacks (~100 GB in total) in 18 minutes on AWS at the cost of US\$18.35 in June 2021 (Fig. 1g). Importantly, RS-FISH is currently the only method that can be directly applied to large volumes (Fig. 1h and Supplementary Video 1). Processing a reconstructed 148-GB lightsheet image stack took 32 CPU hours (~1 hour on a modern workstation). 
In comparison, a complex wrapping software for distributing AIRLOCALIZE, specifically developed for the expansion-assisted iterative FISH (EASI-FISH) project to run on the HHMI Janelia cluster, required significant development effort and took 156 CPU hours to finish the same task13.

We developed RS-FISH based on a generic derivation of 3D RS for anisotropic objects that is efficiently implemented using ImgLib2, Fiji, and Spark. RS-FISH runs as a Fiji plugin, allowing interactive parameter adjustment and result verification on small and large images, making the task of correctly detecting diffraction-limited spots in microscopy images as accessible as possible. Processing speed is significantly improved and similar localization performance to established methods is achieved. RS-FISH is simple to install and run through Fiji, additionally providing macro-recording functionality to automate FISH spot detection easily. Our efficient block-based implementation allows easy single-molecule spot detection in large datasets or big volumes using local processing, clusters, or the cloud. Importantly, although we have demonstrated RS-FISH’s utility using only a 148-GB dataset, there is no conceptual limit that prohibits RS-FISH from being executed on significantly larger volumes well into the petabyte range. RS-FISH is an accurate, easy-to-use, versatile, and scalable tool that makes FISH spot detection on small and especially large datasets amenable to researchers and whose functionality extends to the dynamically growing fields of spatial transcriptomics and spatial genomics.

## Methods

### n-dimensional derivation of Radial Symmetry localization

The goal of RS is to accurately localize a bright, circular spot $${\rm{p}}_{\rm{c}}$$ with sub-pixel accuracy. In noise-free data, image gradients $$\nabla I({\rm{p}}_{\rm{k}})$$ at locations $${\rm{p}}_{\rm{k}}$$ point towards the center of the spot and intersect in that single point $${\rm{p}}_{\rm{c}}$$ (Fig. 1a); thus, computing the intersection point solves the problem of accurate localization. In realistic images that contain noise, these gradients do not intersect; therefore, computing $${\rm{p}}_{\rm{c}}$$ constitutes an optimization problem that RS solves using least-squares minimization of the distances $${\rm{d}}_{\rm{k}}$$ between the common intersection point $${\rm{p}}_{\rm{c}}$$ and all gradients $$\nabla I({\rm{p}}_{\rm{k}})$$ (Supplementary Fig. SN1.1).
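As an illustration of this least-squares problem, the sketch below finds the point that minimizes the summed squared perpendicular distances to a set of lines, each defined by a location and a gradient direction, in any number of dimensions. It is an unweighted textbook formulation, not the derivation given in the Supplementary Notes; the function names are hypothetical, and any gradient weighting used in practice is omitted.

```python
def solve_linear(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def intersect_gradients(points, gradients):
    """Least-squares intersection point of the lines (point, gradient direction).

    Each line contributes the projector P = I - g g^T / |g|^2, which removes the
    component along the gradient; the minimizer solves (sum P) x = sum (P p).
    """
    n = len(points[0])
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for p, g in zip(points, gradients):
        norm2 = sum(c * c for c in g)
        if norm2 == 0.0:
            continue  # a zero gradient defines no line
        for i in range(n):
            for j in range(n):
                proj = (1.0 if i == j else 0.0) - g[i] * g[j] / norm2
                A[i][j] += proj
                b[i] += proj * p[j]
    return solve_linear(A, b)
```

Because the solution is closed-form (one small linear system per spot), it can be evaluated many times cheaply, which is what makes combining it with random sampling practical.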

We extend RS to 3D similar to Liu et al.21, and additionally describe how to generalize the derivation to the n-dimensional case. To achieve this, we replace the Roberts cross operator with separable convolution for image gradient $$\nabla I({\rm{p}}_{\rm{k}})$$ computation, and we use vector algebra to compute the intersection point $${\rm{p}}_{\rm{c}}$$ of image gradients. The derivations are shown in detail in Supplementary Fig. SN1.1 and Supplementary Notes.

### Radial Symmetry for axis-aligned ellipsoid (non-radial) objects

Diffraction-limited spots in 3D microscopy images are usually not spherical but show a scaling in the axial (z) dimension compared with the lateral (xy) dimensions. Previous solutions suggested scaling the image in order to be able to detect spots using RS21. This can be impractical for large datasets, and it might affect localization quality, as the image intensities need to be interpolated for scaling. Here, we extend the RS derivation to directly compute the intersection point $${\rm{p}}_{\rm{c}}$$ from anisotropic images by applying a scale vector $${\rm{s}}$$ to point locations $${\rm{p}}_{\rm{k}}$$ and applying the inverse scale vector $${\rm{s}}^{-1}$$ to the image gradients $$\nabla I({\rm{p}}_{\rm{k}})$$. Although we derive the case specifically for 3D, it can be straightforwardly applied to higher dimensions. The derivation is shown in detail in the Supplementary Notes.
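One way to see why the gradients take the inverse scale is the chain rule. This is a sketch for intuition, not the derivation in the Supplementary Notes; the rescaled coordinates $${\rm{q}}$$ and the rescaled image $$\tilde{I}$$ are symbols introduced here for illustration. Let $${\rm{q}} = {\rm{s}} \odot {\rm{p}}$$ (element-wise product) and $$\tilde{I}({\rm{q}}) = I({\rm{s}}^{-1} \odot {\rm{q}})$$. Then, for each dimension d,

$$\frac{\partial \tilde{I}}{\partial q_{\rm{d}}} = \frac{1}{s_{\rm{d}}}\,\frac{\partial I}{\partial p_{\rm{d}}} \quad \Rightarrow \quad \nabla \tilde{I}({\rm{q}}_{\rm{k}}) = {\rm{s}}^{-1} \odot \nabla I({\rm{p}}_{\rm{k}})$$

Under this reading, RS can be applied to the pairs $$({\rm{s}} \odot {\rm{p}}_{\rm{k}},\ {\rm{s}}^{-1} \odot \nabla I({\rm{p}}_{\rm{k}}))$$ without resampling any intensities, and the resulting intersection point is mapped back to image coordinates by multiplying with $${\rm{s}}^{-1}$$.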

RS-FISH supports a global scale factor (called anisotropy factor) for the entire dataset that compensates for anisotropy of the axial (z) dimension, which can be computed from an image containing diffraction-limited spots (Supplementary Notes).

### Radial Symmetry Random Sample Consensus

RS localization is implemented as a fast, closed-form solution, and it is therefore feasible to combine it with robust outlier removal. We use RANSAC22 to identify the maximal number of gradients $$\nabla I({\rm{p}}_{\rm{k}})$$ that support the same center point $${\rm{p}}_{\rm{c}}$$ given a maximal distance error ε, so that all $${\rm{d}}_{\rm{k}}$$ < ε.

To achieve this, RANSAC randomly chooses the minimal number of gradients (that is, two gradients) from the set of all gradients (candidate gradients) to compute the center point and tests how many other gradients fall within the defined error threshold ε. This process is repeated until the maximal set of gradients is identified (inlier gradients) and the final center point $${\rm{p}}_{\rm{c}}$$ is computed using all inlier gradients. This allows RS-FISH to exclude artifact pixels and to differentiate close-by spots.

The number of gradients that are computed for each spot is defined by the support region radius, which can be selected as one of the RANSAC parameters. By default, we propose a radius of 3 pixels, which corresponds to a 7 × 7 × 7 pixel patch, resulting in 216 gradients for the 3D case. These settings are reasonable choices for acquisition parameters typically used for smFISH images (500–700 nm emission, ×63 oil detection objective, EMCCD or sCMOS camera with ~10-µm pixels, corresponding to a ~159-nm lateral pixel size in the sample plane), where the pixel patch comfortably covers the central peak of the point spread function (PSF). Importantly, the radius should be adjusted to the respective acquisition settings so that an area that is approximately twice the size of the central peak of the PSF is entirely covered to ensure that all gradients that point towards the center of each spot are included in the localization.
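The numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that one gradient is computed per neighborhood of adjacent pixels, so a patch with side length n yields n − 1 gradient positions per dimension; this is the reading that reproduces the 216 gradients quoted above. The helper names are hypothetical.

```python
def patch_geometry(radius, ndim=3):
    """Return (side length, pixel count, gradient count) for a support radius."""
    side = 2 * radius + 1            # radius 3 -> 7 pixels per dimension
    pixels = side ** ndim            # 7^3 = 343 pixels
    gradients = (side - 1) ** ndim   # assumed: one gradient between neighboring pixels
    return side, pixels, gradients

def sample_pixel_size_nm(camera_pixel_um, magnification):
    """Pixel size in the sample plane for a given camera pixel and objective."""
    return camera_pixel_um * 1000.0 / magnification

print(patch_geometry(3))                          # (7, 343, 216)
print(round(sample_pixel_size_nm(10.0, 63.0)))    # 159
```

For other acquisition settings, the same two helpers give a quick estimate of how many gradients a chosen radius provides and how large the patch is in nanometers.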

To identify and locate close-by points, RS-FISH runs a multi-consensus RANSAC. Here, RANSAC is run multiple times on the same set of candidate gradients. After each successful run that identifies a set of inliers, the inliers are removed from the set of candidate gradients, and RANSAC tries to identify another set of inliers (Fig. 1b). This process is iterated until no other set of inliers (corresponding to a FISH spot) can be found in the local neighborhood of each DoG spot. To avoid detecting random noise, the minimal number of inliers required for a spot can be adjusted (typically around 30).
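The single-consensus step and its multi-consensus repetition can be sketched as follows. This is a simplified 2D illustration, not the RS-FISH code: the function names are hypothetical, each gradient is represented as a (location, direction) pair, a fixed number of random trials stands in for the stopping criterion, and the final fit is an unweighted least-squares intersection.

```python
import random

def line_distance(center, p, g):
    """Perpendicular distance from center to the 2D line through p along g."""
    norm = (g[0] ** 2 + g[1] ** 2) ** 0.5
    return abs((center[0] - p[0]) * g[1] - (center[1] - p[1]) * g[0]) / norm

def intersect_two(line_a, line_b):
    """Intersection of two 2D lines, or None if they are (nearly) parallel."""
    (p1, g1), (p2, g2) = line_a, line_b
    det = g1[0] * g2[1] - g1[1] * g2[0]
    if abs(det) < 1e-12:
        return None
    t = ((p2[0] - p1[0]) * g2[1] - (p2[1] - p1[1]) * g2[0]) / det
    return (p1[0] + t * g1[0], p1[1] + t * g1[1])

def fit_center(lines):
    """Least-squares point closest to all lines (2 x 2 normal equations)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for p, g in lines:
        n2 = g[0] ** 2 + g[1] ** 2
        m11, m12, m22 = 1 - g[0] * g[0] / n2, -g[0] * g[1] / n2, 1 - g[1] * g[1] / n2
        a11, a12, a22 = a11 + m11, a12 + m12, a22 + m22
        b1 += m11 * p[0] + m12 * p[1]
        b2 += m12 * p[0] + m22 * p[1]
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

def ransac_center(lines, eps, iterations, rng):
    """Largest set of lines passing within eps of a two-line intersection."""
    best = []
    for _ in range(iterations):
        center = intersect_two(*rng.sample(lines, 2))  # minimal sample: two gradients
        if center is None:
            continue
        inliers = [line for line in lines if line_distance(center, *line) < eps]
        if len(inliers) > len(best):
            best = inliers
    return best

def multi_consensus(lines, eps, min_inliers, iterations=200, seed=0):
    """Repeat RANSAC, removing inliers after each run, until no spot is supported."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    needed = max(2, min_inliers)
    candidates, spots = list(lines), []
    while len(candidates) >= needed:
        inliers = ransac_center(candidates, eps, iterations, rng)
        if len(inliers) < needed:
            break
        spots.append(fit_center(inliers))
        candidates = [line for line in candidates if line not in inliers]
    return spots
```

In this sketch, `eps` plays the role of the error threshold ε and `min_inliers` the minimal number of supporting gradients; lowering `min_inliers` makes it easier for noise to be reported as a spot.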

### Implementation details and limits

RS-FISH is implemented in Java using ImgLib2, the mpicbg framework, BigDataViewer, Fiji, and Apache Spark. The computation of RS is performed in blocks with a size of $${\rm{b}}_{\rm{d}}$$ for each dimension d (for example, 256 × 256 × 128 pixels) and requires an overlap of only 1 pixel in each dimension with neighboring blocks; thus, the overhead $$o = 1 - \frac{{{\Pi}_{\rm{d}}{\rm{b}}_{\rm{d}}}}{{{\Pi}_{\rm{d}}{\rm{b}}_{\rm{d}} - 2}}$$ is minimal (for example, 1.5% for 256 × 256 × 256 blocks or 0.6% for 1024 × 1024 × 1024 blocks). When processing each block, the local process has access to the entire input image, which is either held in memory when running within Fiji or is lazy-loaded from blocked N5 datasets when running on large volumes using Apache Spark. Because the computation across blocks is embarrassingly parallel, computation time scales linearly with the dataset size. Thus, RS-FISH will run on very large volumes supported by N5 and ImgLib2. Owing to current limitations in Java arrays, the theoretical upper limit is $$2^{31}$$ = 2,147,483,648 blocks, with each block maximally containing $$2^{31}$$ = 2,147,483,648 pixels (for example, 2048 × 2048 × 512 pixels). Given sufficient storage and compute resources, the limit for RS-FISH is thus 4,072 peta-pixels (4,072 petabytes at 8 bit, or 8,144 petabytes at 16 bit) taking into account the overhead, whereas every individual block locally processes only 2 gigapixels ($$2^{31}$$ = 2,147,483,648 pixels).
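The size limit can be reproduced with a short calculation. The sketch below reads the overhead as the fraction of each block's pixels that lies in the 1-pixel border shared with neighboring blocks, and uses binary prefixes (1 peta = 2^50). Both are interpretations rather than statements from the text; they were chosen because they reproduce the 0.6% figure for 1024 × 1024 × 1024 blocks and the 4,072 peta-pixel limit quoted above.

```python
from math import prod

def border_overhead(block, overlap=1):
    """Fraction of a block's pixels in the border of width `overlap` on each side."""
    inner = prod(b - 2 * overlap for b in block)
    return 1 - inner / prod(block)

MAX_JAVA_ARRAY = 2 ** 31        # array-size limit quoted in the text
block = (2048, 2048, 512)       # example block holding exactly 2**31 pixels
assert prod(block) == MAX_JAVA_ARRAY

total_pixels = MAX_JAVA_ARRAY * prod(block)                 # 2**62 pixels
usable_pixels = total_pixels * (1 - border_overhead(block))
peta = 2 ** 50                                              # assumed binary peta

print(round(100 * border_overhead((1024, 1024, 1024)), 1))  # 0.6
print(round(usable_pixels / peta))                          # 4072
```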

The code can be executed on an entire image as a single block for smaller images, or in many blocks multi-threaded or distributed using Apache Spark. It is important to note that RS-RANSAC uses random numbers to determine the final localization of each spot. We use fixed seeds to initialize each block; therefore, the results for a single block of the same size in the same image with the same parameters are constant. However, for blocks of different sizes (for example, single-threaded versus multi-threaded), the results will be slightly different, as the RANSAC-based localizations are not traversing the DoG maxima in the same order, and thus initialize RANSAC differently. For practicality, the interactive Fiji mode runs only in single-threaded mode (although the DoG image is computed multi-threaded) to yield comparable results across different testing trials. Importantly, this applies only if the RANSAC mode is used for localization. Multi-threaded processing is available in the recordable advanced mode in the Fiji plugin, while the Apache Spark-based distribution can be called from the corresponding RS-FISH-Spark repository.

### Data simulation for assessing localization performance

To create ground-truth datasets for assessing localization performance, we generated images simulating diffraction-limited spots in the following way: (x,y,z) spot positions were randomly assigned within the chosen z-stack dimensions, and each spot was assigned a brightness picked from a normal distribution. We computed the intensity I(x,y,z) generated by each spot as follows: we first computed the predicted average number of photons received by each pixel, Ipred(x,y,z), using a Gaussian distribution centered on the spot, with user-defined lateral and axial extensions. We then simulated the actual intensity collected at each pixel using a Poisson-distributed value with mean Ipred(x,y,z). Finally, we added Gaussian-distributed noise to each pixel of the image.
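The simulation steps above can be sketched as follows. This is not the code from the repository linked below; it is a minimal standard-library illustration in which all function and parameter names are hypothetical, the Poisson sampler is Knuth's simple algorithm (suitable only for modest photon counts), and negative brightness draws are clipped to zero as an added assumption.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_stack(shape, n_spots, sigma_xy, sigma_z,
                   mean_brightness, sd_brightness, noise_sd, seed=0):
    """Return (spots, image) for a simulated z-stack of shape (nz, ny, nx)."""
    rng = random.Random(seed)
    nz, ny, nx = shape
    # Random sub-pixel positions and normally distributed brightness per spot.
    spots = [(rng.uniform(0, nx - 1), rng.uniform(0, ny - 1), rng.uniform(0, nz - 1),
              max(0.0, rng.gauss(mean_brightness, sd_brightness)))
             for _ in range(n_spots)]
    image = []
    for z in range(nz):
        plane = []
        for y in range(ny):
            row = []
            for x in range(nx):
                # Predicted mean photon count: sum of anisotropic Gaussians.
                predicted = sum(
                    b * math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2 * sigma_xy ** 2)
                                 - (z - sz) ** 2 / (2 * sigma_z ** 2))
                    for sx, sy, sz, b in spots)
                # Shot noise (Poisson) plus additive Gaussian noise.
                row.append(poisson(rng, predicted) + rng.gauss(0.0, noise_sd))
            plane.append(row)
        image.append(plane)
    return spots, image
```

The returned `spots` list serves as the ground truth against which detections can be scored; increasing `noise_sd` relative to `mean_brightness` produces the harder noise classes.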

Code for generating the images with simulated diffraction-limited spots is available in the GitHub repository. There is also a folder included with the simulated data used in the parameter grid search and benchmarking: https://github.com/PreibischLab/RS-FISH/tree/master/documents/Simulation_of_data.

Additionally, we simulated a dataset that contains spots that are very close to each other in order to assess the ability of RS-FISH and other tools to differentiate such spots. The code is here: https://github.com/timotheelionnet/simulated_spots_rsFISH. The images are here: https://github.com/timotheelionnet/simulated_spots_rsFISH/tree/main/out.

### Benchmarking RS-FISH against commonly used spot-detection tools

RS-FISH performance was benchmarked against the leading tools for single-molecule spot detection in images. The tools compared in the benchmarking are FISH-quant14 (Matlab), Big-FISH18 (Python), AIRLOCALIZE17 (Matlab), Starfish16 (Python), and deepBlink15 (Python, TensorFlow). Localization performance comparison was done on simulated images with known ground-truth spot locations, and computation-time comparison was performed using real three-dimensional C. elegans smFISH images. We created a dedicated analysis pipeline for each tool to test localization performance and compute time. For localization performance comparison, a grid search over each tool’s pipeline parameter space was run (excluding deepBlink, as a pre-trained artificial neural network was used; more details regarding deepBlink are discussed in the Supplementary Notes). Importantly, tools use different offsets for their pixel coordinates, which depend on the respective pixel origin convention (for example, does a spot positioned at the center of a pixel lie at 0.0 or 0.5? Does the z index start with 0 or 1?). In our benchmarks, for each tool we detected these offsets by computing the precision (the average, signed per-dimension difference between predicted and ground-truth spots) and corrected for these offsets if necessary (Supplementary Fig. SN4.2d,e). RS-FISH assumes that each pixel in an image is a measurement (not a square) that is located at floored coordinates (for example 11.0, 134.0, 12.0), and the top left pixel of the first slice corresponds to the coordinates (0.0, 0.0, 0.0). For computation-time comparison on real data, each pipeline’s parameters were selected to produce a similar number of detected spots for each image. Additionally, we performed benchmarks for spots that were close to each other (Supplementary Fig. SN7.1 and SN7.2 and Supplementary Notes) and on real data with varying levels of noise (Supplementary Fig. SN6.1 and Supplementary Notes).
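The offset detection described above amounts to averaging signed coordinate differences over matched spot pairs. The sketch below is an illustration, not the benchmarking code from the repository: the function name, the nearest-neighbor matching, and the `max_dist` cutoff are assumptions.

```python
def estimate_offset(predicted, truth, max_dist=2.0):
    """Mean signed per-dimension difference (predicted - truth) over matched spots.

    Each ground-truth spot is matched to its nearest prediction; pairs farther
    apart than max_dist are ignored.
    """
    ndim = len(truth[0])
    sums, matched = [0.0] * ndim, 0
    for t in truth:
        nearest = min(predicted,
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(p, t)))
        if sum((a - b) ** 2 for a, b in zip(nearest, t)) <= max_dist ** 2:
            sums = [s + (a - b) for s, a, b in zip(sums, nearest, t)]
            matched += 1
    if matched == 0:
        raise ValueError("no prediction lies within max_dist of any ground-truth spot")
    return [s / matched for s in sums]

def correct_offset(predicted, offset):
    """Subtract a constant per-dimension offset from every predicted spot."""
    return [tuple(a - o for a, o in zip(p, offset)) for p in predicted]
```

A tool that places pixel centers at 0.5 and starts its z index at 1 would, under this sketch, show an offset close to (0.5, 0.5, 1.0), which is then subtracted before localization errors are compared.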

The comparison shows that RS-FISH is on par with currently available spot-detection tools in localization performance, providing high detection accuracy and low localization error (Fig. 2a–c,e,f and Supplementary Fig. SN4.2) while running 3.8–7.1 times faster (Fig. 2d and Supplementary Notes). Additionally, RS-FISH is currently the only available tool that can be directly applied to large images, which we highlight using a 148-GB lightsheet image stack13 (Fig. 1h, Supplementary Video 1, and Supplementary Notes). The image size of the lightsheet stack is 7,190 × 7,144 × 1,550 pixels, and the block size used for detection was 256 × 256 × 128 pixels. The detection of spots using RS-FISH took 3,263 seconds (~32 CPU hours) for the entire image on a 36-CPU workstation with 2× Intel Xeon Gold 5220 processors at 2.2 GHz. The runtime cannot be directly compared with the custom extension of AIRLOCALIZE that was developed for the same project, as it is written to specifically run only on the Janelia cluster. The compute time of 156 CPU hours was extracted from the cluster logs of the submission scripts and was executed on a mix of Intel SkyLake (Platinum 8168) at 2.7 GHz and Intel Cascade Lake (Gold 6248 R) at 3.0 GHz CPUs. The overall speed increase of ~5× generally agrees with our measurements in Figure 2d, and the performance of a mix of these CPUs is comparable to the workstation CPUs (according to https://www.cpubenchmark.net). Importantly, RS-FISH runs on such volumes natively and can easily be executed on a cluster or in the cloud; thus, it easily scales to significantly larger datasets. In contrast, the AIRLOCALIZE implementation is limited to the Janelia cluster, but could be extended to other LSF clusters that support job submission.
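The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that all 36 CPUs were busy for the full wall-clock time and that the stack is stored at 16 bit with sizes reported in binary gigabytes; neither is stated explicitly in the text, but both are consistent with the ~32 CPU hours and 148 GB quoted above.

```python
import math

shape = (7190, 7144, 1550)   # lightsheet stack size in pixels
block = (256, 256, 128)      # block size used for detection

# Number of blocks needed to tile the volume (partial blocks at the borders count).
n_blocks = math.prod(math.ceil(s / b) for s, b in zip(shape, block))

cpu_hours = 3263 * 36 / 3600                    # wall-clock seconds x CPUs
speedup = 156 / cpu_hours                       # versus the reported 156 CPU hours
size_gib_16bit = math.prod(shape) * 2 / 2 ** 30 # assumed 2 bytes per pixel

print(n_blocks)                  # 10556
print(round(cpu_hours, 1))       # 32.6
print(round(speedup, 1))         # 4.8
print(round(size_gib_16bit))     # 148
```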

Benchmarking analysis details are in the Supplementary Notes, and all scripts and complete documentation are in the RS-FISH GitHub repository.

### Further properties of RS-FISH

Independent of the software used, localization performance is influenced by the lateral and axial sampling rate of the microscope, which has been studied extensively, for example in Thompson et al.27. RS-FISH supports a wide range of parameters that are explained in detail (Supplementary Fig. SN8.1) and allows the user to adjust them to the microscope settings used.

RS-FISH supports all image data formats supported by Fiji and BioFormats, including N5/Zarr. For distributed processing using Spark, large images need to be stored in the N5/Zarr format.

### Limitations of RS-FISH

RS-FISH is a tool for detecting diffraction-limited spots in single-channel 2D or 3D microscopy images. It gives the user considerable flexibility through interactive parameter selection to detect all spots in their images. Setting these parameters could therefore be daunting for new users. However, we chose default parameters that give good results for many typical FISH spot images, and the interactive GUI allows users to test different parameters easily on their images. Other tools have limited parameter selection, but RS-FISH is able to detect spots more accurately because it allows careful, interactive parameter selection. RS-FISH precisely localizes spots in images with little noise but is less precise in images that show high noise (compared with only FISH-quant and AIRLOCALIZE), which can be partly explained by the ability of RS-FISH to correctly detect more spots in high-noise cases. Very dense spots or clouds of spots, which might be due to smFISH spots of highly expressed genes, are particularly challenging to detect using any currently available method. The multi-consensus RANSAC improves the situation, but parameter selection is not easy and it does not correct for the fact that the gradients of two very close spots influence each other (it simply ensures that the error is not higher than the user-defined RANSAC error threshold).

The RS-FISH-detected spots can be classified on the basis of their position relative to image landmarks within the plugin using binary masks. Further downstream analysis, such as co-localization, can be easily performed on the results files within Fiji or other analytical frameworks in R or Python.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.