Fluorescent Neuronal Cells v2: multi-task, multi-format annotations for deep learning in microscopy

Fluorescent Neuronal Cells v2 is a collection of fluorescence microscopy images and the corresponding ground-truth annotations, designed to foster innovative research in the domains of Life Sciences and Deep Learning. This dataset encompasses three image collections wherein rodent neuronal cell nuclei and cytoplasm are stained with diverse markers to highlight their anatomical or functional characteristics. Specifically, we release 1874 high-resolution images alongside 750 corresponding ground-truth annotations for several learning tasks, including semantic segmentation, object detection and counting. The contribution is two-fold. First, thanks to the variety of annotations and their accessible formats, we anticipate our work will facilitate methodological advancements in computer vision approaches for segmentation, detection, feature extraction, unsupervised and self-supervised learning, transfer learning, and related areas. Second, by enabling extensive exploration and benchmarking, we hope Fluorescent Neuronal Cells v2 will catalyze breakthroughs in fluorescence microscopy analysis and promote cutting-edge discoveries in life sciences.


Background & Summary
Fluorescence microscopy is a pivotal imaging technique in life-science experiments, allowing researchers to study biological structures or processes with remarkable precision. It employs fluorescent dyes or proteins that emit light at specific wavelengths depending on the illuminating wavelength they absorb. Exploiting this phenomenon, specific molecules can be tagged (stained) with fluorescent markers and visualized by filtering only their emitted light, thus providing valuable insights into their localization, activity, and interactions.
Despite its widespread use, current practices in fluorescence microscopy analysis heavily rely on semi-automatic procedures, often necessitating manual recognition and/or counting of specific neuronal structures of interest [1][2][3]. For instance, in the study of torpor mechanisms, researchers depend on laborious hand-crafted operations to identify neuronal networks associated with this process 4. This manual aspect typically delays the analyses and introduces potential errors due to the limitations of human operators. Moreover, the similarity between structures of interest and the background often makes it difficult to distinguish and accurately recognize biological compounds, resulting in inherent arbitrariness and interpretation bias.
For these reasons, there is a growing interest in automating the recognition and counting of tagged elements in fluorescence microscopy [5][6][7][8]. Deep learning approaches have demonstrated great promise in various object recognition tasks. However, their performance can deteriorate when applied to data from domains significantly different from those adopted for pretraining (domain shift 9,10). Furthermore, the effectiveness of these approaches typically relies heavily on the availability of well-annotated data 11, which is often scarce and limited in the fluorescence microscopy domain.
To mitigate these issues, we present the Fluorescent Neuronal Cells v2 (FNC) dataset. This archive features 3 data collections, for a total of 1874 high-resolution images of rodent brain slices capturing a diverse range of neuronal structures and staining patterns. To facilitate research in this field, we also provide 750 annotations in various formats, tailored to popular supervised learning tasks such as semantic segmentation, object detection, and counting. Apart from serving as an additional benchmark for testing model generalization in microscopy applications, the FNC dataset opens up several research opportunities. Firstly, the heterogeneity of biological structures and their visual characteristics enables testing the generalization of trained models, and validating transfer learning and domain adaptation methods 12,13. Also, the availability of multiple annotation types allows the exploration of different learning paradigms, ranging from supervised and unsupervised approaches to self-/weakly-supervised techniques. Moreover, the specific challenges of our data are well suited to investigations into methodological advancements, e.g., assessing the effectiveness of different annotation formats and uncertainty estimation.
The design of the data collection process involved two distinct stages. Firstly, data collection was conducted following standardized experimental protocols. Specifically, controlled experimental conditions were applied to the animals, whose brains were sliced and processed by a classical immunofluorescence protocol to stain various neuronal substructures. Subsequently, a fluorescence microscope was employed to capture high-resolution images of the areas of interest. Secondly, domain experts performed data annotation, providing the ground-truth labels necessary for supervised learning.
Despite the presence of open-source fluorescence microscopy datasets, several issues hinder their utilization for training deep learning models. Firstly, these collections typically lack accompanying ground-truth annotations, thus precluding the adoption of supervised learning techniques. Secondly, labelled datasets often include just a few dozen images 14,15, which can be restrictive considering the data-intensive nature of deep learning models. Also, the moderate resolution of images in open datasets 16,17 limits the usefulness of cropping whole images into patches to increase the sample size. Thirdly, most existing datasets include a single marker type 14,15, thus lacking diversity and limiting robust model training. Alongside these aspects, public datasets typically provide labels as dot annotations or bounding boxes [14][15][16][17], which prevents their extension to fine-grained segmentation tasks. Additionally, data accessibility is sometimes restricted due to the use of domain-specific formats 10, which complicates integration into deep learning frameworks and wide dissemination.
In response to these challenges, we present a large archive comprising high-resolution fluorescent microscopy images, encompassing different markers and cell types. Furthermore, the data are shared as easily accessible PNG files, and the corresponding annotations are provided in various types and formats, enabling the exploration of different learning approaches and tasks, thereby significantly expanding the scope of potential applications.

Methods
The FNC dataset compiles images acquired from multiple studies and experimental conditions, while maintaining a consistent structure in the acquisition pipeline. Minor modifications were made to accommodate the specific requirements of each study and adapt to the current experimental circumstances and equipment. The data collection process consisted of two distinct and independent stages: image acquisition and data annotation. This section provides a comprehensive description of the data acquisition design, including the dedicated measures implemented for each image collection (refer to Figure 1 for a visual summary).

Image acquisition
In the image acquisition phase, a total of 68 rodents were subjected to controlled experimental conditions to study torpor and thermoregulatory mechanisms. At the end of the experimental session, the animals were deeply anaesthetized and transcardially perfused with 4% formaldehyde 4. This process allowed for the tagging of several neuronal substructures located within the nucleus or cytoplasm of the neurons. Rodent brains were then sectioned into 35 µm thick tissue slices, with sampling conducted at regular intervals (105 µm for mice and 210 µm for rats) to avoid redundant data and ensure comprehensive coverage while maintaining manageable data size. Brain slices were finally stained for distinct markers following a standard immunofluorescence protocol 4. Only some areas of interest were observed, namely the Raphe Pallidus (RPa), Dorsomedial Hypothalamus (DMH), Lateral Hypothalamus (LH), and Ventrolateral Periaqueductal Gray (VLPAG). These specific brain regions were chosen based on their relevance to the study of torpor mechanisms. The resulting specimens were observed by means of a fluorescence microscope equipped with a high-resolution camera. A specific wavelength of excitation light was selected for each collection based on the excitation wavelength of the chosen marker, resulting in pictures acquired with the application of green, yellow/orange or red filters. For simplicity, the image collections are named according to their prevalent hue. The original images were acquired as either TIF or JPG files depending on the camera default settings. To ensure traceability, a file naming convention was adopted to indicate their respective sample origins 1. During the analysis phase, the raw data were converted to uncompressed PNG format, taking care to preserve the extensive set of associated metadata. This conversion aimed to enhance accessibility and facilitate broader utilization of the data, allowing for inspection and manipulation without the need for specialized software. Consequently, the FNC archive includes both these derived images and the original raw images, which are retained for data recovery and reproducibility purposes.

Green and Yellow collections
The images within these collections were obtained during the same experiment 4, in which brain sections from C57BL/6J mice were stained with two markers to highlight specific substructures present in the neurons' nucleus and cytoplasm. The resulting brain slices were then observed using a Nikon Eclipse 80i microscope, equipped with a Nikon Digital Sight DS-Vi1 color camera, at a magnification of 200x.
More specifically, the green collection corresponds to cFOS staining (cf. Figure 2d). This staining method was employed to emphasize the nuclei of active neuronal cells 18, enabling the topographic analysis of brain areas that exhibit neuronal activity under specific experimental conditions. This approach is widely employed to identify neuronal cells responsible for regulating specific physiological phenomena.
In contrast, the yellow collection (cf. Figure 2c) utilized staining for the b-subunit of Cholera Toxin (CTb). This monosynaptic retrograde neuronal tracer migrates within the soma and axons of neuronal cells projecting to the brain area where CTb was previously injected during in vivo experiments 19. Consequently, this staining technique facilitates the identification of morphological connections between different brain regions.

Red collection
The red collection comprises images obtained from multiple unpublished experiments, concerning specimens of both mice and rats (cf. Figure 2b). Despite sharing the same experimental setup as the green and yellow collections, this time the brain tissues were stained for various elements to phenotypically characterize the cells involved in the neural circuits underlying the physiological phenomena of torpor and thermoregulation. Specifically, slices were stained for orexin, tryptophan hydroxylase, and tyrosine hydroxylase. In this case, image acquisition was conducted using both the aforementioned Nikon Eclipse 80i microscope and an ausJENA JENAVAL microscope, equipped with a Nikon Coolpix E4500 color camera, at a magnification of 250x. For further details, please refer to the accompanying metadata for each image.

Data annotation
The data annotation process was carried out by multiple proficient experimenters according to a fixed annotation protocol 2, with multiple revision rounds to ensure data quality and minimize operator bias. We adopted the Visual Geometry Group (VGG) Image Annotator (VIA) annotation tool 20,21, which employs a web interface for image visualization and allows for the overlaying of annotations in different forms. In our study, the tagging process involved creating polygon contours, and the resulting annotations were exported into CSV format. To generate the binary masks required for training, the polygon contours were transformed using programming libraries such as OpenCV and scikit-image. For the yellow collection, we utilized the binary masks available from version 1 22 as pre-annotations. Specifically, we employed erosion and dilation techniques to address fragmented contours resulting from semi-automatic labeling based on thresholding. Furthermore, we applied methods to fill small holes within segmented objects, and removed spurious objects that had been overlooked in the previous annotations or were erroneously added by prior processing. Subsequently, these pre-annotations were refined manually using VIA, enhancing their accuracy and ensuring better consistency across the annotations (see Figure 3). In contrast, the green and red collections were annotated from scratch. Upon completion of the labeling process, the polygon contours exported from VIA were also converted into multiple annotation types and formats. This conversion aims at facilitating accessibility for a wide range of users and promoting the exploration of various learning problems related to our data. For a more comprehensive understanding of the available formats and annotation types, please refer to the Section Data Records.
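To make the polygon-to-mask step concrete, the following is a minimal sketch of rasterizing polygon contours into a binary mask. The text above states that OpenCV and scikit-image were used; this example instead uses a plain NumPy even-odd (ray casting) test so that it stays dependency-light, and it should not be read as the authors' implementation. The function names and the toy polygon are hypothetical; the two coordinate lists mirror the way VIA stores polygon vertices (separate x and y lists).

```python
import numpy as np

def polygon_to_mask(xs, ys, height, width):
    """Rasterize one closed polygon into a boolean mask (even-odd rule)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Pixel centres sit at half-integer coordinates.
    px, py = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    inside = np.zeros((height, width), dtype=bool)
    n = len(xs)
    for i in range(n):
        x0, y0 = xs[i], ys[i]
        x1, y1 = xs[(i + 1) % n], ys[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does a ray cast to the right of the pixel centre cross this edge?
        straddles = (y0 > py) != (y1 > py)
        x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= straddles & (px < x_cross)  # toggle on every crossing
    return inside

def polygons_to_mask(polygons, height, width):
    """Union of all polygons as a 0/1 mask (1 = cell, 0 = background)."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for xs, ys in polygons:
        mask[polygon_to_mask(xs, ys, height, width)] = 1
    return mask

# Hypothetical toy annotation: one rectangle on an 8 x 10 image.
square = ([2, 6, 6, 2], [1, 1, 4, 4])
mask = polygons_to_mask([square], height=8, width=10)
```

On real data one would more likely call a library rasterizer; the sketch only shows what "transforming contours into masks" amounts to.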

Data Records
The FNC dataset is a collection of 1874 high-resolution fluorescent microscopy pictures, 750 of which also have their corresponding ground-truth segmentation masks, while the remaining 1124 are unlabelled. It is hosted on AMS Acta 3, the open access repository managed by the University of Bologna. The data are organized into three standalone image collections, named for simplicity green, yellow, and red, each available under the corresponding folder (see Figure 4c). The collections share a common layout to facilitate easy access and analysis (see Figure 4a).
To aid users in navigating the archive, the metadata_v2.xlsx file provides a comprehensive overview of the FNC data collection. It includes high-level metadata for each image, such as the corresponding animal, acquisition details, data partition, and annotation information.

Image collection folder structure
The trainval and test folders contain all labelled images for each collection. These data partitions were obtained through a random 75%/25% split and are the suggested configuration to ensure reproducibility and comparability in future studies. The remaining images were collected under the unlabelled folder.
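For readers who want to apply the same kind of partitioning to their own images, a seeded random 75%/25% split can be sketched as follows. This is only an illustration: the file names, the seed and the helper name are hypothetical, and the partitions shipped with the archive should be used as they are when comparing against published results.

```python
import random

def split_trainval_test(filenames, test_fraction=0.25, seed=0):
    """Shuffle deterministically; the first shuffled names form the test set."""
    names = sorted(filenames)            # fixed starting order, whatever the input order
    random.Random(seed).shuffle(names)   # seeded shuffle gives a reproducible partition
    n_test = round(len(names) * test_fraction)
    return names[n_test:], names[:n_test]

# Hypothetical file names, for illustration only.
images = [f"img_{i:03d}.png" for i in range(8)]
trainval, test = split_trainval_test(images)
```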
Inside each data partition folder, the images folder contains fluorescence microscopy images in PNG format. All the images are accompanied by a rich set of metadata, stored both in their EXIF tags and as a separate TXT file under the metadata folder. The ground_truths folder contains annotations in various formats commonly used within the machine learning community (see Figure 4b).

Annotation types and formats
The FNC collection provides annotations of multiple types, encoded in several standard formats. In the masks folder, we find the binary masks typically used for segmentation tasks (cf. Figures 2d to 2f). The correspondence between the masks and the respective images can be established based on the filenames. The other folders store a light-weight encoding of the binary masks, enriched with additional annotation types/formats. The rle directory contains the Run-Length Encoding (RLE) of the binary masks, stored as pickle files. This encoding is a compressed representation that can effectively save disk space while preserving the complete segmentation information. It is particularly convenient for high-resolution images like those present in our dataset.
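As an illustration of why run-length encoding is lossless and compact for sparse masks, here is a minimal NumPy sketch of an encoder and decoder. The layout assumed here (row-major scan, alternating run lengths that always start with a background run, stored in a dictionary with hypothetical keys "shape" and "counts") is an assumption made for the example and may differ from the convention used in the archive's pickle files.

```python
import numpy as np

def rle_encode(mask):
    """Encode a 0/1 mask as alternating run lengths, starting with a 0-run."""
    mask = np.asarray(mask, dtype=np.uint8)
    flat = mask.ravel()                              # row-major scan
    change = np.flatnonzero(np.diff(flat)) + 1       # indices where the value flips
    bounds = np.concatenate(([0], change, [flat.size]))
    runs = np.diff(bounds)
    if flat.size and flat[0] == 1:
        runs = np.concatenate(([0], runs))           # empty background run first
    return {"shape": tuple(mask.shape), "counts": runs.tolist()}

def rle_decode(rle):
    """Rebuild the 0/1 mask from its run lengths."""
    counts = np.asarray(rle["counts"], dtype=np.int64)
    values = np.arange(counts.size) % 2              # runs alternate 0, 1, 0, 1, ...
    flat = np.repeat(values, counts).astype(np.uint8)
    return flat.reshape(rle["shape"])

example = np.array([[0, 0, 1, 1],
                    [1, 0, 0, 0]], dtype=np.uint8)
encoded = rle_encode(example)   # counts: 2 zeros, 3 ones, 3 zeros
decoded = rle_decode(encoded)
```

Because the masks are dominated by long background runs, the number of stored integers grows with the number of object boundaries crossed rather than with the number of pixels.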
The other directories provide several annotation types, and they are named after their annotation format. Polygon annotations are available in each of the VIA, COCO and Pascal_VOC directories, in each format's native file type (e.g., XML files for Pascal_VOC). The COCO 23 and Pascal_VOC 24 formats also feature bounding boxes and dot annotations for object detection tasks, and count labels for object counting.
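The relationship between the annotation types can be sketched by deriving a bounding box, a dot annotation and a count from polygon vertices. The box conventions below follow the COCO format ([x_min, y_min, width, height]) and the Pascal VOC format (x_min, y_min, x_max, y_max). Using the polygon centroid as the dot annotation is an assumption of this example, as are the function name and the dictionary keys.

```python
def polygon_to_annotations(xs, ys):
    """Derive bounding boxes and a dot (centroid) from one polygon."""
    n = len(xs)
    twice_area = cx = cy = 0.0
    for i in range(n):
        j = (i + 1) % n
        cross = xs[i] * ys[j] - xs[j] * ys[i]        # shoelace term
        twice_area += cross
        cx += (xs[i] + xs[j]) * cross
        cy += (ys[i] + ys[j]) * cross
    if twice_area:
        dot = (cx / (3 * twice_area), cy / (3 * twice_area))
    else:
        dot = (sum(xs) / n, sum(ys) / n)             # degenerate polygon: vertex mean
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    return {
        "bbox_coco": [x_min, y_min, x_max - x_min, y_max - y_min],
        "bbox_voc": (x_min, y_min, x_max, y_max),
        "dot": dot,
    }

# Hypothetical toy image with a single rectangular object.
polygons = [([2, 6, 6, 2], [1, 1, 4, 4])]
annotations = [polygon_to_annotations(xs, ys) for xs, ys in polygons]
count = len(annotations)   # the count label is simply the number of objects
```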

Technical Validation
In order to demonstrate the potential for successful model training and analysis using the provided dataset, we conducted three types of checks to ensure the accuracy and quality of the annotations.
Firstly, polygon annotations were obtained by experienced researchers. During this phase, the annotations underwent multiple rounds of double-checking to ensure that the polygons did not have intersecting edges and that they accurately represented the objects when transformed into binary masks.
Secondly, we leveraged domain knowledge to validate the annotations. Precisely, we tested the binary masks against our expectations regarding the sizes and shapes of the biological structures involved. This validation process relies on a quantitative evaluation concerning objects' area and diameter, complemented by a visual scrutiny of the masks to ensure they align with the expected shapes and exhibit smooth contours. Table 1 reports summary statistics for the distribution of key features at the image and object levels, which can be leveraged for technical validation. For instance, the annotated objects display an average area of nearly 75, 247, and 133 µm² for green, red, and yellow cells, respectively. These values align with the expected size of the biological structures represented in each image collection. Additionally, the analysis of Feret and equivalent diameters provides an understanding of the typical form of the stained objects. In particular, the Feret diameter 25 can be interpreted as a measure of the maximum extension of an object, whereas the equivalent diameter represents the diameter the object would have if it were a perfect circle with the same area. Thus, comparing these two metrics can offer insight into the objects' shape regularity. For green cells, the values for the two measurements are relatively close (roughly 12 vs. 10 µm), suggesting that these cells are broadly circular or oval in shape. A similar conclusion can be drawn for the yellow stains, albeit with slightly more variability (approximately 17 and 13 µm), indicating generally regular shapes with occasional deviations. In the case of red objects, instead, the comparison is markedly different. This time we observe a Feret diameter around 26 µm against an equivalent diameter of 17 µm, which suggests that these stains are typically elongated in one direction rather than concentrated around a center of mass. All these observations are also corroborated when visually inspecting annotated cells, which confirms prior expectations about object size and shape based on the nature of the marked structures.
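The two diameters discussed above can be computed with a few lines of code. The sketch below uses only the definitions given in the text: the equivalent diameter is that of a circle with the same area, d = 2·sqrt(A/π), and the Feret diameter is taken here as the largest distance between any two contour points. Function names are hypothetical. Plugging in the average areas quoted above gives roughly 9.8, 17.7 and 13.0 µm, in line with the reported equivalent diameters; an exact match is not expected because the average of per-object diameters differs from the diameter of the average area.

```python
import math
from itertools import combinations

def equivalent_diameter(area):
    """Diameter of the circle that has the same area as the object."""
    return 2.0 * math.sqrt(area / math.pi)

def feret_diameter(points):
    """Maximum distance between any two contour points (brute force)."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

# Average areas (µm²) quoted in the text for each collection.
for name, area in [("green", 75), ("red", 247), ("yellow", 133)]:
    print(f"{name}: equivalent diameter = {equivalent_diameter(area):.1f} µm")

# Toy elongated object: a 12 x 3 rectangle given by its corners.
corners = [(0, 0), (12, 0), (12, 3), (0, 3)]
elongation = feret_diameter(corners) / equivalent_diameter(12 * 3)
```

A ratio well above 1, as for the toy rectangle, signals an elongated object; values close to 1 indicate a roughly circular one.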

Learning
Thirdly, we conducted a sample training phase for each image collection using a cell ResUnet architecture 5, specifically designed for this type of application. Specifically, we trained a network from scratch for each collection using a Dice Loss 26 and the Adam optimizer 27. The initial learning rate was set based on the "learning rate test" 28 implemented by fastai's 29 lr_find() method. The training phase continued for 100 epochs with cyclical learning rates 30,31, and the best model was selected based on the highest validation Dice coefficient. For all technical details please refer to the GitHub repository 4. This training phase aims to verify the effectiveness of the data in facilitating the learning of beneficial cell features. Additionally, the intent is to highlight the relevant metrics for result evaluations. In particular, we suggest performance should be assessed differently depending on the end goal of future analyses.
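For reference, the Dice coefficient and the corresponding loss can be written compactly. The following NumPy sketch uses the common "soft" formulation with a small smoothing constant; it is not the fastai/PyTorch code used for the experiments, and the names and the value of eps are assumptions of this example.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """2 * |pred AND target| / (|pred| + |target|), smoothed by eps."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def dice_loss(pred, target, eps=1e-7):
    """Loss to minimise: 0 for a perfect overlap, close to 1 for none."""
    return 1.0 - dice_coefficient(pred, target, eps)

target = np.zeros((10, 10))
target[2:4, 2:4] = 1            # 4 foreground pixels
pred = np.zeros((10, 10))
pred[2:4, 2:5] = 1              # 6 predicted pixels, 4 of them correct
score = dice_coefficient(pred, target)   # 2*4 / (6 + 4)
```

Since only foreground pixels enter the formula, the score is not inflated by the large number of correctly predicted background pixels.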
For segmentation tasks, we provided an implementation where matching of actual and predicted neurons 5 is done based on their overlap, quantified as Intersection-over-Union (IoU) 32. This approach not only ensures a 1-1 correspondence of true and predicted objects but also assesses how closely the predictions reconstruct the shape of ground-truth cells. Building on top of this definition, standard metrics such as precision, recall and F1 score can be computed as measures of global performance.
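A simplified version of overlap-based matching can be sketched as follows. Objects are represented as sets of pixel coordinates, pairs are matched greedily in order of decreasing IoU, and each object is used at most once. The greedy strategy, the 0.5 threshold and all names are assumptions of this example; the implementation in the authors' repository may differ.

```python
def iou(a, b):
    """Intersection-over-Union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def match_by_iou(true_objs, pred_objs, iou_threshold=0.5):
    """Greedy one-to-one matching; returns (TP, FP, FN)."""
    pairs = sorted(
        ((iou(t, p), i, j)
         for i, t in enumerate(true_objs)
         for j, p in enumerate(pred_objs)),
        reverse=True,
    )
    used_true, used_pred = set(), set()
    for score, i, j in pairs:
        if score < iou_threshold:
            break                                   # remaining pairs overlap too little
        if i in used_true or j in used_pred:
            continue                                # keep the correspondence 1-1
        used_true.add(i)
        used_pred.add(j)
    tp = len(used_true)
    return tp, len(pred_objs) - tp, len(true_objs) - tp

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1

def block(r0, c0, h, w):
    """Hypothetical helper: pixel set of an h x w rectangle."""
    return {(r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w)}

true_objs = [block(0, 0, 4, 4), block(10, 10, 4, 4)]
pred_objs = [block(0, 1, 4, 4), block(10, 13, 4, 4), block(20, 20, 2, 2)]
tp, fp, fn = match_by_iou(true_objs, pred_objs)     # IoUs: 0.6 and about 0.14
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

In this toy case the second prediction touches a true cell but overlaps it too little, so it counts as both a false positive and a missed object.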
Detection tasks, on the other hand, would benefit from a looser matching criterion, comparing predicted and true objects' centers instead of overlaps.This approach prioritizes recognition over precise shape reconstruction.
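The looser, centre-based criterion can be sketched in the same spirit: predicted and true centres are paired greedily by increasing distance, up to a maximum distance. The greedy pairing, the max_distance value and the names are assumptions made for this illustration.

```python
import math

def match_by_centres(true_centres, pred_centres, max_distance):
    """Greedy one-to-one matching on centre distance; returns (TP, FP, FN)."""
    pairs = sorted(
        (math.dist(t, p), i, j)
        for i, t in enumerate(true_centres)
        for j, p in enumerate(pred_centres)
    )
    used_true, used_pred = set(), set()
    for distance, i, j in pairs:
        if distance > max_distance:
            break                                   # all remaining pairs are farther apart
        if i in used_true or j in used_pred:
            continue
        used_true.add(i)
        used_pred.add(j)
    tp = len(used_true)
    return tp, len(pred_centres) - tp, len(true_centres) - tp

def f1_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy centres (row, col): two true cells, three predictions.
true_centres = [(1.5, 1.5), (11.5, 11.5)]
pred_centres = [(1.5, 2.5), (11.5, 14.5), (20.5, 20.5)]
tp, fp, fn = match_by_centres(true_centres, pred_centres, max_distance=5.0)
detection_f1 = f1_score(tp, fp, fn)
```

A prediction that is shifted with respect to the true cell can still be accepted here even when its overlap would be too small for an IoU-based match, which is why detection scores tend to be higher than segmentation scores.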
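Finally, for counting tasks we suggest common regression metrics such as Mean Absolute Error (MAE), Median Absolute Error (MedAE) and Mean Percentage Error (MPE). Counting metrics simply consider the difference between the numbers of predicted and true objects.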
Table 2 shows the results of the sample training for each image collection. These results are not intended to be a comprehensive exploration of the model's capabilities on FNC data, but rather to showcase some characteristics of various evaluation methods. Nonetheless, they may serve as a baseline for future studies.
Despite no optimization of the training pipeline, the initial results are mainly satisfactory (except for red segmentation), confirming the technical robustness of the data collection process. Going into more detail, we observe a marked discrepancy between segmentation and detection metrics. As expected, the F1 scores based on the distance between true and predicted centers of mass are significantly higher than the corresponding segmentation indicators. Moreover, the discrepancy is greater for image collections where the objects have more irregular shapes (green < yellow < red). This is a consequence of the more inclusive matching criterion used for detection tasks.
In terms of counting, performance is already very satisfactory. However, these metrics may not fully represent the model's performance, as good results could arise from a balancing effect between false positives and false negatives. Interestingly, despite low absolute errors, the percentage error is relatively high due to the impact of errors in images with few or no cells. To address this issue, we adopt the following formula for MPE computation: $\mathrm{MPE} = \frac{\text{predicted} - \text{true}}{\max(\text{true},\, 1)}$. In this way, the fraction is not over-inflated when there are no cells in the original mask.
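The counting metrics can be sketched as below. The per-image percentage error uses max(true, 1) in the denominator, as in the formula above, and is then averaged over images; treating MPE as the mean of these signed per-image values is an assumption of this example, as are the names and the toy counts.

```python
from statistics import mean, median

def counting_metrics(true_counts, pred_counts):
    """MAE, MedAE and MPE (as a fraction) over a set of images."""
    errors = [p - t for t, p in zip(true_counts, pred_counts)]
    return {
        "MAE": mean(abs(e) for e in errors),
        "MedAE": median(abs(e) for e in errors),
        # max(t, 1) avoids dividing by zero for images without cells
        "MPE": mean(e / max(t, 1) for e, t in zip(errors, true_counts)),
    }

# Toy example: one empty image, one crowded image, one exact prediction.
true_counts = [0, 10, 4]
pred_counts = [1, 9, 4]
metrics = counting_metrics(true_counts, pred_counts)
```

In the toy case a single spurious detection on the empty image contributes a 100% error, whereas the same absolute error on the crowded image contributes only 10%, which illustrates why percentage errors can look high even when absolute errors are small.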

Usage Notes
The Fluorescent Neuronal Cells collection is available both as a comprehensive archive and as individual image collections for specific research requirements. This enables users to download the data efficiently and selectively, based on their specific needs. The code provided is based on the Python and PyTorch frameworks, offering a robust foundation for analysis and modeling. However, thanks to the popularity of the annotation formats and the use of PNG images, users can easily employ their preferred deep learning framework.

Peculiar traits
In all image collections, the visual representation is characterized by the prevalence of two distinct color tones, which result from the deliberate selection of a specific wavelength. One tone appears darker, indicating areas where light has been filtered out, while the other tone is brighter and more intense, emitted by the fluorophore corresponding to the color of each collection (see Figures 2a to 2c). As a result, the images can generally be depicted using variations of a single color. Consequently, a 1-D representation may be sufficient, or an alternative color space other than RGB could provide more informative and less redundant data. Notice, however, that the specific colors employed in our studies were dictated not by any inherent or functional property of the stained biological structures, but rather by their accessibility and practicality at the time of the experiments. Therefore, it would be a misinterpretation to associate specific colors with particular neuronal substructures. In fact, these colors serve only as contrasting elements to discern the stained foreground objects from the background. Consequently, the emphasis should lie primarily on learning this discrimination rather than matching specific colors with the neuronal structures. Thus, the particular colors should not be considered indicative of the type of neuronal cells or their functional attributes, but merely as a practical tool aiding in the overall visualization and interpretation.

Challenges
Some important insights for future studies can be drawn by examining ground-truth masks at the pixel level, which reveals characteristics that significantly impact the training process. The two classes, namely cells (1) and background (0), exhibit an extreme class imbalance, with background pixels being overwhelmingly predominant, typically exceeding cell pixels by over a factor of 100 (cf. Table 1, signal %). These observations highlight the necessity for specialized training strategies to address this pronounced class imbalance and effectively learn the pixel classification.
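The degree of imbalance can be checked directly on any binary mask. The sketch below computes the fraction of cell pixels and the background-to-cell ratio for a toy mask; the function names and the toy mask are hypothetical, and how the "signal %" column of Table 1 was actually computed is not specified here.

```python
import numpy as np

def signal_fraction(mask):
    """Fraction of pixels labelled as cell (1) in a binary mask."""
    mask = np.asarray(mask).astype(bool)
    return float(mask.mean()) if mask.size else 0.0

def background_to_cell_ratio(mask):
    """How many background pixels there are for each cell pixel."""
    mask = np.asarray(mask).astype(bool)
    n_cell = int(mask.sum())
    n_background = mask.size - n_cell
    return n_background / n_cell if n_cell else float("inf")

# Toy 100 x 100 mask with a single 10 x 8 object (80 cell pixels).
mask = np.zeros((100, 100), dtype=np.uint8)
mask[10:20, 30:38] = 1
fraction = signal_fraction(mask)
ratio = background_to_cell_ratio(mask)
```

A ratio of this size is one reason why overlap-based losses or class re-weighting are commonly preferred over plain pixel accuracy in this setting.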
Additional challenges are associated with the macroscopic content of the images. The Fluorescent Neuronal Cells data showcase a diverse collection of 11704 subnuclear neuronal structures, varying in shape, size, and extension (cf. Table 1, area, Feret diameter and equivalent diameter columns). The distribution of these structures across the collections is uneven, with some images containing numerous cells while others are devoid of them. Consequently, the model needs to be flexible enough to handle both scenarios.
Furthermore, despite considerable efforts to stabilize the acquisition procedure, several technical challenges persist. Firstly, there is a high variability in terms of color, saturation, and contrast from one image to another. For instance, there are cases where the tissues absorb some of the markers (see Figures 5b to 5e and 5g), causing irrelevant compounds to emit light which is then captured by the microscope. Consequently, the background's hue may shift towards values similar to those of faint neuronal cells (see Figures 5b to 5f). In such circumstances, relying solely on pixel intensity is insufficient to distinguish between signal and background, necessitating the consideration of additional characteristics such as saturation and contrast. However, even the analysis of these characteristics is not straightforward, as fluorescent emissions are naturally unstable, leading to fluctuations in the saturation levels exhibited by cell pixels (cf. Figures 5a to 5c or Figures 5f and 5g).
Moreover, the substructures of interest have a fluid nature. Also, the shot can capture different two-dimensional sections depending on how the cells are oriented within the tissues. As a consequence, the size and the shape of the stained cells can vary significantly (cf. object dimensions in Figures 2d to 2f), further complicating the discrimination between cells and the background.
Another challenge arises from the occasional presence of accumulations of fluorophore in narrow areas, resulting in emissions that closely resemble those of cells. These artifacts can manifest as small areas, such as point artifacts and filaments, or larger structures, like lateral stripes (see Figures 5b to 5g). Again, their presence hampers the detection task, making it essential for the model to recognize and understand cell structure and size.
A further source of complexity is represented by overcrowding (Figures 5a, 5d, 5f and 5g). When several cells are close together, possibly partially overlapping, precisely localizing cell boundaries can be challenging, thus requiring adjustments to prevent the model from merging nearby cells into single agglomerations.
Last but not least, on some occasions the recognition of cells may be ambiguous even for human operators (cf. marked and non-marked instances in Figures 5a, 5d, 5f and 5g). Of course, this poses an issue of intrinsic subjectivity in the annotation process, which in turn affects both the training and assessment phases.
By and large, all of these factors make the recognition and counting tasks harder and complicate the learning process. Likewise, borderline annotations hinder model evaluation as their subjectivity deprives the model of a reliable and indisputable testbed.

Research lines
As for potential applications, the FNC dataset offers rich opportunities for diverse research directions, including:
• Object Segmentation, Detection, and Counting: The dataset's comprehensive annotations and diverse neuronal structures support studies focusing on accurate segmentation, detection, and counting of cells. Particularly, FNC may be a challenging benchmark for class imbalance, object overlapping/overcrowding, and uncertainty estimation.
• Transfer Learning: With the availability of multiple image collections within FNC, researchers can explore transfer learning techniques, leveraging knowledge from one collection to improve performance on another.
• Unsupervised or Self-/Weakly-Supervised Learning: The presence of both labeled and unlabeled data within the FNC dataset provides an ideal testbed for evaluating unsupervised or self-/weakly-supervised learning approaches.
• Evaluation of Annotation Types: Researchers can investigate the effectiveness of different annotation types for specific tasks, allowing for a comparative analysis and selection of the most suitable annotations considering the cost/performance requirements of a given use-case.

Limitations
Although the Fluorescent Neuronal Cells collection presents a variety of images in many aspects, it has limitations in terms of diversity across several parameters. Firstly, all the images were collected by the same research laboratory in Bologna, utilizing fixed experimental conditions and acquisition settings. Furthermore, the images were captured using epifluorescence microscopy, which limits the range of techniques employed. However, we believe that the adopted acquisition settings represent a more challenging scenario. Therefore, pre-training on FNC data should enable generalization to modern equipment such as confocal microscopy, which produces higher-quality images with sharper object boundaries and improved signal-to-noise ratio.
Another limitation lies in the lack of diversity in the cell types depicted and the animal species involved. Our dataset only focuses on subcellular components of rodent neurons. This might potentially impact the generalization of the models to different use cases and restrict their application to other cell types or animal species.

Figure 1. Study design. The study was designed in two phases: data collection, where high-resolution pictures of rodent brain slices were acquired; and data annotation, where expert researchers collected annotations needed for supervised learning approaches.

(a) Green image example (b) Red image example (c) Yellow image example (d) Green mask example (e) Red mask example (f) Yellow mask example

Figure 2. Data preview. The figures show examples of fluorescence microscopy pictures (Figures 2a to 2c) and the corresponding ground-truth binary masks (Figures 2d to 2f).

Figure 3. Yellow masks v1 review. Figures 3a and 3c illustrate how binary masks were revised compared to version 1 (Figures 3b and 3d, respectively). Improvements include: small objects removal, contour smoothing, holes filling and more consistent labelling.

Figure 4. FNC dataset structure. Figure 4a shows the structure of each image collection folder, while Figure 4b gives more details on the organization of the annotations directory. Figure 4c summarizes the composition of each image collection, with the amounts of training, testing and unlabelled images.

Table 1. Summary statistics of key features' distribution for each image collection. The top portion highlights global indicators, while the bottom one reports given percentiles of each distribution.
a The difference compared to counts in following columns comes from the contribution of empty images.These amount to 6, 3 and 38 images for green, red and yellow collections, respectively.

Table 2. Performance metrics by learning task. The segmentation portion refers to TP, FP, and FN computed based on objects overlapping (IoU). For detection metrics, predicted and true cells are associated based on their centers' distance. For counting, common regression metrics are used: Mean Absolute Error (MAE), Median Absolute Error (MedAE) and Mean Percentage Error (MPE).