Background & Summary

Fluorescence microscopy is a pivotal imaging technique in life-science experiments, allowing researchers to study biological structures and processes with remarkable precision. It employs fluorescent dyes or proteins that emit light at specific wavelengths depending on the illuminating wavelength they absorb. By exploiting this phenomenon, specific molecules can be tagged (stained) with fluorescent markers and observed through a microscope by filtering only their emitted light, thus providing valuable insights into their localization, activity, and interactions. Based on this principle, several microscopy technologies are available, with modern solutions able to acquire 3D volumetric images containing rich neuromorphological information. Nevertheless, more traditional alternatives like epifluorescence microscopy and 2D-slice imaging remain very popular due to their suitability for a broad spectrum of applications, simplicity in design and operation, fast acquisition, and cost-effective setup.

A major bottleneck in the adoption of these techniques is the lack of fully automated pipelines for the analysis of the resulting data, which instead often requires manual recognition and/or counting of the neuronal structures of interest1,2,3. For instance, in the study of torpor mechanisms, researchers depend on laborious hand-crafted operations to identify the neuronal networks associated with this process4. This manual work typically delays the analyses and introduces potential errors due to the limitations of human operators. Moreover, the similarity between structures of interest and the background often makes it challenging to distinguish and accurately recognize biological compounds, resulting in inherent arbitrariness and interpretation bias.

For these reasons, there is a growing interest in automating the recognition and counting of tagged elements in fluorescence microscopy5,6,7,8,9. Deep learning approaches have demonstrated great promise in various object recognition tasks. However, their performance can deteriorate when applied to data from domains significantly different from those adopted for pre-training (domain shift10,11). Furthermore, the effectiveness of these approaches typically relies heavily on the availability of well-annotated data12, which is often scarce and limited in the fluorescence microscopy domain.

To mitigate these issues, we present the Fluorescent Neuronal Cells v2 (FNC) dataset13. This archive features 3 data collections, for a total of 1874 high-resolution images of rodent brain slices capturing a diverse range of neuronal structures and staining patterns. To facilitate research in this field, we also provide 750 annotations in various formats, tailored to popular supervised learning tasks such as semantic segmentation, object detection, and counting. Apart from serving as an additional benchmark for testing model generalization in microscopy applications, the FNC dataset opens up several research opportunities. Firstly, the heterogeneity of biological structures and their visual characteristics enables testing the generalization of trained models and validating transfer learning and domain adaptation methods14,15. Also, the availability of multiple annotation types allows the exploration of different learning paradigms, ranging from supervised and unsupervised approaches to self-/weakly-supervised techniques. Moreover, the specific challenges of our data are well suited to investigations into methodological advancements, e.g., assessing the effectiveness of different annotation formats and uncertainty estimation.

The design of the data collection process involved two distinct stages. Firstly, data collection was conducted following standardized experimental protocols. Specifically, controlled experimental conditions were applied to the animals, whose brains were sliced and processed with a classical immunofluorescence protocol to stain various neuronal substructures. Subsequently, a fluorescence microscope was employed to capture high-resolution images of the areas of interest. Secondly, domain experts performed data annotation, providing the ground-truth labels necessary for supervised learning.

Despite the presence of open-source fluorescence microscopy datasets, several issues hinder their utilization for training deep learning models. Firstly, these collections typically lack accompanying ground-truth annotations, thus precluding the adoption of supervised learning techniques. Secondly, labelled datasets often include just a few dozen images16,17, which can be restrictive considering the data-intensive nature of deep learning models. Also, the moderate resolution of images in open datasets18,19 hampers the effectiveness of resorting to crops as an alternative to whole images for augmenting sample size. Thirdly, most existing datasets predominantly include a single marker type16,17, thus lacking diversity and limiting robust model training. Alongside these aspects, public datasets typically provide labels as dot-annotations or bounding boxes16,17,18,19, which prevents their extension to fine-grained segmentation tasks. Additionally, data accessibility is sometimes restricted due to the use of domain-specific formats11, which complicates integration into deep learning frameworks and wide dissemination.

In response to these challenges, we present a large archive comprising high-resolution fluorescent microscopy images, encompassing different markers and cell types. Furthermore, the data are shared as easily accessible PNG files, and the corresponding annotations are provided in various types and formats, enabling the exploration of different learning approaches and tasks, thereby significantly expanding the scope of potential applications.

Methods

The FNC dataset13 compiles images acquired from multiple studies and experimental conditions, while maintaining a consistent structure in the acquisition pipeline. Minor modifications were made to accommodate the specific requirements of each study and adapt to the current experimental circumstances and equipment. The data collection process consisted of two distinct and independent stages: image acquisition and data annotation. This section provides a comprehensive description of the data acquisition design, including the dedicated measures implemented for each image collection (refer to Fig. 1 for a visual summary).

Fig. 1
figure 1

Study design. The study was designed in two phases: data collection, where high-resolution pictures of rodent brain slices were acquired; and data annotation, where expert researchers collected annotations needed for supervised learning approaches.

Image acquisition

In the image acquisition phase, a total of 68 rodents were subjected to controlled experimental conditions to study torpor and thermoregulatory mechanisms. At the end of the experimental session, the animals were deeply anaesthetized and transcardially perfused with 4% formaldehyde4. This process allowed for the tagging of several neuronal substructures located within the nucleus or cytoplasm of the neurons. All the experiments were conducted following approval by the National Health Authority (decree: No.141/2018 - PR/AEDB0.8.EXT.4), in accordance with the DL 26/2014 and the European Union Directive 2010/63/EU, and under the supervision of the Central Veterinary Service of the University of Bologna. All efforts were made to minimize the number of animals used and their pain and distress.

Rodent brains were then sectioned into 35 μm-thick tissue slices, with sampling conducted at regular intervals (105 μm for mice and 210 μm for rats) to avoid redundant data and ensure comprehensive coverage while maintaining a manageable data size. Brain slices were finally stained for distinct markers following a standard immunofluorescence protocol4. Only some areas of interest were observed, namely the Raphe Pallidus (RPa), Dorsomedial Hypothalamus (DMH), Lateral Hypothalamus (LH), and Ventrolateral Periaqueductal Gray (VLPAG). These specific brain regions were chosen based on their relevance to the study of torpor mechanisms. The resulting specimens were observed by means of a fluorescence microscope equipped with a high-resolution camera. A specific wavelength of excitation light was selected for each collection based on the excitation wavelength of the chosen marker, resulting in pictures acquired with the application of green, yellow/orange or red filters. For simplicity, the image collections are named according to their prevalent hue. The original images were acquired as either TIF or JPG files depending on the camera default settings. To ensure traceability, a file naming convention was adopted to indicate their respective sample origins: <animal_id>_S<sample_id>C<column_id>R<row_id>_<brain_area>_<zoom>_<collection_id>. During the analysis phase, the raw data were converted to uncompressed PNG format, taking care to preserve the extensive set of associated metadata. This conversion aimed to enhance accessibility and facilitate broader utilization of the data, allowing for inspection and manipulation without the need for specialized software. Consequently, the FNC archive includes both these derived images and the original raw images, which are retained for data recovery and reproducibility purposes.
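
As an illustration, the following minimal Python sketch parses this naming convention into its constituent fields; the field separators and the characters allowed in each field are assumptions that should be verified against the actual file names in the archive.

```python
import re
from pathlib import Path

# Hypothetical parser for the naming convention
# <animal_id>_S<sample_id>C<column_id>R<row_id>_<brain_area>_<zoom>_<collection_id>;
# the exact characters allowed in each field should be checked against the archive.
NAME_PATTERN = re.compile(
    r"(?P<animal_id>[^_]+)_"
    r"S(?P<sample_id>\d+)C(?P<column_id>\d+)R(?P<row_id>\d+)_"
    r"(?P<brain_area>[^_]+)_"
    r"(?P<zoom>[^_]+)_"
    r"(?P<collection_id>[^_.]+)"
)

def parse_fnc_filename(path: str) -> dict:
    """Return the metadata fields encoded in an FNC file name, or an empty dict."""
    match = NAME_PATTERN.match(Path(path).stem)
    return match.groupdict() if match else {}

if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the expected output.
    print(parse_fnc_filename("mouse01_S2C3R1_RPa_200x_green.png"))
```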

Green and yellow collections

The images within these collections were obtained during the same experiment4, in which brain sections from C57BL/6J mice were stained with two markers to highlight specific substructures present in the neurons’ nucleus and cytoplasm. The resulting brain slices were then observed using a Nikon Eclipse 80i microscope, equipped with a Nikon Digital Sight DS-Vi1 color camera, at a magnification of 200x.

More specifically, the green collection corresponds to cFOS staining (cf. Figure 2d). This staining method was employed to emphasize the nuclei of active neuronal cells20, enabling the topographic analysis of brain areas that exhibit neuronal activity under specific experimental conditions. This approach is widely employed to identify neuronal cells responsible for regulating specific physiological phenomena.

Fig. 2
figure 2

Data preview. The figures show examples of fluorescence microscopy pictures (a–c) and the corresponding ground-truth binary masks (d–f).

In contrast, the yellow collection (cf. Figure 2c) utilized staining for the b-subunit of Cholera Toxin (CTb). This monosynaptic retrograde neuronal tracer migrates within the soma and axons of neuronal cells projecting to the brain area where CTb was previously injected during in vivo experiments21. Consequently, this staining technique facilitates the identification of morphological connections between different brain regions.

Red collection

The red collection comprises images obtained from multiple unpublished experiments, concerning specimens of both mice and rats (cf. Figure 2b). Despite sharing the same experimental setup as the green and yellow collections, this time the brain tissues were stained for various elements to phenotypically characterize the cells involved in the neural circuits underlying the physiological phenomena of torpor and thermoregulation. Specifically, slices were stained for orexin, tryptophan hydroxylase, and tyrosine hydroxylase. In this case, image acquisition was conducted using both the aforementioned Nikon Eclipse 80i microscope and an ausJENA JENAVAL microscope, equipped with a Nikon Coolpix E4500 color camera, at a magnification of 250x. For further details, please refer to the accompanying metadata for each image.

Data annotation

The data annotation process was carried out by multiple proficient experimenters according to a fixed annotation protocol (see Annotations protocol.pdf file in the data archive), with multiple revision rounds to ensure data quality and minimize operator bias. We adopted the Visual Geometry Group (VGG) Image Annotator (VIA)22,23, which employs a web interface for image visualization and allows for the overlaying of annotations in different forms. In our study, the tagging process involved creating polygon contours, and the resulting annotations were exported into CSV format. To generate the binary masks required for training, the polygon contours were transformed using programming libraries such as OpenCV and scikit-image. For the yellow collection, we utilized the binary masks available from version 124 as pre-annotations. Specifically, we employed erosion and dilation operations to address fragmented contours resulting from semi-automatic labeling based on thresholding. Furthermore, we applied methods to fill small holes within segmented objects, and removed spurious objects that had been overlooked in the previous annotations or erroneously added by prior processing. Subsequently, these pre-annotations were refined manually using VIA, enhancing their accuracy and ensuring better consistency across the annotations (see Fig. 3). In contrast, the green and red collections were annotated from scratch.
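
As a hedged illustration of this conversion pipeline, the sketch below rasterizes polygon contours into binary masks with OpenCV and applies scikit-image morphological clean-up of the kind described above; the structuring-element and size thresholds are illustrative values, not the ones used to produce the released masks.

```python
import numpy as np
import cv2
from skimage import morphology

def polygons_to_mask(polygons, height, width):
    """Rasterize a list of polygons (each an (N, 2) array of x/y vertices) into a binary mask."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for poly in polygons:
        pts = np.asarray(poly, dtype=np.int32).reshape(-1, 1, 2)
        cv2.fillPoly(mask, [pts], color=1)
    return mask.astype(bool)

def clean_mask(mask, min_size=50):
    """Illustrative clean-up: close fragmented contours (dilation followed by erosion),
    fill small holes and drop spurious small objects. Thresholds are placeholders."""
    closed = morphology.binary_closing(mask, morphology.disk(3))
    filled = morphology.remove_small_holes(closed, area_threshold=64)
    return morphology.remove_small_objects(filled, min_size=min_size)
```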

Fig. 3
figure 3

Yellow masks v1 review. (a,c) show the reviewed binary masks, compared with the corresponding version 1 masks (b,d). Improvements include small-object removal, contour smoothing, hole filling, and more consistent labelling.

Upon completion of the labeling process, the polygon contours exported from VIA were also converted into multiple annotation types and formats. This conversion aims at facilitating accessibility for a wide range of users and promoting the exploration of various learning problems related to our data. For a more comprehensive understanding of the available formats and annotation types, please refer to the Section Data Records.

Data Records

The FNC dataset13 is a collection of 1874 high-resolution fluorescent microscopy pictures, 750 of which also have their corresponding ground-truth segmentation masks, while the remaining 1124 are unlabelled. It is hosted on AMS Acta, the open access repository managed by the University of Bologna. The data are organized into three standalone image collections, named for simplicity green, yellow, and red, each available under the corresponding folder (see Fig. 4c). The collections share a common layout to facilitate easy access and analysis (see Fig. 4a).

Fig. 4
figure 4

FNC dataset structure. (a) shows the structure of each image collection folder, while (b) gives more details on the organization of the annotations directory. (c) summarizes the composition of each image collection, with the amounts of training, testing and unlabelled images.

To aid users in navigating the archive, the metadata_v2.xlsx file provides a comprehensive overview of the FNC data collection. It includes high-level metadata for each image, such as the corresponding animal, acquisition details, data partition, and annotation information.

Image collection folder structure

The trainval and test folders contain all labelled images for each collection. These data partitions were obtained through a random 75%/25% split and are provided as a suggested configuration to ensure reproducibility and comparability in future studies. The remaining images are stored under the unlabelled folder.

Inside each data partition folder, the images folder contains fluorescence microscopy images in PNG format. All the images are accompanied by a rich set of metadata, stored both in their EXIF tags and as a separate TXT file under the metadata folder. The ground_truths folder contains annotations in various formats commonly used within the machine learning community (see Fig. 4b).
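
For instance, an image and its metadata can be loaded as sketched below; the TXT file location is reconstructed under the assumption that it mirrors the image file name inside the sibling metadata folder, and EXIF reading relies on Pillow's getexif(), which may return an empty mapping if no EXIF chunk is present.

```python
from pathlib import Path
from PIL import Image

def load_image_and_metadata(image_path):
    """Load a PNG image together with its EXIF tags and companion TXT metadata.
    The TXT path is an assumption: <partition>/metadata/<image_stem>.txt."""
    image_path = Path(image_path)
    img = Image.open(image_path)
    exif = dict(img.getexif())  # may be empty if the PNG carries no EXIF data
    txt_path = image_path.parents[1] / "metadata" / (image_path.stem + ".txt")
    metadata = txt_path.read_text() if txt_path.exists() else ""
    return img, exif, metadata
```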

Annotation types and formats

The FNC collection13 provides annotations of multiple types, encoded in several standard formats. The masks folder contains the binary masks typically used for segmentation tasks (cf. Figure 2d,f). The correspondence between the masks and the respective images can be established based on the filenames. The other folders store a lightweight encoding of the binary masks, enriched with additional annotation types/formats.

The rle directory contains Run-Length Encodings (RLE) of the binary masks, stored as pickle files. This encoding is a compressed representation that can effectively save disk space while preserving the complete segmentation information. It is particularly convenient for high-resolution images like those present in our dataset.
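
As a rough illustration, a decoder could look like the sketch below; it assumes an uncompressed, COCO-style RLE (alternating background/foreground run lengths in column-major order), which is an assumption to be verified against the actual content of the pickle files.

```python
import pickle
import numpy as np

def decode_rle(counts, height, width):
    """Decode an uncompressed COCO-style RLE (alternating background/foreground
    run lengths, column-major order) into a binary mask of shape (height, width).
    This layout is assumed and should be checked against the rle pickle files."""
    flat = np.zeros(height * width, dtype=np.uint8)
    pos, value = 0, 0
    for run in counts:
        flat[pos:pos + run] = value
        pos += run
        value = 1 - value
    return flat.reshape((height, width), order="F").astype(bool)

# Hypothetical usage: the exact structure stored in the pickle must be inspected first.
# with open("image_0001.pkl", "rb") as f:
#     rle = pickle.load(f)
```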

The other directories provide several annotation types, and they are named after their annotation format. Polygon annotations are available in each of the VIA, COCO and Pascal_VOC directories, in the form of json or xml files. The COCO25 and Pascal_VOC26 formats also feature bounding boxes and dot annotations for object detection tasks, and count labels for object counting.
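
For example, the COCO-format annotations can be browsed with the pycocotools library, as sketched below; the json file name is hypothetical and should be replaced with the actual file found in the COCO folder.

```python
from pycocotools.coco import COCO

# Hypothetical file name: adapt to the actual json inside the COCO directory.
coco = COCO("ground_truths/COCO/annotations.json")

image_ids = coco.getImgIds()
for img_info in coco.loadImgs(image_ids[:3]):
    ann_ids = coco.getAnnIds(imgIds=img_info["id"])
    anns = coco.loadAnns(ann_ids)
    # Each annotation carries the polygon ("segmentation") and the bounding box
    # ("bbox"); the per-image object count is simply the number of annotations.
    print(img_info["file_name"], len(anns))
```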

Fluorescent Neuronal Cells v1 comparison

The FNC v2 archive introduced by this work features an extension and re-elaboration of the FNC v16,24 content. In particular, the Red and Green collections present entirely novel and unpublished data, while the Yellow collection in v2 is derived from v1. Specifically, the 283 fluorescent microscopy images contained in v1 are reported in v2 inside the trainval and test partitions, with numbered filenames instead of the original ones. For an exact matching, users may refer to the original_name and image_name columns in metadata_v2.xlsx for the v1 and v2 filenames, respectively. As for the annotations, the Yellow v1 binary masks were used as pre-annotations for version 2 to improve their quality and consistency, as described in Subsection Data annotation. The other formats were then derived from the results of this re-elaboration.

Technical Validation

In order to demonstrate the potential for successful model training and analysis using the provided dataset, we conducted three types of checks to ensure the accuracy and quality of the annotations.

Firstly, polygon annotations were obtained by experienced researchers. During this phase, the annotations underwent multiple rounds of double-checking to ensure that the polygons did not have intersecting edges and that they accurately represented the objects when transformed into binary masks.

Secondly, we leveraged domain knowledge to validate the annotations. Specifically, we tested the binary masks against our expectations regarding the sizes and shapes of the biological structures involved. This validation process relies on a quantitative evaluation of the objects’ area and diameter, complemented by a visual scrutiny of the masks to ensure they align with the expected shapes and exhibit smooth contours. Table 1 reports summary statistics for the distribution of key features at the image and object levels, which can be leveraged for technical validation. For instance, the annotated objects display an average area of nearly 75, 247, and 133 μm2 for green, red, and yellow cells, respectively. These values align with the expected size of the biological structures represented in each image collection. Additionally, the analysis of Feret and equivalent diameters provides an understanding of the typical form of the stained objects. In particular, the Feret diameter27 can be interpreted as a measure of the maximum extension of an object, whereas the equivalent diameter represents the diameter the object would have if it were a perfect circle with the same area. Thus, comparing these two metrics can offer insight into the objects’ shape regularity. For green cells, the values for the two measurements are relatively close (roughly 12 vs. 10 μm), suggesting that these cells are broadly circular or oval in shape. A similar conclusion can be drawn for the yellow stains, albeit with slightly more variability (approximately 17 and 13 μm), indicating generally regular shapes with occasional deviations. In the case of red objects, instead, the comparison is markedly different. This time we observe a Feret diameter of around 26 μm against an equivalent diameter of 17 μm, which suggests that these stains are typically elongated in one direction rather than concentrated around a center of mass. All these observations are also corroborated by visual inspection of the annotated cells, which confirms prior expectations about object sizes and shapes based on the nature of the marked structures.

Table 1 Summary statistics of key features’ distribution for each image collection.
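
For reference, per-object statistics analogous to those in Table 1 can be recomputed from the binary masks, for instance with scikit-image as sketched below; the pixel size in μm is an assumption to be read from the image metadata rather than a value provided here.

```python
from skimage import io, measure

def object_stats(mask_path, um_per_pixel):
    """Compute area, Feret diameter and equivalent diameter for each annotated object.
    um_per_pixel must be taken from the image metadata (no value is assumed here)."""
    mask = io.imread(mask_path) > 0
    labels = measure.label(mask)
    stats = []
    for region in measure.regionprops(labels):
        stats.append({
            "area_um2": region.area * um_per_pixel ** 2,
            "feret_diameter_um": region.feret_diameter_max * um_per_pixel,
            "equivalent_diameter_um": region.equivalent_diameter * um_per_pixel,
        })
    return stats
```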

Learning

Thirdly, we conducted a sample training phase for each image collection using the cell ResUnet architecture6, specifically designed for this type of application. In particular, we trained a network from scratch for each collection using a Dice loss28 and the Adam optimizer29. The initial learning rate was set based on the “learning rate test”30 implemented by fastai’s31 lr_find() method. The training phase continued for 100 epochs with cyclical learning rates32,33, and the best model was selected based on the validation Dice coefficient. For all technical details please refer to the GitHub repository (see Section Code availability). This training phase aims to verify the effectiveness of the data in facilitating the learning of beneficial cell features. Additionally, the intent is to highlight the relevant metrics for result evaluation. In particular, we suggest that performance should be assessed differently depending on the end goal of future analyses.
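
The following minimal PyTorch sketch approximates the described recipe (Dice loss, Adam, and a one-cycle schedule standing in for the cyclical learning rates); the original pipeline relies on fastai and the cell ResUnet, so the model, maximum learning rate and data loaders are placeholders, and the authoritative implementation remains the one in the GitHub repository.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR

class DiceLoss(nn.Module):
    """Soft Dice loss for binary segmentation (1 - Dice coefficient)."""
    def forward(self, logits, targets, eps=1.0):
        probs = torch.sigmoid(logits).flatten(1)
        targets = targets.flatten(1)
        inter = (probs * targets).sum(dim=1)
        union = probs.sum(dim=1) + targets.sum(dim=1)
        return 1 - ((2 * inter + eps) / (union + eps)).mean()

def train(model, train_loader, max_lr=1e-3, epochs=100, device="cuda"):
    # max_lr stands in for the value found via a learning-rate range test.
    criterion, model = DiceLoss(), model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=max_lr)
    scheduler = OneCycleLR(optimizer, max_lr=max_lr,
                           epochs=epochs, steps_per_epoch=len(train_loader))
    for _ in range(epochs):
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            loss = criterion(model(images), masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
```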

For segmentation tasks, we provided an implementation where the matching of actual and predicted neurons – i.e., the calculation of True Positives (TP), False Positives (FP) and False Negatives (FN) – is done based on their overlap, quantified as Intersection-over-Union (IoU)34. This approach not only ensures a one-to-one correspondence between true and predicted objects but also assesses how closely the predictions reconstruct the shape of ground-truth cells. Building on top of this definition, standard metrics such as precision, recall and F1 score can be computed as measures of global performance.
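
A possible implementation of this IoU-based matching is sketched below; the greedy assignment and the 0.5 IoU threshold are assumptions made for illustration and may differ from the released evaluation code.

```python
import numpy as np
from skimage import measure

def match_objects_iou(true_mask, pred_mask, iou_thr=0.5):
    """Greedy one-to-one matching of connected components by IoU.
    Returns (TP, FP, FN); the 0.5 threshold is illustrative."""
    true_lab, pred_lab = measure.label(true_mask), measure.label(pred_mask)
    n_true, n_pred = true_lab.max(), pred_lab.max()
    ious = np.zeros((n_true, n_pred))
    for t in range(1, n_true + 1):
        t_region = true_lab == t
        for p in range(1, n_pred + 1):
            p_region = pred_lab == p
            inter = np.logical_and(t_region, p_region).sum()
            union = np.logical_or(t_region, p_region).sum()
            ious[t - 1, p - 1] = inter / union if union else 0.0
    tp = 0
    while ious.size and ious.max() >= iou_thr:
        t, p = np.unravel_index(ious.argmax(), ious.shape)
        tp += 1
        ious[t, :], ious[:, p] = 0, 0  # each object is matched at most once
    fp, fn = n_pred - tp, n_true - tp
    return tp, fp, fn

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```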

Detection tasks, on the other hand, would benefit from a looser matching criterion, comparing predicted and true objects’ centers instead of overlaps. This approach prioritizes recognition over precise shape reconstruction.
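
A corresponding center-based criterion could be sketched as follows, with the maximum centroid distance being an illustrative value rather than the one used in our evaluation.

```python
import numpy as np
from scipy.spatial.distance import cdist
from skimage import measure

def match_objects_by_center(true_mask, pred_mask, max_dist=20):
    """Match objects whose centroids lie within max_dist pixels (illustrative value),
    each object being used at most once; returns (TP, FP, FN)."""
    true_c = np.array([r.centroid for r in measure.regionprops(measure.label(true_mask))])
    pred_c = np.array([r.centroid for r in measure.regionprops(measure.label(pred_mask))])
    if len(true_c) == 0 or len(pred_c) == 0:
        return 0, len(pred_c), len(true_c)
    dists = cdist(true_c, pred_c)
    tp = 0
    while dists.size and dists.min() <= max_dist:
        t, p = np.unravel_index(dists.argmin(), dists.shape)
        tp += 1
        dists[t, :], dists[:, p] = np.inf, np.inf
    return tp, len(pred_c) - tp, len(true_c) - tp
```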

Finally, for counting tasks we suggest common regression metrics such as Mean Absolute Error (MAE), Median Absolute Error (MedAE) and Mean Percentage Error (MPE).

Table 2 shows the results of the sample training for each image collection. These results are not intended to be a comprehensive exploration of the model’s capabilities on FNC data13, but rather to showcase some characteristics of various evaluation methods. Nonetheless, they may serve as a baseline for future studies.

Table 2 Performance metrics by learning task.

Although the training pipeline was not optimized, the initial results are largely satisfactory (except for red segmentation), confirming the technical robustness of the data collection process. In more detail, we observe a marked discrepancy between segmentation and detection metrics. As expected, the F1 scores based on the distance between true and predicted centers of mass are significantly higher than the corresponding segmentation indicators. Moreover, the discrepancy is greater for image collections where the objects have more irregular shapes (green < yellow < red). This is a consequence of the more inclusive matching criterion used for detection tasks.

In terms of counting, performance is already very satisfactory. However, these metrics may not fully represent the model’s performance, as good results could arise from a balancing effect between false positives and false negatives. Interestingly, despite low absolute errors, the percentage error is relatively high due to the impact of errors in images with few or no cells. To address this issue, we adopt the following formula for the MPE computation: \(\mathrm{MPE}=\frac{\mathrm{predicted}-\mathrm{true}}{\max(\mathrm{true},\,1)}\). In this way, the fraction is not over-inflated when there are no cells in the original mask.
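
These counting metrics, including the clipped-denominator MPE above, can be computed as in the following sketch (MPE is returned as a fraction; multiply by 100 for a percentage).

```python
import numpy as np

def counting_metrics(true_counts, pred_counts):
    """MAE, MedAE and MPE with the denominator clipped at 1, following the formula above."""
    true_counts = np.asarray(true_counts, dtype=float)
    pred_counts = np.asarray(pred_counts, dtype=float)
    errors = pred_counts - true_counts
    mae = np.mean(np.abs(errors))
    medae = np.median(np.abs(errors))
    mpe = np.mean(errors / np.maximum(true_counts, 1))  # fraction; x100 for percentage
    return mae, medae, mpe
```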

Usage Notes

The Fluorescent Neuronal Cells collection13 is available both as a comprehensive archive and as individual image collections for specific research requirements. This enables users to download the data efficiently and selectively, based on their specific needs. The code provided is based on the Python and PyTorch frameworks, offering a robust foundation for analysis and modeling. However, thanks to the popularity of the annotation formats and the use of PNG images, users can easily employ their preferred deep learning framework.

Peculiar traits

In all image collections, the visual representation is characterized by the prevalence of two distinct color tones, which result from the deliberate selection of a specific wavelength. One tone appears darker, indicating areas where light has been filtered out, while the other tone is brighter and more intense, emitted by the fluorophore corresponding to the color of each collection (see Fig. 2a–c). As a result, the images can generally be depicted using variations of a single color. Consequently, a 1-D representation may be sufficient, or an alternative color space other than RGB could provide more informative and less redundant data.
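
As an example, a single-channel representation can be obtained by converting the images to an alternative color space, e.g. keeping the HSV value channel as sketched below; which channel or color space works best is an open question and may differ per collection.

```python
import cv2

def to_single_channel(image_path, mode="value"):
    """Reduce an FNC image to a 1-D (single-channel) representation.
    mode='value' keeps the HSV value channel; otherwise a plain grayscale
    conversion is returned. The best choice is left to experimentation."""
    bgr = cv2.imread(image_path, cv2.IMREAD_COLOR)
    if mode == "value":
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
        return hsv[:, :, 2]
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
```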

Notice, however, that the specific colors employed in our studies were dictated not by any inherent or functional property of the stained biological structures, but rather by their accessibility at the time of the experiments. Therefore, it would be a misinterpretation to associate specific colors with particular neuronal substructures. In fact, these colors serve only as contrasting elements to discern the stained foreground objects from the background. Consequently, the emphasis should lie primarily on learning this discrimination rather than on matching specific colors with neuronal structures. Thus, the particular colors should not be considered indicative of the type of neuronal cells or their functional attributes, but merely a practical aid to the overall visualization and interpretation.

Challenges

Some important insights for future studies can be drawn by examining the ground-truth masks at the pixel level, revealing significant characteristics that impact the training process.

The two classes, namely cells (1) and background (0), exhibit an extreme class imbalance, with background pixels being overwhelmingly predominant, typically exceeding cell pixels by over a factor of 100 (cf. Table 1, signal %). These observations highlight the necessity for specialized training strategies to address this pronounced class imbalance and effectively learn the pixel classification.
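
One simple strategy, sketched below, is to estimate the background-to-foreground pixel ratio from the ground-truth masks and use it as a positive-class weight in a pixel-wise loss; this is only one of several possible counter-measures (e.g., region-based losses such as the Dice loss used above).

```python
import numpy as np
import torch
from skimage import io

def foreground_weight(mask_paths):
    """Estimate the background/foreground pixel ratio over a set of binary masks,
    to be used e.g. as pos_weight in BCEWithLogitsLoss (illustrative strategy)."""
    fg = bg = 0
    for path in mask_paths:
        mask = io.imread(path) > 0
        fg += mask.sum()
        bg += mask.size - mask.sum()
    return bg / max(fg, 1)

# Hypothetical usage:
# criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(foreground_weight(paths)))
```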

Additional challenges are associated with the macroscopic content of the images. The Fluorescent Neuronal Cells data showcase a diverse collection of 11704 subcellular neuronal structures, varying in shape, size, and extension (cf. Table 1, area, Feret diameter and equivalent diameter columns). The distribution of these structures across the collections is uneven, with some images containing numerous cells while others are devoid of them. Consequently, the model needs to be flexible enough to handle both scenarios.

Furthermore, despite considerable efforts to stabilize the acquisition procedure, several technical challenges persist. Firstly, there is high variability in terms of color, saturation, and contrast from one image to another. For example, there are cases where the tissues absorb some of the markers (see Fig. 5b,e,g), causing irrelevant compounds to emit light which is then captured by the microscope. Consequently, the background’s hue may shift towards values similar to those of faint neuronal cells (see Fig. 5b–f). In such circumstances, relying solely on pixel intensity is insufficient to distinguish between signal and background, necessitating the consideration of additional characteristics such as saturation and contrast. However, even the analysis of these characteristics is not straightforward, as fluorescent emissions are naturally unstable, leading to fluctuations in the saturation levels exhibited by cell pixels (cf. Figure 5a–c or Fig. 5f,g).

Fig. 5
figure 5

Challenges. FNC data present several difficulties to take into account during their analysis. Common challenges are represented by overcrowding (a,d,f,g), ambiguity (a,d,f), and artifacts (b–g).

Moreover, the substructures of interest have a fluid nature, and each shot can capture different two-dimensional sections depending on how the cells are oriented within the tissue. As a consequence, the size and the shape of the stained cells can vary significantly (cf. object dimensions in Fig. 2d,f), further complicating the discrimination between cells and the background.

Another challenge arises from the occasional presence of accumulations of fluorophore in narrow areas, resulting in emissions that closely resemble those of cells. These artifacts can manifest as small areas, such as point artifacts and filaments, or larger structures, like lateral stripes (see Fig. 5b,g). Again, their presence hampers the detection task, making it necessary for the model to recognize and understand cell structure and size.

A further source of complexity is represented by overcrowding (Fig. 5a,d,f,g). When several cells are close together, possibly partially overlapping, precisely localizing cell boundaries can be challenging, thus requiring adjustments to prevent the model from merging nearby cells into single agglomerations.

Last but not least, on some occasions the recognition of cells may be ambiguous even for human operators (cf. marked and non-marked instances in Fig. 5a,d,f,g). Of course, this poses an issue of intrinsic subjectivity in the annotation process, which in turn affects both the training and assessment phases.

By and large, all of these factors make the recognition and counting tasks harder and complicate the learning process. Likewise, borderline annotations hinder model assessment as their subjectivity introduces irreducible noise in the evaluation.

Limitations

While the Fluorescent Neuronal Cells data encompass diverse images in many aspects, they exhibit reduced heterogeneity in some respects.

First, all the images were collected by a single research laboratory using fixed experimental conditions and acquisition setups. Despite being representative of standard procedures35,36, the lack of diversity in this regard may lead to suboptimal generalization when applied to data collected under different settings.

Furthermore, the images were captured using epifluorescence microscopy, resulting in noisier images and no variability in terms of acquisition technologies employed. However, we believe the reduced image quality may actually represent an interesting assessment scenario due to its more challenging nature. Indeed, it is reasonable to assume that pre-training on FNC data13 should generalize well to modern equipment like confocal microscopy, where better image definition, sharper object boundaries and improved signal-to-noise ratio should ease the recognition task.

Another limitation lies in the lack of diversity in the cell types depicted and the animal species involved. Our dataset focuses solely on subcellular components of rodent neurons. This might impact the generalization of the models to different use cases and restrict their application to other cell types or animal species.