Svetlana a supervised segmentation classifier for Napari

We present Svetlana (SuperVised sEgmenTation cLAssifier for NapAri), an open-source Napari plugin dedicated to the manual or automatic classification of segmentation results. A few recent software tools have made it possible to automatically segment complex 2D and 3D objects such as cells in biology with unrivaled performance. However, the subsequent analysis of the results is oftentimes inaccessible to non-specialists. The Svetlana plugin aims at going one step further, by allowing end-users to label the segmented objects and to pick, train and run arbitrary neural network classifiers. The resulting network can then be used for the quantitative analysis of biophysical phenoma. We showcase its performance through challenging problems in 2D and 3D and provide a comprehensive discussion on its strengths and limits.

it can automatically count and distinguish cells belonging to two different types in a developing quail embryo after just a few clicks.
Our aim was to create a plugin that fulfills the majority of requirements for image analysis platforms.It's versatile, handling 2D, 3D, grayscale or multi-spectral images, whether individual or in large collections.It is well documented with a rather simple installation procedure, see the hyperlinks in Fig. 1e.
The annotation module offers various labeling modes, enabling the annotation of thousands of 2D or 3D objects in minutes.The training module supports user-provided architectures as well as popular neural network architectures, see Fig. 1c.We have also designed minimalist neural network, ideal for simple classification tasks with limited annotations.
Additionally, the plugin includes a wide range of image augmentation techniques, see Fig. 1d.These techniques enhance the classifier stability across different image acquisition protocols and promote useful features such as rotation invariance.
Understanding how the neural network makes decisions can be crucial for some applications, such as deciphering biophysical phenomena or avoiding confounding variables.To this end, we integrated a network interpretation module named Grad-CAM 12 .Augmentation techniques can also facilitate interpretation through "augmentation" studies.By adding or removing augmentation types at training time and evaluating the classification performance, the user can see which features are essential.For instance if changing the image colors randomly during the training implies a performance drop for the classification, it means that a color is probably important to distinguish between cell populations.
Overall, Svetlana offers a complete solution to design robust, accurate and interpretable neural-networkbased classifiers.

Workflow of Svetlana
The principle of Svetlana is displayed in Fig. 1b.It relies on an external segmentation module which outputs a segmentation mask.Svetlana then takes the images to be labeled and the segmentation masks as inputs.It is separated in three different modules: Annotation This module allows the user to label some connected components of the segmentation masks.For simple classification tasks in 2D or 3D, we could label more than 1000 connected components in about 15 min.

Training
This module allows the user to pick an arbitrary PyTorch 13 neural network architecture (possibly pre-trained) and to further train it with the annotations generated by the previous module.A few seconds or minutes are usually enough to provide high quality classification results (see Table 1).

Prediction
This module uses the trained network to classify one or many segmentation masks.
The outputs of the plugin are: a set of manually annotated patches, a trained neural network and prediction masks.
The results are stored in files under widely accessible formats for the forthcoming analyses.

Requirements
We decided to integrate Svetlana in the Napari environment 14 , since it offers a rich interface to manipulate images and a direct connection to PyTorch, a dominant artificial intelligence Python package.The first installation step is to install a proper Python environment.To do so, we advise the user to first install Anaco nda, Mamba or Pip.
Installing Svetlana can then be done simply by following the instructions on the insta llati on page.It will be available in the plugin manager of Napari.Ideally, it should be used with a computer equipped with an Nvidia GPU (Graphical Processing Unit) for faster processing.

Main processing steps
In this section, we describe the main steps to design an efficient classifier in Svetlana.Annotation After organizing the images and segmentation masks in the right folde rs, the first step consists in annotating a few regions of interest (ROI).Various annotation modes are available: the ROI can be selected by clicking or picked randomly by the plugin.This works using both the 2D and 3D visualization.The label can be assigned by giving a number from 1 to 9, which is the maximum number of classes.
Depending on the complexity of the classification task, the number of ROI to label to generate a good classifier can vary from a few units to a few hundreds.It is hard to anticipate this.A possible solution is to start with few annotations and if the classifier is not accurate enough, additional annotations can be provided in a second stage.This active learning mecanism allows the user to rapidly generate a satisfactory training set.
Choosing a CNN architecture We designed minimalist convolutional neural network (CNN) architectures adpated to 2D or 3D images with arbitrary number of channels.They are defined by their width and depth, see Fig. 1c.In all our experiments, these architectures which contain few trainable weights proved to be sufficient to reach accurate classification results.We therefore recommend using these first.As illustrated in Table 2, the depth and width of the architecture yield minor differences in the final accuracy and a width of 32 and depth of 3 seems like a good compromise.
In addition, Svetlana offers a large number of popular pre-defined architectures in 2D (ResNet 15 , DenseNet 16 , AlexNet 17 of various depths).They contain millions of parameters, which can be excessive for small dataset.Pretrained networks can also be loaded and fine-tuned using Svetlana with additional data.
Optimization routine Svetlana uses Adam as an optimization algorithm 18 .It is widely accepted as one of the most versatile method for training neural networks.The main parameters are the number of iterations, the batch size, the learning rate and the momentum.They can all be set in the interface.Our experiments revealed that a step decay stabilizes the optimization process.If needed, additional options can be specified in the Json configuration file.
Patch normalization In most situations, we advise to rescale the patches in [0, 1] using the min-max scaling available in the interface.If the intensity plays an important role to distinguish cell types, other options are available.
Data augmentation A quite useful feature of Svetlana is the possibility to use data augmentation during the training step.
To avoid overloading the graphical interface, the augmentation possibilities are limited to vertical and horizontal flips and random rotations.It is however possible to use the much richer set of transformations available within the Albumentations library 10 by setting the configuration file.The transformations are specified in the form of a dictionary, as indicated in the docum entat ion.
Loss functions We recommend using the cross-entropy as a loss function even though other possibilities are offered for more advanced users.
The batch mode Svetlana can be used on single images or entire folders for the annotation, training and prediction plugins.It is therefore possible to build a large training dataset, possibly containing images from different modalities.
Active learning / human-in-the-loop The annotation interface offers the possibility to correct the predictions after training.The previous prediction mask is loaded as an overlay and the user can correct or complete his annotation, before restarting a training.

Validation of the method
To showcase the usefulness of Svetlana, we performed three experiments on real biological problems.

Quail embryo nuclei classification
The first problem is provided as a demo in the plugin.It does not have a particularly important biological relevance, but illustrates the main plugin features.We use a 624 × 158 × 232 voxels crop of a large quail embryo image.It was taken with a two-photon microscope with a resolution of 0.68 × 0.68 × 1 µm by B. Bénazéraf 9 .
The cropped part contains two structures of interest: • The neural tube: an axial tissue that will form the spinal cord.
• The somites: round structures located on either side of the neural tube that will form skeletal muscles and vertebrae.
The classification task we are interested in is to distinguish between nuclei within the neural tube and those within the mesoderm.They can be easily distinguished by visual inspection, allowing the user to rapidly annotate a few of them.
To classify the two cell types, we used the following protocol: 1. Segment the whole volume using the model Cyto2 in CellPose.
We used patches of size 45 × 45 × 45 voxels from the complete 3D image for the prediction.The classification accuracy is more than 94% on both nuclear types, see Fig. 1a.Obtaining such a result using only 169 annotations out of 26, 214 is remarkable.Indeed, we have already shown in Fig. 4g, that a 2D random forest classifier was unable to accomplish this task, based on simple radiometric and morphometric features.We performed a similar experiment in 3D, and again, the classification were nearly random.We do not reproduce it in this paper for conciseness.

Pathology dataset classification
In this section, we showcase that Svetlana's minimalist neural networks can solve complex classification problems, using a public dataset called PanNuke 19 (see Fig. 2a).The dataset comprises 481 visual fields, with 312 selected randomly from over 20,000 whole slide images sourced from various data sources, captured at different magnifications.It is composed of 205,343 nuclei categorized into 5 clinical classes (neoplastic, non-neo epithelial, inflammatory, connective, dead) from 19 different tissues.Other factors of variability, such as the diversity of patients from whom tissues are extracted, or the difficulty of obtaining reproducible tissues staining, make this type of dataset challenging to classify robustly.We demonstrate Svetlana's ability to produce accurate classification results for this type of application both for a small and a large dataset.We used a neural network of depth 3 and width 32 available in Svetlana.It is composed of 14,500 parameters.This is about 1000 times smaller than a Resnet18 15 , which is among the smallest classification models from the Torchvision library.
To quantify the classification results, we compute standard precision metrics.Let n i denote the number of elements in the i-th class.We compute metrics based on the numbers of True Positives ( TP i ), False Positives ( FP i ), False Negatives ( FN i ) and True Negatives ( TN i ) for each class.The metrics are then defined as follows: Vol.:(0123456789) www.nature.com/scientificreports/ We assess Svetlana's performance in two different scenarios.First, we trained our custom minimalist architecture on various subsets containing from 10 to 300 labels.For this experiment, we launch the training on two classes: the neoplastic and inflammatory cells.Figure 2b contains examples of neoplastic cells and Fig. 2c contain examples of inflammatory cells.Figure 2d demonstrates that with only 10 labels, we already reach a classification accuracy of about 94% .Increasing the dataset size to 300 yields high precision and accuracy for both classes ( 99.70%).
In a second step, we trained Svetlana on a much larger 5-class dataset containing 28,167 nuclei extracted from PanNuke dataset as well as for the previous experiment.In Fig. 2e, we computed the confusion matrix for the 5 classes, as well as the weighted average precision, recall, F1-score and accuracy, on a validation set of 15,218 nuclei.Again, high accuracies can be reached with the minimalist CNN architectures that we designed.
Overall this experiment shows that the Svetlana plugin and the minimalist neural network architectures offer a solid companion for tackling complex classification tasks, whether in a data-limited or data-rich context.

Osteoclasts classification
Segmentation in bright-field microscopy can be challenging due to low contrasts, out-of-focus blur and differences in staining across different samples for example.To illustrate that Svetlana overcomes these issues, we solved a cell classification problem used in medium throughput screening on osteoclasts.Osteoclasts are bone cells that go through different stages of differentiation which gives rise to a wide diversity of cell morphologies as shown in Fig. 3c.They are the subject of many studies (see Labour et al. 20 for example) as they can be the cause of various diseases when dysfunctional.Indeed, a physiological unbalance between osteoclasts and osteoblasts (osteoclasts antagonistic cells), can lead to various forms of bone disease, such as osteoporosis.That is why it is so important to control their respective populations.In this experiment, patient osteoclasts are isolated and grown in culture.They are then distributed in wells (see Fig. 3a) and subjected to various molecules.www.nature.com/scientificreports/others as they are typically large, with a dark violet color and a large number of nuclei manifested as small dots inside the cell (polynucleate).A few activated osteoclasts are shown in Fig. 3f, while osteoclasts in different states are shown in Fig. 3g.
A well can contain up to 20,000 cells.The classification task is performed by specialists at ABS and includes a manual counting of activated osteoclasts.This expert approach is accurate, but time consuming for high cell density wells and may require the intervention of several experts to multiply the counts, in order to reduce interoperator variability, and guarantee a high accuracy.Automating the classification process allows contract research organizations to save time on their projects, and analyze more data while reproducing their selection criteria.
The automatic classification task is really challenging due to a wide variety of image appearances, see Fig. 3c.These differences can be explained by the biological nature of the sample, but also by the diversity of patients, staining protocols, drug types and concentrations.To address this problem, we first improved the Cyto2 neural network available in CellPose2 21 using its human-in-the-loop feature.This provided satisfactory segmentation results.We then trained our classifier on 5400 patches extracted from diverse images ( ≈ 1 hour of manual labeling).We then turned to the prediction mode and applied the neural network classifier to the whole batch.Figure 3b shows a prediction on a whole well, while Fig. 3d and e show a crop and its automated classification respectively.The prediction on 63 images of size 8000 × 8000 pixels took about 1.5 h, i.e. 1.4 min per image.This has to be compared to the 20 to 45 min taken by a specialist to process a portion of a single image by hand.
The graph in Fig. 3h shows that our classifier reaches coefficient of determination R 2 of 96%, between the values measured manually by ABS experts and those produced by Svetlana classifier.For this application, Svetlana is a powerful ally for medium throughput screening.It makes it possible to quantify the effects of various drugs with a good accuracy and with a remarkably small need for human interactions.

Efficiency
Svetlana requires little computational power, time resources, training data and it uses minimalist CNN architectures.We discuss these features below in more detail.
Power consumption Table 1 summarizes the computational cost for three different training tasks.Computing the energy footprint is a complex task 22 and the numbers below are indicative.It is computed using the thermal design power (TDP) given by the manufacturer.On a regular GPU card, we see that the cost is proportional to the training time and not higher than working on a laptop for a few hours.
• CPU is a 1.9 GHz Intel Xeon in a desktop (TDP: 165W).

Towards small data?
Training a neural network classifier with just a few annotations (10-1000) in a few seconds goes against conventional wisdom.Indeed, complex architectures are usually trained for days or weeks with huge datasets.For instance, the data science bowl used to train the CellPose models contains 37,333 segmented nuclei in 841 2D images from more than 30 different imaging modalities.It is therefore legitimate to question whether the training phase could yield a good classifier.In practice, it turns out that this approach provides remarkable results with an accuracy sometimes higher than 90% for complex tasks.This may not be on par with the best possible results, but still sufficient for many quantitative analyses in biology.
Let us mention that a few recent works point out that training with a minimal amount of data and overparameterized networks is a rich research avenue 23 .This is exactly the setting explored in Svetlana.As far as we know, complete theoretical explanations are still lacking.One possible way to interpret this is Ockham razor's principle.The neural network architecture together with the training algorithm favor the "simplest" answer capable of explaining the observations.In our experiments, elementary networks tend to perform better than more complicated ones with many parameters.This looks quite natural, since a neural network with few parameters limits the expressiveness of the classifier and acts as a regularizer for the problem.
When high accuracy is critical, it is very likely that neural networks trained on large datasets would perform better.Unfortunately, such datasets are usually just not accessible.Each biology laboratory explores different tissues, at a different scale with a different modality and focus.Each collected image and labeling can be costly both in terms of money, know-how and time.Hence, Svetlana covers a crucial need which seems yet unmet.Lightweight models Choosing the best neural network architecture is a time consuming art, not accessible to non-experienced users.A method that would provide completely different scores depending on the architecture is therefore not acceptable.Hopefully, it turns out that all the minimalist architectures we designed provide fairly similar classification scores.To illustrate this fact, we studied the influence of the architecture in terms of the depth and width parameter for the 3D experiment on the developing quail embryo.Table 2 displays the accuracy scores for every architecture.As can be seen here, all of them provide a fairly similar accuracy.Svetlana makes it easy to test a few different architectures, and visually compare the results.Overall, this experiment reveals that working with a single architecture (e.g.depth 3, width 32) provides consistently good classification results.

CNN or random forests?
To the best of our knowledge, the software package offering the closest functionality to Svetlana is Ilastik 2 , and especially its object classification module.It is an excellent and powerful tool, which is widely adopted in biology laboratories.Despite obvious similarities, the backbones of the two applications are fundamentally different: Ilastik's classifier is currently based on random forests 24 while Svetlana's relies on neural networks.
Random forests require a pre-defined set of features, such as geometrical properties of the segmentation mask (volume, diameter, location, ...), or intensity properties inside and outside the segmented zone (mean intensity, variance, quantile, ...).These features are then assembled to construct a set of random decision trees.This process can be achieved through many different randomized techniques 25 .The final decision taken by the random forest classifier is based on a majority vote using the output of each tree, see Fig. 4a.
Deep neural networks and especially the convolutional neural networks used in this paper (see Fig. 4b) work very differently.One of their main difference is that they are able to infer automatically the features, instead of having a fixed set of pre-defined image characteristics.The features (here, convolution filters and biases) are learned automatically from the labeled data using stochastic gradient descents.
Both approaches have their pros and cons.Random forests are typically easier to interpret than neural networks since their features have a clear meaning right from the start.If the features are discriminant, they can perform extremely well with few training examples and little risk of over-fitting, see Fig. 4c and d.
These assets can also turn to disadvantages for more complex tasks.It is indeed possible that no feature enables discriminating different categories, see Fig. 4e and f.On their side, neural networks are able to learn the features automatically and therefore perform more diverse and complex tasks than random forests.Overall neural networks tend to perform better, see for instance this compa rison.However, this performance may require more training data, especially for large networks.This problem can be mitigated by choosing minimalist architectures when little training data is available or when the classification task is simple.
Finally, we illustrate a problem of confounding variable in Fig. 4g.In this experiment, we attempt to distinguish neural tube nuclei from somites nuclei in a developing quail embryo 9 .In Fig. 4g, the random forest classifier learned to distinguish the neural tube nuclei using their position in space rather than morphological or radiometric features.Hence, if the image is rotated, the classification becomes catastrophic.Similar issues may actually happen with convolutional neural networks.This is why the data augmentation mecanisms easily accessible in Svetlana are particularly important.In Fig. 4h, we added random rotations during the training phase to obtain a near rotation invariant classifier.Table 2. Classification accuracy for the quail embryo classification problem, depending on the neural network architecture.In this experiment, we varied the depth and width of the proposed custom 3D convolutional network.All the models have been trained for 600 epochs.The columns acc. 1 and acc. 2 represent the classification accuracy for the neural tube and somites nuclei respectively.They are sorted from the most to the least accurate.As can be seen, the classification rate depends only mildly on the architecture defined through its width and depth.www.nature.com/scientificreports/

Results interpretability
Aside from the classification results, another output of Svetlana is a trained neural network.Interpreting this algorithm, i.e., understanding how the decision process was made is a complicated task with many open research avenues.Answering recurrent questions such as: "What makes these populations different?Is it a difference of intensity, volume, ellipticity, or other things I may have missed?" is not directly accessible.Without further analysis, the neural network can therefore be considered as a black-box model, which is unquestionably a limitation of the approach.However, it is important to note that many alternative artificial intelligence models (including linear models or random forests) are often not easier to interpret 26 .In addition, the sole fact to know the existence of discriminating features significantly eases their determination.That is why the development of interpretation tools has been an active area of research for some years now 27 .Algorithms Grad-CAM 12 or guided Grad-CAM 28 give the user an insight into the areas of the image that are www.nature.com/scientificreports/prevalent in the classifier decision.We chose to integrate them in the prediction module of Svetlana, based on their popularity.New routines might be added in the future.
In addition to Grad-CAM, Svetlana offers the possibility to explore simple hypotheses by using data augmentation.Assessing whether the cell volume plays a role can be determined by adding dilations or not at the training time, and comparing the classification scores.The same experiment can be conducted using different types of intensity normalization, color changes,... to assess which features are discriminant.This is a simple approach to post hoc interpretation.

Conclusion
We presented the main features and explained the functioning of a new plugin called Svetlana for the classification of segmentation results within the Napari environment.We showed through a few applications that Svetlana is a handful tool for challenging cell classification tasks.Neural networks are at the core of the program.To the best of our knowledge, Svetlana is the first open-source plugin offering such a large variety of learning mecanisms and tackles problems that would not be solvable with more elementary artificial intelligence tools such as random forests.Svetlana is open-source with a modular architecture allowing to integrate further features, following the most promising technological progress and user feedback.Through this paper, we wish to advertise this tool which should prove valuable to the biomedical community and beyond.

Figure 1 .
Figure 1.A schematic overview of the Svetlana plugin.(a) A screenshot of the plugin in action.In this example, Svetlana is able to separate the mesorderm (in green) and the neural tube (in red) nuclei of a quail embryo 9 with a high accuracy after just a few clicks.(b) Overview of the Svetlana's three-step pipeline: Annotation, Training and Prediction.Given pairs of images and segmentation masks, the user labels a few connected components.This set is then used to train a neural network classifier.Once trained, it can be used to classify one or multiple segmented images.(c) Svetlana offers many neural network architectures with increasing complexity.The minimalist architectures can be trained faster and are usually enough to lead to high classification accuracies.(d) The training can be enriched with a large variety of image augmentation techniques available in Albumentations 10 .(e) Online resources are available to assist the user.(f) All experiments can be fine tuned and reproduced using a JSON configuration file.

Figure 2 .
Figure 2. A histopathology images classification problem.(a) A subset of 6 images taken from PanNuke dataset.(b) and (c) Example of 137 × 137 pixels training patches showing neoplastic and inflammatory cells respectively.The nucleus of interest has been circled in green for visualization.We can see that the two classes are difficult to distinguish visually for a non-expert.(d) Classification performance by training the model on increasingly large 2-class (neoplastic/inflammatory cells) small datasets.(e) Classification performance on a 5-class problem training with a large dataset.

Figure 3 .
Figure 3.The osteoclasts classification-(a,b) An example of an entire 8000 × 8000 pixels image and a classification result using Svetlana.The green spots correspond to activated osteoclasts.The image was provided by .Atlantic Bone Screen company (ABS) (c) 2000 × 2000 pixels crops of four different images illustrating the diversity of the dataset.(d,e) 750 × 750 pixels crop and its classification result after training a neural network with 600 annotations (out of 16,671).(f) Example of activated osteoclats.(g) Example of non-activated osteoclasts.(h)This graph shows the excellent correlation (96% coefficient of determination) between Svetlana's counting and a human counting.Various conditions were tested.For the red dots, no drug was applied.For the green diamonds, drugs inducing cell proliferation were tested.For the purple triangles, drugs inducing cell death were introduced.

Figure 4 .
Figure 4. Comparing random forests and neural networks-(a) The principle of random forest classifiers: a few (10-1000) random decision trees are constructed and a majority voting allows making a prediction.(b) A convolutional neural network classifier makes a prediction by a sequence of convolutions, nonlinear activation functions and pooling.(c-h) Various classification tasks performed by each classifier.In order to facilitate the visualization, only small portions of the images are displayed.(c,d) Synthetic cell classification with differences of average gray level.Both Ilastik and a simple neural network yields 100% accuracy with as little as 10 labels.(e-f) Anisotropic texture classification.The two textures possess the same mean and variance, but different orientations.The random forest yields unsatisfactory results (64%) whatever the number of labels.The neural network yields 100% accuracy with only 45 labels.The reason for the failure of the random forest is that no pre-defined feature allows discriminating the texture orientations.On their side, neural networks are able to learn the right features with just a few annotations.(g-h) 2D slice of a two-photon microscope image of a quail embryo 9 (courtesy of B. Bénazéraf).It shows a neural tube surrounded by somites.(g) From left to right: the slice containing 854 nuclei-the classification result using 35 labels.Thanks to the use of spatialization parameters, Ilastik provides excellent results for this task.However, if the classifier is applied to the rotated image, it yields unsatisfactory results since the spatialization changed.If the spatialization features are removed to construct the decision trees, Ilastik fails to classify the cells.(h) Svetlana result with 169 labels: it leads to a few mis-classifications on the original image.Nevertheless, it remains effective when the image is rotated thanks to the use of data augmentation during the training.This experiment overall shows the advantages of learning the classification features and to use data augmentation to add desirable properties such as rotation invariance.

Table 1 .
Computation time and energy consumption in watt-hour (Wh) or kilowatt-hour (KWh) for different hardware configurations.In addition, if general purpose classifiers appears in the future, they could be easily integrated and retrained within Svetlana.