ANMAF: an automated neuronal morphology analysis framework using convolutional neural networks

Measurement of neuronal size is challenging due to their complex histology. Current practice includes manual or pseudo-manual measurement of somatic areas, which is labor-intensive and prone to human biases and intra-/inter-observer variances. We developed a novel high-throughput neuronal morphology analysis framework (ANMAF), using convolutional neural networks (CNN) to automatically contour the somatic area of fluorescent neurons in acute brain slices. Our results demonstrate considerable agreements between human annotators and ANMAF on detection, segmentation, and the area of somatic regions in neurons expressing a genetically encoded fluorophore. However, in contrast to humans, who exhibited significant variability in repeated measurements, ANMAF produced consistent neuronal contours. ANMAF was generalizable across different imaging protocols and trainable even with a small number of humanly labeled neurons. Our framework can facilitate more rigorous and quantitative studies of neuronal morphology by enabling the segmentation of many fluorescent neurons in thick brain slices in a standardized manner.


Results
The ANMAF framework. As in other deep learning methods, Mask R-CNN demands many manually annotated data samples for training. Existing neuronal morphology data sets from prior studies are unsuitable for training Mask R-CNN. They have an insufficient number of images, and incomplete labeling of neurons as human annotators only labeled a small subset of neurons in each image. To address this issue, we implemented within ANMAF an automated algorithm to generate synthetic cell images from partially annotated two-photon microscopy images of the YFP channel of Clomeleon expressing neurons from acute brain slices. We first sampled empty regions in two-photon microscopy images to create background tiles (Fig. 1a). Next, we constructed a library of synthetic backgrounds by placing the background tiles at random locations (Fig. 1b). The edges of a tile were blended into its neighboring tiles to avoid unnatural contrasts. Meanwhile, somatic areas were cropped along their contours from the original microscopic images, wherever manual annotations were available (Fig. 1c). These cropped areas were then randomly transformed, pasted, and blended in at arbitrary locations of randomly selected synthetic backgrounds (Fig. 1d). Afterward, ANMAF was trained on these synthetically generated images. Model parameters were initialized by transferring values from a pre-trained Mask R-CNN on a generic object detection data set in computer vision, called the Common Objects in Context (COCO) 17 . With www.nature.com/scientificreports/ these generic parameters as a starting point, we employed the optimizer gradient descent with momentum 18 to search for a specific parameter configuration that minimizes the discrepancy between human-labeled contours and the automatically detected contours in the synthetic cell images. Once trained, ANMAF was applied on real cell images to obtain neuronal somatic region annotations (Fig. 1e).
Validation. After training, we tested the performance of ANMAF by comparing its detection results with those of human annotators on real cell images. We evaluated the agreement between ANMAF and human annotators in terms of yield rate (percentage of manually annotated neuronal somatic regions detected by ANMAF), conformity of contours, and the agreement in area measurements. To determine the yield rate, we searched for the most overlapped somatic region detected by ANMAF for each manually annotated somatic region using the Sørensen-Dice coefficient (DICE) 19 . A match occurred if the highest DICE was greater than 0.5. Out of 965 manually identified neurons, ANMAF detected 926 neurons (96% yield rate). Among these 926 neurons, the average conformity between human and model-generated annotations was 0.88, measured using DICE. We also compared the area measurements between the human annotators and ANMAF by applying the shoelace algorithm 20 Fig. 2b), implying that the areas measured by ANMAF were slightly smaller than those measured by human annotators. One primary source of difference originated from the human annotators' variability when the soma ends and when the dendritic/axonal processes start (Fig. 2d). Also, there was a weak positive correlation for ANMAF to underestimate the somatic area in dimmer neurons and overestimate the area in brighter neurons (Pearson r = 0.326, CI [0.268, 0.383], R 2 = 0.11, p < 0.0001; Fig. 2c,d). This tendency can be understood  www.nature.com/scientificreports/ since human annotators were allowed to manipulate image intensity and contrast proactively during contouring and to examine adjacent slices to obtain contextual information. In contrast, ANMAF had no flexible control over the image properties nor access to adjacent slices when contouring somatic regions on a single slice. In the course of the study, we also assessed the performance variation of ANMAF. We trained ten models on synthetic images generated using different random seeds. Weights were re-initialized at the beginning of the training process for each model and the training images were shuffled after each epoch. Overall, these ten models had a stable performance across all real images. The average yield rate was 96.06% ± 0.37% (SD). The conformity of manual contours and model-generated contours was on average 0.8882 ± 0.0056 (SD), and the correlation between manual and model-generated area measurements was on average 0.9098 ± 0.0041 (SD).  www.nature.com/scientificreports/ Across all real images, human annotators only labeled a subset of visually prominent somatic regions (965) due to the labor-intensive process of manual labeling (Fig. 3a). However, ANMAF detected 3808 regions of interest (ROIs). To further determine the performance of ANMAF, two human annotators (H1 and H2) had the task of relabeling as many somatic areas as possible in a set of ten randomly selected real images. Obtaining a comprehensive 'gold standard' from multiple human annotations is costly, time-consuming, and challenging. In neurons that overlap or change their contrast or fluorescent expression, it is near impossible. Despite the instructions to find all somatic regions as thoroughly as possible, the two annotators could not find them all, nor they had a perfect agreement. Even with flexible control over image intensity and contrast during contouring, the human annotators could not label some overlapping neurons, edge neurons, and dimmed neurons as the neurons were either ambiguous to the human labelers, they had an incomplete cell body boundary, or the human annotators were not confident in marking the correct edge (Fig. 3b). Due to higher labeling confidence, H1 labeled 315 somatic regions, but H2 only annotated 213 areas. There were 206 neurons in common, marked by both H1 and H2.
The two human annotators collectively contoured 322 unique somatic regions, and ANMAF segmented 353 somatic ROIs. Although ANMAF detected a few more ROIs, it did not find all possible areas (Fig. 3b,c). Out of the 322 manually labeled regions, 282 were also detected by ANMAF (Fig. 3c). We manually checked the rest of the 71 ROIs segmented by ANMAF and found that 63 were cell bodies with unclear visual boundaries (e.g., dimmed neurons, overlapping neurons, or edge neurons). Overall, 345 of the 353 model-detected ROIs were indeed cell bodies (true positives, precision 0.98; Fig. 3c). Among all areas labeled, there were 199 regions segmented by all three annotators (H1, H2, and ANMAF). We computed the consensus of human annotators for the 199 areas using the STAPLE algorithm 21 . We then compared ANMAF, H1, and H2 against the human agreement using the DICE coefficient (Fig. 3d). H2 tends to be more conservative than H1 as most of the regions contoured by H2 were contained in the areas annotated by H1. Therefore, the consensus produced by the STAPLE algorithm is very similar to H2's annotations. Compared with the consensus, ANMAF had its performance centered between the two human labelers H1 and H2 (Fig. 3d). The absolute average difference between H1 and H2 measured area was 55.54 ± 27.29 μm 2 , while the difference between ANMAF and the consensus area was 18.81 ± 48.24 μm 2 . Thus, on average ANMAF had a smaller area difference against a consensus measurement compared to the area discrepancy between human annotators.
Variability and reproducibility during repeated measurements. We next compared the variability and reproducibility of a given ANMAF model with human annotators in repeated measurements of the same set of somatic regions. We randomly selected twenty different neurons. Two human annotators with prior experience in contouring somatic areas were tasked to label repeatedly the twenty neurons each day for five consecutive days. The annotators did not have access to their previous annotations, and the experimental images were accessed in a fixed order. Similarly, ANMAF detected and segmented the same 20 cells' somatic regions in five independent trials. Our results show that the human annotators exhibited a significant amount of variability in area measurements compared to ANMAF, which showed none during repeated measurements (p < 0.0001, Friedman test, significance between both Human_1 and Human_2 compared to the model, p < 0.0001 and = 0.0004 respectively, Dunn's post-hoc test, n = 20; Fig. 4a). ANMAF also showed the highest reproducibility in terms of the geometry of contours, measured by the overlap of the five repeated annotations (see "Methods" section). ANMAF produced the same outlines each time for the same neuron, which was significantly different from both human annotators (p < 0.0001, Friedman test, significant differences between all groups, Dunn's post-hoc test; Fig. 4b,c). The lack of variance in ANMAF detection is critical for experimentalist. It shows that if given a same neuron, ANMAF will produce the same result without variance, compared to human annotators. Thus, ANMAF's consistency in repeated measurements of the same neuron allows increased rigor compared to human labelers.
Generalizability. Cell images produced by individual laboratories can differ in fluorescence intensity, background level, cell size, and resolution. Next, we tested how generalizable ANMAF was to various imaging devices, sequences, and protocols. We used two different data sets, namely, D1 and D2, containing Clomeleon (YFP channel) expressing neurons from acute brain slices acquired using two different microscopes and times (see "Methods" section). Using these data sets, two identical models, namely M1 and M2, were trained. We trained M1 on synthetic images generated solely from neuronal somatic regions in D1, whereas M2 was trained on synthetic images produced using somatic regions from D2. Model parameters, imaging preprocessing methods, and other settings were the same for both models. We then tested both models using real images in D1 and D2. M1 and M2 detected a similar number of manually annotated neurons in the real images of D1 and D2 (Table 1). While the differences in the average conformity of the contours were statistically significant in both D1 (mean difference − 0.048, CI [− 0.056, − 0.041]; p < 0.0001, Mann-Whitney test; Fig. 5a For example, neuronal regions in D1 were 237 μm 2 on average, and a DICE coefficient difference of 0.02 corresponded to just a 6 μm 2 variation in the area. Overall, M1 and M2 had a similar performance in D1 and D2. These results demonstrated that the training of ANMAF with one imaging protocol can be generalizable to an unseen data set from a different image protocol. Minimum supervision. Finally, we evaluated the minimal number of human-labeled neurons that are necessary for training ANMAF. Six identical models were trained on six different synthetic data sets that were generated by using 653 (100%), 490 (75%), 327 (50%), 164 (25%), 66 (10%), and 33 (5%) human-annotated neurons, respectively. Training parameters and preprocessing methods remained the same for the six models. All models    Fig. 5b, Table 2). However, the corresponding differences in area measurements were not discernible. During training, these six models' performance did not improve by increasing the number of training epochs (Fig. 5c). Instead, there was a tendency towards overfitting for models using less than 25% labeled neurons (Fig. 5d). These

Discussion
Here, we developed a novel high-throughput Mask R-CNN-based framework, named ANMAF, to automatically contour neuronal somatic areas of fluorescent protein-expressing neurons in acute brain slices. ANMAF detected neuronal somatic regions with a performance comparable to human annotators with high precision and consistency. Also, ANMAF trained with a data set from one imaging protocol was generalizable to an unseen data set from a different protocol and was trainable with as low as 33 (5%) human-labeled neurons.
The main contribution of this study is as follows. First, we proposed a novel and generic framework, enabling the application of CNNs to neuronal morphology analysis even under situations where there is an insufficient number of training samples or there is incomplete human annotation for every neuron in each image. Second, we conducted a quantitative study to determine ANMAF's efficacy, reliability, and generalizability. We found that ANMAF accurately detects a larger number of neurons than humans in a fraction of time, has high reliability showing a minimal variance when presented with the same neuron, and is generalizable to different imaging protocols.
Prior automated techniques to detect somatic bodies have used thin fixed tissue slices with tagged or injected fluorophores and confocal microscopy 7,22,23 or transmitted electron microscopes 12 . Other developed algorithms have been evaluated on neuronal cultures or thin brain tissue using few neurons 8,9 . However, thick brain slices (350 microns or more) commonly used in physiological studies have significant challenges, including overlaps among neurons across different depth planes, changes in fluorescence intensity throughout the thickness of a brain slice, low contrast between cell bodies, and the background, among others. While CNNs have been used to detect neurons from thick brain slices, they have required extensive training sets 24 , semi-supervised regularization 10 , or a hybrid architecture based on Morse theory and deep net architectures 11 . Our ANMAF has the advantage of being trainable with a minimal number of human-labeled neurons, using a simplified architecture, and producing reliable area measurements when using thick acute brain slices.
Our proposed framework has three small limitations to be addressed in the future. First, we observed that ANMAF had a slight tendency to underestimate darker neurons' somatic area and overestimate brighter neurons than manual measurements. Introducing an image normalization technique to the preprocessing stage could help mitigate this issue. Second, our current framework lacks the encoding of rotation and flip invariance into CNN. Adding mechanisms for handling these transformations can help produce more robust contours of somatic regions against perturbations. Third, when contouring a brain Z-section, human annotators were able to examine its adjacent slices. However, the current framework does not take the contextual information across slices into consideration. Also, the neuronal volume may be a more accurate estimate of somatic size compared to the area. Therefore, modifying the current framework to address these limitations is the next step.
In conclusion, we present a valid, reliable, and generalizable Mask R-CNN framework (ANMAF) to automatically contour fluorescent neuronal somatic areas in acute brain slices. Our framework will significantly contribute to the neuroscience field by detecting and labeling many neurons in an unbiased way leading to a more rigorous and quantitative study of neuronal morphology. The model will permit quantitative measurement of the changes in neuronal area during different physiological and pathological insults using many neurons without the cost of manual annotation, which may lead to selection bias, inter and intra-labeler variance, and limited and incomplete labeling. To facilitate future research, we implemented ANMAF to produce contours in an ImageJ compatible format and release the source code free for research purposes (https:// github. com/ steph enbaek/ ANMAF).

Methods
Animals and brain slice preparation. Postnatal CLM-1 Clomeleon mice (P7-P60; C57bl/6 background, a generous gift of Kevin J. Staley and also obtained from JaxLab B6.Cg-Tg(Thy1-Clomeleon)1Gjau/J 15 ) of both sexes were anesthetized using inhaled isoflurane and decapitated per protocol, approved by the Institutional Animal Care and Use Committee of the University of Iowa and Massachusetts General Hospital Center for Comparative Medicine. All methods and experiments were performed in accordance with the relevant guidelines and regulations including ARRIVE (https:// arriv eguid elines. org). We used 4 mice for D1 dataset and 3 mice for D2 dataset. An additional mouse was used for variability comparison between human and model. The brain was removed and placed in ice-cold artificial cerebrospinal fluid (aCSF) containing (in mM) NaCl (120), KCl (3.3), CaCl 2 (1.3), MgCl 2 (2), NaH 2 PO 4 (1.25), NaHCO 3 (25), and D-glucose (10) with pH 7.3-7.4 when bubbled with 95% O 2 and 5% CO 2 . Coronal brain slices, 350-450 μm thick, were cut using a vibratome (Leica VT1000S) while submerged in aCSF containing 2 mM kynurenic acid. The brain slices were placed in an interface holding www.nature.com/scientificreports/ chamber containing aCSF at room temperature for 30 min. Next, the temperature was slowly increased and held at 30 °C after that. Slices were stored for one hour minimum before being transferred to the recording chamber.
Imaging. We imaged neurons on two different two-photon microscopes: (A) a Fluoview 1000MPE two-photon microscope with pre-chirp optics and a fast acoustic-optical modulator mounted on an Olympus BX61WI upright microscope body with a 25× water immersion objective (NA 1.05) (dataset D1) with two photomultiplier tubes (Hamamatsu Photonics). (B) a Bruker Ultima In Vitro galvo-resonant system mounted on an Olympus BX51WIF upright microscope body with a 20× water immersion objective (NA 1.00) with two Hamamatsu Photonics GaAsP end-on photomultiplier tubes (dataset D2). A Ti: sapphire mode-locked laser (DeepSee Mai Tai for the Fluowiew or a Mai Tai HP DS for the Bruker Ultima; Spectra-Physics) generated two-photon fluorescence with 860 nm excitation. Emitted light was bandpass filtered through two emission filters: 460-500 nm for cyan fluorescence protein (CFP) and 520-560 nm for yellow fluorescence protein (YFP). For our experiments, we exclusively used the YFP channel as we are interested in somatic area measurements. Slices were perfused with aCSF, held at 32-34 °C, and aerated with 95% O 2 -5% CO 2 . Three-dimensional stacks (3D) of raster scans in the XY plane with a resolution of 512 × 512 pixels were imaged at a z-axis interval of 2 μm.

Manual detection and measurements of neuronal cell bodies.
Measurements of the neuronal area of 3D stacks were performed using ImageJ (National Institutes of Health) as described previously 4,16 . Briefly, we loaded YFP z-stack images and subtracted the corresponding background level from the entire 3D stack. Next, 3D planes were median filtered. Maximum intensity Z-projections were created every 10 μm for neuronal area determination. This projection allowed us to have, in a single plane, the maximal area projection for each neuron. A region of interest (ROI) was drawn manually around the cell bodies based on an automatic neuronal edge detection plugin in ImageJ ('Canny Edge Detector' plugin).

Automatic detection and measurements of neuronal cell bodies. Synthetic images generation
and preprocessing. We constructed a library of background tiles by randomly sampling empty (backgroundonly) regions from the original microscopy images. A total of 50 tiles were collected in this study. The sampled background tiles were then randomly sampled and placed in a grid to form a synthetic background image. We placed the tiles with small overlaps and blended them via linear pixel value interpolation for a smooth, seamless transition between tiles. A total of 500 synthetic backgrounds were generated as such (512 × 512 pixels). These artificial backgrounds were augmented by randomly changing the average brightness of the pixel values and by adding Gaussian random noise to increase diversity. Onto such synthetic backgrounds, somatic images were then digitally transplanted by copying image pixel values within somatic regions from real microscopic images to synthetic background images. Although a somatic region could be pasted at arbitrary locations of randomly selected artificial backgrounds, the algorithm avoided placing it at locations that already contained other areas.
During the transplantation, one pixel of padding was applied from the manually annotated cell boundaries so that the padded regions could be blended into the background, similarly to how tiles were blended. The periphery of the cells was smoothly blended into the background by varying the transparency (image alpha), in which the transparency was inversely proportional to the distance from the cell boundaries: pixels near the cell boundary were opaque while the ones away from the edge were entirely transparent. This was to implement a more natural-looking boundary between the transplanted cells and synthetic backgrounds. During the transplantation, the diversity in training images was achieved by transforming the cell regions being transplanted, with random rotations of 90 degrees, left-right flips, bottom-up flips, arbitrary resizing between 0.8 and 1.2 scale factors, and random cell brightness between 0.95 and 1.3.
Mask R-CNN framework. Before training the Mask R-CNN model, images were preprocessed by applying the contrast limited adaptive histogram equalization 25 algorithm to normalize the cell brightness variations across different image sub-regions. The original grayscale images were resized to 1024 × 1024 pixels and duplicated into three red-green-blue (RGB) channels to make them compatible with the Mask R-CNN model pre-trained on a generic computer vision object segmentation data set, called Common Objects in Context (COCO) data set 17 . In the Mask R-CNN implementation, we used the residual network 26 with 101 layers (ResNet-101) as a backbone, which generated a rich set of preliminary feature maps from a microscopic image. The feature maps generated by RestNet-101 contained numerical encodings of various geometric characteristics at different locations of the input cell image. These pieces of information were then processed in the second step by another block of the CNN module, called the region proposal network (RPN), in which bounding boxes around candidate soma instances were detected. The RPN made many highly sensitive yet less specific (many false positives) proposals on each region in the cell image where soma may be present. These region proposals, which varied in shape, were then warped and resampled to fixed-size square-shaped image patches in the third step. At the last stage, these resampled patches were fed into three additional branches of CNNs: the first branch of CNN determined whether the region proposal contained a soma or not to eliminate false-positives; the second branch of CNN refined the bounding boxes proposed by the RPN to fit the soma region tighter and more precise; lastly, the third branch generated a pixel-wise binary segmentation mask indicating which pixel contains an image of the soma (Fig. 6).
Training of mask R-CNN. During training, a multi-task loss was defined on each resampled patch as: www.nature.com/scientificreports/ The first task loss L class is the binary cross-entropy loss for classification on whether a region proposal contains a soma or not. The second task loss L bounding_box is the smooth-L1 loss for measuring how well a bounding box proposed by the RPN fits the actual soma region. The third task L mask is the average binary cross-entropy loss over each pixel in a pixel-wise binary segmentation mask of a proposed soma region. α , β, and γ are weights on these three tasks loss. Our mask R-CNN models were trained through backpropagation using a stochastic gradient descent optimizer with a 0.001 learning rate, 0.9 learning momentum, and 0.0001 weight decay. We used a mini-batch size of 4 with 500 training steps per epoch for 30 epochs. Three thousand synthetic images with different configurations were generated for tuning each model's weights, and a validation set of 1000 synthetic images was used for monitoring the training progress. We ensured that we used disjoint sets of somatic areas to produce synthetic data sets for training and validation. Each model was trained on a machine equipped with 56 cores, 512 GB memory, and one Tesla P100-PCIE-16 GB GPU for around 6 h.
Evaluation metrics. The DICE coefficient between two segmentation masks (created based on contours) was defined as: where |X| and |Y | are the number of pixels belonging to each mask. The DICE coefficient equals to twice the number of pixels common to both masks divided by the sum of the number of pixels belonging to each mask. A value of 1 indicates identical segmentations, while 0 means no pixels in common.
The shoelace algorithm, applied on the contour of a somatic region to compute its area, was defined as: where x i , y i , i = 1,2, . . . , n are the ordered vertices of the polygon (contour). As there were multiple measurements for every neuron from each annotator in repeated measurements, we quantified the reproducibility of a human annotator and ANMAF in any one neuron using the neuron's average overlapping ratio, which was defined as: where ( 5 i=1 X i ) is the number of pixels common to five segmentation masks (created based on five contours) of one somatic region and |X i | is the number of pixels belonging to ith segmentation mask.

Statistics.
We tested the Gaussian distribution of data with the Shapiro-Wilk and Kolmogorov-Smirnov tests. Paired t-tests were used for parametric comparisons, and the Mann-Whitney test was used for unpaired non-parametric data. For non-parametric analysis, a Friedman test was performed with Dunn's multiple comparison test for post-hoc analysis. Estimation statistics were performed and used to compute 95% confidence DICE(X, Y ) = 2|X ∩ Y| (|X| + |Y|) , x i+1 y i − x 1 y n , AvgOverlapping(X 1 , · · · X 5 ) = 1 5 Figure 6. Mask R-CNN architecture. The architecture took a cell image as an input, which was then processed by a convolutional neural network to generate feature maps. Each pixel of the feature maps represented an artificial neuron's activation covering a localized receptive field at the corresponding spatial location. The feature maps were then passed to the region proposal network (RPN), which processed candidate regions (red boxes) where soma might exist. This made the search process more efficient compared to an exhaustive search. The region proposals were then resampled to square-shaped image patches to make them compatible with the later processes. After resampling, three different artificial neural networks were applied to square patches: one for determining whether the region proposal contains a soma or not; another for refining the bounding boxes to fit the soma more tightly; and the other for generating segmentation masks of somas. www.nature.com/scientificreports/ Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.