Probing spermiogenesis: a digital strategy for mouse acrosome classification

Classification of morphological features in biological samples is usually performed by a trained eye, but the increasing amount of available digital images calls for semi-automatic classification techniques. Here we explore this possibility in the context of acrosome morphological analysis during spermiogenesis. Our method combines feature extraction from three-dimensional reconstructions of confocal images with principal component analysis and machine learning. The method could be particularly useful in cases where the amount of data does not allow for direct inspection by a trained eye.

for possible prognostic/therapeutic strategies. The morphological analysis of spermatozoa is usually performed by a trained eye, but the growing number of stored digital images makes it increasingly important to develop automatic techniques for classification and diagnosis. In this respect, there is still a pressing need for reliable automated methods for cell morphology assessment. While objective tools for sperm motility assessment exist 19, current automatic methods for sperm morphology are still inaccurate and difficult to use 20. Hence, subjective assessment of sperm cell morphology remains the standard in laboratories, but it results in large variability in the outcome. Machine learning-based intelligent systems could play a pivotal role in reaching this goal. Such a method starts from an input feature matrix, including characteristic values of designated positive and negative samples, and trains the prediction models by learning the patterns in the feature matrix. The final goal is then to classify automatically a data set with unknown labels.
In this paper, we present a machine learning approach to classify, in a quantitative and semi-automatic way, important morphometric characteristics of mammalian acrosomes during spermatogenesis. We start from a three-dimensional digital reconstruction of confocal images of acrosomes, from which we extract a discretized mesh representing the surface of each acrosome. We then compute a series of morphological parameters such as volume, surface area and local curvatures. These morphological parameters represent the features that are then analyzed through machine learning and principal component analysis. We illustrate the method by analyzing acrosomes from spermatids and spermatozoa, obtained from seminiferous tubules of young mice, which are known to have different shapes. The ground truth is established by direct classification by eye, and the results are compared with automatic methods based on machine learning.

Results and Discussion
Here we develop a new method combining computational science, quantitative biology and machine learning to classify acrosomes, distinguishing spermatids from spermatozoa in a semi-automatic way and obtaining robust quantitative morphological observables. To this end, we carry out a 3D reconstruction of the surface of acrosomes of spermatids and spermatozoa from sexually mature healthy mice maintained in vitro for a few days. Quantifying differences in the fraction of spermatids and spermatozoa could be useful to detect in advance important pathological conditions related to sterility and could have an impact on ART 17,18. In order to maximize the number of acrosomes for the analysis, we carried out the 3D reconstruction of the acrosomes in cells extracted from seminiferous tubules and imaged at different times, either immediately (time T0) or after being maintained in vitro overnight (time T1). An analysis by electron microscopy shows that the overall architecture is preserved between T0 and T1 (Fig. 1), and we did not record any statistical difference in the quantitative parameters extracted from confocal images.
The detailed procedure for the reconstruction of the acrosome surfaces is discussed in the Materials and Methods section. Figure 2 shows two typical examples of meshes obtained by 3D reconstruction of the acrosome membranes. The analysis of each acrosome yields a set of morphological characteristics (parameters): the acrosome's volume V, its surface area Σ, the sphericity Ψ, the average mean and Gaussian curvatures (M and G, respectively) and their relative fluctuations (ΔM/M and ΔG/G, respectively). Averaging these morphological parameters (〈…〉) over the subpopulations of spermatids and spermatozoa gives the values reported in Fig. 3. In addition, we report on top the p-values from a Kolmogorov-Smirnov test that compares the full cumulative distributions of spermatids and spermatozoa.
These data show that the acrosomes in spermatozoa are, on average, nearly 50% larger than those in spermatids, and similar differences are recorded for the surface areas. This is not surprising, since volume and surface area are strongly correlated, as illustrated in Fig. 4. In particular, volume and surface area follow the general law $\langle \Sigma \rangle \sim \langle V \rangle^{2/3}$, as expected from simple dimensional considerations.
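The dimensional argument behind the 2/3 exponent can be checked numerically. The sketch below is only an illustration on hypothetical data: it uses ideal spheres of arbitrary radii (not the measured acrosomes) and estimates the exponent as the slope of a straight-line fit in log-log space.

```python
import numpy as np

# Hypothetical sizes in arbitrary units; these are NOT the measured acrosomes.
radii = np.linspace(1.0, 3.0, 20)

# Volume and surface area of ideal spheres.
volume = 4.0 / 3.0 * np.pi * radii**3
surface = 4.0 * np.pi * radii**2

# A power law surface ~ volume^alpha is a straight line in log-log space,
# so alpha is the slope of a degree-1 polynomial fit.
slope, intercept = np.polyfit(np.log(volume), np.log(surface), 1)
print(f"fitted exponent: {slope:.4f}")
```

For shapes that keep their proportions while changing size, the fitted slope stays at 2/3 and only the intercept changes, which is why the scaling alone carries no shape information.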
During spermiogenesis, acrosomes from spermatids are typically more spherical than those capping the spermatozoa nuclei. The spherical shape is probably reminiscent of an early vesicle form. We recover this observation by measuring the sphericity (Eq. 1) of each acrosome in both populations. By definition, when Ψ = 1 the spherical shape is recovered, while smaller values indicate eccentricity and/or asymmetry of the surface. The mean values reported in Fig. 3 confirm that acrosomes from spermatids tend to be more spherical than those from spermatozoa (see also the reconstructed meshes in Fig. 3). This difference is statistically significant ($p = 1.06 \times 10^{-6}$).
To further characterize the morphology, we have considered surface curvatures. The Gaussian curvature, defined in Eq. 2, is positive for spheres, negative for hyperboloids and zero for planes. Hence, the sign of the Gaussian curvature indicates whether a surface is locally convex or saddle-like. We have measured the average Gaussian curvature G per cell, as defined in Eq. 5. The average value 〈G〉 clearly shows that spermatids tend to have a more convex acrosome membrane than spermatozoa (see Fig. 3, $p = 1.14 \times 10^{-2}$). The mean curvature, defined in Eq. 3, is zero for a plane, constant for a sphere and, more generally, positive for convex surfaces and negative for concave ones. Fig. 3 shows that, as in the case of the Gaussian curvature, acrosomes from spermatids appear more spherical than those from spermatozoa ($p = 4.15 \times 10^{-2}$). In addition to the average values of the Gaussian and mean curvatures, we also consider their standard deviations, which display significant differences between spermatids and spermatozoa ($p = 2.19 \times 10^{-6}$ for the Gaussian curvature and $p = 4.64 \times 10^{-3}$ for the mean curvature).
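A two-sample Kolmogorov-Smirnov comparison of the kind quoted above can be sketched with `scipy.stats.ks_2samp`. The feature values below are synthetic and purely illustrative (the means and spreads are assumptions, only the group sizes of 158 and 51 come from the text), so the resulting p-value has no relation to the reported ones.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins for one morphological feature (e.g. sphericity).
# Means and spreads are hypothetical; only the group sizes follow the text.
spermatids = rng.normal(loc=0.80, scale=0.05, size=158)
spermatozoa = rng.normal(loc=0.60, scale=0.05, size=51)

# The test compares the two empirical cumulative distributions.
result = ks_2samp(spermatids, spermatozoa)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.2e}")
```

Because the test uses the full cumulative distributions, it can detect differences in spread or shape as well as shifts in the mean.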
In summary, the quantitative morphological analysis reveals clear, statistically significant differences between spermatids and spermatozoa. These differences, however, arise at the population level and do not necessarily translate into a successful automated classification at the individual cell level. This is clear from the plots in Fig. 4, where we report the bivariate relations and distributions for five morphological features. Notice that while these features all give rise to significant differences in the average parameters (Fig. 3), there is an important overlap in the individual values for spermatids and spermatozoa.
To overcome these problems, we decided to investigate whether machine learning and principal component analysis could provide reliable information at the single cell level and, more importantly, help build a predictive semi-quantitative method. Fig. 4 shows that the data display more uniform densities in logarithmic space (lower-diagonal panels) than in the original linear space (Supplementary Fig. S1). Hence the SVM classification is performed in logarithmic space. Having more uniform densities over the feature space is desirable for SVM classification, because penalties for misclassification are weighted according to the distance to the decision boundary. Figure 5 shows the projection of the dataset onto the first two principal components, both in linear and logarithmic space. Although certain differences in the distribution of values for spermatids and spermatozoa can be appreciated, these differences are clearly insufficient to define non-overlapping clusters. In other words, the two subpopulations cannot be distinguished by eye in a PCA projection of the 7-feature dataset. This is what motivated us to use an SVM in the full 7-dimensional feature space.
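The projection onto the first two principal components can be sketched with a plain singular value decomposition. This is a minimal illustration, not the authors' code: the 209 × 7 feature matrix is synthetic, the features are assumed to be strictly positive so that the logarithm is defined, and the data are centered but not rescaled (whether the original analysis standardizes the features is not stated).

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # explained variance ratios
    scores = Xc @ Vt[:n_components].T            # coordinates in PC space
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical positive features: 209 cells, 7 morphological parameters.
X = rng.lognormal(mean=0.0, sigma=0.5, size=(209, 7))

# PCA in logarithmic space, as done for the SVM input.
scores, explained = pca_project(np.log(X))
print(scores.shape, explained)
```

A scatter plot of the two columns of `scores`, colored by class, is then enough to check by eye whether obvious clusters exist.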
Our results are summarized in Table 1. The values of the class accuracy (defined in Eq. 16) show that the SVM classification algorithm gets the correct answer in 73% of trials (74% of trials for spermatid acrosomes and 69% of trials for spermatozoa acrosomes, with an equivalent ROC AUC statistic of 0.76). Although an average classification accuracy of 73% would not suffice for an automated acrosome classification method, it is clearly beyond what a random or a constant classifier would achieve, indicating the existence of a signal that could potentially be further exploited. In addition, it is interesting to notice the consistency with which cells are correctly classified or misclassified: 71% of all cells are correctly classified in at least 85% of the algorithm runs, i.e. $r_{a=0.85} = 0.71$. If the value of a is raised to 0.99, this figure drops only to 68%, i.e. $r_{a=0.99} = 0.68$. In other words, there is a large subset of the data that is almost always correctly classified, and a smaller subset that is misclassified most of the time. This can be better seen in Fig. 6, where the cell accuracy has been used to color a scatter plot of the data. We have visually inspected the distribution of features and found that misclassified cells lie in regions of mixed spermatid/spermatozoa density, while correctly classified ones tend to lie in regions of more unequal spermatid/spermatozoa density. Therefore, it appears that there is no more obvious information left, and further exploiting the classification results to enhance the SVM would result in over-fitting.
We chose the SVM over other classifiers because of its simplicity and because it handles class imbalance well. In particular, we compared our results with those obtained with a Random Forest (RF) classifier, using either class weights or downsampling to correct for class imbalance. In the first case, we obtain 92% accuracy for spermatids, but only 27% for spermatozoa. In the second case, we achieve 69% accuracy for spermatids and 57% for spermatozoa. Therefore, the SVM gives better results than the RF, probably because of how class imbalance is handled.
In conclusion, we have proposed a general strategy to classify acrosomes from spermatids and spermatozoa according to their morphological features. The method starts from a three-dimensional reconstruction of the surface of the acrosome from confocal images and extracts a set of morphological parameters from the reconstructed surface. These parameters are then analyzed by machine learning and compared with the ground truth provided by a direct assessment by eye. The method we propose could help the analysis of spermatozoa during spermiogenesis, especially in the presence of large quantities of data, where direct classification by eye is not feasible. Future studies along these lines should aim at finding automated tools to distinguish between

Materials and Methods
Isolation of single cells from testis. Testes were isolated and decapsulated in 0.1 M Phosphate Buffer. The seminiferous tubules were gently placed onto a small cube made of 1.5% agarose and soaked in culture medium for more than 24 h to replace water. The amount of medium was adjusted in order to cover half to four fifths of the height of the agarose cubes. Tubules were maintained in an incubator at 34 °C, 5% and controlled humidity overnight in the following culture medium: RPMI (Euroclone), 10% Fetal Bovine Serum (FBS) (Euroclone), 2 mM Stable L-glutamine (Euroclone), antibiotic antimycotic solution (A5955, Sigma-Aldrich). Seminiferous tubules were picked up from the agarose cubes (Sigma) and fixed in 2% paraformaldehyde dissolved in PBS pH 7.2-7.4 for 10 min. A single fixed tubule was laid down onto a slide and covered with a coverslip, and gentle pressure was applied in order to allow cells to come out of the seminiferous tubule. Slides were then frozen in liquid nitrogen for further analyses.
Transmission electron microscopy. The seminiferous tubules were fixed in loco with 2.5% glutaraldehyde (electron microscope grade) in 0.1 M phosphate buffer (PB) pH 7.2 for 3 h at room temperature.
The tubules were then mounted between two layers of 1.5% agarose (Sigma) of about 2 mm in height, which was cut into small cubes 2 × 2 × 3 mm in size and postfixed in 2% osmium tetroxide in 0.1 M PB overnight at 4 °C. The samples were dehydrated in a graded ethanol series and embedded in epoxy resin. Semithin sections (1 μm) were stained with toluidine blue in borax and examined by light microscopy. Ultrathin sections (70 nm) were cut using a diamond knife on a Reichert Ultracut ultramicrotome, mounted on Cu/Rh grids (200 mesh), contrasted with uranyl acetate and lead citrate, and examined and photographed with a Zeiss 902 transmission electron microscope operating at 80 kV. The exposed films were developed according to common photographic techniques, captured with an Epson V700 Photo scanner with a final resolution of 600 dpi and appropriately calibrated for contrast and brightness (see Fig. 1).
Table 1. Summary of results of the SVM classification: class-averaged accuracy $A_C$ (Eq. 16); ratio of cells with classification accuracy equal to or greater than 0.85 and 0.99, $r_{0.85}$ and $r_{0.99}$; and area under the curve for the receiver operating characteristic (ROC AUC). The classification accuracy of each cell is defined as the ratio of times it is correctly classified over the different runs of the algorithm (see Eq. 17).
Figure 6. SVM analysis. Left panel: spermatid acrosomes (green dots) and spermatozoa acrosomes (red dots) plotted in the Volume-Sphericity plane. Right panel: same data, colored according to the value of the classification accuracy $A_c$ (Eq. 16) obtained with the SVM: spermatids are colored from totally white (0% accuracy) to totally green (100% accuracy), while spermatozoa are colored from totally white (0% accuracy) to totally red (100% accuracy). Notice that a perfect classifier would render both panels identical. The two small images above the colorbar are example confocal images (a red coloring filter was applied to the spermatozoa image for clarity). The small triangular markers in the colorbar mark the class-level accuracy values (see Table 1).
3D acrosome reconstruction by immunofluorescence images of sp56. The 3D reconstruction of the acrosome from confocal images of sperm cells stained with anti-sp56 was performed with ICY software tools (http://icy.bioimageanalysis.org/). Briefly, confocal stacks (at least 80-90 stacks) were first pre-processed to extract the individual cell images. Images were picked in diverse fields of the slide, in order to consider all the different stages that are present in a single portion of the tubule and not to overestimate the presence of cells in a particular stage of differentiation. A minimum of twenty cells were scored and analyzed for each slide. Two subpopulations in the seminiferous tubules were considered: round spermatids and spermatozoa. The former represent the early stage of spermatogenesis and are identified by the presence of one or two spots of condensed heterochromatin in a spheroidal nucleus. The latter show compact chromatin, an acrosome with a hooked shape and the presence of the flagellum, according to previous papers 1,7. Cells were singled out by tracing a region of interest (ROI) around every acrosome in each subpopulation. Subsequently, this ROI was cropped by using the Fast crop tool. Hence, our analysis could take advantage of single high resolution images for any acrosome under consideration. The 3D ROIs of individual acrosomes were also refined by using the HK-Means plugin (http://icy.bioimageanalysis.org/plugin/HK-Means). This method performs an N-class thresholding based on a K-Means classification of the image histogram. The acrosome membrane reconstruction was obtained by the segmentation technique implemented in the 3D Active Contour plugin (http://icy.bioimageanalysis.org/plugin/Active_Contours) 21. The algorithm at the basis of this plugin performs three-dimensional segmentation and tracking, using a triangular mesh optimized over the original signal as a target. In Fig. 2 and in the movie M1 (see the Supplementary Information) the 3D reconstructions of a typical spermatid acrosome and a typical spermatozoon acrosome are displayed. The three-dimensional renderings of the meshes (in Fig. 2 and in Supplementary Video S1 and Supplementary Video S2) were produced with Paraview (http://www.paraview.org/).
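The idea of an N-class thresholding driven by K-Means can be illustrated with a one-dimensional clustering of pixel intensities. This is a simplified sketch and not the HK-Means plugin itself: the function name `kmeans_threshold_1d` is hypothetical, the clustering runs on raw intensities rather than on a binned histogram, and the test intensities are synthetic.

```python
import numpy as np

def kmeans_threshold_1d(values, n_classes=2, n_iter=50):
    """Cluster intensities into n_classes with 1D k-means and return
    per-value class labels plus the thresholds between adjacent classes."""
    values = np.asarray(values, dtype=float).ravel()
    # Initialise the centres at evenly spaced quantiles (ascending order).
    centers = np.quantile(values, (np.arange(n_classes) + 0.5) / n_classes)
    for _ in range(n_iter):
        # Assign every value to its nearest centre.
        nearest = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Recompute the centres; keep the old one if a class is empty.
        new_centers = np.array([
            values[nearest == k].mean() if np.any(nearest == k) else centers[k]
            for k in range(n_classes)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    centers = np.sort(centers)
    # In 1D, class boundaries are the midpoints between adjacent centres.
    thresholds = 0.5 * (centers[:-1] + centers[1:])
    labels = np.digitize(values, thresholds)
    return labels, thresholds

rng = np.random.default_rng(0)
# Synthetic intensities: dim background plus a bright stained structure.
intensities = np.concatenate([rng.normal(10, 2, 5000), rng.normal(100, 5, 500)])
labels, thresholds = kmeans_threshold_1d(intensities, n_classes=2)
print(thresholds)
```

With more than two classes, the same function returns one threshold per pair of adjacent classes, which is what allows a stained structure to be separated from several background levels.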
Single cell data analysis. Once each three-dimensional acrosome mesh was reconstructed, we measured its volume (V) and surface area (Σ) using Meshlab tools (http://meshlab.sourceforge.net/). The acrosome sphericity is calculated according to the definition
$$\Psi = \frac{\pi^{1/3}\,(6V)^{2/3}}{\Sigma}. \qquad (1)$$
The local Gaussian and mean curvatures were calculated by a custom Python code, which makes extensive use of the VTK libraries (http://www.vtk.org/). Typical images of a spermatid and a spermatozoon acrosome mesh, with superimposed local curvature (blue-to-red) color maps, are reported in Fig. 2(c,d) and Fig. 2(g,h) for the Gaussian and mean curvature, respectively. We label every node on a single mesh by i (with 1 ≤ i ≤ N); the local mean and Gaussian curvature fields on each node are therefore denoted as $M_i$ and $G_i$, respectively. The local curvature of a surface entails the notion of principal curvatures, $k_1$ and $k_2$, defined as the smallest and largest one-dimensional curvatures at a point. The Gaussian curvature is defined as
$$G_i = k_1 k_2, \qquad (2)$$
where the index i runs over the nodes of an acrosome mesh (see Fig. 2(c,g)). The mean curvature instead is defined as the average of the principal curvatures:
$$M_i = \frac{k_1 + k_2}{2}. \qquad (3)$$
[…] the second one accounts for as much of the remaining variability as possible, and so on. In this way, by projecting the data onto the first few principal components, we preserve most of the variability of the data while keeping the number of features low. In this manuscript we use PCA as a visualization technique, to rule out the existence of "obvious" clusters in the dataset. By projecting the data onto the first two principal components, we obtain the 2-dimensional scatter plot that best represents the original data in terms of explained variability.
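The geometric features described in this section can be computed directly from a closed triangular mesh. The sketch below is an illustration under stated assumptions rather than the authors' pipeline (which relies on Meshlab and VTK): the helper names are hypothetical, the mesh is assumed closed with consistently oriented faces, sphericity uses the standard definition π^(1/3)(6V)^(2/3)/Σ, and the test mesh is a regular octahedron instead of an acrosome.

```python
import numpy as np

def mesh_volume_area(vertices, faces):
    """Volume and surface area of a closed, consistently oriented triangle mesh."""
    v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
    # Triangle area is half the norm of the edge cross product.
    area = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1).sum()
    # Sum of signed tetrahedron volumes with apex at the origin.
    volume = abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum()) / 6.0
    return volume, area

def sphericity(volume, area):
    """Equals 1 for a sphere and is smaller for any other closed shape."""
    return np.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area

def gaussian_and_mean_curvature(k1, k2):
    """Gaussian (product) and mean (average) curvature from principal curvatures."""
    return k1 * k2, 0.5 * (k1 + k2)

# Test mesh: regular octahedron with vertices on the coordinate axes.
vertices = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                     [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
faces = []
for ix in (0, 1):
    for iy in (2, 3):
        for iz in (4, 5):
            tri = [ix, iy, iz]
            # Flip triangles whose normal would point inward.
            if np.linalg.det(vertices[tri]) < 0:
                tri = [ix, iz, iy]
            faces.append(tri)
faces = np.array(faces)

volume, area = mesh_volume_area(vertices, faces)
psi = sphericity(volume, area)
G, M = gaussian_and_mean_curvature(0.5, 0.5)  # sphere of radius 2
print(volume, area, psi, G, M)
```

For a sphere of radius R both principal curvatures equal 1/R, so the Gaussian curvature is 1/R² and the mean curvature is 1/R, which gives a quick sanity check for any curvature code.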

Support Vector Machine (SVM).
We first give a brief mathematical introduction to the algorithm behind SVM, and then discuss its implementation for our problem. SVMs are a set of widely used machine learning algorithms, popular for their simplicity and for the fact that they yield good results in many cases. Here we use the simplest version, an SVM with a linear kernel. In essence, the algorithm boils down to finding the hyperplane h, parametrized by $(\vec{w}, b)$, that best separates the data $\vec{x}_i$ into the known classes $y_i \in \{-1, 1\}$. In mathematical terms, the problem is cast into an optimization problem with constraints, which is easily solved via Lagrange multipliers. In particular, one needs to find $\vec{w}$, $b$ and $\xi_i$ that minimize
$$\frac{1}{2}\,\|\vec{w}\|^2 + C \sum_i \xi_i$$
under the constraints that
$$y_i\,(\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i,$$
where $\xi_i \ge 0$ are auxiliary variables that allow for misclassification (a penalty proportional to the distance to the decision boundary is set for misclassified points), and C sets a global weight for the misclassification penalty. We refer the reader interested in mathematical details to ref. 22. The hyperplane h is determined using only a subset of the data, called the training set, and then the labels of the rest of the data, called the test set, are predicted as follows:
$$\hat{y} = \mathrm{sign}(\vec{w} \cdot \vec{x}_{ts} + b),$$
where $\hat{y}$ is the predicted label of a point $\vec{x}_{ts}$ in the test set. There are many more involved strategies to split the data into different sets for training and prediction. The interested reader will find good introductory material in ref. 22 and references therein.
Our data is given by the seven morphological features of the acrosomes and the acrosome subpopulation to which each cell belongs (Spermatids/Spermatozoa). That is, each cell is represented by a pair $(\vec{x}_i, y_i)$, with $\vec{x}_i \in \mathbb{R}^7$ a vector containing its morphological information and $y_i \in \{-1, 1\}$ a subpopulation class label, where −1 encodes "Spermatid" and 1 encodes "Spermatozoa".
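To make the roles of the hyperplane parameters, the slack variables and the weight C concrete, the following sketch evaluates the textbook soft-margin objective (half the squared norm of w plus C times the sum of the slacks) and the sign-based prediction rule for a given hyperplane. It does not train anything; the toy points, the hyperplane and the function names are hypothetical.

```python
import numpy as np

def predict(w, b, X):
    """Label each row of X by the side of the hyperplane it falls on."""
    return np.where(X @ w + b >= 0, 1, -1)

def soft_margin_objective(w, b, X, y, C=1.0):
    """Textbook linear soft-margin objective and per-point slack values."""
    # Slack is zero for points beyond the margin on the correct side and
    # grows linearly with the distance into the wrong side.
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * slack.sum(), slack

# Toy two-feature example with one misclassified point (the third one).
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0

objective, slack = soft_margin_objective(w, b, X, y, C=1.0)
print(predict(w, b, X), slack, objective)
```

Raising C makes the slack term dominate, so the optimizer tolerates fewer misclassified training points at the price of a narrower margin.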
We use the Python implementation of Support Vector Machines provided by the machine-learning library scikit-learn (https://scikit-learn.org/stable). In particular, we use the function "sklearn.svm.SVC()". Given the difference in sample size of the two groups (158 spermatids and 51 spermatozoa, see the Materials and Methods), it is important to set the keyword "class_weight" to "balanced", which effectively sets statistical weights in the computation of the error term that are inversely proportional to the observed class frequencies. We use 10-fold cross validation, which means that, for each run of the algorithm, the data is randomly split into ten groups: nine are used to train the SVM, i.e. to determine the parameters $(\vec{w}, b)$ of the hyperplane, and one is used for prediction. This is repeated ten times, once for each group, so that in the end each datapoint has received one predicted label. Given the stochasticity in splitting the data, we average results over $N_r = 1000$ runs of the algorithm. Increasing $N_r$ does not improve the results.
In summary, for each run of the algorithm, the output is a predicted label "Spermatid" or "Spermatozoa" for each of the 209 acrosomes, which we then compare with the ground truth. If the predicted label corresponds to the true nature of the acrosome, we assign a binary value 1; otherwise we assign 0. Thus, we obtain a binary matrix $B_{ij}$ of size 209 × 1000, where each row represents a cell and each column a run of the algorithm.
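The repeated cross-validation procedure can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' script: the features are synthetic Gaussian clouds, `C=1.0` and the use of `KFold` with shuffling are assumptions, and only 5 runs are used instead of 1000 to keep the example fast. The group sizes, the linear kernel, the 10 folds and `class_weight="balanced"` follow the text.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_spermatids, n_spermatozoa = 158, 51

# Synthetic 7-feature data; the real features come from the meshes.
X = np.vstack([rng.normal(0.0, 1.0, (n_spermatids, 7)),
               rng.normal(1.0, 1.0, (n_spermatozoa, 7))])
y = np.concatenate([-np.ones(n_spermatids, dtype=int),
                    np.ones(n_spermatozoa, dtype=int)])

n_runs = 5  # the text uses 1000 runs; reduced here for speed
B = np.zeros((len(y), n_runs), dtype=int)
for run in range(n_runs):
    # A different random split into ten groups for every run.
    cv = KFold(n_splits=10, shuffle=True, random_state=run)
    clf = SVC(kernel="linear", C=1.0, class_weight="balanced")
    # Each cell is predicted exactly once, by a model that never saw it.
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    B[:, run] = (y_pred == y).astype(int)

print(B.shape, B.mean())
```

Each column of `B` holds one run, so averaging along rows gives a per-cell measure of how consistently that cell is classified correctly.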
We define the cell accuracy $a_i$ as the fraction of runs in which a specific cell i was correctly classified,
$$a_i = \frac{1}{N_r} \sum_{j=1}^{N_r} B_{ij}. \qquad (12)$$
We then define the average class accuracy $A_C$ as the average of $a_i$ over all cells i of a given class C, where C can be either spermatids or spermatozoa,
$$A_C = \frac{1}{N_C} \sum_{i \in C} a_i,$$
with $N_C$ the number of cells in class C. Finally, $r_a$ denotes the fraction of cells whose accuracy $a_i$ is equal to or greater than a threshold a. For instance, if one takes a value of a = 0.99, then $r_{a=0.99}$ would indicate the (relative) number of cells that would be correctly classified with a probability equal to or higher than 99%.
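The three summary quantities (cell accuracy, class accuracy and the ratio of cells above an accuracy threshold) follow directly from the binary matrix of correct/incorrect outcomes. The sketch below uses a tiny hand-made matrix with 4 cells and 4 runs; the function names and the data are hypothetical.

```python
import numpy as np

def cell_accuracy(B):
    """Fraction of runs in which each cell (row) was correctly classified."""
    return B.mean(axis=1)

def class_accuracy(a, y, label):
    """Average cell accuracy over the cells belonging to one class."""
    return a[y == label].mean()

def ratio_above(a, threshold):
    """Fraction of cells with accuracy equal to or greater than threshold."""
    return np.mean(a >= threshold)

# Toy outcome matrix: rows are cells, columns are runs (1 = correct).
B = np.array([[1, 1, 1, 1],
              [1, 1, 0, 1],
              [0, 0, 0, 1],
              [1, 1, 1, 1]])
y = np.array([-1, -1, 1, 1])  # -1 = spermatid, 1 = spermatozoon

a = cell_accuracy(B)
A_spermatids = class_accuracy(a, y, -1)
A_spermatozoa = class_accuracy(a, y, 1)
r_085 = ratio_above(a, 0.85)
print(a, A_spermatids, A_spermatozoa, r_085)
```

Comparing the ratio at two thresholds, such as 0.85 and 0.99, shows how sharply the cells split into an almost always correct group and a mostly misclassified one.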
All the custom codes are available at https://github.com/ComplexityBiosystems/.