Introduction

The horsehair crab [Erimacrus isenbeckii (Brandt, 1848)] is a high-quality marine product in Japan. In Hokkaido, Japan, the capture of female horsehair crabs is prohibited, and capturing them for academic research purposes requires permission. Rapid gender identification and selective capture are therefore important, given the limited time and workspace available on a fishing boat. Fishermen in Hokkaido with many years of experience in the fisheries industry can distinguish crab genders by visual inspection. Some academic research on the gender identification of crabs has been reported: Toyota et al. investigated morphometric gender identification of the horsehair crab1, and gender identification based on 3D measurement of the shell geometry of the horsehair crab has also been investigated2.

Computer vision can automatically detect characteristics of medical images3,4 with high precision and at high speed, and it is becoming an important method in medical fields. Studies of gender identification in medicine have used, for example, fundus images5 and bone images6,7. In biology, mosquito classification has been studied8,9,10,11. However, the literature contains few studies12,13,14 on the recognition and classification of crabs using machine learning and deep learning methods. Among the limited examples, Wu et al.15 used abdomen images of swimming crabs and mud crabs to identify individual crabs via deep neural networks.

Zhang et al. proposed a shell detection–recognition method that combines principal component analysis (PCA) with YOLOv5 (You Only Look Once v5) to recognize individual Chinese mitten crabs16. Cui et al.17 developed a gender classification method for the Chinese mitten crab using a deep convolutional neural network; their algorithm incorporated batch normalization and dropout techniques, and the proposed method achieved 98.90% classification accuracy. Notably, gender identification of the horsehair crab using machine learning and deep learning has not been reported.

In the present study, we investigated gender identification, and its prediction precision, from shell and abdomen images of horsehair crabs using deep learning. We employed established, conventional network architectures. In addition, visualization of the class-discriminative localization maps of the present deep learning models explains which parts of the horsehair crab images the models focus on during gender identification.

Methods

Horsehair crab dataset

A total of 120 crabs, 60 males and 60 females, were collected in Funka Bay, Pacific Ocean, Southern Hokkaido, Japan, in May 2023. Images of the shell and abdomen geometry of the horsehair crabs were used in the present study. Fishing permission for the horsehair crabs used in this study was granted by the Hokkaido Governor. The images were taken with a camera (ILCE-6600, SONY) and a macro lens (SEL30M35, SONY). Each original image had a resolution of 6000 × 4000 pixels with 24-bit RGB channels. The original images were cropped to 3400 × 3400 pixels to remove extra edges and then downscaled to 224 × 224 pixels to match the input size of the deep convolutional neural network (DCNN) models described below. Through this manual image acquisition process, we collected ~ 120 images each of the shell and abdomen geometry (one image per crab per view), totaling ~ 240 images.
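For concreteness, the cropping and downscaling steps can be written as the following minimal Python sketch. The use of Pillow, the center placement of the 3400 × 3400 crop, and the raw/processed directory layout are our assumptions rather than details specified above:

```python
from pathlib import Path
from PIL import Image

CROP = 3400    # crop size used to remove extra edges of the images
TARGET = 224   # input size of the DCNN models

def preprocess(src: Path, dst: Path) -> None:
    """Crop a 6000x4000 photograph to 3400x3400, then resize to 224x224.

    The crop is centered here; the original procedure does not specify
    the crop offsets, so this placement is an assumption.
    """
    img = Image.open(src).convert("RGB")
    w, h = img.size                      # 6000, 4000 for the original photographs
    left = (w - CROP) // 2
    top = (h - CROP) // 2
    img = img.crop((left, top, left + CROP, top + CROP))
    img = img.resize((TARGET, TARGET), Image.BILINEAR)
    dst.parent.mkdir(parents=True, exist_ok=True)
    img.save(dst)

# Hypothetical directory layout for illustration only.
for path in Path("raw/shell").glob("*.jpg"):
    preprocess(path, Path("processed/shell") / path.name)
```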

DCNN models

In the present study, we used the following established DCNN models: AlexNet18, VGG-1619, and ResNet-5020. These architectures are illustrated in Fig. 1. Each consists of a feature extraction part and a classification part. The feature extraction part is formed by a series of convolution layers. The input RGB images are 224 × 224 pixels; through the successive convolution layers, the spatial dimensions of the feature maps are reduced from 224 × 224 to 13 × 13 (AlexNet), 14 × 14 (VGG-16), and 7 × 7 (ResNet-50). At the end of the feature extraction part, the feature maps are flattened into a one-dimensional array for classification through downstream fully connected (FC) layers. Because the present task is the binary classification of female and male, the classification part was modified from its original structure. The outputs of the final FC layer for a target class c, denoted by \(y^{c}\), were fed into the softmax function to obtain the classification probability of each target class.

Figure 1

Deep convolutional neural network architectures used in the present study: (a) AlexNet, (b) VGG-16, and (c) ResNet-50.
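To make the modified classification part concrete, the following minimal Keras sketch (consistent with the Keras/TensorFlow versions listed below) builds the VGG-16 case. Only the two-class softmax output is prescribed by the text; the FC layer widths are illustrative assumptions taken from the standard VGG-16 head:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Feature extraction part: VGG-16 convolution blocks pre-trained on ImageNet.
base = keras.applications.VGG16(include_top=False, weights="imagenet",
                                input_shape=(224, 224, 3))

# Classification part, modified for the binary female/male task.
# The FC widths (4096) are assumptions; the final layer outputs y_c for the
# two classes, fed into a softmax to obtain classification probabilities.
x = layers.Flatten()(base.output)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(2, activation="softmax")(x)  # P(female), P(male)

model = keras.Model(base.input, outputs)
```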

Training the DCNNs

For the present dataset of crab images, we employed fine-tuning to train the aforementioned DCNN models, except for AlexNet. Because pre-trained weights of AlexNet were not available, its weights were initialized from a uniform random distribution and trained from scratch. For VGG-16 and ResNet-50, model weights pre-trained on ImageNet were used as the initial parameters. For VGG-16, the pre-trained weights were partially retrained on the present dataset: only the last block, consisting of three consecutive convolution layers and a single max-pooling layer before the FC layers, was retrained together with the FC layers. For ResNet-50, all the pre-trained weights were retrained. Stochastic gradient descent (SGD) was used as the optimizer, with the learning rate set to 0.01, and categorical cross-entropy was employed as the loss function. K-fold cross-validation was performed (K = 5 in the present study). The whole dataset was randomly partitioned 80%/20%: 80% was used for training, of which 10% was employed for validation during training to monitor the training progress (Fig. 2), and the remaining 20% was reserved for testing the classification performance. Each partition contained equal quantities of female and male data. The training epochs were set to 2000, 400, and 1000 for AlexNet, VGG-16, and ResNet-50, respectively, because the epoch needed to reach optimal validation accuracy differed among the architectures. For model training and testing, we used the Keras framework in a Python 3.9.16 environment on an NVIDIA GeForce RTX 4080 16 GB GPU platform, with Keras 2.9.0, TensorFlow 2.9.1, and NumPy 1.22.3.

Figure 2

Validation accuracy of the models during training: (a) AlexNet, (b) VGG-16, and (c) ResNet-50. All models reach their optimal validation accuracy; VGG-16 and ResNet-50 are fine-tuned from ImageNet pre-trained weights, whereas AlexNet is trained from scratch.
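The fine-tuning setup for VGG-16 can be sketched as follows, continuing from the `base` and `model` constructed after Fig. 1. Here `images` and `labels` are assumed to be preloaded NumPy arrays (labels one-hot encoded), and the batch size is an assumption not specified above:

```python
from tensorflow import keras
from sklearn.model_selection import StratifiedKFold

# Freeze all pre-trained layers except the last block (block5: three
# convolution layers and one max-pooling layer before the FC layers).
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")

model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Stratified 5-fold split keeps equal female/male quantities per partition.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kfold.split(images, labels.argmax(axis=1)):
    # NOTE: in practice, rebuild/re-initialize the model at each fold so
    # that folds do not share trained weights.
    model.fit(images[train_idx], labels[train_idx],
              validation_split=0.1,   # 10% of training data for monitoring
              epochs=400,             # VGG-16 case (2000 for AlexNet, 1000 for ResNet-50)
              batch_size=16)          # batch size is an assumption
    model.evaluate(images[test_idx], labels[test_idx])
```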

Results and discussion

Classification performance

In the present study, we evaluated both the F-1 measure and the accuracy, which are defined as

$${\text{Accuracy}} = \frac{TP + TN}{TP + TN + FP + FN}$$
$${\text{F-1 measure}} = \frac{2TP}{2TP + FP + FN}$$

where TP, TN, FP, and FN represent true-positive, true-negative, false-positive, and false-negative results in the binary classification, respectively. Table 1 summarizes the classification performance of the present DCNN models; the F-1 measure is averaged over the five folds and reported with the corresponding standard deviation. Notably, all the DCNN models achieved high F-1 measures with relatively low standard deviations, indicating sufficiently accurate and stable classification of crab gender. Although AlexNet has a relatively shallow network architecture, it demonstrated high classification capability, with F-1 measures greater than approximately 90% for both the shell and abdomen cases. VGG-16 and ResNet-50, which have deeper architectures, achieved slightly higher classification performance. We tentatively attribute this better performance to the better feature extraction of the deeper architectures and to the fine-tuning reaching more nearly optimal weights.
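As a worked illustration of the two measures, consider the following short example; the counts are invented for illustration and are not values from Table 1:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in confusion counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Illustrative counts only (not results from Table 1):
tp, tn, fp, fn = 11, 12, 1, 0
print(accuracy(tp, tn, fp, fn))   # 0.9583...
print(f1_measure(tp, fp, fn))     # 0.9565...
```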

Table 1 Summary of the classification performance investigated in the present work.

VGG-16 achieved the high precision discussed in the preceding paragraph. To verify this result, an explainable visualization of the gender identification is presented in the following subsection.

Visualization of feature activation

To further comprehend the gender identification by the DCNN models, we employed Grad-CAM21 to visually explain the DCNN classification results. The class-discriminative localization map \(L^{c}\) of width u and height v for class c was computed using the following equation:

$$L^{c} = {\text{ReLU}}\left( \sum_{k}^{n} \alpha_{k}^{c} A^{k} \right)$$

where \(\alpha_{k}^{c}\) denotes the neuron importance weights and \(A^{k}\) denotes the feature maps of a convolution layer. The neuron importance weights were evaluated by global average pooling of the gradients as follows:

$$\alpha_{k}^{c} = \frac{1}{uv} \sum_{i}^{u} \sum_{j}^{v} \frac{\partial y^{c}}{\partial A_{ij}^{k}}$$

where \(\frac{\partial y^{c}}{\partial A_{ij}^{k}}\) denotes the gradients backpropagated to the convolution layer. Figure 3 shows the Grad-CAM scheme. The class-discriminative localization maps are originally coarse; they were resized to the size of the input images by bilinear interpolation and then overlaid on the input image. Figures 4 and 5 show typical visual explanations obtained by the Grad-CAM scheme for the VGG-16 network; the class-discriminative localization maps were generated from the last convolution layer of the VGG-16 architecture.
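A minimal sketch of this Grad-CAM computation for the VGG-16 case is given below; it assumes the `model` constructed earlier, takes the last convolution layer name from the standard Keras VGG-16 (`block5_conv3`), and, for brevity, differentiates the softmax probability rather than the pre-softmax score \(y^{c}\) used in the equations:

```python
import tensorflow as tf
from tensorflow import keras

def grad_cam(model, image, class_idx, conv_layer="block5_conv3"):
    """Class-discriminative localization map L^c for one 224x224 RGB image.

    `image` is a float32 array of shape (224, 224, 3), preprocessed as in
    training. NOTE: the paper's y^c is the pre-softmax FC output; here the
    softmax probability is differentiated instead, for brevity.
    """
    grad_model = keras.Model(model.input,
                             [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        feature_maps, preds = grad_model(image[None, ...])  # A^k and class probabilities
        y_c = preds[:, class_idx]                           # score for target class c
    grads = tape.gradient(y_c, feature_maps)                # dy^c / dA^k_ij
    alpha = tf.reduce_mean(grads, axis=(1, 2))              # global average pooling over (i, j)
    l_c = tf.nn.relu(tf.einsum("bk,bijk->bij", alpha, feature_maps))[0]
    l_c /= (tf.reduce_max(l_c) + 1e-8)                      # normalize for display
    # Resize the coarse 14x14 map to the 224x224 input size by bilinear interpolation.
    return tf.image.resize(l_c[..., None], (224, 224)).numpy().squeeze()
```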

Figure 3

Schematic of the gender identification and visual explanation in the DCNN model. The class of a given horsehair crab image is predicted in two steps: (1) extraction of hierarchical features and (2) classification of these features. In the feature extraction step, feature maps are generated by filters at each convolution layer; these feature maps are used for the visual explanation by Grad-CAM.

Figure 4

Two major cases of classification on the shell side. Samples are shown for (a) a male and (b) a female, with their discriminative regions shown as heatmaps.

Figure 5

Two major cases of classification on the abdomen side. Samples are shown for (a) a male and (b) a female, with their discriminative regions shown as heatmaps.

Figure 4 shows the analysis of the shell-side images. The heatmap is enhanced in the lower area for males and in the upper area for females, and the peaks of these activation contours are completely separated at the carapace width (CW). Because the upper half of the shell is highlighted for females and the lower half for males, gender identification from the shell side necessarily requires images of the whole shell, consistent with the morphometric gender identification of the shell by Toyota et al.1.

The abdomen geometry of a horsehair crab is shown in Fig. 5. Although Wu et al.15 identified individual crabs using a part-based deep learning network operating on texture features of the abdomen, our approach achieved gender identification without such a partition strategy: the form of the sexual organs is geometrically distinct, so no partitioning is required in the present classification scheme.

Conclusions

We demonstrated the effectiveness of deep neural networks for image recognition and revealed gender differences in the shell and abdomen geometry of the horsehair crab. We created a dataset containing ~ 120 images each of the shell and abdomen geometry of the horsehair crab. From images of the abdomen geometry, the models could distinguish between males and females based on the form of the sex organs, similar to gender identification by human visual inspection. The analysis of the shell-side images yielded a more interesting result: even though gender classification by human visual inspection was impossible, the present deep learning models classified males and females with high precision, with the F-1 measure reaching approximately 95%. The discriminative region in the visual explanation was concentrated on the upper side of the shell for females and on the lower side of the shell for males.