Implementation of a deep learning model for automated classification of Aedes aegypti (Linnaeus) and Aedes albopictus (Skuse) in real time

Classification of Aedes aegypti (Linnaeus) and Aedes albopictus (Skuse) by humans remains challenging. We propose a highly accessible method to develop a deep learning (DL) model and implement the model for mosquito image classification using hardware that can regulate the development process. In particular, we constructed a dataset of 4,120 images of Aedes mosquitoes that were older than 12 days, in which common morphological features had disappeared, and we illustrate how to set up supervised deep convolutional neural networks (DCNNs) with hyperparameter adjustment. The model was then deployed externally in real time on three different generations of mosquitoes, and its accuracy was compared with human expert performance. Our results showed that both the learning rate and the number of epochs significantly affected the accuracy, and the best-performing hyperparameters achieved an accuracy of more than 98% at classifying mosquitoes, which showed no significant difference from human-level performance. We demonstrated the feasibility of the method to construct a DCNN model that can be deployed externally on mosquitoes in real time.


handcrafted feature-based method and achieved a maximum accuracy of 87.41%. Mulchandani et al. 7 used wingbeat spectrograms as input and compared different pretrained computational models for the identification of Ae. aegypti, Ae. albopictus, An. arabiensis, An. gambiae, Cu. pipiens L. and Cu. quinquefasciatus, reaching a maximum accuracy of 86%. However, the data acquisition steps of these previous works involved considerable manual feature extraction and large quantities of data/images, making the approaches laborious and time consuming and preventing real-time operation. Deep convolutional neural networks (DCNNs), a deep learning (DL) approach, are state-of-the-art methods for object recognition and classification, including for agricultural pests and mosquito larvae 8,9 . With feature extraction performed in the neural network layers, DL has a high potential to make model development easier and more accurate 10 . Another major bottleneck in the classification of mosquitoes is the condition of the samples. In reality, especially for mosquitoes collected from the field, the markings on the dorsal thorax commonly show some level of disappearance 11 , which increases the difficulty of distinguishing closely related species, such as Ae. aegypti and Ae. albopictus, that share significant morphological similarities. This issue was not the focus of previous studies approaching the problem from a computer science perspective, in which datasets were mostly constructed from internet data (via data mining) 6,12 or from images of mosquito specimens in good condition 10,13,14 , which may make such models impractical when deployed on actual samples. Motta et al. 15 approached this issue by constructing a dataset of field-collected Ae. aegypti, Ae. albopictus and Cu. quinquefasciatus mosquitoes but achieved an accuracy of only 83.9% for Ae. aegypti and Ae. albopictus. Park et al. 16 applied a more systematic approach by training a model on mosquitoes with different deformations and obtained a higher accuracy of 97% in the classification of three genera of mosquitoes.
In this study, we built a piece of hardware-the Aedes Detector-that can regulate the process of image acquisition, training, validation and testing for a DL model. Our proposed model utilized a web-based tool that applies transfer learning with a DCNN, and the combination of the web tool and transfer learning makes it possible to execute the model externally in real time. The objectives of this study are as follows.
Convenient approach to developing a model. Image-based classification usually requires considerable domain expertise to design the feature extractor for the images, and common learning algorithms require large datasets and excessive amounts of central processing unit (CPU) power [17][18][19] . Therefore, we demonstrated the capability of a web-based tool from Google Creative Lab-Teachable Machine 2.0-that conducts image acquisition and trains a computational model with no coding required.

Datasets of Ae. aegypti and Ae. albopictus.
Although the issue of imperfect mosquito samples was addressed in some previous works 10,13,15 , those datasets contained relatively few images of Ae. aegypti and Ae. albopictus older than 12 days, especially specimens in which the white scales on the thorax had disappeared. Therefore, one of our goals for this study was to build a dataset with a total of 4,120 images consisting of mosquitoes of older age and with different levels of head and thorax scale disappearance.
Transfer learning and hyperparameter analysis. Training a state-of-the-art DCNN such as MobileNet or AlexNet requires a large quantity of data 19 , and the 4,120 images from our experiment were not enough to build a DCNN from scratch. We applied the transfer learning technique, in which the weights of the layers of a pretrained DCNN were transferred to recognize the mosquitoes. Teachable Machine 2.0 allows the pretrained DCNN models to be optimized with two major hyperparameters-the number of epochs and the learning rate (LR).
Hardware implementation of the model and human-level performance. Despite numerous studies demonstrating the ability of a DCNN to classify medically important mosquitoes with 80 to 97% accuracy [7][8][9][10][11]15,16 , these results, which rely on large pretrained models, might not be practical when deployed on hardware/devices. We deployed the proposed model on p5.js, a visual coding platform written in JavaScript, using the microcomputer of the Aedes Detector to verify the proficiency of the model at classifying a mixture of three different generations of mosquitoes in real time. Using the same dataset, the deployment accuracy was compared with the performance of human experts.

Methods
Acquisition and splitting of data. Previous works [5][6][7][8][9][10][11] reported limited information, such as the focal length, background and illumination (wavelength and intensity), when acquiring data. To address this problem, we built a piece of hardware called the Aedes Detector to improve the image acquisition, training and testing processes of the model. The construction of the Aedes Detector is detailed in Fig. 1. The device was equipped with a microcomputer (Raspberry Pi 4 Model B, quad-core Cortex-A72, 2 GB LPDDR4-3200 SDRAM) and a black cylindrical compartment (diameter of 12 cm and height of 4 cm), which provided a short focal length (2.5 cm), illumination (15 white LEDs, RGB visible light) and a white background. The approach used in this study is intended to serve as a framework that can be conveniently applied by other experts, especially entomologists. To investigate and fine-tune the computational model for insects, we constructed the proposed model using Teachable Machine 2.0 (Google Creative Lab), which is built on the MobileNet DCNN 20 and includes a feature extractor pretrained on the ImageNet dataset. The mosquito images were acquired with a camera module (Pi NoIR v2, 8 megapixels, Sony IMX219 image sensor) on the Aedes Detector. To avoid overfitting the model, data augmentation was performed. All images of the dataset (training, validation and testing) were rotated by 0, 90, 180 and 270 degrees; thus, the number of samples was increased by a factor of four. The data splitting/partitioning used for training, validating and testing the model on the Teachable Machine 2.0 platform is illustrated in Fig. 2: training (phase 1, 70%), validation (phase 2, 15%) 21 and evaluation on the testing dataset (phase 3, 15%).
The base images (0 degrees, without rotation) and all of the rotated images (90, 180 and 270 degrees) used for training were not used in the validation or testing sets. The model was developed to classify the mosquitoes into three classes-Ae. aegypti, Ae. albopictus and background (with no sample)-and a total of 4,120 images (2,040 per species and 40 for background) were used in this study. The synthetic minority oversampling technique (SMOTE) was cited to address the imbalance of the background class; specifically, random oversampling was conducted to duplicate the background images in the training set.
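The augmentation, splitting and oversampling steps described above can be outlined in a short sketch. This is illustrative only: the function names and the id-based representation of images are our own, not part of Teachable Machine 2.0, and in practice the pixel arrays themselves would be rotated (e.g. with PIL's Image.rotate).

```python
import random

def augment_rotations(image_ids):
    """Expand each base image into four rotated variants (0/90/180/270 degrees),
    quadrupling the number of samples. Applied after splitting, so rotated
    copies of a training image never appear in validation or testing."""
    return [(img, deg) for img in image_ids for deg in (0, 90, 180, 270)]

def split_70_15_15(items, seed=42):
    """Shuffle and partition the base images into 70% training,
    15% validation and 15% testing subsets."""
    rng = random.Random(seed)
    items = items[:]
    rng.shuffle(items)
    n = len(items)
    n_train, n_val = int(n * 0.70), int(n * 0.15)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def oversample(minority, target_size, seed=42):
    """Random oversampling: duplicate minority-class (background) images
    until the class reaches target_size. Training set only."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra
```

Splitting the base images first and augmenting each subset afterwards is what guarantees that no rotated variant of a training image leaks into validation or testing.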
Deep convolutional neural network (DCNN). DCNN architecture. The proposed model was based on the Teachable Machine 2.0 DCNN architecture of MobileNet, which transfers the learned weights to reduce the training time, the mathematical calculations and the consumption of the available hardware resources 22 . The workflow of the classification of Teachable Machine 2.0 is shown in Fig. 3. The first 29 convolutional layers of MobileNet were adopted 23 and used to extract the features. More convolutional layers can reduce the resolution of the feature map and extract more abstract high-level features 24 . The softmax layer of MobileNet was truncated, and the output of the model was set as the last tensor representation of the image. Therefore, the web-based platform allowed us to create two layers-the first dense layer and the final softmax layer-with three classes (images of Ae. aegypti, Ae. albopictus and background). The first dense layer must take the same input as the output of MobileNet, and the transformation of the data to their tensor was performed by MobileNet.
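The appended classification head, a dense layer followed by a softmax over the three classes, can be illustrated with a minimal numerical sketch in plain Python. This is not the TensorFlow.js implementation used by Teachable Machine 2.0, only the underlying arithmetic.

```python
import math

def dense(x, weights, biases):
    """Fully connected layer: one logit per class (three classes here).
    x is the feature vector produced by the truncated MobileNet."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Convert class logits into probabilities that sum to 1.
    Subtracting the max logit keeps the exponentials numerically stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class is simply the index of the largest probability, e.g. index 0 for Ae. aegypti, 1 for Ae. albopictus and 2 for background under an assumed class ordering.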
Training the DCNN with hyperparameter analysis. In earlier studies, most attempts to classify Ae. aegypti and Ae. albopictus using dynamic hyperparameters demonstrated a maximum accuracy of 84-87% 10,13,15 . Since deep learning neural networks are trained using the stochastic gradient descent algorithm, hyperparameters such as the learning rate (a small positive value ranging from 0.0 to 1.0) that controls the rate of change to the model during each step of the optimization process have to be determined based on the particular dataset 25 .
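The role of the learning rate in gradient descent can be illustrated on a toy quadratic loss. This sketch is our own illustration, not the platform's optimizer: it shows how the learning rate scales each weight update, and how too large a value makes training diverge rather than converge.

```python
def minimize_quadratic(w0, lr, steps=100, target=3.0):
    """Minimize f(w) = (w - target)^2 by gradient descent.
    The gradient is 2 * (w - target); each update is w <- w - lr * grad,
    so the learning rate lr directly controls the step size."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w = w - lr * grad
    return w
```

With a moderate learning rate the iterate converges to the minimum at w = 3; with a learning rate above 1.0 on this loss, each step overshoots and the iterate diverges, which mirrors the noisy learning curves observed at larger learning rates.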

Hardware implementation and human-level performance. The proposed model was exported as JavaScript, and the model was uploaded through a link and deployed on the p5.js platform. The p5.js sketch was configured to provide the prediction on a canvas (600 × 600). To infer the performance of the deployed model in real time, although the Raspberry Pi camera module can reportedly record video at 90 fps, we set the frame rate to 60 fps by calling frameRate(60) in p5.js to prevent screen tearing. To avoid the bias of overlapping data, the model deployment evaluation was conducted on three different batches of mosquitoes (n = 3, each batch consisting of 100 mosquitoes) that were independent of the images of the dataset used in the process of model development. The samples (a randomized mixture of Ae. aegypti and Ae. albopictus at a 50:50 ratio) were placed in a Petri dish in the Aedes Detector (Fig. 1), and the outputs from the p5.js canvas were recorded as the inference results on a computer monitor in real time. The experiment was complemented by an evaluation of human-level performance, in which we invited 30 senior entomologists (12 females and 18 males, ages 32-60, mean age of 46 years, each with more than three years of experience in insect taxonomy) to identify images of Ae. aegypti and Ae. albopictus that were randomly mixed and selected from the pool of the testing subset. The images were presented to the senior entomologists via an online quiz (https://forms.gle/s28ckNBtfkjoanFBA), and accuracy was calculated as the percentage of answers that correctly identified the mosquitoes. Independent-samples t-tests were performed to compare the accuracy of the DL model deployment with human performance at p < 0.05 in SPSS 17.0. The senior entomologists voluntarily participated in the experiments, and informed consent was obtained prior to participation. All experimental methodologies were approved by the ethical committee of UOW Malaysia KDU Penang University College.
All experiments were carried out in accordance with the guidelines of the ethical committee of UOW Malaysia KDU Penang University College.
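For reference, the pooled-variance independent-samples t statistic used for such comparisons can be computed as follows. This is a generic sketch: the authors ran the test in SPSS 17.0, and the exact input samples are theirs.

```python
import math

def pooled_t_statistic(a, b):
    """Independent-samples t-test statistic with pooled variance.
    Degrees of freedom are len(a) + len(b) - 2, matching the
    reported t(58) for two groups of 30 accuracy scores."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2
```

A |t| near zero at these degrees of freedom, as in the reported t(58) = 0.297, corresponds to a large p-value and hence no significant difference between the groups.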

Results
Classification performance of the DCNN. Figure 4 shows examples of 10- and 16-day-old Ae. aegypti, respectively, in which the lyre-shaped white scales on the thorax were barely observable; therefore, to allow the dataset to practically reflect samples of various ages, we constructed the dataset using female mosquitoes that were older than 12 days. The overall test accuracy is reported in Table 1. Our results showed that both the number of epochs and the learning rate (LR) exhibited significant effects on the testing accuracy (p < 0.05), and when the number of epochs was 30 and the LR was 0.001, the model achieved an accuracy of 98.06 ± 1.02%. The number of epochs is the number of times the entire training dataset passes through the algorithm's learning process, and each epoch consists of one or more batches that are used to tune the internal model parameters 26 . Theoretically, more epochs should result in higher accuracy 27 , although a longer runtime is required. However, in our results, more epochs did not necessarily guarantee better accuracy: at an LR of 0.001, the testing accuracy was significantly lower from 50 to 80 epochs (Table 1). Figure 5 shows the confusion matrices for the precision and recall of the proposed DL model under the different fine-tuned hyperparameters. Loss is a number representing how unfit a model's prediction is on a single example 28 , and the impacts of the number of epochs and the LR on the loss are shown in Fig. 6. As shown in Fig. 6, the learning curve exhibited greater fluctuation/noise at larger learning rates, and, following a similar trend as the testing accuracy, the LR significantly impacted the loss (p < 0.05), whereas there was no significant difference as the number of epochs changed.
MobileNet is a pretrained model with a smaller size 22,23 , which allowed our proposed model to be deployed directly on the p5.js platform using JavaScript, with inferences obtained in real time at 60 fps (by calling frameRate(60) in p5.js). To the best of our knowledge, this is the first time that a DCNN model has been implemented in an application platform on a hardware tool and tested externally on different generations of mosquitoes. The results of the model deployment and human-level performance are presented in Fig. 7. When the DL model was deployed in p5.js on the Aedes Detector for the randomized mixture of Ae. aegypti and Ae. albopictus, the DL model obtained an accuracy of 98.33 ± 0.69%. When the experiment was repeated with human performance, the senior entomologists achieved an accuracy of 98.00 ± 0.88%. Although the performance of the DL model slightly surpassed the human performance, the independent-samples t-test showed that there was no significant difference between the accuracy of the hardware and human-level performance, t(58) = 0.297, p = 0.768, indicating that the DL model performance resembled that of human experts.
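Per-class precision and recall, as summarized in the confusion matrices of Fig. 5, can be derived directly from the matrix counts. A minimal sketch follows; the layout convention (rows are true classes, columns are predicted classes) is an assumption for illustration.

```python
def precision_recall(confusion, cls):
    """Per-class precision and recall from a confusion matrix laid out
    as confusion[true_class][predicted_class].

    precision = TP / (TP + FP): of everything predicted as cls,
                how much really was cls.
    recall    = TP / (TP + FN): of everything truly cls,
                how much was found."""
    classes = range(len(confusion))
    tp = confusion[cls][cls]
    fp = sum(confusion[t][cls] for t in classes if t != cls)
    fn = sum(confusion[cls][p] for p in classes if p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```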

Discussion
The feature representation of the images is crucial since MobileNet uses the k-nearest neighbor (kNN) algorithm, which uses the semantic information represented in the logits to compare the images in the dataset with unknown samples as the classifier 23 . To emphasize the morphological features on the mosquito images, we standardized the environment for model development (training, validating and testing) to minimize errors such as uneven and inconsistent light distributions on the sample and varying focal distances that make the sizes of the samples in the images differ. To prevent possible overfitting of the model due to the standardized images/samples throughout the development process, the mosquito images in the dataset were manually checked to confirm that there was no overlap among the subsets. We applied computer vision to the sample mosquitoes to better understand the details of automatic mosquito classification by DCNNs, since most previous studies seldom analyzed which parts of the mosquitoes were "viewed" and "recognized" by the machine and used to perform classification. Park et al. assessed the feature activation of the convolution layers of VGG-16 and showed that most of the recognition maps focused on the body of the mosquitoes. We expanded this idea by designing our labeled image acquisition process to focus mainly on the lateral view of the thorax and head of the mosquitoes, since the thorax and head occupied as much as 35% of an image (Figs. 4 and 8); dorsal and ventral views of the thorax were also contained in the dataset but did not provide good mosquito morphology due to the disappearance of the white scales. Figure 8 shows a comparison between the key morphology used by human experts to discriminate the mosquitoes and the feature activation images of the convolution layers. From the visualization of the convolution layers of our proposed DCNN model, the recognition map focused on the mesepimeron and clypeus-pedicel parts of the Ae.
aegypti and Ae. albopictus images. This result suggested that the distinctive white scales on the pedicel and clypeus (Fig. 8b, red arrow) and the mesepimeron of Ae. aegypti, which are the three types of white scales (Fig. 8b, red circles), are essential to classifying the mosquitoes. When the mosquitoes were 12 to 16 days old, the lyre-shaped markings on Ae. aegypti and the white stripe marking on the thorax had almost completely disappeared (Fig. 4); however, the scales on the clypeus, pedicel and mesepimeron remained despite the damage to the dorsal thorax, and these remaining morphologies played crucial roles in differentiating female Ae. aegypti from Ae. albopictus [30][31][32] . The improved image acquisition was attributed to the Aedes Detector, which allowed a close distance (2.5 cm) and a well-illuminated environment, especially regarding the light intensity and distribution. The high accuracy also confirmed the capability of the Pi camera module to acquire the sample images at the necessary resolution, although its resolution was only 8 megapixels.
To obtain the best-performing model, our hyperparameter analysis was in good agreement with previous arguments [25][26][27] in which the learning rate (LR) was the key hyperparameter for fine-tuning (LR, p < 0.00001), indicating that the improvement of the model accuracy was more strongly affected by the LR. The LR should not be too small, as mentioned by Goodfellow 33 , and we demonstrated a similar result: an LR of 0.00001 generally produced significantly lower accuracy, with the test accuracy ranging from 93 to 94%. However, higher numbers of epochs did not perform significantly better than lower numbers of epochs (Table 1), which may be due to the possibility that more epochs led to model overfitting. This was supported by the study of Ladds et al. 34 , which investigated the classification of animal behavior using a super machine learning model and showed that fewer epochs performed better than more epochs. Furthermore, the study of Fabre et al. 35 tested more epochs but found a decrease in performance, with an increased percentage error in the accelerometer-based measurement of moderate-to-vigorous physical activity. More epochs decreasing the performance of a model can be explained by the possibility of overfitting. Nevertheless, our result that lower numbers of epochs provided higher accuracy can be explained by the need for early stopping of model training at certain epochs, as suggested by Ying 36 , who stated that early stopping at a certain number of epochs can prevent the overfitting of a model. For an end-to-end process of a model for image classification, Teachable Machine 2.0 by Google Creative Lab is highly convenient for developing a DL model, especially for those without computer science expertise. The web-based tool assists mostly in the construction of CNN architectures and allowed us to focus on feature engineering and the implementation of the proposed model.
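The early-stopping idea attributed to Ying 36 can be sketched as a simple patience rule on the validation loss. This is an illustrative implementation of the general technique, not code used in the original study.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop: when the validation
    loss has not improved for `patience` consecutive epochs, halt;
    otherwise train through all epochs and return the last index."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1
```

Stopping once the validation loss plateaus is what prevents the extra epochs from fitting noise in the training data, consistent with the observation above that 50 to 80 epochs performed worse than 30.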
To the best of our knowledge, this is the first reported end-to-end process model for the classification of the closely related mosquito species Ae. aegypti and Ae. albopictus. To assess the performance of the proposed model, we compared it with human-level performance, in particular to estimate the optimum or Bayes error. Human experts classified Ae. aegypti and Ae. albopictus with an error of ~2%, and compared with this error, the proposed model achieved a minimum bias (Bayes error + avoidable error) 37 . A machine learning model with a deep network aims to resemble or surpass human-level performance, and a key aspect of DCNNs is the number of internal layers 38 . We leveraged Teachable Machine 2.0, using transfer learning with DCNNs that adopt the first 29 convolutional layers of MobileNet, and reached human expert accuracy. We suggest that the proposed model can surpass the performance of novice or junior insect taxonomists. Our results extend previous studies that implemented fewer layers of deep convolutional networks (11, 13, 16 and 19 layers) 38,39 for object recognition but still resembled human-level performance. The implementation accuracy slightly surpassed the human experts' accuracy, although there was no significant difference (independent-samples t-test, t(58) = 0.297, p = 0.768). As described by Andrew Ng 37 , when an ML model achieves human-level performance, progress in reducing the error rate toward the desired level slows. The automated classification of mosquitoes is attributed mainly to the computer vision system-both the algorithm and the hardware implementation-in the present study.
Nevertheless, hardware deployment issues such as model drift (when the production data are not representative of the training data), model maintenance and model management still pose challenges, especially when the model is applied in a real-time trap or surveillance system, in which regular monitoring and calibration of the model are required to ensure the validity of the predictions.

Conclusions
An image classification model for medically important mosquitoes is critical for practical applications in real-world situations. This study constructed a dataset of 12- to 16-day-old Ae. aegypti and Ae. albopictus in which head and thorax morphological features had disappeared. The model was successfully developed using the Aedes Detector hardware, and the results indicated the importance of the image acquisition environment for computer vision, yielding a dataset and model that could be used to infer unseen samples. We demonstrated the capacity of the proposed model with a DCNN to classify aged mosquitoes with more than 98% testing accuracy, with both the learning rate and the number of epochs significantly influencing model performance. Most strikingly, when the model was deployed externally on three different generations of mosquitoes, its accuracy resembled that of human experts. Our study provides a framework for future studies that can utilize the proposed method to assess the quality of data and training conditions, statistically study hyperparameters, and implement hardware on external samples.