Introduction

Parasites are a diverse group of organisms found in virtually every environment humans inhabit. Since early times they have played an important role in shaping human history, biology, and the immune system1. The complex interplay between humans and parasites has implications that extend far beyond the infections parasites cause. From microscopic protozoa to complex multicellular helminths, parasites exhibit astonishing life strategies that challenge our understanding of evolution and biology2. Parasitic diseases are caused by organisms such as ectoparasites and protozoa, and they inflict severe, often long-lasting health problems that can become life-threatening if not treated in time. Because these diseases affect millions of people in every corner of the globe, they are regarded as a significant global health challenge3.

Traditionally, diagnosing parasitic diseases is laborious and time-consuming, relying on serological tests, molecular techniques, and microscopy. Although these methods have proven effective, they demand highly skilled professionals who can interpret and analyze the results correctly. Early detection of such infections is therefore crucial for timely intervention and effective treatment4. In recent years, machine learning and deep learning models have shown exciting results in the medical sector by improving the precision and efficiency of detecting and diagnosing many serious diseases. Likewise, these techniques offer promising possibilities for improving the precision of detecting and classifying parasitic diseases5.

Machine learning techniques such as support vector machines, random forests, and decision trees, and deep learning techniques such as RNNs and CNNs, are well suited to recognizing patterns and classifying data. They have played a major role in analyzing medical data, including tissue samples, blood smears, and diagnostic images, improving the accuracy of detecting and classifying various parasitic diseases6. Moreover, ML and DL techniques are not restricted to the aforementioned traditional diagnostic methods: they can also be applied to data from emerging technologies such as genomic sequencing, rapid diagnostic tests, and mobile health applications, making the diagnostic process faster and more accessible7.

Therefore, this study investigates the use of machine learning and deep learning methodologies for the identification and categorization of diverse parasitic diseases. After thoroughly examining existing research in this field, we develop a system that incorporates multiple deep-learning models and fine-tunes their parameters using various optimization techniques to obtain the best results. This approach is expected to introduce innovative concepts and improvements to the field of parasite detection.

The contributions of this research are as follows:

  • Exploring the work done by the researchers in the field of detecting and classifying various parasitic organisms.

  • Compilation of a diverse dataset containing 34,298 samples encompassing various parasitic organisms including Plasmodium, Toxoplasma gondii (T. gondii), Babesia, Leishmania, Trypanosome, and Trichomonad, as well as host cells such as red blood cells and white blood cells, enhancing the dataset's complexity and real-world relevance.

  • Conversion of images from RGB to grayscale and extraction of morphological features, such as area, perimeter, height, and width, facilitating a detailed understanding of the image characteristics.

  • Implementation of Otsu thresholding and watershed techniques to distinguish foreground from background, ensuring accurate identification of regions of interest.

  • Using different deep transfer learning models including VGG19, InceptionV3, ResNet50V2, ResNet152V2, EfficientNetB3, EfficientNetB0, MobileNetV2, Xception, DenseNet169, and the hybrid model InceptionResNetV2.

  • Fine-tuning of model parameters using three different optimizer techniques: RMSprop, SGD, and Adam.

  • Thorough examination and comparison of these models on the basis of various performance metrics to demonstrate their efficacy in parasitic organism classification.

Related work

Researchers have made many contributions to the detection and classification of parasitic organisms. Zhang et al.8 explored the effectiveness of deep learning models in diagnosing infectious and parasitic diseases caused by protozoan parasites. They discussed the limitations of traditional microscopic examination and highlighted the exceptional performance of deep learning models in improving disease diagnosis. Their research underscores the transformative potential of artificial intelligence in healthcare, especially for infectious diseases, and suggests a promising future for deep learning in advancing global public health. Alharbi et al.9 developed a model intended to be robust and precise enough to distinguish uninfected blood cells from parasitized cells. Working with a dataset of 13,750 parasitized and uninfected samples, they applied a neural network, XGBoost, and an SVM. The SVM achieved the best accuracy at 94%, the XGBoost model reached 90%, and the neural network lagged behind at 80%. The researchers also applied a CNN, which boosted accuracy to 97% on the same samples. Wang et al.10 applied object detection techniques, namely the Single Shot MultiBox Detector and the Incremental Improvement version of You Only Look Once, to recognize leukocytes. They used a dataset of 14,700 annotated images and tested the model with 1120 labeled images and 7868 labeled single-object images representing 11 types of peripheral leukocytes. Running on an NVIDIA GTX1080Ti GPU, the model obtained 90.09% accuracy while spending 53 ms per image.
Leng et al.11 presented a pure-transformer, end-to-end object detection network based on DETR to identify leukocytes. A pyramid vision transformer and a deformable attention module were added to the DETR model to boost performance and convergence speed. Two datasets were used: the Common Objects in Context dataset to obtain pre-trained weights, and the Raabin Leukocyte dataset to train the transfer learning model. In their experiments, the upgraded DETR outperformed both the CNN and the original DETR, with a mean accuracy of 96.1%. Li et al.12 examined the performance of deep learning models for automatically detecting leukocytes. They created a novel dataset of 6273 images containing 8595 leukocytes and nine clinical interference variables. Six detection techniques were trained on this dataset and later combined into a robust ensemble model. On the test dataset, the ensemble achieved a mean average recall of 0.922, a mean average precision of 0.853, and an accuracy of 98.84%. Furthermore, the authors examined the test results of the individual models, discovered several identical false detections, and made appropriate clinical recommendations. Gonçalves et al.13 detected visceral leishmaniasis (VL) in humans by applying deep learning algorithms to slide images of bone marrow collected during parasitological examination. They used five deep learning algorithms as classifiers after preprocessing and data augmentation. The layers of the applied models were fine-tuned to optimize performance, and the best model achieved accuracy, F1 score, and kappa of 98.7% each.
They thus demonstrated that, using deep learning models trained on microscopic slide images of bone marrow material, professionals could precisely detect VL in patients. Gonçalves et al.14 detected amastigotes in microscopy images using deep learning techniques. Their proposed method first segmented the Leishmania parasites in the images and then located the amastigotes. The model achieved 99.1% accuracy, 80.4% Dice, 81.5% precision, 99.6% specificity, 72.2% sensitivity, and 75.2% IoU for identifying VL parasites. The researchers reported that these findings demonstrate that deep learning models, once trained on microscopic images, can help specialists detect VL in humans. Rajasekar et al.15 applied artificial intelligence to automate the identification of parasite eggs in the laboratory. They used Detectron2, YOLO variants, InceptionV3, and YOLOv8, and their results showed that YOLOv8 combined with the SGD optimizer performed best, with a mean precision of 0.92 and an F1 score of 98%. Based on these results, the researchers concluded that it is an outstanding model for identifying parasite eggs. Masud et al.16 examined the application of deep learning algorithms to malaria identification via mobile healthcare solutions. They developed a convolutional neural network trained with a cyclical stochastic gradient descent optimizer and an automatic learning-rate finder. The proposed model classified infected and healthy cells with high precision and an accuracy of 97.30%. Their findings could help shift malaria microscopy diagnosis to a mobile application, improving treatment reliability and addressing the shortage of medical expertise in some areas.

Methodology

This section describes the process used to develop the deep learning-based system for the detection and classification of parasitic organisms, as shown in Fig. 1. The proposed system has several notable advantages for classifying parasitic organisms. It achieves higher accuracy in detection and classification by using a diverse dataset and advanced image processing techniques. Including host cells such as red and white blood cells makes the dataset more realistic and relevant to real-life situations. Extracting morphological features provides a more detailed understanding of the image characteristics, and the Otsu thresholding and watershed techniques accurately identify regions of interest, refining the system's focus. Using a range of advanced deep transfer learning models improves adaptability and overall performance, and fine-tuning the model parameters with various optimizers further enhances flexibility. After carefully evaluating and comparing these factors, we can confidently say that the system is highly effective for classifying parasitic organisms in medical imaging datasets.

Figure 1
figure 1

Proposed system to detect and classify parasitic organisms using deep transfer learning.

Dataset description

In this study, a comprehensive dataset comprising a total of 34,298 observations was gathered, which includes various parasites such as Plasmodium, Leishmania, Babesia, Toxoplasma gondii (T. gondii), Trichomonad, and Trypanosome. The dataset also includes host cells, i.e. red blood cells and leukocytes, as shown in Fig. 2. All images were captured under either a 400× or 1000× microscope17.

Figure 2
figure 2

Image samples of parasites.

Specifically, it comprises 843 instances of Plasmodium, 3758 instances of T.gondii observed under a 400× microscope, and 2933 instances of T.gondii observed under a 1000× microscope. Additionally, the dataset contains 1173 instances of Babesia, 2701 instances of Leishmania, 2385 instances of Trypanosome, and 10,134 instances of Trichomonad, all of which were observed under a 1000× microscope. In addition, the aforementioned dataset comprises a total of 8995 red blood cells (RBCs) and 461 leukocytes observed under a magnification of 1000×, as shown in Fig. 3. Furthermore, an additional 915 leukocytes were identified using a 400× microscope.

Figure 3
figure 3

Number of images per class.

Data preprocessing

Preprocessing is a crucial step in image processing, as it improves the classification performance of a model. The first part of the methodology involves loading the original color image, represented in RGB format, using the OpenCV library. The Python Imaging Library (PIL) is also used for numerous image-related preprocessing tasks, such as opening, processing, and saving images. The image is then converted to a single-channel grayscale image, with values ranging from 0 to 255, using cv2.cvtColor() with the cv2.COLOR_RGB2GRAY flag, as presented in Fig. 4. This reduces the computational complexity of the image data and improves processing speed. Moreover, grayscale images retain essential features such as edges and textures, making them suitable for tasks like object detection, image classification, and feature extraction.
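The conversion performed by cv2.cvtColor with the RGB-to-grayscale flag is a fixed weighted sum of the three channels. A minimal numpy sketch of that weighting (the ITU-R BT.601 coefficients OpenCV uses) is shown below; the helper name is ours, not part of the study's code:

```python
import numpy as np

def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Approximate cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY):
    a weighted sum of the R, G, B channels (BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)

# A 2x2 RGB image: pure red, green, blue, and white pixels.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = rgb_to_gray(img)  # single-channel, values in [0, 255]
```

The single grayscale channel carries the luminance information that the downstream thresholding and feature-extraction steps need, at a third of the memory cost.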

Figure 4
figure 4

Preprocessing of original images.

Feature extraction

Since the dataset contains only images, we extracted features by computing the morphological values of all objects in the images, as shown in Table 1a and b, using parameters such as area, diameter, aspect ratio, and the locations of minimum and maximum intensity. These values are computed using Eqs. (1) to (16).

Table 1 Morphological values of the images.
$$area=height*width$$
(1)
$$height=cv2.boundingRect\left(cnt\right)$$
(2)
$$width=cv2.boundingRect\left(cnt\right)$$
(3)
$$Aspect \; Ratio= \frac{width}{height}$$
(4)
$$Extent= \frac{object \;area}{bounding \;rectangle \;area}$$
(5)
$$Equivalent\; diameter= \sqrt{\frac{4*contour \;area}{\pi }}$$
(6)
$$epsilon= \sqrt{{({x}_{2}-{x}_{1})}^{2}+{({y}_{2}-{y}_{1})}^{2}}$$
(7)
$$Minimum \;value=cv2.{\text{min}}()$$
(8)
$$Maximum \;value=cv2.{\text{max}}()$$
(9)
$$Minimum \;value \; Location=cv2.{\text{minMaxLoc}}()$$
(10)
$$Maximum \;value \; Location=cv2.{\text{minMaxLoc}}()$$
(11)
$$Mean\; Color=cv2.{\text{mean}}()$$
(12)
$$Extreme\; Leftmost\; point=tuple(cnt\left[cnt\left[:,:,0\right].argmin()\right]\left[0\right])$$
(13)
$$Extreme \;Rightmost \;point=tuple(cnt\left[cnt\left[:,:,0\right].argmax()\right]\left[0\right])$$
(14)
$$Extreme \;Topmost \;point=tuple(cnt\left[cnt\left[:,:,1\right].argmin()\right]\left[0\right])$$
(15)
$$Extreme \;Bottommost \;point=tuple(cnt\left[cnt\left[:,:,1\right].argmax()\right]\left[0\right])$$
(16)

Data segmentation

Image segmentation is an important task, and thresholding combined with the watershed technique is a widely used method for segmenting objects or regions of interest within an image, as shown in Fig. 5. First, the image is thresholded using the Otsu technique, which automatically determines an optimal threshold by maximizing the between-class variance (equivalently, minimizing the within-class variance)18. Mathematically, it is represented by Eqs. (17, 18):

Figure 5
figure 5

Segmentation of images.

Let \(P(i)\) be the probability of a pixel with intensity \(i\) in the image, and let \({P}_{t}\) be the cumulative probability of the pixels at or below a candidate threshold \(t\):

$${P}_{t}= \sum_{i=0}^{t}P(i),$$
(17)

where L is the number of possible intensity levels, so that \(0\le t\le L-1\). The mean intensity (\({\mu }_{t})\) of the pixels at or below the threshold is given by

$${\mu }_{t}= \sum_{i=0}^{t}i .P(i)$$
(18)

The between-class variance \({\sigma }_{B}^{2}(t)\) is then given by Eq. (19):

$${\sigma }_{B}^{2}\left(t\right)= \frac{{({\mu }_{total }.{P}_{t}-{\mu }_{t})}^{2}}{{P}_{t}.\left(1-{P}_{t}\right)}$$
(19)

where \({\mu }_{total}\) is the mean intensity of the entire image, \({\mu }_{total }= \sum_{i=0}^{L-1}i .P(i)\). The aim of Otsu's method is to maximize \({\sigma }_{B}^{2}\left(t\right)\) by finding the threshold t that satisfies Eq. (20):

$${t}_{Otsu}={argmax}_{t}{\sigma }_{B}^{2}\left(t\right)$$
(20)

The optimal threshold \({t}_{Otsu}\) obtained from this maximization process is then used for binary thresholding to separate the image into foreground and background based on pixel intensities.
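This maximization can be written as a short exhaustive search over all 256 candidate thresholds. A minimal numpy sketch using the standard between-class-variance formulation (function and variable names are ours):

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Exhaustive search for the threshold t that maximizes the
    between-class variance (Eq. 19); gray is a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                  # P(i)
    omega = np.cumsum(p)                   # P_t: cumulative probability, Eq. (17)
    mu = np.cumsum(np.arange(256) * p)     # mu_t, Eq. (18)
    mu_total = mu[-1]
    # Between-class variance; degenerate t (an empty class) is zeroed out.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2[~np.isfinite(sigma_b2)] = 0.0
    return int(np.argmax(sigma_b2))        # t_Otsu, Eq. (20)

# Bimodal toy image: dark "background" pixels at 40, bright "foreground" at 200.
img = np.concatenate([np.full(500, 40), np.full(500, 200)]).astype(np.uint8)
t = otsu_threshold(img.reshape(25, 40))
```

On a cleanly bimodal histogram such as this one, the returned threshold falls between the two modes, so `img > t` separates foreground from background exactly; OpenCV's `cv2.THRESH_OTSU` flag performs the same search internally.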

Because thresholding alone is not sufficient, the watershed technique is applied to refine it, treating the image as a topographic map. The watershed algorithm can be represented mathematically using the gradient of the image \((\nabla f)\), whose magnitude is calculated from the derivatives in the x and y directions (\(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y})\). The watershed transformation is usually defined as in Eq. (21):

$$Watershed \left(f\right)=\left\{x \in Image \mid \nabla f(x)=0 \; and \; x \; is \; a \; local\; minimum\right\}$$
(21)

Here, \(\nabla f(x)=0\) corresponds to the points where the gradient is zero, indicating flat regions in the image. These points are the markers for the watershed segmentation process. Watershed segmentation considers these markers and the gradient of the image to delineate the regions accurately.

Such a combination of Otsu thresholding followed by the watershed technique is a versatile method for segmenting the images. Otsu Thresholding provides an initial separation between object and background, while the watershed step refines the boundaries based on local image characteristics.

Applied models

The use of advanced deep learning models, whose layers are either fine-tuned or employed as feature extractors, enhances prediction capabilities in visual recognition tasks. In the realm of parasitic organism detection and classification, various specialized neural network architectures have been developed; they are briefly explained below19.

The use of VGG19, a convolutional neural network architecture well-known for its depth as well as hierarchical feature learning capabilities, has proven to be of huge value in the task of discriminating complicated patterns within parasitic images. By leveraging its abundant layers, VGG19 enables the accurate classification of these parasites, thereby enhancing the precision of identification20. Applying the various filter sizes in Inception V3 allows for the comprehensive capture of a broad range of features, which is essential in the precise detection of various parasitic organisms21. The architecture of EfficientNetB3 is designed to achieve stability between accuracy as well as computational efficiency. This makes it well-suited to analyze large and diverse datasets that are usually encountered in the field of analyzing parasitic organisms22.

The ResNet152V2 and ResNet50V2 architectures were designed to deal with the vanishing gradient problem generally encountered when training deep neural networks. These models include skip connections and residual blocks to mitigate the issue, allowing them to maintain stable training even when dealing with complex parasitic images23,24. MobileNetV2 is a convolutional neural network architecture that aims to reduce computational cost while maintaining high accuracy, which makes it particularly suitable for environments with limited resources25. The dense connections in DenseNet169 enable a comprehensive analysis of parasitic images, allowing for a detailed examination of various aspects26. On the other hand, EfficientNetB0's systematic scaling strategy delivers strong performance while minimizing computational requirements, a characteristic that is particularly advantageous in real-time applications for detecting parasitic diseases, where efficiency and accuracy are crucial27.

Xception is another convolutional neural network architecture, built on depthwise separable convolutional layers and skip connections. These architectural choices improve the flow of gradients during training, which in turn helps in detecting subtle parasitic features28. InceptionResNetV2, a hybrid deep neural network architecture, combines the advantageous characteristics of the Inception and ResNet models. By incorporating Inception's multi-scale feature capture and ResNet's depth, InceptionResNetV2 demonstrates exceptional performance in intricate parasitic image recognition tasks such as object detection and segmentation. The detection and classification of parasitic diseases present a range of challenges that are addressed by these various architectural approaches, each offering distinct advantages that collectively contribute to the overall effectiveness of the detection and classification process29. The general layered architecture of all the applied models is shown in Table 2.
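The transfer-learning pattern shared by all of these models is the same: take a pretrained backbone, drop its ImageNet classification head, and attach a fresh softmax head sized for the ten classes here. A minimal Keras sketch of that wrapper follows; `weights=None` is used only to keep the sketch self-contained (the study's approach implies pretrained weights, i.e. `weights="imagenet"`), and the input size and function name are our assumptions:

```python
import tensorflow as tf

NUM_CLASSES = 10  # the dataset above has 10 classes (parasites + host cells)

def build_transfer_model(backbone_name: str = "VGG19") -> tf.keras.Model:
    """Generic transfer-learning wrapper: a backbone with its original
    classification head removed, plus a fresh softmax head."""
    backbone_cls = getattr(tf.keras.applications, backbone_name)
    backbone = backbone_cls(include_top=False, weights=None,
                            input_shape=(96, 96, 3), pooling="avg")
    backbone.trainable = False  # freeze for feature extraction; unfreeze to fine-tune
    return tf.keras.Sequential([
        backbone,
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_transfer_model("VGG19")
```

Swapping `"VGG19"` for `"InceptionV3"`, `"ResNet50V2"`, `"Xception"`, etc. reuses the same head, which is what makes a like-for-like comparison across the ten backbones straightforward.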

Table 2 Layered architecture of applied deep learning models.

There are also some additional details about the rest of the architectural features which have been mentioned in the aforementioned table:

  • Batch normalization: This technique normalizes the inputs to each layer of a neural network in order to improve the performance and stability of the network.

  • Squeeze and excitation: This technique compresses and excites the feature maps produced by each layer, thereby improving the efficiency of the neural network.

  • Residual connections: These act as shortcut layers in a neural network and improve its performance by mitigating the vanishing gradient issue.

  • Inverted residual connections: These connections are more efficient than traditional residual connections.

  • Dense connections: These connect each layer of a neural network with all of its preceding layers, so that each layer can learn from the previous ones and improve the performance of the network.

Performance metrics

In the context of deep learning models, various metrics can be used to examine their performance, which are described as follows20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36:

Accuracy

It is a fundamental metric defined as the ratio of correctly predicted samples to the total number of samples in the dataset (Eq. 22). It indicates how well the model performs on a particular dataset; however, this metric does not work well when the data is imbalanced.

$$Accuracy= \frac{True \;Positive+True\; Negative}{True \;Positive+True \;Negative+False \;Positive+False \;Negative}$$
(22)

Loss

Like accuracy, loss is an important metric for examining the performance of a model. It works in the opposite way, computing the error as the difference between the predicted and actual target values (Eq. 23). A high loss value means the model has not been trained well; a low value means the model has been trained well and will predict the output correctly.

$$Loss= \frac{{(Actual \;Value-Predicted \;Value)}^{2}}{Number \;of \;observations}$$
(23)

Root mean square error (RMSE)

This metric is also calculated from the difference between predicted and actual values, but it quantifies the average magnitude of the error (Eq. 24).

$$RMSE= \sqrt{\frac{{(Actual\; Value-Predicted \;Value)}^{2}}{Number\; of \;observations}}$$
(24)

Precision and recall

These two metrics are particularly useful when the dataset is imbalanced. Precision measures the accuracy of the positively predicted classes and is calculated as the ratio of true positive predictions to total positive predictions (Eq. 25). Recall, on the other hand, measures the ability of the model to identify all relevant classes and is calculated as the ratio of true positives to actual positives (Eq. 26).

$$Precision= \frac{True\; Positive}{True \;Positive+False \;Positive}$$
(25)
$$Recall= \frac{True \;Positive}{True \;Positive+False \;Negative}$$
(26)

F1 score

It is the harmonic mean of precision and recall, which accounts for both false negatives and false positives and provides a balanced assessment of a model's performance (Eq. 27).

$$F1 \;score=2\times \frac{Precision*Recall}{Precision+Recall}$$
(27)
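Equations (22) and (24)-(27) can be written out directly from the confusion-matrix counts. A minimal numpy sketch (function names are ours):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Binary metrics from Eqs. (22), (25)-(27), computed
    from the confusion-matrix counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def rmse(actual, predicted):
    """Eq. (24): square root of the mean squared difference."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

acc, prec, rec, f1 = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```

For the multi-class results reported below, the same counts are read per class off the 10 × 10 confusion matrices and the metrics averaged.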

Results

In this section, we conduct a full analysis of the models based on the set of parameters outlined in the "Performance metrics" section. We evaluated the performance of the models after applying three different optimizers: RMSprop, SGD, and Adam. These optimizers adjust the parameters, in the form of the weights and biases of the layers of the deep transfer learning models, that determine the mapping between input features and output predictions37. Our assessment encompasses both the training and validation datasets, allowing us to compare the models on the whole dataset as well as on its various classes.
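The three update rules differ in how they scale each step: plain SGD follows the raw gradient, RMSprop divides by a running average of squared gradients, and Adam additionally keeps a bias-corrected momentum term. A toy one-parameter sketch of the three rules minimizing \(f(w)=(w-3)^2\) is shown below; the hyperparameters are illustrative, not those used in the study:

```python
import numpy as np

def minimize(update, steps=300):
    """Minimize f(w) = (w - 3)^2 with a given update rule;
    the gradient is 2 * (w - 3)."""
    w, state = 0.0, {}
    for t in range(1, steps + 1):
        g = 2.0 * (w - 3.0)
        w = update(w, g, state, t)
    return w

def sgd(w, g, state, t, lr=0.1):
    return w - lr * g

def rmsprop(w, g, state, t, lr=0.1, rho=0.9, eps=1e-8):
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return w - lr * g / (np.sqrt(state["v"]) + eps)

def adam(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** t)   # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

results = {name: minimize(rule) for name, rule in
           [("SGD", sgd), ("RMSprop", rmsprop), ("Adam", adam)]}
```

All three converge toward the minimum at \(w=3\); the adaptive rules (RMSprop, Adam) take near-constant-size steps regardless of gradient magnitude, which is why they often converge faster on badly scaled deep-network loss surfaces.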

RMSprop

Initially, the applied deep learning models are optimized by using RMSprop optimizer and are examined during both training and validation phases, as shown in Table 3.

Table 3 Evaluation of models during training and validation phases after applying RMSprop.

Analyzing the models shows that all of them did quite well. During training, InceptionResNetV2, VGG19, and InceptionV3 performed best, computing accuracies of 99.99%, 99.89%, and 99.83% respectively, along with the lowest loss and RMSE values. On the validation dataset, by contrast, VGG19, InceptionV3, and EfficientNetB0 worked best, each obtaining an accuracy of 99.91%; in terms of loss and RMSE, only VGG19 and EfficientNetB0 led, with 0.09 each.

In addition to this, the models are examined based on their learning curves of accuracy and loss during both training and validation periods as shown in Fig. 6.

Figure 6
figure 6

Learning curves of applied models after applying RMSprop.

All models underwent a rigorous 10-epoch iteration process, during which they consistently demonstrated their optimal performance either at the 10th epoch or between the 8th and 10th epochs. This observation holds true for both accuracy and loss metrics. However, it's noteworthy that certain models exhibited significant disparities in their performance. These disparities raise a red flag, suggesting a potential issue with overfitting in those models. Overfitting occurs when a model learns the training data too well, including its noise and outliers, leading to poor generalization on unseen or validation data. Identifying and addressing this overfitting concern is crucial for enhancing the overall robustness and reliability of the models. Other than this, the models are also evaluated for another set of parameters i.e. precision, recall, and F1 score whose results are shown in Table 4.
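One common mitigation for the overfitting observed above (not applied in this study, which trains for a fixed 10 epochs) is early stopping: halt training once the validation loss stops improving and keep the weights from the best epoch. A minimal sketch of that logic on a hypothetical loss curve:

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the (0-based) epoch whose weights an early-stopping rule
    with the given patience would keep: training halts once the validation
    loss has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Hypothetical validation loss: improves for 5 epochs, then rises
# as overfitting sets in.
val_loss = [1.0, 0.7, 0.5, 0.42, 0.40, 0.45, 0.52, 0.61, 0.73, 0.88]
stop_at = early_stopping_epoch(val_loss)  # epoch 4, the minimum
```

Keras users typically get the same behavior from the `EarlyStopping` callback with `restore_best_weights=True`.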

Table 4 Analysis of models for different parameters after applying RMSprop.

VGG19 and InceptionV3 have high precision, recall, and F1 scores, indicating that they perform very well in the classification task, achieving near-perfect accuracy and completeness. EfficientNetB3, ResNet152V2, DenseNet169, and EfficientNetB0 also have high precision and recall but slightly lower F1 scores compared to VGG19 and InceptionV3. ResNet50V2 and MobileNetV2 have slightly lower precision than the top-performing models but still achieve high recall and F1 scores. Xception has the lowest precision, recall, and F1 score among the listed models, indicating that it may produce more false positives and false negatives than the other models.

After examining the performance of the models for the classification of various parasitic organisms, 10 × 10 confusion matrices were generated; these are a crucial tool for assessing the performance of multi-class classification models, as shown in Fig. 7. From these matrices, the true positive, true negative, false positive, and false negative values are taken to examine the performance of the models for the different classes, as shown in Table 5.

Figure 7
figure 7

Confusion matrix of applied models after applying RMSprop.

Table 5 Evaluation of models during training and validation phases for different classes after applying RMSprop.

In the comprehensive analysis of various models across multiple datasets, InceptionResNetV2 consistently emerges as the top performer, exhibiting exceptional accuracy and low loss across most datasets, including Babesia_1173, Leishmania_2701, Leukocyte_400X_915, RBCs_8995, Toxoplasma_400X_3758, Leukocyte_1000X_461, and Plasmodium_843. Notably, ResNet50V2 and Xception did not perform well in Trypanosome_2385 and Toxoplasma_1000X_2933 as they computed 62.46% and 64.29% accuracy respectively whereas MobileNetV2 excels in Trichomonad_10134 with 99.59% accuracy. These variations highlight the nuanced performance of different models across diverse datasets. Additionally, the overall dominance of InceptionResNetV2 signifies its robustness and reliability, making it the preferred choice for most classes of datasets, while other models demonstrate specialized efficacy in specific contexts, emphasizing the need for a tailored approach based on the dataset under consideration.

Besides this, these models are also examined for other performance metrics for different classes of the dataset whose results are shown graphically in Fig. 8.

Figure 8
figure 8

Graphical analysis of models after applying RMSprop optimizer.

SGD

The applied deep learning models are also optimized by using SGD optimizer and are examined during both training and validation phases, as shown in Table 6.

Table 6 Evaluation of models during training and validation phases using SGD optimizer.

The table shows the performance of the models on the training and validation records; the metrics used are accuracy, loss, and RMSE. Overall, the models perform very well, with all accuracies above 99%, though there are small differences between them. InceptionResNetV2 has the highest training accuracy at 99.90%, followed by InceptionV3 at 99.81%. ResNet152V2, ResNet50V2, and MobileNetV2 also achieve training accuracies above 99% but slightly lower accuracies on the validation dataset. On the contrary, Xception and VGG19 have the lowest training accuracies but perform well on the validation dataset. Overall, InceptionV3 and InceptionResNetV2 are the best-performing models on this dataset, with the highest validation accuracy computed by InceptionV3 at 99.91% with a loss of 0.98.

In addition to this, the models are examined based on their learning curves of accuracy and loss during both training and validation periods as shown in Fig. 9.

Figure 9
figure 9

Learning curves of applied models after using SGD optimizer.

The models went through a thorough process of 10 iterations, consistently showing their best performance either at the 10th iteration or between the 8th and 10th iterations. This observation applies to both the accuracy and loss metrics. Moreover, the learning curves of models such as InceptionResNetV2, Xception, ResNet50V2, MobileNetV2, EfficientNetB3, ResNet152V2, and InceptionV3 show a good fit, meaning these models lie between overfitting and underfitting. In addition, the models are also assessed for another set of parameters: precision, recall, and F1 score. The results for these parameters are displayed in Table 7.

Table 7 Analysis of models for different parameters using SGD optimizer.

All of the models in the table perform very well, with precision, recall, and F1 scores above 99%, indicating that they accurately identify both positive and negative cases in the dataset. InceptionResNetV2 has the highest precision, recall, and F1 scores, followed by InceptionV3 and EfficientNetB3. ResNet152V2, Xception, MobileNetV2, DenseNet169, ResNet50V2, and EfficientNetB0 also score above 99% on all three metrics.

We generated 10 × 10 confusion matrices to evaluate how well the different models classify the parasitic organisms, as shown in Fig. 10. These matrices are used to assess the models' multi-class classification performance, which is summarized in Table 8.
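The per-class results in Table 8 follow directly from the confusion matrices: for each class, TP is the diagonal entry, FP the rest of its column, FN the rest of its row, and TN everything else. A minimal sketch (using a toy 3 × 3 matrix in place of the paper's 10 × 10 ones):

```python
import numpy as np

def per_class_counts(cm):
    """Derive TP, FP, FN, TN for each class from a multi-class confusion
    matrix (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class, but wrong
    fn = cm.sum(axis=1) - tp  # truly this class, but missed
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn

cm = [[5, 1, 0],
      [0, 4, 1],
      [1, 0, 6]]
tp, fp, fn, tn = per_class_counts(cm)
print(tp.tolist())  # [5, 4, 6]
```

Per-class precision and recall then follow as TP / (TP + FP) and TP / (TP + FN).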

Figure 10

Confusion matrix of applied models of SGD optimizer.

Table 8 Evaluation of models during training and validation phases for different classes of SGD optimizer.

The table analyses the performance of 10 different deep learning models on the 10 classes of the dataset in terms of accuracy, loss, and RMSE. InceptionResNetV2 performs very well on the Leishmania, Babesia, and Leukocyte classes but not on the Trypanosome class. EfficientNetB3 and ResNet50V2 perform well on most classes, except the Plasmodium and Toxoplasma classes respectively. Overall, InceptionResNetV2 is the best-performing model on most classes. MobileNetV2 computed the lowest accuracy on the Trypanosome class but the highest on the Trichomonad class, suggesting it can be a good choice when the dataset is small or complex. Xception has the lowest accuracies on the Trypanosome and Toxoplasma classes but the highest on the Plasmodium class. Likewise, DenseNet169 and InceptionV3 have the lowest accuracies on most of the classes.

Besides this, the performance of the models has also been analyzed using different performance metrics, namely precision, recall, and F1 score, for the different classes of parasites, as shown in Fig. 11.

Figure 11

Examining the performance of models for different classes using SGD.

Adam

This subsection presents the performance of the models on the different performance metrics after fine-tuning their parameters with the Adam optimizer.
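Adam adapts a per-parameter learning rate from running estimates of the gradient's first and second moments; the update for a single scalar parameter can be sketched as follows (default hyperparameters shown; this is a generic illustration of the algorithm, not the paper's training code):

```python
import math

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w with gradient g at step t."""
    m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# At the first step, the bias-corrected update is roughly lr * sign(g)
w, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
print(round(1.0 - w, 6))  # 0.001
```

This moment normalization is what gives Adam its adaptive step sizes, in contrast to the fixed global step of plain SGD.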

On analyzing the models, it can be seen that all of them performed well during both the training and validation phases, as shown in Table 9. InceptionResNetV2, EfficientNetB3, and VGG19 achieved training accuracies of 99.99%, 99.98%, and 99.92% with the lowest loss (RMSE) values of 0.12 (0.34), 0.17 (0.41), and 0.16 (0.40) respectively, compared to the others. For the validation phase, InceptionResNetV2 computed the highest accuracy of 99.96%, followed by InceptionV3 and EfficientNetB3 with 99.94% and 99.91% respectively, while the best loss (RMSE) values were obtained by InceptionResNetV2 with 0.13 (0.36), followed by EfficientNetB0 and VGG19 with 0.14 (0.37) each.

Table 9 Evaluation of models during training and validation phases using Adam optimizer.

In addition, the models are examined on the basis of their accuracy and loss learning curves during both the training and validation phases, as shown in Fig. 12.

Figure 12

Learning curves of applied models using Adam optimizer.

All of the models were trained for 10 epochs, and each obtained its best score either at the 10th epoch or between the 8th and 10th epochs for both accuracy and loss. Here as well, large gaps between the training and validation curves can be seen for a few models, such as MobileNetV2 and ResNet50V2, which points toward overfitting.

Other than this, the models are also evaluated on another set of parameters, i.e. precision, recall, and F1 score, whose results are shown in Table 10.

Table 10 Analysis of models for different parameters using Adam optimizer.

It can be seen from the table that the best performance was achieved by VGG19, EfficientNetB3, and ResNet152V2 with precision, recall, and F1 scores of 0.99, 1.00, and 1.00 respectively, followed by InceptionV3, ResNet50V2, MobileNetV2, and Xception. These scores indicate near-perfect accuracy and completeness in the classification task. On the contrary, DenseNet169 has the lowest precision, recall, and F1 scores among the listed models, indicating that it may produce more false positives and false negatives than the others. The remaining models also performed well on these metrics.

After examining the performance of the models for the classification of the various parasites, 10 × 10 confusion matrices, a crucial tool for assessing multi-class classification models, were generated as shown in Fig. 13. From these matrices, the true positive, true negative, false positive, and false negative values were extracted to examine the performance of the models for the different classes, as shown in Table 11.

Figure 13

Confusion matrix of applied models using Adam optimizer.

Table 11 Evaluation of models during training and validation phases for different classes using Adam optimizer.

In the case of Babesia_1173, InceptionResNetV2 and EfficientNetB0 achieved impressively high training accuracies of 99.76% and 99.56%, respectively. However, Xception dropped from 96.49% training accuracy to 54.21% validation accuracy, indicating some level of overfitting. EfficientNetB3 was consistent between training and validation, with accuracies of 98.46% and 98.59% respectively, suggesting stability. In contrast, ResNet50V2 exhibited lower training accuracy (62.63%) but excelled on the validation set with 99.76% accuracy, indicating the potential for generalization despite challenges during training. Moving to Leishmania_2701, VGG19 and InceptionV3 performed well in training with accuracies of 98.59% and 97.56%, respectively. However, InceptionV3 and Xception struggled to generalize, achieving only 62.29% and 56.21% validation accuracy respectively. EfficientNetB3 showed consistency with training and validation accuracies of 98.76% and 90.31%, respectively. ResNet50V2's training accuracy was lower (62.29%), but its validation performance was strong at 97.26%. For RBCs_8995, VGG19 and Xception had noticeable gaps between training and validation accuracy, 97.63% versus 88.59% and 98.73% versus 54.76% respectively, suggesting overfitting. In contrast, InceptionV3 exhibited strong performance with 97.45% training and 98.22% validation accuracy. EfficientNetB3 and DenseNet169 faced challenges, with training accuracies of 94.05% and 97.63% and validation accuracies of 91.49% and 88.59% respectively. ResNet152V2 showcased robustness with 99.36% training and 95.46% validation accuracy, while ResNet50V2 demonstrated consistent performance with 94.38% training and 98.19% validation accuracy.
Lastly, for Toxoplasma_400X_3758, InceptionResNetV2 achieved a high training accuracy of 99.67%, while VGG19 and EfficientNetB0 obtained a lower validation accuracy of 91.31%. EfficientNetB3 exhibited consistent training (98.45%) and validation (98.23%) accuracy. ResNet152V2 excelled with 99.05% training and 96.36% validation accuracy, while Xception struggled in training with 64.63% accuracy and a loss of 149.33 but performed well on the validation dataset with 98.59% accuracy. Similarly, ResNet50V2 performed well on the training dataset with 99.36% accuracy but showed the lowest validation performance, with 54.21% accuracy and a loss of 343.

In the same way, the performance of the models on the remaining classes was also examined, and the results are reported in the same table. Besides this, the performance of the models was also examined graphically for precision, recall, and F1 score on each class of the dataset, as shown in Fig. 14.

Figure 14

Performance of the models for different classes using Adam optimizer.

Computational time

While training the deep learning models on the dataset, the computational time varies with several factors, and the choice of optimizer plays an important role in this regard. Table 12 shows the training times of the applied models with the different optimizers, i.e. RMSprop, SGD, and Adam, which exhibit some interesting behaviours.

Table 12 Computational time of the models.

With the RMSprop optimizer, the Xception model completed training in 2 h 50 min, while the maximum time was taken by InceptionResNetV2 at 16 h 20 min. Likewise, when the SGD optimizer was used to fine-tune the parameters of the deep learning models, the minimum training time was achieved by ResNet152V2 at 1 h and the maximum by ResNet50V2 at 14 h 15 min. Finally, when fine-tuning with the Adam optimizer, VGG19 required the least time at 4 h 5 min and DenseNet169 the most at 15 h 49 min.
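Expressing these times in minutes makes the spread easier to compare; a trivial helper (using the RMSprop figures reported above):

```python
def to_minutes(h, m=0):
    """Convert an 'X h Y min' training time into total minutes."""
    return 60 * h + m

# RMSprop times from Table 12: fastest vs. slowest model
xception = to_minutes(2, 50)           # 170 min
inception_resnet = to_minutes(16, 20)  # 980 min
print(round(inception_resnet / xception, 1))  # 5.8
```

So under RMSprop the slowest model took roughly 5.8 times as long to train as the fastest.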

RMSprop and SGD appear to be the fastest optimizers for most of the models, with shorter training times than Adam, suggesting that they converge to good model weights quickly. However, it is essential to consider that optimizer performance may vary with the specific problem, dataset, and hyperparameters. Adam, a popular optimizer known for its adaptive learning rates, often falls between RMSprop and SGD in terms of training time.

Conclusion

This study represents a significant advancement in the field of parasitic disease detection and classification. Harnessing the capabilities of deep learning models, coupled with meticulous image processing techniques, this research has demonstrated exceptional accuracy and efficiency in identifying and categorizing various parasitic organisms. The integration of deep learning models, including VGG19, InceptionV3, EfficientNetB3, Xception, MobileNetV2, ResNet50V2, ResNet152V2, DenseNet169, EfficientNetB0, and InceptionResNetV2, along with strategic optimization using RMSprop, SGD, and Adam, has yielded remarkable results. Incorporating these optimizers significantly enhanced the performance of the models, with InceptionResNetV2 achieving the highest accuracy.

Furthermore, the applied models were evaluated based on precision, recall, and F1 score, consistently achieving values around 0.99. This research not only demonstrates the effectiveness of artificial intelligence in parasitology but also underscores the importance of interdisciplinary approaches in scientific research. Despite these achievements, certain challenges were encountered, such as overfitting due to large iteration gaps and extended computational time required for training with the dataset. Addressing these challenges in future research is crucial, and diversifying the training dataset with a broader range of parasitic organisms is recommended to enhance the model's robustness and applicability in real-world scenarios.