Automated detection of selected tea leaf diseases in Bangladesh with convolutional neural network

Globally, tea production and its quality fundamentally depend on tea leaves, which are susceptible to invasion by pathogenic organisms. Precise, early-stage identification of plant foliage diseases is a key element in preventing and controlling the spread of diseases that hinder yield and quality. Image processing techniques are sophisticated tools that are rapidly gaining traction in the agricultural sector for detecting a wide range of diseases with excellent accuracy. This study presents a pragmatic approach for automatically detecting selected tea foliage diseases using a convolutional neural network (CNN). A large dataset of 3330 images was created by collecting samples from different regions of Sylhet division, the tea capital of Bangladesh. The proposed CNN model was developed on tea leaves affected by red rust, brown blight, and grey blight, as well as healthy leaves. Afterward, the model's predictions were validated with laboratory tests that included microbial culture media and microscopic analysis. The accuracy of the model was found to be 96.65%. Notably, the proposed model was developed in the context of the Bangladesh tea industry.

CNN is a highly popular method for analyzing results and classifying problems in agriculture, merging approaches from engineering technology and mathematics. The present research work was undertaken to develop a deep learning model based on a CNN for the automated detection of selected tea leaf diseases in Bangladesh and to pave the way for a wider range of future applications. A dataset of tea leaves was created by collecting samples from several tea fields to implement the proposed model. The dataset is used to detect four classes of leaves, namely grey blight, brown blight, red rust, and healthy leaf, and aims to provide an effective solution for identifying common tea foliage diseases in Bangladesh.

Literature review
Plant parts become diseased when microorganisms such as fungi, bacteria, and viruses invade a plant and thwart its natural growth 15. Tea diseases can affect several parts of the plant, such as the leaf, stem, nodes, and root. In this work, only selected major tea diseases of Bangladesh relevant to tea leaves are considered: (a) 'brown blight', one of the destructive foliage diseases in tea-producing countries. The disease is caused by Colletotrichum camelliae, which is widely regarded as one of the most significant plant pathogenic fungi worldwide 16. Isolation and identification of the varied characteristics and pathogenicity of Colletotrichum camelliae and Colletotrichum fructicola from various tea plants were reported 17; the authors collected 38 Colletotrichum isolates from tea plants and cultured the pathogens on PDA and SNA media to collect conidia and appressoria. (b) 'grey blight/blister blight' is also a destructive, widespread fungal disease. It is caused by Exobasidium vexans and affects production in several tea-dependent economies, resulting in huge losses. This disease generally impacts tea crops from June to September 18. (c) 'red rust' is an algal disease that can infect tea plants. The algae produce abundant spores borne on hypha-like vegetative bodies. After maturation, the sporangia separate and are spread to healthy plants by rain, dew, and wind. Both young and old tea plants are attacked by red rust under unfavorable soil and climatic conditions 19. Cultural characteristics of Cephaleuros parasiticus and a microscopic view of zoosporangia containing zoospores were presented by 10. They curated tea leaves infected by red rust disease from several tea plantations in southern India, tested different broth media for isolation, and found that Trebouxia and Bristol media were most suitable for C. parasiticus.
One study evaluated seven types of tea leaf diseases using an artificial neural network; for classification, LeafNet achieved 90.16% accuracy, superior to conventional SVM and MLP algorithms 20. Another similar study developed a method to identify and classify plant leaf diseases via k-means clustering and artificial neural networks with a precision of around 93% 21. Leveraging the computational efficiency of MobileNet, researchers investigated its potential for robust plant leaf disease detection and classification tasks; they worked on 5 categories of tomato disease using 5512 images and obtained 98.7% average accuracy 22. MobileNet was also employed for disease detection in cassava plants in another study, where researchers obtained a precision of 80.6% on an image dataset and 70.4% on a video dataset 23. Researchers used both AlexNet and VGG16 to classify tomato crop diseases, achieving classification accuracies of 97.29% for VGG16 and 97.49% for AlexNet on an image dataset obtained from a secondary source 24. One report analyzed 1796 images of maize leaves and achieved 97.8% accuracy on the validation set 25. Another CNN model detected tomato plant leaf disease with a 98% success rate for VGG-16 and 99.23% for GoogLeNet 26.
Existing studies on automated tea leaf disease detection using complex models such as ResNet, VGG16, and others require vast amounts of data to achieve optimal results, which is typically compensated for with secondary data. This study aims to use high-resolution but limited images collected directly from tea fields in Sylhet, Bangladesh, and achieve a promising result. The contributions of this study are as follows:
1. This work introduces a novel dataset of diseased tea leaf images acquired directly from tea gardens in Sylhet, Bangladesh. This unique dataset can serve as a valuable resource for training and testing CNN models and their modifications for tea leaf disease detection, potentially benefiting both our present study and future research efforts in this domain.
2. To our knowledge, limited research has been conducted with CNNs to recognize patterns of brown blight, grey blight, and red rust, along with healthy leaves, from primary images. This study shows that a basic CNN architecture can achieve competitive accuracy.
3. This study, for the first time, uses an additional layer of validation in which an independent set of diseased and healthy leaves collected from fields was tested, coupled with microscopic confirmation of the causal pathogens. This step addresses adaptability to real-world situations, and this methodology may provide future researchers with a foundation to extend the dimensions of this work.

Materials and methods
The proposed approach for the development of the CNN model is presented as a block diagram in Fig. 1. The entire workflow was carried out in Python.

Image dataset
The images in our dataset were acquired by collecting samples and capturing images directly from different fields of Sylhet division: (i) Bangladesh Lakkatura Tea Garden, Sylhet, Bangladesh; (ii) Malnicherra Tea Estate, Sylhet, Bangladesh; and (iii) the Experimental Tea Garden, Department of Food Engineering and Tea Technology, SUST, Sylhet, Bangladesh. Tea leaves and images were collected in compliance with relevant guidelines provided by the authorities of each respective tea field, and formal permission was acquired. Throughout the data curation and disease identification process, four field experts were consulted under the supervision of Dr. Mainuddin Ahmed, a renowned entomologist in Bangladesh and former director of the Bangladesh Tea Research Institute (BTRI), and Professor Iftekhar Ahmad from the Department of Food Engineering and Tea Technology, Shahjalal University of Science and Technology (SUST). The research did not involve any rare or endangered cultivars. The primary dataset was manually labeled using LabelImg, an open-source image annotation tool developed by TzuTa Lin from Label Studio. Images were captured using a Canon EOS 90D DSLR camera equipped with an 18-55 mm STM lens. The dataset comprised 3,330 images, distributed across classes as follows: 24.29% brown blight, 26.06% grey blight, 24.50% red rust, and 25.15% healthy leaves. While many researchers use publicly accessible image datasets, such as the "PlantVillage" image dataset, the APS image dataset, and so on 27, our image dataset consisted wholly of primary material. It is considered that the megapixels of the camera and its orientation can affect the quality of the captured images 28. We therefore prioritized capturing high-quality images for our primary dataset, so the model could learn from detailed visual information, potentially leading to more accurate disease identification than secondary datasets with lower resolution or compressed images would allow. The samples were transported in sterile polythene zipper bags, labeled by field experts, and brought to the laboratory for testing. The samples were stored at a room temperature of approximately 25 °C. The signs and symptoms of the commonly occurring tea diseases in Bangladesh are as follows:
• Brown blight: yellow-brown to chocolate-brown spots appear at the boundary of the leaves. Little black dots, resembling fruiting bodies, are produced on the affected part.
• Grey blight: small, slightly brown spots become visible on the upper surface of the leaves. Eventually the spot turns dark brown with a slightly gray appearance at the core of the affected part.
• Red rust: tiny circular to semi-circular red-colored spots appear on the surface of the infected leaf. Afterward, the small spots steadily thicken.
In this paper, the dataset consists of RGB images of four classes. Example images for each category of the dataset are shown in Table 1.

Image preprocessing
The primary purpose of image pre-processing is to improve the quality of the images and reduce unwanted noise resulting from the presence of dust, shadows, and complex backgrounds. Preprocessing is a necessary step to make images suitable for further processing 29. We read images using the cv2.imread() function, which loads images in BGR format. The images were transformed to RGB format. Because the initial sizes of the images were large and dissimilar, the images were resized to 224 × 224 to reduce processing time, as shown in Fig. 2.
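Concretely, cv2.imread() returns channels in BGR order, so the conversion to RGB amounts to reversing the channel axis. A minimal NumPy sketch of this step (the dummy array below is a hypothetical stand-in for a real loaded image):

```python
import numpy as np

# Dummy stand-in for an image loaded with cv2.imread(), which returns
# pixels in BGR channel order.
bgr = np.zeros((224, 224, 3), dtype=np.uint8)
bgr[..., 0] = 255  # fill the blue channel (index 0 in BGR)

# Reversing the last axis converts BGR to RGB, matching what
# cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does in the real pipeline.
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel now reads as (0, 0, 255) in RGB order
```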
Overfitting is one of the inherent obstacles in machine learning that results from limitations of the dataset, such as noise, a limited training set, classification complexity, etc. 30. To improve the generalizability of our model and address the limitations of the dataset size, data augmentation techniques were performed using flow_from_directory(). As the pixel values of our images ranged from 0 to 255, we used rescale=1./255 to normalize them to the range 0 to 1, which aids convergence and mitigates sensitivity to lighting variations 31. Furthermore, shear_range = 0.2 and zoom_range = 0.2 were applied for shearing transformations and for randomly zooming images, respectively. Additionally, the images were randomly flipped both horizontally and vertically.
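The augmentation settings above can be sketched with Keras's ImageDataGenerator; the flow_from_directory() call is shown commented out since it assumes a local per-class folder layout ("data/train") not given here:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings as described: rescaling to [0, 1], shearing,
# zooming, and random horizontal/vertical flips.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
)

# Hypothetical usage, assuming images sorted into per-class subfolders:
# train_gen = train_datagen.flow_from_directory(
#     "data/train", target_size=(224, 224),
#     batch_size=16, class_mode="categorical")
```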
Our dataset exhibits some degree of class imbalance, with class sizes ranging from 809 to 868. To address this imbalance and improve model performance for the minority class, we employed over-sampling, a technique that increases the representation of the minority class in the training data. We chose over-sampling over under-sampling because our dataset was not especially large, and under-sampling could potentially discard valuable information 32. Moreover, over-sampling is reported to have performed well on datasets comparable to ours 33.
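Random over-sampling simply duplicates samples from the smaller classes until every class matches the largest one. A small sketch with hypothetical per-class file lists sized roughly like the reported distribution (the filenames are placeholders):

```python
import random

# Hypothetical per-class sample lists, approximating the reported
# class sizes (809-868 images per class).
classes = {
    "brown_blight": [f"bb_{i}.jpg" for i in range(809)],
    "grey_blight": [f"gb_{i}.jpg" for i in range(868)],
    "red_rust": [f"rr_{i}.jpg" for i in range(816)],
    "healthy": [f"h_{i}.jpg" for i in range(838)],
}

target = max(len(samples) for samples in classes.values())

random.seed(0)
balanced = {
    name: samples + [random.choice(samples) for _ in range(target - len(samples))]
    for name, samples in classes.items()
}

# Every class now has as many samples as the largest class.
assert all(len(samples) == target for samples in balanced.values())
```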

CNN model implementation
Given our limited dataset, a CNN is a pragmatic starting point because it offers a good balance between efficiency, capability, and ease of use while maintaining high performance 34,35. Owing to its simplicity and interpretability, a basic CNN was chosen for this study, which can also serve as a baseline for comparison with more advanced models. A typical CNN model is fundamentally composed of convolution layers, activation layers, max-pooling layers, and fully connected layers. The architecture receives an image as input and transforms it through a series of hidden layers. Each fully connected hidden layer consists of a collection of neurons, and every neuron is connected to all the neurons of the previous layer. The model first extracts features and then classifies the diseases. In this case, 70% of the total data were randomly selected for training, and the remaining 30% for testing. Various hyperparameter configurations were explored, including different numbers of convolutional layers, filter sizes, and learning rates, and the architecture of Fig. 3 was found to fit best. The model was designed with four convolutional layers: the first three are composed of 64 filters of size 3 × 3, and the last employs 128 filters of size 3 × 3.
Four consecutive max-pooling layers were incorporated to further reduce dimensionality and extract features. The shear range and zoom range were both 0.2. The batch size was 16, and the learning rate was 0.001. The loss function was categorical cross-entropy. The images of the dataset were resized to 224 × 224, a dimension chosen because it is relatively close to the average size of all images. Padding was set to 'same' throughout the study, with a stride of one. The Rectified Linear Unit (ReLU) function was used as the activation function in each convolution layer. After feature extraction, the final pixel matrix was flattened and fed to the fully connected network as input. Softmax was used as the activation function in the fully connected layer. The model was then compiled with the Adam optimizer, chosen for its advantages in deep learning tasks, such as efficient handling of sparse gradients and good convergence behavior 36. The tensor details at each layer of this architecture are presented in Table 2.
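Under these hyperparameters, the architecture can be sketched in Keras as follows. This is a sketch under stated assumptions, not the authors' exact code: the text specifies the convolutional stack, padding, optimizer, and output classes, but not whether an intermediate dense layer precedes the softmax head, so none is included here:

```python
from tensorflow.keras import layers, models, optimizers

# Sketch of the described stack: four 3x3 conv layers (64, 64, 64, 128
# filters), 'same' padding, stride 1, ReLU after each convolution, a 2x2
# max-pooling after each conv block, then a flatten and 4-way softmax.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(4, activation="softmax"),  # 3 disease classes + healthy
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

print(model.output_shape)  # (None, 4)
```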
The architecture of the proposed research work is presented in Fig. 3.
Convolution layer: A convolutional layer is fundamental to the CNN model for performing feature extraction. The model was designed with 4 convolutional layers. Convolution layers take RGB images as input from a specified folder via given instructions and send the output to another layer for further processing. After receiving input, the layer reads pixels as values to produce feature maps. The output is calculated by summing the element-by-element multiplication of the input pixel values with the filter values. An example of a convolution operation for a 5 × 5 input image and a 3 × 3 filter is presented in Fig. 4a.
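The element-by-element multiply-and-sum just described can be reproduced directly in NumPy; the input values and filter below are hypothetical:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide the kernel over the image; each output cell is the sum of an
    element-by-element product of the kernel with the patch it covers."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25).reshape(5, 5)  # hypothetical 5x5 single-channel input
kernel = np.eye(3)                 # hypothetical 3x3 filter
out = conv2d_valid(img, kernel)
print(out.shape)  # (3, 3): a 5x5 input with a 3x3 filter yields a 3x3 map
```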
Activation layer: The architecture utilized ReLU as the activation function after each convolutional layer; the model thus contains four ReLU activations. By incorporating ReLU, the model enhances its non-linear properties while preserving positive values from the convolutional layer and setting all negative values to zero: f(x) = max(0, x). An example of a ReLU operation with a 5 × 5 matrix is presented in Fig. 4b.
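The rule f(x) = max(0, x) is applied element-wise, as in this short sketch with a hypothetical feature map:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): negative values become zero, positives pass through
    return np.maximum(0, x)

feature_map = np.array([[-3, 1], [2, -5]])  # hypothetical values
print(relu(feature_map))  # [[0 1]
                          #  [2 0]]
```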
Pooling layer: Four max-pooling layers were used in this model; each takes the maximum value from every sub-region covered by the filter. An example of a pooling operation is presented in Fig. 4c.
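Max-pooling keeps only the maximum of each sub-region, halving each spatial dimension for a 2 × 2 window with stride 2. A NumPy sketch with a hypothetical 4 × 4 feature map:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max-pooling with stride 2: keep the maximum of each sub-region."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:h * 2, :w * 2].reshape(h, 2, w, 2).max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],   # hypothetical 4x4 feature map
                        [4, 2, 1, 5],
                        [0, 1, 3, 2],
                        [6, 2, 1, 4]])
print(max_pool2x2(feature_map))  # [[4 5]
                                 #  [6 4]]
```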
Fully connected layer: These layers were used in the model to detect high-level features; they connect to the previous layers, whose neurons transfer input vectors through a weight matrix. The output of the pooling layer was flattened into one dimension before entering the fully connected layer, where the classification of diseases is performed.

Metrics of evaluation of the model's efficiency
The efficiency of the CNN model was estimated by precision, recall, F1 score, and accuracy. These metrics are commonly employed to determine the performance of a machine learning algorithm. Precision, shown in Eq. (1), is the proportion of true positives among all the positive predictions made by the model. Recall, or the true positive rate (Eq. (2)), reflects how well the model identifies all the positives. The F1 score merges precision and recall into a single value, as presented in Eq. (3). These metrics range from 0 to 1, where a higher score indicates better performance. Moreover, the accuracy of each individual class was also measured. Accuracy measures how well a model can correctly identify objects within an image; it is calculated by dividing the number of correct predictions by the total number of predictions made. The equations for these performance metrics are:

Precision = TP / (TP + FP)   (1)
Recall = TP / (TP + FN)   (2)
F1 = 2 × (Precision × Recall) / (Precision + Recall)   (3)
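As a worked check of these definitions, with hypothetical confusion-matrix counts for one class:

```python
# Hypothetical confusion-matrix counts for a single class
tp, fp, fn, tn = 90, 5, 10, 95

precision = tp / (tp + fp)                          # Eq. (1)
recall = tp / (tp + fn)                             # Eq. (2)
f1 = 2 * precision * recall / (precision + recall)  # Eq. (3)
accuracy = (tp + tn) / (tp + tn + fp + fn)          # correct / total

print(round(precision, 4), round(recall, 4), round(f1, 4), round(accuracy, 4))
# 0.9474 0.9 0.9231 0.925
```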

Method of laboratory tests
To identify the pathogens, a total of 120 symptomatic leaves from the three diseased classes and 40 healthy leaves were brought to the laboratory for microscopic examination to observe the pathogens' morphological characteristics. The healthy leaves were observed with the naked eye. The surface of the symptomatic leaves was washed with 2% sodium hypochlorite (NaOCl) for 2 min and rinsed in sterilized distilled water. The infected part of the leaves was cut with a sterilized blade, and the portions were further cut into smaller pieces. Afterwards, Potato Dextrose Agar (PDA) solution was prepared following the manufacturer's ratio (39 g/1000 ml).
The suspension was then heated and stirred on a magnetic stirrer to dissolve the agar. The dissolved media was autoclaved at 15 lbs pressure and 121 °C for 15 min. After sterilization, the dissolved PDA was poured into Petri dishes and allowed to cool until it solidified. Samples of 250 µl from the 10⁻¹ and 10⁻² dilutions were poured into different Petri dishes using a micropipette. The Petri dishes were wrapped in parafilm and incubated in a chamber at 25 °C for 6 days. For microscopic observation, a clean slide was taken, one drop of lactophenol cotton blue was added to the slide via a plastic dropper, a small portion of culture was taken from the Petri dishes with sterilized forceps and placed into the cotton blue drop, smeared with a fungal or algal loop, and the prepared slide was observed under a binocular microscope.

Results and discussion
This work comprised image dataset preparation, model implementation, and performance evaluation. A large dataset of primary images was built. The convolutional neural network model was trained for 40 epochs, where an epoch is defined as one pass of the entire training set through the network, and training, testing, and validation were applied. We employed early stopping to prevent overfitting by monitoring validation accuracy and loss during training. Training was stopped after 40 epochs, as the gap between validation accuracy and training accuracy plateaued and the validation loss did not exhibit significant improvement. Training took approximately 13 h.

Simulation environment
The proposed CNN model was developed using the numpy, pandas, os, re, shutil, PIL, matplotlib, tqdm, cv2, tensorflow, and Keras libraries. The model was implemented in an Anaconda environment (Jupyter Notebook) on a laptop running Windows 10 Pro, with 8 GB of random-access memory, an Intel Core i5 central processing unit, a 256 GB solid-state drive, and a 64-bit operating system.

Performance measure
Another study 38 showed confusion matrices for 10 different classes to evaluate performance; the classification performance of their model ranged from 90% to 99%. In the confusion matrices presented in Fig. 6, the numbers of true positives, false positives, true negatives, and false negatives were analyzed.
Performance metrics such as precision, recall, and F1 score were calculated for the model because they are indicative of its strengths and limitations. First, the performance evaluation metrics were calculated for brown blight disease, where the true positive count is 189, the false positive count is 8, and the false negative count is 11. A true positive means both the predicted and actual values are positive. A false positive implies the predicted value is positive but the actual value is negative. A false negative means the predicted value is negative but the actual value is positive. The precision, recall, and F1 scores of the proposed model are graphically represented in Fig. 7.
We obtained a precision of 0.9642, a recall of 0.945, and an F1 score of 0.9545 for brown blight disease. For grey blight disease, the true positive count is 193, the false positive count is 10, and the false negative count is 7, resulting in a precision of 0.9507, a recall of 0.965, and an F1 score of 0.9578. Similarly, for the healthy leaf class, the precision, recall, and F1 score were 0.9653, 0.975, and 0.9699, respectively. Finally, for red rust disease, a precision of 0.9549, a recall of 0.99, and an F1 score of 0.9924 were found for the proposed model. In a recent, comparable study on tomato leaf disease detection using VGG-16, the best-performing class reached 0.975 precision and a 0.982 F1 score 26; in our study, the red rust class performed highest, while the lowest performance was observed in the brown blight class. The accuracy of each disease class was also evaluated and is presented graphically in Fig. 8. Our model achieved strong performance across all disease classes, with individual accuracies ranging from 97.75% to 99.63%. This degree of consistency demonstrates the model's ability to avoid overfitting and to identify various tea leaf diseases.
The performance of the proposed CNN model was compared against the methods reported in the literature in terms of accuracy, as presented in Table 3.
Confirmation with laboratory tests: To the best of our knowledge, no previous automated disease identification study has confirmed the disease-causing pathogens to solidify its predictions. For cross-validation with laboratory tests and to ensure the generalizability of our model to real-world scenarios, we collected an independent validation set of 160 unknown leaf samples directly from the tea plantations. This approach ensured the samples originated from the same geographical location and potentially shared similar environmental factors and tea plant varieties as the original dataset. To ensure our unknown samples reflected a realistic mix of disease presence, we implemented a stratified random sampling approach within the tea plantations: the collected tea leaves were divided into groups based on the specific condition they exhibited (healthy, brown blight, grey blight, red rust), and we then randomly selected approximately 40 samples from each group. Sourcing unknown samples from the same plantations ensured consistency in the ecological context, including disease prevalence specific to that region. While these leaf samples were not part of the original dataset, they were randomly collected to provide a complementary layer of validation of the model's predictions in a real-world context. These validation samples were examined in the laboratory with microbial culture media and microscopic analysis to compare the causal pathogens and further ensure the specificity of the predictions. Table 4 presents the prediction results of the model compared to the results of the laboratory tests. The model achieved high overall accuracy (96.65%), along with strong recall, precision, and F1 scores ranging from 0.94 to 0.99. Notably, these results surpass those reported for previously employed CNN models in the discussion section. Additionally, cross-validation with laboratory tests yielded a perfect match between the model's predictions and the presence of causal pathogens confirmed through microscopic examination of all samples. However, the current study has its limitations. Future research could explore more sophisticated over-sampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE), and advanced CNN architectures such as ResNet, VGG, or DenseNet, potentially fine-tuned with transfer learning, to investigate their adaptability and performance on tea leaf disease classification. Additionally, dataset expansion remains a key area for improvement.

Figure 1. Block diagram of the proposed approach.

Figure 3. CNN architecture for detection of tea leaf diseases.
The model's performance was measured in various ways. The loss of the model was minimized by the Adaptive Moment Estimation (Adam) optimizer. The final training accuracy of the model is 97.02%, and the validation accuracy is 96.65%. Accuracy and loss after implementation of the model are graphically presented in Fig. 5. A notable increase in both training and validation accuracy was observed during the first five epochs of training, accompanied by a simultaneous reduction in the associated training and validation losses. From epochs 5 to 40, the accuracy and loss fluctuated gradually, and the validation accuracy rose swiftly from epoch 17. The confusion matrices of our proposed model were constructed from samples unknown to the model's training dataset. Thakur et al. 37 evaluated five different publicly available datasets of different crop species.

Figure 5. Accuracy and loss after CNN model implementation.

Figure 6. Confusion matrix of the proposed model.

Figure 7. Performance evaluation of the CNN model.

Table 4. Cross-checking the CNN model's prediction with laboratory tests.

Future work should focus on collecting a wider variety of diseased tea leaf samples encompassing diverse cultivars, fertility stages, and shooting angles within field settings. By incorporating microscopic confirmation of disease presence alongside image data, researchers can further strengthen the model's accuracy and reliability.