A CNN-based model to count the leaves of rosette plants (LC-Net)

Plant image analysis is a significant tool for plant phenotyping. Image analysis has been used to assess plant traits, forecast plant growth, and offer geographical information about images. Leaf area segmentation and leaf counting are major components of plant phenotyping and can be used to measure the growth of the plant. Therefore, this paper develops a convolutional neural network-based leaf counting model called LC-Net. The original plant image and the segmented leaf parts are both fed as input because the segmented leaf parts provide additional information to the proposed LC-Net. The well-known SegNet model has been utilised to obtain the segmented leaf parts because it outperforms four other popular Convolutional Neural Network (CNN) models, namely DeepLab V3+, Fast FCN with Pyramid Scene Parsing (PSP), U-Net, and Refine Net. The proposed LC-Net is compared to other recent CNN-based leaf counting models over the combined Computer Vision Problems in Plant Phenotyping (CVPPP) and KOMATSUNA datasets. The subjective and numerical evaluations of the experimental results demonstrate the superiority of LC-Net over the other tested models.

significant number of leaves are overlapped. Also, the angle of the images and the lighting effect on the leaves can affect the efficiency of leaf area segmentation. Another crucial factor for analysing plant growth in plant phenotyping is determining the leaf count of the plant 6. Determining the number of leaves is also a challenging and time-consuming process. A comprehensive plant phenotyping method for camera-captured images contributes to cost reduction and to the improvement of plant and agricultural production. As a result, numerous researchers are looking into plant phenotyping using images recorded by cameras.
Furthermore, current CNN-based methods perform efficiently in accurate leaf count prediction. A CNN is a class of neural networks within the field of DL. CNNs are designed with one or more convolutional layers and are mostly used for tasks such as image processing, classification, segmentation, and analysis of auto-correlated data. CNNs are primarily employed to extract features from grid-like matrix datasets, particularly in the context of image analysis. The CNN architecture encompasses various layers, including the input layer, convolutional layer, activation layer, pooling layer, and fully connected layers. The convolutional layer applies filters to the input image in order to extract relevant features. The activation function is then applied element-wise to modify the matrix values, specifically converting negative values to zero. Following this, the pooling layer downsamples the image, thereby reducing computational requirements. Finally, the fully connected layer makes the ultimate prediction. The network learns the most effective filters through backpropagation and gradient descent.
Therefore, in this manuscript, a novel CNN-based model for leaf counting, called LC-Net, has been proposed. The proposed model takes segmented leaf parts as additional input to achieve better leaf counting accuracy. The well-established SegNet 7 is utilized because it performs better leaf segmentation than other well-established CNN-based models, namely DeepLab V3+ 8, U-Net 9, Fast FCN with Pyramid Scene Parsing (PSP) 10, and Refine Net 11. Lastly, the proposed LC-Net has been compared with other state-of-the-art leaf counting models, and the performance of all tested leaf counting models has been evaluated over the combined CVPPP and KOMATSUNA datasets. The qualitative and numerical results indicate that the proposed LC-Net outperforms the existing leaf counting models.
In a nutshell, the significant contributions of the proposed work are as follows:
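The layer sequence described above (convolution, element-wise activation, pooling) can be illustrated with a minimal sketch. It uses plain Python lists to stay dependency-free; the 6 × 6 image and the hand-picked edge filter are hypothetical values chosen for illustration, whereas in a trained CNN the filter weights are learned by backpropagation. As in most deep learning frameworks, the "convolution" here is computed as a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise activation: negative values become zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Downsample by keeping the maximum of each non-overlapping 2x2 window."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


# Hypothetical 6x6 image with a bright vertical stripe in the middle.
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# Hand-picked 3x3 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

features = conv2d_valid(image, kernel)   # 4x4 map, negative on the right edge
activated = relu(features)               # negatives clipped to zero
pooled = max_pool_2x2(activated)         # 2x2 map after downsampling
```

The fully connected layer and the training loop are omitted; the sketch only shows how the spatial size shrinks from 6 × 6 to 4 × 4 to 2 × 2 as the data passes through the three operations.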

Related work
The past few years have witnessed successful research work in the domain of plant phenotyping. In the recent past, several research papers have been published on this hot topic. For example, in Ref. 12 14 , MSU-PID 15 , and KOMATSUNA 16 datasets, respectively. In Ref. 17, two novel DL approaches were proposed, elaborated, and compared by Farjon et al. for visual leaf counting tasks. The first model fed the input image at different resolutions into a network with ResNet-50 as the backbone in order to estimate leaf features at multiple scales. The final count was then obtained by repeated regression of the leaf heat map produced in the previous step. The second model counted the leaves by locating the centers of the detected leaves and aggregating them. They obtained an MSE of 1.17 and a percentage agreement of 43.7% on the CVPPP dataset 14. Miao et al. 18 proposed a dataset for maize leaf counting together with two DL methods to count the maize leaves. The first model counted the leaves by regression, and the second model counted the leaves by detection. In Ref. 19, Lu et al. proposed an improved DL approach to count dense and overlapping leaves in natural environments. The approach detected objects by merging a space-to-depth module, an atrous spatial pyramid pooling module, and a convolutional block attention module into the network. The experimental results showed that the improved DL approach achieved 96% accuracy. To improve the phenotyping process, in Ref. 20, Karthik et al. introduced a unique semantic segmentation pipeline for the segmentation task. In CVPPP14 competition, a number of deep CNN models such as U-Net, Attention-Augmented Net, and Attention-Net were introduced. These networks were trained using the Arabidopsis Thaliana plant dataset. The Attention-Net achieved a dice score of 0.985, the best among them.
In Ref. 21, Kumar et al. proposed a new orthogonal transform domain-based method to segment the leaf region and then counted the leaves with fine-tuned deep CNN models. On the CVPPP dataset, fine-tuned AlexNet and VGG19 were used to count the leaves and achieved percentage agreements of 25.51% and 33.67%, MSEs of 5.43 and 2.03, AbsDiCs of 1.71 and 1.03, and DiCs of 0.39 and 0.11, respectively. In Ref. 22, Buzzy et al. proposed a novel real-time object detection approach for the identification, localization, and quantification of plant leaves. A comparative analysis was conducted between a Tiny-YOLOv3 model and a Faster R-CNN model. The two models were evaluated on several performance metrics, including DiC, AbsDiC, MSE, and percentage agreement. The obtained values for these metrics were 0.25 and 0.0556, 0.8056 and 1.2778, 2.0833 and 2.8889, and 56% and 27.78%, respectively. Hati et al. 23 presented a regression model for leaf counting. The images were segmented and subsequently enhanced, which removed noise and transformed the pixel data associated with the leaves. The images were then fed into the regression model, which is based on the AlexNet architecture. In Ref. 24, Ayalew et al. presented a domain-adversarial learning method in which a domain adaptation technique was used to estimate a density map for leaf counting. Because it accommodates variations in distribution between source and destination datasets, the method has potential for application in a broader spectrum of leaf counting and plant organ counting scenarios. The method obtained a DiC of -0.95, an AbsDiC of 1.56, an MSE of 5.26, and a percentage agreement of 29.33% on the CVPPP dataset. Gomes and Zheng 25 presented an experimental study on the limitations of datasets used for phenotyping and on the performance strategy of leaf segmentation tasks. They also looked at how test-time augmentation and model cardinality might help with single-class image segmentation. Another investigation was conducted in Ref. 26 by Yang et al., who utilized a Mask R-CNN based model to effectively segregate and classify leaf images that contained intricate backgrounds.
In 2019, Kumar et al. introduced a new approach 27 to extract leaf regions from images of plants and count the number of leaves. The proposed methodology had three phases: a statistical image enhancement technique, a graph-based leaf area extraction approach, and a circular Hough transform (CHT) based technique. In Ref. 28 29 to count the number of maize leaves. To reduce redundant information, the Fisher Vector (FV) was used, and the Random Forest (RF) method was used to get the final prediction. They obtained a DiC of 0.0018, an AbsDiC of 0.35, and an MSE of 0.31.
In 2018, Giuffrida et al. presented a single deep network 30 that counted the number of leaves from multimodal 2D images of different species for any rosette-shaped plant. The model had a DiC of 0.19, an AbsDiC of 0.91, an MSE of 1.56, and a percentage agreement of 32.9%. For the CVPPP dataset, the proposed approach achieved a segmentation accuracy of 95.4%, a DiC of 0.7 and an AbsDiC of 2.3. In Ref. 31, Ubbens et al. proposed a plant phenotyping dataset and demonstrated that performance on the leaf counting task could be improved using 3D synthetic plants. These synthetic plants were also applied to augment a dataset. In the augmentation experiment, they recreated the architecture used in the reference experiment 3 using the Ara2013-Canon dataset. They also switched the real and synthetic image datasets used to train and test the DL-based model. Additionally, it was demonstrated that these datasets could be utilised in a comparable manner for training a neural network to accurately count the number of leaves.
For the plant phenotyping task, Aich et al. developed a data-driven strategy in Ref. 32 that could be applied to a variety of plant species and imaging configurations. They used a deconvolutional network to segment the leaves, and the predicted images were used in a CNN for leaf counting. The model obtained a DiC of 0.73, an AbsDiC of 1.62, an MSE of 4.31, and a percentage agreement of 24.0 on the CVPPP dataset. A data augmentation technique was proposed by Kuznichov et al. in Ref. 33 for segmenting the leaves and then counting them in images of rosette plants. In Ref. 34, Itzhaky et al. developed two novel deep learning algorithms for leaf counting. The researchers utilised the CVPPP 2017 Leaf Counting Challenge dataset to demonstrate the efficacy of these methods. The findings show that they outperformed the winner of the 2017 CVPPP challenge. In Ref. 35, Pape and Klukas introduced a methodology for leaf segmentation that utilises edge detection techniques. Additionally, they devised a method to analyse images with the software IAP 36 in order to extract a multitude of image attributes that might be utilised for estimating the number of leaves.
The above discussion illustrates the utility of precise leaf segmentation and counting in the field of plant phenotyping. It should be noted, however, that only a restricted range of CNN models and techniques have demonstrated the capability to generate precise outcomes across the two widely utilised datasets, namely CVPPP and KOMATSUNA. It has been observed that these algorithms frequently encounter difficulties in scenarios that include a green background and/or a substantial amount of overlapping leaves. Consequently, to address this deficiency, the primary objective of this research is to construct a Convolutional Neural Network (CNN) model, referred to as LC-Net. The proposed LC-Net achieves more accurate leaf count predictions even when confronted with various types of backgrounds. Furthermore, LC-Net is able to count the leaves precisely even in cases where they overlap. The next section provides a detailed description of the proposed LC-Net model, as well as an overview of the dataset utilised in this study.

Methodology
The proposed LC-Net has been designed to count the number of leaves in rosette plants. The suggested model takes both the RGB image of a rosette plant and the segmented image of its leaves as input in order to enhance accuracy. The subsequent subsections describe the LC-Net architecture and explain the reasoning behind its design.

CNN based leaf segmentation
Accurate leaf segmentation is necessary because the segmented leaf portion provides an additional benefit to the proposed LC-Net model, as detailed in Section 2.2. This study employs five well-known CNN models, namely DeepLab V3+ 8, SegNet 7, Fast FCN with PSP 10, U-Net 9, and Refine Net 11, to accurately segment the leaves of rosette plants. The effectiveness of an image segmentation CNN model is determined by its backbone. The backbone network is used to estimate the feature maps from the image, and these feature maps are then further processed in order to get the desired result 37. The backbone of the DeepLab V3+ model is the modified Aligned Xception 8,38. VGG-16 39 serves as the backbone for SegNet, while ResNet-101 40 is utilized as the backbone for both Fast FCN with PSP and Refine Net. U-Net 9 segments using its own backbone. This paper adopts SegNet as the segmentation model for the proposed LC-Net because SegNet provides the best segmentation results both visually and numerically. The segmented results of the tested CNN models are presented in the section on experimental results.

Normalization layer
The normalization layer has been utilized to improve the segmented outcomes. The purpose of this normalization layer is to eliminate any unwanted pixels from the images. An uneven background or light reflection could be the cause of these unwanted pixels, and they may affect the precision of segmentation and leaf counting. The inclusion of the normalization layer has facilitated the construction of models that exhibit enhanced predictive accuracy. Figure 1 shows the segmented outcomes of SegNet with and without the normalization layer. Visual analysis clearly shows that applying the normalization layer to SegNet's output enhances the results.
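The paper describes this layer, in the LC-Net section, as a filter that sets to zero every pixel value below a threshold of 0.5. A minimal sketch of that operation follows. The function name and the sample mask are hypothetical, and it is an assumption here that values at or above the threshold are passed through unchanged rather than binarised.

```python
def normalization_layer(mask, threshold=0.5):
    """Zero out pixels whose value is below `threshold`.

    `mask` is a 2D list of per-pixel leaf probabilities in [0, 1], such as the
    sigmoid output of a segmentation network. Values at or above the threshold
    are kept unchanged (an assumption; the source only states that smaller
    values are set to zero).
    """
    return [[v if v >= threshold else 0.0 for v in row] for row in mask]


# Hypothetical segmentation output with weak background responses.
raw_mask = [
    [0.92, 0.48, 0.03],
    [0.51, 0.50, 0.10],
]
clean_mask = normalization_layer(raw_mask)
```

In this example the weak responses 0.48, 0.03 and 0.10 are suppressed, while 0.92, 0.51 and 0.50 survive.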

Proposed LC-Net model for leaf counting
This work adopts the conventional workflow of traditional computer vision systems, in which the segmented output and the counting model's input are integrated as depicted in Fig. 2b, which also shows the entire architecture of the proposed model. The counting model obtains additional information from the segmented output. Figure 2a shows the structure of a Conv Block, which is utilized in Fig. 2b. One Conv Block is made of a convolution layer, batch normalization, and an activation function. Batch normalization is a technique employed to enhance the speed and stability of the CNN. The leaf segmentation model and the leaf counting model are trained independently of each other. The training of the counting model relies on the segmentation model, as the outputs generated by the segmentation model are utilised in conjunction with the RGB image input. Furthermore, the CNN models under consideration are trained without any supplementary information about the specific species of the plant. LC-Net counts the leaves accurately for a substantial portion of the dataset when only the segmented leaf regions are used as input. Nevertheless, in certain instances the segmentation model may generate inaccurate segmentations for a subset of the data, which can reduce the precision of leaf counting in the proposed LC-Net. Furthermore, the quality and texture of the original images are significant factors in CNN-based leaf counting models. It has been observed that certain images in the dataset exhibit suboptimal quality, particularly in the leaf area, and that the average intensity of certain images is poor. Hence, the exclusive utilisation of original images also reduces the efficacy of the proposed leaf counting model. As a consequence, both the original and segmented outcomes are fed into LC-Net in order to enhance the accuracy of leaf counting. We have also used a normalization layer, which is a filter that sets to zero the pixel values that are less than a threshold; the threshold is set to 0.5 based on experience during the experiments. The normalization layer is elaborated in Section 2.1. The two inputs are concatenated at the beginning of the model.
Then we have used one (1 × 1) Conv Block followed by three (3 × 3) Conv Blocks, which replace one (5 × 5) Conv Block to reduce the number of parameters. After that, a max-pooling layer is added, followed again by one (1 × 1) Conv Block and three (3 × 3) Conv Blocks. We have used a (1 × 1) convolution with fewer filters before all (3 × 3) convolutions to reduce the number of parameters used in the model.
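The parameter saving from placing a narrow (1 × 1) convolution in front of the (3 × 3) convolutions can be checked with simple arithmetic. The sketch below counts convolution weights and biases only (batch normalization parameters are ignored). The channel widths 64 and 16 are hypothetical, since the text does not state the filter counts used in LC-Net.

```python
def conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Number of trainable parameters in one 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Hypothetical widths: 64 channels in and out, squeezed to 16 by the 1x1 layer.
single_5x5 = conv_params(5, 64, 64)

# (kernel size, input channels, output channels) for each layer in the stack.
bottleneck_stack = [
    (1, 64, 16),  # 1x1 convolution with fewer filters reduces the channel count
    (3, 16, 16),  # first 3x3 convolution on the reduced representation
    (3, 16, 16),  # second 3x3 convolution
    (3, 16, 64),  # third 3x3 convolution restores the output width
]
stacked = sum(conv_params(k, c_in, c_out) for k, c_in, c_out in bottleneck_stack)

print(f"single 5x5 block : {single_5x5} parameters")
print(f"1x1 + three 3x3  : {stacked} parameters")
```

Under these assumed widths the stack needs roughly one seventh of the parameters of the single (5 × 5) layer; the exact ratio depends on the real filter counts.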

Experimental results
The model is trained and tested on a system containing an NVIDIA GeForce 1650 with cuDNN and CUDA 10.0, a 256 GB SSD, 16 GB RAM, and an AMD Ryzen 5 3550H CPU. TensorFlow and Scikit-learn were used in all of the studies. The experimental results are twofold: one part covers leaf segmentation and the other covers leaf counting. Both sets of results are presented in the following sub-sections.

Dataset design
As stated earlier, experiments on both leaf segmentation and leaf counting have been carried out on two well-known datasets: (i) the KOMATSUNA 16 dataset, and (ii) the plant phenotyping benchmark dataset, known as CVPPP 14. In our experiment, these datasets are merged. The KOMATSUNA dataset has been annotated for the leaf counting task by experts. The CVPPP dataset has four sections, named A1, A2, A3, and A4. All sections include segmented ground truth. The KOMATSUNA dataset, on the other hand, has two portions. The images in one portion are captured by an RGB-D camera, while the images in the other portion are captured by several RGB cameras. As a whole, there are a total of 1200 images in these two categories. As a result, there are a total of 2010 images, along with the corresponding segmentation and leaf number ground truths, in the combined dataset. Bilinear interpolation 41 has been used to resize the images to 224 × 224. Subsequently, the combined dataset is partitioned into training, validation, and testing sets. In our experiment, the training set consists of 1410 images, whereas the validation and test sets each contain 300 images. In addition, vertical flips and 90° and 180° clockwise and anticlockwise rotations are randomly applied in order to generate a greater quantity of images. Figure 3 shows some samples from each of the datasets. In order to compare more accurately with state-of-the-art CNN models, another dataset is prepared using 160 images chosen randomly from the CVPPP dataset.
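The partitioning and augmentation described above can be sketched as follows. This is an illustrative sketch in plain Python, not the authors' pipeline: the function names and the random seed are hypothetical, "vertical flip" is assumed to mean an up-down flip, and it is assumed that each image and its segmentation mask receive the same transform so that they stay aligned. The bilinear resizing step is omitted.

```python
import random


def split_indices(n_total=2010, n_train=1410, n_val=300, seed=0):
    """Shuffle image indices and cut them into train/validation/test subsets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def vertical_flip(img):
    """Flip the image upside down (assumed meaning of 'vertical flip')."""
    return [list(row) for row in img[::-1]]


def rotate_90_clockwise(img):
    return [list(row) for row in zip(*img[::-1])]


def rotate_90_anticlockwise(img):
    return [list(row) for row in zip(*img)][::-1]


def rotate_180(img):
    return [list(row[::-1]) for row in img[::-1]]


TRANSFORMS = [vertical_flip, rotate_90_clockwise, rotate_90_anticlockwise, rotate_180]


def augment_pair(image, mask, rng):
    """Apply one randomly chosen transform to both the image and its mask."""
    transform = rng.choice(TRANSFORMS)
    return transform(image), transform(mask)


train_ids, val_ids, test_ids = split_indices()
print(len(train_ids), len(val_ids), len(test_ids))
```

Because flips and rotations by multiples of 90° do not change the number of leaves, the leaf count label of an augmented image can be reused unchanged.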

Results of leaf segmentation
This section presents the results of the CNN-based leaf segmentation models. Five common CNN models, namely Fast FCN with Pyramid Scene Parsing (PSP), DeepLab V3+, SegNet, U-Net, and Refine Net, are employed for leaf segmentation. For all CNN based leaf segmentation models, the well known 'binary cross-entropy'  1.
In order to evaluate and compare the segmentation performance of the CNN models, three commonly used metrics are used: (a) segmentation accuracy (AC); (b) intersection over union (IoU); and (c) dice score (DI). The quality metrics are summarised in Table 2.
During training, DeepLab V3+ shows the best performance among all models. DeepLab V3+ achieves a train accuracy of 97.38%, a test accuracy of 97.30%, and a train loss of 0.0108. However, when the models are tested on the merged dataset and on the CVPPP dataset, the SegNet model shows the best results. Test results on the merged and CVPPP datasets are shown in Tables 3 and 4, respectively. SegNet achieves a 95.04% dice score and a 90.58% IoU. Therefore, the SegNet model has been chosen as the segmentation model for the proposed leaf counting
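The three metrics can be written in terms of pixel-level true/false positives and negatives. The sketch below uses the standard definitions for binary masks; the function names and the tiny example masks are hypothetical, and the convention of returning 1.0 when both masks are empty is an assumption.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return (accuracy, IoU, dice) for binary masks given as 2D lists of 0/1."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    union = tp + fp + fn
    # If neither mask contains a leaf pixel, treat the overlap as perfect.
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return accuracy, iou, dice


# Hypothetical 2x4 prediction and ground truth.
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
truth = [[1, 1, 1, 0],
         [1, 0, 0, 0]]
accuracy, iou, dice = segmentation_metrics(pred, truth)
```

For a single pair of masks the two overlap scores are tied by dice = 2·IoU / (1 + IoU), so the dice score is never lower than the IoU.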

Results of leaf counting
This section presents the results of the CNN-based leaf counting models. The proposed LC-Net has been compared to VGG 21, AlexNet 21, and the model proposed by Ubbens et al. 31 over the merged dataset. In addition, for better comparison, the proposed LC-Net is also compared to the counting models developed by Ayalew et al. 24, Giuffrida et al. 30, and Aich et al. 32 over the CVPPP dataset. The parameter settings for the mentioned CNN models are given in Table 5. For both images, i.e., the original image and the predicted image, the size is (224, 224). The batch size varied based on machine capacity and the model used.
Table 1. Parameter settings of the existing CNN-based models used for leaf segmentation.

DeepLab V3+
The input image has three channels and is (224, 224) in size.The backbone's filters are identical to the original 8 .For prediction, the sigmoid activation function is utilised.DeepLab V3+ has 3.5M trainable parameters.

SegNet
The input image has three channels and is (224, 224) in size. The backbone's filters are identical to the original 7. For prediction, the sigmoid activation function is utilised. We have 16.7M trainable parameters for the SegNet model.

U-Net
The input image has three channels and is (224, 224) in size. The backbone and decoder filters are identical to the original 9. Similar to the Fast FCN with PSP model, at the final level, the sigmoid activation function is applied in order to predict the class labels of each pixel. There are 31.0M trainable parameters in the U-Net model used for leaf segmentation.

Refine Net
The input image has three channels and is (224, 224) in size. The backbone and decoder filters are identical to the original 11. Similar to the above-mentioned models, at the last layer, the final prediction is performed using the sigmoid activation function. The Refine Net model has 89.1M parameters that need to be trained.
Table 2. The quality metrics for leaf segmentation.

Sl. Parameters Remarks
1 AC AC is the ratio of the number of correctly identified pixels to the total number of pixels. A higher accuracy value represents better results 42.
2 IoU The value of IoU ranges between 0 and 1. It indicates the amount of overlap between the predicted image and the ground truth. In the existing literature, if the IoU for the outputs of a model is more than 0.5, the model is considered to be predicting well 43.
3 DI DI indicates the similarity of the predicted image to the ground truth. It is calculated by matching the overlapping region of the predicted image and the ground truth image 44. A higher DI value indicates better performance of the participating technique.

After training, we have tested all models on our merged test dataset and on the CVPPP dataset as well. We have used four evaluation measures, i.e., AbsDiC, MSE, R², and percentage agreement (%), together with the testing loss. Table 7 shows the test results of the models on the merged dataset, and Table 8 shows the test results of the models on

Effect of the combined input
One of the major features of the proposed LC-Net is the combined input, i.e., the original image and the segmented image. This section therefore discusses the influence of this combined input on the proposed LC-Net. To study its effect, a comparative performance study has been made with and without the combined input. Two CNN models have been compared: (i) LC-Net with combined input, and (ii) LC-Net with original image input. Table 9 shows the numerical results of the LC-Net with combined and

- A novel CNN-based model to count the number of leaves of rosette plants has been proposed. It is provided with two inputs, i.e., the segmented output and the original image, for better accuracy.
- The proposed LC-Net model is tested on the merged version of the KOMATSUNA and CVPPP datasets (annotated for the leaf counting task by the experts). It is seen that the proposed LC-Net model outperforms existing state-of-the-art techniques.
- The proposed model uses a normalization layer to filter the unwanted pixels present in the images. Application of this layer in the proposed LC-Net is discussed in detail in Sect. "CNN based leaf segmentation".

The remainder of this manuscript is structured as follows. Sect. "Related work" summarises the existing work in the related field. Thereafter, Sect. "Methodology" discusses the proposed technique. Furthermore, Sect. "Experimental results" presents the experimental investigation. Lastly, Sect. "Conclusion and future work" contains the final observations and concluding remarks.

https://doi.org/10.1038/s41598-024-51983-y

Figure 1. Visual representation of the significance of the normalization layer.
Four commonly used metrics 28,21, namely (a) Mean Square Error (MSE); (b) Absolute DiC; (c) R²; and (d) Percentage Agreement, are utilized to study the performance of the leaf counting models. Brief descriptions of these metrics are reported in Table 6.

Figure 4. Qualitative comparison of the segmentation of proposed method with existing approaches.
, Valente et al. presented a preliminary study showcasing the efficacy of a trained deep neural network in accurately quantifying the number of leaves in plant images obtained by greenhouse workers through the use of handheld equipment. They got 0.31 of DiC, 0.62 of AbsDiC, 0.77 of MSE, and 47% of percentage agreement. A Google Inception Net V3 based CNN model had been used by Jiang et al. in Ref.

Table 3. Test results obtained by various architectures on the merged dataset (test set size = 300 images). Significant values are in bold.

Table 4. Test results obtained by various CNN models on the CVPPP dataset (test set size = 160 images). Significant values are in bold.

Table 5. Parameter settings of the CNN-based models for leaf counting.

VGG
The original input image has three channels and the segmented mask has one channel. The size of both input images is (224, 224). For prediction, the linear activation function is utilised. We have 45M trainable parameters for the VGG model of Kumar et al. The loss function used is half MSE and the optimizer is SGD with a 0.0001 learning rate and 0.9 momentum (as mentioned in Kumar et al. 21).

AlexNet
The original input image has three channels and the segmented mask has one channel. The size of both input images is (224, 224). For prediction, the linear activation function is utilised. We have 53M trainable parameters for the AlexNet model of Kumar et al. The loss function used is half MSE and the optimizer is SGD with a 0.0001 learning rate and 0.9 momentum (as mentioned in Kumar et al. 21).

Ubbens et al.
The original input image has three channels and the segmented mask has one channel. The size of both input images is (224, 224). For prediction, the linear activation function is utilised. We have 45M trainable parameters for the proposed model of Ubbens et al., and the loss function is MSE (as mentioned in Ubbens et al. 31).

Proposed LC-Net
The original input image has three channels and the segmented mask has one channel. The size of both input images is (224, 224). For training, the smooth L1 loss function has been used with the Adam optimizer. We have 5M trainable parameters for our proposed LC-Net.
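The two regression losses named in the parameter settings behave differently on large errors, which the following sketch illustrates. It uses the common definition of smooth L1 (quadratic below a transition point, linear above it); the transition point `beta = 1.0` is an assumption, as the source does not state it, and the sample predictions are hypothetical.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def half_mse(pred, target):
    """Half of the mean squared error."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical leaf count predictions; the last one is off by four leaves.
predicted = [5.2, 7.0, 12.0]
actual = [5.0, 7.0, 8.0]

print(f"smooth L1: {smooth_l1(predicted, actual):.4f}")
print(f"half MSE : {half_mse(predicted, actual):.4f}")
```

The single large error dominates the half MSE through its square, whereas smooth L1 grows only linearly with it, so occasional badly mis-counted images pull less strongly on the gradient.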

Table 6. Quality parameters for leaf count.

1 MSE It is calculated by taking the mean of the squares of the errors, i.e., the differences between the predicted leaf number and the ground truth. A lower value indicates better results.
2 Absolute DiC (Difference in Count) It is calculated as the average of the absolute differences between the predicted leaf number and the actual leaf number. A lower value indicates better results.
3 Coefficient of determination (R²) It measures how well the predicted values match the ground truth. A high value indicates better results.
4 Percentage agreement It is the percentage of cases in which the predicted value is exactly the same as the ground truth. A high value indicates better results.
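These counting metrics, together with the signed DiC reported for several related works, can be computed as in the sketch below. It follows the standard definitions; the function name and the example counts are hypothetical, and predictions are assumed to be already rounded to integers before the percentage agreement is computed.

```python
def counting_metrics(pred, truth):
    """Return DiC, AbsDiC, MSE, R^2 and percentage agreement for leaf counts."""
    n = len(truth)
    errors = [p - t for p, t in zip(pred, truth)]
    dic = sum(errors) / n                       # signed mean difference in count
    abs_dic = sum(abs(e) for e in errors) / n   # mean absolute difference
    mse = sum(e * e for e in errors) / n
    mean_truth = sum(truth) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_truth) ** 2 for t in truth)
    r2 = 1.0 - ss_res / ss_tot                  # undefined if all truths are equal
    agreement = 100.0 * sum(e == 0 for e in errors) / n
    return {"DiC": dic, "AbsDiC": abs_dic, "MSE": mse,
            "R2": r2, "Agreement": agreement}


# Hypothetical ground-truth and predicted leaf counts for four plants.
truth_counts = [5, 6, 7, 8]
pred_counts = [5, 7, 7, 7]
metrics = counting_metrics(pred_counts, truth_counts)
print(metrics)
```

In this example one over-count and one under-count cancel, so DiC is 0 while AbsDiC is 0.5, which shows why the absolute variant is the more informative error measure.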

Table 7. Test results of leaf counting obtained by different architectures on the merged dataset (test set size = 300 images). Significant values are in bold.

Table 8. Test results of leaf counting obtained by different architectures on the CVPPP dataset (test set size = 160 images). Significant values are in bold.

Table 9. Numerical results measuring the effect of the combined input over the CVPPP dataset (test set size = 160 images). Significant values are in bold.