An efficient estimation of crop performance in sheep fescue (Festuca ovina L.) using artificial neural network and regression models

Festuca ovina L. (sheep fescue), a perennial grass plant found in mountainous regions, is important from both an ecological and economic viewpoint. However, the variability of biological yield of sheep fescue due to its reliance on different characteristics makes it difficult to accurately prediction using classic modeling techniques. In this study, machine learning methods and multiple regression models (linear and non-linear) are used to investigate the interdependence of various morphological and physiological characteristics on accurate prediction of the biological yield (BY) of sheep fescue. Principal components analysis and stepwise regression were used to select six agronomic parameters i.e. thousand seed weight (TSW), relative water content (RWC), canopy cover (CC), leaf area index, number of florescence, and viability (VA), while the output variable was BY. To optimized the artificial neural network (ANN) structure, different transfer functions and training algorithms, different number of neurons in each layer, different number of hidden layers and training iteration were tested. The accuracy of the models and algorithms is analyzed by root mean square error (RMSE), mean absolute error (MAE), and determination coefficient (R2). According to the findings, ANN models were more accurate than regression models. The ANN model with two hidden layers (i.e. structure of 6–4–8–1) which had RMSE, MAE and R2 scores of 0.087, 0.065 and 0.96, respectively, was discovered as the best model for predicting the BY. In addition, result of the sensitivity analysis showed TSW, RWC and CC, in that order, were the variables most important for high-quality BY estimation in both models regardless of input combination. Finally, the paper concludes that early flowering sheep fescue genotypes with long maturation and great TSW must be regarded as the most suitable model for increasing BY in breeding projects.

www.nature.com/scientificreports/ seeded. The seeds of the Festuca ovina L. were collected from its habitat in Sablan rangelands, Ardabil, Iran (47° 52′ E and 38° 12′ N). After harvesting from rangelands, seeds of Festuca ovina are stored in the herbarium of Faculty of Agriculture and Natural Resources, University fo Mohaghegh Ardabili with number 1342. It should be mentioned that harvesting the seeds of rangeland plants from the rangeland of Iran is free, however, the permission to harvest the seeds of the Festuca ovina from the Sabaland rangelands was obtained from the Department of Natural Resources and Watershed Management of Ardabil province, Iran. According to two planting seasons, 9 fertilizer combinations, dividing each plot into three plots and three replications, 162 data were collected, 18 outlier data were removed and 144 data were used for analysis.
Applying of treatments and maintenance. Powder of potassium silicate nanoparticles (PSN), effective microorganisms (EM) and Super absorbent (SA) were supplied by Sigma Aldrich Company (Fig. 2), Emkanpazir Pars Company (Shiraz, Iran), and Bojnourd Water Crystal Production Company, respectively. The morphological study of these nanoparticles was conducted by scanning electron microscope (SEM). The superabsorbent polymer used in this research was purchased from North Khorasan Bolour Ab Company (Shirvan, Iran). Organic fertilizer (cattle manure) was prepared from the green space of University of Mohaghegh Ardabili. When the plants had four primary leaves, PSN and EM, were added to the soil of each clump as solution. Treatments of PSN and EM in three steps and ten days apart were added to soil as solution. Super absorbent and Organic fertilizer were added to the soil. For the Super absorbent treatment, 10 and 30 g of Super absorbent separately were mixed with 1 kg soil and pits were filled with them. For the Organic fertilizer treatment, 100 and 200 g/kg of cow manure were combined with soil and were poured in the pits. During the growing season, weeds were removed mechanically, without the use of chemical pesticides.
Field sampling and measurement. Three randomly chosen plants from each plot were evaluated for harvest plant information using indicators including plant height (PH) (At the end of the growth period, the length of the shoots and roots of the plant were harvested and cleaned, then measured with a precise ruler), basal diameter (BD) (The basal diameter of harvested plants was measured with a precise ruler), canopy cover (CC) (First, the geometric shape of the circle or oval of the canopy was recognized, then they were calculated based on the formulas of the area of the circle and oval) and etc. Ten morpho-physiological traits including plant height (PH (, basal diameter (BD), canopy cover (CC), number of florescence (NI) (At the end of the vegetative period, the number of inflorescences was measured), thousand seed weight (TSW) (The thousand seeds of each species was considered randomly and accurately obtained using a digital scale), viability (VA) (It was considered based on the number of holes in which seed cultivation was done. Total Chlorophyll (TCh) (The amount of chlorophyll in plant leaves was measured by a chlorophyll meter) 26 , leaf area index (LFA) (leaf area index was measured based on ground based method) 27 , relative water content (RWC) 28 and biological yield (BY) 29 were evaluated   30,31 . Summary of statistical indices for estimated characteristics are shown in Table 2. The relative water content of leaf was specified by the following equation.
where FW: fresh leaf weight immediately after sampling, DW: dry weight of leaf after drying in oven and SW: saturated leaf weight after placing in distilled water.
Data preprocessing and statistical analysis. In order to avoid bias's estimation due to differences in units of input variables, the following equation was used to normalization within in the ranges [0.1, 1] (Eq. 1).
where x i is the original value, x n is the normalized value, and x max and x min are the max and min values, respectively. The nature and size of connections between BY and other qualities were investigated using a multiple linear regression (MLR) model and nonlinear regression models such as Exponential, Logistic, Quadratic, Gompertz, Asymptotic exponential, and Chapman-Richard. We chose the MLR model as the best regression model to characterize the correlations between variables based on model performance results (Table 5). As a result, the MLR model was used as a starting point for additional research. It should be noted that the most important hypotheses for selected MLR model including the existence of a linear relationship between the dependent and www.nature.com/scientificreports/ independent parameters, the independence of errors across the independent variables and homoscedasticity (Durbin-Watson value of 1.83), the absence of multicollinearity between the predictor factors (variance inflation factor (VIF value < 5) and tolerance (TOL value > 0.2) were evaluated (Table 6). SAS Institute Inc's statistical analysis system (SAS software) version 9.4 was used to assess the hypotheses and normalcy.
Input parameters selection. Input variable selection (IVS) is a crucial phase in the creation of a statistical model that has a significant impact on the model's performance. In general, when a high set of input parameters are utilized for a small sample size, researchers typically use models with the fewest input variables to answer their problems 14,32 . The simple correlation as an input variable selection method only shows the magnitude of the relationship among attributes and does not provide clear information about different kinds of direct or indirect effects among them. This drawback is important because the correlation coefficient between two variables can be affected by indirect effects of other variable(s) in a positive or negative way, reducing the chance of achieving a unique solution 15 . Because correlation analysis is ineffective as an IVS method, we employed principal component analysis (PCA) and stepwise regression (SWR), two more well-known and powerful IVS approaches (Tables 3 and 4). Both SWR and PCA are methods to separate the variables influencing the dependent variable for modeling to reduce the data volume. In the PCA, a linear combination of independent variables with the highest relationship with the dependent variable is determined, and usually this linear combination justifies a high percentage of changes in the dependent variable. In the SWR method, the variables with the highest correlation with dependent variable are entered into the model, and in the final stage, a model with a combination of the most influential variables is obtained.  where y i is the biological yield, β 0 − β n are the regression coefficients, x 1 -x n are input factors, and ε is error associated with the observation. where y t is the network output (BY), n is the number of hidden nodes, m is the number of input nodes, f is tangent sigmoid transfer function, α j {j = 0,1,...,n} symbolize the weight vectors from the hidden to the output nodes, β ij {i = 1, 2, ..., m; j = 0,1,...,n} are the input weights to the hidden nodes, and α 0 and β 0j are the arc weights that lead from the bias terms, which are always equal to one. The initial fine-tuning of ANN topology was optimized by changing the hidden layers (1-3 layer), neurons in each hidden layer (1-30 neurons/layer), learning algorithms (Levenberg-Marquardt, Momentum and Conjugate gradient), transfer functions for hidden layer (Tansig, logsig and purelin), and the most effective network structure was created (Fig. 3).

Artificial neural network.
Each run on the training dataset was performed with a 2000 epoch size (training cycles) and a mean square error (MSE) cutoff value of 0.01 (based on normalized dataset scale). The mean of MSE values during the training, testing, and cross-validation stages in different epochs  was explored to obtain an appropriate training procedure and avoid overtraining. Around 700 epochs are sufficient for convergence between training, testing, and cross-validation, as illustrated in Fig. 4. The entire dataset in this study was 144, which was randomly divided into three subsets: training (65 percent), testing (20 percent), and validation (15 percent) 35 . Due to the Model implementation and sensitivity analysis of input parameters. Three statistical quality indicators, namely mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R 2 ), were utilized to objectively examine the efficacy of ANN and MLR models to predict the biological yield of sheep fescue according to its variables.  A sensitivity analysis was used to examine the impact of several independent factors on the outcome. Sensitivity analysis reveals the usefulness of each variable, and can be used to identify the components that are most important for forecasting output 36 . For this, the dataset was run without any input variables (i.e., TSW, RWC, CC, LAI, NI, VA), and the models' implementation was assessed using R 2 , RMSE, and MAE.
Ethical approval. Experimental research and field studies on plants were approved by Review Board of Faculty of Agriculture and Natural Resources, University of Mohaghegh Ardabili, Ardabil, Iran. All methods were carried out in accordance with relevant guidelines and regulations. Harvesting the seeds of rangeland plants from the rangeland of Iran is free, however, the permission to harvest the seeds of the Festuca ovina from the Sabaland rangelands was obtained from the Department of Natural Resources and Watershed Management of Ardabil province, Iran.

Results and discussion
Input variables selection. To create an applicable model, two powerful methods were used: principal component analysis (PCA) and stepwise regression (SWR) (applied a model with a small number of input parameters to account for a large proportion of BY variance as an output). Based on SWR, the six attributes TSW, RWC, CC, LAI, NI, and VA were incorporated in the model ( Table 3). The first two PCA components were chosen because their eigenvalues were > ≈ 1. Together, these two variables contributed for more than 86 percent of the variation (Table 4). According to PCA, the eigenvectors for the two initial components for the above six qualities were the largest (justifying over 86% of BY variation) ( Table 4). Based on the same results achieved from both IVS techniques (SWR and PCA), these attributes (TSW, RWC, CC, LAI, NI, and VA) were chosen to be the most proper input parameters for both the ANN and MLR models (Table 5).

Multiple linear regression (MLR) model development.
Although the convenience of linear regression models is the fundamental reason for their usage in research, these models can also accurately predict the output variable, particularly when there is a strong linear relationship between the input and output variables 37,38 . For this aim, six factors chosen among SWR and PCR (i.e., TSW, RWC, CC, LAI, NI, and VA), were included the MLR model (as independent parameters) to forecast the BY (as the dependent parameter) ( Table 6). The following formulas were used to forecast BY for all, training, testing, and cross-validation data sets, respectively: where y is the biological yield, x 1 is the thousand seed weight, x 2 is the relative water content of leaf, x 3 is the canopy cover, x 4 is the leaf area index, x 5 is the number of florescence, and x 6 is the viability.
According to above MLR equations (Eq. 7-10), the forecasted value of BY is a linear composition of TSW, RWC, CC, LAI, NI, and VA parameters, such that the sum of the squared deviations of the real and anticipated BY 7) All data: y = −0.167 + 0.909x 1 − 0.536x 2 + 0.403x 3 + 0.365x 4 + 0.264x 5 + 0.119x 6 (8) Training data: y = −0.143 + 1.04x 1 − 0.581x 2 + 0.421x 3 + 0.384x 4 + 0.252x 5 + 0.095x 6 (9) Testing data: y = −0.182 + 0.856x 1 − 0.495x 2 + 0.398x 3 + 0.324x 4 + 0.243x 5 + 0.131x 6 (10) Cross validation data: y = −0.144+0.928x 1 −0.552x 2 +0.414x 3 +0.370x 4 +0.259x 5 +0.114x 6 However, the ability of these models to forecast the BY is contingent on the existence of a strong linear connection among the parameters. An overview of the statistical variables for regression models created using various types of data is shown in Table 6. As can be shown in Table 6, the MLR models were unable to accurately anticipate the BY. Linear regression models are unable to predict performance due to a small number of input parameters or the existence of complex/nonlinear interactions among components 15 . Despite the fact that there is no MLR modeling research with like attributes to estimate BY in this study, it is obvious that the MLR model (with R 2 = 0.810) cannot accurately forecast BY.

Artificial neural network (ANN) model development.
In order to create an appropriate ANN model with same input and output parameters which were used for the MLR models, ANN models with some of the most essential ANN architectural features were trained and optimized. As shown in Tables 7 and 8, the least amounts of RMSE and MAE and the highest R 2 values were acquired by the ANN model with 6-7-3-1 structure, the Levenberg-Marquardt as learning algorithm, tansig as transfer function in hidden layers, and pureline transfer function in output layer. Although, Levenberg-Marquardt algorithm requires more memory compared other algorithms, it is a fast algorithm with high accuracy and efficiency, especially for small data samples (i.e. about 100) 39 . Various modeling studies have also concluded that Levenberg-Marquardt is the superior learning algorithm when compared to other algorithms like as Momentum and Conjugate gradient 5,15,32,40 . Tansig's superiority as a non-linear transfer function in our ANN models is most likely due to its characteristics in converting the analyzed input and afterwards conveying it to the output layer. It converts negative to positive infinity input values to a 0 to 1 output range 41 . On the other hand, linear transfer functions like "purelin" perform a basic linear conversion on the analyzed input before passing it to the output layer. However, the type of relationship between the input and output variables is critical in selecting the transfer function, and the higher implementation of nonlinear functions in the present research could be attributed to the nonlinear relationship between BY and input parameters. Nonlinear transfer functions, which cover non-linear fluctuations, have been used more than other transfer functions relying on the feature studied, particularly when there were non-linear correlations across qualities 14,40,[42][43][44] . www.nature.com/scientificreports/ Tables 7 and 8 demonstrate that the number of hidden layers and neurons within every layer, as well as the training algorithm and transfer function, all contributed significantly to the total variance in ANN efficiency. However, the complexity of the model depends on the nature of the subject, and an enhance in the number of hidden layers or the number of neurons in each layer does not always indicate an improvement in model efficiency 5 . In general, the ANN model we discovered in this research has a clean topology, which researchers prefer (with one or two hidden layers and a small number of neurons) 5,12,40,45,46 . The presence of significant nonlinear interactions between variables, as well as the creation of a model with an appropriate topology, can lead to such a high-performance forecast 14,32,40 . Mokarram and Bijanzadeh 18 applied the MLR model in conjunction with two ANN models, the MLP and radial basis function (RBF) models, to estimate biological yield (BY) and stated that the MLP model had the highest R 2 values for forecast of BY of Hordeum vulgare.
According to prior research, ANN modeling approaches are preferred over MLR modelling techniques 15,40,47,48 . This preference appears to be due to ANN modeling techniques' superior ability to capture the extremely nonlinear and complicated relationship between oil content and important parameters 11,15,49 . In predicting soybean yield, Kaul et al. 50 stated that ANN produces better findings than traditional statistical procedures.
Comparing MLR and ANN models for forecasting BY. To give a more complete contrast between the two modelling techniques for forecasting BY, the models were tested using statistical qualitative metrics and also visual display on scatter plots based on numerous datasets (Figs. 5 and 6). As illustrated in Fig. 5a-d the elected ANN models had higher efficacy to forecast BY than the MLR models (Fig. 6a-d), and compared with MLR models could forecast BY for all, training, testing and cross validation dataset with a 18.52, 17.17, 19 and 20.48% enhance in R 2 and a decrease of 30.49, 42.86, 37.35, and 32.22% in RMSE, respectively. The use of scatter plots to compare estimated and actual BY values for two models allows for a better understanding of the data distribution and the capability of the selection models to forecast the BY. The observed and forecasted values for the ANN model had the same distribution (with R 2 = 0.962 and 0.958 for the training and testing datasets, respectively), and the measured values of BY through the ANN model tend to track the matching actual ones very closely (as shown in Fig. 5).
On the other hand, the estimated and forecasted values for the MLR models, did not have a similar pattern as the ANN models, as demonstrated by the more dispersed distributions and outlines in Figs. 7 and 8. Also, Table 7. The performance of the artificial neural network model with different training algorithm and transfer to predict biological yield of F. ovina. a determination coefficient; b Root mean square error; c mean absolute error.  www.nature.com/scientificreports/ there was no significant difference in statistical summaries (i.e. minimum of sample, lower quartile, median, upper quartile, and maximum of sample) between the observed and measured data by the ANN model, based on box plots ( Fig. 7a and b). These characteristics, along with the absence of outliers (unusual data values) in box plots, are necessary modeling characteristics 15,32 . The presence of outlines and variations in the statistical summaries in the box plots generated by the MLR models demonstrated their poor efficacy for predicting BY, in contrast to the ANN models ( Fig. 8a and b). It appears that displaying three datasets (the real dataset, data estimated by the ANN, and MLR for BY) on a graph is a better technique for evaluating the efficacy of the two models in forecasting BY. The predicted BY values by the ANN model exhibited a more similar trend to the BY actual values and were more accurate in forecasting BY than the MLR model, as shown in Fig. 9. In general, according to the findings, it can be concluded that ANN models with the same input parameters can forecast the BY (R 2 ≈ > 0.95) more accurately than MLR models (R 2 ≈ < 0.82). This benefit could be attributed to nonlinear or complicated interactions among variables, as well as nonlinear functions' improved ability to find and take them in ANN models. Contrasting two models in forecasting BY, on the other hand, demonstrated the importance of selecting a model that is appropriate for the subject matter being examined. Many forecasting researches have demonstrated the ANN's benefit in modeling due to its excellent capacity to take highly nonlinear and complex relationships among variables 11,23,40,47,49 . www.nature.com/scientificreports/ Sensitivity analysis. In both ANN and MLR models, the sensitivity tests without a particular input parameter (i.e., TSW, RWC, CC, LAI, NI, and VA) were used to better understand the individual impacts of every input parameter and identify the most and slightest substantial inputs to anticipate BY, and the individual effects of input variables to forecast BY were arranged in Table 9 from highest to lowest. As illustrated in Table 9, both ANN and MLR models without STW had the lowest R 2 (0.654, 0.504) and the highest RMSE (0.123, 0.142), and MAE (0.102, 0.119), respectively. When the ANN and MLR models are performed without TSW, their ability to predict BY appears to be severely reduced. Based on the sensitivity tests, TSW, RWC, and CC, in that order, were the most crucial variables for forecasting BY in both models. The sensitivity analysis provided a valuable insight of how different variables affected the yield. According to previous studies, TSW are still the most crucial factors having considerable impacts on BY 51 . In addition to the three most important features to forecast BY, other qualities in the model (i.e. LAI, NI, and VA) explained roughly 28.2 percent and 23.1 percent of R 2 in the ANN and MLR models, respectively. Based on the findings of the sensitivity testing, cultivars having early blooming, prolonged maturity, and a high TSW are excellent for enhancing BY when designing the proper structure. These early blooming cultivars are not likely www.nature.com/scientificreports/ to experience stress at this important phase, and if they are not exposed to drought and warm tension at the end of the growing season under rainfed conditions, they can continue their vegetative and reproductive growth.

Conclusion
In harsh environmental conditions, such as rainfed areas under heat and drought stress, the inheritance of quantitative polygenic features such BY is decreased significantly. In such circumstances, the genetic benefit of selection is decreased, and direct selection has a poor effect on the targeted feature. Applying modeling techniques to determine and combine indirect selection indications will help create a favorable design for the targeted feature, which is one way to overcome this difficulty. In order to achieve this, we created an ANN model as well as an MLR model to forecast BY using attributes chosen using PCA and SWR techniques (i.e. TSW, RWC, CC, LAI, NI, and VA). According to the findings, the ANN model with the Levenberg-Marquardt learning algorithm, tansig transfer function, and two hidden layers (i.e. structure 6-3-7-1) predicted the BY more accurately than the MLR. The capacity of the ANN model to detect complex and nonlinear effects, as opposed to the MLR model, may explain the advantage of the ANN model in forecasting BY. The ANN model could be a favorable alternative to classic modeling approaches like as path analysis, regression, and so on, due to the significant difference in efficacy of the two models in forecasting BY. Sensitivity analysis for both models revealed that STW was the most influential component in predicting BY, followed by RWC and CC, respectively. So, genotypes of sheep fescue with early flowering, long maturation, and high TSW are optimal for increasing BY. It appears that designing breeding strategies to produce plants with the above structure could open up a new window in the future evolution of this plant.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.