A hybrid machine learning approach for estimating the water-use efficiency and yield in agriculture

This paper introduces the narrow strip irrigation (NSI) method and estimates water-use efficiency (WUE) and yield in apple orchards under NSI in the Miandoab region, southeast of Lake Urmia, using a machine learning approach. For the estimation, a hybrid method based on an adaptive neuro-fuzzy inference system (ANFIS) and the seasons optimization (SO) algorithm is proposed. Based on irrigation and climate factors, six models with different input-parameter combinations are defined for the SO-ANFIS. The proposed method is evaluated on a test data set that contains information about apple orchards in Miandoab city from 2019 to 2021. NSI was compared with two popular irrigation methods, two-sided furrow irrigation (TSFI) and basin irrigation (BI), on benchmark scenarios. The results showed that NSI increased WUE by 1.90 kg/m³ and 3.13 kg/m³, and yield by 8.57% and 14.30%, compared to the TSFI and BI methods, respectively. The proposed SO-ANFIS achieved R² values of 0.989 and 0.988 in estimating the WUE and yield of the NSI method, respectively, and outperformed the counterpart methods in terms of the performance measures.

Water resources are declining in many regions of the world. Because of climate change, rising air temperatures, and reduced precipitation, water resources are expected to decline further in the future 1,2. Iran has an arid and semi-arid climate with low rainfall and limited water resources. Optimal use of available water resources is an important goal of water conveyance and distribution systems. Surface irrigation is one of the most common irrigation methods in the world. More than 95% of the agricultural land in Iran is currently under surface irrigation. Despite the complexity of this irrigation method, researchers and users have not paid much attention to it. The current efficiency of surface irrigation in Iran is estimated at less than 35% 1,3,4. Surface irrigation is easy to apply and needs only inexpensive equipment to convey and distribute water. Its maintenance and operation costs are lower than those of other methods. Depending on topography and crop type, surface irrigation is performed with different methods, including basin irrigation (BI) and two-sided furrow irrigation (TSFI). Because the facilities for developing irrigated agriculture are limited, it is necessary to increase water-use efficiency by managing irrigation and the productivity of existing water and soil resources 4,5. Considering the excessive consumption of water resources, especially in the agricultural sector around Lake Urmia, it is essential to estimate water-use efficiency (WUE) and yield precisely by combining optimal irrigation methods with artificial intelligence methods. WUE is an essential factor for identifying the adaptability of crops in water-limited regions under current climate conditions and future global changes [6][7][8][9][10][11][12][13].
In addition, yield prediction, particularly for strategic products, is an interesting research topic for agricultural meteorologists because of its importance for national and international economic planning.
In the recent decade, researchers have evaluated yield and WUE in orchards under different irrigation management practices and surface irrigation methods [14][15][16][17][18]. Osman 19 stated that weak design and improper irrigation management are the main reasons for low water-use efficiency in surface irrigation. Lampinen et al. 20 investigated soil and plant data and evapotranspiration for irrigation management of walnut trees in California, USA. Fernandes-Silva 21 by examining the effect of different irrigation regimes (dryland irrigation with 30% and 100%
The contributions of this paper are:
• Introducing the narrow strip irrigation (NSI) method for the first time and estimating its WUE and yield parameters. The NSI method reduces the growth of weeds and prevents the penetration of water outside the shade of the tree.
• Introducing the hybrid SO-ANFIS method to estimate the WUE and yield parameters of the NSI method. The SO-ANFIS takes advantage of both the SO algorithm and the ANFIS method.
• Evaluating the SO-ANFIS method on a benchmark dataset and comparing it with state-of-the-art WUE and yield estimation methods. The results show that the proposed SO-ANFIS outperformed its counterparts in terms of the performance measures.
The remaining part of this paper is organized as follows. Section "Materials and methods" describes the test case and the working principle of the proposed approach. In section "Results and discussion", the results and discussions are presented. Section "Conclusion" concludes the paper and presents some suggestions for future work.

Materials and methods
Test case. This study was conducted in the agricultural lands of Dolatabad village in the Miandoab region. Miandoab is a city in the northwest of Iran, southeast of Lake Urmia. The geographical coordinates of Miandoab are 36° 58′ N and 46° 2′ E at 1314 m above sea level (Fig. 1). In this region, the weather is variable, with relatively hot summers and cold winters. Miandoab is a significant agricultural region in West Azerbaijan province. The main crops in Miandoab are wheat, barley, sugar beet, corn, and apples.
Field studies and sampling. In this study, a total of 120 field data records were collected from the two farms under study (farms M1 and M2 in Fig. 1). This data set was randomly divided into two parts: 80% of the data was used for model training and the remaining 20% for testing. Soil samples were taken at the edge of the tree shading surface at three depths: 0-30 cm, 30-60 cm, and 60-90 cm. Three irrigation methods, BI, NSI, and TSFI, were considered. The cultivar studied was Golden Delicious. The tree spacing was 6 m × 6 m. The dimensions of the control and treatment strips were 3.6 m × 6 m and 6 m × 6 m, respectively. The irrigation interval was set to 10-15 days, depending on climate conditions, to ensure an optimal outcome. The crop was harvested on September 30, 2021.
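The random 80/20 division described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_records` and the fixed seed are assumptions introduced here for reproducibility.

```python
import random

def split_records(n_records, train_fraction=0.8, seed=0):
    """Randomly split record indices into training and test subsets."""
    indices = list(range(n_records))
    rng = random.Random(seed)  # the seed is an assumption, not from the paper
    rng.shuffle(indices)
    n_train = round(n_records * train_fraction)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_records(120)
print(len(train_idx), len(test_idx))  # 96 24
```

With 120 records, this gives 96 training records and 24 test records.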
WUE is calculated as the ratio of the yield to the total water used during the growing season:

$$\mathrm{WUE} = \frac{Y}{I + P_e + SW}$$

where $Y$ denotes the economic yield, measured as the product delivered to the market; $I$ is the irrigation water, measured using a WSC flume; $P_e$ is the effective rainfall; and $SW$ is the soil water depletion from the root zone during the growing season. $SW$ is estimated from the water balance at the selected farm.
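The definitions above suggest that WUE is the yield divided by the sum of the three water terms. The sketch below assumes that form and assumes all water terms are expressed in the same unit (m³/ha). The function name and the numbers in the example are illustrative only and are not measurements from this study.

```python
def water_use_efficiency(yield_kg_ha, irrigation_m3_ha, eff_rain_m3_ha, soil_depletion_m3_ha):
    """Return WUE in kg/m3, assuming WUE = Y / (I + Pe + SW)."""
    total_water = irrigation_m3_ha + eff_rain_m3_ha + soil_depletion_m3_ha
    if total_water <= 0:
        raise ValueError("total water use must be positive")
    return yield_kg_ha / total_water

# Illustrative values only (not data from the study).
wue = water_use_efficiency(30000.0, 4000.0, 600.0, 400.0)
print(wue)  # 6.0
```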
Improving economic water-use efficiency at the farm level requires better matching of the timing and amount of water use to crop needs, which ultimately improves crop yield. This is possible by using emerging technologies and applying better management methods. New management methods in planning for planting, irrigation, and the use of other inputs play an effective role in achieving high WUE. Chemical and physical analyses of the soil and the fertilizers used are presented in Tables 1, 2 and 3.
Application efficiency. Application efficiency (AE) indicates losses in the farm in the form of deep infiltration and runoff at the end of the farm. AE is calculated at each irrigation interval following ref. 43.
Irrigation methods. Narrow strip irrigation. In the NSI method, the entire orchard surface is not irrigated, and evaporation losses are minimized. Therefore, the daily water requirement of the tree is mainly limited to the amount of transpiration from the aerial parts exposed to sunlight. Plant shading level is one of the critical factors in calculating the water requirement of trees. This parameter is determined experimentally according to the type and age of the plant (between 50 and 70%) 31. Overall, in the NSI, the main area where transpiration occurs is the shaded area. In the NSI method (Fig. 2), a dry space is left in the middle between the tree rows. This area stays dry during irrigation, and applied water is reduced because no weeds grow there and no water evaporates from it. In the NSI, the following equation is used to calculate the daily water requirement of the plant 31:

$$R_r = ET_c \times \frac{h_s}{100}$$

where $R_r$ is the maximum daily water requirement (mm/day), $ET_c$ is the maximum daily evapotranspiration (mm/day), and $h_s$ is the maximum shading level (%). According to the studies, the shading level for trees is 50-70% in the optimal state 31.
In this study, the shading level of apple trees was determined based on age and crown environment.
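As a worked illustration of the daily water requirement, the sketch below assumes the relationship R_r = ET_c × h_s / 100, with h_s given in percent. The function name and the ET_c value of 6.0 mm/day are hypothetical and chosen only to show the effect of the 50-70% shading range.

```python
def nsi_daily_requirement(et_c_mm_day, shading_percent):
    """Daily water requirement (mm/day), assuming R_r = ET_c * h_s / 100."""
    if not 0.0 <= shading_percent <= 100.0:
        raise ValueError("shading level must be between 0 and 100 percent")
    return et_c_mm_day * shading_percent / 100.0

# Hypothetical ET_c of 6.0 mm/day at the two ends of the 50-70% shading range.
low = nsi_daily_requirement(6.0, 50.0)
high = nsi_daily_requirement(6.0, 70.0)
print(low, high)
```

Under these assumptions, the requirement ranges from about 3.0 to 4.2 mm/day, compared with 6.0 mm/day if the whole orchard surface were counted.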
Two-sided furrow irrigation. In the TSFI method, water moves inside furrows on both sides of the trees and infiltrates deep into the soil, irrigating the root development area vertically and laterally (Fig. 3). This method aims to wet less of the soil surface. The water is directed through two furrows created on either side of each row of trees. The distance of the furrows from the tree rows varies depending on the row spacing, soil texture, and age of the trees. Furrow irrigation creates two dry areas in the orchard: one along the rows of trees and the other between the furrows in the middle of the rows. Weed growth in these dry areas should be prevented as much as possible with tools such as garden tractors, cultivators, or retractors. The real reduction in water consumption comes from these dry areas, where no weeds grow and no water evaporates. If the two dry areas are full of weeds, no water is actually saved, and only the irrigation efficiency increases because of the movement of water in the furrows 32.
Basin irrigation. BI is a method in which water penetrates the soil permanently or intermittently, and the soil is permanently submerged (Fig. 4). In basin irrigation, water penetrates the crown area of the plant, which can clog heavy soils and reduce soil aeration 32.
In general, in the NSI method, unlike the TSFI and BI methods, the water travels within the shade of the trees along a straight strip of constant width from one tree to the next. Water penetration outside the shade of the tree is prevented, so conditions for weed growth are not provided.
Seasons optimization (SO) algorithm. The SO algorithm is a population-based optimization metaheuristic 33. It models the growth process of trees over the four seasons of a year. Figure 5 illustrates the flowchart of the SO algorithm. The SO is an iterative algorithm that starts with a population referred to as a forest. Each member of the population is called a tree and denotes a potential solution to the given problem. For an optimization problem $f(X) = f(x_1, x_2, \ldots, x_D)$ with D dimensions, the initial forest F is initialized as follows 33:

$$F = \{T_1, T_2, \ldots, T_N\}, \quad T_i = (t_{i1}, t_{i2}, \ldots, t_{iD}), \quad t_{ij} = l_{ij} + r_{ij}\,(u_{ij} - l_{ij})$$

where $r_{ij}$ is a random number in the interval [0, 1] generated by the uniform distribution, and $u_{ij}$ and $l_{ij}$ are the upper and lower bounds of $t_{ij}$, respectively. The fitness of each tree is evaluated by a strength function.
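The initialization step can be sketched as below, assuming each element is drawn as t_ij = l_ij + r_ij (u_ij − l_ij) with r_ij uniform in [0, 1]. For brevity, the sketch assumes one pair of bounds per dimension shared by all trees. The function name `init_forest` and the seed argument are introduced here for illustration.

```python
import random

def init_forest(n_trees, lower, upper, seed=None):
    """Create a forest of n_trees random trees inside per-dimension bounds."""
    if len(lower) != len(upper):
        raise ValueError("lower and upper bounds must have the same length")
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Each element is placed uniformly at random between its bounds.
        tree = [l + rng.random() * (u - l) for l, u in zip(lower, upper)]
        forest.append(tree)
    return forest

forest = init_forest(5, lower=[0.0, -1.0], upper=[1.0, 1.0], seed=1)
print(len(forest), len(forest[0]))  # 5 2
```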
The algorithm updates the trees using four operators: renew, competition, seeding, and resistance. The renew phase models the impact of spring on the growth of trees. In the mathematical model of the renew phase 33, $R$ is the set of new seedlings, $F_y$ is the forest at the yth iteration, $A_y$ is the number of seeds generated in the previous autumn, and $p_r$ is the renew rate. The renew function randomly produces seedlings at various locations of the forest. The algorithm does not execute the renew phase in generation y = 0.
The competition phase models the growth of trees in the summer. In this phase, the trees compete with their neighbor trees for shared resources, including nutrients, water, and light. To simulate the competition, the normalized fitness $\tau_i$ of each tree $T_i$ is calculated first, and then $Z_i$ neighbors are selected to create the neighborhood zone. The impact of the competition on a neighbor $T_j$ of $T_i$ is modeled with the following quantities 33: $T_j^y$ is the location of $T_j$ in generation y; the competition index (crowdedness) measures the effect of the neighbors on $T_j$; $D$ is the number of variables of a tree; the function $\varphi(\cdot)$ gives the growth of $T_j$ in the same environment when its neighbors are ignored; $S_k$ is the strength (fitness) of the kth neighbor tree; the distance between $T_j$ and the kth neighbor determines the effect of that neighbor on the growth of $T_j$; and the parameter $\gamma \in [0, 1]$ is a random asymmetry index that sets how much the impact of a relatively weak neighbor is reduced.
The new location of the core tree $T_i$ is calculated with respect to $T^*$, the strongest neighbor tree around $T_i$ 33. The seeding phase is inspired by the seeding mechanism of trees in the autumn. In this phase, several trees are randomly selected to participate in seeding. The number of seeds ($A$) at each generation depends on the seeding rate $p_s$, which is a uniform random number, and the function $\psi$ identifies the fittest trees in the population. From each tree $T_i$ selected in the seeding phase, several elements are randomly chosen, and their current values are replaced with new random values within the boundary of the search space. Let $m$ be a random number and $\{t_{i1}, t_{i2}, \ldots, t_{im}\}$ the elements selected from $T_i$, where $m < D$. Each component $t_{ij} \in T_i$ is updated using $\ell$, a two-valued variable equal to either 1 or −1, and $r \in [l_j, u_j]$, a random number 33.
The resistance phase simulates the resistance of the trees against harsh winter cold. The resistance operator removes the weakest trees from the population. In its mathematical model 33, $W$ is the collection of weak trees, $\chi(\cdot)$ removes $p_w \times N$ trees from the forest, and $p_w$ is the resistance rate.
Figure 6. A big picture of the ANFIS system with two inputs.
Until the stopping criteria are met, the algorithm updates the trees in the population by iteratively applying the renew, competition, seeding, and resistance operators. Finally, the fittest tree is identified as the optimal solution 33.
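Of the four operators, the resistance step is the one whose behavior is fully described in the text: the p_w × N weakest trees are removed. The sketch below is a hedged illustration of that step only. It assumes a minimization problem, so a lower strength value means a fitter tree, and it truncates p_w × N to an integer. The function name and the toy strength function are introduced here and are not taken from ref. 33.

```python
def resistance(forest, strength, p_w):
    """Remove the int(p_w * N) weakest trees; lower strength value = fitter."""
    if not 0.0 <= p_w <= 1.0:
        raise ValueError("resistance rate must be in [0, 1]")
    n_remove = int(p_w * len(forest))
    ranked = sorted(forest, key=strength)  # fittest trees first
    return ranked[:len(ranked) - n_remove]

# Toy one-dimensional forest; strength is the distance from zero (an assumption).
toy_forest = [[3.0], [0.5], [2.0], [1.0]]
survivors = resistance(toy_forest, strength=lambda tree: abs(tree[0]), p_w=0.25)
print(survivors)  # [[0.5], [1.0], [2.0]]
```

Note that this sketch returns the survivors sorted by strength, which is a convenience of the illustration rather than a property stated for the SO algorithm.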
Adaptive neuro-fuzzy inference system (ANFIS). The ANFIS integrates artificial neural networks (ANNs) and a fuzzy inference system (FIS) 34 and combines the advantages of both. The ANFIS has high adaptation and fast learning capacity, captures the non-linear structure of processes, and relies less on memorization. These characteristics make the ANFIS a suitable choice for predictive problems such as WUE and yield estimation. The ANFIS has been used successfully in various fields, including mechanical design problems, chemical processes, data mining applications, communications, economics, geotechnical engineering problems, scheduling problems, and many other engineering problems.
In ANFIS, the relationship between inputs and outputs is identified by the fuzzy section, and the best values for the membership function parameters are identified by the ANN. The structure of ANFIS is determined by the input data, the rules, the output membership functions, and the membership degrees. The five-layer ANFIS system is shown in Fig. 6. In the first layer, the degree of membership of each input in the different fuzzy domains is determined. In the second layer, the weight of each rule is obtained by multiplying the incoming values of each node. The third layer computes the relative importance of the rules. The fourth layer forms the rule outputs by operating on its input signals. The fifth layer gives the network output. ANFIS has n rules and m input components. Each rule $R_i$ is represented as follows:

$$R_i:\ \text{if } x_1 \text{ is } q_{i1} \text{ and } \ldots \text{ and } x_m \text{ is } q_{im} \text{ then the output is } f_i$$

where $x_j$ is the jth input, $q_{ij}$ is the membership function of rule i on $x_j$, and $f_i$ is the output of the rule. The output of the network is:

$$y = \frac{\sum_{i=1}^{n} \mu_i f_i}{\sum_{i=1}^{n} \mu_i}$$

where $\mu_i$ is the activation degree of rule i. Each node has a function with adjustable parameters. $\mu_i$ is defined as:

$$\mu_i = \prod_{j=1}^{m} q_{ij}(x_j)$$

In the current implementation of ANFIS, Gaussian membership functions are used:

$$q_{ij}(x_j) = \exp\left(-\frac{(x_j - c_{ij})^2}{2\sigma_{ij}^2}\right)$$

where $c_{ij}$ and $\sigma_{ij}$ are the center and standard deviation of the Gaussian membership function, respectively. The Gaussian membership function is a popular choice for specifying fuzzy sets because of its smoothness and concise notation. Five factors should be determined in designing ANFIS: the number and type of input and output fuzzy sets, the number of iterations, and the optimization method. The SO algorithm was used to optimize the parameters of the ANFIS membership functions. In this paper, fuzzy c-means (FCM) clustering, which has obtained superior results in the literature, is used to create the fuzzy inference system.
SO-ANFIS model. The two groups of structural parameters of the ANFIS system are the antecedent and consequent parameters. Researchers have often tuned these parameters with gradient-based techniques. The main drawbacks of gradient-based methods are a low convergence rate and trapping in local optima. Meta-heuristic algorithms can be used as efficient alternatives to overcome these limitations in training the ANFIS model. To train the ANFIS system using the SO algorithm, two issues need to be determined: the strength function and the boundaries of the variables. In this study, the root mean square error (RMSE) is used as the strength function for evaluating the performance of the ANFIS system. Assume the following relationship:

$$R_i:\ \text{if } I_r \text{ is } q_1(\sigma_{1i}, c_{1i}) \text{ and } T_{emp} \text{ is } q_2(\sigma_{2j}, c_{2j}) \text{ and } P_e \text{ is } q_3(\sigma_{3l}, c_{3l}) \text{ and } RH_{avg} \text{ is } q_4(\sigma_{4k}, c_{4k}) \text{ and } S_{sh} \text{ is } q_5(\sigma_{5t}, c_{5t}) \text{ then the output is } f_i$$

The input variables are water consumption during the growing season ($I_r$), temperature ($T_{emp}$), average relative humidity ($RH_{avg}$), the amount of solar radiation in terms of sunshine hours ($S_{sh}$), and the rainfall ($P_e$) of each month of the growing season. The model parameters that need to be configured are $\sigma$, $c$, $s_1$, $s_2$, $s_3$, $s_4$, $s_5$, and $s_6$. The variables $s_1$ to $s_6$ are consequent parameters, which are determined during the ANFIS training process. The optimal values of the parameters $c$ and $\sigma$ are determined by the SO algorithm. To identify the values of $c$ and $\sigma$, first, a forest composed of several trees is initialized. Each tree contains candidate values for the ANFIS parameters. The trees are updated iteratively using the four operators (renew, competition, seeding, and resistance). This process iterates for a pre-determined number of generations (Fig. 7).
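To make the link between a tree and the ANFIS concrete, the sketch below evaluates an ANFIS with Gaussian membership functions and uses its RMSE as the strength of a tree. Several points are assumptions made for illustration: the rule activation is taken as the product of membership degrees, the network output as the activation-weighted average of rule outputs, each rule output as a linear function of the inputs plus a bias, and a tree as a flat list of all centers followed by all standard deviations. The consequent coefficients are passed in as fixed values here, and all function names are hypothetical.

```python
import math

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree of x for center c and spread sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_predict(x, centers, sigmas, consequents):
    """Activation-weighted average of the rule outputs.

    centers, sigmas: one list of m antecedent parameters per rule.
    consequents: one list of m + 1 linear coefficients per rule (bias last).
    """
    weighted_sum = 0.0
    activation_sum = 0.0
    for c_i, s_i, k_i in zip(centers, sigmas, consequents):
        mu = 1.0
        for x_j, c, s in zip(x, c_i, s_i):
            mu *= gaussian_mf(x_j, c, s)  # product of membership degrees
        f = sum(k * x_j for k, x_j in zip(k_i, x)) + k_i[-1]
        weighted_sum += mu * f
        activation_sum += mu
    if activation_sum == 0.0:
        raise ValueError("no rule is activated for this input")
    return weighted_sum / activation_sum

def decode_tree(tree, n_rules, n_inputs):
    """Split a flat tree [centers..., sigmas...] into per-rule lists."""
    size = n_rules * n_inputs
    flat_c, flat_s = tree[:size], tree[size:2 * size]
    centers = [flat_c[i * n_inputs:(i + 1) * n_inputs] for i in range(n_rules)]
    sigmas = [flat_s[i * n_inputs:(i + 1) * n_inputs] for i in range(n_rules)]
    return centers, sigmas

def rmse_strength(tree, consequents, samples, targets, n_rules, n_inputs):
    """RMSE of the ANFIS encoded by tree; a lower value means a stronger tree."""
    centers, sigmas = decode_tree(tree, n_rules, n_inputs)
    squared = 0.0
    for x, y in zip(samples, targets):
        squared += (anfis_predict(x, centers, sigmas, consequents) - y) ** 2
    return math.sqrt(squared / len(samples))

# Toy check with 2 rules and 1 input (values are illustrative only).
toy_tree = [0.0, 2.0, 1.0, 1.0]            # centers 0 and 2, sigmas 1 and 1
toy_consequents = [[1.0, 0.0], [0.0, 5.0]]  # f1 = x, f2 = 5
centers, sigmas = decode_tree(toy_tree, n_rules=2, n_inputs=1)
prediction = anfis_predict([1.0], centers, sigmas, toy_consequents)
strength = rmse_strength(toy_tree, toy_consequents, [[1.0]], [3.0], 2, 1)
print(prediction, strength)
```

In the toy check, the input 1.0 lies midway between the two centers, so both rules are equally activated and the prediction is close to the average of the rule outputs 1 and 5, that is, about 3. An optimizer such as SO would call `rmse_strength` once per tree and keep the trees with the lowest values.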
Performance criteria. In the present study, the R², RMSE, SI, δ, and NSE indices were applied to appraise the ability of the introduced hybrid method 35.
Figure 7. Flowchart of the SO-ANFIS.
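Four of these indices have widely used definitions, sketched below under stated assumptions: R² is taken as the squared Pearson correlation between observed and estimated values, SI (scatter index) as the RMSE divided by the mean of the observations, and NSE as the Nash-Sutcliffe efficiency. The exact forms used in ref. 35 may differ, and δ is omitted because its definition is not given in the text.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(obs, pred)]))

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and estimated values."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    mo = _mean(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / spread

def scatter_index(obs, pred):
    """RMSE normalized by the mean observed value."""
    return rmse(obs, pred) / _mean(obs)

# Illustrative values: every estimate is exactly 1 unit too high.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, estimated), r_squared(observed, estimated))
print(nse(observed, estimated), scatter_index(observed, estimated))
```

The example shows why several indices are reported together: a constant bias leaves R² at 1 while RMSE, SI, and NSE all reveal the error.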

Results and discussion
Field monitoring. Table 4 and Fig. 8 present the results of irrigation depth, inlet flow, net irrigation requirement, and AE in the research treatments. The first irrigation has the lowest WUE because of the drier soil surface and the impacts of tillage operations. Deep penetration losses are primarily due to the excellent permeability of the soil. The application efficiency increased to 21.4% due to the NSI method and irrigation time management compared to the two-sided irrigation method. Irrigation depth increased in treatment BI from the first to the fourth irrigation event because large amounts of water were lost to deep infiltration and soil moisture became scarcer at the point of root access. As the irrigation depth increased, application efficiency decreased accordingly. Based on the results, the average increase in application efficiency in NSI compared to BI is about 62.40%. The amount of applied water in the NSI method was 3455 m³/ha, a reduction of 42.80% and 22.70% compared to the BI and TSFI treatments, respectively (Fig. 9). The decreases were mainly attributed to the smaller wetted soil surface area, which was smallest in NSI (NSI < TSFI < BI).
The estimated yield in the BI, NSI, and TSFI treatments was 30,000 kg/ha, 35,000 kg/ha, and 32,000 kg/ha, respectively. The higher yield in NSI and TSFI was attributed to the soil moisture condition. The yields of the NSI and TSFI treatments were 14.30% and 6.25% higher than that of the BI treatment, respectively, and the NSI yield was 8.57% higher than the TSFI yield (Table 5). The estimated WUE in the NSI and TSFI treatments was 7.14 kg/m³ and 5.24 kg/m³, respectively (Table 5).
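The reported yield percentages can be checked with a short calculation. The sketch below is an inference from the numbers rather than a statement of the authors' method: the reported values match, to rounding, the yield difference divided by the higher of the two yields. The function name is introduced here for illustration.

```python
def percent_difference(higher, lower):
    """Difference between two yields as a percentage of the higher yield."""
    return (higher - lower) / higher * 100.0

yields = {"BI": 30000.0, "NSI": 35000.0, "TSFI": 32000.0}  # kg/ha, from the text

nsi_vs_bi = round(percent_difference(yields["NSI"], yields["BI"]), 2)
tsfi_vs_bi = round(percent_difference(yields["TSFI"], yields["BI"]), 2)
nsi_vs_tsfi = round(percent_difference(yields["NSI"], yields["TSFI"]), 2)
print(nsi_vs_bi, tsfi_vs_bi, nsi_vs_tsfi)  # 14.29 6.25 8.57
```

If the difference were instead divided by the lower (baseline) yield, the NSI gain over BI would be about 16.67%, so the choice of denominator matters when comparing these figures with other studies.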
Effect of soil properties. The results of descriptive statistics are shown in Table 6. Based on the classification of ref. 36, a coefficient of variation (CV) of less than 15% indicates low variation, between 15 and 35% moderate variation, and more than 35% great variation. According to this classification, soil sand content, tree age, and irrigation interval show moderate variation, soil acidity shows low variation, and the other variables show great variation (due to management factors). Absorbable phosphorus concentration at a depth of 30-60 cm in the soil is the most influential parameter for crop yield. The results of the present study are consistent with previous studies 37,38. Eliminating tree irrigation at different growth periods reduces the quality and quantity of crop yield. Sedaghati et al. 28,37 concluded that increasing the irrigation interval from 25 to 45 days increased the percentage of porosity. According to studies, water flow and irrigation interval have a positive effect on crop yield 38. Therefore, reducing the irrigation interval with methods such as narrow strip irrigation 38 can be considered as one of the management methods. Increasing the percentage of sand reduces the soil's ability to retain water and the nutrients used by the plant. The trees in this area are old, and as a tree ages, its capacity for growth and production gradually decreases.
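The CV classification can be written as a small helper. This is a minimal sketch: the thresholds come from the text, while the use of the sample standard deviation, the function names, and the example values are assumptions made for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation (an assumption)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

def classify_cv(cv_percent):
    """Classify variation as low (<15%), moderate (15-35%), or great (>35%)."""
    if cv_percent < 15.0:
        return "low"
    if cv_percent <= 35.0:
        return "moderate"
    return "great"

# Illustrative values only (not data from Table 6).
cv = coefficient_of_variation([10.0, 12.0, 14.0])
print(round(cv, 2), classify_cv(cv))  # 16.67 moderate
```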
Modeling results. Investigating the effect of input combinations. The yield and WUE of apple trees depend on various factors, including water consumption during the growing season ($I_r$) and climatic factors: temperature ($T_{emp}$), average relative humidity ($RH_{avg}$), the amount of solar radiation in terms of sunshine hours ($S_{sh}$), and the rainfall ($P_e$) of each month of the growing season [http://tatweather.areeo.ac.ir/?LRef=52c6c899-7597-412d-83d7-c4cd2d05204b] (Table 7) 39,40.
To identify the most appropriate input parameters, different input combinations were evaluated. First, all input combinations are considered for training the ANFIS model, and the effective combinations are selected. Next, the remaining parameters are removed from the input combinations one by one, and the model is trained with the same structure on the inputs that are left. This approach is also used by other researchers 39,40. Figure 10 shows 7 of the best-performing models.
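One simple way to enumerate candidate input sets in the spirit of the procedure above is to start from the full set of five inputs and drop one input at a time. The sketch below only generates the candidate sets and does not train any model. It is an illustration under that assumption and does not reproduce the specific ω models evaluated in Table 8; the short input labels are shorthand introduced here.

```python
from itertools import combinations

INPUTS = ["Ir", "Temp", "RHavg", "Ssh", "Pe"]  # shorthand labels for the five inputs

def leave_one_out_sets(inputs):
    """Return the full input set followed by every set with one input removed."""
    candidate_sets = [tuple(inputs)]
    candidate_sets.extend(combinations(inputs, len(inputs) - 1))
    return candidate_sets

candidates = leave_one_out_sets(INPUTS)
for candidate in candidates:
    print(candidate)
```

Each candidate set would then be used to train a model with the same structure, and the sets would be ranked by a criterion such as the test RMSE.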
The results obtained by the proposed SO-ANFIS method using different input combinations are shown in Table 8. The population size (Psize) and the number of fitness function evaluations (FEs) in the SO-ANFIS were set to 50 and 3000, respectively. According to the results on the observed data, the model ω2 obtained the most accurate results. The irrigation parameter (Ir) was identified as the most influential input variable in estimating yield and WUE, followed by rainfall and sunshine hours, respectively. Sensitivity analysis showed that after the irrigation and rainfall parameters, which affect leaves and plant reproductive growth, sunshine hours also play an important role in estimating yield. Montazer et al. 38, Zeinadini et al. 40, and Emami and Choopan 24 also stated that the amount of water consumed has an influential effect on crop yield. According to Figs. 11 and 12, the yield estimated by the SO-ANFIS hybrid method is highly accurate and in good agreement with the observed values. Also, ω2 modeled the yield with a lower error (RMSE = 0.006) according to the irrigation parameter.
The SO-ANFIS and ANFIS error distribution diagrams on the test stage are shown in Fig. 13. The results show that about 80% of the yield values estimated utilizing the SO-ANFIS have an error of less than 2%.
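A statement of this kind can be computed as the share of test samples whose relative error falls below a tolerance. The sketch below assumes the error is the absolute relative error in percent of the observed value; the function name and the example values are illustrative and are not the study's data.

```python
def share_within_tolerance(observed, estimated, tol_percent=2.0):
    """Percentage of samples whose absolute relative error is below tol_percent."""
    hits = 0
    for o, e in zip(observed, estimated):
        relative_error = abs(e - o) / abs(o) * 100.0
        if relative_error < tol_percent:
            hits += 1
    return hits / len(observed) * 100.0

# Illustrative values: relative errors are 1%, 1%, 0.5%, 3%, and 1.5%.
obs = [100.0, 100.0, 100.0, 100.0, 100.0]
est = [101.0, 99.0, 100.5, 103.0, 101.5]
share = share_within_tolerance(obs, est)
print(share)  # 80.0
```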
Comparison of SO-ANFIS with other methods. Table 9 compares the results generated by the proposed SO-ANFIS and its counterparts. The results confirm that the proposed SO-ANFIS outperformed its counterparts in estimating yield and WUE. Comparison of the results of the present study with other works shows acceptable accuracy (R² = 0.988 in the test stage). Compared to similar studies such as Sharifi 26 and Prasad et al. 27, which evaluated crop yield and WUE using random forest (RF) and Gaussian process regression (GPR), the SO-ANFIS, with R² = 0.988 and RMSE = 0.006, performs better than the mentioned methods and can be used as a powerful method for estimating yield and WUE.

Conclusion
In this study, the effect of the NSI method on yield and WUE in apple orchards was investigated. The SO-ANFIS method was proposed to estimate WUE and yield in the NSI model. In the SO-ANFIS, six models were created to determine the most effective parameters in estimating the WUE and yield of the NSI method. The SO-ANFIS with model ω6 generated superior results, with R² = 0.988, RMSE = 0.006, SI = 0.007, δ = 0.860, and NSE = 0.982. One direction for future work is to apply the SO-ANFIS method to other engineering problems to identify its strengths and weaknesses.

Guidelines statement
All measurements and laboratory tests performed in this study follow scientific and international standards, such as soil texture determination 41, volumetric soil moisture monitoring 42, and water quality analysis (EPA).

Data availability
The data that support the findings of this study are openly available.