Winter wheat yield prediction using convolutional neural networks from environmental and phenological data

Crop yield forecasting depends on many interactive factors, including crop genotype, weather, soil, and management practices. This study analyzes the performance of machine learning and deep learning methods for winter wheat yield prediction using an extensive dataset of weather, soil, and crop phenology variables in 271 counties across Germany from 1999 to 2019. We propose a Convolutional Neural Network (CNN) model that uses a 1-dimensional convolution operation to capture the time dependencies of environmental variables. We used eight supervised machine learning models as baselines and evaluated their predictive performance using RMSE, MAE, and correlation coefficient metrics to benchmark the yield prediction results. Our findings suggest that nonlinear models, such as the proposed CNN, Deep Neural Network (DNN), and XGBoost, are more effective than linear models at capturing the relationship between crop yield and the input data. Our proposed CNN model outperformed all other baseline models used for winter wheat yield prediction (7 to 14% lower RMSE, 3 to 15% lower MAE, and 4 to 50% higher correlation coefficient than the best performing baseline across test data). We aggregated soil moisture and meteorological features at the weekly resolution to address the seasonality of the data. We also moved beyond prediction and interpreted the outputs of our proposed CNN model using SHAP and force plots, which provided key insights into the yield prediction results (importance of variables over time). We found DUL, wind speed at week ten, and radiation amount at week seven to be the most critical features in winter wheat yield prediction.

Weather data. Temperature (minimum and maximum), radiation, precipitation, relative humidity, and wind speed data at a daily time scale were provided by the Deutscher Wetterdienst (DWD) for 21 years (1999 to 2019). These were interpolated to a 1 km grid using the method explained by Zhao et al.24 and aggregated to weekly values at the NUTS3 level (Nomenclature of Territorial Units for Statistics25) as input to the ML models. For aggregation to the NUTS3 level, area-weighted average values were calculated using the agricultural land-use ratio in each grid cell. The calculations were based on CORINE Land Cover 2006 land-use data available at a resolution of 250 m26. The meteorological variables used in our analyses are wind speed, maximum and minimum temperature, relative humidity, precipitation, and solar radiation.
Soil data. Soil data were prepared and aggregated at the DWD grid level using the major soil types (soil categories corresponding to the agricultural land-use categories as per CORINE Land Cover 2006). The source of the soil data is Germany's soil reconnaissance map, available at a 1:1,000,000 scale (differentiated by BÜK1000N (BGR) land use)27. The soil variables included the volumetric (%) crop available water at permanent wilting point (LL), crop available water at field capacity (DUL), crop available water at saturation point (SAT), and bulk density (BD), available up to a soil depth of 1.3 m. We used 1 km² as the spatial resolution of the soil data.
Crop yield data. Crop yield statistics of 271 counties at the subnational NUTS328 level for winter wheat were studied for the years 1999 to 2019 across Germany. The regional database of Germany was used to extract the NUTS3 crop yield data for the study29.
Crop phenology data. Instances of winter wheat sowing, flowering, and harvest were accessed from the DWD phenology database30. DWD 1 km simulation grids were created by resampling the NUTS3 level data and joining grid centers to NUTS3. In the next step, we randomly sampled 50 weather and soil samples from each county and calculated their averages to extract representative values for the weather and soil data (Fig. S1 in Supplementary Information).
We downsampled the daily weather data by averaging the feature values at the weekly level. Daily data are too detailed and granular, which is problematic for knowledge discovery. Using weekly information reduced the dimension of the weather data by a ratio of 365:52, which significantly diminished the number of model parameters. Such preprocessing and downsampling of daily weather data to a weekly level is a common practice in yield prediction studies11,12,31,32. The combination of weekly meteorological, crop phenological, and soil variables across 271 counties in Germany from 1999 to 2019 resulted in a total of 5692 instances and 277 column features. The summary statistics of winter wheat yield (dependent variable) and all independent variables used in the study are described in Supplementary Information (Tables S2 and S3). Figure 1 shows a heatmap of correlation coefficients across all variables used in the study. For the weather data, the average value across the weeks is used in the graph. Based on Fig. 1, the yield value has the highest correlation with flowering and harvest day of the year and relative humidity. There is a high correlation between relative humidity and the sowing, flowering, and harvest days, as well as the radiation values.
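The weekly aggregation described above can be sketched with pandas; the column names and values here are illustrative, not the study's actual schema:

```python
import numpy as np
import pandas as pd

# Hypothetical daily weather records for one county-year.
days = pd.date_range("2019-01-01", periods=365, freq="D")
daily = pd.DataFrame({
    "date": days,
    "tmax": np.random.default_rng(0).normal(15, 8, 365),      # max temperature
    "precip": np.random.default_rng(1).gamma(1.2, 2.0, 365),  # precipitation
})

# Downsample 365 daily values to weekly means (the 365:52 reduction).
weekly = daily.set_index("date").resample("W").mean()

# Keep exactly 52 weeks, dropping any trailing partial calendar week.
weekly = weekly.iloc[:52]
```

Any aggregation function (sum for precipitation, mean for temperature) can be swapped into `resample(...)` depending on the variable.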
Sowing and harvest dates differ across counties, which results in different growing season lengths. The histogram of growing season length across all counties in 2019 is shown in Fig. 2.
Based on the summary statistics of the input variables in Table S3, they have varying scales and a diverse range of values. To avoid the intrinsic influence and dominance of a single feature over the prediction results, all feature values are standardized using the z-score normalization technique [Eq. (1)]:

z_i^j = (X_i^j − μ_j) / σ_j   (1)

Using this method, all features are rescaled to have the properties of a standard normal distribution. There are also other benefits associated with standardizing the feature values, such as enhancing the models' numerical stability and improving the training speed. X_i^j refers to the i-th instance of the j-th feature (j = 1 to K, where K is the total number of independent variables). μ_j and σ_j are the mean and standard deviation of the j-th feature.
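The column-wise z-score standardization can be written in a few lines of NumPy (scikit-learn's `StandardScaler` performs the same computation):

```python
import numpy as np

def zscore(X):
    """Standardize each feature (column): z = (x - mu) / sigma, as in Eq. (1)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
Z = zscore(X)
# Each column of Z now has mean 0 and unit standard deviation.
```

In practice the mean and standard deviation should be computed on the training set only and then applied to the test set, to avoid information leakage.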

Description of ML models. Proposed method (CNN).
The proposed method of this study combines convolutional neural networks and fully-connected (FC) neural networks for winter wheat yield prediction. The convolutional part of the model takes the weather variables measured through the growing season as input and captures their temporal and nonlinear effects using 1-dimensional convolution operations. All six weather variables are fed separately through the same CNN, and the corresponding CNN outputs are concatenated. The soil and phenology data go to a fully-connected neural network with two layers. We combine the high-level features of the CNN with the output of the fully-connected network for the soil and phenology data. Finally, the combined features pass through three further FC layers before the final yield prediction. The architecture of the model is shown in Fig. 3.
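The architecture above can be sketched in Keras. Filter counts, kernel sizes, and layer widths here are placeholders (the actual values are given in Fig. 3), as is the number of static soil/phenology features; what the sketch shows is the structure: one shared Conv1D branch applied to each of the six weather inputs, a two-layer FC branch for static features, concatenation, and three final FC layers.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

N_WEEKS = 52    # weekly values per weather variable
N_STATIC = 7    # soil + phenology features (illustrative count)

# Shared 1-D convolutional branch applied to each weather variable separately.
conv_branch = tf.keras.Sequential([
    layers.Conv1D(8, kernel_size=5, strides=1, padding="valid", activation="relu"),
    layers.AveragePooling1D(2),
    layers.Flatten(),
])

weather_inputs = [layers.Input(shape=(N_WEEKS, 1)) for _ in range(6)]
weather_feats = layers.Concatenate()([conv_branch(x) for x in weather_inputs])

# Two FC layers for the static soil and phenology features.
static_input = layers.Input(shape=(N_STATIC,))
static_feats = layers.Dense(16, activation="relu")(
    layers.Dense(16, activation="relu")(static_input))

# Combine both branches, then three final FC layers.
combined = layers.Concatenate()([weather_feats, static_feats])
h = layers.Dense(32, activation="relu")(combined)
h = layers.Dense(16, activation="relu")(h)
yield_pred = layers.Dense(1)(h)  # linear output for regression

model = Model(inputs=weather_inputs + [static_input], outputs=yield_pred)

# Forward pass on a dummy batch of 4 instances to check the wiring.
dummy = [np.zeros((4, N_WEEKS, 1), dtype="float32")] * 6 + \
        [np.zeros((4, N_STATIC), dtype="float32")]
preds = model(dummy).numpy()
```

Such a model would be compiled with an MSE loss and a gradient-based optimizer (e.g. Adam) and trained on the standardized inputs.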

Random Forest (RF). RF makes use of multiple trees and uses "bagging" to ensemble decision trees into a single model33. It uses bootstrapping to randomly select K samples from the original dataset.

K-Nearest Neighbor (KNN). KNN predicts the target value of a new instance from the values of its nearest neighbors in the feature space. The Euclidean distance was used in this study to calculate the distances between the data points. It can be represented as Eq. (2), where d is the Euclidean distance, p and q are two data points, n is the number of dimensions, and i is an index:

d(p, q) = √( Σ_{i=1}^{n} (q_i − p_i)² )   (2)
LASSO and Ridge Regression. Lasso and Ridge Regression are two distinct methods of reducing the variance of linear regression models by shrinking their coefficients. The Least Absolute Shrinkage and Selection Operator (LASSO) is a regularization method that can set the coefficients of some features to zero and consequently drop them from the model36. In this sense, Lasso also acts as a feature selection algorithm. In LASSO, an L1 regularization term is added to the objective function of the linear regression model, which shrinks coefficients toward zero. Ridge regularization (L2 regularization), on the other hand, also shrinks coefficients toward zero but never sets any of them exactly to zero37.
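The contrast between L1 and L2 shrinkage is easy to demonstrate with scikit-learn on synthetic data where only three of ten features actually drive the target:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
# Only the first three features matter; the other seven are noise.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] \
    + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)   # L1: can zero out coefficients
ridge = Ridge(alpha=0.5).fit(X, y)   # L2: shrinks, but keeps all nonzero

n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
```

With this setup, Lasso drops the irrelevant features entirely (their coefficients are exactly zero), while Ridge leaves all ten coefficients nonzero, merely small.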

Regression Tree (RT).
A tree is created by splitting the root node into various partitions that constitute the successor nodes. The process is repeated recursively on each partition, a procedure known as binary recursive partitioning38. Initially, all data points are in the same group, and the algorithm partitions the data into binary branches. Across the possible binary splits at each tree level, the tree algorithm used in this paper chooses the split with the minimum sum of squared differences from the mean in the two partitions. The splitting continues until every node becomes a terminal node with a pre-defined minimum node size, or until the node's sum of squared differences from the mean becomes zero (whichever is reached first).

Support vector regression (SVR)
SVR adapts the support vector machine (SVM) algorithm to predict a continuous variable by fitting the best line within a threshold of values (the epsilon-insensitive tube)39. This tube contains the margin of error the model is allowed to have. The distances between the data points and the tube are measured, and the points on or outside the tube boundaries are the support vectors. The objective function of the optimization problem is shown in Eq. (3):

min  (1/2)‖w‖² + C Σ_{i=1}^{n} ξ_i   (3)

where w is the weight vector, whose norm controls the margin between the support vectors; C is a trade-off parameter defined by the user, and ξ_i is the margin of error (slack) for point i.
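A minimal SVR sketch with scikit-learn illustrates the roles of C and epsilon; the data here is a synthetic sine curve, not the study's yield data:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=150)

# epsilon defines the insensitive tube: residuals inside it are not penalized.
# C trades off model flatness against tolerance for points outside the tube.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
score = svr.score(X, y)  # R^2 on the training data
```

Shrinking `epsilon` widens the set of support vectors and tightens the fit; shrinking `C` allows more points to fall outside the tube, producing a smoother function.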
XGBoost. XGBoost stands for extreme gradient boosting (an improved version of gradient boosted decision trees in terms of speed and performance); it was introduced by Tianqi Chen in 201640.
XGBoost is an ensemble of decision trees built sequentially, where each new tree works to correct the errors of the trees before it. In XGBoost, the construction of each individual tree is parallelized (e.g., split finding across features), which significantly boosts the training speed. XGBoost has been widely used for crop yield prediction11,14,41,42.

Deep Neural Networks (DNN) .
DNNs are artificial neural networks with multiple levels of abstraction that can learn the underlying representation of the data without the need for handcrafted features43. The DNN model used in this paper is a feedforward neural network with multiple hidden layers. DNNs apply a nonlinear function to the output of each hidden layer, which makes them highly nonlinear. DNN models are trained with gradient-based optimization methods to minimize the desired loss function for the task at hand. DNN models have shown great success by outperforming traditional machine learning methods in predicting crop yields12,15,16,44-46.
Traditional neural networks with a single hidden layer have also been widely used to estimate DUL and the permanent wilting point47,48. This method was inspired by the structure of the human brain49 and can learn patterns from the training dataset and save them as weighted connections between nodes. After learning, the network recognizes the pattern in a given dataset and predicts the output50.
We used the Python programming language for writing the scripts; the Python visualization libraries Matplotlib and Seaborn for drawing the graphs; and the machine learning and deep learning libraries of Python, including scikit-learn and TensorFlow, for training and testing the prediction models. We also used the QGIS software for plotting the spatial graphs.

Model interpretability.
Recently, there has been growing interest in applying interpretation tools to black-box models such as tree-based ensembles (e.g., Random Forest and Gradient Boosting Trees) and deep learning models (e.g., DNNs, CNNs, and RNNs) to explain yield prediction results11,12,14,51. In general, deep neural network models lead to more accurate predictions than interpretable machine learning models such as Linear Regression. However, the black-box nature of deep learning models does not disclose which features are important in predicting crop yield values, or whether these features decrease or increase the yield outputs. Therefore, in this study, we apply SHAP as a post-hoc method to calculate Shapley values and explain the predictions of our proposed CNN model. In this way, the model achieves both high accuracy and decent interpretability. Shapley values originate in game theory52; they can be used to show the contribution of each feature to the model's outcome. In computing Shapley values, the mean marginal effect of feature values on the prediction results is analyzed by evaluating feature permutations across all features. More important features are assigned greater Shapley values due to their higher contribution to the model results. This paper uses an implementation of Kernel SHAP called KernelExplainer, an efficient, model-agnostic method for estimating SHAP values for any model.

Input combinations and parameter calibration.
We used a combination of weekly meteorological, crop phenological (sowing, flowering, and harvest dates), and soil variables across 271 counties in Germany from 1999 to 2019 to train and test our ML models. This resulted in a total of 5692 instances and 277 column features. We tested various hyperparameter domain values on the validation set to find the hyperparameter setting yielding the lowest error on the test data. The validation dataset was chosen randomly using cross-validation from the data between 1999 and 2016. The best parameter setting was found using grid or randomized search cross-validation with 1000 iterations in a 3-fold cross-validation setting (3000 trials for each model). The ranges of tested hyperparameter values were chosen based on domain knowledge. The tested hyperparameters and the resulting best hyperparameter estimates for the baseline models are shown in Supplementary Information (Table S4). The architecture and hyperparameters of the CNN model are described in Fig. 3.
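The randomized search with 3-fold cross-validation can be sketched with scikit-learn. The search space below is illustrative (the study's actual grids are in Table S4), and `n_iter` is kept small for speed, whereas the paper used 1000 iterations:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Illustrative hyperparameter domains for a Random Forest baseline.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 5],
}

# Randomly sampled settings evaluated with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
best = search.best_params_
```

`GridSearchCV` follows the same interface when an exhaustive sweep over a small grid is preferred.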

Model evaluation.
The performance of the prediction models was evaluated using the mean absolute error (MAE) [Eq. (4)], root mean square error (RMSE) [Eq. (5)], and correlation coefficient (r) [Eq. (6)] metrics:

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|   (4)

RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )   (5)

r = Σ_{i=1}^{n} (y_i − ȳ)(ŷ_i − ŷ̄) / √( Σ_{i=1}^{n} (y_i − ȳ)² · Σ_{i=1}^{n} (ŷ_i − ŷ̄)² )   (6)

ŷ_i, ȳ, and y_i stand for the predicted, average, and ground truth target values, respectively, and n is the total number of data points.
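The three metrics are a few lines of NumPy each; the toy yield values below are illustrative:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (4)."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (5)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def corr(y_true, y_pred):
    """Pearson correlation coefficient, Eq. (6)."""
    return np.corrcoef(y_true, y_pred)[0, 1]

# Toy example: four yield observations (tons/ha) and their predictions.
y_true = np.array([6.0, 7.5, 8.0, 5.5])
y_pred = np.array([6.2, 7.0, 8.1, 5.0])
```

Note that RMSE penalizes large errors more heavily than MAE, while r measures only the strength of the linear association, not the error magnitude.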

Results
Prediction results. Using the best hyperparameter settings achieved through hyperparameter tuning, three models were trained and validated for each of the machine learning methods in the study. The train-test splitting is done using a non-random hold-out method based on the time variable to ensure valid training of the models for predicting future yield and to avoid retroactive prediction. To enhance the reliability of the training and testing results, one model was trained on the data from 1999 to 2016 (training data) and tested on the data of 2017 (test data), another was trained on the data from 1999 to 2017 and tested on 2018, and the last was trained on the data from 1999 to 2018 and tested on the data of 2019. The prediction performance results based on the train and test datasets are shown in Table 1.
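The time-based hold-out splits can be expressed as simple year filters; the toy table below stands in for the real data (5692 county-year rows, 277 features):

```python
import numpy as np
import pandas as pd

# Toy yield table: 3 counties observed over 1999-2019.
df = pd.DataFrame({
    "year": np.repeat(np.arange(1999, 2020), 3),
    "county": list(range(3)) * 21,
    "yield": np.random.default_rng(4).normal(7.0, 1.0, 63),
})

def holdout_by_year(df, test_year):
    """Non-random hold-out: train strictly on years before test_year."""
    train = df[df["year"] < test_year]
    test = df[df["year"] == test_year]
    return train, test

# The three splits mirroring the paper's setup.
splits = {y: holdout_by_year(df, y) for y in (2017, 2018, 2019)}
```

Splitting on time this way guarantees that no future information leaks into the training set, unlike a random shuffle split.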
Based on the results shown in Table 1, our proposed model (CNN) outperformed the other tested methods to varying extents due to its highly nonlinear structure and its ability to capture the temporal dependencies of the weather data. The XGBoost and DNN models showed comparable performance: DNN was slightly better in some test years, and XGBoost was marginally better in others. DNN outperformed the other baseline models (except XGBoost) on all validation metrics, as the DNN model is highly nonlinear and automatically discovers the relationship between the input data and the yield by extracting relevant features from the input data. The linear models, i.e., Lasso and Ridge Regression, performed poorly compared to the nonlinear models due to their linear structure, which fails to capture the nonlinear effects of weather and soil conditions. XGBoost outperformed the Random Forest and Regression Tree models in most test years due to its use of gradient boosted trees, which significantly improves its performance. The KNN and SVR models had comparable performance, and both had higher prediction accuracy than the linear models because they capture the nonlinear structure of the variables. Although extensive hyperparameter tuning was conducted to find the set of hyperparameters leading to the minimum cross-validated prediction error, slight overfitting remains in the prediction results of some baseline models such as KNN, SVR, and Random Forest. We plotted the ground truth and predicted yield values of all the models using hexagonal plots in Fig. 4 to see how close the predictions are to the ground truth values. Data values in the hex-binned plots are covered with hexagon arrays; the color of each hexagon is based on the number of observations it covers: darker hexagons show higher densities of data points, and lighter hexagons represent lower densities. The deviation of the data points from the 1:1 line shows the distribution of residuals (for a better view of the residual distribution, refer to Fig. 5). As shown in Fig. 4, for the XGBoost, DNN, and CNN models, the predicted and ground truth values have a significantly stronger association compared to the other models, with CNN having the strongest association [XGBoost (r = 0.71), DNN (r = 0.70), and CNN (r = 0.78)]. In other words, more than 60 percent of the variability in the ground truth values can be explained by the predictions of the CNN model (R² = 0.61). Based on Fig. 4, KNN, Lasso Regression, and Regression Tree have similar performance with respect to the correlation coefficient between the ground truth and the predicted yield value; Ridge Regression apparently has the worst performance, as most of the predicted value hexagons are off-diagonal (it also has the lowest correlation coefficient among the models). The range of predicted yield for the SVR model is very narrow, showing that the SVR model predicts most instances between 6 and 7 tons ha−1 and cannot be a reliable predictor of yield values. Although the primary goal of predictive modeling is to minimize the validation error and make the most accurate predictions, performing residual analysis helps us choose the best-performing model with more confidence. Figure 5 shows the distribution of prediction errors for the top three most accurate models in Table 1 (XGBoost, DNN, and CNN) using the test data (2019).
Based on Fig. 5 and the p-value results of the Shapiro-Wilk and Anderson-Darling normality tests, we do not have sufficient evidence to reject the null hypothesis of normality for XGBoost, DNN, and CNN (p-values around the 0.05 threshold; borderline); thus, we conclude that the prediction residuals are approximately normally distributed for these models. Again, as this paper focuses on predictive modeling using machine learning techniques, "we treat the data mechanism as unknown," and the data modeling assumptions of inferential statistics are less of a concern33.
The spatial distribution of the prediction error for winter wheat yield using the proposed CNN, DNN, and XGBoost models across the 271 counties in 2019 is shown in Fig. 6a-c. Fig. 6d shows the distribution of observed yield values across the counties in the test year (2019). This figure helps identify counties with higher error percentages, and diving deeper into this information could reveal helpful patterns to improve the quality of data collection in these regions. Based on the CNN model results, the error percentage varies from 0.1 to 77.6% across the counties in 2019. Counties with higher prediction errors are primarily located in the eastern and northeastern parts of Germany; central counties show lower error values. Although the range of the percentage error values is higher for the proposed CNN model than for the DNN and XGBoost models, based on Table 1, the overall performance of the proposed CNN model is significantly better than these baselines. The prediction percentage error is calculated using Eq. (7), where A_i and P_i stand for the actual and predicted yield values in county i:

Error (%) = |A_i − P_i| / A_i × 100   (7)
Analysis. Feature importance explanation. We predicted the yield using weekly weather data, soil conditions, and crop phenology variables. In this section, we explain the importance of each feature in the predictions made by our proposed CNN model to find the relative importance of the features. Weather and soil data are generally represented by numerous variables, which do not have equal effects or importance in yield prediction. It is therefore important to identify the most influential features and help farmers and agronomists focus on those that matter most for increasing yield. In this paper, we used SHAP as a post-hoc method to explain the prediction outputs and find the input variables with higher contributions to the model outputs. Features with larger/smaller Shapley values show a higher/lower contribution to the predicted yield outputs. The average absolute Shapley value per feature across the data is calculated to find the global importance. The results are shown in Fig. 7. Red bars show features that positively correlate with the yield impact (thus positively impacting the yield value), and blue bars show features that negatively correlate with the yield impact (thus negatively impacting the yield value).
In addition, to further understand the feature effects and their importance, we created a series of force plots; the plots for some important features (refer to Fig. 7) in yield prediction are shown in Supplementary Information (Figs. S2-S5). They explain the contribution of several important features to the yield prediction values. The vertical axis in these plots shows the predicted yield value (in tons ha−1), and the horizontal axis refers to the selected feature values; the intercept of the two axes marks the average predicted yield. Additionally, using SHAP explanation force plots, three instances with the highest, median, and lowest yield values were selected to explain the effects of their features on the predicted yield (Fig. 8). Force plots are effective in interpreting the model's prediction for critical instances, showing the contribution of each feature to the predicted value. Fig. S6 (in Supplementary Information) shows the estimated effects of the weather components on the predicted winter wheat yield during 45 weeks of 2019. Out of the six weather variables examined, wind speed, minimum temperature, and radiation have the highest predictive importance, shown by normalized Shapley values of 13.20, 12.03, and 9.56, respectively. These observations make sense from an agronomic perspective and are in line with the findings of other studies suggesting that temperature-related indicators have a high crop yield prediction power53. The higher Shapley values for minimum temperature and wind speed compared to the other weather components around weeks 35-37 coincide with the reproductive phase. This stage is the growth stage most sensitive to low temperatures and wind speed, which affect photosynthesis through changes in stomatal conductance and leaf temperature54.
Based on Fig. 7, minimum and maximum temperature are among the top features contributing to the yield prediction outputs. Several studies have reported that increased minimum temperature reduces cold- and frost-induced damage to the seedlings55,56, leading to increased crop yield and warming-induced increases in crop grain weight57. From an agronomic point of view, the higher Shapley values of minimum temperature around week 31 compared to the other weather components (Fig. S6) coincide with the flowering phase. On the other hand, high temperature can negatively affect winter wheat grain numbers, grain size58,59, and the duration of the grain filling period60, resulting in reduced crop yields. Furthermore, the adverse yield effects of high temperatures are also associated with water and heat stress conditions around the sensitive stages of the crop growth period. This may lead to a reduced photosynthesis rate, increased respiration, a higher rate of leaf senescence, and increased evapotranspiration, eventually reducing the grain numbers61. Precipitation is an equally crucial predictive factor for winter wheat and crop yield in general. This could be attributed to the fact that crops are exposed to drought stress under low precipitation, especially during the reproduction phase. Grain yields can thus be negatively affected due to obstructed nutrient uptake, and in conjunction with reduced surface cooling, crop canopy temperatures rise, leading to a further decline in photosynthetic rates62. Higher Shapley values of precipitation compared to the other weather variables around week 25, a period that could be linked to stem elongation, indicated the higher predictive importance of this variable around this time. Stem elongation is a critical phase for yield formation, and water is crucial for the expansive growth processes of the crop: cell expansion, pollen ripening, spike growth, and grain filling63. In terms of soil variables (refer to Figs. 7 and 8), our analysis showed that DUL, SAT, and BD are important for yield prediction. This observation aligns with the findings of Peichl et al.64, stating that soil moisture is the most critical variable for predicting winter wheat. The studied soil factors are known to affect the available crop water, which directly impacts growth and development. For instance, sandy soils with a lower water holding capacity may pose water stress conditions during the winter wheat growth period, hampering the yield65. On the other hand, excessive water in the soil (waterlogged conditions) can lead to aeration stress, damaging the plants and increasing yield losses66. It has also been reported that excessive soil water encourages the development of pathogens and hinders necessary management operations67.
Feature selection. To evaluate the performance of the feature importance metric (Shapley values), we studied the prediction results using an importance-based feature selection technique. We ranked the features by their Shapley values and selected the 75 and 50 percent most important features for the analysis. Table 2 shows the yield prediction performance of the proposed CNN model with these feature selection thresholds. Based on the results, the performance of the proposed CNN model not only did not drop significantly as a result of feature selection but even improved slightly, which confirms that the SHAP explanation method can effectively find the important features in winter wheat yield prediction. In addition, the proposed CNN model was retrained using only the weekly weather variables to evaluate the yield prediction performance using weather features alone.
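The importance-based selection step reduces to ranking and truncation; the importance scores below are random stand-ins for the mean absolute Shapley values:

```python
import numpy as np

# Hypothetical per-feature importance scores (mean |Shapley value|).
rng = np.random.default_rng(5)
importance = rng.uniform(size=20)

def top_fraction(importance, frac):
    """Return indices of the most important features, keeping `frac` of them."""
    k = int(round(len(importance) * frac))
    return np.argsort(importance)[::-1][:k]

keep75 = top_fraction(importance, 0.75)  # the top 15 of 20 features
keep50 = top_fraction(importance, 0.50)  # the top 10 of 20 features
```

A model would then be retrained on `X[:, keep75]` (or `keep50`), and its test error compared against the full-feature model, as in Table 2.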

Conclusions
In this paper, we proposed a convolutional neural network (CNN) model with 1-D convolution operations to predict winter wheat yield using weather, soil, and crop phenology data across 271 counties in Germany over 21 years (1999 to 2019). We used eight supervised machine learning models, including KNN, Random Forest, XGBoost, Regression Tree, Lasso and Ridge Regression, SVR, and DNN, as baselines against which to compare the performance of our proposed model. Based on the chosen validation metrics (RMSE, MAE, and correlation coefficient), the results of our analysis showed that nonlinear models such as DNN, CNN, and XGBoost outperform linear models to varying extents, because crop yield is a highly complicated trait that depends on numerous interactive factors such as genotype and environmental conditions. Thus, nonlinear models are more effective at finding the functional relationship between crop yield and the input data. The results also revealed that our proposed CNN model outperforms all baselines across the test years of 2017, 2018, and 2019, with 7 to 14% lower RMSE, 3 to 15% lower MAE, and 4 to 50% higher correlation coefficient than the best performing baseline across the test years. We also moved beyond prediction in our analysis and measured feature importance using Shapley values on the test data to explain the important features in the predictions made by our proposed black-box CNN model. Furthermore, we visualized how changing the values of the underlying features increases or decreases the winter wheat yield prediction values, and identified the important ranges of values for each feature and their contribution to the predicted yield outputs. We also provided instance-level force plot visualizations and studied the important features and their respective values for instances at various predicted yield levels (the highest, lowest, and median). This information helps evaluate the important features at the middle and both ends of the prediction distribution. Additionally, we estimated the contribution of the weather components (precipitation, minimum and maximum temperature, radiation, relative humidity, and wind speed) across different weeks of the year to the winter wheat yield prediction results. This information is a helpful asset for farmers and agricultural scientists to understand the important time periods of the weather components for yield. Finally, we sorted the entire set of weather, soil, and phenological features based on their estimated effects on the prediction results (Shapley values) and conducted feature selection on them; we then retrained the proposed CNN model using the top 75 and 50% of features, and also using weather features only, and recorded the test results. This helped us assess the performance of our feature importance evaluation metric (Shapley values); we found that the performance of the proposed CNN model did not drop drastically when using only a subset of the most important features compared to the entire feature set, confirming the robustness of our feature importance method. The primary limitation of the proposed CNN model in this research is its black-box nature; we could alleviate the lack of transparency in the predictions made by our proposed model using post-hoc explanation tools such as SHAP. However, some mathematical problems have been identified with using Shapley values as feature importance tools (i.e., for causal reasoning)68; they are also not considered a natural explanation tool based on human reasoning68. Explaining black-box models using surrogate tools instead of building inherently interpretable models may not be a sound scientific practice69. Thus, the future direction of this research is to build more advanced models that are both highly accurate and inherently interpretable.

Figure 1 .
Figure 1. Pearson correlation heatmap of weather, phenology, soil, and yield data; darker colors show high correlation (close to 1), and lighter colors show low correlation (close to 0).

Figure 2 .
Figure 2. Frequency distribution of growing season length (from sowing to harvest) across the 271 counties in 2019.

Figure 3 .
Figure 3. Network architecture of the proposed CNN. The convolutional layer parameters are shown as "(convolution type-number of filters-kernel size-stride size)". Conv, AvgPool, and Dense stand for standard convolutional, average pooling, and fully connected layers, respectively. The padding type is "valid" for all layers. C shows matrix concatenation and T refers to matrix transpose. The activation function of all layers except the last is ReLU.

Figure 4 .
Figure 4. Hexagonal plots of the predicted winter wheat yield vs. ground truth yield values for the nine machine learning models in the test year of 2019.

Figure 5 .
Figure 5. Distribution of prediction errors of XGBoost, DNN, and CNN models.

Figure 6 .
Figure 6. Percentage prediction error in winter wheat yield using (a) CNN, (b) DNN, (c) XGBoost models, and (d) distribution of observed yield (ground truth) in 271 counties. Counties in dark gray indicate non-availability of ground truth data in 2019. Figures were created using QGIS software version 3.18.3, https://qgis.org/en/site/.

Figure 7 .
Figure 7. Shapley values for the top 15 features in winter wheat yield prediction in 2019 (For the weather variables on the vertical axis, the variable name refers to the title of the weather variable plus the week of the year the information was recorded, separated by an underscore sign) (HarvestDOY Harvest day of the year, LL crop available water at permanent wilting point, SAT crop available water at the saturation point, DUL crop available water at the field capacity, BD soil bulk density).

Figure 8 .
Figure 8. Force plots to explain the predicted yield values for instances with the minimum (a), maximum (b), and median (c) yield values in 2019.

Table 1 .
Training and validation results (RMSE, MAE, and r (correlation coefficient)) of the baseline machine learning models and the proposed CNN model to predict winter wheat yield.

Table 2 .
Test results of using different subsets of features for predicting winter wheat yield with the proposed CNN model.