A time-dependent parameter estimation framework for crop modeling

The performance of crop models in simulating various aspects of the cropping system is sensitive to parameter calibration. Parameter estimation is challenging, especially for time-dependent parameters such as cultivar parameters, which have a lifespan of 2–3 years. Manual calibration of the parameters is time-consuming, requires expertise, and is prone to error. This research develops a new automated framework that estimates time-dependent parameters for crop models using a parallel Bayesian optimization algorithm. The approach integrates the power of optimization and machine learning with prior agronomic knowledge. To test the proposed time-dependent parameter estimation method, we simulated the historical yield increase (from 1985 to 2018) in 25 environments in the US Corn Belt with APSIM. We then compared the yield simulation results and nine parameter estimates from the proposed parallel Bayesian framework against standard Bayesian optimization and manual calibration. Parameters calibrated using the proposed framework reduced prediction error by 11.6% relative to Bayesian optimization and by 52.1% relative to manual calibration. We also trained nine machine learning models for yield prediction and found that none of them outperformed the proposed method in terms of root mean square error and R². The most significant contribution of the new automated framework is its capability to find close-to-optimal time-dependent parameters for the crop model. The proposed approach also produced explainable insight into trends in cultivar traits over 34 years (1985–2018).

Algorithm 1 Bayesian optimization
1: Input: data set D = {(x_0, y_0)}; maximum number of iterations T.
2: Output: a local optimal solution x* ∈ R^{1×p}.
3: Step 0: Set the incumbent solution x* = x_0. Set K, σ, and u as the type of kernel, the set of kernel parameters, and the acquisition function.
4: for t = 1 to T do
5:   Step 1: Use the GP to update the posterior probability f̂ and construct the acquisition function u.
6:   Step 2: Find x_t by optimizing the acquisition function u over the function f: x_t = arg max_x u(x | D).
7:   Step 3: Sample x_t to calculate the objective value y_t = f(x_t) and augment the data, D = {D, (x_t, y_t)}.
8:   Step 4: Update the incumbent solution x* = x_{arg max(y)}.
9: end for

Gaussian Processes
A GP, as a stochastic process, is a generalization of the Gaussian distribution such that any finite subset of it has a multivariate Gaussian distribution [3]. A distribution over functions f(·) ∼ GP(µ(·), k(·, ·)) is a Gaussian process, where the mean function µ(·) and covariance kernel k(x, x′) for any pair of input points x, x′ ∈ R^{1×p} are defined as

µ(x) = E[f(x)],
k(x, x′) = E[(f(x) − µ(x))(f(x′) − µ(x′))].

For a given finite set of input points X = {x_1, …, x_n}, the function values f(X) = [f(x_1), …, f(x_n)] follow a multivariate Gaussian distribution, f(X) ∼ N(µ(X), K(X, X)), where K(X, X) is the n × n covariance matrix with entries K_ij = k(x_i, x_j).
For convenience, BO assumes a zero-mean Gaussian process (µ(·) = 0), so the GP's power to fit the posterior distribution rests entirely on the covariance function. Various types of kernels have been defined for the covariance function k; among the most popular is the squared-exponential kernel,

k(x, x′) = σ_f² exp(−‖x − x′‖² / (2σ_l²)),

where σ_f is a signal standard deviation parameter, σ_l is a characteristic length-scale parameter, and x, x′ ∈ R^{1×p} are a pair of input points. At each iteration, BO uses the observed samples D = {(X, f(X))} to fit posterior probabilities for a new query point x̂ by applying the GP. The new query point and the observed samples share a joint Gaussian distribution,

[f(X); f(x̂)] ∼ N(0, [K(X, X), k(X, x̂); k(x̂, X), k(x̂, x̂)]).

By applying Bayes's rule to compute the posterior probability, we have

f̂(x̂) | D ∼ N(k(x̂, X) K(X, X)⁻¹ f(X), k(x̂, x̂) − k(x̂, X) K(X, X)⁻¹ k(X, x̂)).

Instead of a scalar value for the new query point x̂, the GP returns a probability distribution over all possible values of f(x̂); if D is large enough, the GP can closely estimate the distribution of f(x̂).
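The zero-mean posterior update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names (`sq_exp_kernel`, `gp_posterior`) and the jitter term added for numerical stability are our own assumptions, and only the squared-exponential kernel is shown.

```python
import numpy as np

def sq_exp_kernel(A, B, sigma_f=1.0, sigma_l=1.0):
    # squared-exponential kernel: k(x, x') = sigma_f^2 exp(-||x - x'||^2 / (2 sigma_l^2))
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma_l**2))

def gp_posterior(X, y, Xq, sigma_f=1.0, sigma_l=1.0, jitter=1e-10):
    """Zero-mean GP posterior mean and variance at query points Xq, given D = {(X, y)}."""
    K = sq_exp_kernel(X, X, sigma_f, sigma_l) + jitter * np.eye(len(X))  # K(X, X)
    ks = sq_exp_kernel(Xq, X, sigma_f, sigma_l)                          # k(x_hat, X)
    mu = ks @ np.linalg.solve(K, y)                                      # posterior mean
    reduction = np.sum(ks * np.linalg.solve(K, ks.T).T, axis=1)          # k K^-1 k^T (diag)
    var = sq_exp_kernel(Xq, Xq, sigma_f, sigma_l).diagonal() - reduction # posterior variance
    return mu, np.maximum(var, 0.0)
```

At the observed points the posterior mean reproduces the data with near-zero variance; far from the data it reverts to the prior mean 0 with variance σ_f².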

Acquisition Functions
After constructing the posterior distribution of the function f(x), BO tries to find a new test point x̂ that maximizes the function f. For this purpose, BO utilizes an acquisition function u to determine the next test point x̂, accounting for the uncertainty in the posterior distribution and the trade-off between exploration and exploitation. Maximizing the acquisition function serves as a proxy for maximizing the function f. In general, the acquisition function u relies on the previous sample observations and the GP hyperparameters. Several acquisition functions have been defined, as follows.

Probability of Improvement is an intuitive utility function that maximizes the probability of improving over the incumbent solution x* [4]:

u_PI(x̂) = P(f(x̂) ≥ f(x*)) = Φ((µ(x̂) − f(x*)) / σ(x̂)),

where Φ(·) is the cumulative distribution function of the standard normal distribution. The idea is intuitive because it draws the sample x̂ from the areas near the incumbent solution x*, which is the optimal solution of the existing acquisition function. As one of its drawbacks, however, it focuses on exploitation, samples within a limited range, and easily falls into local maxima. To address this drawback, a parameter ε is added to the PI acquisition function such that the

difference between the value of the next sample point x̂ and the incumbent solution x* is at least ε. Hence, the PI function becomes

u_PI(x̂) = Φ((µ(x̂) − f(x*) − ε) / σ(x̂)).
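As a concrete sketch, PI with the ε margin reduces to a single call to the standard normal CDF. The function name and the small floor on σ (to avoid division by zero at already-sampled points) are our own assumptions, not part of the paper's code.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best, eps=0.01):
    """PI(x) = Phi((mu(x) - f(x*) - eps) / sigma(x)); larger eps favors exploration."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero posterior std
    return norm.cdf((mu - f_best - eps) / sigma)
```

When the posterior mean exceeds the incumbent by exactly ε, PI = 0.5; points with a higher mean score closer to 1.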
Expected Improvement maximizes the expected degree of improvement over the incumbent solution x*, where the degree of improvement I(x̂) = max{0, f(x̂) − f(x*)} is the difference between the value of the next sample point x̂ and the incumbent solution x* [4]. The expected improvement acquisition function is defined as

u_EI(x̂) = (µ(x̂) − f(x*)) Φ(Z) + σ(x̂) ϕ(Z) if σ(x̂) > 0, and 0 otherwise,

where Z = (µ(x̂) − f(x*)) / σ(x̂), and ϕ(·) is the probability density function of the standard normal distribution. A comparison between the PI function and the EI function shows that EI is less prone to falling into local maxima.
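The closed-form EI above translates directly to code. This is a minimal sketch under the paper's definition (EI set to 0 where σ = 0); the function name and the division-by-zero guard are our own.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """EI(x) = (mu - f*) Phi(Z) + sigma phi(Z), Z = (mu - f*) / sigma; 0 where sigma == 0."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    improve = mu - f_best
    safe_sigma = np.where(sigma > 0.0, sigma, 1.0)   # avoid division by zero
    Z = improve / safe_sigma
    ei = improve * norm.cdf(Z) + sigma * norm.pdf(Z)
    return np.where(sigma > 0.0, ei, 0.0)            # EI defined as 0 when sigma == 0
```

Note that EI is always non-negative and grows with σ even when µ does not exceed f(x*), which is what gives it its exploratory behavior.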
GP Upper Confidence Bound, a more recent acquisition function, trades off exploitation of regions with a high expected value against exploration of regions with high posterior uncertainty [4]. This acquisition function is defined as

u_UCB(x̂) = µ(x̂) + K σ(x̂),

where the parameter K sets the balance between exploitation and exploration.
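Putting Algorithm 1 together with the GP posterior and an acquisition function, a minimal self-contained BO loop might look like the sketch below. It is an illustration under our own assumptions, not the paper's parallel framework: we use the squared-exponential kernel, the EI acquisition, and a fixed candidate grid in place of a proper inner optimizer for arg max u; all function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def sq_exp_kernel(A, B, sigma_f=1.0, sigma_l=0.5):
    # squared-exponential covariance between the row vectors of A and B
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma_l**2))

def gp_posterior(X, y, Xq):
    # zero-mean GP posterior mean and standard deviation at query points Xq
    K = sq_exp_kernel(X, X) + 1e-8 * np.eye(len(X))
    ks = sq_exp_kernel(Xq, X)
    mu = ks @ np.linalg.solve(K, y)
    var = sq_exp_kernel(Xq, Xq).diagonal() - np.sum(ks * np.linalg.solve(K, ks.T).T, 1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, f_best):
    Z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(Z) + sigma * norm.pdf(Z)

def bayes_opt(f, lo, hi, T=25, n_grid=512, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(3, 1))               # Step 0: initial design
    y = np.array([f(x[0]) for x in X])
    grid = np.linspace(lo, hi, n_grid)[:, None]        # candidate pool for arg max u
    for _ in range(T):
        mu, sigma = gp_posterior(X, y, grid)           # Step 1: update posterior, build u
        u = expected_improvement(mu, sigma, y.max())
        x_t = grid[np.argmax(u)][None, :]              # Step 2: maximize acquisition
        y_t = f(x_t[0, 0])                             # Step 3: evaluate objective
        X, y = np.vstack([X, x_t]), np.append(y, y_t)  # ... and augment D
    return X[np.argmax(y), 0], y.max()                 # Step 4: incumbent solution
```

For example, maximizing the toy objective f(x) = −(x − 2)² on [0, 6] drives the incumbent toward x* ≈ 2 within a few dozen iterations.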