Abstract
Empirical yield estimation from satellite data has long lacked suitable combinations of spatial and temporal resolutions. Consequently, the selection of metrics, i.e., temporal descriptors that predict grain yield, has likely been driven by practicality and data availability rather than by systematic targeting of critically sensitive periods as suggested by knowledge of crop physiology. The current trend towards hyper-temporal data raises two questions: How does temporality affect the accuracy of empirical models? Which metrics achieve optimal performance? We followed an in silico approach based on crop modelling, which can generate any observation frequency, explore a range of growing conditions and reduce the cost of measuring yields in situ. We simulated wheat crops across Australia and regressed six types of metrics derived from the resulting time series of Leaf Area Index (LAI) against wheat yields. Empirical models using advanced LAI metrics achieved national relevance and, contrary to simple metrics, did not benefit from the addition of weather information. This suggests that they already integrate most climatic effects on yield. Simple metrics remained the best choice when LAI data are sparse. As we progress into a data-rich era, our results support a shift towards metrics that truly harness the temporal dimension of LAI data.
Introduction
Estimating crop production, particularly that subject to international trade, is becoming an urgent imperative for global food security due to the growing world population, shifts in diets, and the development of biofuels. Instead of quantifying production directly, it is common practice to address its constituent terms: crop area and crop yield. The latter can itself be thought of as the product of genetic, environment, and management (G × E × M) factors. Agrometeorological models have long been deployed with some success to estimate yields on a regional basis, either based on statistical relationships relating yield to meteorological data or based on crop growth models that not only relate weather parameters to yield but also explain plant growth1,2,3. Satellite remote sensing has gradually become instrumental in assessing crop yields because several Vegetation Indices (VIs) derived from spectral data integrate some G × E × M effects, from meteorological factors such as precipitation and solar radiation to cropping practices such as fertilisation and irrigation. In fact, there is a considerable amount of within-field variance that is not explainable by meteorological data but that can be accounted for by multi-spectral satellite data4.
Numerous methods have been devised to predict crop yield from satellite data, including, for instance, the assimilation of satellite data into crop growth or light-use efficiency models5,6. A more straightforward approach to estimating crop yields is to establish an empirical relationship between ground-based yields and VIs or metrics describing VI time series. These empirical models rely on the correlation between spectral bands (and their combinations) and biophysical properties of the crops, such as Leaf Area Index (LAI), which are themselves related to final yields7. A large range of VIs, including the well-known Normalised Difference Vegetation Index, has been tested with mixed results in different regions and for different crops8,9,10,11. Capitalising on the ability to retrieve biophysical variables from satellite data12,13, some studies have empirically correlated biophysical variables with yield14,15. The principle remains the same for biophysical variables as for VIs: metrics are first extracted from time series of biophysical data and are then related to measured yields.
The lack of sufficient field- or pixel-level yield measurements for model calibration and validation has long hindered attempts to deploy empirical models at scale for operational monitoring. In addition, empirical models are specific to the crop cultivars, the crop growth stages, and the geographical regions they are calibrated on16,17. Therefore, they do not generalise well in data-poor contexts. An elegant solution, referred to as the scalable satellite-based crop yield mapper or SCYM, was recently proposed to overcome the lack of calibration data18. In essence, SCYM calibrates empirical yield models with modelled data obtained from the Agricultural Production System sIMulator19 (APSIM), a thoroughly validated crop model, rather than with in situ yield measurements. The role of the crop model is to generate a large number of simulations that span a realistic range of soil, climate, and management conditions in the region of interest so that robust statistical relationships may be established between yield and crop canopy descriptors. SCYM models can then be transferred and applied to satellite images to map yields across vast areas. SCYM has been tested for multiple crops and countries and explained, for instance, half of the wheat yield variability in India20. While the accuracy of SCYM models can still be improved in several ways, its strategy of using simulations from regionally tuned and parameterised crop models to calibrate empirical models in lieu of costly in situ yield measurements paved the way to deploying empirical yield models anywhere in the world.
Empirical yield estimation from space has also been constrained by the trade-off between spatial and temporal resolution, which has restricted the use of imagery with both high spatial and high temporal resolution for agricultural applications21. Limited data availability either hampered multi-temporal analyses or confined them to time series with coarser spatial resolution, leading to pixel purity issues22 that are particularly challenging in complex landscapes23. Consequently, it is likely that the choice of metrics, i.e., time series descriptors used to predict grain yield, was driven by practicality and data availability rather than by systematic targeting of critically sensitive periods suggested by knowledge of crop physiology. Constellations of satellites, e.g., Sentinel-2 A and B24 (5-day revisit, 10-m resolution) or the Dove constellation from Planet25 (daily global coverage at 3 m with 175+ satellites), have opened an avenue for overcoming these spatiotemporal restrictions. Therefore, the advent of hyper-temporal data offers an unprecedented opportunity to revisit empirical yield estimation and to exploit finer and denser temporal patterns.
Our overarching goal is to advance routine yield assessment across the Australian grain zone based on satellite observations and to do so by leveraging the capabilities of the most recent and upcoming imaging systems. Here we evaluated how the density of LAI observations affects performance and the choice of metric required to achieve optimal performance. We premised our work on three observations widely supported by evidence from the literature: (1) crop growth models can accurately simulate plant growth, yield, and leaf area. For instance, the wheat model within APSIM has been extensively validated across Australia and internationally in a range of experimental and farm conditions26,27,28,29,30,31,32,33; (2) leaf area index can be retrieved from satellite images13,34; (3) empirical yield models calibrated on simulated yields, i.e., à la SCYM18,20, provide reasonable spatially explicit yield estimates when applied to remotely-sensed data. The direct implication is that data generated by crop models can be used to evaluate in silico different forms of empirical models.
The in silico approach has the following advantages: (1) the range of G × E × M conditions that can be explored in silico is larger than what observational data would otherwise allow, which improves generalisation; (2) in silico data can simulate forthcoming temporal resolutions or mimic current imaging systems lacking sufficient archive data, which facilitates systematic comparisons; and (3) in silico testing provides these results at a fraction of the otherwise prohibitive costs of image acquisition and in situ yield measurement. Here, we deployed APSIM to simulate wheat growth during 30 consecutive seasons under 10 management scenarios at 50 locations across Australia. These simulations provided time series of LAI and yields, which were used to calibrate and evaluate different types and configurations of empirical models. It should be emphasised that our objective was neither to predict past wheat yields nor to apply our findings to remotely-sensed data. Rather, we sought to generate likely LAI time series and yields under a variety of growing conditions and to evaluate the strength of their relationship under different scenarios representing present and forthcoming observation capabilities.
Our main contributions are three-fold:
-
We provide a systematic comparison of LAI metrics and highlight that linear empirical models with advanced metrics (e.g., Senescence Fit or Fourier Decomposition) capture up to 80% of the yield variability. This is remarkable because empirical models have often been criticised for generalising poorly. This suggests that models using metrics that truly harness the temporal dimension of LAI data can achieve regional to national relevance;
-
We evaluate the contribution of weather variables to the overall performance and show that they can double the accuracy of models calibrated with simple metrics such as peak LAI. Average and cumulative maximum temperatures, as well as cumulative post-anthesis rainfall, are particularly strong predictors of grain yield. However, with advanced metrics, there is no significant improvement when adding weather variables because their effects are already captured by the metrics;
-
We quantify the loss of accuracy that occurs when the temporal resolution decreases. In particular, we show that simple metrics remain competitive in data-poor contexts.
These results can serve as a guideline for selecting an appropriate metric depending on the temporal availability of earth observation data at hand.
Results
Accuracy of the prediction models without weather variables
Wheat crops were simulated under nine management strategies for 15 years at 50 locations representative of the Australian grain zone, from which we obtained daily LAI, thermal time, phenological stages, and the associated yields. Six types of metrics were then extracted from the LAI time series (Peak LAI, Early/Late Windows, Integral, Partial Integral, Senescence Fit, and Fourier Decomposition) for three time scales (calendar, thermal and phenological time, the last two adjusting the time series for growth rate). The metrics were finally regressed against grain yields. We computed the R2 (Fig. 1A,C) and the RMSE (Fig. 1B,D) to evaluate the performance of the regression models with and without weather variables.
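To make the thermal time scale concrete, the sketch below re-expresses a daily LAI series on a regular thermal-time axis. It is a simplification under stated assumptions: daily thermal time is taken as mean air temperature above an assumed base temperature of 0 °C, the series is assumed to start at sowing, and the 50 °C d grid step and function names are illustrative. APSIM computes thermal time with its own, more detailed temperature response, so this is not the calculation used in the study.

```python
import numpy as np


def daily_thermal_time(tmax, tmin, t_base=0.0):
    """Simplified daily degree-days: mean air temperature above an assumed base.
    NOTE: APSIM uses a different, more detailed thermal time calculation."""
    tmean = (np.asarray(tmax, float) + np.asarray(tmin, float)) / 2.0
    return np.clip(tmean - t_base, 0.0, None)


def to_thermal_time(lai, tmax, tmin, step=50.0):
    """Re-express a daily LAI series on a regular thermal-time grid (°C d).
    Assumes every day accumulates some thermal time, so `tt` is increasing."""
    tt = np.cumsum(daily_thermal_time(tmax, tmin))   # cumulative °C d since day 1 (sowing)
    grid = np.arange(0.0, tt[-1] + 1e-9, step)       # regular thermal-time axis
    return grid, np.interp(grid, tt, lai)            # linear interpolation of LAI
```

Resampling onto a common thermal-time grid lets crops that developed at different speeds be compared point by point, which is the sense in which the thermal scale adjusts the series for growth rate.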
Between 30% and 78% of the yield variability was explained by LAI features, depending on the choice of features and time scale. Note that the Peak LAI and the Early/Late Windows approaches are by definition insensitive to a change in the time scale. Peak LAI consistently explained the least variance (R2 = 0.36; RMSE = 1,100 kg ha−1), followed closely by the two windows metrics (R2 = 0.41; RMSE = 1,032 kg ha−1). Using calendar time, the Partial Integral metric reached the highest coefficient of determination (R2 = 0.77; RMSE = 560 kg ha−1), followed by the Integral and the Fourier metrics. Interestingly, the ranking of the best performing methods changed with the time scale. For instance, the Senescence Fit and the Fourier Decomposition metrics both improved when accounting for thermal time or phenology: R2 values increased from 0.60 to 0.69 and 0.78 and from 0.72 to 0.80, respectively. Notably, both of these metrics involve smoothing the time series. In some cases, e.g., Integral and Partial Integral, switching to thermal or phenological time led to worse results than calendar time. This drop might be partly attributed to the way APSIM currently calculates thermal time, to the LAI features themselves, and to the abrupt transitions inherent to some phenological stages.
Contribution and importance of weather variables
The linear model based only on weather features reached an R2 of 0.36 and an RMSE of 1,044 kg ha−1, which was slightly better than the peak LAI metric (R2 = 0.36; RMSE = 1,100 kg ha−1) owing to its lower RMSE. Adding weather features was particularly beneficial to models using simpler metrics but had little effect otherwise (Fig. 1C,D), halving the difference in accuracy between the poorest and the best models. For instance, the R2 of the peak LAI method reached 0.56, whereas the R2 of the Fourier approach increased by only 0.01.
We evaluated the contribution of the weather variables to the model R2 (Fig. 2). The weather variable with the highest contribution is the cumulative rainfall after the LAI peak. The sum and average maximum temperature post-peak were also important. The remaining variables only exhibited a very low contribution (<0.03) to the R2. The contribution of weather variables decreased as they were combined with LAI metrics derived from more advanced methods. Differences between the contribution of the variables computed for different temporal scales were small.
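The text does not name the method used to partition the model R2 among predictors. One widely used option is the LMG metric (implemented, for example, in the R package relaimpo), which averages each predictor's incremental R2 over every order in which predictors can enter an OLS model, so that the shares sum to the full-model R2. The sketch below assumes that approach; the function names are hypothetical, and because it enumerates all orderings it is only practical for a handful of predictors.

```python
import itertools
import math

import numpy as np


def r_squared(y, X):
    """R² of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])       # X may have zero columns
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def lmg_shares(y, X):
    """LMG decomposition: average incremental R² of each column of X over all
    p! entry orders. The shares sum to the R² of the full model."""
    p = X.shape[1]
    shares = np.zeros(p)
    for order in itertools.permutations(range(p)):
        used = []
        for j in order:
            before = r_squared(y, X[:, used])
            used.append(j)
            shares[j] += r_squared(y, X[:, used]) - before
    return shares / math.factorial(p)
```

Averaging over orderings avoids crediting correlated predictors (for instance, cumulative and average maximum temperature) according to whichever happened to enter the model first.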
Robustness to reduced temporal frequencies
Reducing the temporal frequency and accounting for cloud contamination reduced the prediction accuracy and increased the number of predictions with missing values (Fig. 3). However, this effect was metric-specific and three groups could be defined. The first group contained simple metrics (Peak LAI and Early/Late windows) that displayed robustness to a reduction of the temporal density, with little impact on the accuracy and the proportion of missing values. The second group (Integral and Senescence Fit) maintained a relatively stable accuracy but this was achieved at the expense of a higher rate of missing values. The third group (Partial Integral and Fourier Decomposition) was sensitive to a reduction of the temporal frequency both in terms of accuracy and failed predictions. This underscores that some metrics, in order to explain yield variations, require a higher temporal density, i.e., less signal contamination can be tolerated, while others are more robust and can be applied on sparser time series. This also varied with respect to the time scale: the Partial Integral approach was effectively best for calendar time whereas phenological time was best for the Fourier Decomposition and Senescence Fit approaches when the observation frequency was >5 days.
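The degradation experiment can be sketched by subsampling a daily LAI series at a fixed revisit interval and discarding acquisitions lost to clouds. The text does not describe its cloud model; here each acquisition is assumed to be lost independently with a fixed probability, and the function name is hypothetical. A metric would then be counted as missing whenever the surviving observations cannot support it, for instance when too few points remain after the peak to fit a senescence curve.

```python
import numpy as np


def sparse_observations(lai, revisit_days, cloud_prob, rng):
    """Subsample a daily LAI series every `revisit_days` days and drop each
    acquisition independently with probability `cloud_prob` (an assumed,
    simplistic cloud model). Returns surviving day indices and LAI values."""
    lai = np.asarray(lai, float)
    days = np.arange(0, lai.size, revisit_days)   # nominal acquisition days
    clear = rng.random(days.size) >= cloud_prob   # True = cloud-free acquisition
    return days[clear], lai[days[clear]]
```

Calling it with revisit intervals of 5, 10 and 16 days and a fixed random generator reproduces, in spirit, the sparser observation scenarios compared in this section.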
Mapping optimal metrics
Finally, we mapped the optimal metrics across the Australian wheat production area (Fig. 4). These varied by location and by time scale, consistent with the previous results: as observations became sparser, the Peak LAI and the Early/Late Windows approaches increasingly became the preferred choice. This underscores that, while the Partial Integral, Senescence Fit, and Fourier Decomposition metrics yield higher accuracy, their use can only be recommended when the revisit interval is ≤5 days.
Discussion
There is a high demand for grain yield estimates for food security, logistics, or crop insurance purposes. Empirical models have been criticised for their lack of generalisation, i.e., their applicability has been found to be limited to specific crop cultivars, crop growth stages, and geographical regions17,35. The ever-increasing availability of satellite data is a potential boon for delivering accurate grain yield predictions across vast areas. To quantify the potential gains of leveraging hyper-temporal data, we developed an in silico approach that uses the crop growth model APSIM as a data generator and calibrated empirical models with a series of LAI metrics for different time scales and temporal resolutions. The poor accuracy obtained with simple metrics suggests that they cannot capture the diversity of G × E × M conditions across the Australian grain zone with a single national-scale model and that locally tuned models could improve their prediction skill8. Advanced metrics achieved high accuracies with single empirical models, which provides evidence that metrics harnessing the temporality of the data have national relevance.
Peak LAI consistently registered some of the worst predictions despite its widespread use in the remote-sensing literature. The strength of the peak LAI relationship to yield (R2 = 0.36) was weaker than previously reported9,36, which could be partly explained by the larger range of G × E × M effects encountered in this study. Peak LAI completely disregards the critical period of grain filling37 and therefore cannot capture the impact of post-peak events such as terminal drought, which is often experienced in Australia38. Besides, large biomass early in the season does not necessarily result in large grain yield. These shortcomings are illustrated in Fig. 5, where three time series reach similar peak LAI values but end up with drastically different yields. Therefore, peak LAI is most useful for providing early estimates of grain yield. Integrating temporal profiles outperformed the peak LAI approach because the cumulative effect of photosynthetic apparatus efficiency during the entire growing period was taken into account. Conditions affecting the flag leaf and the penultimate leaf, which are the most active parts from a photosynthetic perspective, greatly influence final grain yield39.
An important contribution of this research is to better understand the optimal conditions of application of different metrics depending on the temporal resolution and availability of the LAI data. While advanced methods such as the Senescence Fit or the Fourier Decomposition outperformed simple methods when hyper-temporal data are at hand, the latter should be preferred with sparse time series. Not only do these simple methods perform better in data-poor contexts, but they are also less sensitive to missing values. This confirms that biweekly composites cannot adequately characterise crop productivity if crop critical periods are smoothed by the compositing algorithm8 and implies that the use of maximum LAI or Early/Late Windows LAI was driven by practicality and data availability rather than by systematic targeting of critically sensitive periods suggested by crop physiology. Integrating radar data40,41 and blending lower resolution time series42,43 are two mitigation options to increase data frequency in areas with persistent cloud cover. Care should be taken to correct the spatial scaling bias when fusing LAI data44,45 because it does not correlate linearly with spatial resolution46,47. Given the unprecedented revisit cycle of current Earth Observation systems, data fusion capabilities, and the prospects of future missions, our modelling suggests that the time is ripe for a shift towards the use of data-intensive metrics for empirical yield estimation.
Weather variables were instrumental in doubling the R2 of the models calibrated with simple metrics but had a marginal effect on those using advanced metrics. This suggests that half of the accuracy of scalable satellite-based crop yield mappers parameterised with the Early/Late Windows metric18,48 can be attributed to weather variables, and confirms that the explicit consideration of weather was the main factor explaining the better performance of the original scalable satellite-based crop yield mapper compared with a peak VI model20. The three most important variables were cumulative rainfall and the cumulative and average maximum temperatures after the LAI peak, a period when evaporative demand is higher. They all relate to water and heat/drought stresses and, by extension, to stored soil water, which is critical for the growth of rainfed wheat in Australia. During grain filling, high temperature decreases leaf chlorophyll content and accelerates senescence49, leading to a shorter grain filling duration and ultimately to a decrease in individual grain weight and yield that cannot be compensated by the higher grain filling rate under high temperatures50. The appropriate combination of predictors to include in empirical yield models depends on the cost of obtaining and using such data compared with the benefits4. The choice of adding weather predictors depends on their availability and on the temporal resolution of the LAI time series. This suggests that, if accurate and appropriate weather data are not readily available and if the temporal resolution allows advanced metrics to be robustly derived, the prediction model may be reduced to LAI variables alone.
In advancing routine yield assessment, our study illustrates the importance of temporal resolution for accurate yield prediction and provides guidelines to inform the choice of metric depending on data availability. To some extent, the accuracy values reported here may represent the upper bound of what could be achieved when applying empirical models calibrated with crop model data to satellite imagery; therefore, some considerations about the premises of this study ought to be raised. First, we assumed that yields and LAI could be accurately simulated by crop growth models. While their ability to predict grain yield has been thoroughly evaluated and confirmed, less emphasis has been placed on modelling LAI, e.g., APSIM has been reported to slightly overestimate LAI33. Moreover, crop models provide water-limited yield potential (the yields that can be achieved when water and the environment are the only limiting factors) rather than actual yields (the yields achieved in commercial fields), so discrepancies are expected, e.g., where biotic stresses have a significant impact. Simulation of grain yield, and particularly of LAI, could be further improved, and comparison against measured field data would be instrumental in doing so. Secondly, satellite-derived LAI products are affected by measurement and retrieval errors, which introduce noise into the time series. Smoothing methods such as double logistics, splines, adaptive Savitzky-Golay filters51, or canopy structural dynamic models52 have successfully been applied to reconstruct temporal trajectories and improve the signal-to-noise ratio. LAI also correlates non-linearly with reflectance and tends to saturate over dense canopies (LAI values > 4)53,54. Error-adjustment methods have been proposed when ground measurements of LAI are available55.
Recent empirical evidence converges in emphasising the importance of red-edge bands for the operational estimation of biophysical parameters56,57,58, both to bypass this saturation effect57 and to reduce some impacts of leaf angle distribution58. LAI estimates obtained from Sentinel-2, which carries three red-edge bands, are thus expected to improve in the near future. Finally, there might be less than perfect agreement between the LAI values obtained from APSIM and those retrieved from space, even in the absence of noise or saturation. Indeed, the electromagnetic radiation reflected from the crop canopy, and hence sensed by satellites, is contributed by all aerial plant organs55, not only by leaves. Adjustment techniques might thus be required to improve the correspondence between these two LAI quantities. Despite these shortcomings, further developing approaches that calibrate empirical models with data generated from crop growth models is essential to reduce the burden of in situ yield measurement and to advance yield monitoring across the globe.
Conclusions
The lack of suitable combinations of spatial and temporal resolutions of satellite image time series has long constrained large-area empirical yield estimation from space. Here, we sought to systematically evaluate how temporal resolution affects empirical relationships between wheat yields and descriptors of crop canopy dynamics as observed in leaf area time series and, in turn, to define their optimal conditions of use. Using the crop growth model APSIM as a data generator, we developed an in silico approach which allowed us to explore a wider range of G × E × M combinations than what observational data currently permit as well as to simulate the temporal resolutions of current and forthcoming satellites or satellite constellations.
We simulated wheat crops across Australia and regressed six types of metrics derived from the resulting time series of Leaf Area Index (LAI) against wheat yields. Empirical models based solely on LAI metrics captured between 30 and 80% of the wheat yield variability, the highest accuracy being achieved with advanced metrics (R2 > 0.75; Senescence Fit and Fourier Decomposition). This provides evidence that metrics that truly harness the temporal dimension of LAI data exhibit national relevance. Adding weather variables doubled the R2 values of models based on simple metrics (R2 > 0.55; Peak LAI and Early/Late Windows) but brought no significant improvement for those based on advanced metrics. This indicates that metrics intensively exploiting the temporal dimension already reflect most of the influence of weather on crop yield. Finally, simple metrics emerged as the best choice when dealing with sparse time series, e.g., a 16-day revisit, but were gradually outperformed by advanced metrics as the temporal resolution increased.
As we progress into a data-rich era, our findings support a general shift in large-area empirical yield mapping towards metrics that truly harness the temporal dimension of leaf area data.
Methods
Wheat modelling across Australia
Australia is one of the top five wheat-exporting countries in the world and accounted for 11% of global wheat trade in 201559. It is estimated that the current wheat area of ca. 14 Mha occupies 55% of Australian cropland. Wheat yields in Australia increased substantially in past decades, but evidence suggests that they have stalled at an average of 1.7 t ha−1 since 199060. Wheat is sown around mid-May and harvested from November to January61. The average field size exceeds 100 ha and irrigation is marginal.
We deployed the APSIM-Wheat model (version 7.8)19 to grow continuous wheat from 1981 to 2015 at 50 high-quality weather stations representative of the Australian grain zone (Fig. 6). APSIM is a process-based model that simulates crop growth and development at a daily time step in response to weather, soil water, soil nitrogen, and management practices. It calculates daily biomass accumulation from light interception and radiation use efficiency, which is penalised under water and nitrogen stresses. Leaf area growth is modelled daily from the initial leaf area, the leaf appearance rate, and the relationship between plant leaf area and leaf number; these processes are sensitive to daily temperature. Grain yield is a function of grain number and grain weight. Grain number is determined pre-anthesis by stem weight and is subject to reduction due to water stress during anthesis. Final weight per grain is determined by carbohydrate remobilisation, photosynthesis during grain filling, and the duration of the grain filling period, which is shortened by temperature and water stresses.
We used a state-of-the-art parameterisation of APSIM for the dominant soil types and nine management rules (see Hochman and Horan62 for more details). Similar model parameterisation at these locations explained ≥65% of the national and sub-national wheat yield variability60,63. Therefore, the simulation outputs were assumed to be reliable and no further validation was undertaken. The nine management scenarios were variants of standard simulation rules and covered a range of cropping practices (Tables 1 and 2). These included changes in the rate of nitrogen fertilisation (N-fertilisation), plant density (50, 75, 100, and 125 plants m−2), sowing rule (Sow-1, -2, and -3), and fallow management. All sites in Queensland and northern New South Wales north of latitude −32.24° were classed as northern sites and used the northern sowing rule; all other sites used the southern sowing rule. If the sowing criteria were not met during the sowing window, a crop was automatically sown on the 15th of July. We considered five wheat cultivars spanning the range of Australian maturity types, namely Bolac, Endure, Wyalkatchem, Derimut, and Correll, all kept at their default parameterisation. Daily weather records of rainfall, maximum and minimum temperature (max and min T), and vapour pressure deficit (VPD) were sourced for the period of interest from the Australian Bureau of Meteorology64. Model runs from 1981 to 1999 were used to reach a credible soil water content and were thus discarded from further analysis. Simulations with a maximum LAI < 1 were also discarded because they were likely associated with failed crops. The final data set had 7,712 entries consisting of daily values of LAI, thermal time, phenological stage, and end-of-season yield.
We summarised the main characteristics of the simulation outputs in Fig. 7. Emergence started as early as April 4th (Sow-3) and finished as late as August 18th (Sow-2). Flowering ranged from July 22nd (Sow-3) to November 17th (Sow-2), which covered reported flowering periods37,65,66,67. Maturity occurred from September 9th (Sow-3) to December 19th (Sow-1), with strong differences across treatments. Under the Sow-1 strategy, wheat was sown before the cutoff date of July 15th in ca. 50% of the cases. Note that Bolac was never the highest yielding variety, so no Sow-3 simulation was available for that variety. Maximum LAI values averaged 3.93 across simulations, with values up to 8.34 (Sow-3). Simulated yields averaged 3,247 kg ha−1 and ranged from 105 kg ha−1 to 5,824 kg ha−1. Harvest indices (the ratio of grain yield to biomass) averaged 0.375 across simulations, spanning from 0.178 (Sow-1) to 0.518 (Plants 100). Further, yields and harvest indices were within the range of values reported in an exhaustive search of the literature for dryland wheat in Australia68. Therefore, we concluded that our simulations provided realistic scenarios of dryland wheat growth across the Australian wheat belt.
Predicting grain yields with empirical models
The empirical yield prediction model, relating grain yield Y to LAI metrics and weather attributes, took the following form:
$$Y = \beta_0 + \beta_1 X + \beta_2 W$$
where X is a vector of LAI metrics derived from simulated LAI time series, W is a vector of weather attributes over the season, and β0, β1, and β2 are the associated coefficients. First, we restricted the empirical model to LAI metrics (Y = β0 + β1X) and assessed its performance for three time scales (calendar time, thermal time, and phenological time). Secondly, the added value of weather variables (W) for yield prediction was evaluated, and the importance of each weather variable was quantified by partitioning the coefficient of determination. Thirdly, we investigated the loss in accuracy due to reduced observation frequencies. Finally, we mapped optimal metrics for three temporal revisit frequencies (5, 10 and 16 days) across the Australian wheat area.
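A minimal sketch of fitting Y = β0 + β1X + β2W by ordinary least squares and scoring it with the R2 and RMSE reported in the Results. It assumes NumPy arrays with one row per simulation and reports in-sample scores; the text does not state whether the published scores were in-sample or cross-validated, so the scoring here is illustrative. The function name is hypothetical.

```python
import numpy as np


def fit_yield_model(X, y, W=None):
    """OLS fit of Y = b0 + b1·X (+ b2·W). X and W are 2-D predictor arrays
    (rows = simulations); y holds yields (e.g. kg/ha).
    Returns the coefficients, in-sample R² and RMSE (same units as y)."""
    y = np.asarray(y, float)
    blocks = [np.ones(y.size), X] if W is None else [np.ones(y.size), X, W]
    A = np.column_stack(blocks)                     # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    return coef, r2, rmse
```

Fitting once with W=None and once with the weather block gives the with/without-weather comparison directly, since the LAI part of the design matrix is identical in both runs.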
Yield prediction with LAI metrics
To quantify the yield variability explained by LAI metrics, we extracted six groups of metrics from the simulated LAI time series provided by APSIM (Table 3).
The first approach (Peak LAI) identifies the maximum LAI value from the time series as it corresponds to the onset of the reproductive stage which is a critical period for the determination of wheat yield69. Empirical evidence has also shown that the best single-date correlation between wheat yield and LAI occurs at the time of highest LAI which concurs with the transition from the vegetative stage to grain filling9,36.
In the second method (Early/Late Windows), maximum LAI values observed during two windows, one early (day of year 203–day of year 253) and the other late in the season (day of year 274–day of year 314) were derived18.
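The two simple metrics can be sketched as follows, assuming LAI values paired with their day of year (DOY). The window bounds are those given in the text; treating them as inclusive and returning NaN when no observation falls inside a window (one way a metric can go missing on sparse series) are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def peak_lai(doy, lai):
    """Peak LAI metric: maximum LAI value and the day of year it occurs."""
    i = int(np.argmax(lai))
    return float(lai[i]), int(doy[i])


def window_maxima(doy, lai, early=(203, 253), late=(274, 314)):
    """Early/Late Windows metric: maximum LAI inside each DOY window.
    Bounds are treated as inclusive (an assumption); a window without any
    observation yields NaN, i.e. a missing metric."""
    doy, lai = np.asarray(doy), np.asarray(lai, float)
    maxima = []
    for start, end in (early, late):
        inside = (doy >= start) & (doy <= end)
        maxima.append(float(lai[inside].max()) if inside.any() else float("nan"))
    return tuple(maxima)
```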
Integration of seasonal LAI profiles was examined as the third and fourth methods of feature extraction. The integral of satellite observations over time has been shown to represent well the intensity and duration of the crop's photosynthetic activity throughout the growing cycle and, as a result, to be highly correlated with actual yield8,70,71. The definition of the integration interval is critical: previous work recommended starting the integration at the onset of nutrient accumulation in storage organs39, which corresponds to flowering in wheat, rather than at the beginning of the crop cycle. We compared these two approaches by computing the area under the curve for the entire LAI profile (Integral) and from peak LAI to harvest (Partial Integral).
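Both integrals can be computed with the trapezoidal rule. In this sketch, harvest is assumed to coincide with the last sample of the series, the trapezoidal rule is written out explicitly rather than calling a NumPy integration routine, and the function names are hypothetical.

```python
import numpy as np


def _trapezoid(x, y):
    # Trapezoidal rule written out explicitly: sum of the area of each trapezoid.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))


def lai_integrals(t, lai):
    """Return (Integral, Partial Integral): the area under the whole LAI curve
    and the area from peak LAI to the last sample (assumed to be harvest)."""
    t = np.asarray(t, dtype=float)
    lai = np.asarray(lai, dtype=float)
    i_peak = int(np.argmax(lai))
    return _trapezoid(t, lai), _trapezoid(t[i_peak:], lai[i_peak:])
```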
The fifth approach (Logistic Fit) estimated wheat yield from three parameters describing crop senescence72,73. These were obtained by fitting a modified logistic model to the LAI time series74:
$$\mathrm{LAI}(t) = \frac{m_{all}}{1 + \exp\left(k\,(t - p)\right)}$$
where $m_{all}$ is the maximum value of LAI, p is the position of the inflection point in the decreasing part of the LAI curve, k is the relative senescence rate, and t is time.
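The text does not describe how the logistic model was fitted. The sketch below swaps in a simple linearised least-squares fit, which is an assumption and not necessarily the authors' procedure: $m_{all}$ is fixed to the observed maximum LAI, and for points on the decreasing limb the decreasing logistic LAI(t) = m_all / (1 + exp(k(t − p))) rearranges to ln(m_all/LAI − 1) = k(t − p), a straight line in t. Points close to the peak or near zero are excluded because the log-transform is unstable there; the 10–90% bounds and the function name are hypothetical. A nonlinear least-squares routine such as `scipy.optimize.curve_fit` would be the usual alternative when all three parameters are estimated jointly.

```python
import numpy as np


def fit_logistic_senescence(t, lai, lower=0.1, upper=0.9):
    """Estimate (m_all, p, k) of a decreasing logistic curve.

    m_all is taken as the observed maximum; k and p come from a linear fit of
    ln(m_all / LAI - 1) against t on the post-peak points whose LAI lies
    between `lower` and `upper` times m_all.
    """
    t = np.asarray(t, dtype=float)
    lai = np.asarray(lai, dtype=float)
    i_peak = int(np.argmax(lai))
    m_all = lai[i_peak]
    t_dec, lai_dec = t[i_peak:], lai[i_peak:]
    mask = (lai_dec > lower * m_all) & (lai_dec < upper * m_all)
    if mask.sum() < 2:
        raise ValueError("not enough points on the decreasing limb")
    y = np.log(m_all / lai_dec[mask] - 1.0)
    k, intercept = np.polyfit(t_dec[mask], y, 1)  # y = k*t - k*p
    p = -intercept / k
    return m_all, p, k
```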
Finally, we used Fourier decomposition, an approach known to capture temporal dynamics while reducing dimensionality and noise75. Fourier decomposition transforms an input signal from the time domain into the frequency domain. On a closed interval [0, N], it assumes that the signal f(t) can be decomposed into a series of sine waves of increasing frequency76:
$$f(t) = a_0 + \sum_{i=1}^{\infty}\left[a_i \cos\left(\frac{2\pi i t}{N}\right) + b_i \sin\left(\frac{2\pi i t}{N}\right)\right]$$
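The result of a discrete Fourier transform is a set of complex numbers, each with a real (a) and an imaginary (b) part that can be converted to polar form. Each harmonic wave i can then be defined by an amplitude $A_i$ and a phase $\varphi_i$77:
$$A_i = \sqrt{a_i^2 + b_i^2}, \qquad \varphi_i = \arctan\left(\frac{b_i}{a_i}\right)$$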
Together with the additive term (a0), the harmonic components reconstruct the initial signal. Discarding higher-order harmonics yields a smoother, less noisy signal. In this study, we kept the additive term and the first two harmonics as predictors of yield.
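The harmonic features can be obtained from NumPy's real FFT. For an evenly sampled series of length N, the i-th coefficient returned by `np.fft.rfft` equals (N/2)(a_i − j b_i) for 0 < i < N/2, where a_i and b_i are the cosine and sine coefficients of the Fourier series. This is a sketch: the text does not specify whether (a_i, b_i) or (A_i, φ_i) were used as regressors, so both amplitude and phase are returned, and the function names are hypothetical.

```python
import numpy as np


def fourier_features(lai, n_harmonics=2):
    """Return the additive term a0 and the amplitude and phase of the first
    `n_harmonics` harmonics of an evenly sampled series."""
    f = np.asarray(lai, dtype=float)
    n = f.size
    spec = np.fft.rfft(f)
    a0 = spec[0].real / n                          # mean of the series
    a = 2.0 * spec[1:n_harmonics + 1].real / n     # cosine coefficients a_i
    b = -2.0 * spec[1:n_harmonics + 1].imag / n    # sine coefficients b_i
    amplitude = np.hypot(a, b)                     # A_i = sqrt(a_i^2 + b_i^2)
    phase = np.arctan2(b, a)                       # quadrant-aware arctan(b_i / a_i)
    return a0, amplitude, phase


def lowpass(lai, n_harmonics=2):
    """Reconstruct the series from a0 and the first harmonics only."""
    f = np.asarray(lai, dtype=float)
    spec = np.fft.rfft(f)
    spec[n_harmonics + 1:] = 0.0                   # discard higher-order harmonics
    return np.fft.irfft(spec, n=f.size)
```

`np.arctan2` is used instead of a plain arctangent of b_i/a_i so that the phase lands in the correct quadrant and a_i = 0 does not cause a division by zero.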
Three time scales
Measuring time in calendar days has been the dominant approach in remote sensing because it matches the acquisition dates of satellite images. However, this approach may be limited when dealing with large G × E × M variations, e.g., when estimating yield at a national scale. Describing crop development in thermal time units offers a considerable advantage: the thermal time required to reach a given ontogenetic phase is relatively constant, whereas the corresponding number of calendar days can vary considerably78. Relying on phenological stages is a further refinement that accounts for vernalisation and/or photoperiod requirements, which affect the rate of crop development.
All LAI metrics were derived for three time scales: calendar time, thermal time, and phenological stages. Thermal time was computed following Zheng et al.79, and phenological stages were described using Zadoks' decimal scale80 as simulated by APSIM. The Zadoks scale is based on ten principal cereal growth stages from germination to ripening, each of which is divided into ten secondary stages, giving codes from 00 to 99.
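Thermal time was computed following Zheng et al.79, whose formulation is not reproduced here. As a simpler stand-in, the sketch below accumulates daily degree-days from the mean of daily minimum and maximum temperature above a base temperature (0 °C is an assumed default), then resamples a daily LAI series onto a regular thermal-time grid so that the same metric code can run on either time axis. The same resampling would apply with Zadoks stage as the x-axis. Function names and the grid step are hypothetical.

```python
import numpy as np


def cumulative_thermal_time(tmin, tmax, t_base=0.0):
    """Cumulative degree-days using the simple daily-mean temperature method."""
    tmean = (np.asarray(tmin, dtype=float) + np.asarray(tmax, dtype=float)) / 2.0
    # Days with a mean below the base temperature contribute nothing.
    return np.cumsum(np.maximum(tmean - t_base, 0.0))


def lai_on_thermal_grid(tt, lai, step=100.0):
    """Linearly resample a daily LAI series onto a regular thermal-time grid.

    np.interp expects increasing x values; days adding zero degree-days create
    repeated values, which would ideally be dropped beforehand.
    """
    tt = np.asarray(tt, dtype=float)
    grid = np.arange(tt[0], tt[-1] + 1e-9, step)
    return grid, np.interp(grid, tt, np.asarray(lai, dtype=float))
```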
Model evaluation
Empirical models were calibrated using about half of the data set (n = 3,845) and validated with the remaining half (n = 3,867). To avoid bias, the split between the calibration and validation sets guaranteed that all simulations relating to a given station-year belonged either to the calibration set or to the validation set. Model performance was quantified using the Root Mean Square Error (RMSE) and the coefficient of determination (R2). The RMSE is the square root of the mean squared difference (residual) between predicted and observed yields, expressed in yield units, while the R2 expresses the proportion of variance in observed yields explained by the model.
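The station-year grouping and the two scores can be sketched as follows. The grouped split draws whole groups at random (the seed and the exact 50% group fraction are assumptions), and R² is computed as 1 − SS_res/SS_tot; the text does not say whether this or the squared correlation coefficient was used. All names are hypothetical.

```python
import numpy as np


def grouped_split(groups, frac=0.5, seed=0):
    """Boolean calibration mask that assigns whole groups (station-years)
    to either the calibration or the validation set."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    n_cal = int(round(frac * unique.size))
    cal_groups = rng.permutation(unique)[:n_cal]
    return np.isin(groups, cal_groups)


def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def r_squared(obs, pred):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Splitting by group rather than by row is what prevents near-identical simulations of the same station-year from appearing on both sides of the split, which would otherwise inflate validation scores.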
Contribution of weather metrics
First, we quantified the accuracy of a yield model based only on weather features (Y = β0 + β2W). To that end, we extracted 14 weather variables, seven for each of the two periods before and after peak LAI: the averages of daily vapour pressure deficit (VPD) and minimum and maximum temperature (T), and the sums of daily precipitation (P), VPD, and minimum and maximum T. We then evaluated the merit of adding these meteorological features to the empirical model to improve prediction accuracy. All models were recalibrated to include the weather variables, and the net effect on the R2 and the RMSE was measured. To identify the most relevant variables, relative importance metrics for linear models were computed by partitioning the coefficient of determination81,82.
Robustness to reduced temporal frequencies
So far, all models were calibrated on LAI metrics derived from gap-less daily time series. Because such series set an upper limit on attainable accuracy, model performance may change significantly with sparser time series resulting from coarser temporal resolutions or from missing values due to cloud or cloud-shadow contamination. To provide insights into their generalisation potential, we applied the previously calibrated models to LAI metrics extracted from time series with lower temporal resolution, accounting for cloud conditions.
Daily, 5-day, 10-day, and 16-day LAI time series were created to mimic the revisit cycles of the Dove constellation, Sentinel-2A and -2B combined, a single Sentinel-2 satellite, and Landsat-8, respectively. The remaining LAI values were then removed according to their daily cloud probability. Monthly mean cloud frequencies were sourced from Wilson and Jetz83; this data set integrates 15 years of twice-daily remotely sensed cloud observations at 1-km resolution. Daily cloud probabilities were obtained by linear interpolation, assuming that each monthly average was representative of the 15th of the month. To avoid edge artifacts, the December value was duplicated at the start of the series and the January value at its end. Daily cloud probabilities were therefore interpolated from an input time series of 14 values, after which the first 16 (December 15th–December 31st) and last 15 values (January 1st–January 15th) were discarded. Missing values in the LAI time series were then linearly interpolated prior to yield estimation. This Monte Carlo process was repeated ten times, and the impact on prediction was measured as the average RMSE across the ten realisations. As an additional evaluation criterion, we counted the number of times the metric computation failed because of insufficient input data.
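A sketch of the degradation procedure, under several assumptions: a non-leap year, an LAI series indexed so that position 0 is an overpass date, cloud probabilities already aligned day by day with the LAI series, and a series counted as failed when fewer than two clear observations remain. `np.interp` holds the first and last clear values constant outside their range, which is one possible choice for the series ends. Names are hypothetical.

```python
import numpy as np

MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year


def daily_cloud_probability(monthly_freq):
    """Linearly interpolate 12 monthly cloud frequencies to 365 daily values.

    Each monthly value is pinned to the 15th of its month; December is
    prepended and January appended (14 values in total) so that the
    interpolation wraps around the year without edge artifacts.
    """
    monthly = np.asarray(monthly_freq, dtype=float)
    mid_doy = np.cumsum([0] + MONTH_LENGTHS[:-1]) + 15  # day of year of each 15th
    x = np.concatenate([[mid_doy[-1] - 365], mid_doy, [mid_doy[0] + 365]])
    y = np.concatenate([[monthly[-1]], monthly, [monthly[0]]])
    return np.interp(np.arange(1, 366), x, y)


def simulate_observed_series(lai, cloud_prob, revisit, rng):
    """Keep overpass dates only, drop cloudy ones at random, then linearly
    interpolate the gaps. Returns None when fewer than two clear
    observations remain (a failed metric computation)."""
    lai = np.asarray(lai, dtype=float)
    days = np.arange(lai.size)
    overpass = days % revisit == 0
    clear = rng.random(lai.size) >= np.asarray(cloud_prob, dtype=float)
    keep = overpass & clear
    if keep.sum() < 2:
        return None
    return np.interp(days, days[keep], lai[keep])
```

Calling `simulate_observed_series` ten times with the same generator gives ten independent cloud realisations; averaging the resulting RMSEs corresponds to the Monte Carlo summary described above.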
Finally, the optimal metrics were identified for each temporal resolution. These were then mapped onto the Australian wheat production area at 1-km resolution using a nearest-neighbour search: each pixel was assigned the best-performing metrics of the station it most resembled in terms of cloud frequency. Similarity between cloud patterns was measured with the Euclidean distance.
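The nearest-neighbour assignment can be sketched with a brute-force distance matrix. This assumes each pixel and station is described by a vector of 12 monthly cloud frequencies, which the text implies but does not state; for millions of pixels, processing in chunks or using a k-d tree (for example `scipy.spatial.cKDTree`) would be more memory-efficient. Names and metric labels are hypothetical.

```python
import numpy as np


def assign_best_metric(pixel_clouds, station_clouds, station_best_metric):
    """For each pixel, return the best metric of the station whose cloud
    profile is closest in Euclidean distance.

    pixel_clouds: (n_pixels, 12); station_clouds: (n_stations, 12);
    station_best_metric: sequence of n_stations labels.
    """
    pix = np.asarray(pixel_clouds, dtype=float)
    sta = np.asarray(station_clouds, dtype=float)
    # Pairwise distances via broadcasting: shape (n_pixels, n_stations).
    d = np.linalg.norm(pix[:, None, :] - sta[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return np.asarray(station_best_metric)[nearest]
```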
Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Baier, W. Crop-weather analysis model: review and model development. J. Appl. Meteorol. 16, 937–947 (1973).
Stephens, D. J., Lyons, T. J. & Lamond, M. H. A simple model to forecast wheat yield in Western Australia. J. R. Soc. West. Aust. 71, 77–81 (1989).
de Wit, A. et al. Using ERA-INTERIM for regional crop yield forecasting in Europe. Clim. Res. 44, 41–53 (2010).
Nalepka, R. F., Colwell, J. E. & Rice, D. P. Forecasts of winter wheat yield and production using Landsat data. Final report for Contract NAS 5-22389, NASA, Goddard Space Flight Center, Greenbelt, Maryland (1977).
Lobell, D. B. The use of satellite data for crop yield gap analysis. Field Crops Res., https://doi.org/10.1016/j.fcr.2012.08.008 (2012).
Jin, X. et al. A review of data assimilation of remote sensing and crop models. Eur. J. Agron. 92, 141–152 (2018).
Ferencz, C. et al. Crop yield estimation by satellite remote sensing. Int. J. Remote Sens. 25, 4113–4149 (2004).
Labus, M. P., Nielsen, G. A., Lawrence, R. L., Engel, R. & Long, D. S. Wheat yield estimates using multi-temporal NDVI satellite imagery. Int. J. Remote Sens. 23, 4169–4180 (2002).
Lopresti, M. F., Di Bella, C. M. & Degioanni, A. J. Relationship between MODIS-NDVI data and wheat yield: A case study in Northern Buenos Aires province, Argentina. Inf. Process. Agric. 2, 73–84 (2015).
Battude, M. et al. Estimating maize biomass and yield over large areas using high spatial and temporal resolution Sentinel-2 like remote sensing data. Remote Sens. Environ. 184, 668–681 (2016).
Lai, Y. R. et al. An empirical model for prediction of wheat yield, using time-integrated Landsat NDVI. Int. J. Appl. Earth Obs. Geoinf. 72, 99–108 (2018).
Li, W. et al. A generic algorithm to estimate LAI, FAPAR and FCOVER variables from SPOT4_HRVIR and landsat sensors: Evaluation of the consistency and comparison with ground measurements. Remote Sens. 7, 15494–15516 (2015).
Verrelst, J. et al. Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties - A review. ISPRS J. Photogramm. Remote Sens., https://doi.org/10.1016/j.isprsjprs.2015.05.005 (2015).
Baez-Gonzalez, A. D. et al. Large-area maize yield forecasting using leaf area index based yield model. Agron. J. (2005).
Lambert, M.-J., Traoré, P. C. S., Blaes, X., Baret, P. & Defourny, P. Estimating smallholder crops production at village level from Sentinel-2 time series in Mali’s cotton belt. Remote Sens. Environ. 216, 647–657 (2018).
Doraiswamy, P. Crop condition and yield simulations using Landsat and MODIS. Remote Sens. Environ. 92, 548–559 (2004).
Fang, H., Liang, S. & Hoogenboom, G. Integration of MODIS LAI and vegetation index products with the CSM–CERES–Maize model for corn yield estimation. Int. J. Remote Sens. 32, 1039–1065 (2011).
Lobell, D. B., Thau, D., Seifert, C., Engle, E. & Little, B. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 164, 324–333 (2015).
Holzworth, D. P. et al. APSIM–evolution towards a new generation of agricultural systems simulation. Environ. Model. Softw. 62, 327–350 (2014).
Azzari, G., Jain, M. & Lobell, D. B. Towards fine resolution global maps of crop yields: Testing multiple methods and satellites in three countries. Remote Sens. Environ. (2017).
Inoue, Y. Synergy of Remote Sensing and Modeling for Estimating Ecophysiological Processes in Plant Production. Plant Prod. Sci. 6, 3–16 (2003).
Duveiller, G. & Defourny, P. A conceptual framework to define the spatial resolution requirements for agricultural monitoring using remote sensing. Remote Sens. Environ. 114, 2637–2650 (2010).
Waldner, F., Duveiller, G. & Defourny, P. Local adjustments of image spatial resolution to optimize large-area mapping in the era of big data. Int. J. Appl. Earth Obs. Geoinf. 73, 374–385 (2018).
Drusch, M. et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 120, 25–36 (2012).
Butler, D. Many eyes on Earth. Nature 505, 143–144 (2014).
Asseng, S. et al. Performance of the APSIM-wheat model in Western Australia. Field Crops Res. 57, 163–179 (1998).
Asseng, S., Van Keulen, H. & Stol, W. Performance and application of the APSIM Nwheat model in the Netherlands. Eur. J. Agron. 12, 37–54 (2000).
Wang, E. et al. The new APSIM-Wheat Model—performance and future improvements. In Proceedings of the 11th Australian Agronomy Conference 2–6 (Australian Society of Agronomy, 2003).
Asseng, S. et al. Simulated wheat growth affected by rising temperature, increased water deficit and elevated atmospheric CO2. Field Crops Res. 85, 85–102 (2004).
Chen, C., Wang, E., Yu, Q. & Zhang, Y. Quantifying the effects of climate trends in the past 43 years (1961–2003) on crop growth and water demand in the North China Plain. Clim. Change 100, 559–578 (2010).
Carberry, P. S. et al. Re-inventing model-based decision support with Australian dryland farmers. 3. Relevance of APSIM to commercial crops. Crop Pasture Sci. 60, 1044 (2009).
Brown, H. E. et al. Plant Modelling Framework: Software for building and running crop models on the APSIM platform. Environ. Model. Softw. 62, 385–398 (2014).
Ahmed, M. et al. Calibration and validation of APSIM-Wheat and CERES-Wheat for spring wheat under rainfed conditions: Models evaluation and application. Comput. Electron. Agric. 123, 384–401 (2016).
Fang, H., Baret, F., Plummer, S. & Schaepman‐Strub, G. An overview of global leaf area index (LAI): Methods, products, validation, and applications. Rev. Geophys., https://doi.org/10.1029/2018RG000608 (2019).
Doraiswamy, P. C., Moulin, S., Cook, P. W. & Stern, A. Crop yield assessment from remote sensing. Photogramm. Eng. Remote Sens. 69, 665–674 (2003).
Johnson, D. M. A comprehensive assessment of the correlations between field crop yields and commonly used MODIS products. Int. J. Appl. Earth Obs. Geoinf. 52, 65–81 (2016).
Lawes, R. A., Huth, N. D. & Hochman, Z. Commercially available wheat cultivars are broadly adapted to location and time of sowing in Australia’s grain zone. Eur. J. Agron., https://doi.org/10.1016/j.eja.2016.03.009 (2016).
Chenu, K. et al. Environment characterization as an aid to wheat improvement: Interpreting genotype-environment interactions by modelling water-deficit patterns in North-Eastern Australia. J. Exp. Bot., https://doi.org/10.1093/jxb/erq459 (2011).
Benedetti, R. & Rossini, P. On the use of NDVI profiles as a tool for agricultural statistics: the case study of wheat yield estimate and forecast in Emilia Romagna. Remote Sens. Environ. 45, 311–326 (1993).
Bériaux, E., Waldner, F., Collienne, F., Bogaert, P. & Defourny, P. Maize Leaf Area Index retrieval from synthetic quad pol SAR time series using the water cloud model. Remote Sens. 7 (2015).
Jin, X. et al. Combined multi-temporal optical and radar parameters for estimating LAI and biomass in winter wheat using HJ and RADARSAR-2 data. Remote Sens. 7, 13251–13272 (2015).
Verger, A., Baret, F. & Weiss, M. A multisensor fusion approach to improve LAI time series. Remote Sens. Environ. 115, 2460–2470 (2011).
Löw, F. et al. Regional-scale monitoring of cropland intensity and productivity with multi-source satellite image time series. GIScience Remote Sens., https://doi.org/10.1080/15481603.2017.1414010 (2017).
Jiang, J. et al. Evaluation of Three Techniques for Correcting the Spatial Scaling Bias of Leaf Area Index. Remote Sens. 10, 221 (2018).
Wu, L. et al. Spatial up-scaling correction for leaf area index based on the fractal theory. Remote Sens. 8, 197 (2016).
Friedl, M. A., Davis, F. W., Michaelsen, J. & Moritz, M. A. Scaling and uncertainty in the relationship between the NDVI and land surface biophysical variables: An analysis using a scene simulation model and data from FIFE. Remote Sens. Environ. 54, 233–246 (1995).
Liang, S. Numerical experiments on the spatial scaling of land surface albedo and leaf area index. Remote Sens. Rev. 19, 225–242 (2000).
Jin, Z., Azzari, G. & Lobell, D. B. Improving the accuracy of satellite-based high-resolution yield estimation: A test of multiple scalable approaches. Agric. For. Meteorol. 247, 207–220 (2017).
Zhao, H., Dai, T., Jing, Q., Jiang, D. & Cao, W. Leaf senescence and grain filling affected by post-anthesis high temperatures in two different wheat cultivars. Plant Growth Regul. 51, 149–158 (2007).
Pradhan, G. P., Prasad, P. V. V., Fritz, A. K., Kirkham, M. B. & Gill, B. S. Effects of drought and high temperature stress on synthetic hexaploid wheat. Funct. Plant Biol. 39, 190–198 (2012).
Moreno, Á., Garcia-Haro, F. J., Martinez, B. & Gilabert, M. A. Noise reduction and gap filling of fapar time series using an adapted local regression filter. Remote Sens. 6, 8238–8260 (2014).
Lauvernet, C. Assimilation variationnelle d’observations de télédétection dans les modèles de fonctionnement de la végétation: utilisation du modèle adjoint et prise en compte de contraintes spatiales. (Université Joseph-Fourier-Grenoble I, 2005).
Weiss, M. & Baret, F. Evaluation of canopy biophysical variable retrieval performances from the accumulation of large swath satellite data. Remote Sens. Environ. 70, 293–306 (1999).
Bsaibes, A. et al. Albedo and LAI estimates from FORMOSAT-2 data for crop monitoring. Remote Sens. Environ. 113, 716–729 (2009).
Duveiller, G., Baret, F. & Defourny, P. Crop specific green area index retrieval from MODIS data at regional scale by controlling pixel-target adequacy. Remote Sens. Environ. 115, 2686–2701 (2011).
Delegido, J., Verrelst, J., Alonso, L. & Moreno, J. Evaluation of sentinel-2 red-edge bands for empirical estimation of green LAI and chlorophyll content. Sensors, https://doi.org/10.3390/s110707063 (2011).
Delegido, J. et al. A red-edge spectral index for remote sensing estimation of green LAI over agroecosystems. Eur. J. Agron. 46, 42–52 (2013).
Dong, T. et al. Assessment of red-edge vegetation indices for crop leaf area index estimation. Remote Sens. Environ. 222, 133–143 (2019).
Workman, D. Wheat exports by country. Retrieved from, http://www.worldstopexports.com/wh (2017).
Hochman, Z., Gobbett, D. L. & Horan, H. Climate trends account for stalled wheat yields in Australia since 1990. Glob. Chang. Biol. 23, 2071–2081 (2017).
Hochman, Z. et al. Re-inventing model-based decision support with Australian dryland farmers. 4. Yield Prophet® helps farmers monitor and manage crops in a variable climate. Crop Pasture Sci. 60, 1057–1070 (2009).
Hochman, Z. & Horan, H. Causes of wheat yield gaps and opportunities to advance the water-limited yield frontier in Australia. Field Crops Res. 228, 20–30 (2018).
Hochman, Z., Gobbett, D., Horan, H. & Navarro Garcia, J. Data rich yield gap analysis of wheat in Australia. Field Crops Res., https://doi.org/10.1016/j.fcr.2016.08.017 (2016).
Australian Bureau of Meteorology. Climate Data Services (2015).
Wang, B., Liu, D. L., Asseng, S., Macadam, I. & Yu, Q. Impact of climate change on wheat flowering time in eastern Australia. Agric. For. Meteorol. 209–210, 11–21 (2015).
Rouse, J. W., Haas, R. H., Schell, J. A. & Deering, D. W. Monitoring vegetation systems in the Great Plains with ERTS. In Proceedings of the Earth Resources Technology Satellite Symposium NASA SP-351 (eds Freden, S. C., Mercanti, E. P. & Becker, M. A.) 309−317 (NASA, 1974).
Flohr, B. M., Hunt, J. R., Kirkegaard, J. A. & Evans, J. R. Water and temperature stress define the optimal flowering period for wheat in south-eastern Australia. Field Crops Res. 209, 108–119 (2017).
Unkovich, M., Baldock, J. & Forbes, M. Variability in harvest index of grain crops and potential significance for carbon accounting: Examples from Australian agriculture. Adv. Agron. 105, 173–219 (2010).
Fischer, R. A. Yield Potential in a Dwarf Spring Wheat and the Effect of Shading. Crop Sci. 15, 607–613 (1975).
Tucker, C. J., Holben, B. N., Elgin, J. H. Jr. & McMurtrey, J. E. III. Remote sensing of total dry-matter accumulation in winter wheat. Remote Sens. Environ. 11, 171–189 (1981).
Rudorff, B. F. T. & Batista, G. T. Spectral response of wheat and its relationship to agronomic variables in the tropical region. Remote Sens. Environ. 31, 53–63 (1990).
Idso, S. B., Pinter, P. J. Jr., Jackson, R. D. & Reginato, R. J. Estimation of grain yields by remote sensing of crop senescence rates. Remote Sens. Environ. 9, 87–91 (1980).
Baret, F. & Guyot, G. Potentials and limits of vegetation indices for LAI and APAR assessment. Remote Sens. Environ. 35, 161–173 (1991).
Gooding, M. J., Dimmock, J., France, J. & Jones, S. A. Green leaf area decline of wheat flag leaves: the influence of fungicides and relationships with mean grain weight and grain yield. Ann. Appl. Biol. 136, 77–84 (2000).
Geerken, R., Zaitchik, B. & Evans, J. P. Classifying rangeland vegetation type and coverage from NDVI time series using Fourier Filtered Cycle Similarity. Int. J. Remote Sens. 26, 5535–5554 (2005).
Schönwiese, C.-D. Praktische Statistik für Meteorologen und Geowissenschaftler. Zeitschrift für Geomorphol. 52, 3 (2006).
Jakubauskas, M. E., Legates, D. R. & Kastens, J. H. Harmonic Analysis of Time-Series AVHRR NDVI Data. Photogramm. Eng. Remote Sens. 67, 461–470 (2001).
Purcell, L. C. Comparison of thermal units derived from daily and hourly temperatures. Crop Sci. 43, 1874–1879 (2003).
Zheng, B., Biddulph, B., Li, D., Kuchel, H. & Chapman, S. Quantification of the effects of VRN1 and Ppd-D1 to predict spring wheat (Triticum aestivum) heading time across diverse environments. J. Exp. Bot. 64, 3747–3761 (2013).
Zadoks, J. C., Chang, T. T. & Konzak, C. F. A Decimal Code for the Growth Stages of Cereals. Weed Res. 14, 415–421 (1974).
Lindeman, R. H. Introduction to bivariate and multivariate analysis. (1980).
Chevan, A. & Sutherland, M. Hierarchical partitioning. Am. Stat. 45, 90–96 (1991).
Wilson, A. M. & Jetz, W. Remotely sensed high-resolution global cloud dynamics for predicting ecosystem and biodiversity distributions. PLoS Biol. 14, e1002415 (2016).
Waldner, F. et al. A Unified Cropland Layer at 250 m for Global Agriculture Monitoring. Data 1, 1–13 (2016).
Kouadio, L. et al. Estimating regional wheat yield from the shape of decreasing curves of green area index temporal profiles retrieved from MODIS data. Int. J. Appl. Earth Obs. Geoinf. 18, 111–118 (2012).
Acknowledgements
The authors received funding from “Grains”, a project of the CSIRO’s Digiscape Future Science Platform.
Author information
Contributions
F.W. and Z.H. designed research; H.H. and F.W. conducted crop modelling; F.W. and Y.C. analysed data; F.W., H.H., Y.C. and Z.H. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Waldner, F., Horan, H., Chen, Y. et al. High temporal resolution of leaf area data improves empirical estimation of grain yield. Sci Rep 9, 15714 (2019). https://doi.org/10.1038/s41598-019-51715-7
DOI: https://doi.org/10.1038/s41598-019-51715-7