Capturing warming effects on rice phenology is vital for obtaining reliable assessments of climate change impacts on global rice production. A number of studies have simulated declining rice production due to temperature increase associated with climate change1,2,3. Such yield losses are primarily attributed to changes in phenology. In particular, rising mean daily temperatures from emergence to maturity, i.e. growing season temperature (GST), shortens the vegetative growth4 and grain-filling3 stages and abridges the duration of vital physiological processes, such as light interception and photosynthesis5. Consequently, such processes significantly contribute to simulated yield declines in climate change-crop model studies6.

Two recent rice model inter-comparison studies7,8 showed that rice simulation models predicted both prolonged and shortened growth duration when models were applied to 3–6 °C warmer climate. Such contrasting results indicate uncertainty between model projections and fail to inform what models to use for climate impact studies. Although some degrees of model bias are always occurred, meaningful model prediction requires that model biases do not significantly depart or amplify when applied to warmer climates. Otherwise, unknown bias of model prediction might result in unreliable phenology and yield model projections under warmer climates.

However, maintaining the accuracy of rice phenology simulation has been proven a challenging task9. For example, using the ORYZA2000 model10, Zhang et al.11 found that with increasing GST, the simulated growth duration for one rice cultivar gradually departed from the observed despite a satisfactory calibration with the first three years of a 21-year record. By extending the study11 and including more observations, van Oort et al.12 were able to reproduce observed flowering dates, while the bias still increased progressively for the simulated maturity date, despite using all data for model calibration that requires further validation using independent dataset.

To our knowledge, there is no comprehensive assessment of the simulated bias in rice phenology models applied to warmer climates. Modelling experiments often assume no change in cultivar under future climate scenarios1,2,3, as the parameterization is considered cultivar-specific13 in most rice simulation models, e.g. ORYZA200010. Excluding the potential influence of cultivar on the phenology simulation bias requires a substantial number of field observations for each cultivar under a wide range of GSTs to enable the resulting difference between the calibration with lower GST and validation with higher GST indicate the accuracy of the phenology model.

The objectives of this study were to (i) evaluate the bias of four phenology models with increasing GST; (ii) quantify the influence of phenology bias on yield simulations; and (iii) identify possible explanations for model bias.


Model calibration in lower GST

To cover the 2–7 °C temperature changes projected by 2100s in the major global rice-growing regions14, we collected data for 19 rice cultivars from 775 trials in 86 sites in 5 Asian countries for the period 1981–2009 (Fig. 1), where each cultivar includes between 14 and 260 trials that cover a GST range of 2.2–8.2 °C (Table 1). Here, a combination of cultivar, site and year is referred to as a single trial.

Figure 1: Locations of rice trial sites included in the study.
figure 1

The map was generated with licensed ArcGIS 10.2 using the map of Asia.

Table 1 The genotype, DAE to flowering and maturity coefficient of variation (CV) and the range of GST of 19 cultivars used in the study.

Four phenology models were examined: growing-degree-day (GDD), exponential, bilinear and beta models (Fig. 2). Description and function of each model can be found in Supplementary Note S1. In short, the four phenology models represent (i) two phenological patterns where the GDD and exponential models assume there is no optimum temperature for phenology development while the bilinear and beta models assume that phenology development firstly accelerates, then decelerates after temperature has exceeded the optimum temperature; and (ii) two modelling curves for phenology, where the GDD and bilinear models use a linear response and the exponential and beta models a non-linear response.

Figure 2: Schematics of the four temperature response functions in the phenology models.
figure 2

(a) GDD (Tb = 8); (b) exponential (Tb = 8, TSEN = 0.25); (c) bilinear (Tb = 8, To = 30, Tc = 42); and (d) beta function (Tb = 8, To = 30, Tc = 42; TSEN = 1.25).

The phenology observations were grouped by the 19 cultivars. We identified the trial with the lowest GST for each cultivar, the GST was then subtracted from the GST for each remaining trial of that cultivar. The difference is referred to as ΔGST. Then we divided the trials for each cultivar group into three temperature intervals for calibration: ΔGST ≤ 1 °C, ≤2 °C, ≤3 °C (here referred to as calibration experiments 1, 2 and 3, respectively), where the trials with lower GST were used for calibration and the remaining for validation (see Supplementary Note S2).

Each phenology model was parameterized for each of the 19 cultivar groups, using an auto-calibration program (see Supplementary Note S3). The graphical comparison (Fig. 3) and linear regression parameters (Table 2) for simulated and observed days after emergence (DAE) to flowering and to maturity are close to 1:1, indicating a good agreement between simulated and observed phenology across all models. The mean absolute deviation (MAD) between the simulated and observed values of DAE to flowering and maturity ranges from 3.5 to 7.2 days, with a normalized root mean square error (NRMSE) between 4.3 and 7.4% (Table 2).

Figure 3: Observed and simulated DAE to flowering (light grey) and DAE to maturity (dark grey) based on the four phenology models (upper x-axis) and three calibration experiments (right y-axis).
figure 3

The solid line is 1:1.

Table 2 Evaluation of DAE to flowering and maturity for the trials in calibration dataset in the three calibration experiments for the four phenology models.

Model validation in warmer GST

When the phenology models were validated with the higher GST data, their capacity to maintain the same accuracy as the calibration datasets differed. Figure 4 shows the percent bias in DAE to maturity between observed and simulated values. Three regression lines of the percent biases against increasing ∆GST, at three quantiles, 0.025, 0.5 and 0.975, were derived to quantify the lower, middle and upper bias levels (see Supplementary Note S4). Our results show an increase in model biases for DAE to maturity simulation when the bilinear and beta models were applied in warmer GSTs. Specifically, the bilinear model predicted a longer DAE to maturity than that for the observations, as indicated by the augmented upper bias from 5 to 40% and lower bias from −10 to 0% (Fig. 4c,g,k). In contrast, the beta model simulated shorter DAE to maturity with the upper bias decreasing from 5 to 0% and the lower from −10 to −35% (Fig. 4d,h,l). Both the GDD (Fig. 4a,e,i) and exponential (Fig. 4b,f,j) models retained relatively constant bias. Similar results were found when the analysis was repeated for DAE to flowering (see Supplementary Fig. S1).

Figure 4: Bias between simulated and observed DAE to maturity expressed per ΔGST for each of the four phenology models (upper x-axis) and three calibration experiments (right y-axis).
figure 4

Solid lines represent the 0.5 percentile regression, lower and upper dashed lines represent the 0.025 and 0.975 percentile regression lines. Dark grey dots illustrate trial data used for calibration and light grey dots data used for validation.

Figure 5 illustrates bias per cultivar for the four phenology models under the three calibration experiments. The model performance varied with cultivar, where the overall most reliable results were derived with the ∆GST ≤ 2 °C calibration dataset using the GDD model for all simulations. For DAE to maturity, the bias was essentially zero for 10 of the 19 cultivars (P > 0.05), whereas the remaining 9 cultivars maintained significant bias (P < 0.05) (Fig. 5e). Similar results for DAE to flowering are presented in Supplementary Fig. S2.

Figure 5: Changes in bias of DAE to maturity with increasing GST per cultivar for each of the four phenology models (upper x-axis) and three calibration experiments (right y-axis).
figure 5

The cultivar ID numbers are presented in Table 1. Grey bars indicate statistically significant bias in yield simulation with GST increase (P < 0.05), and white bars indicate insignificant bias (P > 0.05).

Influence of phenology bias on yield simulation

To demonstrate to what extent biased phenology would influence yield simulations under warmer climate, we used the default crop parameters in ORYZA2000 except for those related to phenology, and computed percent bias between simulated yields based on the observed and simulated phenology (see Supplementary Note S4). The model bias for each cultivar under the three calibration experiments is shown in Fig. 6. The direction of the simulated yield bias is consistent with the model performance for DAE to maturity (Fig. 5). The yield bias varied between −30–10% depending on model and calibration experiments (Fig. 6). Notably, on average 1% bias in DAE to maturity in warmer GSTs returned a 2% (1.6–2.4%) yield bias (Fig. 7).

Figure 6: Changes in yield bias with increasing GST per cultivar for each of the four phenology models (upper x-axis) and three calibration experiments (right y-axis).
figure 6

The cultivar ID numbers are presented in Table 1. Grey bars indicate statistically significant bias in yield simulation with GST increase (P < 0.05), and white bars indicate insignificant bias (P > 0.05).

Figure 7: Relation between yield bias and DAE to maturity bias with increasing GST for each of the four phenology models (upper x-axis) and three calibration experiments (right y-axis).
figure 7

Each dot represents a cultivar and the line indicates the ratio of yield bias to phenology bias over all 19 cultivars.


With the unique phenology dataset of 19 cultivars covering a span of 2–8 °C GST, we investigated the performance of four phenology models in cooler and warmer climatic conditions. The calibration results in the lower GST range showed MAD between the simulated and observed DAE to flowering and maturity (3.5–7.2 days with NRMSE between 4.3 and 7.4%) similar to other studies11,15,16,17 (2–9 days for MAD and 2–7% for NRMSE). However, under warmer GST, the models’ capacity to maintain accuracy diverted: the GDD and exponential models retained constant bias, while the bilinear model tended to overestimate and the beta model underestimate the duration of growth stages compared to observations. Our study provide further-step results compared with earlier model inter-comparison study8 which focused on uncertainty between models but did not demonstrate which models are able to provide better predictability in climate change assessment. Our result indicates that the GDD and exponential models capture the effects of warming better than the bilinear and beta models.

Two possible reasons for the gradual increasing bias in the bilinear and beta models are that (i) the models do not reflect actual phenological pattern to climate11; and (ii) the size of the calibration data fails to represent the actual phenological pattern and results in incorrect parameterization when there is a significant increase in GST between the calibration and validation datasets12. To exclude the latter cause of bias, we used all data as input to the auto-calibration program. The assumption was that a constant model bias across GST for all models would indicate that primarily incorrect parameterization caused the changing predictive biases in Fig. 4. In contrast, if gradually increasing model biases were still present with the bilinear and beta models, this would indicate difficulties in capturing the actual phenology response patterns to climate change of the two models. The resulting bias trends (Fig. 8) were quite similar for all three calibration experiments in Fig. 4. Hence calibrating bilinear and beta phenology models with a range of GST data did not reduce bias under warmer climates, which suggests that the two models are likely to misrepresent phenological patterns under warmer climates.

Figure 8: Bias between simulated and observed DAE to flowering and maturity (right y-axis) expressed per ΔGST for each of the four phenology models (upper x-axis) when all data were used for calibration.
figure 8

Solid lines represent the 0.5 percentile regression line, and the lower and upper dashed lines represent the 0.025 and 0.975 percentile regression lines.

The bilinear and beta models may be better suited for phenological patterns in chamber experiments18, where the phenology development rates accelerated until optimum temperature and then decelerated when the temperature threshold was reached. Similar to our study, recent field-based observation studies found no particular advantage of using the more complex phenology models. For example, the beta model performed better than simpler models for phenology simulation in only one of China’s seven rice-producing regions7. Analogous results were found for wheat6, where the observed plant development rates in northern India accelerated rather than decelerated when temperatures exceeded optimum temperature. Comparing field and chamber experiments is recommended in further investigations to identify differences in phenological patterns to temperature.

By convention previous climate impact studies1,2,3 have calibrated rice models with only one cultivar. However, our results suggest that such approach may result in biased predictions. Even though the GDD and exponential models give overall consistent phenology responses across GST regimes (Fig. 4), they do not necessarily result in stable predictability for each cultivar (Fig. 5). This indicates that calibration with multiple cultivars and using GDD and exponential models would improve rice phenology predictions.

Another important consequence of phenology bias is the doubling in simulated yield bias (Fig. 6). Similar to studies for rice8 and wheat19, model uncertainties in simulated yields under optimal water and nutrient conditions, strongly depend on temperature and inherited errors in the phenology simulation. This is because of that phenology determines many basic physiological processes that affect biomass and yield simulation, such as photosynthesis20 and light interception21. Therefore, our results stress that selecting the model that captures phenology over a range of temperatures is very important for more accurate yield predictions in warmer climates.

Despite the demonstrated advantages of GDD and exponential models for rice phenology and yield simulations, the bilinear model is widely used in rice models10,22,23. These crop models have often simulated that CO2-fertilization can offset the negative impacts of increased temperature in rice production in parts of Asia2,24,25, and thus benefit rice production. However, our study showed that the bilinear model frequently overestimated growth duration (Fig. 4), hence the actual yield reduction due to warming may have been underestimated in previous studies, as CO2-fertilization may not compensate for the loss. We recommend that the combined effects of CO2 and temperature on rice need to be re-evaluated using the GDD or exponential phenology models.

In conclusion, the phenological patterns of bilinear and beta models (where growth decelerates after a certain optimum temperature) resulted in increasing phenology bias under warmer temperatures. The model bias will be doubled when it carry over to yield simulation. More constant phenology biases were achieved with the GDD and exponential models, which simulate phenology without optimum temperature. Moreover, simulations based on multiple cultivars are also able to provide better predictability than using one cultivar. Therefore, for better estimates of climate change-impacts on rice phenology and production, we recommend calibrating with multiple cultivars and using GDD and exponential phenology models.



The rice phenology data include field trial observations from agro-meteorological stations in China operated by the China Meteorological Administration, and trials operated by the International Rice Research Institute (IRRI) in the Philippines and its associated institutes in Bangladesh, India and Thailand (Fig. 1). The dataset includes phenology (dates for emergence, transplanting, panicle initiation, flowering and physiological maturity) for 19 cultivars at 86 sites in 775 trials (Table 1), where all except the IR36 cultivar were conducted in China. The dataset includes only cultivars and trials that were carried out under optimal water and nitrogen conditions and with comparable research protocols for phenology measurements for each specific cultivar, to minimize the potential difference in measurement standards between the sites. Daily meteorological data, including sunshine hour, minimum and maximum temperatures, vapour pressure, wind speed and precipitation, were obtained from weather stations at respective site.

Calibration experiments

Four phenology models were examined: GDD, exponential, bilinear and beta models (algorithms for each model are found in Supplementary Note S1). For the trials sharing one cultivar group, the lowest GST was subtracted from the GST for each trial of the specific cultivar (referred to as ΔGST). For each cultivar group, three calibration experiments were conducted (see Supplementary Note S2). For experiment 1, the calibration data consisted of the trials with ΔGST ≤ 1 °C with the remaining data for validation. For experiments 2 and 3, the calibration data comprised the trials with ΔGST ≤ 2 °C and ≤3 °C. For calibration, we used an auto-calibration program that iterates to return the parameter values with the least difference between observed and simulated phenology for each model (see Supplementary Note S3). For validation, we applied the phenology models and associated calibrated parameters to the trial data with higher GST.

The percent bias between simulated and observed DAE to flowering and maturity was calculated for each cultivar. The changes in model accuracy were measured by regressing percent bias with increasing ∆GST at three quantiles, 0.025, 0.5 and 0.975, to quantify the changes in lower, middle and upper biases as increases in GST (see Supplementary Note S4).

To estimate the error carried over from a biased phenology to yield simulation, we ran ORYZA2000 for each trial with observed phenology dates and phenology simulated using each of the four phenology models. ORYZA2000 was run with potential water and nitrogen conditions with default parameters, except for those related to phenology (default values are available at Lastly, we calculated the percent bias between yields generated from observed and simulated phenology (see Supplementary Note S4).

Additional Information

How to cite this article: Zhang, T. et al. Model biases in rice phenology under warmer climates. Sci. Rep. 6, 27355; doi: 10.1038/srep27355 (2016).