Introduction

The Coupled Model Intercomparison Project Phase 5 (CMIP5) simulations have been widely used for regional climate change projections under global warming, both in their native form and after refinement by statistical downscaling, dynamical downscaling, bias correction, etc.1 Since the CMIP5 simulations contain numerous biases, as reported by many previous studies cited in the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5), additional care has been taken when extracting climate information from the outputs of individual models to reduce the impact of these biases on climate change projections. The multi-model mean (MMM) is a simple way to reduce biases in individual model outputs,2 and thus it is widely used for climate change projections. The usefulness of the MMM may vary from one region to another depending on the regional climate and on the diagnostic variables of interest.3,4 Therefore, before assessing climate change over a region that affects a significant fraction of the world population (e.g., the region of the Indian summer monsoon (ISM)), a targeted analysis is needed to test the usefulness of the MMM over that region.

Krishnamurti et al.5 used multi-model ensembles in the context of weather forecasts and seasonal climate, where elaborate methodologies, such as the “super-ensemble”, were used to construct ensembles. Numerous past studies have concluded that an MMM performs better than the individual models when compared to observations, both for mean climate6 and for climate variability.7 However, some studies, such as Annamalai et al.,8 have used an alternative approach instead of simply using the MMM. Using models from CMIP3 to investigate the ENSO–monsoon relationship on interannual and decadal timescales, they start with 18 models and systematically exclude, in batches, the models that do not perform well, finally ending up with the “best” model.

In a recent study, Sabeerali et al.9 found that ensemble means based on a small fraction of the CMIP5 models perform quite well in simulating several important characteristics of the monsoon intra-seasonal oscillations associated with the ISM, but even these models struggle to reproduce some other characteristics of the ISM. A number of previous studies showed that projections of the ISM have a large spread when individual models are used, and that the ensemble mean summer monsoon rainfall shows an increasing trend from the middle to the end of the century.10,11,12 Note that the ensemble mean results depend on the subset of models selected and on the methodology adopted (for example, the weights assigned to each model while computing the MMM). More recently, Sabeerali et al.13 found that the reliability of ensemble mean projections of South Asian monsoon rainfall from the CMIP5 models is questionable because the relationship between model rainfall and precipitable water is much stronger than observed, and they emphasized the need for improvements to individual models, particularly to the convective parameterization and cloud microphysics used in the CMIP5 models.

In other words, the projection of future climate can be reliable only when the models are able to reproduce the past climate reasonably well. Therefore, before assessing the ensemble mean projection, it is important to investigate the advantages and limitations of MMM climate states by comparing them with observations. In this study, we examine how well the MMM performs in simulating temperature and rainfall over India during the monsoon season, and we identify the features that the MMM is able to reproduce as well as its major shortcomings.

Results

The climatological mean (1975–2005) surface air temperature for June–September (JJAS) from India Meteorological Department (IMD) observations and the MMM are shown in Fig. 1a, b, respectively. The IMD climatology for the historical period shows that the highest temperatures occur over the northwestern region of the country and lie along the axis of the monsoon trough, which is inclined in a northwest–southeast orientation (Fig. 1a). Figure 1b shows that the MMM captures the observed pattern to a great extent; however, there are biases in magnitude. As can be seen from Fig. 1c, the temperatures are severely underestimated over the northern and northeastern parts of India, with biases of around 10–15 °C, which was also previously noted by Basha et al.14 On the other hand, the temperatures over the axis of the monsoon trough are overestimated by 2–5 °C, with higher biases towards the northwestern flank of the trough. Over southern and southeastern India, the biases are relatively low, with magnitudes of around 1–5 °C. To compare the models with observations, all the models were regridded to the observational grid (see Methods). To verify whether the severe underestimation over the northern parts of India could be an artifact of interpolation, Fig. 1d, e shows the example of one of the models (MIROC-ESM), with temperatures at its native coarse resolution of 2.8° × 2.8° and its regridded version at 0.25° × 0.25°. As seen from Fig. 1d, e, this underestimation is not an artifact of regridding, but is a real bias that can be seen at the native resolution as well.

Fig. 1

JJAS climatological mean (1975–2005) temperatures from a IMD, b MMM, c MMM–IMD, d MIROC-ESM model at 2.8° resolution, e MIROC-ESM model regridded to 0.25° resolution, f MIROC-ESM at 2.8°, g MIROC5 at 1.4°, h MIROC4h at 0.56°, i CCSM4 (best model), and j GISS-E2H (worst model)

One obvious candidate explanation for the cold bias in the MMM is the inaccurate representation of orography over the Himalayan and Tibetan region in the models owing to their coarse resolution. To check this, we analyzed a modeling system (MIROC) for which data are available from the CMIP5 archive at multiple resolutions (see Fig. 1f–h). However, the physics packages also differ among these versions, so the differences in the simulations can only be partially attributed to the difference in spatial resolution. The corresponding figure panels show that, counter-intuitively, the biases over northern India systematically increase with increasing spatial resolution. Hence, one may conclude that the coarse resolution of some of the CMIP5 models included in the MMM may not be playing a significant role in causing the cold bias seen in surface air temperature over the northern parts of India.

Since the MMM used in this case is an unweighted mean of all models (each model gets the same weight), one may wonder whether assigning the same weight to the better and the worse models could lead to large biases. To verify this, we compute the root mean square error (RMSE) and the pattern correlation coefficient (PCC) of the individual models (see Table 1), and then choose the best and the worst performing models based on the simple ratio of PCC to RMSE (the best model has the highest value and the worst model the lowest). CCSM4 is the best model, while GISS-E2H turns out to be the worst (see Fig. 1i, j). Although the cold bias over northern India is somewhat larger in the worst model, it is not very different from that in the best model. Thus, one may conclude that the cold bias in surface air temperature seen over northern India in the MMM is a systematic bias in all models, and that increasing the model resolution does not help to alleviate this problem. The large difference between the MMM and observations over the Himalayan region could be due either to inaccuracies in the observational dataset over this region or to something missing in the model physics/dynamics that causes this large bias, which is common across all models and resolutions. Targeted numerical experiments are needed to identify the reasons behind it.
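The ranking metric described above can be illustrated with a minimal sketch. It assumes an unweighted, centered (Pearson) pattern correlation over grid points, which the text does not specify; the function names and the four-point temperature fields are hypothetical and serve only to show how the PCC-to-RMSE ratio orders two models.

```python
import math

def rmse(model, obs):
    """Root mean square error between two equally sized lists of grid values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def pattern_correlation(model, obs):
    """Centered, unweighted pattern correlation across grid points (an assumption)."""
    n = len(obs)
    m_mean = sum(model) / n
    o_mean = sum(obs) / n
    cov = sum((m - m_mean) * (o - o_mean) for m, o in zip(model, obs))
    m_var = sum((m - m_mean) ** 2 for m in model)
    o_var = sum((o - o_mean) ** 2 for o in obs)
    return cov / math.sqrt(m_var * o_var)

def skill_ratio(model, obs):
    """Ratio of PCC to RMSE: higher means a better pattern match and/or lower error."""
    return pattern_correlation(model, obs) / rmse(model, obs)

# Hypothetical JJAS mean temperatures (deg C) at four grid points.
obs = [30.0, 28.0, 26.0, 24.0]
models = {
    "model_a": [29.0, 27.0, 25.0, 23.0],  # uniform 1 deg C cold bias, correct pattern
    "model_b": [25.0, 27.0, 22.0, 24.0],  # larger errors and a distorted pattern
}

ratios = {name: skill_ratio(field, obs) for name, field in models.items()}
best = max(ratios, key=ratios.get)
worst = min(ratios, key=ratios.get)
```

Because the ratio mixes a dimensionless quantity with one that carries units, its value depends on the variable and its units, so it is only meaningful for ranking models on the same variable and domain.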

Table 1 RMSE, PCC, and ratio of PCC by RMSE for JJAS surface air temperature and precipitation

Note that the ratio of PCC to RMSE is ~0.15 for the MMM (Table 1, with north box). For 16 out of 28 models, this ratio is lower than that of the MMM, whereas for 12 models it is equal to or even higher than that of the MMM. We have also examined the biases in these 12 individual models and found that some of them, for example CCSM4, have biases similar to those of the MMM (figure not shown). This suggests that although the MMM is better than several individual models, there are still a few individual models that perform as well as the MMM. As shown above, the biases over northern India are common across individual models, and therefore we also compare the performance of the MMM after removing the north box. Although the values of this ratio for the MMM and most individual models increase after removing the north box, for this selected domain too, the performance of the best individual model is equivalent to that of the MMM. Therefore, the general notion that the MMM performs better than individual models is not valid for climatological seasonal mean temperature over the Indian land region.

In Fig. 2, we analyze some other important aspects of surface air temperature characteristics. Figure 2a shows the annual cycle of the area-weighted spatial mean of pentad surface air temperature over the Indian land. The annual cycle is shown for IMD observations and the MMM. Since there exist extremely large cold biases in the MMM over northern India, the annual cycles from both observations and MMM are plotted for the whole Indian land in one case (blue lines), and by excluding the northern box (Indian land between 33 and 38.5°N) in the other (red lines). From Fig. 2a, one may conclude that (i) the MMM captures the annual cycle of temperature over Indian land, (ii) there is an overall negative bias in surface air temperature in the MMM throughout the year, (iii) the negative bias is small during JJAS, but large in other months, and (iv) excluding the northern box reduces the total bias in all months, but the MMM temperatures are still lower than that observed throughout the year. Our results, thus, show that the cold bias in the annual mean throughout the year is not entirely because of the lower than observed temperatures simulated over north India.
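As an illustration of the area-weighted spatial mean with and without the north box, the sketch below assumes cosine-of-latitude weights on a regular latitude–longitude grid; the weighting scheme is not stated in the text, and the function name and the four sample values are hypothetical.

```python
import math

def area_weighted_mean(values, lats, exclude_lat_band=None):
    """Cosine-latitude weighted mean of grid values.

    values: one value per land grid point
    lats: latitude (degrees north) of each grid point
    exclude_lat_band: optional (lat_min, lat_max) band to leave out
    """
    total = 0.0
    weight_sum = 0.0
    for value, lat in zip(values, lats):
        if exclude_lat_band is not None:
            lat_min, lat_max = exclude_lat_band
            if lat_min <= lat <= lat_max:
                continue  # skip grid points inside the excluded band
        weight = math.cos(math.radians(lat))  # assumed area weight
        total += weight * value
        weight_sum += weight
    return total / weight_sum

# Hypothetical temperature biases (deg C); the last point lies in the north box.
lats = [10.0, 20.0, 30.0, 35.0]
bias = [-1.0, -1.0, -1.0, -10.0]

all_india = area_weighted_mean(bias, lats)
without_north = area_weighted_mean(bias, lats, exclude_lat_band=(33.0, 38.5))
```

In this toy case, excluding the 33–38.5°N band reduces the mean cold bias but does not remove it, which mirrors the qualitative behaviour described for Fig. 2a.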

Fig. 2

a Annual cycle of all-India area-weighted average pentad surface air temperature from IMD and MMM with (blue) and without (red) the north box (Indian land between 33 and 38.5°N), b PDFs (%) of daily surface air temperature from IMD and mean of the distributions of individual models, without the north box. PDFs for MIROC-ESM and MIROC4h are also shown. Spatial distribution of JJAS surface air temperature trends (°C per decade) from c IMD and d MMM. Grid points with a trend value significant at the 90% level have been stippled

Since the daily distribution of surface air temperature is important for various societal applications of climate information, we next analyze this aspect and show it in Fig. 2b. As noted above, there are large biases over north India, so in order to be fair to the models, we exclude this box of large negative biases while computing the frequency distribution. Since taking an MMM before computing the distribution would eliminate extremes because daily extremes occur out of phase in the individual models, we first compute the distribution for each of the models and then take the mean of all distributions. Note that the computation of the probability distribution function (PDF) for individual models does not involve any spatial or temporal averaging. Figure 2b shows that, even without the northern box, the occurrence probability of temperatures in the range 0 to +15 °C is overestimated in the MMM. In the range of 15–35 °C, there is good agreement between the MMM and observations in the distribution of daily surface air temperatures over the Indian land. However, in the temperature range above 35 °C, the MMM overestimates the occurrence probability as compared to the observations. To check whether the occurrence of temperature extremes improves with increasing model resolution, we also show the PDFs for MIROC-ESM (one of the coarsest-resolution models) and MIROC4h (one of the finest-resolution models) at their native resolutions in Fig. 2b. The PDFs for both models (blue and green lines) are quite similar, with large overestimation on both sides of the PDF. No improvement is noted from MIROC-ESM to MIROC4h, suggesting that the simulation of temperature extremes in these models may not improve just by increasing the spatial resolution.
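The reason for averaging the per-model distributions rather than computing the distribution of the MMM series can be seen in a small sketch. The two four-day series, the bin edges, and the helper name are hypothetical; the only point is that out-of-phase hot days vanish when the series are averaged first.

```python
def histogram_percent(samples, edges):
    """Percentage of samples in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return [100.0 * c / len(samples) for c in counts]

# Hypothetical daily temperatures (deg C): each model has one hot day,
# but the hot days fall on different dates.
model_a = [30.0, 30.0, 40.0, 30.0]
model_b = [30.0, 40.0, 30.0, 30.0]
edges = [20.0, 36.0, 46.0]  # a "moderate" bin and a "hot" bin

# Approach described in the text: one PDF per model, then the mean of the PDFs.
pdfs = [histogram_percent(series, edges) for series in (model_a, model_b)]
mean_of_pdfs = [sum(column) / len(pdfs) for column in zip(*pdfs)]

# Alternative: average the daily series first, then compute a single PDF.
mmm_series = [(a + b) / 2.0 for a, b in zip(model_a, model_b)]
pdf_of_mmm = histogram_percent(mmm_series, edges)
```

Here the mean of the per-model PDFs keeps 25% of days in the hot bin, whereas the PDF of the averaged series has none, because averaging 40 °C with 30 °C gives 35 °C on both of the affected days.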

Since trends provide some of the most valuable information in the context of climate change, we analyze how well the models perform in simulating the observed trends. The JJAS trends from observations and the MMM are shown in Fig. 2c, d, respectively. To calculate the MMM trend at each grid point, the MMM time series is calculated first, the trend is then estimated using the Theil–Sen method (which is more robust to outliers than an ordinary least-squares linear fit), and the significance is determined using the Mann–Kendall test. Only the grid points where the trend is significant at the 90% level have been stippled. In the observations, there is a net warming trend of 0.05–0.30 °C per decade over almost the whole Indian region, except the east coast of India, where there is a cooling of around 0.05–0.15 °C per decade. Note that the temperature trends in observations are not significant over the entire Indian region. However, the MMM shows significant warming over the entire Indian land and captures the broad features of the trend distribution, with warming over northwest and peninsular India. Thus, one may conclude that the MMM captures the magnitude of the trend in surface air temperature at some locations, but the trend is not significant in observations.
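The trend estimation at a single grid point can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes annual values at unit time steps, uses the normal approximation for the Mann–Kendall statistic, and applies no correction for ties or serial correlation. The function names and the synthetic 31-year series are hypothetical.

```python
import math
from statistics import median

def theil_sen_slope(y):
    """Median of all pairwise slopes for a series sampled at unit time steps."""
    n = len(y)
    slopes = [(y[j] - y[i]) / (j - i) for i in range(n) for j in range(i + 1, n)]
    return median(slopes)

def mann_kendall_p(y):
    """Two-sided p-value of the Mann-Kendall trend test.

    Normal approximation with continuity correction; no tie or
    autocorrelation correction (simplifying assumptions).
    """
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i in range(n) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return math.erfc(abs(z) / math.sqrt(2.0))

# Synthetic JJAS-mean temperatures (deg C) for 31 years at one grid point:
# a 0.02 deg C per year trend plus a small alternating perturbation.
series = [28.0 + 0.02 * t + 0.005 * (-1) ** t for t in range(31)]

slope_per_decade = 10.0 * theil_sen_slope(series)
p_value = mann_kendall_p(series)
stipple = p_value < 0.10  # significant at the 90% level
```

Applied to a gridded field, the same two functions would be evaluated on the MMM time series at every grid point, with stippling wherever the p-value falls below 0.10.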

Figure 3a, b shows the climatological mean (1975–2005) precipitation for JJAS from observations and the MMM, respectively. The IMD climatology for the historical period shows that the highest rainfall occurs over the Western Ghats and northeast India (Fig. 3a). Figure 3b shows that the MMM captures the observed pattern to some extent (such as the maxima over the Western Ghats and northeast India); however, it fails to capture the local and regional scale features, and there are large biases in magnitude. Figure 3c shows a severe dry bias over most parts of the Indian land, especially over the Western Ghats, whereas there is a wet bias over the leeward side of the Western Ghats. An exercise similar to that done for temperature was repeated to investigate whether some of the large biases, such as those over peninsular India, are due to interpolation during regridding. Figure 3d, e confirms that the biases are not an artifact of regridding, as similar biases can be seen at the native resolution as well.

Fig. 3

JJAS climatological mean (1975–2005) rainfall (mm day−1) from a IMD, b MMM, c MMM-IMD, d MIROC-ESM at 2.8° resolution, e MIROC-ESM regridded to 0.25° resolution, f MIROC-ESM at 2.8°, g MIROC5 at 1.4°, and h MIROC4h at 0.56° resolution

To investigate the resolution dependence of the model bias, an exercise similar to that done for temperature was repeated (Fig. 3f–h). It is worth noting that increasing the spatial resolution of a model leads to a more accurate representation of the bottom boundary (such as the orography, coastlines, land use, land cover, etc.). However, if the physical parameterizations are not satisfactory, the final simulation may turn out to be worse even though the resolution is high. The major improvement in such a case would be seen over the locations where the bottom boundary plays a critical role. The MIROC5 model has a spatial resolution of 1.4° × 1.4°, whereas the MIROC4h model has a spatial resolution of 0.56° × 0.56°. Some aspects, such as rainfall over the Western Ghats, are better simulated in MIROC4h owing to its higher resolution. As shown above, the increase in model resolution does not lead to any major improvement in the temperature biases; for rainfall, however, some of the biases are significantly alleviated by the increase in spatial resolution. We have also checked the PCC-to-RMSE ratio for each individual model and the MMM (Table 1). This ratio is ~0.17 for the MMM, and for the individual models it varies from 0.04 to 0.20. There are two models, viz. EC-Earth and MIROC5, for which this ratio is higher than for the MMM. For the EC-Earth model, the PCC is higher and the RMSE is lower than for the MMM, and the magnitude of regional biases is similar to that of the MMM, suggesting that this model performs better than the MMM.

We next look at other aspects of rainfall that are equally important for many applications of the climate information provided by models. Similar to the analysis done for temperature, in the following we analyze the annual cycle, occurrence probability, and trends in rainfall over the Indian land. It can be seen from Fig. 4a that the MMM captures the annual precipitation cycle over the Indian land quite well, but there is a dry bias during June–August and a wet bias during November–January. Since the distribution of rainfall on sub-monthly timescales is equally important for many climate applications (e.g., agriculture), we next look at the PDFs of daily precipitation from observations and the MMM. Given the large range in rainfall values (as compared to temperature values), we use a finer bin width (0.2 mm day−1) for the 0–20 mm day−1 range and a coarser bin width (5 mm day−1) for the 20–500 mm day−1 range, and show the PDFs in two separate panels. The PDFs for precipitation are computed using the same methodology as the PDFs for temperature, to avoid issues due to the out-of-phase behavior of the individual models in simulating rainfall over a given grid point. It can be seen from Fig. 4b, c that the MMM overestimates the occurrence probability in the low to moderate precipitation range (0–20 mm day−1) and fails to capture heavy precipitation. In the MMM, the maximum daily precipitation values are ~60 mm day−1, whereas in observations they go beyond 500 mm day−1. The underestimation of extreme events in CMIP5 models may be attributed to the coarse resolution of the models, and to the issue of high-frequency, low-intensity drizzle that prevents moisture from building up in the atmosphere to a level where it can cause extreme rain events. This is also supported by Fig. 4b, c, which shows the PDFs for the MIROC-ESM model (green line) and the MIROC4h model (blue line) at their native resolutions. The MIROC-ESM model overestimates the low to moderate precipitation intensity (0–20 mm day−1) but severely underestimates the high precipitation intensity >20 mm day−1. This model also fails to capture precipitation intensities >120 mm day−1. In contrast, the fine-resolution model, MIROC4h, is very close to the observations and captures the extreme events realistically.
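The two-panel binning described above can be sketched as follows. The bin widths and ranges are taken from the text; the helper names, the sample rainfall values, and the choice to normalize both panels by the total number of samples (so that the two panels together sum to 100%) are assumptions made for illustration.

```python
from bisect import bisect_right

def make_edges(start, stop, width):
    """Evenly spaced bin edges from start to stop, inclusive of both ends."""
    n_bins = round((stop - start) / width)
    return [start + i * width for i in range(n_bins + 1)]

def pdf_percent(samples, edges):
    """Percentage of ALL samples falling in each half-open bin of `edges`.

    Samples outside [edges[0], edges[-1]) are ignored in the counts but
    still included in the denominator (an assumed normalization).
    """
    counts = [0] * (len(edges) - 1)
    for x in samples:
        idx = bisect_right(edges, x) - 1
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return [100.0 * c / len(samples) for c in counts]

fine_edges = make_edges(0.0, 20.0, 0.2)      # 0-20 mm/day, 0.2 mm/day bins
coarse_edges = make_edges(20.0, 500.0, 5.0)  # 20-500 mm/day, 5 mm/day bins

# Hypothetical daily rainfall values (mm/day) pooled over grid points and days.
rain = [0.1, 0.1, 0.3, 5.0, 22.0, 130.0]

pdf_low = pdf_percent(rain, fine_edges)
pdf_high = pdf_percent(rain, coarse_edges)
```

For a multi-model curve, the same function would be applied to each model separately and the resulting lists averaged bin by bin, following the procedure used for temperature.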

Fig. 4

a Annual cycle of all-India weighted area average pentad rainfall (mm day−1) from IMD and MMM. PDFs (%) of daily rainfall from IMD, mean of the distributions of individual models, MIROC-ESM and MIROC4h for b 0–20 mm day−1 range with bin width of 0.2 mm day−1 and c 20–500 mm day−1 range with bin width of 5 mm day−1. JJAS rainfall trends (mm day−1 per decade) from d IMD and e MMM. Grid points with a trend value significant at the 90% level have been stippled

Finally, we analyze the fidelity of the MMM in reproducing the observed spatial pattern of trends in the last few decades. Figure 4d shows that the trend in IMD precipitation is less consistent in space as compared to temperatures. The trend values are significant (stippled) over a very limited region of Indian land and over this limited region, there is a drying trend in precipitation with values from −0.3 to −0.6 mm day−1 per decade. However, during the same period, the MMM shows a weak positive trend over the monsoon trough zone, and the values are not significant over most of the grid points. Thus, one may conclude that the MMM fails to capture the observed rainfall trend over the Indian land.

Discussion

From the analysis of the fidelity of the MMM in simulating some of the salient features of the ISM, the following broad conclusions may be drawn. The MMM captures the observed pattern of surface air temperature to a great extent (pattern correlation of ~0.80); however, there are large biases in magnitude, of the order of 10–15 °C (RMSE of 5.3 °C), particularly over northern and northeastern India. The analysis shows that the coarse resolution of many of the CMIP5 models may not be playing an important role in causing this large negative bias. The large cold bias in surface air temperature over the northernmost parts of India persists across models and resolutions, and targeted numerical experiments may be needed to identify and alleviate this bias.

The MMM captures the annual cycle of surface air temperature over Indian land, although there is a cold bias throughout the year. Even after removing the north box, the cold bias still prevails throughout the year. The MMM simulates the moderate temperatures between 15 and 35 °C realistically, but overestimates the extremes (<15 and >35 °C) on both sides of the PDF. Similar to the systematic cold bias in models, the simulation of temperature extremes does not appear to improve with the increasing spatial resolution of the models and independent experiments are required to address this particular aspect. Analysis of the spatial pattern of temperature trends reveals that the MMM fails to capture the observed trends.

The MMM captures the large-scale features of observed rainfall, but there are large biases in magnitude. For example, the MMM shows maxima over the Western Ghats, but the seasonal mean precipitation is severely underestimated over this region. There is a general dry bias over the Indian land region. Investigation of the dependence of model biases on the spatial resolution of the model shows that rainfall features that depend strongly on the lower boundary (e.g., orography over the Western Ghats region) show significant improvement with increasing resolution. However, over other locations, the model physics seems to play a more important role. The MMM captures the seasonal cycle quite well, although there is a dry bias during JJAS and a wet bias during November–January. The MMM overestimates low to moderate precipitation (0–30 mm day−1) and underestimates heavy precipitation. The MMM also fails to capture extreme events with precipitation >60 mm day−1; however, the simulation of extremes is expected to improve with increasing model resolution. The analysis of trends in JJAS rainfall shows that the MMM fails to capture the observed rainfall trend.

Thus, while the MMM is a useful way of extracting first-order climate change projection information from the CMIP5 models, it has its own limitations that need to be understood by the community of researchers working on climate applications. There are several individual models (for example, CCSM4 for temperatures) that are found to perform as well as or better than the MMM in terms of biases and overall performance. Some systematic biases identified in this study are present across models and resolutions, and hence simply taking an MMM does not help in such cases. Thus, only a targeted exercise of model improvement would help in reducing such model biases in the longer term, unless one prefers to adopt other methods, such as bias correction, to get better estimates of climate change projections; these, however, have their own shortcomings.

Methods

Monthly and daily means of surface air temperature and precipitation from 28 CMIP5 models have been analyzed over the Indian land for June–September. Daily gridded observed surface air temperature (2 m above the surface) at one-degree resolution15 and daily gridded observed precipitation at quarter-degree resolution16 over the Indian land have been analyzed. The model and observed surface air temperature datasets have been interpolated to quarter degree resolution using bilinear interpolation for making them consistent with the quarter degree rainfall observation, and also to avoid exclusion of land grid points in the neighborhood of the Indian coastline.
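The bilinear interpolation step can be illustrated for a single target point as in the sketch below. It assumes a regular grid with ascending latitude and longitude coordinates; the function name and the 2 × 2 sample field are hypothetical, and target points outside the source grid would be linearly extrapolated by this simple version.

```python
from bisect import bisect_right

def bilinear(lats, lons, field, lat, lon):
    """Bilinearly interpolate field[i][j], defined at (lats[i], lons[j]), to (lat, lon).

    lats and lons must be ascending; points outside the grid are extrapolated
    from the nearest cell (a simplification).
    """
    # Index of the grid cell containing the target point, clamped to valid cells.
    i = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    # Fractional position of the target point inside the cell.
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - t) * (1 - u) * field[i][j]
            + (1 - t) * u * field[i][j + 1]
            + t * (1 - u) * field[i + 1][j]
            + t * u * field[i + 1][j + 1])

# Hypothetical one-degree temperature cell (deg C) interpolated to a
# quarter-degree target point.
lats = [10.0, 11.0]
lons = [75.0, 76.0]
field = [[20.0, 22.0],
         [24.0, 26.0]]

value = bilinear(lats, lons, field, 10.25, 75.25)
```

Regridding a full field amounts to calling this function for every point of the quarter-degree target grid; in practice, land-sea masking and missing values near the coastline would also need to be handled.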