Bias-corrected climate projections for South Asia from Coupled Model Intercomparison Project-6

Climate change is likely to pose enormous challenges for agriculture, water resources, infrastructure, and the livelihoods of millions of people living in South Asia. Here, we develop daily bias-corrected data of precipitation, maximum and minimum temperatures at 0.25° spatial resolution for South Asia (India, Pakistan, Bangladesh, Nepal, Bhutan, and Sri Lanka) and 18 river basins located in the Indian sub-continent. The bias-corrected dataset is developed using Empirical Quantile Mapping (EQM) for the historical (1951–2014) and projected (2015–2100) climate for the four scenarios (SSP126, SSP245, SSP370, SSP585) using output from 13 General Circulation Models (GCMs) from Coupled Model Intercomparison Project-6 (CMIP6). The bias-corrected dataset was evaluated against the observations for both mean and extremes of precipitation, maximum and minimum temperatures. Bias-corrected projections from 13 CMIP6-GCMs indicate a warmer (3–5°C) and wetter (13–30%) climate in South Asia in the 21st century. The bias-corrected projections from CMIP6-GCMs can be used for climate change impact assessment in South Asia and hydrologic impact assessment in the sub-continental river basins.
Measurement(s) hydrological precipitation process • volume of hydrological precipitation • temperature of air • climate Technology Type(s) computational modeling technique Factor Type(s) geographic location • temporal resolution Sample Characteristic - Environment climate system • river basin Sample Characteristic - Location India • Pakistan • Bangladesh • Nepal • Bhutan • Sri Lanka Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12963008


Background & Summary
South Asia is one of the most densely populated regions of the world. A majority of the population in South Asia depends on agriculture for their livelihood. South Asia is among the global hot spots that are likely to face the detrimental impacts of climate change 1,2 . Considerable changes in precipitation and temperature are projected in South Asia that will have implications for water resources and agriculture [3][4][5][6] . The risk of floods and droughts is likely to increase in South Asia under the warming climate [7][8][9][10][11][12] . Recent droughts and floods have affected a large population and caused enormous damage to agriculture and infrastructure in South Asia [13][14][15][16] . Similarly, the frequency and intensity of severe heatwaves have increased in South Asia and are projected to increase in the future [17][18][19][20][21] . Overall, the frequency of both precipitation and temperature extremes has considerably increased in the past decades and is likely to rise further under the warming climate 18,22 .
Projections from the General Circulation Models (GCMs) play a vital role in understanding the future changes in climate. However, the spatial resolution at which GCMs are run is often too coarse to provide reliable projections at the regional and local scales 23 . Precipitation and temperature projections at higher spatial resolution are required for climate impact assessments [24][25][26] . Moreover, precipitation and temperature from the GCMs are biased due to their coarse resolution or model parameterizations 27,28 . Therefore, bias correction is required for the assessment of climate change and its impacts on different sectors (e.g., water resources, agriculture) 23,[29][30][31][32][33][34] . Both statistical and dynamical approaches are used for downscaling and bias correction of climate change projections from GCMs. Statistical approaches are based on the distribution of, and relationship between, the observed and projected data for the historical period 33,34 . On the other hand, dynamical downscaling approaches are based on regional climate models forced with boundary conditions from the coarse-resolution GCMs 35,36 . Both statistical and dynamical downscaling approaches have limitations 37,38 . The primary limitation of dynamical downscaling is the computational effort required to run regional climate models at higher spatial and temporal resolution 27,39 . Moreover, dynamical downscaling may not remove the bias in climate variables, which might then require corrections based on statistical approaches 39 . Given these limitations, statistical bias correction approaches are widely used in climate change impact assessments 40,41 .
Considering the climate change impacts in South Asia, we develop a bias-corrected dataset of daily precipitation, maximum and minimum temperatures using output from 13 GCMs that participated in the Coupled Model Intercomparison Project-6 (CMIP6). The 13 GCMs were selected based on the availability of daily precipitation, maximum and minimum temperatures for the historical period and the four scenarios (SSP126, SSP245, SSP370, SSP585). We used empirical quantile mapping (EQM) to develop bias-corrected data at daily temporal and 0.25° spatial resolution for six countries in South Asia (India, Pakistan, Bangladesh, Nepal, Bhutan, and Sri Lanka). The bias-corrected projections are also developed for 18 sub-continental river basins. The bias-corrected projections from 13 CMIP6-GCMs can be used for estimating the projected changes in mean and extreme climate in South Asia. Bias-corrected data for the 18 sub-continental river basins can be used to develop hydrologic projections using hydrological models.

Methods
Bias-corrected projections were developed for South Asia (India, Pakistan, Bangladesh, Nepal, Bhutan, and Sri Lanka) and the 18 Indian sub-continental river basins (Fig. 1). We used basin boundaries in the Indian sub-continent from Shah and Mishra 42 (Fig. 1). We obtained observed daily gridded precipitation, minimum and maximum temperatures for South Asia for the 1951-2018 period. Daily precipitation at 0.25° was obtained from the India Meteorological Department (IMD) for the Indian region 43 . Pai et al. 43 developed gridded daily precipitation for India using station observations from more than 6000 stations located across India. The precipitation data capture critical features of the Indian summer monsoon, including higher rainfall in the Western Ghats and northeastern India and lower rainfall in the semi-arid and arid regions of western India. In addition, the gridded precipitation captures the orographic rain in the Western Ghats and the foothills of the Himalaya. The gridded precipitation data from IMD have been used for various hydroclimatic applications 13,44,45 . Gridded daily maximum and minimum temperatures from IMD were developed using station-based observations from more than 350 stations located across India 46 . There is a bias in temperature observations from IMD in the Himalayan region, which can be attributed to sparse station density 44,47 . Gridded precipitation and maximum and minimum temperatures were obtained from Sheffield et al. 48 for the regions outside India. Datasets from Sheffield et al. 48 are available at 0.25° spatial and daily temporal resolutions. Consistency between the IMD and Sheffield et al. 48 datasets was checked in Shah and Mishra 42 , who reported that the Sheffield et al. 48 dataset is in good agreement with the IMD observations. Nonetheless, we used the IMD gridded dataset for the Indian region and Sheffield et al. 48 for outside India for bias correction of projections from CMIP6, as the IMD data are widely used for hydroclimatic studies in India.
We used gridded observations for bias correction as station data are not available.
We obtained daily precipitation, maximum and minimum temperatures from 13 CMIP6-GCMs from https://esgf-node.llnl.gov/search/cmip6/. All three variables for all the scenarios were available only for these 13 GCMs. Therefore, we restricted bias correction to these models. Precipitation, maximum and minimum temperatures from CMIP6-GCMs are available at different spatial resolutions (Table S1). For instance, the spatial resolution of the CMIP6 projections varies from 0.7° (EC-Earth3) to more than 2° (CanESM5). All three variables were selected for the historical (1850-2014), ssp126 (2015-2100), ssp245 (2015-2100), ssp370 (2015-2100), and ssp585 (2015-2100) scenarios under the r1i1p1f1 initial condition at daily time scale 49 . The scenarios used in CMIP6 combine Shared Socioeconomic Pathways (SSP) and target radiative forcing levels at the end of the 21st century 50 . For instance, SSP126 indicates SSP-1 and a target radiative forcing of 2.6 W/m² at the end of the 21st century. Therefore, SSP126 is a mitigation scenario. On the other hand, SSP585 is based on the emission scenario considering SSP-5 and a radiative forcing of 8.5 W/m² at the end of the 21st century 50 . Further details on the scenarios used in CMIP6 can be obtained from Gidden et al. 50 . We regridded all the variables from CMIP6 to 1° spatial resolution using bilinear interpolation to make them consistent. The effect of regridding was checked by comparing the regridded datasets against the raw data for the all-India mean of precipitation, maximum and minimum temperatures. We did not find any considerable differences in the all-India averaged precipitation and temperature between the regridded and raw output from the GCMs (Fig. S1).
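The regridding step can be illustrated with a minimal sketch. It assumes a rectilinear latitude-longitude grid with increasing coordinates, on which bilinear interpolation reduces to linear interpolation along longitude followed by linear interpolation along latitude. The function name `regrid_bilinear` and the synthetic field are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D (lat, lon) field onto a new rectilinear grid.

    Coordinates must be increasing. Target points outside the source grid
    take the nearest edge value (behaviour of np.interp).
    """
    field = np.asarray(field, dtype=float)
    # Step 1: interpolate along longitude for every source latitude row.
    along_lon = np.stack([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate along latitude for every target longitude column.
    return np.stack(
        [np.interp(dst_lat, src_lat, col) for col in along_lon.T], axis=1
    )

# Synthetic coarse grid (2.5 degrees) regridded to 1 degree.
src_lat = np.arange(5.0, 40.1, 2.5)
src_lon = np.arange(60.0, 100.1, 2.5)
dst_lat = np.arange(5.0, 40.1, 1.0)
dst_lon = np.arange(60.0, 100.1, 1.0)
lon2d, lat2d = np.meshgrid(src_lon, src_lat)
raw = 0.5 * lat2d + 0.1 * lon2d          # a smooth synthetic field
regridded = regrid_bilinear(raw, src_lat, src_lon, dst_lat, dst_lon)
```

A field that varies linearly in latitude and longitude is reproduced exactly by this procedure, which makes it a convenient sanity check before applying the function to real model output.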
Outputs of the various atmospheric variables (e.g., maximum and minimum temperatures, and precipitation) obtained from GCMs are known to exhibit systematic biases (Fig. S2). Hence, these outputs need to be bias-corrected to produce reliable estimates at regional and local scales for climate impact assessment. To achieve this, statistical transformations are used that attempt to find a function that maps the model output to a new distribution such that the resulting distribution matches that of the observations. In general, this transformation can be formulated as Piani et al. 30 :
$$x_m^o = f(x_m)$$
where $x_m$ is the raw model output and $x_m^o$ is the bias-corrected model output. If the statistical distributions of $x_m$ and of the observations $x_o$ are known, the transformation can be written as:
$$x_m^o = F_o^{-1}\left(F_m(x_m)\right)$$
where $F_m$ is the cumulative distribution function (CDF) of $x_m$ and $F_o^{-1}$ is the inverse CDF (quantile function) of $x_o$. In EQM, the empirical CDFs 34,52,53 are estimated from the percentiles calculated from $x_m$ and $x_o$ . As a result, EQM and its variants can be applied to both temperature and precipitation even if their underlying distributions are different, and are hence recommended for statistical bias correction 54 .
In the context of statistical downscaling, since the observations are at a higher resolution than the models, EQM applied to model outputs bilinearly interpolated to the observation resolution is often used to address the scale mismatch and generate post-processed model outputs 44 . We chose non-parametric transformation approaches over parametric approaches, as they have shown better skill than parametric methods in reducing biases from GCM as well as Regional Climate Model (RCM) outputs 55 . We used EQM to statistically downscale the daily maximum and minimum temperatures, and precipitation for South Asia and the Indian sub-continental river basins (Fig. 1). We use the outputs ($x_m$) from 13 CMIP6-GCMs, which are available at different resolutions (Table S1). Observations for the three variables at 0.25° resolution are obtained from the IMD (Pai et al. 43 ) and Sheffield et al. 48 for grid points within and outside India, respectively. We used the 1951-2014 period to obtain the transformation function that maps the distribution of $x_m$ to that of $x_o$ . For precipitation, the drizzle effect is corrected by using a wet-day threshold of 1 mm/day 30,55 . If the values from model projections are larger (smaller) than the training values used to estimate the empirical CDF, the correction found for the highest (lowest) quantile of the training period is used. We used the fitted transformation to bias correct the outputs for the historical period and the SSP126, SSP245, SSP370, and SSP585 scenarios for the 2015-2100 period for all three variables. Raw and bias-corrected data for INM-CM5 are shown against the observed maximum temperature for a randomly selected grid in the Indian subcontinent (Fig. S1). Quantile mapping based statistical bias correction has been widely used, and its performance was found to be satisfactory in comparison to the other methods 25,34,56 .
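The EQM procedure described above can be sketched for a single grid cell as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_eqm`, `apply_eqm`, `eqm_precip`), the use of 101 quantile levels, and the handling of drizzle (mapping wet days only and setting days below the 1 mm/day threshold to zero) are choices made here for illustration.

```python
import numpy as np

def fit_eqm(obs_train, mod_train, n_quantiles=100):
    """Return matching empirical quantiles (model, observed) for the training period."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    return np.quantile(mod_train, probs), np.quantile(obs_train, probs)

def apply_eqm(x, q_mod, q_obs):
    """Map model values onto the observed distribution.

    Values beyond the training range receive the correction found for the
    highest (lowest) training quantile, applied as a constant shift.
    """
    x = np.asarray(x, dtype=float)
    corrected = np.interp(x, q_mod, q_obs)
    high, low = x > q_mod[-1], x < q_mod[0]
    corrected[high] = x[high] + (q_obs[-1] - q_mod[-1])
    corrected[low] = x[low] + (q_obs[0] - q_mod[0])
    return corrected

def eqm_precip(obs_train, mod_train, x, wet_threshold=1.0):
    """Assumed precipitation variant: map wet days only, set drizzle days to zero."""
    obs_train = np.asarray(obs_train, dtype=float)
    mod_train = np.asarray(mod_train, dtype=float)
    x = np.asarray(x, dtype=float)
    q_mod, q_obs = fit_eqm(obs_train[obs_train >= wet_threshold],
                           mod_train[mod_train >= wet_threshold])
    out = np.zeros_like(x)
    wet = x >= wet_threshold
    out[wet] = np.maximum(apply_eqm(x[wet], q_mod, q_obs), 0.0)
    return out

# Synthetic example: a model that is too cold and too variable.
rng = np.random.default_rng(0)
obs = rng.normal(30.0, 3.0, 5000)        # synthetic "observed" Tmax (deg C)
mod = rng.normal(27.0, 4.0, 5000)        # synthetic biased model Tmax
q_mod, q_obs = fit_eqm(obs, mod)
mod_bc = apply_eqm(mod, q_mod, q_obs)

obs_p = rng.gamma(0.5, 10.0, 5000)       # synthetic observed precipitation (mm/day)
mod_p = rng.gamma(0.8, 5.0, 5000)        # synthetic model precipitation
pr_bc = eqm_precip(obs_p, mod_p, np.array([0.2, 5.0]))
```

In practice the transformation would be fitted separately at every grid cell over 1951-2014 and then applied to both the historical and the scenario output of each GCM.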
www.nature.com/scientificdata www.nature.com/scientificdata/

Data records
Bias-corrected daily precipitation, maximum and minimum temperatures are available for the 13 GCMs (Table S1) for the historical (1951-2014) and future (2015-2100) periods. We also provide the reference gridded observed data of daily precipitation, maximum and minimum temperatures that were used for the bias correction. Projections for the future are available for the four scenarios (SSP126, SSP245, SSP370, and SSP585) for South Asia (India, Pakistan, Bangladesh, Sri Lanka, Bhutan, and Nepal) and the Indian sub-continental river basins (Fig. 1). The basin-wise dataset 57 and country-wise dataset 58 have been made available through Zenodo. Details on the data format can be obtained from a readme file provided with the datasets.

Technical Validation
First, we estimated the projected changes in mean annual precipitation, maximum and minimum temperatures using the raw data from the CMIP6-GCMs (Fig. 2). The projected changes were estimated for each GCM for the late 21st century (2074-2100) against the historical reference period (1988-2014). Then, the multimodel ensemble mean of the projected changes from all the 13 CMIP6-GCMs was taken. The multimodel ensemble mean annual precipitation is projected to increase in South Asia under the projected future climate (Fig. 2a-d). The projected increase in precipitation in South Asia under the future climate varies with the scenario considered. For instance, under the high-emission scenario (SSP585), a considerably higher increase (more than 30%) in the multimodel ensemble mean is projected in comparison to the low-emission scenario (SSP126; less than 13%). Similarly, there are regional differences in the projections of precipitation from the CMIP6-GCMs. For example, a more substantial increase in the multimodel ensemble mean precipitation is projected for the semi-arid and arid regions of western South Asia than for the other regions (Fig. 2a-d). Similar to rainfall, mean annual maximum and minimum temperatures are projected to rise substantially in South Asia under the future climate (Fig. 2). Projected changes in mean annual minimum temperature are generally greater than the changes in mean annual maximum temperature. As expected, the high-emission scenario (SSP585) will lead to a much higher rise in temperatures than the low-emission scenario SSP126 (Fig. 2).
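The order of operations described above (change per GCM first, then the multimodel ensemble mean) can be sketched as follows. The function name `projected_change` and the numerical values are hypothetical, and the sketch assumes that precipitation change is expressed in percent and temperature change as an absolute difference.

```python
import numpy as np

def projected_change(hist_mean, fut_mean, relative=False):
    """Change of the future-period mean against the reference-period mean.

    relative=True returns percent change (used here for precipitation);
    otherwise the absolute difference is returned (used here for temperature).
    """
    hist_mean = np.asarray(hist_mean, dtype=float)
    fut_mean = np.asarray(fut_mean, dtype=float)
    if relative:
        return 100.0 * (fut_mean - hist_mean) / hist_mean
    return fut_mean - hist_mean

# Hypothetical mean annual precipitation (mm) of three GCMs.
pr_hist = np.array([1000.0, 900.0, 1100.0])   # reference period, e.g. 1988-2014
pr_fut = np.array([1200.0, 990.0, 1430.0])    # late 21st century, e.g. 2074-2100
pr_change = projected_change(pr_hist, pr_fut, relative=True)  # one value per GCM
ensemble_mean_change = pr_change.mean()       # multimodel ensemble mean of the changes
```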
The raw datasets of precipitation, maximum and minimum temperatures can be used to estimate the projected changes in South Asia under the future climate for different scenarios. However, climate impact studies need bias-corrected projections for decision making at regional and local scales. Since the bias-corrected dataset is consistent with the observations for a climatological mean period, it is easier to infer the projected changes and their implications in different sectors (e.g., water resources and agriculture) relative to the observations. We, therefore, bias-corrected precipitation, maximum and minimum temperatures for the historical (1951-2014) and future (2015-2100) periods for all four scenarios for South Asia and the Indian sub-continental river basins. The bias-corrected dataset can be used for any region or river basin in South Asia or the Indian sub-continent (Fig. 1).
We estimated the multimodel ensemble mean bias in precipitation, maximum and minimum temperatures from the 13 CMIP6-GCMs (Fig. 3). The bias in mean annual precipitation, maximum and minimum temperatures was estimated against the observations from IMD (for the Indian domain) and Sheffield et al. 48 (for outside India). The CMIP6-GCMs show a dry bias (15-20%) in mean annual precipitation in the majority of South Asia (Fig. 3a). On the other hand, a positive multimodel ensemble mean bias in mean annual precipitation was found in regions located in Nepal, Pakistan, and Peninsular India (Fig. 3a). A strong cold bias in both mean annual maximum and minimum temperatures was found in the Himalayan region in the CMIP6-GCMs (Fig. 3c,e). Also, the CMIP6-GCMs exhibit a warm bias in mean annual minimum temperature in the majority of South Asia except for the Himalayan region (Fig. 3e). We applied the EQM approach to correct the bias in the CMIP6-GCM output at daily timescale. The bias was substantially reduced after the bias correction in all three variables for the historical (1985-2014) period (Fig. 3b-f). The reduction in bias in mean annual precipitation, maximum and minimum temperatures shows the effectiveness of our bias correction approach based on EQM.
Similar to mean annual precipitation, maximum and minimum temperatures, we estimated the bias in precipitation and temperature extremes in the raw output from the CMIP6-GCMs (Fig. 4). The 90th percentiles of precipitation on rainy days (precipitation more than 1 mm), maximum and minimum temperatures from the CMIP6-GCMs were compared against the observed dataset for the historical period (1985-2014). Consistent with mean annual precipitation, a considerable dry bias is present in extreme precipitation in the CMIP6-GCMs across South Asia (Fig. 4a). We find that the CMIP6-GCMs show a warm bias in the 90th percentile of maximum and minimum temperatures across South Asia except in the Himalayan region (Fig. 4c,e). In the Himalayan region, a cold bias in the maximum and minimum temperature extremes was found in the CMIP6-GCMs (Fig. 4c,e). We find that the EQM-based bias correction has successfully removed the bias in extreme precipitation, maximum and minimum temperatures across South Asia (Fig. 4). Therefore, the bias in both the mean and the extremes of precipitation, maximum and minimum temperatures was removed. Also, we compared the seasonal cycle of bias-corrected precipitation, maximum and minimum temperatures from the CMIP6-GCMs against the observed dataset for the 1985-2014 period. Uncertainty in the bias-corrected precipitation, maximum and minimum temperatures was estimated using one standard deviation. We find that the seasonal cycle of the multimodel ensemble mean bias-corrected precipitation, maximum, and minimum temperatures compares well against the observations (Fig. 5). Moreover, the covariability of the monsoon season precipitation and air temperature is well captured by the bias-corrected dataset (Fig. S3). Overall, our results show that the EQM approach successfully corrects the bias in the CMIP6-GCMs, which can be used for climate impact studies in South Asia. Also, the bias-corrected dataset can be used for hydrological studies in the Indian sub-continental river basins.
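The extreme-precipitation comparison can be sketched for one grid cell as below. The helper names `wet_day_percentile` and `percent_bias` and the synthetic series are hypothetical; the sketch assumes that the bias is expressed as a percentage of the observed value.

```python
import numpy as np

def wet_day_percentile(pr, q=90.0, wet_threshold=1.0):
    """Percentile of daily precipitation computed over rainy days only (> threshold)."""
    pr = np.asarray(pr, dtype=float)
    wet = pr[pr > wet_threshold]
    return np.percentile(wet, q) if wet.size else np.nan

def percent_bias(model_value, obs_value):
    """Relative bias of the model against the observations, in percent."""
    return 100.0 * (model_value - obs_value) / obs_value

# Synthetic daily series (mm/day): the "model" is 20% too dry on every day.
obs_pr = np.arange(0.0, 101.0)
mod_pr = 0.8 * obs_pr
p90_obs = wet_day_percentile(obs_pr)
p90_mod = wet_day_percentile(mod_pr)
bias_p90 = percent_bias(p90_mod, p90_obs)
```

The same two helpers apply to temperature extremes if the wet-day filter is skipped and the bias is taken as an absolute difference instead.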

Climate projections for South Asia and Indian sub-continental river basins. Daily bias-corrected projections of precipitation, maximum and minimum temperatures at 0.25° from CMIP6-GCMs are developed for South Asia and the 18 Indian sub-continental river basins (Fig. 1). The projections are available for the historical (1951-2014) and future (2015-2100) periods. We estimated projected changes in precipitation, maximum and minimum temperatures in the late 21st century from the bias-corrected dataset against the historical reference (1988-2014) period for all the scenarios (SSP126, SSP245, SSP370, and SSP585) (Fig. 6). Our bias-corrected precipitation projections show spatial patterns consistent with those found in the raw CMIP6-GCMs (Fig. 2). For instance, a larger increase in mean annual precipitation was found in the western parts of South Asia in both raw and bias-corrected datasets (Fig. 6a-d). A considerably larger increase in mean annual precipitation is projected under SSP585 (median 23%) than under SSP126 (median 12%) (Fig. 6a-d). The ensemble mean median change in maximum temperature is projected to be 1.3 °C in SSP126 and 2.2 °C in SSP585 (Fig. 6e-h). Similarly, the ensemble mean minimum temperature is projected to rise significantly across South Asia, with a median increase of more than 3 °C in the SSP585 scenario (Fig. 6i-l).
Projected changes in precipitation, maximum and minimum temperatures were estimated using a 30-year moving window for all six countries in South Asia under the highest emission scenario, SSP585 (Fig. 7). We considered the SSP585 scenario to estimate the projected change in precipitation, maximum and minimum temperatures under the worst case (Fig. 7). The projected change in each CMIP6-GCM was estimated for each 30-year window (1986-2015, 1987-2016 … 2071-2100) against the historical reference period of 1985-2014. Moreover, we estimated uncertainty in the bias-corrected CMIP6-GCMs using one standard deviation of the projected change in the individual GCMs. The multimodel ensemble mean annual precipitation is projected to rise in all six countries under the future climate (Fig. 7). All six countries in South Asia are projected to experience a 20-40% rise in mean annual precipitation under the SSP585 scenario by the end of the 21st century. However, the bias-corrected precipitation projections show more uncertainty for Pakistan than for the other countries (Fig. 7). Uncertainty in the bias-corrected maximum and minimum temperatures is substantially lower than that in precipitation (Fig. 7). The multimodel ensemble mean bias-corrected mean annual maximum temperature is projected to rise by 3-4 °C by the end of the 21st century under SSP585. Moreover, the bias-corrected ensemble mean annual minimum temperature is projected to rise by 3-5 °C by the end of the 21st century (Fig. 7). We find different levels of uncertainty in mean annual precipitation, maximum and minimum temperatures for the six countries in South Asia (Fig. 7). Overall, the climate is projected to become wetter and warmer in South Asia in the future, and the magnitude of change will depend on the scenario.
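The moving-window calculation can be sketched as follows. The function name `moving_window_change`, the array layout (one row of annual means per GCM), and the use of the population standard deviation across GCMs are assumptions made for illustration.

```python
import numpy as np

def moving_window_change(annual, years, ref=(1985, 2014), window=30, relative=False):
    """Ensemble mean and spread of the change in each moving window.

    annual : array of shape (n_gcm, n_year) with annual means per GCM.
    Returns (window_end_years, ensemble_mean, ensemble_std), where the spread
    is one standard deviation (ddof=0) of the change across GCMs.
    """
    annual = np.asarray(annual, dtype=float)
    years = np.asarray(years)
    ref_mask = (years >= ref[0]) & (years <= ref[1])
    ref_mean = annual[:, ref_mask].mean(axis=1)
    end_years, changes = [], []
    # Windows start one year after the reference start: 1986-2015, ..., 2071-2100.
    for start in range(ref[0] + 1, int(years[-1]) - window + 2):
        mask = (years >= start) & (years < start + window)
        delta = annual[:, mask].mean(axis=1) - ref_mean
        changes.append(100.0 * delta / ref_mean if relative else delta)
        end_years.append(start + window - 1)
    changes = np.array(changes)                 # shape (n_window, n_gcm)
    return np.array(end_years), changes.mean(axis=1), changes.std(axis=1)

# Two synthetic GCMs with linear warming of 0.04 and 0.02 deg C per year.
years = np.arange(1985, 2101)
annual_t = np.vstack([25.0 + 0.04 * (years - 1985),
                      25.0 + 0.02 * (years - 1985)])
end_years, ens_mean, ens_std = moving_window_change(annual_t, years)
```

For precipitation, passing `relative=True` expresses the change in percent of each GCM's own reference mean.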
We estimated projected changes in mean annual precipitation, maximum and minimum temperatures for the six countries and 18 sub-continental river basins for the Near (2020-2046), Mid (2047-2073), and Far (2074-2100) periods against the historical reference of 1988-2014 (Tables S2-S7). The multimodel projected changes were estimated for all four scenarios along with the mean for the historical period (Tables S2-S7). The multimodel ensemble mean bias-corrected precipitation is projected to change by 3-20% in the Near term under SSP126 (Table S2). The most substantial increase in precipitation is projected in Pakistan, while the lowest rise is expected in Bhutan. Precipitation is projected to rise substantially in the Far term in all the countries in South Asia under SSP585 (Table S2). The ensemble mean bias-corrected precipitation is projected to change by 31-53%, with the largest projected rise in Pakistan, in the Far term under SSP585 (Table S2). In the six countries in South Asia, the projected increase in mean annual maximum temperature is far smaller in the Near term under SSP126 (0.48-0.97 °C) than in the Far term under SSP585 (2.6-5.3 °C) (Table S3). Our bias-corrected data based on 13 CMIP6-GCMs project an increase of 0.7-1.3 °C in mean annual minimum temperature in the Near term under SSP126 (Table S4).
Moreover, mean annual minimum temperature is projected to rise by 3.5-5.5 °C in the late 21st century under SSP585 (Table S4). Uncertainty in the projections of bias-corrected precipitation, maximum and minimum temperatures was estimated for each country and period under all four scenarios (Tables S2-S4). Temperature projections show less uncertainty than the projections of precipitation, which might have implications for hydrologic applications of the bias-corrected projections (Tables S2-S4).
We estimated projected changes in mean annual precipitation, maximum and minimum temperatures using bias-corrected data from the 13 CMIP6-GCMs for the 18 sub-continental river basins (Tables S5-S7, Fig. 1). Bias-corrected projections of the three climatic variables are essential for hydrologic modelling and climate change impact assessment. Mean annual precipitation is projected to rise across the basins under all the scenarios in the projected future climate (Table S5). The projected rise in mean annual precipitation is considerably higher under SSP585 than under SSP126. The projected rise in precipitation in the sub-continental basins is higher in the Far period than in the Near period.
Notwithstanding considerable uncertainty in the precipitation projections, the bias-corrected data show that precipitation is projected to rise more in the river basins located in the semi-arid/arid regions of the Indian sub-continent (Table S5). Similarly, significant warming in the mean annual maximum and minimum temperatures is projected based on the bias-corrected data from the 13 CMIP6-GCMs (Tables S6, S7). Mean annual maximum temperature is projected to rise by 2.5-4.4 °C in the Far period under SSP585 in the Indian sub-continental river basins (Table S6). Moreover, the mean annual minimum temperature is projected to rise by 3.0-5.0 °C in the Far period under SSP585 in the river basins of the Indian sub-continent (Table S7). Basin-specific projections and the associated uncertainty can be seen in supplemental Tables S5-S7. Overall, the bias-corrected projections can be used for hydroclimatic impact assessment in the sub-continental river basins.

Usage Notes
Daily bias-corrected projections of precipitation, maximum and minimum temperatures at 0.25° are essential for climate impact assessment within administrative boundaries or at the river basin scale 25 . We developed bias-corrected projections from 13 CMIP6-GCMs that can be used for hydroclimatic impact assessment based on mean and extremes in South Asia. The bias-corrected data compare well against the observed mean and extremes. The dataset has been arranged based on the geographical boundaries of six countries in South Asia. Moreover, we provide separate data for each of the 18 sub-continental river basins. Daily bias-corrected projections can be used to estimate climatic indices associated with mean and extremes. For instance, daily maximum and minimum temperatures can be used to estimate projected changes under different scenarios for the crop growing seasons. Moreover, the temperature dataset can be used to estimate growing degree days (GDD) 59 and other indicators of extreme heat during the crop growing period 60 .
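As a small illustration of such an index, growing degree days can be accumulated from the daily maximum and minimum temperatures. The sketch below uses one common definition (daily mean temperature above a base temperature, floored at zero); the function name and the base temperature of 10 °C are assumptions, since the appropriate base value depends on the crop.

```python
import numpy as np

def growing_degree_days(tmax, tmin, t_base=10.0):
    """Accumulated GDD over a season from daily Tmax and Tmin (deg C).

    Uses the simple averaging definition: max((Tmax + Tmin) / 2 - t_base, 0),
    summed over all days. t_base is crop dependent (assumed 10 deg C here).
    """
    tmean = (np.asarray(tmax, dtype=float) + np.asarray(tmin, dtype=float)) / 2.0
    return np.maximum(tmean - t_base, 0.0).sum()

# Three hypothetical days: daily means of 25, 15 and 8 deg C.
gdd = growing_degree_days([30.0, 20.0, 12.0], [20.0, 10.0, 4.0])
```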
Similarly, daily precipitation projections can be used to estimate changes in mean and extreme precipitation for any period during the 21st century 22,27 . Data users can also estimate the differences in indicators and potential impacts based on the low (SSP126) and high (SSP585) emission scenarios. Most hydrological models require daily precipitation, maximum and minimum temperatures as the primary meteorological forcing inputs. Therefore, hydrological models can be used with the bias-corrected projections to estimate the impacts of the projected future climate on hydrology for a river basin or a region.
As an example, we use the bias-corrected projections to estimate the frequency of precipitation and temperature extremes for an administrative region (state of Uttar Pradesh, India) and a river basin (Godavari, India) (Figs. 8, 9). The frequency of extreme precipitation was estimated using the 95th percentile of rainy days (precipitation more than 1 mm). Similarly, the frequency of extremely hot maximum and minimum temperatures was estimated using the 95th percentile of the two hottest months (April-May) in the region. As expected, both precipitation and temperature extremes are projected to rise in Uttar Pradesh and the Godavari basin under the SSP585 scenario (Figs. 8, 9). The projected rise in the frequency of precipitation and temperature extremes is higher for the Far period than for the Near period. Overall, daily bias-corrected CMIP6 projections can be used for multiple assessments related to climate and hydrology in one of the most populated regions of the world. As we provide the bias-corrected data for individual GCMs, users can select the GCMs that perform well in the region of interest. Moreover, the range of future projections can be estimated using the bias-corrected projections from the individual GCMs (Fig. S4). In the future, bias-corrected projections will be made available from more CMIP6-GCMs as their output becomes available. More details on the data can be found in the link and from the readme file.
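A frequency calculation of this kind can be sketched as follows for precipitation. The function names and the synthetic series are hypothetical, and the sketch assumes that the 95th-percentile threshold is computed once from a reference period and then held fixed when counting exceedances in later years; the text does not specify the period used for the threshold.

```python
import numpy as np

def wet_day_threshold(pr_ref, q=95.0, wet_limit=1.0):
    """q-th percentile of rainy days (> wet_limit mm) in a reference series."""
    pr_ref = np.asarray(pr_ref, dtype=float)
    return np.percentile(pr_ref[pr_ref > wet_limit], q)

def extreme_frequency(daily, years, threshold):
    """Number of days per year on which the daily value exceeds the threshold."""
    daily = np.asarray(daily, dtype=float)
    years = np.asarray(years)
    unique_years = np.unique(years)
    counts = np.array([(daily[years == y] > threshold).sum() for y in unique_years])
    return unique_years, counts

# Synthetic reference series and a short synthetic "future" series (mm/day).
pr_ref = np.arange(0.0, 101.0)
threshold = wet_day_threshold(pr_ref)
pr_future = np.array([0.0, 50.0, 96.0, 120.0, 97.0, 98.0, 10.0, 99.0])
yr_future = np.array([2050, 2050, 2050, 2050, 2051, 2051, 2051, 2051])
freq_years, freq_counts = extreme_frequency(pr_future, yr_future, threshold)
```

For temperature extremes, the same counting function can be used with a threshold taken from the April-May values of the reference period.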

Code availability
The code used for bias correction of the CMIP6-GCMs is available on GitHub: https://github.com/udit1408/cmip6_downscaling