Population exposure across central India to PM2.5 derived using remotely sensed products in a three-stage statistical model

Surface PM2.5 concentrations are required for exposure assessment studies. Remotely sensed Aerosol Optical Depth (AOD) has been used to derive PM2.5 where ground data is unavailable. However, two key challenges in estimating surface PM2.5 from AOD using statistical models are (i) Satellite data gaps, and (ii) spatio-temporal variability in AOD-PM2.5 relationships. In this study, we estimated spatially continuous (0.03° × 0.03°) daily surface PM2.5 concentrations using MAIAC AOD over Madhya Pradesh (MP), central India for 2018 and 2019, and validated our results against surface measurements. Daily MAIAC AOD gaps were filled using MERRA-2 AOD. Imputed AOD together with MERRA-2 meteorology and land use information were then used to develop a linear mixed effect (LME) model. Finally, a geographically weighted regression was developed using the LME output to capture spatial variability in AOD-PM2.5 relationship. Final Cross-Validation (CV) correlation coefficient, r2, between modelled and observed PM2.5 varied from 0.359 to 0.689 while the Root Mean Squared Error (RMSE) varied from 15.83 to 35.85 µg m−3, over the entire study region during the study period. Strong seasonality was observed with winter seasons (2018 and 2019) PM2.5 concentration (mean value 82.54 µg m−3) being the highest and monsoon seasons being the lowest (mean value of 32.10 µg m−3). Our results show that MP had a mean PM2.5 concentration of 58.19 µg m−3 and 56.32 µg m−3 for 2018 and 2019, respectively, which likely caused total premature deaths of 0.106 million (0.086, 0.128) at the 95% confidence interval including 0.056 million (0.045, 0.067) deaths due to Ischemic Heart Disease (IHD), 0.037 million (0.031, 0.045) due to strokes, 0.012 million (0.009, 0.014) due to Chronic Obstructive Pulmonary Disease (COPD), and 1.2 thousand (1.0, 1.5) due to lung cancer (LNC) during this period.

Increased cardiovascular and respiratory diseases in addition to a decreased life expectancy are associated with chronic exposure to particulate matter with aerodynamic diameters < 2.5 µm, PM 2. 5 1 . The Global Burden of Disease (GBD) 2015 study identified air pollution as a major cause of global disease burden, with low and middle-income countries being the worst affected 2 . In India, PM 2.5 standards were included in the National Ambient Air Quality Standards (NAAQS) in November 2009 and are monitored by the central and several state pollution control boards. However, PM 2.5 concentrations vary in space over sub-kilometer to continental scales 3 . The current surface PM 2.5 monitoring network in India is inadequate to capture this variability and to provide adequate data for population exposure studies. Satellite retrieval proxies of PM 2.5 such as Aerosol Optical Depth (AOD), which is the measure of the overall light extinction attributed to the aerosols in the atmospheric column, has been extensively used to estimate surface fine aerosols concentrations worldwide [4][5][6][7] . During the mid-2000s, various studies estimated surface PM 2.5 concentrations by establishing a linear relationship between AOD and surface PM 2.5 . Another popular approach was by using η factor (the ratio of PM 2.5 modelled and AOD modelled) obtained from various global Chemical Transport Models (CTMs) such as GEOS Chem as a scaling factor to convert satellite-derived AOD to surface PM 2.5 concentration 8,9 . During the last decade or so, several studies estimated the global concentrations of surface PM 2.5 using satellite-derived AOD and η factor obtained from various CTMs 10,11 as this method does not require surface measurements to develop the model. However, results

Study area
This study was conducted over MP state (Fig. 1) in central India [27 N, 74E-21 N, 84E]. MP is the second largest state in India by area with a total geographical area of 3,08,245 km 2 and the fifth-largest state by population Census 2011 28 . The northeastern boundary of MP is lined by Indo Gangetic Plain (IGP), one of the most airpolluted regions in the world 10,11 . Previous studies have shown 24 h mean PM 2.5 mass concentrations of up to 170 μg m −3 in the IGP and in parts of MP 23 . Based on the Indian Meteorological Department classification, MP has four distinct seasons: winter (Jan, Feb), pre-monsoon (Mar, Apr, May), monsoon (Jun, Jul, Aug, Sep) and post-monsoon (Oct, Nov, Dec) and has a mixture of semi-arid, tropical, and subtropical climate. The annual mean temperature over MP is 24.7 °C with an average daily high temperature of 33 °C (averages over the last 20 years). MP receives most of its rainfall in the monsoon season with a mean rainfall of 1160 mm with high spatial variability (rainfall decreases from east to west). The mean elevation of MP ranges between 72 m amsl and 1317 m amsl.

Material and methods
Ground PM 2.5 measurement. The Central Pollution Control Board (CPCB) monitors ambient air quality under a nation-wide program: National Ambient Air Quality Monitoring Programme (NAMP). The monitoring stations report PM 2.5 mass concentration in μg m −3 at 15 min resolution, measured using the tapered element oscillating microbalance (TEOM) or the beta-attenuation method (BAM) (CPCB 2011) 29 . Daily mean PM 2.5 concentration data were thus obtained for 12 stations in different cities across MP for the period between Janu-  Fig. 1 and data completeness over these stations is shown in Supplemental Figure S1.

MODIS AOD. The MODerate resolution Imaging Spectroradiometer (MODIS) sensors onboard Earth
Observation System (EOS) Terra and Aqua were launched to Sun-synchronous polar orbits in late 1999 and in 2002, having satellite overpass times over India at 10:30 a.m. and 01:30 p.m., respectively. They have a circular orbit of 705 km and a swath of approximately 2330 km. Daily measurements across a wide spectral range yields multiple datasets of AOD and a variety of other products. In this study, we have used the Multi-Angle Implementation of Atmospheric Correction (MAIAC) for a combined data product of Terra and Aqua AOD. MAIAC algorithm was developed for MODIS data to perform the retrieval of aerosols and for atmospheric correction over bright and dark surfaces (vegetated) utilizing image processing and time-series analysis to derive bidirectional reflectance distribution function at a resolution of 1 km 30 . Due to clear surface characterization, MAIAC AOD has lesser urban bias and increased spatial coverage when compared to the dark target AOD products 31 . Previous studies in the literature have successfully estimated daily PM 2.5 concentrations by utilizing Terra and Aqua AOD averages 14,32 . MAIAC files contain multiple (2-4) AOD files per day depending upon the number of Terra and Aqua overpasses. Due to changing cloud cover during a day, there is a difference in the spatial and temporal availability in AOD data per day. Availability of MAIAC AOD over the location of surface stations in MP is given in Supplemental Figure S2. www.nature.com/scientificreports/ from MERRA-2 surface flux diagnostics. These datasets were obtained hourly from 05:00 UTC to 08:00 UTC for the study period at a spatial resolution of 0.625° × 0.5° from GES-DISC over MP.

MERRA-2 AOD.
Land-cover and road-density data. Shapefile  Data integration. In this study, PM 2.5 ground measurements were point data. Additionally, four different gridded datasets were used: the MAIAC AOD obtained from LAADS DAAC in HDF format; daily mean MERRA-2 AOD and meteorological parameters obtained from GES DISC originally stored in NetCDF; and land cover classification gridded maps obtained from ECMWF. Additionally, the road map of Madhya Pradesh was in a shapefile format. A brief description of these datasets and their source is given in Table 1.
To maintain a reasonable file size MAIAC AOD data is stored in 1200 km × 1200 km tiled file structure. Madhya Pradesh is covered by two tiles i.e. h25v06 and h24v06. These files contain multiple AOD files per day (2)(3)(4) from multiple overpasses of Terra and Aqua platform. Daily mean AOD values from these multiple swaths were calculated for each grid and were clipped for MP state. MAIAC AOD data was available in curvilinear projection and this had to be remapped to rectilinear projection for obtaining a consistent spatiotemporal dataset with MERRA-2 AOD and meteorological parameters (originally in rectilinear projection). AOD re-mapping is timeconsuming and computationally expensive, therefore, to balance between spatial resolution and computational time, the original curvilinear 1 km × 1 km MAIAC AOD was remapped to 0.03° × 0.03° rectilinear projection. Previous studies in the literature have adopted a similar remapping technique to estimate AOD at a lower resolution. Ma et al. 36,37 remapped MODIS C6 AOD (3 km × 3 km) to 0.1° × 0.1° and MODIS C5.1 (10 km × 10 km) to 50 km resolution data over China, respectively. vanDonkelaar et al. 11 remapped global MAIAC AOD to regional 0.1° × 0.1° and 0.01° × 0.01° to estimate the global burden of surface PM 2.5 concentrations. Meanwhile, a bilinear-interpolation method was used for MERRA-2 AOD and meteorological parameters to match the spatial resolution with gridded MAIAC data. For stations like Singrauli, PM 2.5 mass concentration was as high as 800-1000 μg m −3 for a few days (Supplemental Table S2). Very high episodic PM 2.5 concentration days were outliers and not reflected by unusually high AOD over these stations on such days. Therefore PM 2.5 concentration values greater than 99.9th percentile were not used in model development or validation. Gridded road density network (Supplemental Figure S3) was obtained using the method described in "Land-cover and road-density data" section. Finally, the number of grid cells of urban cover, grassland and cropland falling in the predefined 0.03° × 0.03° grid were calculated.
Model development. Stage 1: imputing missing MAIAC AOD with MERRA-2 AOD. MODIS data has often missing AOD values due to cloud coverage, monsoons and satellite calibration issues. Spatial distribution of the percentage of annual mean available data days for 2018 and 2019 are shown in Supplemental Figure S4. Percentage of daily MAIAC AOD available over MP ranged from 0.0 to 73.15% with a mean value of 59.47% and 0.0% to 63.83% with a mean value of 50.56% during 2018 and 2019, respectively. There are no specific spatial patterns in missing data, except almost no data available over large water bodies in MP. To fill daily MAIAC AOD data gaps, first, grid-wise linear regression was fitted between daily pixel centroid values of re-sampled 0.03° MAIAC AOD and 0.03° MERRA-2 AOD for each year to obtain regression coefficients for every grid (201 × 334 = 67,134 in total). The missing MAIAC AOD (s,t) on day "t" and grid point "s" was then filled using the regression coefficient obtained for grid "s" from the first step and MERRA-2 AOD (s,t) over that grid point on day "t" as shown in Eq. (1).
(1) Final_AOD s,t = α s + β s × MERRA2_AOD s,t www.nature.com/scientificreports/ where Final_AOD s,t is the imputed AOD over grid s on day t provided MAIAC s,t is unavailable; α s and β s are the coefficient obtained from linear regression between daily MAIAC AOD and MERRA-2 AOD over grid s; and MERRA2_AOD s,t is the MERRA-2 AOD over grid s on day t. MERRA-2 AOD was able to capture the temporal variation in MAIAC AOD with a mean temporal r 2 value of 0.63 and 0.61 during 2018 and 2019, respectively over the MP region. Grid-wise temporal r 2 between MAIAC and MERRA-2 AODs and range of r 2 values are provided in Supplemental Figures S5 and S6. However, MERRA-2 consistently under estimated AOD value throughout the study period compared to MAIAC AOD (slope and intercept of the regression equations are provided in Supplemental Figures S7 and S8). We also fitted seasonal linear regression between daily MAIAC and MERRA-2 AOD to estimate seasonal slope and intercept along with the corresponding p-values (Supplemental Text S1 and Supplemental Figures S9-S12). For the monsoon season, the p-values for the slope were greater than 0.01 for a substantial number of grid points during both 2018 and 2019 (Supplemental Figures S9 and S10) bringing down the confidence in imputed AOD. Therefore in the final imputation, estimates of slope and intercept from yearly regression, as shown in Eq. (1) were used.
Stage 2: linear mixed effect model. In the second stage, a linear mixed-effect (LME) model was developed following the approach proposed by Lee et al. 38 , over the New England region in the USA. A LME model captures daily variability in the relationship between AOD, PM 2.5, and meteorological parameters. Day-specific slopes and intercepts for the relationship between PM 2.5, AOD and meteorological parameters are calculated and incorporated into both fixed-effects terms and random-effects terms in a LME model. Several studies have shown that the relationship between AOD and surface PM 2.5 varies with changing meteorology due to changing PM 2.5 vertical profile, hygroscopic particle growth and optical properties 15,36 . Also spatially changing parameters such as urban cover, forest cover and vegetation can also affect AOD-PM 2.5 relationship due to changing sources and chemical composition of PM 2.5 36 . Therefore, in this study, we used meteorological parameters along with land cover variables as independent variables and surface PM 2.5 as a dependent variable to develop a LME model in the second stage, which can be written as: where PM 2.5(s,t) is the surface PM 2.5 concentration in (µg m −3 ) over grid s on day t: a 0 and a 0,t are fixed and random (daily varying) intercept, respectively: AOD s,t is the imputed AOD (unitless) over grid s on day t. RH s,t , Temp s,t , U10 s,t , Pressure s,t are relative humidity and temperature (℃) at 2 m above ground level, U10 is the eastward component of wind speed at 10 m above the ground level and Pressure is the surface pressure over grid s on day t. UrbanCover and GrassLand are the number of grid points of urban cover and grass-land available inside the grid s. (a 1 -a 7 ) represents the fixed slopes of variables over the entire study period while (a 1,t -a 5,t ) are the changing slopes with day t. ε st is the error term on grid s on day t and Σ is the variance-covariance matrix for the random effects. Additional predictors such as road density, the daily height of planetary boundary layer, crop-land cover were also included in the LME model development. But the slope estimate values were not statistically significant therefore they were not included in the final model given by Eq. (2).
Stage 3: geographically weighted regression. In the final stage, a Geographically Weighted Regression (GWR) model was developed to capture the spatially varying relationships between AOD and PM 2.5 using the output from the LME model. A GWR model captures spatial heterogeneity by generating a continuous surface of model parameters at every grid cell instead of universal value for all observations (predictor and response variable). We fitted a daily Gaussian GWR model using adaptive bandwidth selection to minimize the Akaike Information Criterion (AIC c ) value using adaptive bi-square kernel in MGWR python 39 given by Eq. (3).
where PM2.5_residual s,t is the residual PM 2.5 concentration (observed-estimated) obtained after fitting the LME model over grid s and day t, AOD s,t is the imputed AOD (unitless) over grid s and day t: b 0,s and b 1,s are location specific intercept and slope over grid s, respectively, which is a function of the geographic location, and ε 1 st is the error term over grid s and day t.
To assess the model fit performance, statistical indicators (r 2 , Root Mean Square Error (RMSE) and Mean Absolute Error (MAE)) were used to estimate the goodness of fit for both stage-2 and stage-3 models. Furthermore, to avoid any model overfitting, a tenfold Cross-Validation (CV) approach was used after stage-2 and stage-3 to estimate the overall model performance. In a tenfold CV, the entire dataset of 4922 observations was divided into 10 random sub-groups (~ 492 points each), and data from 9 sub-groups were used to train the model. The remaining group was used for model validation. This validation scheme is repeated ten times until every subgroup is validated. The metrics were then calculated by comparing PM 2.5 observations and estimates that are collected from all 10 subgroups. Furthermore, due to unavailability of AERONET stations or any campaign mode in situ AOD measurement over MP, to estimate the performance of imputed AOD using MERRA-2 AOD we chose days over study area where MAIAC AOD data was unavailable but surface PM 2.5 data were available. We then checked final model performance on such days against surface concentration to assess the performance of imputed AOD to estimate PM 2.5 concentration.
Model fitting and validation. The fixed effect terms of all the variables from stage-2 after fitting the LME model are summarized in Table 2. The PM 2.5 concentration increases with increasing AOD, surface pressure, urban cover, grassland cover and northward winds in a grid cell and decreases with increasing temperature and relative humidity. Detailed results from the model are provided in Supplemental Table S3. ECMFW LULC classifies Mosaic herbaceous cover (> 50%)/tree and shrub (< 50%) and grassland as grassland. This herbaceous cover land becomes open/dry land during summer/dry season and could potentially contribute dust aerosol to PM 2.5 loading in MP giving a positive slope with PM 2.5 .
A GWR model was then developed using residual PM 2.5 (LME estimated PM 2.5 -observed PM 2.5 ) and AOD as shown in "Model development" section. The model results were then compared with ground observations to evaluate the model performances. Scatter plots between modelled and observed PM 2.5 concentrations for the model fitting and cross-validation of stage-2 and stage-3 models are shown in Fig. 2.
Overall coefficient of determination (r 2 ) values for model fitting were 0.56 and 0.60 for stage-2 and stage-3 models, respectively. The RMSE values also decreased from 22.63 to 21.63 µg m −3 from stage-2 to stage-3 indicating that the overall prediction accuracy increased after using the GWR model. However, CV r 2 values were 0.51 and 0.55 for stage-2 and stage-3 models while the respective CV RMSE values were 23.91 µg m −3 and 22.92 µg m −3 . The r 2 value decreases for CV than for model fitting and the corresponding RMSE value increases for both models indicating the model slightly over fitted at both the stages. Also, after fitting the GWR model slope was increased to 0.58 from 0.55 (Fig. 2) in model fitting and to 0.54 from 0.52 (Fig. 2) Table S4. Modelled PM 2.5 for days with surface PM 2.5 concentration more than 100 µg m −3 were underestimated by both models in model fitting and cross-validation, and with increasing concentration underestimation also increased. This may be because the model was developed with most of the points below 100 µg m −3 , therefore, less weighted is given to points with such high concentration. One more possible explanation could be that such high concentrations were local and were not reflected in a coarse resolution of 0.03° to 0.03°.
Station-wise r 2 and RMSE for model fitting and CV for stage-2 and stage-3 models are shown as Table 3. CV r 2 value between observed and modelled PM 2.5 varied from 0.359 to 0.689 while RMSE varied from 15.83 to 35.85 µg m −3 . Further, r 2 value increased at every station after using the GWR model. In order to assess the usefulness of imputed AOD values in estimating surface PM 2.5 , we selected days when MAIAC AOD was missing but surface PM 2.5 data were available.
Modelled PM 2.5 data on only those days were selected and compared with observed values using statistical metrics (r 2 , RMSE, MAE) and a scatter plot is provided in Fig. 3. The agreement between modelled and observed PM 2.5 on such days was good (r 2 = 0.54) and the overall model RMSE value was 19.42 µg m −3 clearly indicating the www.nature.com/scientificreports/ usefulness of imputed AOD to predict surface PM 2.5 . We also fitted stage-2 and stage-3 models using 0.03° × 0.03° MERRA-2 AOD instead of imputed AOD and the results are discussed in Supplemental Text S2.
Spatial distribution of surface PM 2.5 . Daily surface PM 2.5 maps at a spatial resolution of (0.03° × 0.03°) were generated using the stage-3 model over Madhya Pradesh for 2018 and 2019. This data was then used to estimate the annual average of daily surface PM 2.5 concentration and the final maps are presented in Fig. 4. The annual average daily PM 2.5 concentration over MP varied from 22.73 to 95.24 µg m −3 with a mean value of 58.19 µg m −3 during 2018 and from 20.80 to 96.72 µg m −3 with a mean value of 56.32 µg m −3 during 2019. It was observed that the topography of MP had a strong influence on the surface PM 2.5 , with the highest concentration in northeast MP which is a part of the Indo-Gangetic Plain (IGP) and high PM 2.5 mass loading downstream of the Narmada valley, while locations at a high elevation (Dindori, Mandla, Amarkantak) had the least PM 2.5 mass loading. It is also worth noting that the IGP is highly industrialized and very high population density region potentially leading to high PM 2.5 mass loading while districts of Dindori and Mandla are the least industrially developed with huge forested cover (Kanha National Park) with negligible anthropogenic activities, potentially leading to a cleaner environment when compared with the rest of MP.
Seasonal PM 2.5 maps. To examine the seasonal variation of surface PM 2.5 over MP, mean seasonal maps were generated utilizing the estimated daily PM 2.5 maps for both 2018 and 2019, for a given season (Fig. 5).  www.nature.com/scientificreports/ Population exposure to surface PM 2.5 . Integrated Exposure-response function (IER) 40 has been used widely to estimate the age and cause-specific mortality associated with exposure to PM 2.5 concentrations. IER estimates the risk function for a particular disease as a function of PM 2.5 concentrations based upon previous health studies. Equations (4)-(5) shows the IER framework that accounts for the dependence of relative risk (RR) on PM 2.5 concentrations, Cn.
(4) For Cn < Cn cf , RR i (Cn) = 1 Table 3. Summary statistics of LME and LME + GWR mode (MT is model training and CV is cross validation).  www.nature.com/scientificreports/ where. RR is the relative risk of ith disease for exposure to PM 2.5 concentration of Cn, Cn cf , counterfactual concentration below which there is no associated risk due to PM 2.5 (RR = 1) and ɑ i , γ i and δ i are disease-specific parameters. Previous studies have reported that Lung Cancer (LNC), Ischemic Heart Disease (IHD), Chronic Obstructive Pulmonary Disease (COPD) and Stroke deaths account for 97% of total deaths due to air pollution 41,42 . Therefore in this study, we have estimated district-wise premature mortality in the adult population (age > 25 years) due to LNC, IHD, COPD and Strokes using RR values provided in the look-up table by Apte et al. 43 . In that study, age-independent RR values for LNC and COPD, and age-dependent RR values for IHD and stroke for various PM 2.5 concentrations were generated using the mean of 1000 IER curves 44 and Cn cf was taken as 5.8 µg m −3 . Disease-specific premature mortality for the ith district and jth age group was then calculated using Eq. (6).  www.nature.com/scientificreports/ where BM j,k is the disease-specific baseline mortality rate for the jth age group for kth year obtained from GBD India Compare Data Visualization (ICMR, PHFI, and IHME; 2019) for the year 2018 and 2019, RR i,k is relative risk for the kth year over ith district and Pop i,j is the total population of the ith district in the age group "j". Disease-wise baseline mortality rate was provided with 95% CI which was then translated to a 95% confidence interval for disease-specific mortality over the MP. Further details on the data source and method are given in the Supplemental Text S3. Finally, the total premature mortality was calculated by adding premature mortality due to individual disease over MP. District-wise population-weighted PM 2.5 concentrations are shown in Supplemental Figure S17. The total combined (2018 and 2019) premature mortality due to exposure to PM 2.5 concentrations in MP is estimated to be 106,115.2 (85,717.46, 127,604.6) at 95% CI including deaths due to COPD 11,720.5 (8777.197,14,114 (30,672.47, 45,180.44). IHD is the major cause of premature mortality in MP causing 52.3% of total deaths followed by Strokes (35.4%), COPD (11.04) and LNC (1.17%). Indore city, which is the commercial capital of MP, with very high population density, tops the number of premature deaths due to air pollution with 4853.30 (3921.40, 5836.10) total deaths are 2018-2019 followed by Rewa, Jabalpur, Satna and Sagar. Disease-wise cause specific death for every city is provided in Supplemental Table S8. Cause-specific deaths for the top 5 cities in MP are shown in Supplemental Figure S18. Total premature mortality during 2018-2019 for districts in MP is shown in Fig. 6.

Comparison with a CTM-satellite AOD based global PM 2.5 estimate
An ancillary goal of this study was to assess the usefulness of our model in estimating surface PM 2.5 compared to other CTM output-satellite AOD approaches. In order to do so, we benchmarked annual surface PM 2.5 estimated in this study for locations with surface measurements in MP (Fig. 1) against recent estimates over the same locations derived from Hammer et al. 27 . Our model results agreed better with surface measurements (r 2 = 0.89) compared to the CTM output-satellite AOD estimates (r 2 = 0.55) (See Fig. 7). Also, to understand the spatial variability in surface PM 2.5 over MP estimated by the two approaches (this study and Hammer et al. 27 ), difference maps for 2018 and 2019 were generated (Supplemental Figure S19). These maps suggest that the CTM based approach did not satisfactorily capture the spatial variation in surface PM 2.5 over MP, while over-predicting PM 2.5 over elevated regions and under-predicting its concentrations over the Narmada valley.

Conclusions
This study developed a three-stage statistical model to generate full coverage daily 0.03° × 0.03° surface PM 2.  www.nature.com/scientificreports/ seasons with a mean concentration of 32.10 µg m −3 . Also, topography and land use has a strong influence on the surface PM 2.5 concentration in MP. IGP and low elevation areas were the most polluted while the high elevation areas had the low PM 2.5 concentrations. Indore city had the highest premature mortality in MP during 2018-2019 followed by Rewa, Jabalpur, Satna and Sagar illustrating the fact that air pollution and associated health burden is not only a crowded commercial city problem in MP. This observation reiterates the need for current and future air quality management strategies to focus on regional air quality issues and identify air quality management districts for meaningful and effective public health protection.
Missing AOD data is a major problem in accurately estimating the surface PM 2.5 concentration for exposure and epidemiological studies. This study has demonstrated one approach to address that problem. Although MP is used as an illustrative example to elucidate the usefulness of the model developed in this study, the method is robust and applicable across locations in the world. However, during the course of conducting this study, it was observed that the rationale for choice of locations for the CPCB stations is neither clearly documented nor obvious. It appears that the choice of location is driven by considerations of determining NAAQS violations, logistics and ease of operation. For instance, the data used in training the LME model in this study was from stations that cannot be classified as either urban hotspots or regional background locations. Thus, a country-wide network of ground monitoring stations, carefully situated in accordance with network design rules with sufficient density to capture regional and/or urban aerosols are essential to effectively exploit satellite products to provide reliable spatially continuous surface PM 2.5 estimates. These estimates can then be used for planning both air quality management strategies and to enhance population exposure studies and other epidemiological models that assess the PM 2.5 induced burden of disease. It is hoped that the availability of high time resolution surface PM 2.5 measurements at several locations across India in conjunction with models, such as those developed in this study, will help enhance GBD exposure assessment estimates for this region in the future.

Data availability
There are no linked research data sets for this submission. All data used in this study are publicly available. Web links/citations as appropriate to the data used are listed in the manuscript.