Introduction

Application of satellite-derived aerosol optical depth (AOD) to estimate ground-level particulate matter with aerodynamic diameter ≤ 2.5 µm (PM2.5) has advanced dramatically since it was first attempted more than a decade ago. In early work, Wang and Christopher1 demonstrated the potential of using AOD from the Moderate Resolution Imaging Spectroradiometer (MODIS) to derive PM2.5 concentrations. Subsequent studies have sought to improve the PM2.5-AOD relationship with a range of linear and nonlinear approaches that incorporate additional meteorological and environmental parameters, including multiple linear regression2, geographically weighted regression3, land use regression4, artificial neural networks5, chemical transport models (CTMs)6 and mixed effects models7.

Although these studies differ to some extent in methodology, they share the same fundamental requirement: satellite AOD must be available; otherwise, it is all but impossible to derive PM2.5 from AOD with sufficient observational constraint. While satellite-derived PM2.5 offers much larger spatial coverage than ground-based measurements, polar-orbiting satellites observe a given location at most 1–2 times per day, and these observations are frequently lost to clouds, snow cover and even heavy aerosol pollution that is misclassified as cloud7. For example, MODIS products in Beijing yielded only about 120 days per year with AOD-PM2.5 matchups8. Gupta et al.9 found that daily satellite AODs were generally available less than 50% of the time at 38 locations in the southeastern United States. Hence, solving the undersampling problem is one of the fundamental requirements for improving PM2.5 estimation from space10. Several methods have been proposed to address this issue. Kloog et al.11 estimated daily PM2.5 for grid cells without AOD data by using the mean PM2.5 levels from nearby grid cells. Combined MODIS and MISR AOD has been used to improve AOD sampling3. Lu et al. proposed a method to estimate missing AODs by assuming a linear relationship between PM2.5 and AOD12. These methods require either surface PM2.5 measurements to constrain the AOD filling or additional satellite products.

Unlike space-borne sensors, the Aerosol Robotic Network (AERONET) has provided robust AOD measurements at high temporal resolution for nearly two decades. More importantly, the region over which AOD temporal variability resembles that at an AERONET site extends 200 to 500 km, depending on the site location, which indicates that the temporal variation of AOD at an AERONET site can be representative of a larger region13,14,15. This is reasonable because the temporal variation of AOD is overwhelmingly determined by weather conditions, which drive coherent AOD variations over a fairly large area. For example, stable, stagnant conditions favor regional haze, whereas a cold front typically disperses widespread haze rapidly. Both phenomena are often observed in the North China Plain (NCP). It is therefore not surprising that high agreement between spring AOD at Beijing and at Xianghe has been reported16. A close connection between the spatial distribution of AOD and circulation types has also been shown15,17. Relative to weather conditions, temporal and spatial changes in emissions are expected to play only a minor role in how well AOD at one location represents the surrounding region. This opens an opportunity to improve PM2.5 estimation from AOD if a robust synergy can be established between the spatial coverage of satellites and the temporal coverage of AERONET.

Beijing, the capital of China, the largest developing country in the world, has suffered from heavy air pollution in recent years, especially in winter. A persistent regional air pollution episode in the winter of 2013 was recorded by a regional air quality monitoring network18,19; however, very few MODIS AOD retrievals are available to estimate the fine-scale spatial variation of PM2.5 during that episode. Heavy aerosol pollution is probably misclassified as cloud by the MODIS cloud discrimination algorithm because its signal is, to some extent, similar to that of clouds. Figure 1 presents examples of MODIS AOD retrievals under different conditions in the NCP. MODIS retrievals are missing most likely because of clouds (February 15) or misclassification of heavy haze as cloud (February 14, 16 and 18). In contrast, AERONET AODs remain available owing to their high temporal resolution, including on heavily polluted days. AERONET AODs therefore show potential for estimating PM2.5 under these conditions.

Figure 1

MODIS/Aqua RGB images of the North China Plain overlaid with MODIS AODs as well as instantaneous AERONET AODs at 550 nm on Feb. 14, 2014 (a); Feb. 15, 2014 (b); Feb. 16, 2014 (c); Feb. 17, 2014 (d); Feb. 18, 2014 (e) and Feb. 19, 2014 (f). The figure was generated in ArcMap 10.2.

Here we evaluate the spatial representativeness of AERONET AODs at the Beijing station (39.97°N, 116.38°E) and investigate their potential for estimating PM2.5 concentrations at a regional scale. MODIS/Aqua daily Level 2 AOD products over the Beijing area from 2002 to 2014 are first interpolated onto a regular grid with a spatial resolution of 0.1°. Linear equations are then established between simultaneous daily AERONET AODs at Beijing and gridded MODIS AODs in the Beijing administrative area; that is, a distinct linear equation relating Beijing AERONET AOD to MODIS AOD is created for each grid cell. These equations are then used to fill missing MODIS AODs from AERONET AODs. Finally, the spatial distribution of PM2.5 concentration is estimated with a mixed effects model. Validation shows that the method is robust in the Beijing administrative area, which suggests that AERONET AOD products have great potential for monitoring PM2.5 concentrations, especially in heavily polluted regions.

Results

AOD sampling enhancement by the synergy of AERONET and MODIS

Figure 2 shows the spatial distribution of correlation coefficients (R) between gridded MODIS AODs across the Beijing administrative area and Beijing AERONET AODs. Seasonal mean R values are 0.73 ± 0.14, 0.76 ± 0.09, 0.78 ± 0.11 and 0.74 ± 0.14 for spring, summer, autumn and winter, respectively. As expected, R decreases with distance from the site. R also shows a decreasing gradient from east to west (especially in winter), likely owing to differences in topography, land use and transport pathways. Spatial variation in the number of paired data points may also contribute to this pattern.

Figure 2

Seasonal correlation maps between daily-paired MODIS gridded AODs and AERONET AODs at Beijing for 2002–2014. The figure was produced using NCL. The map was created using ArcGIS 10.2 (ESRI Inc. Redlands, California, USA).

Figure 3 presents the performance of the data fusion method during the winter of 2013. Before data fusion, MODIS provides AOD retrievals on approximately 50% of days (regional mean). The retrieval frequency varies spatially, from about 20% in the north to about 70% in the south. After data fusion, the regional mean AOD sampling over the entire area increases to 81%. In particular, AOD sampling increases substantially, by ~50% in the west and by ~40% in the north.
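The sampling statistic mapped in Figure 3 is, for each grid cell, the ratio of days with a valid AOD to the total number of days. The sketch below illustrates that ratio before and after fusion. It assumes daily AOD series stored as Python lists with `None` for missing retrievals, and an AERONET-predicted series already computed for the grid cell; the function names and the example values are hypothetical, not taken from the study.

```python
from typing import List, Optional, Sequence


def sampling_fraction(daily_aod: Sequence[Optional[float]]) -> float:
    """Fraction of days in one grid cell that have a valid (non-missing) AOD."""
    if len(daily_aod) == 0:
        return 0.0
    valid_days = sum(1 for value in daily_aod if value is not None)
    return valid_days / len(daily_aod)


def fuse_series(modis: Sequence[Optional[float]],
                predicted: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Keep the MODIS value when present; otherwise fall back to the
    AERONET-predicted value, which is itself None when AERONET is missing."""
    return [m if m is not None else p for m, p in zip(modis, predicted)]


# Hypothetical 5-day winter series for a single grid cell
modis_aod = [0.5, None, None, 1.2, None]
aeronet_pred = [0.45, 0.9, None, 1.1, 0.3]
fused_aod = fuse_series(modis_aod, aeronet_pred)
print(sampling_fraction(modis_aod), sampling_fraction(fused_aod))  # 0.4 0.8
```

Days on which neither MODIS nor AERONET is available stay missing, so the fused sampling can approach but not exceed the AERONET observation frequency.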

Figure 3

Fraction of days with valid AOD values relative to the total number of days before data fusion (a) and after data fusion (b) in the winter of 2013. This figure was processed using NCL. The map was created using ArcGIS 10.2 (ESRI Inc. Redlands, California, USA).

MODIS and fused AODs were compared with independent sunphotometer measurements at Xianghe and SDZ (Fig. 4). MODIS AOD at the grid cell closest to each station is compared with the sunphotometer daily-mean AOD. MODIS retrieves AOD well at both Xianghe and SDZ, with R of 0.91 (N = 440) and 0.86 (N = 191), respectively, and 70.5% and 48.7% of MODIS AODs fall within the expected uncertainty of ±(0.05 + 20% × AOD) at Xianghe and SDZ, respectively. MODIS AODs are closer to ground truth at Xianghe than at SDZ, likely because of the complex terrain at SDZ. The accuracy of AOD estimated from the synergy of AERONET and MODIS is close to that of the MODIS retrieval: the mean prediction error (MPE) and root mean square error (RMSE) of fused versus sunphotometer AOD are similar to those of MODIS versus sunphotometer AOD. Fusing AERONET-derived pixel AOD with MODIS AOD increases winter AOD sampling by 65% at Xianghe and 93% at SDZ (Fig. 4), which would be expected to improve PM2.5 estimation from AOD.
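The validation statistics above can be computed as in the following minimal sketch. It assumes the expected-error envelope is evaluated around the sunphotometer (reference) AOD, which is one common convention, and takes MPE to be the mean absolute difference as described in the Fig. 4 caption; the function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def fraction_within_ee(sat: Sequence[float], ref: Sequence[float],
                       a: float = 0.05, b: float = 0.20) -> float:
    """Share of satellite AODs inside the envelope ref ± (a + b * ref)."""
    hits = sum(1 for s, r in zip(sat, ref) if abs(s - r) <= a + b * r)
    return hits / len(ref)


def mpe(sat: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute difference between satellite and reference AOD."""
    return sum(abs(s - r) for s, r in zip(sat, ref)) / len(ref)


def rmse(sat: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square error between satellite and reference AOD."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sat, ref)) / len(ref))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily means: sunphotometer (reference) vs. satellite AOD
ref = [0.2, 0.5, 1.0, 1.5]
sat = [0.25, 0.70, 1.10, 1.20]
print(fraction_within_ee(sat, ref))  # 0.75
print(mpe(sat, ref), rmse(sat, ref), pearson_r(sat, ref))
```

The same functions apply unchanged to the fused AOD series, which is what allows the MODIS-only and fused comparisons in Fig. 4 to be made on an equal footing.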

Figure 4

Comparison of MODIS-only (left), linearly fitted (center) and fused AOD (right) with sunphotometer AOD in winter at Xianghe (top) and SDZ (bottom) during 2005–2011. The solid lines represent the linear regression fits and the dotted lines the expected MODIS aerosol retrieval error, ±(0.05 + 0.20 × AOD). MPE and RMSE denote the mean absolute difference and root mean square error, respectively. This figure was produced using MATLAB.

PM2.5 prediction

Mean PM2.5 concentrations in the winter of 2013 were estimated using the MODIS-only and fused AOD values (Fig. 5). Estimated PM2.5 concentrations averaged over the entire area were 95.5 ± 67.8 μg m−3 and 104.3 ± 74.6 μg m−3 for the two datasets, respectively. PM2.5 values in the southwest and north estimated from the fused AOD exceed those from MODIS AOD by more than 20 μg m−3, mainly because the fusion method substantially increases AOD sampling in these sub-regions. This result indicates that PM2.5 is probably underestimated when only MODIS AOD is used, owing to AOD undersampling.

Figure 5

PM2.5 concentration maps retrieved using only MODIS AOD (left) and fused AOD data (right) in the winter of 2013. The corresponding mean concentrations at 35 sites are overlaid as dots. The figures were produced using NCL. The map was created using ArcGIS 10.2 (ESRI Inc. Redlands, California, USA).

The improvement in PM2.5 estimation resulting from AOD sampling enhancement by fusion of AERONET AODs is clearly shown in Fig. 6, which plots estimated against measured PM2.5 at 25 stations. To evaluate the performance of both AOD datasets, we adopt cross validation (CV). We collect collocated PM2.5 and AOD data at 25 stations; in each iteration, data from 24 stations are used to train the model and data from the remaining station are used to evaluate it. This leave-one-out process is repeated for each of the 25 sites, following the cross-validation procedure of a previous study10. R between measured and MODIS AOD derived PM2.5 is 0.63, with MPE and RMSE of 24.5 and 29.9 μg m−3, respectively. The fused AOD performs much better in deriving PM2.5, as evidenced by a higher R (0.89) and lower MPE (19.7 μg m−3) and RMSE (24.4 μg m−3).
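The leave-one-site-out loop described above can be sketched as follows. The study fits the mixed effects model of Eq. (1); to keep this sketch self-contained, a simple ordinary least-squares fit of PM2.5 on AOD stands in for it (a swapped-in model, not the authors' one), and the `fit` argument lets any other fitting routine be plugged in. Site names and values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[float, float]  # (AOD, observed PM2.5) for one day at one site
Model = Callable[[float], float]


def fit_ols(pairs: List[Pair]) -> Model:
    """Stand-in model: ordinary least squares PM2.5 = a + b * AOD."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda aod: intercept + slope * aod


def leave_one_site_out(data: Dict[str, List[Pair]],
                       fit: Callable[[List[Pair]], Model] = fit_ols
                       ) -> Dict[str, List[Tuple[float, float]]]:
    """Train on all sites except one, predict the held-out site, and repeat
    for every site. Returns {site: [(observed, predicted), ...]}."""
    results = {}
    for held_out in data:
        train = [p for site, pairs in data.items() if site != held_out
                 for p in pairs]
        model = fit(train)
        results[held_out] = [(obs, model(aod)) for aod, obs in data[held_out]]
    return results


# Hypothetical sites whose data lie exactly on PM2.5 = 10 + 100 * AOD
data = {
    "A": [(0.5, 60.0), (1.0, 110.0)],
    "B": [(0.2, 30.0), (0.8, 90.0)],
    "C": [(1.5, 160.0), (0.3, 40.0)],
}
cv = leave_one_site_out(data)
```

The pooled (observed, predicted) pairs from all held-out sites are then what R, MPE and RMSE are computed on.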

Figure 6

Scatter plots of the MODIS AOD derived PM2.5 (left) and the fused AOD derived PM2.5 (right) against ground-observed PM2.5 at 25 sites in the winter of 2013. The cyan shaded area represents the standard deviations of the retrieved PM2.5 and the color bar indicates the standard deviations of the ground-measured PM2.5. This figure was produced using MATLAB.

Figure 7 presents histograms of three PM2.5 datasets at 25 stations, namely ground measurements and estimates from fused and MODIS AOD. Compared with MODIS AOD derived PM2.5, the histogram of fused-AOD derived PM2.5 is closer to that of the ground measurements. Between the ground measurements and fused-AOD derived PM2.5, the correlation coefficient is 0.90 and the mean absolute difference is 3.59%, compared with 0.88 and 3.90%, respectively, between ground observations and estimates from MODIS AOD.

Figure 7

Histograms of PM2.5 concentrations from three datasets: ground-level measurements (left), MODIS AOD derived (center) and fused AOD derived (right). This figure was produced using MATLAB.

The method is also applied to the other three seasons. PM2.5 estimation is again improved by the enhanced AOD sampling, although the improvements are smaller than in winter. In every season, PM2.5 concentrations estimated from the fused AOD at the 25 stations are closer to ground measurements than those from MODIS AOD only (Table 1). Both the mean PM2.5 and its standard deviation (temporal variability) estimated from fused AOD increase to some extent, approaching those of the ground measurements.

Table 1 Statistics over 25 sites for retrieved results based on three datasets.

Discussion and Conclusions

We used statistical analysis to compare AOD products from MODIS and from AERONET Beijing between 2002 and 2014. The correlation analyses indicate that AOD at the AERONET site can represent the temporal variability of a larger region around its location. Grid AOD is then estimated from AERONET AOD at Beijing by linear regression. The fused AOD dataset provides higher temporal coverage in winter (81% of days) than MODIS-only retrievals (about 50% of days). PM2.5 estimated from MODIS-only AOD is underestimated. Mean PM2.5 concentrations calculated by the mixed effects model with improved AOD sampling increased by 0.8, 6.1, 2.7 and 6.5 μg m−3 in spring, summer, autumn and winter, respectively.

The method used in this study to fill missing MODIS AOD can supply more AOD data to chemistry models and model assimilation, provide better spatial and temporal coverage of PM2.5 concentrations by increasing AOD-PM2.5 matchups, and offer better estimates of PM2.5 variability for epidemiological studies. Although only MODIS/Aqua data (13:30 local standard time) are used to generate the correlation map, applying this map to estimate AOD at other times of day may also be promising, since AOD typically varies little over the course of a day. For example, Mishra et al.13 used a linear statistical model derived from MODIS/Aqua to predict the spatial distribution of MODIS/Terra AOD, with statistical model errors generally below ~12%.

It should be noted that this method depends strongly on the spatial representativeness of the ground site; optimal deployment of ground observations could therefore extend the applicability of the data fusion14. In addition, changes in the spatial distribution of emissions over the domain in recent years may also affect the spatial correlations, which requires further study.

Data and Methods

PM2.5 Data

Hourly PM2.5 concentrations from December 1, 2013 to November 30, 2014 at 35 sites are available online (http://zx.bjmemc.com.cn/) (Fig. 8), and daily-mean PM2.5 concentrations are calculated from the hourly measurements within each day. Automated monitoring systems installed at each site measure ambient concentrations of SO2, NO2, O3, CO, PM2.5 and PM10 according to China Environmental Protection Standards. PM2.5 concentrations are measured by the Tapered Element Oscillating Microbalance (TEOM) method. The TEOM filter is heated to remove particle-bound water, which may cause a slight underestimation of PM2.5 mass concentration owing to volatilization of semi-volatile material20. An inter-comparison of PM2.5 concentrations from the Beijing U.S. Embassy and the nearby Ministry of Environmental Protection site indicated that the two datasets agreed well in temporal variation, but the former was slightly higher than the latter because a beta attenuation monitor was used at the Beijing U.S. Embassy21.
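Daily means from the hourly records can be obtained as in this minimal sketch. It assumes timestamps already in local time and simply skips missing hours; the text does not state any minimum number of valid hours per day, so no completeness threshold is applied here. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Optional, Tuple


def daily_means(hourly: Iterable[Tuple[datetime, Optional[float]]]
                ) -> Dict[date, float]:
    """Average hourly PM2.5 (μg m−3) by calendar day, skipping missing hours."""
    sums: Dict[date, float] = defaultdict(float)
    counts: Dict[date, int] = defaultdict(int)
    for timestamp, value in hourly:
        if value is None:
            continue  # missing hour: excluded from both sum and count
        day = timestamp.date()
        sums[day] += value
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(counts)}


# Hypothetical hourly records for one site
hourly = [
    (datetime(2013, 12, 1, 0), 100.0),
    (datetime(2013, 12, 1, 1), None),
    (datetime(2013, 12, 1, 2), 140.0),
    (datetime(2013, 12, 2, 0), 50.0),
]
print(daily_means(hourly))
```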

Figure 8

(a) Annual mean PM2.5 concentrations at 35 sites during Dec. 2013 to Nov. 2014; (b) spatial distribution of the 10-km MODIS AOD in Beijing overlaid with 3 sunphotometer stations (black triangles) during Dec. 2013 to Nov. 2014. The figure was produced using NCL. The map was created using ArcGIS 10.2 (ESRI Inc. Redlands, California, USA).

MODIS AOD

Two Moderate Resolution Imaging Spectroradiometer (MODIS) sensors were launched into sun-synchronous orbits, on Terra (10:30 local standard time) in 1999 and on Aqua (13:30 local standard time) in 2002. A 2330 km viewing swath provides near-global coverage every 1–2 days. Spatial resolution varies with band (from 250 m to 1 km at nadir) and coarsens toward the edge of the swath, by a factor of about 2 along-track and 5 across-track. Three algorithms are applied to retrieve 550 nm AOD: the Deep Blue (DB) and Dark Target (DT) algorithms over land, and the DT over-water algorithm22,23. MODIS retrieves AOD over land with an estimated uncertainty of ±(0.05 + 0.20 × AOD)22. The Collection 6.0 AOD datasets are available from https://ladsweb.modaps.eosdis.nasa.gov at a nominal (nadir) spatial resolution of 10 × 10 km. We created gridded AOD covering Beijing (115.2°E-117.6°E, 39.4°N-41.2°N) with a spatial resolution of 0.1° × 0.1° from 13 years (2002–2014) of Level 2 Aqua merged DT and DB AOD at 550 nm. Mean Aqua AOD exhibits a strong spatial gradient, with the highest values over the southeastern urban districts (Fig. 8).
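The text says the Level 2 AOD was "interpolated" onto the 0.1° grid without specifying the scheme. The sketch below uses a simpler, swapped-in technique, bin averaging: every Level 2 pixel whose centre falls inside a 0.1° cell of the Beijing domain contributes equally to that cell's mean. It assumes pixel-centre longitude, latitude and AOD have already been read from the Level 2 files; the names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

LON_MIN, LON_MAX = 115.2, 117.6  # Beijing domain given in the text
LAT_MIN, LAT_MAX = 39.4, 41.2
RES = 0.1                        # grid spacing in degrees


def grid_index(lon: float, lat: float) -> Optional[Tuple[int, int]]:
    """(row, col) of the 0.1° cell containing a pixel centre, or None if the
    pixel lies outside the domain. The tiny offset guards against
    floating-point error at cell edges."""
    if not (LON_MIN <= lon < LON_MAX and LAT_MIN <= lat < LAT_MAX):
        return None
    row = math.floor((lat - LAT_MIN) / RES + 1e-9)
    col = math.floor((lon - LON_MIN) / RES + 1e-9)
    return row, col


def bin_average(pixels: Iterable[Tuple[float, float, float]]
                ) -> Dict[Tuple[int, int], float]:
    """Mean AOD of all pixels (lon, lat, aod) falling in each grid cell."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for lon, lat, aod in pixels:
        idx = grid_index(lon, lat)
        if idx is None:
            continue
        sums[idx] += aod
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in counts}


# Hypothetical pixels: two in the same cell, one outside the domain
grid = bin_average([(116.38, 39.97, 0.8), (116.35, 39.95, 0.6),
                    (118.0, 40.0, 0.5)])
```

Cells with no pixels on a given day are simply absent from the result, which is how missing retrievals enter the later sampling statistics.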

AERONET AOD

AERONET is a ground-based, internationally federated, globally distributed network of sun photometers. AERONET AOD is derived from direct-beam solar measurements at wavelengths from the ultraviolet to the infrared24. We used the cloud-screened and quality-assured Level 2.0 AOD product (http://aeronet.gsfc.nasa.gov/)25. Instantaneous AOD at 550 nm at Beijing during 2002–2014 was interpolated from AOD at 440 nm and 675 nm. AOD products at SDZ and Xianghe (2005–2011) served as the validation datasets for the data fusion, and statistics of AOD at Beijing (2002–2014), Xianghe and SDZ (2005–2011) are presented in Table 2. SDZ is one of the China Aerosol Remote Sensing Network (CARSNET) stations. CARSNET uses the same sunphotometer and algorithm as AERONET to retrieve AOD, with accuracy comparable to that of AERONET26,27.
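The interpolation method from 440 nm and 675 nm to 550 nm is not specified in the text. A common choice, assumed here, is log-log (Ångström) interpolation: the Ångström exponent α is derived from the two channels and AOD is assumed to follow τ(λ) ∝ λ^−α between them. Function names and values are illustrative.

```python
import math


def angstrom_exponent(tau1: float, lam1: float, tau2: float, lam2: float) -> float:
    """Ångström exponent from AOD at two wavelengths (same length units)."""
    return -math.log(tau1 / tau2) / math.log(lam1 / lam2)


def aod_550(tau440: float, tau675: float) -> float:
    """AOD at 550 nm, assuming a power law in wavelength between 440 and 675 nm."""
    alpha = angstrom_exponent(tau440, 440.0, tau675, 675.0)
    return tau440 * (550.0 / 440.0) ** (-alpha)


# Hypothetical observation that follows a power law with alpha = 1.3
tau440 = 0.8
tau675 = 0.8 * (675.0 / 440.0) ** -1.3
print(aod_550(tau440, tau675))
```

For observations that follow an exact power law, this recovers the 550 nm value without error; for real spectra with curvature, it is an approximation.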

Table 2 Site information of Beijing, Xianghe and SDZ.

Data fusion approach

For each grid cell, we establish a linear relationship (slope and intercept) from daily pairs of AERONET AOD at the Beijing site and gridded MODIS AOD within the Beijing area (115.2°E-117.6°E, 39.4°N-41.2°N) during 2002–2014. The analysis is performed separately for each of the four seasons: spring (March-April-May), summer (June-July-August), autumn (September-October-November) and winter (December-January-February). Maps of the Pearson correlation coefficient are derived from linear correlation analysis between these two variables (Fig. 2). A correlation threshold (R ≥ 0.5) determines whether AERONET AOD can be used in the estimation of regional PM2.5: for grid cells with R ≥ 0.5, missing MODIS AOD retrievals are filled with linearly fitted AOD values based on AERONET Beijing.
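The per-grid fitting and filling steps can be sketched as follows for one grid cell and one season. It assumes the daily AERONET and MODIS series are aligned by day and use `None` for missing values. The minimum of three paired days is a hypothetical safeguard that the text does not specify, and all names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

R_THRESHOLD = 0.5  # correlation threshold stated in the text

# Month -> season, to split the daily record before fitting
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def fit_grid(aeronet: Sequence[Optional[float]],
             modis: Sequence[Optional[float]]
             ) -> Optional[Tuple[float, float, float]]:
    """Return (R, slope, intercept) of MODIS = intercept + slope * AERONET,
    using only days on which both values exist; None if the fit is undefined."""
    pairs = [(a, m) for a, m in zip(aeronet, modis)
             if a is not None and m is not None]
    n = len(pairs)
    if n < 3:  # hypothetical minimum number of paired days
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(m for _, m in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((m - my) ** 2 for _, m in pairs)
    sxy = sum((a - mx) * (m - my) for a, m in pairs)
    if sxx == 0 or syy == 0:
        return None
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, my - slope * mx


def fill_grid(aeronet: Sequence[Optional[float]],
              modis: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill missing MODIS days from AERONET if the cell's R >= R_THRESHOLD."""
    fit = fit_grid(aeronet, modis)
    if fit is None or fit[0] < R_THRESHOLD:
        return list(modis)  # representativeness too weak: leave gaps
    _, slope, intercept = fit
    return [m if m is not None
            else (intercept + slope * a if a is not None else None)
            for a, m in zip(aeronet, modis)]


# Hypothetical cell where MODIS = 0.1 + AERONET on the paired days
filled = fill_grid([0.2, 0.4, 0.6, 0.8, 1.0], [0.3, 0.5, None, 0.9, None])
```

Running `fill_grid` once per cell and per season (grouping days with `SEASONS`) reproduces the structure of the approach: one regression per cell and season, applied only where the correlation passes the threshold.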

The mixed effects model

The mixed effects model used to describe the AOD-PM2.5 relationship is as follows.

$$\mathrm{PM}_{i,j}=(\alpha+u_j)+(\beta+v_j)\,\mathrm{AOD}_{i,j}+s_i+\varepsilon_{i,j},\qquad (u_j,\ v_j)\sim N\left[(0,\ 0),\ \Sigma\right]$$
(1)

where \(\mathrm{PM}_{i,j}\) is the PM2.5 concentration at site i on day j; α and β are the fixed intercept and slope, respectively; \(u_j\) and \(v_j\) are the day-specific random intercept and slope; \(s_i \sim N(0, \sigma_s^2)\) is the random intercept of site i; and \(\varepsilon_{i,j} \sim N(0, \sigma^2)\) is the error term at site i on day j. \(\sigma_s^2\) and \(\sigma^2\) denote the variances of \(s_i\) and \(\varepsilon_{i,j}\), and Σ is the variance-covariance matrix of the day-specific random effects7. To collocate AOD with PM2.5, each surface site is assigned the satellite AOD of the 10 × 10 km2 grid cell in which it falls. If more than one site falls within a single grid cell, the PM2.5 values of those sites are averaged. After this process, 25 collocated AOD-PM2.5 site series remain for model development.
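The collocation step (assigning each site to a grid cell and averaging PM2.5 over sites that share a cell) can be sketched for a single day as follows. It assumes the 10 × 10 km2 cells are the 0.1° cells of the gridded AOD product with origin at 115.2°E, 39.4°N. Site names, coordinates and values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

LON0, LAT0, RES = 115.2, 39.4, 0.1  # grid origin and cell size in degrees


def cell_of(lon: float, lat: float) -> Tuple[int, int]:
    """(row, col) of the 0.1° cell containing a site; the tiny offset
    guards against floating-point error at cell edges."""
    return (math.floor((lat - LAT0) / RES + 1e-9),
            math.floor((lon - LON0) / RES + 1e-9))


def collocate(sites: Dict[str, Tuple[float, float]],
              pm_today: Dict[str, Optional[float]]
              ) -> Dict[Tuple[int, int], float]:
    """Average same-day PM2.5 over all sites that fall in the same cell."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for name, (lon, lat) in sites.items():
        value = pm_today.get(name)
        if value is not None:
            groups[cell_of(lon, lat)].append(value)
    return {cell: sum(v) / len(v) for cell, v in groups.items()}


# Hypothetical sites (lon, lat): S1 and S2 share a cell, S3 does not
sites = {"S1": (116.38, 39.97), "S2": (116.35, 39.95), "S3": (116.70, 40.50)}
cell_pm = collocate(sites, {"S1": 100.0, "S2": 120.0, "S3": 80.0})
```

Each resulting cell value is then paired with that cell's AOD on the same day to form the records used to fit Eq. (1).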

Data availability

The datasets generated during and/or analyzed in the current study are available from the corresponding author on reasonable request.