Performance comparison of LUR and OK in PM2.5 concentration mapping: a multidimensional perspective

Land Use Regression (LUR) modeling and Ordinary Kriging (OK) interpolation have been widely used to offset the shortcomings of PM2.5 data observed at sparse monitoring sites. However, the traditional point-based strategy for evaluating the performance of these methods has remained stagnant, which could lead to unreasonable mapping results. To address this challenge, this study employs ‘information entropy’, an area-based statistic, along with traditional point-based statistics (e.g. error rate, RMSE) to evaluate the performance of the LUR model and OK interpolation in mapping PM2.5 concentrations in Houston from a multidimensional perspective. The point-based validation reveals significant differences between LUR and OK at different test sites despite the similar end-result accuracy (e.g. error rate 6.13% vs. 7.01%). Meanwhile, the area-based validation demonstrates that the PM2.5 concentrations simulated by the LUR model exhibit more detailed variations than those interpolated by the OK method (i.e. information entropy, 7.79 vs. 3.63). Results suggest that LUR modeling could better refine the spatial distribution scenario of PM2.5 concentrations compared to OK interpolation. The significance of this study primarily lies in promoting the integration of point- and area-based statistics for model performance evaluation in air pollution mapping.

surface in the regions with sparse or missing data and is prone to over-amplify extreme variations due to its reliance on a single factor [8]. Consequently, accurate performance evaluation of LUR modeling and OK interpolation is particularly important for reliable air pollution mapping.
While studies have attempted to promote this work by comparing the performance of LUR, Kriging and air dispersion modeling in estimating PM10 concentrations [15], further improvements are still needed, because model performance in these comparative studies was largely determined by how similar the causal mechanisms driving air pollution concentrations were between the locations of the test sample sites and those of the training sample sites [15][16][17]. Model reliability therefore depends on the test sites selected, which are themselves subject to evaluation errors [18].
Information entropy, an area-based statistical indicator that was originally designed to describe the evenness of the spatial distribution of energy, has been increasingly used to evaluate the richness of image information. Since air pollutant concentrations vary over space, information entropy has the potential to reflect this variation based on a raster map of those concentrations [19,20]. Compared to traditional point-based statistics, information entropy is an effective index that can uniquely and objectively measure the amount of information in a map and evaluate the capacity of that map to disclose the variation details of an element [21,22].
This study therefore employed area-based information entropy along with traditional point-based statistics to evaluate the performance of LUR modeling and OK interpolation in mapping PM2.5 concentrations in Houston from a multidimensional perspective. To better understand the meaning of the information entropy values, an external profile analysis was also implemented.
As a large industrialized region in southeast Texas, the Houston metropolitan area covers 10 counties and 26,060 km² (Figure 1). In this study the city serves as a representative urban environment with documented high PM2.5 pollution rates. Prior works estimated mean annual particulate concentrations in the metropolitan area that range from 9.87 µg/m³ (minimum) to 14.24 µg/m³ (maximum).
In this flat landscape, industrial and traffic emissions are the main pollutant sources in the multi-county Houston metropolitan area of 6 million residents, according to the U.S. Environmental Protection Agency (EPA) [23]. Therefore, factors that contribute to Houston's PM2.5 pollution could be land-use type, road traffic, population distribution and geographic elements that represent location and climatic characteristics.
As a result, data used for LUR modeling in this study include the annual PM2.5 concentrations at 17 monitoring sites (10 of which are located in Harris County) from the U.S. EPA's Air Quality System Technology Transfer Network [24]. These PM2.5 concentrations are approximately normally distributed. Air quality monitoring at these sites complies with the EPA's federal reference standard or federal equivalency standard, thus providing valid data for official air pollution measurements and quality assurance plans [25]. The land cover map with a spatial resolution of 30 m, the road networks and the demographic census data are from the U.S. National Land Cover Database [26], the Environmental Systems Research Institute (ESRI) nationwide street and geocoding databases [27], and the U.S. Census database [28], respectively.
Figure 2 shows the spatial distribution of annual PM2.5 concentrations in the Houston metropolitan area produced by the LUR model and OK interpolation. Significant differences in PM2.5 concentrations can be observed across the covered counties.
Performance comparison based on point-based statistics. The performance of the LUR model and OK interpolation in PM2.5 concentration estimation is evaluated by using point-based statistics including absolute error, error rate, RMSE, and a paired T-test. These statistics are calculated by using the typical N-1 cross validation strategy. Results listed in Table 1 show that the error rates of both the LUR-simulated and the OK-interpolated PM2.5 concentrations varied among monitoring sites. The minimum and maximum absolute errors of the LUR-simulated PM2.5 concentrations were 0.02 µg/m³ at site 9 and 2.04 µg/m³ at site 15/16, with an average absolute error of 0.70 µg/m³, whereas those of the OK-interpolated PM2.5 concentrations were 0.01 µg/m³ at site 15 and 2.07 µg/m³ at site 13, with an average absolute error of 0.80 µg/m³.
Moreover, the LUR model had an overall higher accuracy of simulated PM2.5 concentrations compared to the OK-interpolated ones, although the paired T-test confirmed an insignificant difference in site-based error rates between these two methods at P = 0.65. The LUR-simulated and OK-interpolated PM2.5 concentrations had 5 and 7 sites, respectively, with an absolute error > 1.00 µg/m³. The maximum error rates of these two methods were 15.95% and 20.34%, with average error rates of 6.13% and 7.01%, respectively. In addition, the RMSE evaluation results of the LUR-simulated and OK-interpolated PM2.5 concentrations (i.e. 0.89 and 1.00, respectively) were also consistent with those based on the absolute error and error rate.

Performance evaluation based on area-based information entropy. Table 2 displays the values of information entropy and the related statistics calculated from the spatial distribution maps simulated by the LUR model and interpolated by OK. The information entropy values (i.e. LUR: 7.79 vs. OK: 3.63) in Table 2 indicate that the LUR model outperformed OK interpolation in illustrating detailed spatial variations of PM2.5 concentrations across the Houston metropolitan area. The reliability of information entropy was echoed by the maximum, minimum and average PM2.5 concentrations also shown in Table 2. Specifically, the PM2.5 concentrations generated by the LUR model (9.57-13.52 µg/m³) were closer to the actual ground observation values (9.87-14.24 µg/m³) than those generated by OK interpolation (10.08-12.93 µg/m³). Additionally, the profile analysis results in Figure 4 further confirmed the above findings of the information entropy evaluation. Along all four directions, the PM2.5 concentrations interpolated by the OK method generally first showed an increasing trend and then gradually decreased, while those simulated by the LUR model at the same local sites were relatively lower and stable at the two ends but higher and more fluctuating in the middle. This difference suggests that the spatial distribution scenario of PM2.5 concentrations could be better refined by LUR modeling than by OK interpolation.

Discussion
This study is the first to explore the differences in the spatial distributions of PM2.5 concentrations between the LUR model and OK interpolation by comprehensively using point-based statistics and area-based information entropy. We found that, based on point-based statistics, the two methods produce similar results. However, significant differences between the two methods were observed based on area-level information entropy, which confirmed the better performance of LUR relative to OK. Our findings provide new insights for future air pollution research.
The optimal adjusted LUR model in this study has a fitting R² of 0.69, which is much higher than that of the OK method (R² = 0.38) as well as the results of previous studies (e.g. 0.45 to 0.60 in London [29]; 0.56, 0.73 and 0.50 in northern Europe [30]; 0.17 in Germany [31]). This study applied the backward Multiple Linear Regression (MLR) method [30,32,33] to achieve the best LUR model fitting. Due to the limited number of PM2.5 monitoring sites in the Houston metropolitan area, this study utilized empirical LUR variable values and sampling-site numbers to screen individual modeling variables [34][35][36], following a strategy widely used in previous studies [37,38]. Land use type and road traffic variables with strong prediction capacity are screened first. Population distribution and variables describing distance to sea are then incorporated for model adjustments. Because Houston's PM2.5 pollution is primarily from diesel emission, oil vehicles, road dust, barbecue, and wood burning [23], Harris County, which is highly urbanized and industrialized, experiences relatively higher PM2.5 concentrations, while surrounding areas, which are characterized by agricultural land use and fewer road networks, have relatively lower PM2.5 concentrations. This is reflected by the LUR model simulated result, which shows a decreasing trend from Harris County to surrounding areas. It also confirms that the simulation result of the LUR model is closer to the real PM2.5 spatial distribution than that of the OK interpolation, as shown by the statistics in Table 1. By contrast, the annual PM2.5 concentration produced by OK interpolation was zonally distributed in Houston, with high concentration areas in the central eastern region of Harris County and the northeast and southwest regions of Houston.
The point-based statistics validation demonstrated no significant differences between the results from the LUR model and OK interpolation. However, the LUR model achieved slightly better simulation accuracy than the OK interpolation (e.g. RMSE: 0.89 vs. 1.00). Given that the quality of OK interpolation depends on the distribution of monitoring sites, the validation precision of OK-interpolated PM2.5 concentrations would certainly vary among monitoring sites, with the poorest results in the boundary area due to insufficient observation data. Conversely, because it considers more factors such as land use, traffic, and population, we believe the LUR model is more reliable than OK interpolation, especially for areas without abundant observed PM2.5 concentrations but with sufficient relevant auxiliary factors. Additionally, the point-based statistics validation of the LUR model and OK interpolation in this study is based on the typical N-1 cross validation strategy, which should be the 'best' one we can use with discrete monitoring sites.
Furthermore, the area-based information entropy evaluation revealed significantly different results between the LUR model and OK interpolation. The annual PM2.5 concentrations simulated by the LUR model have more spatial variations (greater information entropy values) than those from the OK interpolation. This is because the LUR model integrates additional influencing factors that are closely related to the emission and diffusion of PM2.5, such as land use, road traffic, and climatic indicators. These factors strengthened the ability of the LUR model to reveal concentration variations by distinguishing the surrounding geographic differences of divergent positions, especially for areas with limited monitoring sites. These advantages increased the information richness of the LUR-simulated map and reflected the real-world scenario of PM2.5 concentration. These findings reconfirm the superiority of information entropy in evaluating an air quality map's capacity to disclose variations, which could not be achieved by previous air quality mapping studies based on traditional point-based statistics [7,18,39]. Given the urban development pattern (e.g. Harris County has a high volume of traffic and is also the industrial and economic center) and the PM2.5 sources in Houston, we believe the evaluation result based on the LUR model is more reliable than that based on OK interpolation.
As in previously reported studies based on data from few monitoring sites (the minimum number of sites being 13) [40,41], satisfactory results were achieved with the data collected from 17 monitoring sites in this study; nevertheless, issues concerning the monitoring sites and predictor selection still need to be addressed in the future. PM2.5 concentration estimation with higher accuracy could be achieved with more monitoring sites. Moreover, while this study established a significant LUR model at an acceptable accuracy level using MLR without overestimates, the model's performance could be further enhanced by involving more predictors, given sufficient monitoring sites that are evenly distributed in space.
In summary, findings in this study imply that although point-based statistics evaluation could accurately reflect a model's performance in mapping air pollution concentration, its evaluation result is often limited by test site locations and their spatial distribution. In regions with densely centralized test sites and training sites, point-based statistics evaluation methods may bias the estimated model accuracy (i.e. toward better or worse accuracy). Therefore, in addition to point-based statistics evaluation, the area-based information entropy evaluation proposed in this study is important and necessary for a more comprehensive and accurate assessment of air pollution concentration maps. In other words, the information entropy evaluation clearly confirms that the LUR model is more accurate than OK interpolation in representing the spatial distribution of annual PM2.5 concentrations in the Houston metropolitan area in this study. Additionally, this study implies that information entropy is a new measure for effectively evaluating the performance of other exposure models, such as dispersion modeling, LUR modeling, and remote sensing based models, whose spatial resolution is better than that of OK interpolation. This could greatly enhance the reliability of findings in future environmental health studies.

Methods
The methodology of this study is composed of three parts: LUR modeling, OK interpolation, and performance comparison between LUR and OK (Figure 5).
LUR modeling. LUR modeling links the air pollution concentration at a monitoring site with other geographic characteristics of that monitoring site. The modeling is composed of variable extraction and screening, regression model building, and model validation. The variable extraction and screening include selecting geographic elements and extracting characteristic variables of geographic elements.
Considering experiences from previous LUR studies [10,11,30,32,33][42][43][44] and the PM2.5 pollution sources in Houston, this study utilizes annual PM2.5 concentrations as the outcome variable and develops predictors from various geographic elements including land use type (X1), road length (X2), distance to road (X26), population density (X31), house density (X32), and distance to sea (X41). Among them, the 'measured values' of predictors with a spatial scaling effect are extracted at buffering radii of 100 m, 300 m, 500 m, 800 m, 1000 m, 1500 m, 2000 m, 2500 m, 3000 m, 3500 m, 4000 m, 4500 m and 5000 m because the 'spatial scale dependency' is unclear [33,42,45]. Land use types are reclassified as forest (X11), open space (X12), medium-density urban (X13), high-density urban (X14) and barren land (X15) from the 11 initial land use types provided by the United States Geological Survey. Road traffic data in this study include highway (X21), major road (X22), local road (X23), minor road (X24) and other road (X25). The entire process is implemented with ArcGIS 10.0. To screen out effective predictors appropriate for LUR modeling in Houston, Pearson coefficient values between all predictors and the annual PM2.5 concentration are calculated with SPSS 19.0. For predictors with a spatial scaling effect, the optimal spatial scale of each predictor is defined as the one with the maximum calculated Pearson coefficient in the scale range of 100 m to 5000 m. Consequently, the final predictors screened out for LUR modeling in this study are the area fraction of land use type, including X11-5000 m, X12-100 m, X13-100 m, X14-800 m and X15-3000 m; road traffic, including X22-100 m, X23-300 m, X24-3000 m, X25-1500 m and X26; and others, including X31-3000 m, X32-1000 m and X41.
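The scale-selection step described above can be illustrated with a minimal Python sketch. It is not the SPSS workflow used in the study: the function name `best_buffer_radius`, the dictionary layout of the inputs, and the use of the absolute value of the Pearson coefficient (so that strongly negative predictors such as forest cover are not discarded) are all assumptions made here for illustration.

```python
import numpy as np

def best_buffer_radius(values_by_radius, pm25):
    """Pick the buffer radius whose predictor values correlate most with PM2.5.

    values_by_radius: dict mapping radius in metres -> predictor values,
                      one value per monitoring site (hypothetical layout).
    pm25: observed annual PM2.5 concentration at the same sites.
    Returns (radius, pearson_r); radius is None if no correlation is defined.
    """
    pm25 = np.asarray(pm25, dtype=float)
    best_radius, best_r = None, 0.0
    for radius, values in values_by_radius.items():
        r = np.corrcoef(np.asarray(values, dtype=float), pm25)[0, 1]
        if np.isnan(r):  # constant predictor: correlation undefined, skip it
            continue
        # Assumption: rank by |r|, keeping the sign for interpretation.
        if best_radius is None or abs(r) > abs(best_r):
            best_radius, best_r = radius, r
    return best_radius, best_r
```

Called once per predictor, such a helper would return one radius per predictor, which corresponds to labels such as X13-100 m in the text.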
A predictor-based regression model is established by using a multiple linear regression (MLR) equation (i.e. Equation 1) in this study. The equation is shown as
$$Y = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + \cdots + a_n X_n + u \quad (1)$$
where Y is the annual PM2.5 concentration, X denotes the independent predictors, a_0 is a constant, a_1 to a_n are the regression coefficients for each predictor X, respectively, and u is the random error. An equation group is composed of n groups of observed values Y_i, X_1i, X_2i, X_3i, …, with i = 1, 2, 3, …, n. Area fraction of land use type and road traffic are the two major influencing factors of annual PM2.5 concentration in the Houston area. Thus, this study starts with backward MLR by using the respective type of predictors under an optimal spatial scale as inputs to establish the preliminary optimal models (i.e. the models with the highest fitting R²) with SPSS 19.0. Thereafter, another round of backward MLR is conducted for these preliminary optimal models by adding predictors such as population density, house density and distance to sea. As a result, the finalized LUR model is built as Y_Conc = X13-100 + X31-3000 + 8.357, with significant coefficients at p < 0.05. The adjusted R² of this finalized model is 0.69, with VIF values less than 10 to ensure non-multicollinearity.
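As an illustration of the quantities reported above (regression coefficients, adjusted R², and VIF), the following minimal NumPy sketch fits an ordinary least squares model and computes a variance inflation factor for each predictor. It does not reproduce the backward-elimination procedure run in SPSS 19.0; the function names `fit_ols` and `vif` are hypothetical and exist only for this sketch.

```python
import numpy as np

def fit_ols(X, y):
    """Fit Y = a0 + a1*X1 + ... + ak*Xk by least squares.

    Returns (coefficients with intercept first, R^2, adjusted R^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # add the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2_j, _ = fit_ols(others, X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)
```

With 17 sites and two predictors, the adjustment term is (17 - 1) / (17 - 2 - 1) = 16/14, which is why the adjusted R² sits noticeably below the raw R² for small samples.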
Using the finalized LUR model, a continuous surface of annual PM2.5 concentrations at a resolution of 3 km × 3 km (Figure 2) within the study area is generated with ArcGIS 10.0, taking into account the high computational cost of point-by-point estimation and the spatial similarity of predictors within certain spatial scales (i.e. buffering area size). Specifically, grid points at a 3 km interval across the entire study area are pre-set first; then the 'measured values' of the predictors in the finalized LUR model at these pre-set points are extracted and used to calculate the annual PM2.5 concentration at each pre-set point; finally, these high-density estimated PM2.5 concentrations are used to produce the distribution map of annual PM2.5 concentrations.
OK interpolation. OK interpolation refers to the linear unbiased optimal estimation of unknown points according to the structural features of known sample points [46]. When the mathematical expectation of the regionalized variable Z(x) is an unknown constant (m), the OK method can be used for spatial interpolation. The interpolation formula is given by Equation (2).
For the process of OK interpolation, an exploratory data analysis is first conducted on the training sample data of the 17 monitoring sites within the Houston metropolitan area and 4 additional sites outside the Houston metropolitan area to determine whether the data follow a normal distribution and whether they are spatially correlated. Then, a continuous prediction map of annual PM2.5 concentration is produced using the 'Spatial Interpolation' wizard of ArcGIS 10.0. We did not use trend removal because of the relatively smooth variation of PM2.5 concentration across these monitoring sites (i.e. 17 + 4). Considering the sparse distribution of monitoring sites in the study area, the number of neighborhood points searched is set to 4.
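For orientation, the ordinary kriging estimator is commonly written in the standard textbook form below. This is a sketch under assumptions rather than the study's own notation: the weights λ_i, the semivariogram γ, the Lagrange multiplier φ, and the target location x_0 are symbols introduced here.

```latex
% Ordinary kriging estimate at an unsampled location x_0 from n neighbours
\hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i \, Z(x_i),
\qquad \sum_{i=1}^{n} \lambda_i = 1

% Weights follow from the kriging system (for i = 1, ..., n)
\sum_{j=1}^{n} \lambda_j \, \gamma(x_i, x_j) + \varphi = \gamma(x_i, x_0)
```

The constraint that the weights sum to one is what makes the estimate unbiased when the mean m is constant but unknown. With the neighborhood search set to 4 points, n = 4 in each local system.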
Point-based statistics calculation. Point-based statistics including absolute error, relative error and root-mean-square error (RMSE) are employed to validate the LUR model and OK interpolation in this study by using the commonly used N-1 cross validation strategy, which is suitable for limited data samples [12,15]. Following this strategy, the 17 monitoring sites across the study area are divided, in each round, into 16 training sites and 1 validation site, so that every site serves once as the validation site. The absolute error represents the size of the deviation of the simulated/estimated concentration from the observed concentration. Relative error and RMSE represent the degree of deviation of the simulated/estimated concentration from the observed concentration, which reflects the reliability of the estimation result of the model. The three error indices are calculated according to equations (3)-(5), with larger values indicating lower model accuracy.
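The N-1 (leave-one-out) procedure can be sketched generically in Python as follows. The `fit` and `predict` callables are hypothetical placeholders for whichever estimator is being validated (an LUR regression or a kriging interpolator); they are not part of the study's software.

```python
import numpy as np

def leave_one_out(X, y, fit, predict):
    """N-1 cross validation: hold out each site once and predict it
    from a model trained on the remaining N-1 sites.

    fit(X_train, y_train) -> model          (hypothetical callable)
    predict(model, X_test) -> 1-D array     (hypothetical callable)
    Returns the array of held-out predictions, one per site.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    predictions = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i       # mask selecting the N-1 training sites
        model = fit(X[keep], y[keep])
        predictions[i] = predict(model, X[i:i + 1])[0]
    return predictions
```

For 17 sites this trains 17 models, each on 16 sites, and yields one held-out prediction per site for the error indices.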

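$$E = |O - S| \quad (3)$$
$$E^{*} = \frac{|O - S|}{O} \times 100\% \quad (4)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (O_i - S_i)^2} \quad (5)$$
where E, E*, and RMSE respectively represent the absolute difference, the relative error (i.e. error rate) and the root-mean-square error between the observed concentration and the estimated concentration. O is the observed concentration, S is the simulated/estimated concentration and n is the sample size.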
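The three error indices can be computed with a few lines of NumPy, as in the sketch below. The function name `point_errors` is an assumption of this example, and the relative error is expressed in percent to match the error rates reported in Table 1.

```python
import numpy as np

def point_errors(observed, simulated):
    """Return (absolute error per site, error rate in % per site, RMSE)."""
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    abs_err = np.abs(o - s)                 # E  = |O - S|
    err_rate = abs_err / o * 100.0          # E* = |O - S| / O, in percent
    rmse = float(np.sqrt(np.mean((o - s) ** 2)))
    return abs_err, err_rate, rmse
```

For example, an observed value of 10 and a simulated value of 11 give an absolute error of 1 and an error rate of 10%.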
Area-based information entropy. 'Entropy' is an indicator that was originally designed to describe the evenness of the spatial distribution of energy [19]. It has been extended in information theory to indicate the richness of information. Given ω = {X1, X2, ..., Xn}, suppose the probability of Xi ∈ ω is ρi = P(Xi); the information entropy of ω can then be defined by Equation (6), where Xi represents a pixel of an image and P(Xi) is the probability of occurrence of Xi. The more heterogeneous Xi is, the larger the information entropy of the image will be, indicating more details of the spatial pattern. In this study, information entropy is used to depict the ability of the LUR model and OK interpolation to map the variation of the annual PM2.5 concentrations over the entire study area. Specifically, the distribution maps of annual PM2.5 concentration are first produced and reclassified with natural break points. Then, the number of raster grids in each class is counted and divided by the total grid number of the raster map to calculate the probabilities P(Xi). These probabilities are finally used to compute the value of information entropy according to equation (6). Information entropy is calculated for the raster maps from both the LUR model and OK interpolation.
Additionally, a four-direction criterion (Figure 1), namely east-west (01), south-north (02), southeast-northwest (03) and southwest-northeast (04), is employed to further confirm the necessity of the area-based information evaluation, considering factors that possibly caused the heterogeneity of PM2.5 concentrations. In each direction, PM2.5 concentrations at 50 randomly distributed sites are separately simulated and interpolated by the LUR and OK methods.
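The class-counting procedure described above can be sketched in Python as follows, assuming the standard Shannon form H = -Σ P(Xi) log P(Xi). The logarithm base is not stated in the text, so `base=2.0` is an assumption of this sketch, as is the function name `information_entropy`. The raster is assumed to be already reclassified into integer class labels (the natural-breaks step is not reproduced) and to contain no NoData cells.

```python
import numpy as np

def information_entropy(class_raster, base=2.0):
    """Shannon entropy of a reclassified raster.

    class_raster: 2-D array of class labels (already reclassified).
    base: logarithm base; 2.0 is an assumption, not taken from the study.
    """
    labels = np.asarray(class_raster).ravel()
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()               # P(X_i): share of cells in each class
    return float(-(p * np.log(p)).sum() / np.log(base))
```

A map in which every cell falls in one class has entropy 0, while a map whose cells are spread evenly over k classes reaches the maximum log(k), so a smoother OK surface that concentrates cells in few classes scores lower than a more varied LUR surface.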