Urban and air pollution: a multi-city study of long-term effects of urban landscape patterns on air quality trends

Most air pollution research has focused on assessing the urban landscape effects of pollutants in megacities, little is known about their associations in small- to mid-sized cities. Considering that the biggest urban growth is projected to occur in these smaller-scale cities, this empirical study identifies the key urban form determinants of decadal-long fine particulate matter (PM2.5) trends in all 626 Chinese cities at the county level and above. As the first study of its kind, this study comprehensively examines the urban form effects on air quality in cities of different population sizes, at different development levels, and in different spatial-autocorrelation positions. Results demonstrate that the urban form evolution has long-term effects on PM2.5 level, but the dominant factors shift over the urbanization stages: area metrics play a role in PM2.5 trends of small-sized cities at the early urban development stage, whereas aggregation metrics determine such trends mostly in mid-sized cities. For large cities exhibiting a higher degree of urbanization, the spatial connectedness of urban patches is positively associated with long-term PM2.5 level increases. We suggest that, depending on the city’s developmental stage, different aspects of the urban form should be emphasized to achieve long-term clean air goals.


Data
City boundaries. Our study period spans from the year 2000 to 2014 to keep the data completeness among all data sources. After excluding cities with invalid or missing PM 2.5 or sociodemographic value, a total of 626 cities, with 278 prefecture-level cities and 348 county-level cities, were selected. City boundaries are primarily based on the Global Rural-Urban Mapping Project (GRUMP) urban extent polygons that were defined by the extent of the nighttime lights 48, 49 . Few adjustments were made. First, in the GRUMP dataset, large agglomerations that include several cities were often described in one big polygon. We manually split those polygons into individual cities based on the China Administrative Regions GIS Data at 1:1 million scales 50 . Second, since the 1978 economic reforms, China has significantly restructured its urban administrative/spatial system. Noticeable changes are the abolishment of several prefectures and the promotion of many former county-level cities to prefecture-level cities 51 . Thus, all city names were cross-checked between the year 2000 and 2014, and the mismatched records were replaced with the latest names. PM 2.5 concentration data. The annual mean PM 2.5 surface concentration (micrograms per cubic meter) for each city over the study period was calculated from the Global Annual PM 2.5 Grids at 0.01° resolution 52 . This data set combines Aerosol Optical Depth retrievals from multiple satellite instruments including the NASA Moderate Resolution Imaging Spectroradiometer (MODIS), Multi-angle Imaging SpectroRadiometer (MISR), and the Sea-Viewing Wide Field-of-View Sensor (SeaWiFS). The global 3-D chemical transport model GEOS-Chem is further applied to relate this total column measure of aerosol to near-surface PM 2.5 concentration, and geographically weighted regression is finally used with global ground-based measurements to predict and adjust for the residual PM 2.5 bias per grid cell in the initial satellite-derived values.
Human settlement layer. The urban forms were quantified with the 40-year (1978-2017) record of annual impervious surface maps for both rural and urban areas in China 47,53 . This state-of-art product provides substantial spatial-temporal details on China's human settlement changes. The annual impervious surface maps covering our study period were generated from 30-m resolution Landsat images acquired onboard Landsat 5, 7, and 8 using an automatic "Exclusion/Inclusion" mapping framework 54,55 . The output used here was the binary impervious surface mask, with the value of one indicating the presence of human settlement and the value of zero identifying non-residential areas. The product assessment concluded good performance. The cross-comparison against 2356 city or town locations in GeoNames proved an overall high agreement (88%) and approximately 80% agreement was achieved when compared against visually interpreted 650 urban extent areas in the year 1990, 2000, and 2010.

Control variables.
To provide a holistic assessment of the urban form effects, we included control variables that are regarded as important in influencing air quality to account for the confounding effects.
Four variables, separately population size, population density, and two economic measures, were acquired from the China City Statistical Yearbook 56 (National Bureau of Statistics 2000-2014). Population size is used to control for the absolute level of pollution emissions 41 . Larger populations are associated with increased vehicle usage and vehicle-kilometers travels, and consequently boost tailpipes emissions 5 . Population density is a useful reflector of transportation demand and the fraction of emissions inhaled by people 57 . We also included gross regional product (GRP) and the proportion of GRP generated from the secondary sector (GRP2). The impact of economic development on air quality is significant but in a dynamic way 58 . The rising per capita income due to the concentration of manufacturing industrial activities can deteriorate air quality and vice versa if the stronger economy is the outcome of the concentration of less polluting high-tech industries. Meteorological conditions also have short-and long-term effects on the occurrence, transport, and dispersion of air pollutants [59][60][61] . Temperature affects chemical reactions and atmospheric turbulence that determine the formation and diffusion of particles 62 . Low air humidity can lead to the accumulation of air pollutants due to it is conducive to the adhesion of atmospheric particulate matter on water vapor 63 . Whereas high humidity can lead to wet deposition processes that can remove air pollutants by rainfall. Wind speed is a crucial indicator of atmospheric activity by greatly affect air pollutant transport and dispersion. All meteorological variables were calculated based on China 1 km www.nature.com/scientificreports/ raster layers of monthly relative humidity, temperature, and wind speed that are interpolated from over 800 ground monitoring stations 64 . Based on the monthly layer, we calculated the annual mean of each variable for each year. Finally, all pixels falling inside of the city boundary were averaged to represent the overall meteorological condition of each city.

Methods
Considering the dynamic urban form-air pollution relationship evidenced from the literature review, our hypothesis is: the determinants of PM 2.5 level trends are not the same for cities undergoing different levels of development or in different geographic regions. To test this hypothesis, we first categorized city groups following (1) social-economic development level, (2) spatial autocorrelation relationship, and (3) population size. We then assessed the relationship between urban form and PM 2.5 level trends by city groups. Finally, we applied the panel data models to different city groups for hypothesis testing and key determinant identification ( Fig. 1).

Calculation of urban form metrics.
Based on the previous knowledge [65][66][67] , fifteen landscape metrics falling into three categories, separately area, shape, and aggregation, were selected. Those metrics quantify the compositional and configurational characteristics of the urban landscape, as represented by urban expansion, urban shape complexity, and compactness (Table 1). Area metrics gives an overview of the urban extent and the size of urban patches that are correlated with PM 2.5 20 . As an indicator of the urbanization degree, total area (TA) typically increases constantly or remains stable, because the urbanization process is irreversible. Number of patches (NP) refers to the number of discrete parcels of urban settlement within a given urban extent and Mean Patch Size (AREA_MN) measures the average patch size. Patch density (PD) indicates the urbanization stages. It usually increases with urban diffusion until coalescence starts, after which decreases in number 66 . Largest Patch Index (LPI) measures the percentage of the landscape encompassed by the largest urban patch.
The shape complexity of urban patches was represented by Mean Patch Shape Index (SHAPE_MN), Mean Patch Fractal Dimension (FRAC_MN), and Mean Contiguity Index (CONTIG_MN). The greater irregularity the landscape shape, the larger the value of SHAPE_MN and FRAC_MN. CONTIG_MN is another method of assessing patch shape based on the spatial connectedness or contiguity of cells within a patch. Larger contiguous patches will result in larger CONTIG_MN.
Aggregation metrics measure the spatial compactness of urban land, which affects pollutant diffusion and dilution. Mean Euclidean nearest-neighbor distance (ENN_MN) quantifies the average distance between two patches within a landscape. It decreases as patches grow together and increases as the urban areas expand. Landscape Shape Index (LSI) indicates the divergence of the shape of a landscape patch that increases as the landscape becomes increasingly disaggregated 68 . Patch Cohesion Index (COHESION) is suggestive of the connectedness degree of patches 69 . Splitting Index (SPLIT) and Landscape Division Index (DIVISION) increase as the separation of urban patches rises, whereas, Mesh Size (MESH) decreases as the landscape becomes more fragmented. Aggregation Index (AI) measures the degree of aggregation or clumping of urban patches. Higher values of continuity indicate higher building densities, which may have a stronger effect on pollution diffusion. www.nature.com/scientificreports/ The detailed descriptions of these indices are given by the FRAGSTATS user's guide 70 . The calculation input is a layer of binary grids of urban/nonurban. The resulting output is a table containing one row for each city and multiple columns representing the individual metrics.

Division of cities.
Division based on the socioeconomic development level. The socioeconomic development level in China is uneven. The unequal development of the transportation system, descending in topography from the west to the east, combined with variations in the availability of natural and human resources and industrial infrastructure, has produced significantly wide gaps in the regional economies of China. By taking both the economic development level and natural geography into account, China can be loosely classified into Eastern, Central, and Western regions. Eastern China is generally wealthier than the interior, resulting from closeness to coastlines and the Open-Door Policy favoring coastal regions. Western China is historically behind in economic development because of its high elevation and rugged topography, which creates barriers in the transportation infrastructure construction and scarcity of arable lands. Central China, echoing its name, is in the process of economic development. This region neither benefited from geographic convenience to the coast nor benefited from any preferential policies, such as the Western Development Campaign.
Division based on spatial autocorrelation relationship. The second type of division follows the fact that adjacent cities are likely to form air pollution clusters due to the mixing and diluting nature of air pollutants 71 , i.e., cities share similar pollution levels as its neighbors. The underlying processes driving the formation of pollution hot spots and cold spots may differ. Thus, we further divided the city into groups based on the spatial clusters of PM 2.5 level changes.
Local indicators of spatial autocorrelation (LISA) was used to determine the local patterns of PM 2.5 distribution by clustering cities with a significant association. In the presence of global spatial autocorrelation, LISA indicates whether a variable exhibits significant spatial dependence and heterogeneity at a given scale 72 . Practically, LISA relates each observation to its neighbors and assigns a value of significance level and degree of spatial Table 1. Definition and description of the urban form metrics. A , total landscape area; a j , area of patch j ; C jr , contiguity value for pixel r in patch j ; E , total length of the edge in landscape in terms of cell surfaces; g ii , number of like adjacencies between pixels of urban patch i based on the single-count method; max → g ii , maximum number of like adjacencies between pixels of urban patch;n , number of urban patches; p j , perimeter of patch j;v , sum of the values in a 3-by-3 cell template; . denotes p value < 0.1; *p value < 0.05; **p value < 0.01; ***p value < 0.001. Cells in grey shadow show descriptive statistics of the corresponding variables in 2014. www.nature.com/scientificreports/ autocorrelation, which is calculated by the similarity in variable z between observation i and observation j in the neighborhood of i defined by a matrix of weights w ij 7,73 : where I i is the Moran's I value for location i ; σ 2 is the variance of variable z ; z is the average value of z with the sample number of n . The weight matrix w ij is defined by the k-nearest neighbors distance measure, i.e., each object's neighborhood consists of four closest cites. The computation of Moran's I enables the identification of hot spots and cold spots. The hot spots are highhigh clusters where the increase in the PM 2.5 level is higher than the surrounding areas, whereas cold spots are low-low clusters with the presence of low values in a low-value neighborhood. A Moran scatterplot, with x-axis as the original variable and y-axis as the spatially lagged variable, reflects the spatial association pattern. The slope of the linear fit to the scatter plot is an estimation of the global Moran's I 72 (Fig. 2). The plot consists of four quadrants, each defining the relationship between an observation 74 . The upper right quadrant indicates hot spots and the lower left quadrant displays cold spots 75 .
Division based on population size. The last division was based on population size, which is a proven factor in changing per capita emissions in a wide selection of global cities, even outperformed land urbanization rate [77][78][79] . We used the 2014 urban population to classify the cities into four groups based on United Nations definitions 80 : (1) large agglomerations with a total population larger than 1 million; (2) mid-sized cities, 500,000-1 million; (3) small cities, 250,000-500,000, and (4) very small cities, 100,000-250,000.
Panel data analysis. The panel data analysis is an analytical method that deals with observations from multiple entities over multiple periods. Its capacity in analyzing the characteristics and changes from both the time-series and cross-section dimensions of data surpasses conventional models that purely focus on one dimension 81,82 . The estimation equation for the panel data model in this study is given as: where the subscript i and t refer to city and year respectively. β 0 is the intercept parameter and β 1 − β 18 are the estimates of slope coefficients. ε is the random error. All variables are transformed into natural logarithms.
Two methods can be used to obtain model estimates, separately fixed effects estimator and random effects estimator. The fixed effects estimator assumes that each subject has its specific characteristics due to inherent individual characteristic effects in the error term, thereby allowing differences to be intercepted between subjects.  www.nature.com/scientificreports/ The random effects estimator assumes that the individual characteristic effect changes stochastically, and the differences in subjects are not fixed in time and are independent between subjects. To choose the right estimator, we run both models for each group of cities based on the Hausman specification test 83 . The null hypothesis is that random effects model yields consistent and efficient estimates 84 : H 0 : E(ε i |X it ) = 0 . If the null hypothesis is rejected, the fixed effects model will be selected for further inferences. Once the better estimator was determined for each model, one optimal panel data model was fit to each city group of one division type. In total, six, four, and eight runs were conducted for socioeconomic, spatial autocorrelation, and population division separately and three, two, and four panel data models were finally selected. Spatially, the changes in PM 2.5 levels exhibit heterogeneous patterns across cities (Fig. 3b). According to the socioeconomic level division (Fig. 3a), the Eastern, Central, and Western region experienced a 38.6, 35.3, and 25.5 µg/m 3 increase in annual PM 2.5 mean , separately, and the difference among regions is significant according to the analysis of variance (ANOVA) results (Fig. 4a). When stratified by spatial autocorrelation relationship www.nature.com/scientificreports/ (Fig. 3c), the differences in PM 2.5 changes among the spatial clusters are even more dramatic. The average PM 2.5 increase in cities belonging to the high-high cluster is approximately 25 µg/m 3 , as compared to 5 µg/m 3 in the low-low clusters (Fig. 4b). Finally, cities at four different population levels have significant differences in the changes of PM 2.5 concentration (Fig. 3d), except for the mid-sized cities and large city agglomeration (Fig. 4c).

Results
The effects of urban forms on PM 2.5 changes. The Hausman specification test for fixed versus random effects yields a p value less than 0.05, suggesting that the fixed effects model has better performance. We fit one panel data model to each city group and built nine models in total. All models are statistically significant at the p < 0.05 level and have moderate to high predictive power with the R 2 values ranging from 0.63 to 0.95, which implies that 63-95% of the variation in the PM 2.5 concentration changes can be explained by the explanatory variables ( Table 2). The urban form-PM 2.5 relationships differ distinctly in Eastern, Central, and Western China. All models reach high R 2 values. Model for Eastern China (refer to hereafter as Eastern model) achieves the highest R 2 (0.90), and the model for the Western China (refer to hereafter as Western model) reaches the lowest R 2 (0.83). The shape metrics FRAC and CONTIG are correlated with PM 2.5 changes in the Eastern model, whereas the area metrics AREA demonstrates a positive effect in the Western model. In contrast to the significant associations www.nature.com/scientificreports/ between shape, area metrics and PM 2.5 level changes in both Eastern and Western models, no such association was detected in the Central model. Nonetheless, two aggregation metrics, LSI and AI, play positive roles in determining the PM 2.5 trends in the Central model. For models built upon the LISA clusters, the H-H model (R 2 = 0.95) reaches a higher fitting degree than the L-L model (R 2 = 0.63). The estimated coefficients vary substantially. In the H-H model, the coefficient of CONTIG is positive, which indicates that an increase in CONTIG would increase PM 2.5 pollution. In contrast, no shape metrics but one area metrics AREA is significant in the L-L model.
The results of the regression models built for cities at different population levels exhibit a distinct pattern. No urban form metrics was identified to have a significant relationship with the PM 2.5 level changes in groups of very small and mid-sized cities. For small size cities, the aggregation metrics COHESION was positively associated whereas AI was negatively related. For mid-sized cities and large agglomerations, CONTIG is the only significant variable that is positively related to PM 2.5 level changes.

Discussion
Urban form is an effective measure of long-term PM 2.5 trends. All panel data models are statistically significant regardless of the data group they are built on, suggesting that the associations between urban form and ambient PM 2.5 level changes are discernible at all city levels. Importantly, these relationships are found to hold when controlling for population size and gross domestic product, implying that the urban landscape patterns have effects on long-term PM 2.5 trends that are independent of regional economic performance. These findings echo with the local, regional, and global evidence of urban form effect on various air pollution types 5,14,21,22,24,39,78 .
Although all models demonstrate moderate to high predictive power, the way how different urban form metrics respond to the dependent variable varies. Of all the metrics tested, shape metrics, especially CONTIG has the strongest effect on PM 2.5 trends in cities belonging to the high-high cluster, Eastern, and large urban agglomerations. All those regions have a strong economy and higher population density 86 . In the group of cities that are moderately developed, such as the Central region, as well as small-and mid-sized cities, aggregation metrics play a dominant negative role in PM 2.5 level changes. In contrast, in the least developed cities belonging to the low-low cluster regions and Western China, the metrics describing size and number of urban patches are the strongest predictors. AREA and NP are positively related whereas TA is negatively associated.
The impacts of urban form metrics on air quality vary by urbanization degree. Based on the above observations, how urban form affects within-city PM 2.5 level changes may differ over the urbanization stages. We conceptually summarized the pattern in Fig. 5: area metrics have the most substantial influence on www.nature.com/scientificreports/ air pollution changes at the early urban development stage, and aggregation metrics emerge at the transition stage, whereas shape metrics affect the air quality trends at the terminal stage. The relationship between urban form and air pollution has rarely been explored with such a wide range of city selections. Most prior studies were focused on large urban agglomeration areas, and thus their conclusions are not representative towards small cities at the early or transition stage of urbanization. Not surprisingly, the area metrics, which describe spatial grain of the landscape, exert a significant effect on PM 2.5 level changes in small-sized cities. This could be explained by the unusual urbanization speed of small-sized cities in the Chinese context. Their thriving mostly benefited from the urbanization policy in the 1980s, which emphasized industrialization of rural, small-and mid-sized cities 87 . With the large rural-to-urban migration and growing public interest in investing real estate market, a side effect is that the massive housing construction that sometimes exceeds market demand. Residential activities decline in newly built areas of smaller cities in China, leading to what are known as ghost cities 88 . Although ghost cities do not exist for all cities, high rate of unoccupied dwellings is commonly seen in cities under the prefectural level. This partly explained the negative impacts of TA on PM 2.5 level changes, as an expanded while unoccupied or non-industrialized urban zones may lower the average PM 2.5 concentration within the city boundary, but it doesn't necessarily mean that the air quality got improved in the city cores.
Aggregation metrics at the landscape scale is often referred to as landscape texture that quantifies the tendency of patch types to be spatially aggregated; i.e., broadly speaking, aggregated or "contagious" distributions. This group of metrics is most effective in capturing the PM 2.5 trends in mid-sized cities (population range 25-50 k) and Central China, where the urbanization process is still undergoing. The three significant variables that reflect the spatial property of dispersion, separately landscape shape index, patch cohesion index, and aggregation index, consistently indicate that more aggregated landscape results in a higher degree of PM 2.5 level changes. Theoretically, the more compact urban form typically leads to less auto dependence and heavier reliance on the usage of public transit and walking, which contributes to air pollution mitigation 89 . This phenomenon has also been observed in China, as the vehicle-use intensity (kilometers traveled per vehicle per year, VKT) has been declining over recent years 90 . However, VKT only represents the travel intensity of one car and does not reflect the total distance traveled that cumulatively contribute to the local pollution. It should be noted that the private light-duty vehicle ownership in China has increased exponentially and is forecast to reach 23-42 million by 2050, with the share of new-growth purchases representing 16-28% 90 . In this case, considering the increased total distance traveled, the less dispersed urban form can exert negative effects on air quality by concentrating vehicle pollution emissions in a limited space.
Finally, urban contiguity, observed as the most effective shape metric in indicating PM 2.5 level changes, provides an assessment of spatial connectedness across all urban patches. Urban contiguity is found to have a positive effect on the long-term PM 2.5 pollution changes in large cities. Urban contiguity reflects to which degree the urban landscape is fragmented. Large contiguous patches result in large CONTIG_MN values. Among the 626 cities, only 11% of cities experience negative changes in urban contiguity. For example, Qingyang, Gansu is one of the cities-featuring leapfrogs and scattered development separated by vacant land that may later be filled in as the development continues (Fig. 6). Most Chinese cities experienced increased urban contiguity, with less fragmented and compacted landscape. A typical example is Shenzhou, Hebei, where CONTIG_MN rose from 0.27 to 0.45 within the 14 years. Although the 13 counties in Shenzhou are very far scattered from each other, each county is growing intensively internally rather than sprawling further outside. And its urban layout is thus more compact (Fig. 6). The positive association revealed in this study contradicts a global study indicating that www.nature.com/scientificreports/ cities with highly contiguous built-up areas have lower NO 2 pollution 22 . We noticed that the principal emission sources of NO 2 differ from that of PM 2.5. NO 2 is primarily emitted with the combustion of fossil fuels (e.g., industrial processes and power generation) 6 , whereas road traffic attributes more to PM 2.5 emissions. Highly connected urban form is likely to cause traffic congestion and trap pollution inside the street canyon, which accumulates higher PM 2.5 concentration. Computer simulation results also indicate that more compact cities improve urban air quality but are under the premise that mixed land use should be presented 18 . With more connected impervious surfaces, it is merely impossible to expect increasing urban green spaces. If compact urban development does not contribute to a rising proportion of green areas, then such a development does not help mitigating air pollution 41 .

Conclusions
This study explores the regional land-use patterns and air quality in a country with an extraordinarily heterogeneous urbanization pattern. Our study is the first of its kind in investigating such a wide range selection of cities ranging from small-sized ones to large metropolitan areas spanning a long time frame, to gain a comprehensive insight into the varying effects of urban form on air quality trends. And the primary insight yielded from this study is the validation of the hypothesis that the determinants of PM 2.5 level trends are not the same for cities at various developmental levels or in different geographic regions. Certain measures of urban form are robust predictors of air quality trends for a certain group of cities. Therefore, any planning strategy aimed at reducing air pollution should consider its current development status and based upon which, design its future plan. To this end, it is also important to emphasize the main shortcoming of this analysis, which is generally centered around the selection of control variables. This is largely constrained by the available information from the City Statistical Yearbook. It will be beneficial to further polish this study by including other important controlling factors, such as vehicle possession. www.nature.com/scientificreports/