Heterogeneity and the determinants of PM2.5 in the Yangtze River Economic Belt

Haze has reached epidemic levels in many Chinese cities in recent years. Few studies have explored the determinants and heterogeneity of PM2.5. This paper investigates the spatiotemporal characteristics of PM2.5 through spatial analytical methods based on aerosol optical depth data from the Yangtze River Economic Belt (YREB) between 2000 and 2017. Geographically weighted regression and geodetector models were applied to assess the heterogeneity of key factors influencing PM2.5. The results indicate that the annual concentrations of PM2.5 in the YREB were 23.49–37.37 μg/m3, with an initial increase and a later decrease. PM2.5 pollution showed a diagonal high spatial distribution pattern in the northeast and a low spatial distribution in the southwest, as well as a noticeable spatial convergence. The spatial variability of PM2.5 was enlarged, and its main fractal dimension was in the northeast-southwest direction. There were clear spatiotemporal variations in the impacts of natural and anthropogenic factors on PM2.5. Our findings contribute to a better understanding of the impact mechanisms of PM2.5 and the geographic factors that form persistent and highly polluted areas and imply that more specific coping strategies need to be implemented in various areas toward successful particulate pollution prevention and control.

www.nature.com/scientificreports/ such as traffic fumes, industrial activities, soil dust, biomass, and coal combustion, and secondary sources of gaseous pollutants (such as SO 2 , NOx, and NH 3 ) formed by complex chemical reactions 10,[16][17][18] . Thus, a local PM 2.5 concentration change is the result of the combined action of natural and human factors 19,20 . Natural conditions, such as meteorology, topography, and vegetation, play important roles in the generation, accumulation, transfer, diffusion, and settlement of PM 2.5 and profoundly affect its local concentration 20,21 . With respect to the influence of socioeconomic factors related to anthropogenic activities on the distribution of PM 2.5 , studies suggest that energy-intensive economic growth and nonecological urbanization increase PM 2.5 concentrations, implying that economic development, urbanization, industrialization, land use, and energy use mixtures and efficiency can affect urban air quality 13,[22][23][24] . These studies provide many insights into urban particle pollution from the perspectives of artificial and natural conditions. The Yangtze River Economic Belt (YREB) is one of China's most crucial economic and ecological corridors that connects three national urban agglomerations of the Sichuan-Chongqing, the middle reaches of the Yangtze River, and the Yangtze River Delta (YRD). The YREB accounts for more than 40% of China's population and GDP, and its unique geographical advantages and vast economic hinterland make it the region with the greatest economic growth potential in the next 30 years 25 . However, urban air quality along the Yangtze has been deteriorating and has suffered from varying degrees of haze pollution due to continuous and intense industrial and human activities 26 . Recently, some areas in the middle-lower reaches of the Yangtze River have had more than 100 haze d year −1 , with some cities even exceeding 200 d year −127 . In the upper reaches of the Yangtze River, the ecological environment is relatively vulnerable as economic development is scaled up 28 . The alleviation of haze pollution involves the integrity of the ecological and environmental system and the quality of local people's lives 9 . In 2014, the development of the YREB was given national strategy status, leading to the fact that potential conflicts between economic growth and environmental conservation may be more prominent than ever before 29 . In 2018, President Xi emphasized the protection of the eco-environmental status of the YREB. Hence, the ecoenvironmental conservation of the Yangtze River is a priority for the government of China 25 .
Until now, studies on the environmental problems of the YREB have mainly focused on the macro-level, including environmental quality and risk assessment, a low-carbon economy, and a sustainable development strategy 9,28 . Studies on environmental problems at the micro-level focus more on water pollution, while studies on air pollution are still insufficient 27 . As the China National Environmental Monitoring Centre (CNEMC) did not include PM 2.5 in its monitoring index system until 2012, relatively few studies focused on PM 2.5 in the YREB prior to 2012 30 . The CNEMC has begun widespread real-time monitoring of PM 2.5 since 2013. Furthermore, the use of remote sensing data for long-term time series research was limited to local areas and specific cities, and studies on factors of influence were mostly in terms of the overall perspective 31 . Obvious spatial differences in the natural conditions and socioeconomic development of various regions exist, and hence, a global analysis cannot reveal the spatial heterogeneity of the effects of various factors 32 . On the other hand, the majority of research has not conducted spatial analysis from a geographical viewpoint. Although some research has considered the spatial dependence of PM 2.5 pollution, it has still neglected the spatial differentiation of the PM 2.5 level at the urban agglomeration scale and lacked analysis of the interactive mechanism between factors, which may have led to incomplete and biased conclusions 33 .
In this study, we attempt to fill the aforementioned knowledge gaps. We consider 130 cities within the YREB as the research area and employ geostatistical and spatial analytical methods to summarize the spatiotemporal evolutionary features of PM 2.5 . Furthermore, we adopt a geographically weighted regression model and geographical detectors to quantitatively analyze the spatial differentiation and interactive effects of socioeconomic and natural conditions on PM 2.5 during various periods. Thus, our study is significant in terms of meeting the demand for air quality improvements in the YREB and sheds light on implementing effective urban PM 2.5 pollution abatement policies.

Data and methods
Study area. The YREB spans the eastern, central, and western regions of China, including nine provinces (Zhejiang, Jiangsu, Jiangxi, Anhui, Hubei, Hunan, Sichuan, Yunnan, and Guizhou) and two municipalities (Shanghai and Chongqing) (Fig. 1). The YREB comprises an area of 2.05 × 10 6 km 2 , accounting for 21.3% of the country's land area. Moreover, its economy and population density are 6.2 and 4.5 times the national average, respectively, establishing its incomparable role in China's development strategies 8 . However, the YREB is an ecologically vulnerable zone, and hence, assessing PM 2.5 pollution in the region is of great strategic significance for national ecological security and regional sustainable development 28 . Data acquisition and processing. Our data sources have four components: (1) PM 2.5 concentration data from satellite retrievals. The annual concentrations of PM 2.5 were obtained from the global surface raster with a resolution of 0.01° × 0.01°, which was released by the Atmospheric Composition Analysis Group (ACAG) (https:// sites. wustl. edu/ acag/). This PM 2.5 remote sensing inversion dataset has the largest global coverage, the longest time span, and the highest accuracy, and it has been widely verified and effectively applied in China 34 . To verify the accuracy of the dataset in the YREB, we calculated the annual concentration of each station through real-time monitoring data of the YREB from 2015 to 2017 obtained from the CNEMC (http:// www. cnemc. cn) and correlated it with the PM 2.5 concentration from the remote sensing inversion data at the corresponding position. The correlation coefficient was 0.82 and significant, indicating a strong correlation and consistency between these datasets. (2) Data on natural parameters include wind, precipitation, vegetation, and topography 21,35 . The normalized differential vegetation index (NDVI) is the best indicator of vegetation coverage and growth status, and the original data were derived from the National Aeronautics and Space Administration (NASA) (https:// www. nasa. gov/), while other supporting data were obtained from the Resource and Environment Data Cloud (3) Socioeconomic data includes per capita GDP, population density, economic density, urbanization rate, industrial structure, and energy consumption 33,36 . Studies have shown that there is a significant linear correlation between nighttime light data from the Defense Meteorological Satellite Program/ Operational Line Scanner (DMSP/OLS) and energy consumption 37 . We converted the light intensity to gray pixel values and used the sum of all DMSP/OLS raster gray values in each region as an indicator of energy consumption in the region. The DMSP/OLS dataset was obtained from the National Oceanic and Atmospheric Administration (NOAA) (https:// www. ngdc. noaa. gov), and the remaining data were collected from the China Urban Statistics Yearbook published by the National Bureau of Statistics (http:// www. stats. gov. cn/). (4) Geographic information data includes the spatial vector map, which was derived from the National Catalog Service for Geographic Information (http:// www. webmap. cn).
As multicollinearity in the selected variables may cause information redundancy, we adopted the variance inflation factor (VIF) for multicollinearity diagnosis with a threshold of 5. We excluded the indicators of economic density and energy consumption because their VIFs were greater than 5. Thus, 8 explanatory variables were used in the model, including per capita GDP (pgdp), population density (popd), urbanization rate (urba), secondary industry share (indu), annual wind speed (wind), annual precipitation (prec), NDVI (ndvi), and topographic relief (topo). To reduce data heteroscedasticity, all the indicators were treated with Z-standardization.

Methods
Spatial autocorrelation analysis. Tobler's first law of geography states that everything is related to everything else, but nearby things are more related than distant things 15 . Therefore, we adopted the classical spatial autocorrelation method to quantitatively measure the spatial dependence of PM 2.5 in neighboring regions. Global Moran's I is written as 38 : where x i and x j are the observations of spatial units i and j, respectively; x is the mean of n locations; and W ij is a spatial weight matrix. I ∈ [− 1, 1], where I > 0 indicates positive correlation, I < 0 indicates negative correlation, and I = 0 indicates mutual independence.

Spatial variogram analysis.
The spatial variogram is a geostatistical method to describe the structure and randomness of regionalized variables and is often used to effectively measure the spatial structure characteristics and variation pattern of geographical variables 39 . The variogram is expressed as: is the sample size with separation distance λ.
There are three major parameters derived from the variogram model [39][40][41] . The nugget parameter (C 0 ) is a random spatial variance; the partial sill parameter (C) is a structural spatial variance; the sill parameter (C 0 + C) represents the total degree of spatial variation. The nugget effect (C 0 /(C 0 + C)) indicates whether regional or local factors are more important for PM 2.5 . If the nugget effect is less than 0.25, it indicates strong spatial correlation; if it is between 0.25 and 0.75, it shows moderate spatial correlation; and if it is greater than 0.75, it indicates weak spatial correlation 41 . The range parameter (A 0 ) represents the maximum spatial distance of the correlation.
The fractal dimension is another important parameter that characterizes the variogram, and its value is determined by the relationship between the variogram γ(λ) and distance λ: where D is the slope of the double logarithm linear regression equation; the higher the value is, the higher the heterogeneity caused by the spatial autocorrelation. The closer the value is to 2, the more balanced the spatial distribution.

Geographically weighted regression modeling (GWR). Geographically weighted regression (GWR)
is a spatial regression model based on the idea of local smoothness 42 . This technique constructs an independent equation for each unit in the study area and incorporates the spatial attributes of data into the regression model so that the relationship between variables can change with the change in spatial location, thus reflecting the spatial nonstationarity of parameters in different regions. Its formula is as follows 16 : where x ip is a dimensional interpretation variable matrix; (ω i , α i ) represents the longitude and latitude coordinates at the ith observation point; φ p (ω i , α i ) is the regression coefficient of the ith observation point; and ε i is a random error term.
Geographical detector technique. The geographical detector technique is a set of statistical methods to detect spatially stratified heterogeneity and reveal the driving forces behind it 43 . It can detect the possible causal relationship between variables by verifying the consistency of the spatial distribution of two variables 44 . The explanatory power of the factor is measured by the q value, and its expression is as follows: where L refers to the strata of variable Y (PM 2.5 ) or factor X; N h and σ 2 h are the number of units and variance of strata h, respectively; N and σ 2 are the total number of units and variance, respectively;q ∈ [0, 1] , and the greater the value is, the stronger the explanatory power of this factor is.
The purpose of interactive detection is to assess whether the factors X 1 and X 2 work together to increase or decrease the explanatory power on Y or whether the impact of these factors on Y is independent. The evaluation method first calculates the q value of the two factors X 1 and X 2 acting on Y, namely, q(X 1 ) and q(X 2 ), and then calculates the q value for their interaction, namely, q(X 1 ∩ X 2 ); finally, q(X 1 ) and q(X 2 ) are compared with q(X 1 ∩ X 2 ). The discriminant criteria can be divided into 5 categories (Table 1). Table 1. Types of interaction between two covariates. Min(q(X 1 ), q(X 2 )) is the minimum value between q(X 1 ) and q(X 2 ); Max(q(X 1 ), q(X 2 )) is the maximum value between q(X 1 ) and q(X 2 ); q(X 1 ) + q(X 2 ) is the sum of q(X 1 ) and q(X 2 ); q(X 1 ∩ X 2 ) is the interaction between q(X 1 ) and q(X 2 ).

Diagram
Criterion Interaction q(X 1 ∩ X 2 ) < Min(q(X 1 ), q(X 2 )) Nonlinear weakening Min(q(X 1 ), q(X 2 )) < q(X 1 ∩ X 2 ) < Max(q(X 1 ), q(X 2 )) Univariate nonlinear weakening q(X 1 ∩ X 2 ) > Max(q(X 1 ), q(X 2 )) Bivariate enhancement 93%. This improvement may be due to the effects of the national tenth 5-year plan on controlling the total emissions of major pollutants, adjusting the industrial structure, and establishing a monitoring, statistics, and assessment system for energy conservation and pollution emission reduction. It was further reinforced by the implementation of the Action Plan for Air Pollution Prevention and Control in 2013 (http:// www. mee. gov. cn). Figure 2a shows that the sliding interval of the global Moran's I value over the years was [0.825, 0.894], and all the values were significant at the 99% level. This indicates that the PM 2.5 distribution was not random but was a significant spatial agglomeration. We pursued the association of the evolution rules of spatial correlation with distance and found that the distance threshold to maintain spatial correlation was approximately 870 km (Fig. 2b). Within this spatial range, PM 2.5 had significant positive interaction effects, which increased with the shortening of distance. Figure 3 displays the spatial patterns and evolution of PM 2.5 in the YREB from 2000 to 2017. Its main features are as follows: (1) cities with an annual PM 2.5 level of less than 15 μg/m 3 were mainly concentrated in ethnic minority areas, where the environmental conditions were relatively good. However, air quality continuously deteriorated, albeit at low levels, which needs attention. (2) The PM 2.5 level was higher in the lower reaches than in the upper reaches and higher on the north bank than on the south bank, presenting a diagonal spatial distribution pattern with an obvious lowland plain directivity. (3) Urban economic activities and population density were closely related to PM 2.5 in three centers, namely, the Cheng-Yu area, the Wuhan metropolitan area, and the northern Anhui-Jiangsu region.

Spatiotemporal variation characteristics of PM 2.5 . The value of the variogram increased with increas-
ing separation distance, indicating that the spatial autocorrelation of PM 2.5 changed from strong to weak with increasing distance (Table 2). During 2000-2017, the variation range was 625-738 km, and it showed an overall   www.nature.com/scientificreports/ upward trend, implying that the spatial correlation of PM 2.5 was partly expanded in scope. In addition, the nugget effect indicated that regional-scale factors are more important for the distribution of PM 2.5 . In terms of the fractal dimension (Table 3), the isotropic dimension continuously decreased from 1.536 in 2000 to 1.453 in 2017, indicating that the spatial difference in PM 2.5 was continuously expanding. The northeastsouthwest direction had the greatest goodness of fit and the smallest fractal dimension and showed a downward trend. This result denoted that the spatial variation in PM 2.5 in this direction was continuously strengthened, making it the main direction in terms of spatial difference. The southeast-northwest fractal dimension was the largest, and its decisive coefficient continued to decrease, showing that the spatial difference in PM 2.5 in this direction continued to weaken and remained relatively evenly balanced. We conducted 3D-kriging interpolation, which further depicted the spatial distribution and evolution morphology of PM 2.5 (Fig. 4). It was evident that the spatial pattern steadily transitioned from a gradient differentiation to a relatively balanced structure that formed a trend indicating that the middle-lower reaches of the Yangtze River drove the whole basin PM 2.5 level to increase.

Analysis models of the factors of influence
GWR model results. The GWR model fitting results are shown in Table 4, in which the adjusted R 2 values are above 0.9, indicating good fitting performance.
The regression coefficients of socioeconomic factors, such as per capita GDP, population density, urbanization rate, and industrial structure, were mainly positive. Among them, population density had the largest impact on PM 2.5 and the most obvious spatial difference, followed by per capita GDP, while the coefficients of urbanization and industrial structure were relatively small. The regression coefficients of natural factors, such as wind, precipitation, vegetation, and topography, were distributed in positive and negative intervals, and the instability was striking in different years. Among them, the coefficient of the distribution interval for topographic relief was the longest, indicating that spatial heterogeneity was the largest.
Spatial heterogeneity of factors of influence. The coefficient of anthropogenic factors increased, signifying that extensive economic development and intensive human activities aggravated haze pollution. Existing studies have argued that economic growth strongly correlates with regional environmental pollution, but the relationship between GDP and PM 2.5 is significantly different in various regions 19 . Per capita GDP had positive impacts on PM 2.5 in the middle-upper reaches of the Yangtze River (Fig. 5a-c). This implied that PM 2.5 in economically underdeveloped areas was more sensitive to economic development and that the economic growth of these regions came at the cost of the environment. Some cities in the YRD showed negative correlation effects, www.nature.com/scientificreports/ which indicated that the development planning in the above areas was relatively good. With technological progress, industrial upgrading and economic development, these areas were essentially coordinated with the surroundings.
The coefficient of population density showed approximately the same spatial distribution at the three time nodes (Fig. 5d-f), all of which increased from the coastal areas to the inland areas, and among them, the east-west difference was obvious in 2017. This may be because of the increase in urban traffic flow and production, with a higher population density contributing to an increase in the local PM 2.5 level. The control of air pollution in the middle-upper reaches of the Yangtze Basin was still weak, thereby making the impact of population density more significant. Some studies have held that an increase in population density may have an agglomeration effect in promoting regional technical progress, thus facilitating a reduction in the local PM 2.5 level 14 . The technical advances brought by the agglomeration effect for the population size in the YREB were not significant. This may be related to the migration of people to large cities in recent years and the disordered nature of population mobility.
The coefficient of the urbanization rate was positive at the three time nodes. The proportion of positive values in 2007 was relatively large, and the positive values in 2017 marginally declined (Fig. 5g-i). During 2000-2007, areas with low urbanization rates in the Yangtze Basin were experiencing rapid urbanization, and the urban infrastructure industry developed rapidly 9 . A large quantity of building dust entered the atmosphere, aggravating urban PM 2.5 pollution. However, areas with a high urbanization rate, such as the YRD, tended to mature, and a stagnant infrastructure industry was conducive to the reduction in emissions of fine particles.
The industrial structure had a negative impact on PM 2.5 in the middle-lower Yangtze and Cheng-Yu areas, aggravating local air pollution (Fig. 5j-l). This is consistent with the findings of existing studies confirming that industrial activities were the main drivers of PM 2.5 in most areas 19 . The impact of the industrial structure in the Chengdu-Chongqing areas was strong, which may be because of heavy industries, such as the energy, chemical, and machinery industries, with relatively high direct energy consumption and pollutant emissions 25 . Optimization of the industrial structure significantly affects the local PM 2.5 level. The coefficient had a weak impact on the YRD because the local industrial structure was dominated by the service industry and was relatively stable; thus, there was limited scope for further optimization.
The coefficients of wind in the inland areas were mainly positive and decreased from the central region to the west (Fig. 5m-o). The negative impact was dominant in the eastern areas, and as the distance to the coast decreased, the negative impact increased. This may be related to the impacts of topography and monsoons 27 . Coastal areas were mostly alluvial plains with flat terrains, and they were affected by the local circulation caused by monsoon and temperature differences. Clean air from the ocean had important dilution effects on pollutants, and thus, the coefficient was mainly negative. In the Sichuan Basin, the impact of closed topography restricted the diffusion of airflow, and the transport of wind caused pollutants in the region to interact with each other; thus, the coefficient was mainly positive. Similar findings were made in the Fenwei Plain, where basin topography exists 30 .
The impact of precipitation on regional PM 2.5 presented a negative correlation at the three time nodes (Fig. 5p-r). A positive influence was mainly distributed in the Sichuan Basin and some areas of the Yunnan-Guizhou Plateau in western China, while the negative effect decreased from the coastal to inland areas. Concretely, the regions with a high regression coefficient were mainly located in the upper and middle reaches of the Yangtze Basin, while the eastern coastal areas with abundant rainfall had a small regression coefficient, indicating that abundant precipitation had a positive impact on PM 2.5 in most cities, and this effect was more distinct in areas with relatively deficient precipitation.
The impact of the NDVI on PM 2.5 had a negative correlation that was mainly distributed in the middle-lower Yangtze Plains (Fig. 5s-u). Research has shown that vegetation growth is correlated with climate and is affected by topography and human activities, resulting in a complex correlation between PM 2.5 and vegetation 21 . The high-value areas were mainly in Yunnan and Hunan Provinces, while the low-value areas were concentrated in the middle-lower Yangtze Plain. In 2017, the NDVI showed a negative impact, indicating that vegetation's www.nature.com/scientificreports/ inhibitory effect on PM 2.5 was distinctly enhanced. This may be related to the rapid restoration and development of vegetation in the Yangtze Basin, whose forest coverage rate rose from 24% in 2000 to 39% in 2017 9 . Topographic relief was negatively correlated with regional PM 2.5 in the Yangtze Basin, which was conducive to the improvement in air quality (Fig. 5v-x). Specifically, the high-value regions were located in Sichuan, Chongqing, and Zhejiang. Lower values were concentrated in the middle-lower reaches of the Yangtze River. Local topography affects the diffusion and dilution of atmospheric pollutants in an area by influencing meteorological conditions 20 . Table 5 shows the results of the interactions between factors. The impact of the interactions between factors was greater than that for individuals, and the interaction effects included nonlinear enhancement and bifactor enhancement. When wind speed, precipitation, and vegetation interacted with PM 2.5 in pairs, the nonlinear enhancement effect was generated at all three time nodes, and the explana-  www.nature.com/scientificreports/ tory power was notably varied in different periods. When the industrial structure interacted with the per capita GDP and urbanization level on PM 2.5 , a nonlinear enhancement effect was exerted at all three time nodes, and the explanatory power was continuously improved. Nonlinear enhancement means that the interactive impact of two factors is greater than the sum of the impacts when they act alone. The interactive types of pgdp ∩ popd, pgdp ∩ urba, wind ∩ topo, prec ∩ topo, and ndvi ∩ topo were dominated by nonlinear enhancement, although they varied at different times. The types of popd ∩ urba and popd ∩ indu exerted a bifactor enhancement effect, which was not as significant as that of the nonlinear enhancement.

Conclusions and policy implications
Conclusions. Haze pollution in Chinese cities has escalated to hazardous levels in recent years. This environmental problem has become a great challenge for public health and urban sustainable development. Our exploration of PM 2.5 in the YREB provides useful results for haze prediction, which is an important step toward protecting people from health damage caused by poor air quality. The results indicate that the annual PM 2.5 level of the YREB displayed an upward trend before 2007 and a fluctuating downward trend thereafter. PM 2.5 had significant spatial heterogeneity and convergence characteristics. There were clear spatiotemporal differences in the impact of various factors on the PM 2.5 pollution pattern. In the socioeconomic layer, population has the greatest impact, followed by the economy and industry, while urbanization was a relatively stable factor causing the rise in the PM 2.5 level. Among the natural factors, topography and vegetation mainly exerted a negative impact on PM 2.5 , while the potency and direction of others changed with spatiotemporal changes.
Policy implications. As the Yangtze River plays a vital role in China's eco-environmental systems, stricter measures should be implemented to meet the goal of sustainable development. First, special attention should be given to natural factors when distributing industries and residences; for instance, topography and wind, which notably affect PM 2.5 , should be considered, while urban air ducts and green belt designs must be optimized. Second, region-targeted policies should be considered based on spatial differentiation. Downstream areas should play a leading role in promoting pollution prevention and control technologies, while upstream areas should actively drive ecological protection. More emphasis should be placed on transregional linkage governance when formulating mitigation measures. Third, heavily and highly polluted industrial sectors must be closed or upgraded, as they are major sources of particulates, while sectors with environmental protection, new energy, and a low-carbon economy and technology should be highly encouraged, such as building hi-tech eco-industrial parks. Last, it seems inevitable that air pollution will be aggravated because of burgeoning urbanization, and therefore, reducing the impact of anthropogenic activity on the atmosphere through environmental policy and education will reduce PM 2.5 pollution.
Limitations and recommendation. Our study mainly focuses on the exploration of the heterogeneity and determinants of PM 2.5 at the macro-level. However, haze reduction ultimately needs to be implemented at micro-level enterprises, so investigations of micro-enterprises should be strengthened. In addition, PM 2.5 is closely related to other air pollutants, which leads to compound air pollution with multiple pollutants. Therefore, it is necessary to explore the mechanism of compound pollution in the future to better formulate control measures. www.nature.com/scientificreports/