A novel causality-centrality-based method for the analysis of the impacts of air pollutants on PM2.5 concentrations in China

In this paper, we analyzed the spatial and temporal causality and graph-based centrality relationship between air pollutants and PM2.5 concentrations in China from 2013 to 2017. NO2, SO2, CO and O3 were considered the main components of pollution that affected the health of people; thus, various joint regression models were built to reveal the causal direction from these individual pollutants to PM2.5 concentrations. In this causal centrality analysis, Beijing was the most important area in the Jing-Jin-Ji region because of its developed economy and large population. Pollutants in Beijing and peripheral cities were studied. The results showed that NO2 pollutants play a vital role in the PM2.5 concentrations in Beijing and its surrounding areas. An obvious causality direction and betweenness centrality were observed in the northern cities compared with others, demonstrating the fact that the more developed cities were most seriously polluted. Superior performance with causal centrality characteristics in the recognition of PM2.5 concentrations has been achieved.


Scientific Reports | (2021) 11:6960 | https://doi.org/10.1038/s41598-021-86304-0 | www.nature.com/scientificreports/
polycentricity on PM2.5 concentrations using spatial econometric models based on a three-year panel of data for urban cities in China and used the spatial centralization index and spatial concentration index together to quantify polycentricity. Zhou et al. 14 collected high-resolution PM2.5 data by mobile monitoring along different roads in Guangzhou, China, and explored the spatial-temporal heterogeneity of the relationship between the built environment and on-road PM2.5 during the morning and evening rush hours, calculating the betweenness centrality index to measure the pollution impact. Despite these studies, no research has extended the analysis of meteorology or air pollutants with topological centrality, especially with causality-based adjacency matrices. The causal direction is an important factor in differentiating how the pollutants in the air act on one another.
Recognition of air quality by model training is an emerging trend in the domain of atmospheric artificial intelligence. Deep learning can be used to achieve accurate prediction with specialized knowledge. Wang et al. 15 collected eight meteorological factors from the 100 most developed cities in China and trained an ensemble boosted-tree model with 90.2% accuracy. Huang et al. 16 developed a deep neural network model that integrated the convolutional neural network (CNN) and long short-term memory (LSTM) architectures and collected historical data such as cumulated hours of rain, cumulated wind speed and PM2.5 concentrations. The feasibility and practicality of the trained model were verified to improve the ability to estimate air pollution, especially in smart cities. In these studies, meteorological or pollutant factors were passed directly through machine learning models, and the intrinsic relationships among these factors were ignored during training. The spatial-temporal characteristics need to be studied more widely and over a larger extent.
In this paper, we studied the air pollutants NO2, SO2, carbon monoxide (CO) and O3 using time series from a large body of air monitoring data in the Jing-Jin-Ji region of China and focused on the causal influence of the accumulative process of each pollution component on PM2.5. By establishing four joint regression models, we quantitatively analyzed the degree to which each air pollutant contributes to PM2.5, to better clarify the formation of haze, and trained a multilayer perceptron model that achieves improved performance compared with other methods.
Figure 1 illustrates the new causality (NC) impacts of the four pollutants on PM2.5 concentrations. For the inner-city impact, as shown in Fig. 1A, NO2 has an obvious causal effect on the PM2.5 concentrations in Beijing and Tianjin, followed by those in Chengde and Tangshan. SO2 also has a significant causal effect on the PM2.5 concentrations in Langfang and Cangzhou. In Fig. 1B, the causality from pollutants in the peripheral cities around Beijing to the Beijing PM2.5 concentrations is considered; NO2 in Zhangjiakou and Chengde has the greatest influence, followed by CO in Langfang. SO2 in all the cities bordering Beijing, such as Langfang and Zhangjiakou, has a certain impact on the PM2.5 concentrations in Beijing. Neither O3 from the inner city itself nor that from the peripheral cities has a causal impact on the PM2.5 concentrations, as shown in green. Detailed information on Fig. 1 is listed in Table 1 and Table 2. The column order refers to lagging days in the NC model.

Results
The causality-centrality results are shown in Fig. 2. The upper row shows the betweenness centrality under the four pollutants in the Jing-Jin-Ji region, and the bottom row shows the clustering coefficient mapping results. A large betweenness centrality is present in the northern cities, especially those adjoining Beijing, such as Chengde (CO and O3), Langfang (SO2) and Zhangjiakou (NO2). The clustering coefficients in Fig. 2B are less discriminative than the betweenness centrality. Although the coefficient values are close to each other, it can still be inferred that pollutants around the Beijing area play an important role in the PM2.5 concentrations in the Jing-Jin-Ji region. Figure 3 shows the causal direction among the Jing-Jin-Ji cities under the four pollutants. In Fig. 3A, the causal impacts for CO among the cities are modeled by NC. The causalities in Shijiazhuang, Langfang, Baoding [...] Fig. 3D, SO2 in Shijiazhuang, Tianjin, Hengshui, Cangzhou, Zhangjiakou, Baoding and Handan has a direct causal impact on that in other cities, and Beijing becomes an input-oriented SO2-polluted city. Table 3 lists the recognition results obtained with the causal centrality measures in the multilayer perceptron (MLP) model. Weather was categorized into 'Fine', 'Bad', and 'Polluted' according to the air quality index, a three-class confusion matrix was constructed, and the corresponding evaluation indicators, including accuracy, precision, sensitivity, and F1 score, were computed with different training parameters. The model was tested with [50, 100, 200] epochs. To accelerate the training process, the batch size was enlarged to 32 when the epoch number was 200.
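To illustrate how the evaluation indicators named above follow from a three-class confusion matrix, the sketch below computes accuracy and per-class precision, sensitivity and F1 score. It is only a minimal example: the counts are made up and are not results from Table 3, and the function name `classification_metrics` and the convention that rows are true classes and columns are predicted classes are assumptions.

```python
def classification_metrics(cm):
    """Accuracy plus per-class precision, sensitivity (recall) and F1.

    cm[i][j] counts samples of true class i predicted as class j
    (assumed convention, not stated in the paper).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        predicted_i = sum(cm[r][i] for r in range(n))  # column sum
        actual_i = sum(cm[i])                          # row sum
        precision = cm[i][i] / predicted_i if predicted_i else 0.0
        sensitivity = cm[i][i] / actual_i if actual_i else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        per_class.append(
            {"precision": precision, "sensitivity": sensitivity, "f1": f1}
        )
    return accuracy, per_class


# Hypothetical counts for 'Fine', 'Bad', 'Polluted' (illustration only).
labels = ["Fine", "Bad", "Polluted"]
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
accuracy, per_class = classification_metrics(confusion)
print(f"accuracy = {accuracy:.3f}")
for label, scores in zip(labels, per_class):
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Whether the per-class values are then averaged (macro or weighted) to obtain the single figures reported per model is not stated in the text, so the sketch leaves them per class.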

Discussion
In this study, the causal centrality characteristics are analyzed for the relationship between the air pollutants and PM2.5 concentrations of the Jing-Jin-Ji region in China. The NC-based adjacency matrices, weighted with causal direction information, reveal the underlying role of air pollutants in the formation of PM2.5. Different

Superior performance with causal centrality characteristics in the recognition of PM2.5 concentrations.
Previous studies [26][27][28] on air quality recognition have relied mainly on meteorological or pollutant characteristics. The centrality measured with the NC method shows superior performance in distinguishing different degrees of air pollution. The method proposed in this study can be considered efficient and practical for training the deep learning model. As shown in Table 3, the number of epochs tested ranged from 50 to 200. The best testing results were generally obtained with the parameter set (epoch = 150, batch = 16). When the epoch number reached 200, nearly all critical classification indicators declined, which suggests that the model was overfitting. For all the models tested in Table 3, NO2 shows the most effective classification capability, which is consistent with the results above showing that it has the greatest impact on the PM2.5 concentrations in Beijing and its surrounding areas.

Limitations
There are some limitations in this study. First, only air pollutants are considered. However, air quality is affected by many factors other than air pollutants and meteorological conditions, and these factors should also be included in the joint regression models. Second, only data from a limited area of China are collected and analyzed. Air pollution is a complex regional phenomenon in which areas influence one another, so a much larger spatial scale should be covered in the analysis of PM2.5 formation.

Materials and method
Materials. Data
New causality. New causality (NC) theory is derived from Granger causality (GC) theory, which was proposed by Granger. GC was first applied in economics and has more recently been widely used in neuroscience, global climate change and other scientific domains [29][30][31]. A brief introduction is given here. For a set of time series, GC characterizes the causal relationship between variables based on their past values. In the linear regression form, two time series are assumed to be jointly stationary. Their autoregressive representations (Eq. 1) and joint representations (Eq. 2) are described below.
$$X_{1,t} = \sum_{j=1}^{m} a_{1,j} X_{1,t-j} + \epsilon_{1,t}, \qquad X_{2,t} = \sum_{j=1}^{m} a_{2,j} X_{2,t-j} + \epsilon_{2,t} \tag{1}$$
$$X_{1,t} = \sum_{j=1}^{m} a_{11,j} X_{1,t-j} + \sum_{j=1}^{m} a_{12,j} X_{2,t-j} + \eta_{1,t}, \qquad X_{2,t} = \sum_{j=1}^{m} a_{21,j} X_{1,t-j} + \sum_{j=1}^{m} a_{22,j} X_{2,t-j} + \eta_{2,t} \tag{2}$$
where $j$ is an integer ranging from 1 to the lag order $m$ of the time series $X$, the $a$ terms are the coefficients of $X$, and $t$ represents time. The noise terms $\epsilon_i$ and $\eta_i$ ($i = 1, 2$) are uncorrelated over time and have zero means. The covariance between $\eta_1$ and $\eta_2$ is defined by $\sigma_{\eta_1 \eta_2} = \mathrm{cov}(\eta_1, \eta_2)$. If the past values of variable $X_2$ make the estimation of $X_1$ more accurate, the noise variance $\sigma^2_{\eta_1}$ should be less than $\sigma^2_{\epsilon_1}$. In this case, $X_2$ is said to have a causal influence on $X_1$. However, if $\sigma^2_{\eta_1} = \sigma^2_{\epsilon_1}$, $X_2$ has no causal impact on $X_1$. The GC value from $X_2$ to $X_1$ is therefore defined in Eq. (3).
$$F_{X_2 \to X_1} = \ln \frac{\sigma^2_{\epsilon_1}}{\sigma^2_{\eta_1}} \tag{3}$$
There is no causal influence from $X_2$ to $X_1$ when $F_{X_2 \to X_1} = 0$, and if $F_{X_2 \to X_1} > 0$, $X_2$ is said to exhibit GC on $X_1$. For long-term empirical research, the vector of past values of $X_1$ or $X_2$ becomes too large to build a regression model. A general approach for determining the lag order is the Akaike information criterion (AIC). Many algorithms can be adopted to estimate the coefficients in the joint representations. In this paper, the least squares method is used to solve the equations.
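The GC computation described above can be sketched as follows: fit the autoregressive model (Eq. 1) and the joint model (Eq. 2) by least squares, then take the log ratio of the two residual variances. This is a minimal illustration in plain Python, not the authors' code; the helper names, the normal-equation solver and the synthetic series (in which the second series drives the first) are all assumptions.

```python
import math
import random


def least_squares(rows, targets):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(rows[0])
    # Augmented matrix [R^T R | R^T y].
    aug = [
        [sum(r[a] * r[b] for r in rows) for b in range(p)]
        + [sum(r[a] * y for r, y in zip(rows, targets))]
        for a in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[a][p] / aug[a][a] for a in range(p)]


def residual_variance(rows, targets):
    """Mean squared residual of the least-squares fit."""
    coef = least_squares(rows, targets)
    residuals = [
        y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, targets)
    ]
    return sum(e * e for e in residuals) / len(residuals)


def granger_causality(target, source, m):
    """GC value from `source` to `target` with lag order m."""
    times = range(m, len(target))
    own = [[target[t - j] for j in range(1, m + 1)] for t in times]
    joint = [
        row + [source[t - j] for j in range(1, m + 1)]
        for row, t in zip(own, times)
    ]
    y = target[m:]
    return math.log(residual_variance(own, y) / residual_variance(joint, y))


# Synthetic example (assumed data): x2 drives x1 with a one-step lag.
random.seed(0)
n = 500
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.0]
for t in range(1, n):
    x1.append(0.5 * x1[t - 1] + 0.8 * x2[t - 1] + random.gauss(0, 0.1))

f_21 = granger_causality(x1, x2, m=1)  # influence of x2 on x1
f_12 = granger_causality(x2, x1, m=1)  # influence of x1 on x2
print(f"F(X2->X1) = {f_21:.3f}, F(X1->X2) = {f_12:.3f}")
```

Because the joint model nests the autoregressive one, the in-sample GC value cannot fall below zero apart from rounding error. The lag order is fixed at 1 here; selecting it with the AIC is left out of the sketch.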
However, the value of Granger causality has been suggested to be inaccurate in some cases. It considers only the noise terms and overlooks the influence of the other terms in the multivariable regression model. In 2011, Hu et al. 32 pointed out the limitations and shortcomings of GC and provided many examples in which GC fails to reflect the true causal relationship between variables. The NC method was proposed to avoid these limitations and has been successfully applied to reveal the causal relationship between time series. In practice, the NC direction has proved effective in explaining phenomena observed in nature and human activities, such as the processing of EEG signals, the increase in global temperature caused by the greenhouse effect, and the fluctuation of the stock market. In Eq. (2), the past values $X_{1,t-j}$ and $X_{2,t-j}$ account for a large portion of the three contributors to $X_{1,t}$ or $X_{2,t}$. Based on this, a more appropriate form of causality for multivariate interactions is defined in Eq. (4).
Here, $i$ and $k$ are any two distinct integers. $D$ represents the causal direction from variable $X_i$ to $X_k$. $m$ is the lag order in $X_i$ and $X_k$. $N$ is the total length of the observed time series. $n$ is the number of variables. $h$ ranges from 1 to $n$, $t$ ranges from $m$ to $N$, and $j$ ranges from 1 to $m$. $\eta_{k,t}$ is the noise term for $X_k$ at time point $t$. In this paper, the causal relationship between pollutants and PM2.5 concentrations is tested, and the following model (Eq. 5) is built to describe the influence of each component contributing to haze, which appears frequently in the Jing-Jin-Ji region. Each of the four pollutants is represented by Pollutant.
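The idea behind the NC value can be sketched in a few lines. The example assumes the ratio form used by Hu et al.: the summed squared contribution of the lagged terms of $X_i$ to $X_k$, divided by the summed squared contributions of all $n$ variables plus the summed squared noise term. That form is an assumption made for illustration, as are the function name `new_causality`, the toy series and the hand-picked coefficients (which would normally come from the least-squares fit).

```python
def new_causality(series, coeffs, k, i):
    """NC value from variable i to variable k (0-based indices).

    series[h][t]   : value of variable h at time t
    coeffs[h][j-1] : coefficient of variable h at lag j in the joint
                     regression for variable k, assumed already fitted
    """
    m = len(coeffs[0])        # lag order
    n_vars = len(series)
    energy = [0.0] * n_vars   # summed squared contribution per variable
    noise = 0.0               # summed squared residual for variable k
    for t in range(m, len(series[k])):
        parts = [
            sum(coeffs[h][j - 1] * series[h][t - j] for j in range(1, m + 1))
            for h in range(n_vars)
        ]
        for h in range(n_vars):
            energy[h] += parts[h] ** 2
        noise += (series[k][t] - sum(parts)) ** 2
    return energy[i] / (sum(energy) + noise)


# Toy series and coefficients chosen by hand (illustration only).
x1 = [0.0, 1.0, 1.0, 2.0]
x2 = [1.0, 0.0, 2.0, 0.0]
coeffs_for_x1 = [[0.5], [0.5]]  # lag-1 weights of x1 and x2 in the x1 model
nc_21 = new_causality([x1, x2], coeffs_for_x1, k=0, i=1)
print(nc_21)
```

With these numbers the lagged terms of the second series contribute 1.25, the first series contributes 0.5 and the noise contributes 0.75, so the ratio is 1.25 / 2.5 = 0.5. Unlike the GC value, this quantity lies between 0 and 1 by construction.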
Graph-based centrality analysis. Graph-based centrality analysis is a widely used method for analyzing topological relationships among variables. In this study, each city in the Jing-Jin-Ji region is treated as a graph node, the NC value between any two cities is regarded as the weighted edge, and an 11 × 11 square adjacency matrix is generated. Topological centrality measures, including the betweenness and clustering coefficient, are computed from this matrix. Unlike a correlation coefficient-based matrix, a causality-based matrix captures the causal direction between two factors. Thus, we build four pollutant models, which correspond to four NC adjacency matrices, to analyze the causal importance of pollutants for PM2.5 concentrations.
The betweenness centrality is given in Eq. (6), and the clustering coefficient is defined in Eq. (7), where $\rho_{hj}$ is the number of shortest paths between cities $h$ and $j$, and $\rho_{hj}^{(i)}$ is the number of shortest paths between cities $h$ and $j$ that pass through city $i$. $N$ is the city set in the Jing-Jin-Ji region, and $n$ is the number of cities in $N$. $a_{ij}$ is defined as the connection weight between cities $i$ and $j$. Betweenness centrality measures the number of shortest paths that pass through a given city in a communication graph. We use this measure to characterize the importance of each city in the process of pollutant spread. The clustering coefficient can be used to measure the degree of topological clustering of pollutants around cities.
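The two measures can be illustrated on a small directed graph. The sketch below makes several simplifying assumptions that the text does not confirm: the NC matrix is thresholded into a binary directed graph (the study uses weighted edges), entry `nc[i][j]` is read as the causality from city i to city j, betweenness is normalized by (n - 1)(n - 2), and the clustering coefficient is computed on the symmetrized binary graph. The 4 × 4 matrix and the threshold are made up.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: hop distance and number of shortest paths per node."""
    n = len(adj)
    dist = [None] * n
    count = [0] * n
    dist[source], count[source] = 0, 1
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    count[v] += count[u]
    return dist, count


def betweenness(adj):
    """Share of shortest h -> j paths through each node i, normalized."""
    n = len(adj)
    info = [shortest_path_counts(adj, s) for s in range(n)]
    scores = [0.0] * n
    for h in range(n):
        dist_h, count_h = info[h]
        for j in range(n):
            if j == h or dist_h[j] is None:
                continue
            for i in range(n):
                if i in (h, j) or dist_h[i] is None:
                    continue
                dist_i, count_i = info[i]
                on_path = (
                    dist_i[j] is not None
                    and dist_h[i] + dist_i[j] == dist_h[j]
                )
                if on_path:
                    scores[i] += count_h[i] * count_i[j] / count_h[j]
    return [s / ((n - 1) * (n - 2)) for s in scores]


def clustering(adj):
    """Local clustering coefficient on the symmetrized binary graph."""
    n = len(adj)
    und = [
        [i != j and bool(adj[i][j] or adj[j][i]) for j in range(n)]
        for i in range(n)
    ]
    result = []
    for i in range(n):
        nbrs = [j for j in range(n) if und[i][j]]
        k = len(nbrs)
        if k < 2:
            result.append(0.0)
            continue
        links = sum(
            1 for x, a in enumerate(nbrs) for b in nbrs[x + 1:] if und[a][b]
        )
        result.append(2 * links / (k * (k - 1)))
    return result


# Made-up NC matrix for four hypothetical cities (illustration only).
nc = [
    [0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.2],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
threshold = 0.1  # assumed cut-off for keeping an edge
adj = [[w > threshold for w in row] for row in nc]
print("betweenness:", betweenness(adj))
print("clustering: ", clustering(adj))
```

In this toy graph only the second city lies on shortest paths between other cities, so it is the only node with nonzero betweenness. A weighted variant would need a rule for turning NC values into path lengths, which is a separate design choice.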

Model training.
To verify the effectiveness of the causality-centrality-based method proposed in this study, we use the calculated causality-centrality measures in an MLP to determine whether these properties bring superior classification results to PM2.5 concentration prediction. The MLP is a deep learning model used for classification. It mainly consists of three parts: the input layer (independent variables), the hidden layer (interconnected neural network units) and the output layer (dependent variable). The purpose of the MLP is to obtain a prediction model with strong generalization ability by training on labeled input data. An MLP model with a 1024 × 1024 hidden layer is trained with these causality and centrality modalities. Instead of batch normalization, the layer normalization strategy is adopted for standardization with a range of [0, 1]. Principal component analysis is used for dimension reduction, and L1-based embedded feature selection is implemented to obtain sparse features and avoid overfitting. Equation (8) shows the L1 penalty term added to Eq. (5).