A new method for evaluating air quality using an ideal grey close function cluster correlation analysis method

To scientifically and reasonably evaluate air quality with a large amount of monitored data, this paper proposes a new evaluation method called ideal grey close function cluster correlation analysis (IGCFCCA). Taking the air quality in Ningxia Province, China, as an example, according to China’s air quality standard, SO2, NO2, PM10, PM2.5 and O3 are selected as evaluation indexes to perform the evaluation. The results show that the air quality in this region in 2018 can be divided into three classifications, among which the relatively poor air quality in March, April and May is the first classification, the better air quality in August and September is the third classification, and the air quality in other months falls under the second classification. Correlation analysis is used to qualitatively determine that these three classifications correspond to first-level air quality in China’s air quality standard, and the correlation degree, which is the distance between the three classifications and the first-level air quality, is quantitatively determined. Specifically, the correlation degrees of the first-classification, second-classification and third-classification of air quality are 0.674, 0.697 and 0.71, respectively. The research results indicate potential directions and objectives for air quality management to achieve scientific management.

www.nature.com/scientificreports/

At present, there are many methods for comprehensively evaluating atmospheric environmental quality, including the air pollution index (API) method, the ambient air quality index (AQI) method, the single factor index method, Green's comprehensive air pollution index method, the analytic hierarchy process (AHP), artificial neural network models, and the fuzzy comprehensive evaluation method [8]. Because these methods rest on different evaluation principles, each has its own advantages and disadvantages. The API and AQI methods are simple, intuitive and convenient to use but are only applicable to evaluating short-term air quality in cities [9]. The single factor index method is clear and easy to implement, but it cannot consider the air quality status as a whole, and its evaluation results are one-dimensional [9]. Green's comprehensive air pollution index method is easy to understand and implement, but it is only applicable to areas where coal pollution is the main pollution type [9]. The AHP is simple, practical and systematic, but its quantitative results are limited; additionally, when there are many indicators, the statistics become complex and the weights are difficult to determine [9]. The artificial neural network evaluation method is fast, self-adaptive and fault tolerant, but when the data are poorly correlated, the evaluation results tend to be homogenized [10-13]. IGCFCCA is a kind of fuzzy comprehensive evaluation method based on fuzzy mathematics, the fuzzy principle and the grey close function. The method can handle the common problem of incomplete data and mainly addresses analysis, model building and forecasting under uncertain and incomplete information. It needs only a small amount of data and can achieve good prediction results.
In this paper, the IGCFCCA method is used to evaluate the air quality in Ningxia Province. The method can not only scientifically classify a large amount of data but also calculate the correlation degree between each classification and the relevant standard. This approach can provide an important basis for comprehensive environmental management. Moreover, this new method provides a scientific reference and an important basis for the establishment and optimization of other industry standards in the future.

Basic principle and methods
A sample, which comes from the monitored data reports of some environmental management departments, is first classified by ideal grey close function cluster analysis. Then, the level of the sample is determined by grey correlation analysis, and comprehensive evaluation conclusions are established according to the correlation degree between the classification of the sample and the levels specified in GB3095-2012.
The classification of the sample to be evaluated. Establishing the evaluation index sequence matrix for the selected sample. Let $S$ be a sequence of clustering objects, i.e., $S = \{s_1, s_2, \ldots, s_m\}$; let $X$ be a sequence of air-influencing variables, i.e., $X = \{x_1, x_2, \ldots, x_n\}$; and let $x_{ik}$ be the original monitoring data for object $s_i$ ($i = 1, 2, \ldots, m$) and index $x_k$ ($k = 1, 2, \ldots, n$). Here, $m$ is the number of objects considered in clustering, and $n$ is the number of influencing indexes, which are the pollutants mentioned above. Accordingly, the following matrix can be established (Eq. 1).
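For orientation, the matrix described in this paragraph can be written out from the definitions of $s_i$, $x_k$ and $x_{ik}$ alone. The block below is a reconstruction from those definitions and is not copied from the published Eq. 1; the symbol $X$ for the whole matrix is an assumption.

```latex
% Reconstructed from the definitions in the text (rows: objects s_i, columns: indexes x_k).
X = \left( x_{ik} \right)_{m \times n} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{pmatrix}
```

In the case study, each row would correspond to one month of 2018 and each column to one of the five pollutants.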
Establishing the matrix of ideal-value grey close function clusters. Let $X_0 = \{x_{01}, x_{02}, \ldots, x_{0n}\}$ be the ideal-value sequence corresponding to each influencing index. The principle for determining the ideal value is as follows (Eqs. 2, 3, 4).
The first situation: the larger the influencing index ($x_k$) is, the better the air quality is; in this case, the ideal value $x_{0k}$ is given by Eq. 2. The second situation: the smaller the influencing index ($x_k$) is, the better the air quality is; in this case, the ideal value $x_{0k}$ is given by Eq. 3. The third situation: the air quality is best when the influencing index ($x_k$) takes a moderate value; in this case, the ideal value $x_{0k}$ is given by Eq. 4.
According to the ideal value $x_{0k}$ (Eq. 2, 3 or 4) and the original monitored data ($x_{ik}$), the grey close function value $y_{ik}$ is calculated by using Eq. 5.
where $x_{ik}$ is the original monitored data and $x_{0k}$ is the ideal value corresponding to the $k$-th influencing index. Moreover, the function value $y_{ik}$ is dimensionless, and $y_{ik} \in [0, 1]$. $y_{ik}$ denotes the correlation degree of $s_i$ and $s_0$ for the $k$-th index. Specifically, the larger $y_{ik}$ is, the closer $s_i$ is to the ideal value $s_0$, and the smaller $y_{ik}$ is, the farther $s_i$ is from $s_0$.
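The properties stated above (dimensionless, bounded in [0, 1], equal to 1 at the ideal value) can be illustrated with a short sketch. The exact form of Eq. 5 is not reproduced in this text, so the ratio used in `grey_close` is an assumption chosen only because it satisfies those properties for positive data; the column mean used for the "moderate" case is likewise an assumption, and the demo concentrations are hypothetical, not the values of Table 1.

```python
import numpy as np

def ideal_values(X, kind="min"):
    """Ideal value x_0k for every index (column) of the m x n data matrix X.

    kind="max": larger is better; kind="min": smaller is better;
    kind="mid": a moderate value is best (ASSUMPTION: the column mean is used).
    """
    X = np.asarray(X, dtype=float)
    if kind == "max":
        return X.max(axis=0)
    if kind == "min":
        return X.min(axis=0)
    if kind == "mid":
        return X.mean(axis=0)
    raise ValueError("kind must be 'max', 'min' or 'mid'")

def grey_close(X, x0):
    """ASSUMED grey close function: min(x_ik, x_0k) / max(x_ik, x_0k).

    For strictly positive data this is dimensionless, lies in (0, 1],
    and equals 1 exactly when x_ik equals the ideal value x_0k.
    """
    X = np.asarray(X, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    return np.minimum(X, x0) / np.maximum(X, x0)

# Hypothetical concentrations for three objects and three indexes.
X_demo = np.array([[12.0, 20.0, 80.0],
                   [9.0, 17.0, 56.0],
                   [18.0, 34.0, 112.0]])
x0_demo = ideal_values(X_demo, kind="min")   # smaller concentration is better
Y_demo = grey_close(X_demo, x0_demo)
print(Y_demo)
```

With this assumed form, the second object coincides with the ideal sequence and receives a row of ones, while the third object, whose concentrations are twice the ideal values, receives 0.5 for every index.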
Thus, the following grey close matrix $Y$ can be established (Eq. 6). In this case, $Y$ is the matrix of grey close function values. Moreover, $(y_{01}, y_{02}, \ldots, y_{0n}) = (1, 1, \ldots, 1)_{1 \times n}$ is the ideal sequence; the larger $y_{ik}$ is, the better $s_i$ is, and the largest possible value of $y_{ik}$ is 1.
The classification of the sample to be evaluated. Because the influence of each influencing index is different, the weight of each influencing index needs to be considered. Let $P_i$ be the comprehensive analysis value of $s_i$. $P_i$ can be expressed as follows (Eq. 7), where $W$ is the weight of each influencing index; since the number of indexes is $n$, there are also $n$ weights ($W_1, W_2, \ldots, W_n$). Correspondingly, the following equation can be established (Eq. 8).
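A minimal sketch of the weighting step follows, assuming that Eq. 7 is the usual weighted sum $P_i = \sum_k W_k y_{ik}$ with weights that sum to 1; the published form of Eqs. 7 and 8 is not reproduced in this text, so both points are assumptions. The weights are the five values reported in the case study later in the paper, and the grey close matrix is hypothetical.

```python
import numpy as np

def comprehensive_values(Y, w):
    """Weighted sum of grey close values per object (ASSUMED form of Eq. 7)."""
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    if Y.shape[1] != w.size:
        raise ValueError("need exactly one weight per influencing index")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    return Y @ w

# Weights reported in the case study for SO2, NO2, PM10, PM2.5 and O3.
w_paper = np.array([0.06, 0.09, 0.34, 0.12, 0.39])

# Hypothetical grey close matrix: an ideal object, a uniformly half-close
# object, and an object that is ideal only for the first index.
Y_demo = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
                   [0.5, 0.5, 0.5, 0.5, 0.5],
                   [1.0, 0.0, 0.0, 0.0, 0.0]])
P_demo = comprehensive_values(Y_demo, w_paper)
print(P_demo)
```

Under this assumption an object that matches the ideal sequence on every index has a comprehensive value of 1, and an object that is ideal for a single index contributes only that index's weight.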
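Based on the actual comprehensive analysis values $(P_1, P_2, \ldots, P_m)^T$, the following equation (Eq. 9) can be used to calculate the grey close value $P_{ij}$ of $P_i$ in relation to $P_j$ ($i, j = 1, 2, \ldots, m$).
Then, the grey similar matrix $P = (P_{ij})_{m \times m}$ is obtained (Eq. 10).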
If $P$ (Eq. 10) satisfies the following three conditions: (1) reflexivity, where $P_{ij} = 1$ ($i = j$); (2) symmetry, where $P_{ij} = P_{ji}$; and (3) normativity, where $P_{ij} \in [0, 1]$, we can select an appropriate threshold value $\lambda$, the similarity coefficient [4,5], from the $P$ matrix, cut the branches with weight values less than $\lambda$, and establish the classifications $S'_t$ ($t = 1, 2, \ldots, c$) when the $\lambda$ level meets the relevant requirement. $S'_t$ represents each classification of the air in a given region. The following equations (Eqs. 11, 12) can be established.
where $S'_t$ is the $t$-th classification, $S'_{tk}$ is the $k$-th index of the $t$-th classification, $t = 1, 2, \ldots, c$ indexes the classifications, and $k = 1, 2, \ldots, n$ indexes the influencing indexes.
$S'_{tk}$ can be expressed in the following matrix form (Eq. 13).
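The clustering step described above can be sketched as follows. Three things are assumptions, because Eqs. 9 to 13 are not reproduced in this text: the similarity $P_{ij} = \min(P_i, P_j) / \max(P_i, P_j)$, which is merely one form that satisfies reflexivity, symmetry and normativity for positive values; the reading of "cutting branches below $\lambda$" as taking connected components of the graph whose edges have similarity at least $\lambda$; and the use of member averages as the class profile $S'_{tk}$. All demo numbers are hypothetical.

```python
import numpy as np

def grey_similarity(P):
    """ASSUMED grey close value: P_ij = min(P_i, P_j) / max(P_i, P_j)."""
    P = np.asarray(P, dtype=float)
    return np.minimum.outer(P, P) / np.maximum.outer(P, P)

def lambda_cut_clusters(sim, lam):
    """Label objects by connected components of the graph sim[i, j] >= lam."""
    m = sim.shape[0]
    labels = [-1] * m
    current = 0
    for start in range(m):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first search over kept edges
            i = stack.pop()
            for j in range(m):
                if labels[j] == -1 and sim[i, j] >= lam:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def class_profiles(X, labels):
    """ASSUMED class profile S'_tk: mean of each index over the class members."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return np.vstack([X[labels == t].mean(axis=0)
                      for t in range(labels.max() + 1)])

# Hypothetical comprehensive values and raw data for five objects.
P_demo = np.array([0.60, 0.62, 0.80, 0.81, 0.95])
X_demo = np.array([[10.0, 20.0], [12.0, 22.0], [30.0, 40.0],
                   [32.0, 42.0], [50.0, 60.0]])
sim_demo = grey_similarity(P_demo)
labels_demo = lambda_cut_clusters(sim_demo, lam=0.95)
profiles_demo = class_profiles(X_demo, labels_demo)
print(labels_demo)
print(profiles_demo)
```

Raising `lam` splits the objects into more, tighter classes, and lowering it merges them, which is why the threshold has to be chosen so that the resulting number of classes meets the requirement of the analysis.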
Correlation degree analysis of the sample to be evaluated. Let $S'_t$ be the sample to be evaluated, and let $X = (x_1, x_2, \ldots, x_n)$ be the influencing index set mentioned above, which serves as the evaluation index set for $S'_t$. Let $S'_0$ be the stated air quality classification in GB3095-2012. Then, the equation for the correlation coefficient is as follows (Eq. 14) [14].
where $\zeta_t(k)$ is the correlation coefficient and $\varepsilon$ is the resolution coefficient, with a general value of 0.5 [4,5].
Moreover, the correlation degree ($R_t$) is calculated by using Eq. 15. The maximum value of $R_t$ indicates the air quality level with which the sample to be evaluated has the highest correlation degree. Therefore, the sample is classified correspondingly.
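As an illustration of this step, the sketch below uses the standard grey relational formulas: the coefficient $\zeta_t(k) = (\Delta_{\min} + \varepsilon \Delta_{\max}) / (\Delta_t(k) + \varepsilon \Delta_{\max})$ with $\Delta_t(k) = |S'_{0k} - S'_{tk}|$, and the degree $R_t$ as the mean of $\zeta_t(k)$ over the indexes. That Eqs. 14 and 15 take exactly this form is an assumption, since the equations are not reproduced in this text; the function name and the demo sequences are hypothetical, and the inputs are assumed to be initialized already.

```python
import numpy as np

def grey_relational(S, s0, eps=0.5):
    """Grey relational coefficients and degrees of each row of S against s0.

    S   : c x n array, one initialized sequence per classification.
    s0  : length-n reference sequence (one level of the standard).
    eps : resolution coefficient (0.5 is the general value cited in the text).
    """
    S = np.asarray(S, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    delta = np.abs(S - s0)                  # absolute differences per index
    d_min, d_max = delta.min(), delta.max() # taken over all rows and indexes
    if d_max == 0:                          # every sequence equals the reference
        zeta = np.ones_like(delta)
    else:
        zeta = (d_min + eps * d_max) / (delta + eps * d_max)
    return zeta, zeta.mean(axis=1)          # degree: equal-weight mean (assumed)

# Hypothetical initialized sequences: the first equals the reference.
s0_demo = np.array([1.0, 1.0, 1.0])
S_demo = np.array([[1.0, 1.0, 1.0],
                   [1.0, 2.0, 3.0]])
zeta_demo, R_demo = grey_relational(S_demo, s0_demo)
print(zeta_demo)
print(R_demo)
```

Running the function once per standard level and assigning each classification to the level with the largest degree mirrors the decision rule stated above.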

Air quality assessment: taking Ningxia Province in China as an example
The classification of the samples to be evaluated. Monthly reports of the air quality in Ningxia Province in 2018 were provided by the Department of Ecology and Environment of Ningxia Province. The monthly report data were used to establish the cluster of samples $S$ (Table 1) (Eq. 1). Each sample included five kinds of pollutants. The concentrations of SO2, NO2, PM10 and PM2.5 were monthly averages calculated from 24-h averages, and the concentration of O3 was the monthly average calculated from the 8-h average values.
$x_1$ is the SO2 concentration; $x_2$ is the NO2 concentration; $x_3$ is the PM10 concentration; $x_4$ is the PM2.5 concentration; and $x_5$ is the O3 concentration. For these pollutants, the lower the concentration is, the better the air quality is.
As shown in Table 1, the management department provided only part of the monitored data, and the data for January are incomplete; therefore, only the data listed in the table, from February to December, can be effectively analysed. However, the focus of this study is on the new analysis and evaluation method (IGCFCCA), and almost all of the data can be analysed by this method.
According to Eq. 3, the five ideal values are as follows: $x_{01}$ is 9, $x_{02}$ is 17, $x_{03}$ is 56, $x_{04}$ is 25, and $x_{05}$ is 76. Based on the sample data in Table 1, the ideal-value grey close matrix (Eq. 6) can be obtained from Eq. 5; according to Eq. 8, the weights of $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$ are $W_1 = 0.06$, $W_2 = 0.09$, $W_3 = 0.34$, $W_4 = 0.12$ and $W_5 = 0.39$, respectively. Consequently, the comprehensive analysis value $P_i$ ($i = 1, 2, \ldots, 11$) of $s_i$ is calculated with Eq. 7. The grey close function values $y_{ik}$ (Eq. 5) and the comprehensive analysis values $P_i$ are shown in Table 2.
With $P_i$ ($P_1, P_2, \ldots, P_{11}$) known, $P_{ij}$ ($i, j = 1, 2, \ldots, 11$) can be calculated from Eq. 9. The corresponding elements of the grey similar matrix (Eq. 10) are shown in Table 3.
The samples (Table 1) can be divided into three classifications, and the class-based approach provides two main advantages. First, if the data for each month were compared with the air standards one by one, the workload would be large, and errors would easily accumulate. In contrast, analysing only the three classifications greatly improves the work efficiency. Second, this classification method can be used to establish national or local standards. For example, actual statistical data collected over many years can be classified by this method, and the classification results can be used as new comparison standards, which would benefit the analysis and evaluation of statistical data in the future.

Sample evaluation and correlation degree analysis.
In the preceding section, the samples from each month in 2018 were divided into three classifications ($S'_1$, $S'_2$ and $S'_3$). The pollutant concentrations in the air quality standard (GB3095-2012) are used for comparison, and the comparison of the data is shown in Fig. 1.
As shown in Fig. 1, compared with that in the first-level air standard, the SO2 concentration in the third-classification air is lower, and the NO2 concentrations in the three air classes are all lower than the […]
According to grey theory, the cluster data and the data ($S'_{01}$ and $S'_{02}$ from the air quality standard) used for comparison must be initialized [4,5], and the initial values are shown in Table 5.
According to Eqs. 14 and 15, the correlation degree $R$ and the correlation coefficient $\zeta$ with respect to the first-level standard are shown in Table 6, and those with respect to the second-level standard are shown in Table 7.
According to Tables 6 and 7, all three classifications have the highest correlation with the first-level air standard. Therefore, the air quality in Ningxia Province in 2018 was associated with the first-level standard. More importantly, this result quantifies the correlation between the three classifications and the first-level air standard. The correlation degrees of the first, second and third classifications with the first-level air standard are 0.674, 0.697 and 0.71, respectively. Because a correlation degree of 1 would indicate complete agreement, the gaps between the three classifications and the compared air standard are 0.326, 0.303 and 0.29, respectively. The correlation degree cannot reach 1 because some pollutant concentrations in the monitored data for these classifications are lower than the first-level air standard, whereas the remaining pollutant values are higher. Therefore, there is still room to improve the air quality in the region. The region should continue to reduce the concentrations of pollutants and further improve the correlation degrees of all classifications with the first-level air standard.
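The gap values quoted in this paragraph follow from subtracting each reported correlation degree from 1. The snippet below only reproduces that arithmetic; the dictionary keys are illustrative labels, and rounding to three decimals is an assumption made to match the precision of the reported figures.

```python
# Correlation degrees with the first-level standard, as reported in the text.
R_first_level = {"first": 0.674, "second": 0.697, "third": 0.71}

# Gap to complete agreement: 1 minus the correlation degree.
gaps = {name: round(1 - r, 3) for name, r in R_first_level.items()}

# The classification with the smallest gap is closest to the standard.
closest = min(gaps, key=gaps.get)
print(gaps, closest)
```

The third classification has the smallest gap, which is consistent with its description as the classification with the better air quality.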

Conclusions
(1) A new method of air quality assessment, IGCFCCA, is proposed. The advantage of the method is that it can quantitatively characterize the correlation degree between the current air quality and the corresponding standard level. Specifically, the results of this method indicate that the air quality in Ningxia Province in 2018 was correlated with the first-level air quality in China's air quality standard. The correlation degrees of the first, second and third classifications of air quality with the first-level air standard are 0.674, 0.697 and 0.71, respectively. Therefore, the region should continue to reduce the concentrations of pollutants, especially PM10, PM2.5 and O3, and further improve the correlation degrees of all classifications with the first-level air standard. Notably, this method can be used in other industries.
(2) The air quality in Ningxia Province in 2018 was divided into three classifications by ideal grey close function cluster analysis. Specifically, the relatively poor air quality in March, April and May corresponds to the first classification, the comparatively better air quality in August and September corresponds to the third classification, and the air quality in the remaining months corresponds to the second classification. In addition, the classification method can be used as a reference when establishing other classification standards, such as national standards, regional standards, and industry standards.
Table 6. Correlation with the first-level air standard.