Introduction

The air environment is a dynamic and complex system. Air quality is influenced by pollutants such as SO2, NO2, PM10, and O3, whose concentrations change constantly. However, the monitored data used in analyses are usually averages over a certain period, such as one-hour, multi-hour, one-month and one-year averages, because instantaneous data recorded every minute or second are difficult to collect and analyse. Therefore, the air environment observed in this way can be regarded as a grey system, in which some information is known and some information is unknown1,2,3,4,5,6,7.

At present, China’s air quality standard (GB3095-2012) divides air quality into two levels and stipulates the pollutant concentration limits for first-level and second-level air8,9. The limits are lower for first-level air and higher for second-level air. The major pollutants include SO2, NO2, PM10, O3, and others. However, evaluating air quality according to GB3095-2012 raises two problems. First, the common evaluation methods can only determine which level the current air is associated with. They do not quantify how closely the current air matches that level or how far it is from the standard, so the room for improving the current air quality remains vague. It is therefore necessary to develop a method to quantitatively calculate the correlation degree, which measures the distance between the current air and the two levels of the air standard. Second, to determine the air quality in a certain area over a period of time, the concentrations of pollutants are usually monitored every day, and the amount of monitored data is very large. Comparing and analysing each recorded value individually would make the workload so large that the task would be almost impossible to complete. Therefore, people usually calculate the average value of the data first and then analyse the average. However, it is unclear which of the monitored data should be grouped together for averaging. In other words, determining how to scientifically classify the data is the key. Data with similar characteristics can be classified into one group, and each classification can then be analysed and evaluated, which puts the results of the analysis on a scientific footing.

At present, there are many methods for comprehensively evaluating atmospheric environmental quality, including the air pollution index (API) method, ambient air quality index (AQI) method, single factor index method, Green's comprehensive air pollution index method, analytic hierarchy process (AHP), artificial neural network models, and fuzzy comprehensive evaluation method8. Because these methods rest on different evaluation principles, each has its own advantages and disadvantages. The API and AQI methods are simple, intuitive and convenient to use but are only applicable for evaluating short-term air quality in cities9. The single factor index method is clear and easy to implement, but it cannot consider the air quality status as a whole, and the evaluation results are one dimensional9. Green's comprehensive air pollution index method is easy to understand and implement, but it is only applicable to areas where coal pollution is the main pollution type9. The AHP is simple, practical and systematic, but its quantitative results are limited; additionally, when there are many indicators, the statistics become complex, and the weights are difficult to determine9. The artificial neural network evaluation method has the advantages of a fast operation speed, self-adaptation and strong fault tolerance, but when the data are poorly correlated, the evaluation results tend to be homogenized10,11,12,13. IGCFCCA is a kind of fuzzy comprehensive evaluation method based on fuzzy mathematics, the fuzzy principle and the grey close function. The method can handle the common problem of incomplete data and is mainly used for analysis, model building and forecasting under uncertain and incomplete information. It needs only a small amount of data and can achieve good prediction results.

In this paper, the IGCFCCA method is used to evaluate the air quality in Ningxia Province. The method can not only scientifically classify a large amount of data but also calculate the correlation degree between each classification and the relevant standard. This approach can provide an important basis for comprehensive environmental management. Moreover, this new method provides a scientific reference and an important basis for the establishment and optimization of other industry standards in the future.

Basic principle and methods

A sample, which comes from the monitored data reports of some environmental management departments, is first classified by ideal grey close function cluster analysis. Then, the level of the sample is determined by grey correlation analysis, and comprehensive evaluation conclusions are established according to the correlation degree between the classification of the sample and the levels specified in GB3095-2012.

The classification of the sample to be evaluated

Establishing the evaluation index sequence matrix for the selected sample

Let S be a sequence of clustering objects, i.e., S = {s1, s2…, sm}; X is a sequence of air-influencing variables, i.e., X = {x1, x2…, xn}; xik is the original monitoring data for si (i = 1, 2…, m) and xk (k = 1, 2…, n); m is the number of objects considered in clustering, and n is the number of influencing indexes, which are the pollutants mentioned above. Accordingly, the following matrix can be established (Eq. 1).

$$ S = \begin{array}{*{20}c} {s_{1} } \\ {s_{2} } \\ \ldots \\ {s_{m} } \\ \end{array} \left[ {\begin{array}{*{20}c} {x_{11} } & {x_{12} } & \ldots & {x_{1n} } \\ {x_{21} } & {x_{22} } & \ldots & {x_{2n} } \\ \ldots & \ldots & \ldots & \ldots \\ {x_{m1} } & {x_{m2} } & \ldots & {x_{mn} } \\ \end{array} } \right] $$
(1)

Establishing the matrix of ideal-value grey close function clusters

Let X0 = {x01, x02…, x0n} be the ideal-value sequence corresponding to each influential index. The principle for determining the ideal value is as follows (Eqs. 2, 3, 4).

The first situation: The larger the influencing index (xk) is, the better the air quality is; in this case, the ideal value

$$ x_{0k} = \max \left\{ {x_{ik} ,i = 1,2, \ldots ,m} \right\},k = 1,2, \ldots ,n. $$
(2)

The second situation: The smaller the influencing index (xk) is, the better the air quality is; in this case, the ideal value

$$ x_{0k} = \min \left\{ {x_{ik} ,i = 1,2, \ldots ,m} \right\},k = 1,2, \ldots ,n. $$
(3)

The third situation: The air quality is best when the influencing index (xk) takes a moderate value M; in this case, the ideal value

$$ x_{0k} = {\text{M}}. $$
(4)

According to the ideal value x0k (Eqs. 2, 3 or Eq. 4) and the original monitored data (xik), the grey close function value yik is calculated by using (Eq. 5).

$$ y_{ik} = \frac{{x_{0k} }}{{x_{ik} }}\;\left( {i = 1,2, \ldots ,m;k = 1,2, \ldots ,n} \right) $$
(5)

where xik is the original monitored data and x0k is the ideal value corresponding to the k-th influential index. Moreover, the function value yik is dimensionless, and yik ∈ [0,1]. yik denotes the correlation degree of si and the ideal sample s0 for the k-th index. Specifically, the larger yik is, the closer si is to the ideal sample s0, and the smaller yik is, the farther si is from s0.

Thus, the following grey close matrix Y can be established (Eq. 6).

$$ Y = \left[ {\begin{array}{*{20}c} {y_{11} } & {y_{12} } & \ldots & {y_{1n} } \\ {y_{21} } & {y_{22} } & \ldots & {y_{2n} } \\ \ldots & \ldots & \ldots & \ldots \\ {y_{m1} } & {y_{m2} } & \ldots & {y_{mn} } \\ {y_{01} } & {y_{02} } & \ldots & {y_{0n} } \\ \end{array} } \right] $$
(6)

In this case, Y is the matrix of grey close function values. Moreover, (y01, y02…, y0n) = (1,1…,1)n is the ideal sequence; the larger yik is, the better si is, and the largest possible yik is equal to 1.
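As an illustration, Eqs. 3, 5 and 6 can be sketched in a few lines of Python for the case in which a smaller index value means better air. The function names and the sample concentrations below are hypothetical and are not taken from Table 1; the sketch also assumes that all monitored values are positive.

```python
def ideal_values(samples):
    """Ideal value x_0k of each index k for smaller-is-better indexes (Eq. 3)."""
    return [min(column) for column in zip(*samples)]


def grey_close_matrix(samples):
    """Grey close matrix Y (Eq. 6) with entries y_ik = x_0k / x_ik (Eq. 5)."""
    ideal = ideal_values(samples)
    rows = [[x0 / x for x0, x in zip(ideal, row)] for row in samples]
    rows.append([1.0] * len(ideal))  # last row: ideal sequence (y_01, ..., y_0n)
    return rows


# Hypothetical concentrations: rows are samples s_i, columns are indexes x_k.
samples = [
    [10.0, 20.0, 80.0],
    [20.0, 25.0, 100.0],
    [40.0, 50.0, 160.0],
]
Y = grey_close_matrix(samples)
for row in Y:
    print([round(value, 3) for value in row])
```

In this hypothetical example the first sample attains the minimum of every index, so its row of Y consists of ones, and every other entry lies between 0 and 1.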

The classification of the sample to be evaluated

Because the influence of each influencing index is different, the weight of each influencing index needs to be considered. Let Pi be the comprehensive analysis value of si. Pi can be expressed as follows (Eq. 7)

$$ P_{i} = \sum\limits_{k = 1}^{n} {W_{k} y_{ik} } \left( {i = 1,2 \ldots ,m} \right) $$
(7)

where Wk is the weight of the k-th influencing index; since there are n indexes, there are also n weights (W1, W2…, Wn). Each weight is calculated as follows (Eq. 8).

$$ W_{k} = \frac{{\sum\limits_{i = 1}^{m} {x_{ik} } }}{{\sum\limits_{i = 1}^{m} {\sum\limits_{k = 1}^{n} {x_{ik} } } }}\;\left( {k = 1,2 \ldots ,n} \right) $$
(8)
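A minimal sketch of Eqs. 7 and 8 is given below, assuming smaller-is-better indexes so that the grey close values follow Eqs. 3 and 5. The function names and the two-sample, two-index data set are hypothetical.

```python
def index_weights(samples):
    """W_k: sum of index k over all samples divided by the grand total (Eq. 8)."""
    column_sums = [sum(column) for column in zip(*samples)]
    total = sum(column_sums)
    return [s / total for s in column_sums]


def comprehensive_values(samples):
    """P_i = sum over k of W_k * y_ik (Eq. 7), with y_ik = x_0k / x_ik (Eq. 5)."""
    ideal = [min(column) for column in zip(*samples)]  # Eq. 3
    weights = index_weights(samples)
    return [
        sum(w * x0 / x for w, x0, x in zip(weights, ideal, row))
        for row in samples
    ]


# Hypothetical data: the column sums are 30 and 70, and the grand total is 100.
samples = [[10.0, 30.0], [20.0, 40.0]]
weights = index_weights(samples)
P = comprehensive_values(samples)
print(weights, P)
```

Because each weight is a share of the grand total, the weights always sum to 1, so a sample that attains the ideal value for every index has a comprehensive analysis value of 1.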

The comprehensive analysis values form the vector (P1, P2…, Pm)T. The following equation (Eq. 9) can be used to calculate the grey close value Pij of Pi in relation to Pj.

$$ P_{ij} = \frac{{\min (P_{i} ,P_{j} )}}{{\max (P_{i} ,P_{j} )}}\;\left( {i,j = 1,2 \ldots ,m} \right) $$
(9)

Then,

$$ P = \left( {P_{ij} } \right)_{m \times m} . $$
(10)
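Eqs. 9 and 10 translate directly into code. The following sketch assumes that all comprehensive analysis values are positive; the function name and the input values are hypothetical.

```python
def grey_close_values(p):
    """Matrix of P_ij = min(P_i, P_j) / max(P_i, P_j) (Eqs. 9 and 10)."""
    return [[min(a, b) / max(a, b) for b in p] for a in p]


# Hypothetical comprehensive analysis values P_1, P_2, P_3.
p = [0.5, 0.8, 1.0]
matrix = grey_close_values(p)
for row in matrix:
    print([round(value, 3) for value in row])
```

By construction, every diagonal entry equals 1, the matrix is symmetric, and all entries lie in the interval (0, 1] for positive inputs.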

If P (Eq. 10) satisfies the following three conditions: (1) reflexivity, where Pij = 1 (i = j); (2) symmetry, where Pij = Pji; and (3) normativity, where Pij ∈ [0,1], we can select an appropriate threshold value λ, which is the similarity coefficient4,5, from the P matrix, cut the branches with weight values less than λ, and establish the classification \(S_{t}^{\prime }\) (t = 1, 2…, c) when the λ level meets the relevant requirement. \(S_{t}^{\prime }\) represents each classification of the air in a given region. The following equations (Eqs. 11, 12) can be established.

$$ S_{t}^{\prime } = \left( {S_{1}^{\prime } ,S_{2}^{\prime } \ldots ,S_{c}^{\prime } } \right)^{{\text{T}}} $$
(11)
$$ S_{tk}^{\prime } = \left( {S_{t1}^{\prime } ,S_{t2}^{\prime } \ldots ,S_{tn}^{\prime } } \right) $$
(12)

where \(S_{t}^{\prime }\) is the t-th classification (t = 1, 2…, c), \(S_{tk}^{\prime }\) is the k-th index of the t-th classification (k = 1, 2…, n), c is the number of classifications, and n is the number of influencing indexes.

\(S_{tk}^{\prime }\) can be expressed in the following matrix form (Eq. 13).

$$ S_{tk}^{\prime } = \left[ {\begin{array}{*{20}c} {s_{11}^{\prime } } & {s_{12}^{\prime } } & \ldots & {s_{1n}^{\prime } } \\ {s_{21}^{\prime } } & {s_{22}^{\prime } } & \ldots & {s_{2n}^{\prime } } \\ \ldots & \ldots & \ldots & \ldots \\ {s_{c1}^{\prime } } & {s_{c2}^{\prime } } & \ldots & {s_{cn}^{\prime } } \\ \end{array} } \right] $$
(13)
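One possible reading of the threshold step is sketched below. It rests on two assumptions that the text does not state explicitly: samples are placed in the same classification when they are linked, directly or through a chain of other samples, by grey close values of at least λ, and each classification is represented by the mean of its members for every index. All names and numbers are hypothetical.

```python
def threshold_clusters(matrix, lam):
    """Group sample indices that are connected by grey close values >= lam."""
    m = len(matrix)
    parent = list(range(m))

    def find(i):
        # Follow parent links to the representative of the group of i.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(m):
        for j in range(i + 1, m):
            if matrix[i][j] >= lam:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(m):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def class_means(samples, clusters):
    """Mean of each index over the members of a cluster (assumed rule)."""
    n = len(samples[0])
    return [
        [sum(samples[i][k] for i in members) / len(members) for k in range(n)]
        for members in clusters
    ]


# Hypothetical comprehensive analysis values and monitored data.
p = [0.50, 0.52, 0.80, 0.82, 1.00]
matrix = [[min(a, b) / max(a, b) for b in p] for a in p]  # Eq. 9
samples = [[10.0, 20.0], [12.0, 22.0], [20.0, 30.0], [22.0, 34.0], [40.0, 50.0]]

clusters = threshold_clusters(matrix, lam=0.9)
means = class_means(samples, clusters)
print(clusters)
print(means)
```

Raising λ splits the samples into more and smaller classifications, while lowering it merges them, so the threshold controls how fine the classification is.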

Correlation degree analysis of the sample to be evaluated

Let \(S_{t}^{\prime }\) be the sample to be evaluated, and let X = (x1, x2…, xn), which is the influencing index set mentioned above and is the evaluation index used for \(S_{t}^{\prime }\). Let \({\text{S}}_{0}^{\prime }\) be the stated air quality classification in the GB3095-2012. Then, the equation for the correlation coefficient is as follows (Eq. 14)14.

$$ \zeta_{t} (k) = \frac{{\mathop {\min }\limits_{t \in c} \mathop {\min }\limits_{k \in n} \left| {S_{t}^{\prime } (k) - {\text{S}}_{0}^{\prime } (k)} \right| + \epsilon \mathop {\max }\limits_{t \in c} \mathop {\max }\limits_{k \in n} \left| {S_{t}^{\prime } (k) - {\text{S}}_{0}^{\prime } (k)} \right|}}{{\left| {S_{t}^{\prime } (k) - {\text{S}}_{0}^{\prime } (k)} \right| + \epsilon \mathop {\max }\limits_{t \in c} \mathop {\max }\limits_{k \in n} \left| {S_{t}^{\prime } (k) - {\text{S}}_{0}^{\prime } (k)} \right|}} $$
(14)

where ζt (k) is the correlation coefficient and ε is the resolution coefficient, which is generally set to 0.5 (refs. 4, 5).

Moreover, the correlation degree (Rt) equation is as follows (Eq. 15).

$$ R_{t} = \frac{1}{n}\sum\limits_{k = 1}^{n} {\zeta_{t} } (k) $$
(15)

The value of Rt is calculated with Eq. 15 for each air quality level. The level for which Rt is largest is the level with which the sample to be evaluated has the highest correlation degree, and the sample is classified correspondingly.
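Eqs. 14 and 15 can be sketched as follows. The function names, the parameter name `rho` (which stands for the resolution coefficient ε) and the input values are hypothetical, and the data are assumed to be already initialized.

```python
def correlation_coefficients(classes, standard, rho=0.5):
    """zeta_t(k) for every classification t and index k (Eq. 14)."""
    diffs = [[abs(s - s0) for s, s0 in zip(row, standard)] for row in classes]
    d_min = min(min(row) for row in diffs)  # two-level minimum difference
    d_max = max(max(row) for row in diffs)  # two-level maximum difference
    if d_max == 0:
        # Every classification coincides with the standard.
        return [[1.0] * len(row) for row in diffs]
    return [
        [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        for row in diffs
    ]


def correlation_degrees(classes, standard, rho=0.5):
    """R_t: mean of zeta_t(k) over the n indexes (Eq. 15)."""
    zeta = correlation_coefficients(classes, standard, rho)
    return [sum(row) / len(row) for row in zeta]


# Hypothetical initialized data: two classifications, two indexes.
classes = [[1.0, 2.0], [1.5, 3.0]]
standard = [1.0, 1.0]
R = correlation_degrees(classes, standard)
print([round(value, 3) for value in R])
```

In this hypothetical example the first classification is closer to the standard for both indexes, so it receives the larger correlation degree.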

Air quality assessment—taking Ningxia Province in China as an example

The classification of the samples to be evaluated

Monthly reports of the air quality in Ningxia Province in 2018 were provided by the Department of Ecology and Environment of Ningxia Province. The monthly report data were used to establish the cluster of samples S (Table 1) (Eq. 1). Each sample included five kinds of pollutants. Moreover, the concentrations of SO2, NO2, PM10 and PM2.5 were based on monthly averages calculated from 24-h averages, and the concentration of O3 was the monthly average calculated from the 8-h average values.

Table 1 Air quality in Ningxia Province in 2018.

x1 is the SO2 concentration; x2 is the NO2 concentration; x3 is the PM10 concentration; x4 is the PM2.5 concentration; and x5 is the O3 concentration. For these pollutants, the lower the concentration is, the better the air quality is.

As shown in Table 1, because the management department only provided some monitored data and the data in January are incomplete, only the data that are listed in the table from February to December can be effectively analysed. However, the focus of this study is on the new analysis and evaluation method (IGCFCCA), and almost all of the data can be analysed by this method.

According to (Eq. 3), the five ideal values are as follows: x01 is 9, x02 is 17, x03 is 56, x04 is 25, and x05 is 76. Based on the sample data in Table 1, the ideal-value grey close matrix (Eq. 6) can be obtained from (Eq. 5); according to (Eq. 8), the weights of x1, x2, x3, x4 and x5 are w1 = 0.06, w2 = 0.09, w3 = 0.34, w4 = 0.12, and w5 = 0.39, respectively. Consequently, the comprehensive analysis value Pi (i = 1, 2…, 11) (Table 2) of Si is calculated with (Eq. 7). The grey close function value yik (Eq. 5) and the comprehensive analysis value Pi are shown in Table 2.

Table 2 Grey close function value and the comprehensive analysis value.

With Pi (P1, P2… and P11) as known numbers, Pij (j = 1, 2…, 11) can be calculated from (Eq. 9). The corresponding elements of the grey similar matrix (Eq. 10) are shown in Table 3.

Table 3 Grey close values Pij.

The following information can be obtained from Table 3. If λ = 0.9 (refs. 4, 5), S2, S3 and S4 correspond to the first classification \(S_{1}^{\prime }\); S7 and S8 correspond to the third classification \(S_{3}^{\prime }\); and the other S values correspond to the second classification \(S_{2}^{\prime }\). S2, S3 and S4 are the samples for March, April and May, respectively, and S7 and S8 are the samples for August and September, respectively. Cluster \(S_{tk}^{\prime }\) (Eq. 13) (Table 4) includes \(S_{1}^{\prime }\), \(S_{2}^{\prime }\) and \(S_{3}^{\prime }\).

Table 4 The classifications of air.

The samples (Table 1) can be divided into three classifications, and the class-based approach provides two main advantages. First, if the data in each month are compared and analysed with the air standards, the workload will be large, and errors will easily accumulate. In contrast, only analysing the three classifications can greatly improve the work efficiency. Second, this classification method can be used to establish national or local standards. For example, actual statistical data over many years can be classified by this method, and the classification results can be used as new comparison standards, which would be beneficial to the analysis and evaluation of statistical data in the future.

Sample evaluation and correlation degree analysis

In the former parts of the paper, the samples from each month in 2018 are divided into three classifications (\(S_{1}^{\prime }\), \(S_{2}^{\prime }\) and \(S_{3}^{\prime }\)). The concentrations of these pollutants in the air quality standard (GB3095-2012) are used for comparison, and the comparison of the data is shown in Fig. 1.

Figure 1 Comparison of the samples to be evaluated with the two levels of air standards.

As shown in Fig. 1, the SO2 concentration in the third classification of air is lower than that in the first-level air standard, and the NO2 concentrations in the three air classes are all lower than that in the first-level air standard. In other words, the concentration of NO2 in the region meets the first-level air standard throughout the year, and the concentration of SO2 in August and September meets the first-level air standard. Therefore, according to the first-level air standard, the region should strengthen the management of PM10, PM2.5 and O3 emissions throughout the year, and the management of SO2 emissions in months other than August and September should be strengthened.

Compared with the second-level air standard, the concentrations of SO2, NO2 and O3 in the three air classes are all lower than those in the standard, and the concentrations of PM10 and PM2.5 in the third classification of air are lower than those in the standard. In other words, the concentrations of NO2, SO2 and O3 in the region meet the second-level air standard throughout the year. Moreover, the concentrations of PM10 and PM2.5 in August and September meet the second-level air standard. Therefore, according to the second-level air standard, the region should strengthen the management of PM10 and PM2.5 emissions.

According to grey theory, the cluster data and the data used for comparison (\({\text{S}}_{01}^{\prime }\) and \({\text{S}}_{02}^{\prime }\) from the air quality standard) must be initialized4,5, and the initial values are shown in Table 5.

Table 5 Data initialization results.

According to Eqs. 14 and 15, the correlation degree R and the correlation coefficient ζ of the first-level standard are shown in Table 6, and the correlation degree and correlation coefficient of the second-level standard are shown in Table 7.

Table 6 Correlation with the first-level air standard.
Table 7 Correlation with the second-level air standard.

According to Tables 6 and 7, all three classifications have the highest correlation with the first-level air standard. Therefore, the air quality in Ningxia Province in 2018 was associated with the first-level standard. More importantly, this result quantitatively indicates a correlation between the three classifications and the first-level air standard. The correlation degrees of the first classification, second classification and third classification with the first-level air standard are 0.674, 0.697 and 0.71, respectively. Therefore, it is clear that the gaps between the three classifications and the compared air standard are 0.326, 0.303 and 0.29. Moreover, the reason why the correlation degree cannot reach 1 is that some pollutant concentrations in the monitored data for these classifications are lower than the first-level air standard, and the remaining pollutant values are higher. Therefore, there is still room to continue to improve the air quality in the region. The region should continue to reduce the concentrations of pollutants and further improve the correlation degrees of all classifications of air with the first-level air standards.
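The gaps quoted above follow from subtracting each correlation degree from 1, the value that a classification identical to the standard would reach. A short check using the correlation degrees reported in the text (the variable names are hypothetical):

```python
# Correlation degrees with the first-level air standard, as reported in the text.
degrees = {"first": 0.674, "second": 0.697, "third": 0.71}

# Gap of each classification from a perfect correlation degree of 1.
gaps = {name: round(1.0 - r, 3) for name, r in degrees.items()}
print(gaps)
```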

Conclusions

(1) A new method of air quality assessment, IGCFCCA, is proposed. The advantage of the method is that it can quantitatively characterize the correlation degree between the current air quality and the corresponding standard level. Specifically, the results of this method indicated that the air quality in Ningxia Province in 2018 was correlated with first-level air in China’s air quality standard. The correlation degrees of the first classification, second classification and third classification of air quality with the first-level air standard are 0.674, 0.697 and 0.71, respectively. Therefore, the region should continue to reduce the concentrations of pollutants, especially PM10, PM2.5 and O3, and further improve the correlation degrees of all classifications with the first-level air standards. Notably, this method can be used in other industries.

(2) The air quality in Ningxia Province in 2018 was classified into three classifications by ideal grey close function cluster analysis. Specifically, the relatively poor air quality in March, April and May corresponds to the first classification, the comparatively better air quality in August and September corresponds to the third classification, and the air quality in the remaining months corresponds to the second classification. In addition, the classification method can be used as a reference when establishing other classification standards, such as national standards, regional standards, and industry standards.