An ambient air quality evaluation model based on improved evidence theory

Scientific evaluation of air quality is important for the management of air pollution. Comprehensive air quality evaluation involves uncertainty, which can be effectively addressed by the Dempster–Shafer (D–S) evidence theory. However, there is not enough research on comprehensive air quality assessment using D–S theory. To address the counterintuitive fusion results of the D–S combination rule in comprehensive decision-making, an improved evidence theory with evidence weight and evidence decision credibility (named the DCre-Weight method) is proposed and used to comprehensively evaluate air quality. First, the method determines the weights of evidence by the entropy weight method and introduces the decision credibility by calculating the dispersion of different evidence decisions. An algorithm case shows that the credibility of fusion results is improved and the uncertainty is well expressed; the method produces reasonable fusion results and addresses the problems of the D–S rule. Then, the air quality evaluation model based on the improved evidence theory (named the DCreWeight model) is proposed. Finally, using hourly air pollution data in Xi’an from June 1, 2014, to May 1, 2016, comparisons are made with the D–S rule, other improved methods of evidence theory, and a recent fuzzy synthetic evaluation (FSE) method to validate the effectiveness of the model. Under the national AQCI standard, the MAE and RMSE of the DCreWeight model are 1.02 and 1.17, respectively. Under the national AQI standard, the DCreWeight model has the minimal MAE and RMSE and the maximal index of agreement, which validates the superiority of the DCreWeight model. Therefore, the DCreWeight model can comprehensively evaluate air quality and provide a scientific basis for relevant departments to prevent and control air pollution.

• A combination rule with evidence decision credibility and evidence weight is proposed in this paper, and a case validates its effectiveness. The counterintuitive problem of D-S theory is solved and the credibility of fusion results is improved using the proposed improved evidence theory.
• The air quality evaluation model (DCreWeight model) based on the improved evidence theory is proposed to evaluate air pollution situations comprehensively, which can effectively handle the uncertainty in comprehensive air quality evaluation.
• In the DCreWeight model, membership functions of the six air pollutants are built based on fuzzy theory. They are then transformed into BPA functions, which better deal with the ambiguous information of air quality levels.
• In the DCreWeight model, considering the contribution of different pollutant concentrations to air quality, the combined weights of air pollutants are established by the subjective excessive times weight method and the objective entropy weight method, which improves the accuracy of the entropy weight method.
• Comparisons are made with the D-S rule, two improved methods of evidence theory, and a recent FSE method. The results of air quality evaluation in Xi'an show that the DCreWeight model has the minimal MAE and RMSE and the maximal index of agreement under the national AQI and AQCI standards, which is superior to the other methods.
The novelty of the proposed method lies in the improved evidence theory, which is complementary to the traditional air quality assessment methods. The rest of the paper is organized as follows: the "Backgrounds" section presents the background of evidence theory. The "Improved evidence theory" section presents the improved evidence combination rule. The "Ambient air quality evaluation model" section establishes the ambient air quality evaluation model based on the improved evidence theory. The "Results" section applies the air quality evaluation model in Xi'an. The "Conclusion" section concludes the paper and outlines future work.

Backgrounds
In this section, to better understand the definitions in the subsequent content, the important nomenclature descriptions are listed in advance. Then the background of evidence theory is presented. The main nomenclature descriptions are as follows:

D-S evidence theory.
If a set $\Theta$ is defined such that all of its elements are independent and mutually exclusive, $\Theta$ is called the frame of discernment. Under this premise, the following definitions are provided.
Definition 1 basic probability assignment function (BPA) 18 . All subsets of $\Theta$ are denoted as $2^{\Theta}$, which represents all possibilities of the proposition to be discriminated. The BPA function (i.e., mass function) is defined as $m: 2^{\Theta} \to [0,1]$, satisfying
$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1.$$
If $m(A)$ is greater than 0, $A$ is called a focal element 24 .

Definition 2 belief and plausibility function
The belief function is defined as BEL, and the formula is as follows:
$$\mathrm{BEL}(A) = \sum_{B \subseteq A} m(B).$$
The belief function is the sum of the basic probability masses of all subsets of $A$, where $\mathrm{BEL}(\emptyset) = 0$ and $\mathrm{BEL}(\Theta) = 1$. Let PL be the plausibility function,
$$\mathrm{PL}(A) = \sum_{B \cap A \neq \emptyset} m(B).$$
$\mathrm{PL}(A) - \mathrm{BEL}(A)$ represents the uncertainty of $A$ 25 .
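To make Definition 2 concrete, the following minimal sketch computes the belief and plausibility of a proposition from a mass function. The frame {x, y, z}, the mass values, and the function names are hypothetical and chosen only for illustration.

```python
def belief(mass, a):
    """BEL(A): total mass of the focal elements that are subsets of A."""
    return sum(v for b, v in mass.items() if b <= a)


def plausibility(mass, a):
    """PL(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in mass.items() if b & a)


# Hypothetical BPA on the frame {x, y, z}; keys are frozensets (focal elements).
frame = frozenset("xyz")
mass = {frozenset("x"): 0.5, frozenset("xy"): 0.3, frame: 0.2}

a = frozenset("x")
bel_a = belief(mass, a)          # only {x} is a subset of {x}
pl_a = plausibility(mass, a)     # {x}, {x, y} and the frame all intersect {x}
uncertainty = pl_a - bel_a       # width of the interval [BEL(A), PL(A)]
```

In this toy case the interval [BEL(A), PL(A)] for A = {x} is [0.5, 1.0], so half of the total mass remains uncommitted with respect to A.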

Definition 3 D-S rule
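Let $m_1$ and $m_2$ be two BPA functions on the same frame of discernment $\Theta$. The D-S rule is defined as follows:
$$m(A) = \begin{cases} \dfrac{1}{1-K} \displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset, \\ 0, & A = \emptyset, \end{cases}$$
where $\forall A \subseteq \Theta$, $B, C \subseteq \Theta$, and $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$. $K$ is the conflict between $m_1$ and $m_2$. The two pieces of evidence are in complete conflict when $K = 1$ and in high conflict when $K \to 1$.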
Because of highly conflicting evidence, the fusion result of the D-S rule may be contrary to common sense. The D-S rule is invalid 26 when K = 1, because the denominator in the D-S normalization is zero. In addition, the D-S rule fails to address the one-vote veto issue 27 : m(A) is always 0 when the BPA that one piece of evidence assigns to A is 0, even if much other evidence supports A.
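The behaviour described above can be reproduced with a short sketch of the D-S rule for two BPAs. The mass values below are hypothetical (a Zadeh-style example, not the data of Table 1), and the function name `dempster` is an illustrative choice.

```python
from itertools import product


def dempster(m1, m2):
    """Combine two BPAs (dict: frozenset -> mass) with the D-S rule.

    Returns the fused BPA and the conflict coefficient K.
    """
    combined, conflict = {}, 0.0
    for (b, vb), (c, vc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + vb * vc
        else:
            conflict += vb * vc  # product mass that falls on the empty set
    if conflict >= 1.0 - 1e-12:
        raise ValueError("K = 1: the D-S rule is undefined (zero denominator)")
    fused = {a: v / (1.0 - conflict) for a, v in combined.items()}
    return fused, conflict


# Hypothetical, highly conflicting evidence.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m1 = {A: 0.99, B: 0.01}
m2 = {B: 0.01, C: 0.99}

fused, k = dempster(m1, m2)
```

Here K is close to 1 and the fused result assigns essentially all mass to B, which both sources consider very unlikely. Target A receives no mass at all because `m2` gives it zero support, which is the one-vote veto effect.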
Other combination rules. Aiming to fuse conflicting evidence, Sun et al. measured the average evidence (q) and proposed an effective combination rule based on the evidence credibility (ε) . Equation (4) shows the evidence credibility function.
where $K_{ij}$ is the evidence conflict between evidence $i$ and $j$, and $\frac{1}{n(n-1)/2} \sum_{i<j} K_{ij}$ is the average conflict. When the average conflict increases, the credibility of fusion results decreases.
Here, the improved method in Reference 20 is named KCre-Sun. Equation (5) shows the combination rule. However, the average evidence does not consider the importance of different pieces of evidence, so it is difficult to apply to practical problems. In addition, the evidence credibility ε in the KCre-Sun method requires the conflict between every pair of pieces of evidence, so the computational complexity is high.
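As an illustration of the pairwise computation behind the average conflict, the sketch below evaluates K for every pair of BPAs and averages over the n(n−1)/2 pairs. Only the average conflict is computed; the credibility function of Eq. (4) and the rule of Eq. (5) are not reproduced. The three BPAs and the function names are hypothetical.

```python
from itertools import combinations, product


def conflict(m1, m2):
    """Conflict K between two BPAs: product mass on disjoint focal elements."""
    return sum(v1 * v2
               for (b, v1), (c, v2) in product(m1.items(), m2.items())
               if not (b & c))


def average_conflict(bpas):
    """Mean of K_ij over all n(n-1)/2 unordered pairs of evidence."""
    pairs = list(combinations(bpas, 2))
    return sum(conflict(p, q) for p, q in pairs) / len(pairs)


# Three hypothetical pieces of evidence on the frame {A, B}.
A, B = frozenset("A"), frozenset("B")
evidence = [{A: 0.9, B: 0.1}, {B: 1.0}, {A: 0.8, B: 0.2}]

k_bar = average_conflict(evidence)
```

Because every unordered pair is visited, the number of conflict evaluations grows quadratically with the number of pieces of evidence, which is the cost noted above.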
Pan et al. proposed a hybrid combination rule 28 (namely Hybrid-Rule) to fuse conflicting evidence. When K > 0.95, i.e., under high evidence conflict, the similarity between pieces of evidence is measured by the Euclidean distance. However, a single Euclidean distance measure cannot accurately capture the complex relationships between pieces of evidence.

Improved evidence theory
To cope with the counterintuitive fusion results that arise when highly conflicting pieces of evidence are combined, much work based on the entropy method [29][30][31] has been carried out to measure the importance of evidence. In addition, credibility 19,20,32 is measured based on BPAs to represent the evidence divergence. However, the divergence of evidence is sensitive to BPAs 33 , which limits the application of evidence theory in engineering.
In order to handle the conflict and produce reasonable fusion results, this paper introduces the decision credibility to represent the discrepancy of evidence decisions. In addition, the weight of evidence is determined using the entropy weight method. Hence, a weighted combination rule based on decision credibility and evidence weight is presented to meet engineering needs.
This is because the limit of $\arctan\frac{1}{d}$ as $d \to 0^{+}$ equals $\frac{\pi}{2}$. Here, the factor $\frac{2}{\pi}$ in Eq. (6) scales the decision credibility to the range [0,1].
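The scaling argument can be checked numerically. The sketch below assumes that the decision credibility has the form (2/π)·arctan(1/d), where d ≥ 0 is the dispersion of the evidence decisions. This functional form is an assumption inferred from the remark above and may differ in detail from Eq. (6); the function name is hypothetical.

```python
import math


def decision_credibility(d):
    """Assumed form (2/pi) * arctan(1/d) for a dispersion d >= 0."""
    if d < 0:
        raise ValueError("dispersion must be non-negative")
    if d == 0:
        return 1.0  # limiting value as d -> 0+, since arctan(1/d) -> pi/2
    return (2.0 / math.pi) * math.atan(1.0 / d)


low_dispersion = decision_credibility(0.0)    # decisions agree completely
unit_dispersion = decision_credibility(1.0)   # arctan(1) = pi/4
high_dispersion = decision_credibility(1e6)   # decisions strongly dispersed
```

Under this assumption the credibility decreases monotonically from 1 towards 0 as the dispersion grows, so the 2/π factor is exactly what keeps the value inside [0,1].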
(2) Evidence weight
Each piece of evidence carries a different amount of information. The weights of pieces of evidence can be determined objectively by the entropy weight method. The steps of the entropy weight method are as follows:
Step 1 The entropy value can be calculated as:
Step 2 The deviation degree can be calculated as:
Step 3 The weights of pieces of evidence can be calculated as:
Therefore, based on evidence decision credibility and evidence weight, the combination rule is defined as follows:
The improved evidence theory in this paper is named the DCre-Weight method, and its algorithm description is shown in Appendix A.
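The three steps of the entropy weight method can be sketched as follows. This is the common textbook form (normalized Shannon entropy, deviation degree 1 − e, weights proportional to the deviation), applied row-wise to the scores of each piece of evidence. It is an assumption that Eqs. (7)–(9) follow this form; the input values and the function name are hypothetical.

```python
import math


def entropy_weights(rows):
    """rows[i] holds the non-negative scores of evidence i over the propositions.

    Step 1: normalized entropy of each row.
    Step 2: deviation degree, 1 - entropy.
    Step 3: weights proportional to the deviation degree.
    """
    n_props = len(rows[0])
    if n_props < 2:
        raise ValueError("at least two propositions are required")
    k = 1.0 / math.log(n_props)
    deviations = []
    for row in rows:
        total = sum(row)
        probs = [v / total for v in row]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        deviations.append(max(0.0, 1.0 - entropy))  # clamp rounding noise
    s = sum(deviations)
    if s == 0:  # every row is uniform: fall back to equal weights
        return [1.0 / len(rows)] * len(rows)
    return [d / s for d in deviations]


# Hypothetical evidence: a sharp, an almost uniform, and an intermediate row.
weights = entropy_weights([[0.9, 0.05, 0.05],
                           [0.4, 0.3, 0.3],
                           [0.6, 0.3, 0.1]])
```

The sharpest row has the lowest entropy and therefore receives the largest weight, while the nearly uniform row, which carries little information, receives the smallest.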
Next, to validate the effectiveness of the proposed algorithm, an example from Reference 20 is introduced to compare the improved evidence theory with the other three combination rules (see Part 2.2). Table 1 shows the fusion results of the four combination rules. According to the results in Table 1, D-S evidence theory fails to recognize target A because of the conflicting evidence $m_2$. Target A is recognized correctly using the KCre-Sun and Hybrid-Rule methods. However, in the fusion process, the credibility of target A is low using the KCre-Sun method. Compared with the KCre-Sun method, the proposed DCre-Weight method and the Hybrid-Rule method improve the credibility of fusion results. However, $m(\Theta)$ is always 0 in the fusion of $m_1 \oplus m_2$ and $m_1 \oplus m_2 \oplus m_3$ using the Hybrid-Rule method, so it cannot express the uncertainty in the combined decision. In contrast to the Hybrid-Rule, the proposed method measures the decision credibility by calculating the difference of evidence decisions and assigns the evidence conflict according to the evidence weight, so the value of $m(\Theta)$ decreases when the third piece of evidence is combined.

Ambient air quality evaluation model
Nowadays, air quality data can be easily accumulated by sensors around the world 34 . The concentration of pollutants monitored at monitoring stations changes with meteorological conditions, policies, pollution sources, human factors, etc. Evidence theory can well address the ambiguity of air quality and the uncertainty of environmental systems. For air quality evaluation, the main air pollutants affecting air quality are CO, PM10, NO2, PM2.5, SO2, and O3. Air quality is not determined by a single air pollutant, but by a combination of multiple air pollutants. By fusing pollution information through the improved evidence theory, a more accurate assessment of air quality can be obtained.
The air quality evaluation model based on the improved evidence theory is shown in Fig. 1. Firstly, the membership functions (MFs) 35 of each air pollutant are established based on fuzzy theory and transformed into BPA functions. Then the weight set of pollutants is established according to the evaluation standard and entropy weight method. Finally, the improved evidence theory is used to fuse the information of multiple pollutants.
Evaluation standards. The AQI standards for China and the United States are the same, but the concentration limits of pollutants are different, especially the limits of PM2.5. According to the standard AQI (HJ633-2012[Z]) and the Ambient Air Quality Standards (GB 3095-2012), this paper revised the limits of some pollutants and established five criterion levels, as shown in Table 2.
Air pollutants have impacts on the human respiratory system. The description of the air quality evaluation standard is shown in Table 3.

Determining the membership functions (MFs).
The events in the discernment framework are regarded as fuzzy sets {A 1 , …, A n } of the domain U. The membership degree of the object is transformed into the BPA using the normalization method.
Set U = {I, II, III, IV, V, Θ} and define s air pollutants as the indicator set. According to the characteristics of pollutants in Table 2, the MFs are built for any recognition object $x_i$ in X = {$x_1$, …, $x_s$}. When the concentration of pollutant $x_i$ exceeds the limit of level j-1, the degree of membership of the previous quality level j-1 decreases, and the degree of membership of the next level j + 1 increases. However, the relationship between air quality and pollutant concentration is non-linear. Let $y_{ij}$ be the concentration limit of quality level j for $x_i$. Here, the increasing function uses $\log_2$, and a separate decreasing function is used for the falling branch. If the concentration of some pollutant is less than the first-level limit, the air quality is judged as level 1, and the membership function is improved from the Z function, as shown in Eq. (11). If the concentration of some pollutant exceeds the limit of level j-1, where 2 ≤ j ≤ 4, the quality is judged as level j, and Eq. (12) is selected. If the concentration of some pollutant is over the limit of level 4, it is judged as level 5, and Eq. (13) is selected. The MFs of each air pollutant related to the five criterion levels can be selected as follows: Level I, j = 1 [Eqs. (11)–(13), piecewise membership functions; the membership is 0 when $x_i > y_{i(j+1)}$]

Table 1. Comparison of the fusion process under the four combination rules.

Figure 1. Air quality evaluation model based on improved evidence theory.

where i = 1, …, m, and j = 1, …, n. The membership of indicators belonging to each mode is shown in Eq. (14).
In this study, the evidence theory is applied to the evaluation model of ambient air quality. The first step is to determine the initial belief probability in the model. Since the mass function in D-S theory represents the basic trust in a certain proposition A, and the degree of membership represents the degree to which the object belongs to the fuzzy sets, the mass function can be obtained from the membership function. The mass functions of object x can be calculated by Eq. (15).
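A minimal sketch of the membership-to-BPA transformation is given below. It assumes plain sum-normalization of the membership degrees, which is one common reading of "the normalization method". Eq. (15) may differ, for example in how mass is assigned to Θ, which this sketch does not model. The membership values and the function name are hypothetical.

```python
def memberships_to_bpa(memberships):
    """Normalize membership degrees so that the resulting masses sum to 1."""
    total = sum(memberships.values())
    if total <= 0:
        raise ValueError("at least one membership degree must be positive")
    return {level: mu / total for level, mu in memberships.items()}


# Hypothetical membership degrees of one pollutant reading to two levels.
bpa = memberships_to_bpa({"IV": 0.3, "V": 0.9})
```

After normalization the masses form a valid BPA over the levels with non-zero membership, so each pollutant reading can be treated as one piece of evidence.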
Air quality evaluation based on improved evidence theory. Based on the improved evidence theory (DCre-Weight), the air quality model (DCreWeight) is proposed to evaluate comprehensive air quality. First, considering the contributions of pollutants to air quality evaluation, the weights of pollutants are built based on the subjective weight method and the objective entropy weight method. Then, the concentrations of the six air pollutants are defined as pieces of evidence, and the improved evidence theory is used to make a comprehensive decision on the air quality level.
The steps of the DCreWeight model are as follows:
(1) Set U = {I, II, III, IV, V, Θ}. I, II, III, IV, and V denote the air quality levels, and Θ represents the uncertainty in air quality evaluation.
(2) According to the MFs in Eq. (11), Eq. (12), and Eq. (13), the BPA can be established.
(3) Standardize the evaluation data $(x_{ij})_{m \times n}$ according to Eq. (16), and calculate the ratio $p_{ij} = x'_{ij} / \sum_{i=1}^{m} x'_{ij}$. Then the weights of air pollutants ($W_1$) can be calculated according to the entropy weight method in Eq. (7) ~ Eq. (10).
(4) Use the subjective weight method to establish the weights of air pollutants. The excessive times method is as follows.
where $y_{ij}$ is the limit of pollutant $i$ at level $j$ in Table 2 and $x_i$ is the real concentration of pollutant $i$. If $j = 1$, $y_{i0} = 0$. In particular, when the concentration exceeds the $n$-th level of concentration limit, $a_i = j + \frac{x_i}{y_{ij}}$. Define the normalized weights as $W_2$. Establish appropriate coefficients {a, b} for $W_1$ and $W_2$. Then the weight set of evidence is $W = aW_1 + bW_2$. Here, {a, b} = {0.2, 0.8} is set to highlight the impacts of the main pollutants on air quality.
(5) According to Eq. (10) in the DCre-Weight method, use the proposed combination rule to evaluate the comprehensive air quality. The obtained probabilities are shown as Eq. (18).
(6) According to the maximum probability, the comprehensive air quality (Level) can be determined according to Eq. (19).
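The weight combination in step (4) and the maximum-probability decision in step (6) can be sketched as follows. The weight vectors and fused probabilities are hypothetical. Restricting the decision to the five levels, excluding Θ, is an assumption about Eq. (19), and the function names are illustrative.

```python
def combine_weights(w1, w2, a=0.2, b=0.8):
    """W = a*W1 + b*W2, renormalized as a guard so the weights sum to 1."""
    mixed = [a * x + b * y for x, y in zip(w1, w2)]
    s = sum(mixed)
    return [v / s for v in mixed]


def decide_level(probabilities, levels=("I", "II", "III", "IV", "V")):
    """Return the air quality level with the maximum fused probability."""
    return max(levels, key=lambda lv: probabilities.get(lv, 0.0))


# Hypothetical entropy weights (W1) and excessive-times weights (W2)
# for two pollutants, and a hypothetical fused result.
w = combine_weights([0.5, 0.5], [0.9, 0.1])
fused = {"III": 0.05, "IV": 0.25, "V": 0.60, "Theta": 0.10}
level = decide_level(fused)
```

With {a, b} = {0.2, 0.8}, the pollutant that exceeds its limits most strongly dominates the combined weight even when the entropy weights are equal.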

Example 2
There are six main pollutants that affect air quality. Take one record as an example to analyze the comprehensive air quality level: SO2 = 46, NO2 = 74, CO = 4.96, O3 = 16, PM10 = 390, and PM2.5 = 241 in Xi'an on January 5, 2016.
According to the maximum probability, all of the methods in Table 4 give a comprehensive air quality level of V, i.e., the air quality is most likely heavily polluted. Compared with the D-S and Hybrid-Rule methods, the uncertainty is not 0 for the KCre-Sun and DCre-Weight methods because the six pollutants have different pollution degrees. However, with the KCre-Sun method the probability of level V is only 0.1743 and the credibility is 0.4467. Compared with the KCre-Sun method, the fusion results of DCre-Weight contain more useful information, which is conducive to decision-making.

Results
Data. To validate the performance of the proposed DCreWeight model, hourly air pollution data in Xi'an from June 1, 2014, to May 1, 2016, were selected. The years were randomly selected. In this paper, null values are processed using the linear interpolation method. According to the proposed DCreWeight model, the comprehensive air quality evaluation results for one day are as follows (see Fig. 2).
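The null-value treatment mentioned above can be illustrated with a small pure-Python sketch of linear interpolation over an hourly series. Holding the nearest valid value at the two ends of the series is an assumption, since the handling of leading and trailing gaps is not specified. The readings and the function name are hypothetical.

```python
def interpolate_nulls(values):
    """Fill None entries by linear interpolation between valid neighbours.

    Leading or trailing gaps take the nearest valid value (an assumption).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series contains no valid value")
    for i, v in enumerate(filled):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


# Hypothetical hourly readings with two missing hours.
hourly = interpolate_nulls([10.0, None, None, 40.0, None])
```

The two interior gaps are filled on the straight line between 10.0 and 40.0, while the trailing gap repeats the last valid reading.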

Evaluation indicators.
(1) Evaluation indicators based on AQI
where $C_P$ is the concentration of pollutant P. Taking the national AQI as the pollution standard, the MAE, RMSE, and an index of agreement can be calculated to analyze the performance of evaluation models. Count the number of days when the AQI level is equal to the evaluation level of a model, and define it as right_num.
Define AQI_MAE, AQI_RMSE, and AQI_an index of agreement as evaluation indicators. The above evaluation indicators based on AQI can be calculated as follows: where n is the number of samples, $y_i$ is the actual AQI value of the i-th day, and $h_i$ is the evaluation result of a model.
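The three indicators can be sketched as below, with `actual` playing the role of $y_i$ and `predicted` the role of $h_i$. MAE and RMSE follow their standard definitions. For the index of agreement, Willmott's form is assumed, which may differ from the exact expression used for this indicator. The sample levels are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between reference levels and model levels."""
    return sum(abs(y - h) for y, h in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between reference levels and model levels."""
    n = len(actual)
    return math.sqrt(sum((y - h) ** 2 for y, h in zip(actual, predicted)) / n)


def index_of_agreement(actual, predicted):
    """Willmott's index of agreement (assumed form); 1 means perfect agreement.

    Undefined when every value equals the mean of `actual`.
    """
    mean = sum(actual) / len(actual)
    num = sum((y - h) ** 2 for y, h in zip(actual, predicted))
    den = sum((abs(h - mean) + abs(y - mean)) ** 2
              for y, h in zip(actual, predicted))
    return 1.0 - num / den


# Hypothetical daily reference levels and model evaluation levels.
y = [1, 2, 3, 4]
h = [1, 2, 3, 5]
scores = (mae(y, h), rmse(y, h), index_of_agreement(y, h))
```

For these four days a single one-level error gives an MAE of 0.25, an RMSE of 0.5, and an index of agreement of 1 − 1/27.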

(2) Evaluation indicators based on AQCI
The national AQCI considers the comprehensive impacts of multiple pollutants on air quality. It highlights the contribution of six pollutants. AQCI is shown in Eq. (24).
where $S_P$ is the second-level concentration limit of pollutant P in the Ambient Air Quality Standards (GB 3095-2012).
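A minimal sketch of a comprehensive index of this kind is given below. It assumes that AQCI is the sum over pollutants of the single indices $C_P / S_P$, which is the usual definition of the national comprehensive index but is stated here as an assumption about Eq. (24). The pollutant keys, concentrations, and limits are placeholders and are not values from GB 3095-2012.

```python
def aqci(concentrations, limits):
    """Sum of single-pollutant indices C_P / S_P (assumed form of the index)."""
    return sum(concentrations[p] / limits[p] for p in concentrations)


# Placeholder values only: these are NOT the limits of GB 3095-2012.
concentrations = {"P1": 50.0, "P2": 30.0, "P3": 2.0}
limits = {"P1": 100.0, "P2": 60.0, "P3": 4.0}

index = aqci(concentrations, limits)
```

Each pollutant contributes in proportion to how close it is to its own limit, so the index reflects the joint contribution of all pollutants rather than only the worst one.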
Taking the national AQCI as the pollution standard, the indicators AQCI_MAE and AQCI_RMSE can be calculated by Eqs. (21) and (22).
According to Figs. 3 and 4, the severity of air pollution ranked Winter > Spring > Summer > Autumn. PM2.5 and PM10 were the primary pollutants in the four months. In Winter, the weight of SO2 was greater than that of O3, but in the other three months it was smaller. This is because weak light reduces the O3 concentration, while coal burning for heating increases the SO2 concentration in Winter. Taking the national AQI and AQCI as pollution standards, the air quality levels evaluated by the D-S, KCre-Sun, Hybrid-Rule, and FSE methods are mostly lower than the AQI level. The results of these models deviate greatly from the AQI and AQCI, while the results of the DCreWeight model are closest to the national AQI and AQCI.
To validate the superiority of the model, AQI_MAE, AQI_RMSE, AQI_an index of agreement, AQCI_MAE, and AQCI_RMSE are taken as evaluation indicators. The performance comparison results of the evaluation methods under the AQI and AQCI standards are shown in Fig. 5.
According to Fig. 5, the DCreWeight model has the minimum MAE and RMSE under the AQI and AQCI standards, and its index of agreement is the highest, so it is superior to the D-S, KCre-Sun, Hybrid-Rule, and FSE methods.
The application in Shanghai and Beijing. The superiority of the model has been validated using air pollutant data in Xi'an in the "Analysis and comparison of evaluation methods" section. To check whether the model is suitable for air quality assessment in other cities, we also selected hourly air pollution data from June 1, 2014, to May 31, 2015, in Shanghai and Beijing. First, the null data were processed using the linear interpolation method. Then, we applied the DCreWeight model to the two cities and compared the air quality between Shanghai and Beijing under the national AQI and AQCI standards. Figure 6 shows the evaluation results of the DCreWeight model in Summer, from June 1, 2014, to June 30, 2014. The left vertical axis represents the air quality evaluation level, and the right vertical axis represents the AQCI value. The national AQCI represents the comprehensive pollution degree. To check the accuracy of the DCreWeight model clearly, the days are sorted according to AQCI. According to Fig. 6, the AQI level fluctuates as the AQCI value decreases, because the AQI level depends on individual pollutants. However, as AQCI decreases, the evaluation results of the DCreWeight model basically decline in steps, which indicates that the evaluation of the model is in line with the actual comprehensive pollution. Compared with AQCI, the proposed model describes air quality levels more intuitively.
Next, the air quality in Shanghai and Beijing is compared, as shown in Fig. 7. According to Fig. 7, the following conclusions can be drawn. The air quality in Beijing in Summer was worse than that in Shanghai. The comprehensive air quality in Shanghai in Summer was basically good or regular. However, many days in Beijing in Summer were regular, lightly polluted, or moderately polluted.
Finally, given that pollution control is a long-term process, we analyze the pollution characteristics in Beijing according to the weights of pollutants, as shown in Fig. 8. We also analyze the possible reasons for pollution to help relevant departments make strong pollution control strategies based on pollution characteristics and the current air quality level. There are many causes of PM2.5 in Beijing, including vehicle exhaust, industrial emissions, and dust from construction sites and road traffic, all of which increase the concentration of PM2.5. In summer, under strong ultraviolet light, nitrogen oxides are more easily converted into O3 by photochemical reaction, so the concentration of O3 increases. The exposed arable land and sandy areas around Beijing, together with the monsoon climate, cause the PM10 concentration to increase under the effect of wind. In addition, vehicle exhaust and industrial waste gases cause a higher concentration of NO2. Therefore, although the NO2 concentration is not as high as that of PM2.5, O3, and PM10, attention should be paid to NO2 control in summer to avoid a high O3 concentration. The relevant departments should control air pollution from traffic, industrial parks, and construction sites, and protect the surrounding environment by planting green plants.

Conclusion
Air quality is affected by many air pollutants. Selecting appropriate methods to evaluate air quality is the basis for taking relevant air pollution control measures. This paper proposed an air quality evaluation model based on the improved evidence theory. The core of the model is to use the improved evidence theory (DCre-Weight) to evaluate the comprehensive impact of multiple air pollutants on air quality. An algorithm case showed that the DCre-Weight method improved the credibility of fusion results, which addressed the counterintuitive fusion results of D-S evidence theory, and that the uncertainty was well expressed using the DCre-Weight method. In addition, a specific application of this model in Xi'an showed that the DCreWeight model comprehensively evaluates air quality. Under the national AQI and AQCI standards, the MAE and RMSE values of the proposed model were minimal and the index of agreement was maximal, which validated the superiority of the DCreWeight model. Air quality is closely related to human life, and air quality evaluation is of great value and significance to the ecological environment. This paper considers the influence of multiple pollutants and comprehensively evaluates daily air quality, which is a supplement to the AQI evaluation method. The limitation of this method is that it may not be applicable in special high pollution areas. The comprehensive air quality evaluation model based on improved evidence theory can be applied in the tourism industry and government departments. It can provide a reference for tourism and support the government in assessing air quality and developing long-term pollution prevention and control strategies. However, this paper studies the comprehensive evaluation of air quality based on existing hourly concentrations of pollutants. It does not involve the prediction of pollutant concentrations.
In our future research, prediction of the concentrations of the six pollutants will be conducted. Then an air quality prediction and evaluation model will be established to form a relatively complete air quality research framework. In addition, with the increase of air pollution monitoring points, the volume of real-time monitoring data has surged. Therefore, data processing and data fusion methods are also key to assessing air quality accurately.