Introduction

Information fusion refers to the process of integrating and analyzing the observed data from multiple sensors to make accurate evaluations and decisions1,2,3. The technology dates back to research on sonar signal processing systems funded by the U.S. Department of Defense in the 1970s and was first applied in the field of air combat, where it was mainly used to fuse data from radar, infrared and other multi-source sensors to improve the assessment of the air combat situation. In the era of big data, information fusion can be found in various industries. For example, information fusion technology plays an important role in coal mine safety monitoring systems4, improving their accuracy, sensitivity and stability. It is also often applied to vehicle positioning and navigation in the field of intelligent transportation5, enhancing the mobility and safety of the transportation system.

Dempster–Shafer (D–S) evidence theory is an effective theoretical tool for information fusion. In 1967, Dempster6 derived the concept of upper and lower probabilities when dealing with multi-valued mappings between propositions and sets, using a probability range to represent the uncertainty of an event. Dempster’s combination rule was proposed to combine evidence from two independent information sources in certain statistical problems. In 1976, Dempster’s student Shafer7 published A Mathematical Theory of Evidence, introducing the mass function and developing a method to deal with uncertainty based on “evidence” and “combination”. The publication of this book marked the establishment of D–S evidence theory as a formal theory for quantifying and computing uncertainty. D–S evidence theory has been applied in expert systems8,9, classification under uncertainty10,11,12, clustering13,14,15, fault diagnosis16,17 and many other fields18,19,20,21, because it does not depend on prior probabilities and has an advantage in dealing with subjective judgement8.

However, a counter-intuitive result may be obtained when Dempster’s combination rule is applied to data with high conflict22. How to effectively resolve the paradoxes arising from conflicting evidence has always been a hot issue in evidence theory, and many scholars have studied this problem. One point of view holds that Dempster’s combination rule itself is inadequate, and a series of new evidence combination rules has been proposed23; another holds that the evidence source model is defective, and methods of pre-processing the evidence sources have been provided24,25,26,27. In addition, some scholars believe that the conflict comes from the incompleteness of the frame of discernment and have proposed the basic framework of the generalized evidence theory28.

The modification of Dempster’s combination rule was first proposed by Yager29, who pointed out that Dempster’s combination rule constructs a new belief structure from the non-conflicting information, while the normalization step ignores the contradiction between the pieces of evidence, so the results may be contrary to common sense. He argued that when the frame of discernment is a finite set, the conflict should be removed and assigned to the whole set. Modifications of Dempster’s combination rule mainly aim to solve the problem of redistributing conflict. In 2002, Lefevre et al.30 proposed a generalized framework for information fusion, which can be taken as representative of this kind of method; other modification rules can be derived from this generalized framework.

A common idea for modifying evidence sources is the evidence discounting method, for which the degree of conflict between pieces of evidence must be measured first. Jousselme et al.31 proposed a distance formula to measure the degree of conflict between two bodies of evidence in 2001. Liu32 combined the classical conflict coefficient and the Pignistic probability33 to measure conflict in the form of a binary group. Silva34 defined three classes of conflict to measure different levels of conflict and provided a multi-criteria analysis method. Deng et al.35 introduced a distance function to obtain the credibility of each piece of evidence, which was used as the coefficient for evidence discounting and data fusion. Additionally, there is also much research on measuring the uncertainty of evidence for evidence evaluation and modification36,37. Deng entropy has been defined and extended to measure the uncertainty of evidence38,39. Han et al.40 proposed an optimization method based on the fuzzy membership function to measure uncertainty. Besides, the evidential reasoning (ER) model41,42,43 can also be used for uncertain evidence modeling and processing to address the combination of evidence with high conflict.

Recently, Bi et al.44 enhanced the evidence similarity measure based on the Tanimoto measure to modify the evidence sources according to the credibility of different pieces of evidence and the weights of their conflicting parts. Li et al.45 proposed a new discount coefficient based on Deng entropy and the entropy difference between an initial body of evidence and its negation. Jing et al.46 proposed a new base basic probability assignment method to solve the conflict problem by assigning the base belief to basic events in the frame of discernment. Wu and Tang integrated the ambiguity measure to quantify the uncertainty of subjective assessments and improve the failure mode and effects analysis model within the framework of D–S evidence theory47.

The contributions of this paper are as follows. First, the reasons for the fusion paradoxes with highly conflicting data are expounded. Then, inspired by the evidence discounting method, an improved method for eliminating the paradoxes is proposed, which integrates the cosine similarity of evidence and belief entropy to adjust the evidence sources. The approach includes two parts, for the generation and the combination of BPAs respectively, and both the reliability and the uncertainty of evidence are taken into consideration. Finally, examples and experiments are provided to demonstrate the effectiveness, superiority and rationality of the proposed method.

The rest of this paper is organized as follows. The preliminaries, including some basic concepts and theories, are introduced in “Preliminaries”. In “A new BPA generation and combination method”, a novel generation and combination method based on modifying evidence sources is proposed. Some numerical examples are presented in “Numerical examples” to verify the effectiveness and superiority of the method in dealing with both conflicting and non-conflicting data. “Application” presents the experiments on the Iris and wine datasets. “Conclusion” concludes this paper.

Preliminaries

Dempster–Shafer evidence theory

Basic concepts

Several important concepts in evidence theory should be introduced at first, such as the frame of discernment, the mass function, focal element, the belief function (Bel) and the plausibility function (Pl).

Definition 1

The frame of discernment7 is defined as a complete set composed of N mutually exclusive elements, which can be expressed as:

$$\begin{aligned} \Theta =\{{{\Theta }_{1}},{{\Theta }_{2}},\ldots ,{{\Theta }_{N}}\} \end{aligned}$$
(1)

Definition 2

The mass function7, which is also called basic probability assignment (BPA), is a mapping: \(m:{{2}^{\Theta }}\rightarrow [0,1]\), that satisfies:

$$\begin{aligned} m(\varnothing )=0,\sum \limits _{A\subseteq \Theta }{m(A)=1} \end{aligned}$$
(2)

where \(\Theta \) is the frame of discernment, \({{2}^{\Theta }}\) is the set composed of all subsets of \(\Theta \), and m(A) is the mass of A, namely, the degree of belief assigned exactly to A.

Definition 3

Focal element7 refers to any subset A of the frame of discernment that satisfies the condition:

$$\begin{aligned} m(A)>0 \end{aligned}$$
(3)

Definition 4

The belief function7 \(Bel:{{2}^{\Theta }}\rightarrow [0,1]\) is defined as:

$$\begin{aligned} Bel(A)=\sum \limits _{B\subseteq A}{m}(B) \end{aligned}$$
(4)

Definition 5

The plausibility function7 \(Pl:{{2}^{\Theta }}\rightarrow [0,1]\) is defined as:

$$\begin{aligned} Pl(A)=\sum \limits _{B\cap A\ne \varnothing }{m}(B) \end{aligned}$$
(5)

For any subset A of the frame of discernment, Pl(A) is called the upper bound probability of the proposition A, which represents the potential evidence support degree for proposition A. The length of the belief interval [Bel(A), Pl(A)] represents the imprecision of proposition A.
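For concreteness, these definitions translate directly into code. The following minimal Python sketch (our own illustration, not part of the original theory) represents a mass function as a dictionary mapping focal elements, encoded as frozensets over the frame of discernment, to their masses:

```python
def bel(m, A):
    """Belief function of Eq. (4): total mass of all subsets of A."""
    return sum(v for B, v in m.items() if B <= A)  # B <= A tests B subset of A

def pl(m, A):
    """Plausibility function of Eq. (5): total mass of all focal
    elements that intersect A."""
    return sum(v for B, v in m.items() if B & A)  # non-empty intersection

# Example: m({a}) = 0.6, m({a, b}) = 0.4 gives the belief interval
# [Bel({a}), Pl({a})] = [0.6, 1.0] for the proposition {a}.
m = {frozenset('a'): 0.6, frozenset('ab'): 0.4}
print(bel(m, frozenset('a')), pl(m, frozenset('a')))  # 0.6 1.0
```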

Dempster’s combination rule

Let \(Be{{l}_{1}}\) and \(Be{{l}_{2}}\) be belief functions over the frame of discernment \(\Theta \), with mass functions \({{m}_{1}}\) and \({{m}_{2}}\) respectively. \({{m}_{1}}\) and \({{m}_{2}}\) are obtained from two independent sources of evidence, and their focal elements are \({{A}_{1}},{{A}_{2}},\ldots ,{{A}_{k}}\) and \({{B}_{1}},{{B}_{2}},\ldots ,{{B}_{r}}\) respectively. When the following condition is satisfied:

$$\begin{aligned} \sum \limits _{{{A}_{i}}\cap {{B}_{j}}=\varnothing }{{{m}_{1}}({{A}_{i}}){{m}_{2}}({{B}_{j}})<1} \end{aligned}$$
(6)

For any subset \(A\subseteq \Theta \):

$$\begin{aligned} m(\varnothing )=0,m(A)=\frac{\sum \nolimits _{{{A}_{i}}\cap {{B}_{j}}=A}{{{m}_{1}}({{A}_{i}}){{m}_{2}}({{B}_{j}})}}{1-\sum \nolimits _{{{A}_{i}}\cap {{B}_{j}}=\varnothing }{{{m}_{1}}({{A}_{i}}){{m}_{2}}({{B}_{j}})}} \end{aligned}$$
(7)

\(m:{{2}^{\Theta }}\rightarrow [0,1]\), resulting from the fusion of \({{m}_{1}}\) and \({{m}_{2}}\) according to Dempster’s combination rule, can be verified to be a mass function. Mark the conflicting mass as \(k=\sum \nolimits _{{{A}_{i}}\cap {{B}_{j}}=\varnothing }{{{m}_{1}}({{A}_{i}}){{m}_{2}}({{B}_{j}})}\); the greater k is, the higher the degree of conflict between the two bodies of evidence. Dempster’s combination rule satisfies the associative and commutative laws:

$$\begin{aligned} ({{m}_{1}}\oplus {{m}_{2}})\oplus {{m}_{3}}={{m}_{1}}\oplus ({{m}_{2}}\oplus {{m}_{3}}) \end{aligned}$$
$$\begin{aligned} {{m}_{1}}\oplus {{m}_{2}}={{m}_{2}}\oplus {{m}_{1}} \end{aligned}$$
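Dempster’s combination rule of Eq. (7) can be sketched in the same dictionary representation used above (a hypothetical helper, with names of our own choosing):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Fuse two mass functions with Dempster's rule, Eq. (7).
    Raises an error when the conflict k reaches 1 (Eq. (6) violated)."""
    combined = {}
    k = 0.0  # total conflicting mass
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0.0) + a * b
        else:
            k += a * b
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # normalization: redistribute the non-conflicting mass
    return {A: v / (1.0 - k) for A, v in combined.items()}
```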

High conflict data fusion problem in the classical D–S evidence theory

The classical method (Dempster’s combination rule) cannot handle the fusion of highly conflicting data, as it usually produces counter-intuitive results. Zadeh48 gave an example:

Let the frame of discernment

$$\Theta =\{a,b,c\}$$

and the two BPAs are as follows:

$$m_1(\{a\}) = 0.99; m_1(\{b\}) = 0.01; m_1(\{c\}) = 0.$$
$$m_2(\{a\}) = 0; m_2(\{b\}) = 0.01; m_2(\{c\}) = 0.99.$$

According to the D–S evidence theory:

$$\begin{aligned} m(\{a\}) = 0,\quad m(\{b\}) = 1,\quad m(\{c\}) = 0 \end{aligned}$$
(8)

Surprisingly, the BPA of \(\{b\}\) is 1, which is obviously counter-intuitive: both sources assign \(\{b\}\) a mass of only 0.01, yet the fused result is fully confident in it. There are many such examples and real cases, so the classical D–S evidence theory must be improved in order to handle highly conflicting evidence.
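Using the dempster_combine sketch from “Preliminaries”, Zadeh’s example can be reproduced in a few lines (zero-mass focal elements are simply omitted from the dictionaries):

```python
m1 = {frozenset('a'): 0.99, frozenset('b'): 0.01}
m2 = {frozenset('b'): 0.01, frozenset('c'): 0.99}
print(dempster_combine(m1, m2))
# {frozenset({'b'}): 1.0} -- all mass lands on {b}, although both
# sources rated {b} at only 0.01: the counter-intuitive result above.
```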

Evidence modification and fusion methods

Some typical evidence modification and fusion methods are used for comparison with the proposed method.

Yager’s method

Yager29 proposed a new method to handle conflicting evidence. He assigned the conflicting mass to \(m(\Theta )\) as unknown information and thereby modified the classical combination rule as follows:

$$\begin{aligned} m(\emptyset )=0 \end{aligned}$$
(9)
$$\begin{aligned} m(A)= \left\{ \begin{array}{ll} \sum _{A_i\cap B_j=A}m_1(A_i)m_2(B_j) &{} \quad A\ne \Theta \\ \sum _{A_i\cap B_j=\Theta }m_1(A_i)m_2(B_j)+k &{} \quad A=\Theta \end{array}\right. \end{aligned}$$
(10)
$$\begin{aligned} k=\sum _{A_i\cap B_j=\emptyset }m_1(A_i)m_2(B_j) \end{aligned}$$
(11)
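A minimal sketch of Yager’s rule in the same dictionary representation (the whole frame must be passed explicitly, since the conflict k is assigned to it):

```python
from itertools import product

def yager_combine(m1, m2, theta):
    """Yager's rule, Eqs. (9)-(11): the conflicting mass k is not
    normalized away but assigned to the whole frame theta."""
    combined = {}
    k = 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0.0) + a * b
        else:
            k += a * b
    combined[theta] = combined.get(theta, 0.0) + k
    return combined
```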

Murphy’s method

Murphy49 did not change the classical combination rule; he changed the model. Suppose there are n bodies of evidence. The method first averages the BPAs of the n bodies of evidence to obtain \(m_{avg}\). Let \(f_{DS}(S_1,S_2)\) denote the Dempster combination of two evidence sources, and let \(m^i\) denote the BPA after the ith iteration. Murphy’s combination rule is as follows:

$$\begin{aligned} m^1=f_{DS}(m_{avg},m_{avg}) \end{aligned}$$
(12)
$$\begin{aligned} m^i=f_{DS}(m^{i-1},m_{avg}), i\ge 2 \end{aligned}$$
(13)
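Murphy’s rule only needs an averaging step on top of Dempster’s rule; a sketch reusing the dempster_combine helper from above:

```python
def murphy_combine(masses):
    """Murphy's rule, Eqs. (12)-(13): average the n BPAs, then fuse
    the average with itself n - 1 times by Dempster's rule."""
    n = len(masses)
    avg = {}
    for m in masses:
        for A, v in m.items():
            avg[A] = avg.get(A, 0.0) + v / n
    result = avg
    for _ in range(n - 1):  # yields m^1, ..., m^{n-1}
        result = dempster_combine(result, avg)
    return result
```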

Dubois and Prade’s method

Dubois and Prade50 proposed a different method to handle conflicting evidence. Suppose the source \(E_1\) supports the subset A with the mass of belief \(m_1(A)\), and the source \(E_2\) supports the subset B with the mass of belief \(m_2(B)\). The combination rule is as follows:

$$\begin{aligned} m(C)=\sum _{A\bigcap B=C}m_1(A)m_2(B)+\sum _{A\bigcap B=\emptyset ,A\bigcup B=C}m_1(A)m_2(B) \end{aligned}$$
(14)
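A sketch of Dubois and Prade’s rule in the same representation:

```python
from itertools import product

def dubois_prade_combine(m1, m2):
    """Dubois and Prade's rule, Eq. (14): conflicting products
    m1(A)m2(B) with empty intersection are transferred to the union."""
    combined = {}
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = (A & B) if (A & B) else (A | B)
        combined[C] = combined.get(C, 0.0) + a * b
    return combined
```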

Belief entropy

Definition 6

Deng entropy38 can be presented as follows:

$$\begin{aligned} {{E}_{D}}=-\sum \limits _{A\subseteq \Theta }{m(A){{\log }_{2}}\frac{m(A)}{{{2}^{|A|}}-1}} \end{aligned}$$
(15)

where m is the mass function, \(\Theta \) is the frame of discernment, and |A| is the cardinality of A. When the BPA degenerates into a probability distribution (i.e., every focal element is a singleton), Deng entropy degenerates into Shannon entropy.
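Deng entropy of Eq. (15) in the same representation (a sketch; terms with zero mass are skipped, as in Shannon entropy):

```python
import math

def deng_entropy(m):
    """Deng entropy, Eq. (15).  When every focal element is a
    singleton (|A| = 1), this reduces to Shannon entropy."""
    return -sum(v * math.log2(v / (2 ** len(A) - 1))
                for A, v in m.items() if v > 0)
```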

Mahalanobis distance (M-distance)

Definition 7

The M-distance between a point x and the population can be expressed as:

$$\begin{aligned} {{d}^{2}}(x,\mu )={{(x-\mu )}^{T}}{{\Sigma }^{-1}}(x-\mu ) \end{aligned}$$
(16)

where \(\mu \) is the mean of the population and \( \Sigma \) is the covariance matrix of the population. If \(\mu \) and \(\Sigma \) are unknown, they can be replaced by the sample mean and the sample covariance matrix.
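Eq. (16) as a short NumPy sketch (assuming an invertible covariance matrix):

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance d^2(x, mu) of Eq. (16); cov may
    be the sample covariance matrix when the population's is unknown."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(diff @ np.linalg.inv(cov) @ diff)
```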

Interval similarity

Definition 8

Mark \(A=[{{a}_{1}},{{a}_{2}}],B=[{{b}_{1}},{{b}_{2}}]\) as two intervals. The similarity between them can be defined as follows:

$$\begin{aligned} S(A,B)=\frac{1}{1+\alpha D(A,B)} \end{aligned}$$
(17)

where \(\alpha (\alpha >0)\) is the supporting coefficient and D(A, B) is the distance between interval A and interval B. The supporting coefficient \(\alpha \) regulates the dispersion degree of the similarity. When interval A degenerates into a point and interval B reduces to a countable set, D(A, B) and S(A, B) become the distance and similarity between the point and the set.
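Eq. (17) as a one-line sketch; only a distance value is needed, so the same function serves for intervals, points and sets:

```python
def similarity_from_distance(d, alpha=1.0):
    """Similarity of Eq. (17) from a distance d, with supporting
    coefficient alpha > 0 controlling how fast similarity decays."""
    return 1.0 / (1.0 + alpha * d)
```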

A new BPA generation and combination method

Explanation of the zero mass function

D–S evidence theory deals with the uncertainty that arises when assessing and forecasting the unknown. The data and results obtained from different sensors or experts are reflections of this uncertainty; they differ because the perspectives, modes of thinking and knowledge backgrounds of the sensors or experts differ. Fusion paradoxes often occur with highly contradictory evidence. It can be noticed that a paradox arises when one BPA for some focal element is 0 while the other BPAs for that focal element take higher values. Assuming that no sensor is faulty and all the experts are qualified, attention should be paid to the conflict between non-zero and zero BPAs. The existence of a non-zero BPA for a focal element means that some corresponding evidence has been found to support it, whereas a zero BPA for that focal element can be interpreted as no evidence collected or a lack of knowledge, so the zero BPAs need to be modified. Inspired by the similarity distance between pieces of evidence31 and the belief entropy38, a new combination method based on the weighted discounting of data sources is proposed, with both reliability and uncertainty taken into consideration.

We use the Mahalanobis distance to measure the similarity between the test sample and each focal element of the frame of discernment in order to determine the BPAs. Then the cosine similarity and the belief entropy are used to measure the reliability and the uncertainty of each BPA respectively, to make adjustments and generate a standard BPA. In general, we integrate the cosine similarity of evidence and the belief entropy to adjust the evidence sources.

Definition of cosine similarity

Before the introduction of the method, the definition of cosine similarity considering the relationships between focal elements needs to be explained.

Definition 9

Assume that \({{m}_{i}}\) and \({{m}_{j}}\) are two independent mass functions \({{2}^{\Theta }}\rightarrow [0,1]\), and that \({{X}_{i}},{{X}_{j}}\) are the two independent evidence vectors formed by their BPAs on the focal elements of the frame of discernment \(\Theta \). The cosine similarity between the two evidence vectors is:

$$\begin{aligned} {{s}_{ij}}=\frac{{{X}_{i}}QX_{j}^{T}}{\sqrt{{{X}_{i}} QX_{i}^{T}}\sqrt{{{X}_{j}}QX_{j}^{T}}}~ \end{aligned}$$
(18)

where Q is the adjustment matrix:

$$\begin{aligned} Q=({{k}_{ij}}),\quad {{k}_{ij}}=\frac{\left| {{A}_{i}}\cap {{A}_{j}} \right| }{\left| {{A}_{i}}\cup {{A}_{j}} \right| } \end{aligned}$$
(19)

where \({{A}_{i}}\) and \({{A}_{j}}\) are the focal elements.
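Eqs. (18) and (19) as a NumPy sketch (function names are ours); an evidence vector lists the BPA values over a fixed ordering of the focal elements, and Q holds the Jaccard coefficients between those focal elements:

```python
import numpy as np

def adjustment_matrix(focal_elements):
    """Q of Eq. (19): |Ai intersect Aj| / |Ai union Aj| for focal
    elements given as frozensets, in a fixed order."""
    n = len(focal_elements)
    Q = np.empty((n, n))
    for i, A in enumerate(focal_elements):
        for j, B in enumerate(focal_elements):
            Q[i, j] = len(A & B) / len(A | B)
    return Q

def cosine_similarity(xi, xj, Q):
    """Q-weighted cosine similarity s_ij of Eq. (18) between two
    evidence vectors xi and xj (1-D NumPy arrays)."""
    return (xi @ Q @ xj) / (np.sqrt(xi @ Q @ xi) * np.sqrt(xj @ Q @ xj))
```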

BPA generation and combination method

The steps of BPA generation and combination are as follows.

Step 1: BPA generation.

Step 1.1: Calculate the distance \({d(x_{i},\mu _{j})}\) between the sample to be identified \(x_{i}\) and each known population with the mean of \(\mu _{j}\) according to Eq. (16).

Step 1.2: Take the distance \({d(x_{i},\mu _{j})}\) as \(D(A_{i},B_{j})\) and calculate the similarity \(S(A_{i},B_{j})\) between the sample to be identified \(A_{i}\) (i.e., \(x_{i}\)) and each known population \(B_{j}\) (with mean \(\mu _{j}\)) according to Eq. (17).

Step 1.3: Normalize the similarities \(S(A_{i},B_{j})\) to obtain the BPAs.

Step 2: BPA combination.

Step 2.1: Calculate the similarity between the evidence vectors. With independent evidence sources \({{m}_{i}}(i=1,2,\ldots ,l)\), the corresponding evidence vectors are marked as \({{X}_{i}}(i=1,2,\ldots ,l)\), and the similarity \({s}_{ij}\) between \({{X}_{i}}\) and \({{X}_{j}} (i,j=1,2,\ldots ,l)\) can be calculated according to Eq. (18).

Step 2.2: Generate the similarity matrix \(M = ({{s}_{ij}})\).

Step 2.3: Calculate each similarity weight \({{\omega }_{i}}\) by the following formula:

$$\begin{aligned} {{\omega }_{i}}=\frac{\sum \nolimits _{j\ne i}{{{s}_{ij}}}}{\sum \nolimits _{i}{\sum \nolimits _{j\ne i}{{{s}_{ij}}}}} \end{aligned}$$
(20)

Step 2.4: Update the BPA. The method is shown in Algorithm 1.

Step 2.5: Calculate the standard BPA \({{m}_{std}}\) by the entropy weight method. The belief entropy \({{E}_{i}}\) of each mass function can be calculated by Eq. (15). Then the entropy weight for each BPA can be defined as:

$$\begin{aligned} {{\theta }_{i}}=\frac{1-{{e}_{i}}}{\sum \nolimits _{k=1}^{l} {(1-{{e}_{k}})}},\quad {{e}_{i}}=\frac{{{E}_{i}}}{2|\Theta |{{\log }_{2}}({{2}^{|\Theta |}}-1)} \end{aligned}$$
(21)

The standard BPA can be calculated as follows:

$$\begin{aligned} {{m}_{\text {std}}}({{A}_{j}})=\sum \limits _{k=1}^{l}{{{\theta }_{k}}{{m}_{k}}({{A}_{j}})} \end{aligned}$$
(22)

Step 2.6: Fuse the mass functions, including the standard BPA, with Dempster’s combination rule of Eq. (7) to obtain the final BPA (a code sketch of the whole combination stage follows Algorithm 1):

$$\begin{aligned} m={{m}_{1}}\oplus {{m}_{2}}\oplus \cdots \oplus {{m}_{l}}\oplus {{m}_{std}} \end{aligned}$$
(23)
Algorithm 1 The update of the zero BPAs.
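The combination stage (Steps 2.1–2.6) can then be sketched end to end, reusing dempster_combine, deng_entropy and cosine_similarity from “Preliminaries”. This is only an illustrative sketch: the zero-BPA update of Step 2.4 is specified by Algorithm 1 and is assumed here to have been applied to the input masses already.

```python
import math

def similarity_weights(vectors, Q):
    """Steps 2.1-2.3, Eq. (20): weight of each body of evidence from
    the pairwise Q-weighted cosine similarities of Eq. (18)."""
    n = len(vectors)
    rows = [sum(cosine_similarity(vectors[i], vectors[j], Q)
                for j in range(n) if j != i) for i in range(n)]
    return [r / sum(rows) for r in rows]

def entropy_weights(masses, frame_size):
    """Step 2.5, Eq. (21): entropy weights from normalized Deng entropy."""
    denom = 2 * frame_size * math.log2(2 ** frame_size - 1)
    e = [deng_entropy(m) / denom for m in masses]
    total = sum(1 - ei for ei in e)
    return [(1 - ei) / total for ei in e]

def standard_bpa(masses, weights):
    """Step 2.5, Eq. (22): entropy-weighted average of the modified BPAs."""
    std = {}
    for w, m in zip(weights, masses):
        for A, v in m.items():
            std[A] = std.get(A, 0.0) + w * v
    return std

def combine_all(masses, frame_size):
    """Step 2.6, Eq. (23): fuse all modified BPAs plus the standard
    BPA with Dempster's rule.  `masses` are assumed to have been
    updated by Algorithm 1 (Step 2.4) beforehand."""
    m_std = standard_bpa(masses, entropy_weights(masses, frame_size))
    result = masses[0]
    for m in masses[1:] + [m_std]:
        result = dempster_combine(result, m)
    return result
```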

Numerical examples

The steps of the proposed BPA generation and combination method are shown in Fig. 1. When the original BPAs are unknown, they are first generated by the proposed method; the modified BPAs and the standard BPA are then fused, and judgments and decisions can be made according to the final BPA. When the original BPAs are known, only the modification and fusion of the BPAs are performed.

Figure 1

Flow chart for the proposed method.

An illustrative example

Example 1

Assume that the readings reported by four information sources (e.g., sensors) for one sample are (7.6, 3.0, 6.6, 2.1). The sample belongs to one of three categories, so the frame of discernment can be marked as \(\{A, B, C\}\). The means and variances of each category can be calculated for each information source. Table 1 shows the means and variances of category A.

According to the proposed method, the BPAs of the sample to be identified can be generated and fused based on the following steps to determine the category of the sample.

Table 1 The means and variances of category A.

Step 1.1: Calculate the distance between the sample and each focal element in the frame of discernment; the results are in Table 2:

Table 2 The distance between the sample and each focal element.

Step 1.2: Calculate the similarity between the sample and each focal element in the frame of discernment; the results are in Table 3:

Table 3 The similarity between the sample and each focal element.

Step 1.3: Calculate the BPA of the sample on each focal element in the frame of discernment; the results are in Table 4:

Table 4 The generated BPAs.

Since none of the generated BPAs is zero, the original BPAs do not need to be adjusted; jump directly to Step 2.5.

Step 2.5: Calculate the standard BPA according to the entropy weight method:

$$BPA_{std}=(0.1220, 0.1373, 0.1578, 0.1028, 0.1743, 0.1928, 0.1150)$$

Step 2.6: Fuse the BPAs including the standard BPA to obtain the final BPA:

$$BPA=(0.07407, 0.1163, 0.7849, 0.0050, 0.0112, 0.0111, 0.0004)$$

Therefore, it can be judged that the sample to be identified is most likely to belong to category C.

BPA combination on data with high conflict

The proposed method is useful in solving the classical paradoxes. Some numerical examples are as follows, among which the detailed calculation process for Example 2 is shown.

Example 2

The frame of discernment is \(\{a, b, c\}\), and the mass functions are shown in Table 551.

When Dempster’s combination rule is used, the outcomes are as follows:

$$\begin{aligned} &m(a)=0.5, \quad m(ab)=0.3, \quad m(ac)=0.2 \\ &m(b)=0, \quad m(c)=0, \quad m(abc)=0 \end{aligned}$$

The fused result merely reproduces the first body of evidence, so the BPA combination loses its meaning. The calculation process according to the method in this paper is as follows:

Table 5 Data of Example 2.

Step 2.1: According to Eq. (19), the adjustment matrix can be calculated.

$$Q=\left[ \begin{matrix} 1 & 1/2 & 1/2 & 0 & 0 & 1/3 \\ 1/2 & 1 & 1/3 & 1/2 & 0 & 2/3 \\ 1/2 & 1/3 & 1 & 0 & 1/2 & 2/3 \\ 0 & 1/2 & 0 & 1 & 0 & 1/3 \\ 0 & 0 & 1/2 & 0 & 1 & 1/3 \\ 1/3 & 2/3 & 2/3 & 1/3 & 1/3 & 1 \\ \end{matrix} \right] $$

Step 2.2: The evidence similarity matrix can be calculated according to Eq. (18).

$$M=\left[ \begin{matrix} 1.0000 & 0.2880 & 0.3590 \\ 0.2880 & 1.0000 & 0.9873 \\ 0.3590 & 0.9873 & 1.0000 \\ \end{matrix} \right] $$

Step 2.3: According to Eq. (20), the weight of each evidence can be calculated, and the results are as follows:

$$\begin{aligned} {{\omega }_{1}}=0.1979, \quad {{\omega }_{2}}=0.3902, \quad {{\omega }_{3}}=0.4119 \end{aligned}$$

Step 2.4: After the updates of zero BPAs and the normalization processing, the new mass functions are shown in Table 6.

Table 6 Results of Example 2.

Step 2.5: The belief entropy weights of the new mass functions can be obtained according to Eqs. (15) and (21):

$$\begin{aligned} {{\theta }_{1}}=0.3219, \quad {{\theta }_{2}}=0.3449, \quad {{\theta }_{3}}=0.3332 \end{aligned}$$

And the standard mass function is as follows:

$$\begin{aligned} &m_{std}(a)=0.1416, \quad m_{std}(ab)=0.0850, \quad m_{std}(ac)=0.0567 \\ &m_{std}(b)=0.3943, \quad m_{std}(c)=0.2150, \quad m_{std}(abc)=0.1074 \end{aligned}$$

Step 2.6: Dempster’s combination rule is used to fuse \({{m}_{1}},{{m}_{2}},{{m}_{3}},{{m}_{std}}\), and the final mass function is:

$$\begin{aligned} &m(a)=0.1209, \quad m(ab)=0.0081, \quad m(ac)=0.0039 \\ &m(b)=0.7400, \quad m(c)=0.1264, \quad m(abc)=0.0007 \end{aligned}$$

This result is consistent with the original data and easy to interpret.

Example 3

The frame of discernment is \(\{A, T, C\}\), and the mass functions are as follows:

$$\begin{aligned} &m_{1}(A)=0.99, \quad m_{1}(T)=0.01, \quad m_{1}(C)=0 \\ &m_{2}(A)=0, \quad m_{2}(T)=0.01, \quad m_{2}(C)=0.99 \end{aligned}$$

When Dempster’s combination rule is used, the outcomes are as follows:

$$\begin{aligned} m(A)=0, \quad m(T)=1, \quad m(C)=0 \end{aligned}$$

Obviously, the consequence goes against intuition. After using the method proposed in this paper, the final outcomes are as follows:

$$\begin{aligned} m(A)=0.5000, \quad m(T)=0.0000, \quad m(C)=0.5000 \end{aligned}$$

It can be seen that the final results correct the fusion paradox.

Example 4

The frame of discernment is \(\{A, B, C\}\), and the mass functions are in Table 7.

Table 7 Data of Example 4.

The results with Dempster’s combination rule are:

$$\begin{aligned} m(A)=0, \quad m(B)=0, \quad m(C)=1 \end{aligned}$$

By using the method in this paper, the final results are consistent with intuition:

$$\begin{aligned} m(A)=0.9997, \quad m(B)=0.0000, \quad m(C)=0.0003 \end{aligned}$$

Example 5

The frame of discernment is \(\{A, B, C\}\), and the mass functions are as follows:

$$\begin{aligned} &m_{1}(A)=0.5, \quad m_{1}(B)=0.2, \quad m_{1}(C)=0.3 \\ &m_{2}(A)=0, \quad m_{2}(B)=0.9, \quad m_{2}(C)=0.1 \\ &m_{3}(A)=0.55, \quad m_{3}(B)=0.1, \quad m_{3}(C)=0.35 \\ &m_{4}(A)=0.55, \quad m_{4}(B)=0.1, \quad m_{4}(C)=0.35 \\ &m_{5}(A)=0.55, \quad m_{5}(B)=0.1, \quad m_{5}(C)=0.35 \end{aligned}$$

The results obtained by the proposed method are compared with Dempster’s, Murphy’s and Deng’s methods, as shown in Table 8.

Table 8 Comparison of different methods of Example 5.

It can be noticed that the new approach supports the target (focal element A) most strongly, with a BPA of 0.9808.

BPA combination on data without high conflict

The combination method proposed not only performs well on highly conflicting data, but also shows superiority on data without high conflict. Some numerical examples are as follows.

Example 6

The frame of discernment is \(\{A, B\}\), and the mass functions are as follows:

$$\begin{aligned} &m_{1}(A)=0.2, \quad m_{1}(B)=0.3, \quad m_{1}(AB)=0.5 \\ &m_{2}(A)=0.5, \quad m_{2}(B)=0.3, \quad m_{2}(AB)=0.2 \end{aligned}$$

The results with Dempster’s combination rule are:

$$\begin{aligned} m(A)=0.4937, m(B)=0.3797, m(AB)=0.1266 \end{aligned}$$

The belief entropy of this group of BPAs can be calculated as 1.2394. By using the method proposed in this paper, the final results are as follows:

$$\begin{aligned} m(A)=0.5425, \quad m(B)=0.3959, \quad m(AB)=0.0615 \end{aligned}$$

The belief entropy of the modified group of BPAs is 1.0417. With the proposed method, the belief entropy decreases, which means a reduction in the uncertainty of the judgment. In addition, the results of the proposed method provide more explicit support for A, while the support for the other two focal elements declines, which facilitates judgment and decision making.

Example 7

The frame of discernment is \(\{A, B, C\}\), and the mass functions are shown in Table 9.

Table 9 The data of Example 7.

It can be calculated that the belief entropies of the original BPAs are 2.6520 and 3.0517, respectively. The results with Dempster’s combination rule are as follows:

$$\begin{aligned} &m(A)=0.3924, \quad m(B)=0.2076, \quad m(C)=0.1671 \\ &m(AB)=0.0810, \quad m(AC)=0.1063, \quad m(ABC)=0.0456 \end{aligned}$$

The belief entropy of this group of BPAs is 2.2984, which is close to that of the original BPAs. The uncertainty remains large, which is not conducive to further judgment. With the method in this paper, the final results are as follows:

$$\begin{aligned} &m(A)=0.3046, \quad m(B)=0.3692, \quad m(C)=0.2269 \\ &m(AB)=0.0390, \quad m(AC)=0.0390, \quad m(ABC)=0.0153 \end{aligned}$$

The belief entropy of the modified group of BPAs is 1.7905, and the BPAs on the focal elements AB, AC and ABC decrease. This result significantly reduces the belief entropy and enhances certainty. In addition, the results with Dempster’s combination rule support focal element A more, while the results of the proposed method support focal element B more. Compared with the original data, in which \({{m}_{1}}(A)=0.3, {{m}_{1}}(B)=0.3\) and \({{m}_{2}}(A)=0, {{m}_{2}}(B)=0.1 \), the result of the proposed method is more credible.

Convergence verification

Example 8

The frame of discernment is \(\{A, B, C\}\), and the mass functions are shown in Table 10.

The BPAs obtained by the proposed method are compared with those obtained by Dempster’s, Yager’s and Murphy’s methods. The results are shown in Table 11.

Table 10 The data of Example 8.
Table 11 The comparison of fusion BPA of Example 8.

According to Table 11, there is high conflict between \(m_{3}\) and the other mass functions. With Dempster’s combination rule, the outcome of any fusion that includes \(m_{3}\) assigns a high value to focal element B, which is inconsistent with the original data. When the method proposed in this paper is used to fuse the BPAs, due to the high value of the third BPA on focal element B, the support for A after fusing \(m_1\), \(m_2\) and \(m_3\) is lower than that after fusing \(m_1\) and \(m_2\), while the support for B behaves in the opposite way. With the addition of BPAs supporting focal element A, the fused BPA on A gradually increases. Besides, the BPAs of AB, AC and ABC decrease as the number of fusions increases, which means a reduction in uncertainty. It can be seen that the fusion results of the proposed method are consistent with the original BPAs, and compared with Yager’s and Murphy’s methods, the fused BPA converges quickly to a focal element as the number of BPAs supporting that focal element increases.

Example 9

The frame of discernment is \(\{A, B, C\}\), and the mass functions are as follows:

$${{m}_{1}}(A)=0.7,{{m}_{1}}(AB)=0.3, {{m}_{i}}(C)=0.6,{{m}_{i}}(ABC)=0.4,i=2,3,4,5. $$

The fusion results of Dempster’s combination rule and the proposed method are shown in Table 12.

The results obtained by Dempster’s combination rule only reflect the value of \(m_{1}\), while the fusion results of the proposed method gradually converge to focal element C. Figure 2 reflects the changes of the fused BPAs on the four focal elements combined by the two methods as the number of fusions increases.

Table 12 The comparison of fusion BPA of Example 9.
Figure 2

The changes of the fusion BPAs.

It can be noticed that with the addition of \(m_{i},i=2,3,4,5\), the support of the fused BPAs calculated by the three methods for focal elements A and AB gradually decreases, while the support for C gradually approaches 1. The fused BPA calculated by the proposed method converges to the focal element more quickly than those of the other two methods. In addition, the fused BPA of ABC increases at first but then decreases and tends to 0. This is because the high values of \(m_{i}, i = 2,3,4,5\) on ABC provide more support for ABC; however, an increase of the BPA on ABC would increase the uncertainty of the results. The belief entropy of the fused BPA gradually decreases when fusing with the proposed method, so even though the fused BPA of ABC increases for a short time, it tends to 0 as the number of fusions increases. Moreover, the variation of the BPAs calculated by the other two methods is much larger than that of the proposed method. Consequently, compared with Yager’s and Dubois and Prade’s methods, the BPA calculated by the new method is more stable.

Application

All the following experiments and calculations were done in MATLAB together with the C programming language. The hardware used for computation has the following parameters: an Intel Core i7-8700 CPU at 3.20 GHz and 16 GB of RAM.

The classification of Iris

Classify Iris by the proposed method

The method was applied to the identification of iris. The dataset was divided into training and test sets at a ratio of 8:2, in which 120 samples were taken as the known dataset and the remaining 30 samples were to be identified. The three categories of iris were used to generate the frame of discernment, which can be expressed as \(\{Setosa (Se), Versicolor (Ve), Virginica (Vi)\}\). The four attributes of iris, SL, SW, PL and PW (sepal length, sepal width, petal length and petal width), can be regarded as four sensors. The experiment process is shown in Fig. 3.

Figure 3

Prediction process on iris sets.

Figure 4 shows the distribution of the training data on the attributes SL and SW, with the data range and mean of each category Se, Ve and Vi labeled in the graph. The mean and variance of each focal element can be estimated from the sample distribution. It can be noticed that the data of Se show obvious differences from those of Ve and Vi on the attributes SL and SW, while the distributions of Ve and Vi overlap to a certain degree.

Figure 4

The distribution of iris data.

The BPAs of the samples to be identified on each attribute, generated in accordance with the proposed method, are shown in Fig. 5. For clarity, only the BPAs of the first two samples are displayed; the blue and yellow lines represent the BPAs on each focal element of the first two samples, respectively. It can be seen that there is no significant conflict between the BPAs of the sample data. For the categories Ve and Vi, the BPAs are close, so the proposed fusion method was used for further calculation and judgment.

Figure 5

The generated BPAs of the first two samples.

BPAs were modified according to the above steps, and the standard BPAs of each sample to be identified were obtained. The BPAs on the four attributes and the standard BPA were fused, and the final BPA was used to classify the iris. The predicted results on the test set were calculated, and the prediction accuracy of all the samples to be identified is 96.7%. The prediction outcomes are shown in Fig. 6.

Figure 6

The predicted results on iris data with the proposed method.

In Fig. 6, the yellow points are the prediction results and the blue points refer to the real categories. Only one sample was misclassified. Checking the BPA of the misclassified sample, it can be found that its BPA for Ve, 0.3770, is slightly less than its BPA for Vi, 0.4143.

Comparison with the Dempster’s combination rule

The accuracy obtained by Dempster’s combination rule is also 96.7%, and the misclassified sample is the same as with the proposed method. Comparing the BPAs obtained by Dempster’s combination rule and the proposed method, Table 13 shows the BPAs of the first six samples.

Table 13 Comparison of BPAs with the two methods.

Take the first sample as an example. The target category is Vi, and the BPAs of Vi obtained by Dempster’s combination rule and the proposed method are 0.5271 and 0.5597, respectively. Comparing the rest of the samples in the same way, it can be found that the BPAs obtained with the proposed method support the correct result more strongly.

Comparison with the K-nearest neighbors algorithm

The K-nearest neighbors (KNN) algorithm was also used to classify iris, with the number of neighbors set to K = 5. After training the model, the test results are shown in Fig. 7, where the horizontal axis represents sepal length and the vertical axis represents sepal width. We encode Iris-setosa as 0, Iris-versicolor as 1 and Iris-virginica as 2. The classification results predicted by the model are:

{0 0 1 0 0 0 0 0 0 0 0 1 2 2 1 2 1 1 0 1 1 2 2 0 2 1 1 2 0 0 }.

The real classification results are:

{0 0 1 0 0 0 0 0 0 0 0 1 2 2 1 2 1 1 0 1 1 2 2 0 2 1 1 1 0 0}.

Thus, the classification accuracy is 96.7%. The proposed method achieves the same accuracy as the KNN algorithm, which corroborates the accuracy of the proposed method.

Figure 7

The predicted results on iris set by KNN algorithm.

Comparison with BP neural network classification method

The BP neural network with a single hidden layer was also used to classify iris. The parameters were selected as follows: three nodes in the hidden layer and a learning step of 0.1. After 300 iterations, the convergence of the loss function became very slow and the classification results tended to be stable. In this case, the classification accuracy is 96.7%. The prediction results are shown in Fig. 8, where the yellow points are the prediction results and the blue points refer to the real categories.

Figure 8

The predicted results on iris set by BP neural network.

The method proposed in this paper achieves the same accuracy as the BP neural network in the classification of iris, which also demonstrates that the BPA generation and combination method in this paper is effective.

The classification of wine

The method was applied to the identification of wine. The dataset was divided into training and test sets at a ratio of 8:2, in which 1279 samples were taken as the known dataset and the remaining 320 samples were to be identified. The wines are classified as good, normal or bad, so the frame of discernment can be expressed as \(\{Bad (B), Normal (N), Good (G)\}\). The 11 attributes of wine can be regarded as 11 sensors.

Figure 9 shows the distribution of the wine data on fixed acidity and sulphates, with the data range and mean of each category B, N and G labeled in the graph. The mean and variance of each focal element can be estimated from the sample distribution. It can be noticed that the distributions of the three categories are similar, which adds to the difficulty of classification.

Figure 9

The distribution of wine data.

Comparison of the classification with the two methods, BP neural network and decision tree

The method proposed in this paper and Dempster’s combination rule were used for BPA generation and fusion, and the classification accuracy of both is 58.13%.

The classification result of the sixth sample is wrong; take it as an example to analyze the cause of the error. Sample 6 should be “Normal”, but the results obtained by both methods are “Good”. The BPAs for “Good” are 0.3509 and 0.3508, respectively, and the BPAs for “Normal” are 0.3447 and 0.3447, respectively. There is no significant difference in confidence between the two categories, which makes the judgment difficult. Table 14 shows the generated BPAs and the fused BPAs of the sixth sample with the two methods. It can be observed that 7 of the original BPAs support “Good” more than “Normal”. Thus, the error of judgment in this case comes from the inconspicuous features of the sample.

Table 14 The BPA of each sample.

The BP neural network with a single hidden layer was also used to classify wine. The parameters were selected as follows: six nodes in the hidden layer and a learning step of 0.1. After 500 iterations, the classification results tended to be stable; in this case, the classification accuracy is 60.0%. When the CART method of the decision tree family is used for classification, the final accuracy is 57.19%. The KNN algorithm was also applied to the classification of wine: the test and training sets were first divided, and the data were then normalized to speed up the convergence of training and increase the accuracy. The final accuracy is 55.94%, which is lower than the accuracy obtained by the proposed method.

For the classification of wine, the accuracy obtained by the proposed method is higher than the results obtained by the KNN algorithm and the decision tree, and it is close to that of the BP neural network with a single hidden layer. The proposed method reaches a relatively high accuracy on this problem, which also demonstrates its effectiveness.

Conclusion

In this paper, a novel BPA generation and combination method was proposed. The evidence sources are modified by reinterpreting the BPAs with a value of 0, and a weighted standard BPA is obtained with consideration of both the reliability and the uncertainty of the evidence. The method was well verified in correcting various highly conflicting data fusion paradoxes, and its fusion results show a higher degree of support for the correct target than the existing classical methods for conflicting data fusion. Besides, compared with Dempster’s combination rule, the proposed method also reduces the belief entropy of the final BPA, which means a reduction in the uncertainty of the results and is beneficial for decision making.

Additionally, experiments on public datasets demonstrate the rationality of the method. The iris and wine datasets were used in the experiments. With the BPA generation and combination method proposed in this paper, the classification accuracy on iris is 96.7%, as high as the result from the BP neural network with a single hidden layer. For the wine quality classification experiment, the accuracy of the proposed method is 58.13%, slightly lower than the result from the BP neural network with a single hidden layer, but slightly higher than those from the CART decision tree and the KNN algorithm.

Some possible directions for future work are as follows. Since the modification value of a zero BPA is obtained by taking a weighted average of all the evidence, the assignment is only an estimate at the average level. However, the real BPA that the sensors would provide under the same conditions is difficult to estimate. How to update BPAs in a more reasonable way remains a question to be discussed.