Learning and predicting the unknown class using evidential deep learning

In practical deep-learning applications, such as medical image analysis, autonomous driving, and traffic simulation, the uncertainty of a classification model’s output is critical. Evidential deep learning (EDL) can output this uncertainty for the prediction; however, its accuracy depends on a user-defined threshold, and it cannot handle training data with unknown classes that are unexpectedly contaminated or deliberately mixed for better classification of unknown class. To address these limitations, I propose a classification method called modified-EDL that extends classical EDL such that it outputs a prediction, i.e. an input belongs to a collective unknown class along with a probability. Although other methods handle unknown classes by creating new unknown classes and attempting to learn each class efficiently, the proposed m-EDL outputs, in a natural way, the “uncertainty of the prediction” of classical EDL and uses the output as the probability of an unknown class. Although classical EDL can also classify both known and unknown classes, experiments on three datasets from different domains demonstrated that m-EDL outperformed EDL on known classes when there were instances of unknown classes. Moreover, extensive experiments under different conditions established that m-EDL can predict unknown classes even when the unknown classes in the training and test data have different properties. If unknown class data are to be mixed intentionally during training to increase the discrimination accuracy of unknown classes, it is necessary to mix such data that the characteristics of the mixed data are as close as possible to those of known class data. This ability extends the range of practical applications that can benefit from deep learning-based classification and prediction models.

www.nature.com/scientificreports/ the uncertain class; that is, data whose class is unknown by the network (henceforth, known as class u), and the probability p u of class u is not output.In other words, the output is "the predicted class is k with uncertainty b u , " and it is ultimately at the discretion of the model user to determine what value of b u means the result is trustworthy.Second, EDL assumes that the input always belongs to one of the K classes.That is, the output consists of predictions for each class k predicting whether the input belongs to that class, and the uncertainty of each prediction.This is true even for unexpected input data that does not belong to any known class.Examples of such data are outliers that cannot be correctly labeled at the time of training data labeling but are registered as "unknown" for the time being (called "contaminated data" here).
To address these problems, I propose a modified EDL (m-EDL) model that provides an output that predicts whether the input belongs to class u and not class k along with the probability for all K + 1 classes.Consequently, there is no need to determine a threshold at which the user judges the result to be uncertain.Moreover, when the output predicts that the instance belongs to a certain class k, the uncertainty of the prediction is nevertheless available.Finally, in contrast to the training data for EDL, the training data for m-EDL can include instances from class u.Several out-of-distribution (OOD) and open-set learning methods add a class to handle uncertainty 11,36 .In open-set recognition, Neal et al. 37 augmented a dataset with a class of "counterfactual" images.Others explicitly train the classifier with a class of OOD samples near the in-distribution boundary 38 .By contrast, this study does not create an entirely new unknown class and attempt to learn it.Instead, the proposed m-EDL outputs in a natural way the "uncertainty of prediction" that EDLs naturally generate, and uses it as the probability of an unknown class.Only with this simple extension can data, including unknown classes, which EDL cannot handle, be learned.Moreover, the arbitrariness of the threshold, which is a weak point of EDL, can be resolved.In fact, the results of this study show the potential for improving the performance in discriminating unknown classes in test data without having to learn the counterfactual or OOD samples that existing approaches require.
The remainder of this paper is organized as follows."Overview of the proposed model" explains the structure of the proposed m-EDL prediction model and compares it with that of EDL 29 .Additionally, a method for calculating the parameters in m-EDL is introduced and the likelihood calculation method used to train the model is explained.In "Advantages of m-EDL", the advantages of m-EDL modifications are explained."Results" presents the experimental results, "Discussion" discusses the results, and "Methods" presents the methods used in the experiments.

Overview of the proposed model
In this section, I first review the structure of EDL and then present m-EDL. 29using the two-class example shown in Fig. 1a.In this figure, the number of classes K is two (classes A and B); that is k ∈ {A, B}.

EDL. I describe EDL
First, the input is fed to a neural network, and evidence e A and e B for classes A and B respectively are obtained from its output, which is greater than or equal to zero.To train the neural network, Sensoy et al. employed the likelihood function using the sum-of-squares loss to stabilize neural network training 29 .The likelihood is calculated as follows: Here, p = (p 1 , p 2 , . . ., p K ) represents the probabilities for class k, y is 0 or 1 for each class, and B(α) is the beta function for the parameter α k , k ∈ {1, . . ., K} .Sensoy et al. also employed a Kullback-Leibler divergence (or relative entropy and I-divergence) term to regularize the predictive distribution by penalizing the divergences from class u 29 .
The belief mass b k is obtained from the output of the neural network (evidence e k for each class k).In this example, the belief masses b A and b B are obtained using S, where S = k=A,B (e k + 1) .The belief mass for each class k is calculated as follows: Furthermore, the belief mass b u for class u is calculated such that ) is given by the following equation: Similar to the belief masses b k , the Dirichlet distribution parameters α k are obtained from the evidence e k for each class k from the neural network using α k = e k + 1 .These α k parameters are directly used for the Dirichlet distribution.By contrast, the b k are used to check the uncertainty (b u ) and are not used for the distribution.However, Eq. ( 2) reveals that the Dirichlet distribution parameters and belief masses are related, as follows: (1) ( . (3) For the example in Fig. 1a , K − 1 = 1 dimension, as shown in Fig. 1b.The probability distributions p A and p B for each class (A, B) are obtained using the Dirichlet distribution parameters α A and α B , and the condition 1 = k=A,B p k is satisfied.For this example, the result obtained from the Dirichlet distribution is that the expected value that the input belongs to class A is p A = 20% , the expected value that the input belongs to class B is p B = 80% , and the uncertainty of this overall result (b u ) is 30%.The sum of the expected values (i.e., 20% + 80%) satisfies the condition 1 = k=A,B p k .Note that the value of b u is not included in this sum.

m-EDL.
In the proposed m-EDL, an additional class u is added to the original EDL to represent instances that do not belong to a known class.In this section, the extensions needed to EDL to obtain m-EDL are presented.
To obtain evidence from the neural network for all classes, including class u, the likelihood calculation must be extended.Equation (1) . ., p iK , p iu are used to extend Eq. ( 1) to the following: Furthermore, using the relationship of 5) is transformed as follows: (5)   The proposed m-EDL uses a Dirichlet distribution in K-dimensions.To output the Dirichlet distribution as p + = p 1 , p 2 , • • • , p K , p u , the following extension is required after introducing α u : To calculate α u , I first use S = K k=1 (e k + 1) = K k=1 α k and focus on the relationship of b u + K k=1 b k = 1 .These relationships should be satisfied using subjective logic 39 , where the Dempster-Shafer theory is used in the framework of the Dirichlet distribution.
From this point, the extension to class u begins.When b u + K k=1 b k = 1 is transformed using Eq. ( 2), it is expressed as follows: If Eq. ( 2) is further extended to class u and written as b S in Eq. ( 2), then the belief mass of class u can be written as b u = e u S .Hence, the evidence for class u can be written as follows: Equations ( 8)- (10) are obtained by the extension to class u, but they are derived from the relationships between S = K k=1 (e k + 1) = K k=1 α k and b u + K k=1 b k = 1 .Therefore, they are in line with the belief mass of the Dempster-Shafer theory and subjective logic 39 .
For the same two-class example used in "EDL", the structure of the proposed m-EDL is shown in Fig. 2a.In this example, k ∈ {A, B} ; hence, k + ∈ {A, B, u} is defined.As in EDL, the input is fed to the neural network, and evidence e A and e B for classes A and B are obtained from the output of the neural network.Next, belief masses b A and b B are obtained using S such that S = k=A,B (e k + 1) .The belief mass b u for class u is calculated using 1 = k=A,B b k + b u .This b u is used to obtain evidence e u for class u, as described in detail above.The probability distributions p A , p B , and p u for each class (A, B, and u) are obtained using the Dirichlet distribution parameters α A , α B , and α u .These distribution parameters are themselves obtained from the belief masses b A and b B as well as b u , and the condition 1 = k + =A,B,u p k + is satisfied.
The output from m-EDL is a Dirichlet distribution in K dimensions (two dimensions), as shown in Fig. 2b, where the increase in probability density is indicated by hue from blue to red.Furthermore, the results from the Dirichlet distribution are the expected value that the input belongs to class A is p A = 50% , the expected value that the input belongs to class B is p B = 30% , and the expected value that the input belongs to class u (that is, the input cannot be said to belong to either class A or B) is p u = 20% .The sum of these expected probabilities also satisfies As explained in the Supplementary Information and illustrated in Supplementary Fig. S1 (both available online), the expected value p k satisfying 1 = k=A,B p k obtained in EDL can also be obtained from p k + .In the above example, p A = 62.5% and p B = 37.5%.

Advantages of m-EDL
There are two main advantages of m-EDL.First, it is unnecessary to determine the threshold at which the model user will judge the result to be uncertain.As described in "EDL", the EDL model of Sensoy et al. 29 outputs the expected value that the input data class is class A, the expected value that the input data class is class B, and uncertainty ( p A = 20% , p B = 80% , and uncertainty = 30%, respectively), whereas m-EDL outputs the expected value that the input data class is class A, the expected value that the input data class is class B, and the expected value that the input data class is class u; that is, that the input cannot be said to be either class A or B ( p A = 50% , p B = 30% , and p u = 20% , respectively).
The EDL model's output is in the form of input-data prediction classes and the corresponding uncertainty for each class.Hence, an uncertainty threshold must be set 29 to determine whether the results should be used.The accuracy of the model changes according to this threshold 29 .In contrast, m-EDL has an output that includes the expected value for all K classes and class u.These probabilities sum to 1. Therefore, the user can simply choose the class with the highest probability from the K + 1 classes as the predicted class.It is unnecessary to define an uncertainty threshold in the first place.In addition, even when m-EDL predicts a certain k class from K classes, the uncertainty b u is nevertheless available from m-EDL.
Furthermore, training data can include data from class u.I explain why this is the case below.
Here, the likelihood function used for the simple likelihood estimation in Sensoy et al. 's EDL 29 for parameter fitting of the neural network part of EDL (as shown in Fig. 1a) is expressed as follows: where y i is the one-hot vector encoding the ground-truth class of observation x i with y ij = 1 and y ik = 0 for all k = j, and where the jth class is the correct label for observation i.Meanwhile, α ij indicates the K parameters of the Dirichlet distribution for observation i and S i = K j=1 α ik .In Sensoy et al. 's method 29 , it is assumed that the input data belongs to one of the K classes; therefore, the range that index j can take is 1 through K.
By contrast, the m-EDL shown in Fig. 2a introduces the parameter α u of the Dirichlet distribution.That is, it is in the form of j + ∈ {1, • • • , K, u} , which is an extension of j ∈ {1, • • • , K} .Applying this extension to the likelihood function of Eq. ( 11) results in the following, with j + ∈ {1, • • • , K, u}: where y ij + is a one-hot vector that contains class u, indicating that the data labeled as belonging to class u can be included in the training data of m-EDL.
The implications of this extension are as follows.First, it becomes possible to learn a dataset that, for example, consists of handwritten digits 0-9 such as MNIST (ground truth labels 0-9) mixed with a completely different dataset type (correct label u or 10).Additionally, this learning may help determine the accuracy of predictions about whether, for example, the input is a digit from 0 to 9 or is not a digit when non-numeric data are mixed into the test dataset.To answer these questions, several datasets and models were prepared.Conditions that depended on whether data from class u were included in the training and/or test data, as well as which model was used to learn the data, were used in the evaluation.

Performance comparison of EDL and m-EDL on class k data (Q1).
Here, I evaluate whether the performance of m-EDL is comparable to that of EDL in the situation assumed by EDL; that is, the situation where all training and test data belong to class k.In other words, both the training and test data were composed only of images from MNIST, and the following two conditions were compared: (1) the EDL model trained and tested on datasets with no class u data and (2) the m-EDL model trained and tested on datasets with no class u data.
Figure 3 compares the accuracies of EDL (thin solid red line) and m-EDL (thick solid blue line).Each line shows the mean value and the shaded areas indicate the standard deviation.The accuracy of EDL changes with respect to each uncertainty threshold; the accuracy is plotted on the vertical axis with the uncertainty threshold indicated by the horizontal axis.The accuracy of EDL improves as the threshold decreases because only a (12) www.nature.com/scientificreports/classification result the model is confident of is treated as a classification result.Figure 3a shows the results when p k + is used for the classification results of m-EDL.An uncertainty threshold is not used for the classification result of m-EDL; a result parallel to the horizontal axis is obtained.In contrast, Fig. 3b shows the results when p k + is converted to p k and the uncertainty threshold used for EDL is also used for m-EDL.These graphs show that the accuracy of m-EDL is lower than that of EDL, except in the region where the uncertainty threshold is 0.9 or more.However, no substantial decrease in accuracy is observed, and it can be said that the performance of m-EDL would be sufficient depending on the application.First, I consider whether an m-EDL model that has learned class u has the same prediction accuracy for class k when compared with an EDL model that cannot learn class u (Q2a).I then consider whether it can determine class u with higher prediction accuracy (Q2b).
The following two cases are considered: (1) EDL is tested on data that include Fashion MNIST data, and m-EDL is trained on data that include EMNIST data, but tested on data that include Fashion MNIST data.Figure 4a-c shows the results for class u rates of 25%, 50%, and 75% in training data, respectively.The lines of different colors indicate the results for class u rates of 25%, 50%, and 75% in the test data (1-2).These are percentages of the number of MNIST data.Additionally, Table 1 presents the mean accuracies of EDL and mEDL for each condition.(2) EDL is tested on data that include EMNIST data, and m-EDL is trained on data that include Fashion MNIST data, but tested on data that include EMNIST data.Figure 4d-f shows the results for class u rates of 25%, 50%, and 75% in the training data, respectively.The lines of different colors indicate the results for class u rates of 25%, 50%, and 75% in test data.These are percentages of the number of MNIST data.Additionally, Table 2 presents the mean accuracies of EDL and mEDL for each condition.
Under these two conditions, the one-hot vector y j of the data has K = 10 dimensions.Therefore, all elements of the one-hot vectors of class u (EMNIST or Fashion MNIST data) in the test data were set to 0. In each of the following cases, the same processing was applied when EDL was tested on data including class u data.
The left plots of Fig. 4a-c and Table 1 (avg.accuracy for k) show the results for class k data for the first condition.The line color indicates the ratio of the class u data included in the test data, and it is assumed that the accuracy decreases as the mix ratio of class u in the test data increases.The results show that the accuracy of m-EDL with respect to class k is high and robust for the mix rate of class u in the training and test data: it can be seen from the left plots in Fig. 4a-c that when the m-EDL model that has learned class u is compared with the EDL model, which cannot learn class u, it has equal or higher accuracy with respect to class k.Moreover, the accuracy of m-EDL is not easily affected by the ratio of class u in the test data as well as the training data.Table 1.Accuracy comparison of EDL and m-EDL.These values are mean accuracy through the uncertainty threshold.This table corresponds to Fig. 4a-c.
Training: MNIST + EMNIST Test: MNIST + FashionMNIST (Fig. 4a-c The right plots of Fig. 4a-c and Table 1 (avg.accuracy for u) show the accuracy for class u data, that is, the accuracy that the "data that was judged as 'I do not know' is actually different from the data classes learned so far." The right plots of Fig. 4a-c show that the accuracy of m-EDL with respect to class u is high and robust for the mix rate of class u in the training and test data.It is natural to increase the accuracy for class u of EDL when the ratio of class u increases because the accuracy increases when the ratio of class u increases even if class u is randomly classified via EDL.
Figure 4d-f and Table 2 (avg.accuracy for k) show the results for the second condition, which is exactly the same as the first condition except that the EMNIST and Fashion MNIST datasets switch roles.Again, the accuracy of m-EDL with respect to class k is high and robust, as in the left plots of Fig. 4a-c.The results in the left plots of Fig. 4d-f reveal that the m-EDL model that learned class u, when compared with EDL, achieved an equal or higher accuracy with respect to class k, and the accuracy of m-EDL was not easily affected by the ratio of class u in the test and training data.
However, the right plots of Fig. 4d-f and Table 2 (avg.accuracy for u) show that the accuracy of m-EDL with respect to class u cannot be said to be better than that of EDL.

Class Class k k
Class Class u u comparison of EDL and m-EDL when class u is included in the training and test data (Q2)", if the ratio of class u in the training data affects the prediction accuracy of the class k and u data, then the ratio of class u included in the training data must be appropriately selected.To answer whether this is the case, I used the results from "Performance comparison of EDL and m-EDL when class u is included in the training and test data (Q2)" (Fig. 4a-c  and d-f, which have training data mix ratios of 25%, 50%, and 75%, respectively), and added the following two cases:1) Fashion MNIST is included in the test data, but neither EDL nor m-EDL are trained on class u data (a training data mix ratio of 0%; Fig. 5a) and 2) EMNIST is included in the test data, but neither EDL nor m-EDL are trained on class u data (a training data mix ratio of 0%; Fig. 5b).The lines of different colors indicate the results for class u rates of 25%, 50%, and 75% in the test data.
In the left plot of Fig. 5a, the accuracy improved for class k as shown in the left plots of Fig. 4a-c, whereas in the right plot of Fig. 5a, there was no improvement in accuracy for class u.In the right plots of Fig. 4a-c, the accuracy for class u was improved even when the ratio of class u in the training data was small.These results suggest that the accuracy for class u may be improved by having m-EDL learn even a small amount of class u data.Moreover, there is no particular need for these data to be related to the class u data in the test data.
The right plot of Fig. 5b shows that m-EDL did not lead to improvements in accuracy for class u.Moreover, in the right plots of Fig. 4d-f, the accuracy of m-EDL for class u is not better than that of EDL; however, when compared with the results in the right plot of Fig. 5b, it is clear that the accuracy of m-EDL for class u is improved even if the ratio of class u in the training data is small.
It can be inferred from these comparisons that the amount of accuracy improvement for class u changes depending on the characteristics of class u in the training and test data.

Impact of the nature of class u in the training and test data (Q4).
As shown in "Performance comparison of EDL and m-EDL when class u is included in the training and test data (Q2)" and "Effect of the ratio of the class u included in the training data on the prediction accuracy of classes k and u in the test dataset (Q3)", the amount of improvement in accuracy for class u data changes depending on the characteristics of u in the training data and test data.Hence, I evaluated whether the accuracy for class u always improves when the characteristics of u in the training and test data are exactly the same (i.e., when the class u data are from the same dataset).
The following two conditions were considered: (1) when Fashion MNIST is included in both the test and training data [Fig.6a-c  The differences in Fig. 6a-c and d-f are the mix rates of class u in the training data (25%, 50%, and 75%, respectively).The lines of different colors indicate the results for class u rates of 25%, 50%, and 75% in the test data.These are percentages of the number of MNIST data.In particular, the right-hand side plots of Fig. 6a-f confirm that the accuracy of m-EDL is higher than that in the cases considered for Q2 and Q3 and is almost 100%.
In the cases of Q2 and Q3, the class u data in the training and/or test data have different characteristics, and the accuracy of m-EDL on the class u data changed depending on the combination.Meanwhile, in the Q4 cases, class u data had the same characteristics during both training and testing, and hence, the accuracy is very high.From this, it is clear that the feature learning of class u in the training data contributes to the improvement in accuracy that m-EDL exhibits when learning class u.However, in the comparisons of Q2, particularly when m-EDL was trained using EMNIST and both EDL and m-EDL were tested on data including Fashion MNIST, examples can be found where the accuracy improved even when the unknown classes in the training and test data differ.Therefore, m-EDL has the potential to improve accuracy by excluding uncertain data as a result of learning unrelated data that do not belong to class k data, although this depends on the combination of class u data in the training and test data.
Here, we hypothesize regarding the combination of class u datasets to be mixed during training that will increase the class u accuracy in testing.The hypothesis is that "if class u data whose characteristics are as close as possible to those of class k are learned during training, class u data in the test can be discriminated as class u as long as the characteristics of class u given during the test are different from those in training"; i.e., "if a boundary that can distinguish the range of class k more strictly with u whose characteristics are close to those of class k is learned via mEDL, class u can be easily distinguished." Conversely, "if the class u data during training are far from the characteristics of k, the decision boundary between k and u is freely determined, and if the class u data in the test are close to k, they may be incorrectly classified." To test this hypothesis, I introduced another dataset (Cifar-10 40 ) and evaluated the similarity of the characteristics of different datasets.The Cifar-10 dataset used had images of 28 × 28 pixels for similarity calculation (consistent with the other dataset), which were grayscaled using a previously proposed method 41 .Table 5 presents the similarity of MNIST, EMNIST, Fashion-MNIST, and Cifar-10.Here, the structural similarity (SSIM) was determined by randomly selecting 500,000 images of the datasets to be compared, and the mean and variance were calculated as the similarity between the datasets.
The distance between datasets was determined as the inverse of the SSIM, and the positional relationship of the datasets on a two-dimensional plane was estimated via multidimensional scaling (MDS) 41 , as shown in Fig. 7.
As shown in Fig. 7, EMNIST was more similar to Fashion-MNIST than to EMNIST.The newly introduced Cifar-10 is an image dataset with characteristics that are more different from those of MNIST than those of both EMNIST and Fashion-MNIST.The hypothesis explains the result presented in "Performance comparison of EDL and m-EDL when class u is included in the training and test data (Q2)" that the accuracy of class u was higher in Case 1 when u was trained with EMNIST and classified with test data containing Fashion MNIST than in Case 2 when u was trained with Fashion-MNIST and classified with test data containing EMNIST.The reason why the accuracy of class u was higher in Case 1 is because the characteristics of EMNIST were closer than those of Fashion-MNIST to the those of MNIST.mEDL-trained EMNIST was able to identify Fashion-MNIST, which Table 3. Accuracy comparison of EDL and m-EDL.These values are mean accuracy through the uncertainty threshold.This table corresponds to Fig. 6a-c.

Discussion
Deep learning has led to many remarkable advances; however, in many scenarios, the uncertainty of the model output is required.EDL is one model that can provide this uncertainty.In this study, I proposed a method that extends the EDL model proposed by Sensoy et al. 29 to predict that the input belongs to class u and not k along with a probability and evaluated its performance and behavior.
The proposed m-EDL does not require the user to set a threshold for the uncertainty to interpret the results.Because m-EDL does not require this parameter, the accuracy of the model is not affected by its value.Additionally, m-EDL allows data belonging to unknown classes to be included in the training dataset.
The results of the experiments revealed that m-EDL performs comparably to EDL when there are no instances of unknown classes.When there are instances of unknown classes, m-EDL performs better than EDL on known classes.Its performance in class u improves depending on the combination of unknown data in the training and test data.m-EDL can learn the characteristics of class u itself, and it has the potential to predict unknown classes even when the unknown classes in the training data and test data have different properties.
The accuracy of m-EDL on class u changed depending on the combination of classes in the data.The additional analysis with the Cifar-10 dataset indicated that during training, if class u, whose characteristics evaluated via the SSIM are as close as possible to the characteristics of class k, is learned, the class u data in the test can be determined as class u as long as the characteristics of class u in testing are farther than those in training.From the above results, if class u is to be mixed intentionally during training to increase the discrimination accuracy of class u in mEDL, it is necessary that the characteristics of the mixed u data are as close as possible to those of class k.
In this study, I set the class k data to MNIST data.In future research, it is necessary to determine that the optimized mEDL exhibits superior performance for various datasets.

Methods
The datasets MNIST 42 , Fashion MNIST 43 , and EMNIST 44 were used in the evaluation.MNIST was used to provide the data for class k. consists of images of handwritten digits.Each image is labeled as belonging to classes 0-9; that is, K = 10.
The data for class u were obtained from either Fashion MNIST or EMNIST according to the experiment.Fashion MNIST is a dataset of 60,000 28 × 28 grayscale images of ten fashion categories ("t-shirt/top, " "trousers, " "pullover, " "dress, " "coat, " "sandal, " "shirt, " "sneaker, " "bag, " or "ankle boot") along with a test set of 10,000 images.All the images from this dataset were categorized as class u in this evaluation.Therefore, even if images of a t-shirt or dress appear in the training or test data, the correct label for both images is class u.The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28 × 28 pixel image format and dataset structure that directly matches the format of the MNIST dataset.Specifically, I used EMNIST Letters, i.e., 26 capital letters (26 classes).They were all categorized as class u.Therefore, even if images of "A, " "C, " or "X" exist in the training or test data, the correct label is u.
The total number of training data was 60,000.When blending class u (from EMNIST and/or Fashion MNIST) into the MNIST data, the class u data to be blended were randomly selected prior to blending.The total number of test data was 10,000.The class u blending method was the same as that used for the training data.
A fully coupled neural network was constructed in Python using the Keras library to build the neural networks used for the EDL and m-EDL models.The input image was a 28 × 28 grayscale normalized image, and there were two hidden layers with 32 dimensions each.The size of the output layer was K (= 10) or K + 1 (= 11).The activation function was ReLU, and Adam was used for the optimization.Mini-batch learning was used with a batch size of 64, initial learning rate of 10 −3 , and no decay.Table 6.Comparison of the accuracies of mEDL for class u in different cases.

Accuracy of mEDL for class u
Case 1 (Fig. 4a-c
investigated whether m-EDL has the same performance as EDL through comparative experiments.I also investigated whether m-EDL has an advantage when including class u in the training data.The objective of this evaluation was to determine the following: (Q1): whether the use of m-EDL reduces the prediction accuracy for a class k when the same training and test data are given to EDL and m-EDL models; (Q2): whether a) an m-EDL model that has learned class u has the same prediction accuracy for a class k when compared with an EDL model that cannot learn class u, and b) m-EDL predicts class u with higher accuracy than EDL; (Q3): if the ratio of class u data included in the training data affects the accuracy of predicting classes k and u in the test data; (Q4): what happens when the properties of class u data that are blended with the training data and test data in Q2 and Q3 are exactly the same.

Figure 3 .
Figure 3. Accuracy of EDL and m-EDL when both the training and test datasets contain no class u data.(a) Results when p k + is used in m-EDL classification.(b) Results when p k is converted from p k + and used in m-EDL classification with the same uncertainty threshold as that of EDL.

ClassFigure 4 .
Figure 4. Accuracy comparison of EDL and m-EDL.Line colors indicate the proportion of class u in the test data, and top and bottom plots show the accuracy for class k data and class u data, respectively.Results when m-EDL has learned class u (EMNIST data) but is tested on Fashion MNIST data for class u mix rates in the training data of (a) 25%, (b) 50%, and (c) 75%.These are percentages of the number of MNIST data.Results when m-EDL has learned class u (Fashion MNIST data) but is tested on EMNIST data for class u mix rates in the training data of (d) 25%, (e) 50%, and (f) 75%.

Figure 5 .
Figure 5. Accuracy comparison of EDL and m-EDL when neither EDL nor m-EDL have learned class u.Line colors indicate the mix rate of class u in the test data, and left and right plots show the accuracy for class k data and class u data, respectively.(a) Results for Fashion MNIST data.(b) Results for EMNIST data.
and Table 3 (avg.accuracy for k and u)] and (2) when EMNIST is included in both the test and training data [Fig.6d-f and Table 4 (avg.accuracy for k and u)].

Figure 6 .
Figure 6.Accuracy comparison of EDL and m-EDL.Line colors indicate the proportion of class u in the test data, and top and bottom plots show the accuracy for class k data and class u data, respectively.Results when m-EDL has learned class u (Fashion MNIST) for class u mix rates in the training data of (a) 25%, (b) 50%, and (c) 75%.These are percentages of the number of MNIST data.Results when m-EDL has learned class u (EMNIST)for class u mix rates in the training data of (d) 25%, (e) 50%, and (f) 75%.

Figure 7 .
Figure 7. Location of each dataset estimated via MDS, where the points M, F, E, and C represent the locations of the MNIST, Fashion-MNIST, EMNIST, and Cifar-10 datasets, respectively, and the distance between points is proportional to the inverse of the similarity.The numbers on the horizontal and vertical axes are dimensionless.
ij + ] is the expected value of the Dirichlet distribution D(p + |α + ) , and Var p ij + is its variance.The detailed calculations are provided in the Supplementary Information available online.

Table 4 .
) Accuracy comparison of EDL and m-EDL.These values are mean accuracy through the uncertainty threshold.This table corresponds to Fig.6d-f.

Table 5 .
Mean (standard deviation) values of the structural similarity between datasets.