Abstract
Deep learning models have been widely used in many supervised learning applications. However, these models suffer from overfitting due to various types of uncertainty, and their performance deteriorates when facing data biases, class imbalance, or noise propagation. To address the lack of robust deep learning models, the Information-Set Deep Learning (ISDL) architectures, with four variants, are developed by integrating information set theory with deep learning principles. The ISDL architectures, learning algorithms, and analytic workflows are described. The performance of the ISDL models and standard architectures is evaluated using noise-corrupted benchmark datasets. The experimental results show that the ISDL models can efficiently handle noise-dominated uncertainty and outperform peer architectures.
Introduction
Although Deep Learning Neural Networks (DLNN) are efficient at classifying images, noise in either the training or the testing dataset propagates through the layers of a DLNN or Convolutional Neural Network (CNN) and significantly deteriorates the performance of these models. The information-set deep learning (ISDL) architectures are introduced here for handling noisy data and are applicable to a broad range of supervised learning tasks. To develop noise-resistant deep learning models, researchers have employed a variety of approaches, including modifications to the architecture^{1}, regularization methods^{2,3,4}, and modifications to the loss functions^{5}. In some studies, image restoration was used to recover clear latent images from corrupted observations. In another study, state-of-the-art deep neural networks were trained on images from the original dataset without noise and then used to classify images degraded by noise^{6}. It was shown that the performance of state-of-the-art DLNN decreases when classifying low-quality images.
The concept of the information set was first introduced by Hanmandlu et al.^{7} based on an entropy framework, mainly to address the limitations of fuzzy set theory^{8}. The potential of entropy-based feature extraction approaches is underutilized compared to conventional techniques such as PCA, ICA, and LDA. Information set theory has proven highly effective for addressing uncertainty and achieving superior performance in various settings. The integration of the information set with deep learning models is proposed to leverage this capability. The ISDL architectures are general and broadly applicable to many deep-learning tasks. In classic fuzzy set theory, membership functions (MF) play a central role, while the original signals (information sources) and their potential interactions with fuzzy memberships are largely ignored^{3}. Since the MF value only measures the extent to which an information source value belongs to the set, it cannot accurately express the overall uncertainty pertaining to all information source values. To address this limitation, the information set is formulated based on an entropy framework and used to modulate fuzzy memberships by some form of the original signal^{7,9,10}. The Pal and Pal entropy^{11} has been extended to an information-theoretic entropy structure and further developed into information-set theory through subsequent improvements^{9,10,12,13,14,15,16}.
In the past, information-set theory has been integrated into many machine-learning models to develop various features and classifiers for noisy environments. The ability of information set theory to represent probabilistic uncertainty and possibilistic certainty is described by Grover et al.^{17}. Sayeed et al.^{18} created six features for face recognition using the information set, the Hanman Filter (HF), and the Hanman Transforms (HT). The HF is designed to adjust the information values using a cosine function. In contrast, the HT is designed to evaluate the information source values based on the information obtained from them. Grover et al.^{19} developed new features based on higher-order information sets, which use fewer features per sample and have a lower time complexity than the most recent features. It was demonstrated that these features could accurately represent multispectral palmprints. In addition to feature extraction, an information-processing-based fuzzy classifier was also developed. The evolutionary method known as Human effort for achieving the goal (HEFAG), which is based on the human approach to learning and does not require algorithm-specific parameters, was developed using information set theory^{20}. The two-fold information set (TFIS) features were developed for text-independent speaker identification and gait recognition^{12,16}. The entropy framework creates the TFIS features, which capture spatial and temporal information components. The TFIS features are fewer in number, which reduces computational complexity and time, and they boost performance in noisy environments. Moreover, the so-called swish activation function, recently proposed by Google Brain, exhibits improved performance in deep learning models, particularly in image classification and machine translation^{21}.
A closer inspection of both its formulation and experimental results shows that the swish activation function also has some roots in information-set theory.
The InforSet Deep Feedforward Networks (ISNN) and the InforSet Convolutional Neural Networks (ISCNN), together with their variants, are the two InforSet-supported deep learning architectures proposed and described here. ISNN employs the InforLayer, which is applied both after the source and after each dense layer. In ISCNN, the Convolutional Neural Network (CNN) is altered by adding the InforLayer and/or by replacing the Pooling layer with the InforPool. The key benefit of the proposed models is that they perform well on noisy samples without any additional preprocessing after being trained on clean samples. The various high-level features corresponding to the CNN layers are enhanced with the aid of the InforLayer and/or the InforPool layer. In the current work, an information-set layer (InforLayer) and a pooling method (InforPool) are built from information sets and integrated with prominent deep learning designs to improve their performance. The InforLayer is added after the input and between the standard layers to extract effective information and pass it on to deeper representations. In contrast, the InforPool acquires localized information and reduces dimensionality. These specifically designed information-set layers and pooling methods improve the noise robustness of classic deep learning models.
The effectiveness and robustness of the two ISDL architectures and of standard models are assessed using two independent benchmark datasets that have been degraded by noise. To show how effective the suggested InforLayer and InforPool layers are, these layers are added to classic CNN designs, and the performance is compared to that of the architectures without them. The experimental results show that the proposed ISDL architectures can efficiently handle uncertainty and related issues and achieve superior performance compared to peer methods when the data are corrupted with noise of varying Peak Signal-to-Noise Ratio (PSNR).
Results
Inforset-based deep learning (ISDL)
When the input data are affected by noise, the information set theoretically measures the quality of contaminated attribute values in terms of possibilistic uncertainty^{3}. Based on this interpretation, the ISNN and ISCNN are proposed to boost classification performance.
Multiple evaluation criteria, including Accuracy, Precision, Recall, F1 score, and ROC-AUC, are used to assess and compare the performance of the proposed networks with that of conventional deep networks (Supplementary Information). Robustness against noise is evaluated using both the noise-corrupted MNIST dataset of handwritten digits and the noise-corrupted EMNIST Balanced dataset, with varying Peak Signal-to-Noise Ratio (PSNR). The EMNIST Balanced dataset is an extended and comprehensive collection of both handwritten characters and handwritten digits. It extends the classic MNIST dataset by incorporating a more diverse set of pattern classes. The dataset encompasses 47 unique classes of characters and digits, comprising both uppercase and lowercase alphabetical letters and the digits 0 to 9.
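The PSNR is the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. Mathematically, for a noise-free \(m\times n\) monochrome image \(I\) and its noisy version \(K\), it can be represented as
$$PSNR = 10\,{\log }_{10}\left(\frac{{MAX}_{I}^{2}}{MSE}\right)$$
where \({MAX}_{I}\) is the maximum possible pixel value of the image and
$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}{\left[I\left(i,j\right)-K\left(i,j\right)\right]}^{2}$$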
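The PSNR of a noisy image can be computed with a few lines of NumPy. The following is a minimal sketch that assumes the standard definition, PSNR = 10·log10(MAX² / MSE), and 8-bit images (`max_value=255` is an assumption, as is the function name `psnr`); it is not taken from the authors' code.

```python
import numpy as np

def psnr(clean, noisy, max_value=255.0):
    """Peak signal-to-noise ratio in dB between a clean image I and a noisy image K."""
    clean = np.asarray(clean, dtype=np.float64)
    noisy = np.asarray(noisy, dtype=np.float64)
    mse = np.mean((clean - noisy) ** 2)  # mean squared error over all m*n pixels
    if mse == 0.0:
        return float("inf")  # identical images: reported as PSNR = Inf
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy example: every pixel is off by exactly one grey level, so MSE = 1.
clean = np.zeros((28, 28))
noisy = clean + 1.0
value = psnr(clean, noisy)  # equals 20 * log10(255) for this toy case
```

A lower PSNR means stronger corruption, which is why the uncorrupted case is reported as PSNR = Inf.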
Inforset-based convolutional neural network (ISCNN)
Three variants of the ISCNN architecture, namely ISCNN-I, ISCNN-II, and ISCNN-III, are proposed, as shown in Fig. 1. The proposed variants modify the CNN by introducing the InforLayer and/or replacing the Pooling layer with the InforPool. Figure 1 shows one of the possible modifications to the CNN architecture; the InforLayer and InforPool can also be introduced at other places in the network.
In ISCNN-I, the Pooling layer is replaced with InforPool to extract the localized information and reduce the dimensionality. In ISCNN-II, the input features are not directly connected to the convolution block; instead, the InforLayer is introduced before the convolution block to extract effective information from the signal of interest. ISCNN-III is the fusion of ISCNN-I and ISCNN-II.
The key benefit of introducing the InforLayer and InforPool layer is that, in spite of the noise present in the input signal, the information layer helps to boost the different high-level features corresponding to the CNN layers. The Structural Similarity Index (SSIM), which measures the perceptual difference between two similar images, helps to validate this statement. An SSIM value of +1 indicates that the two images are very similar, while a value of 0 indicates that they are very different. Figure 2 shows the filtered output after the first convolutional layer in the standard CNN and in ISCNN-III for the standard MNIST and EMNIST datasets. Both architectures were trained on clean images only. The figure also presents the SSIM values calculated between the high-level features of the clean image and the corresponding high-level features of the noisy images for the standard CNN and ISCNN-III. The standard CNN yielded an SSIM of 0.6 ± 0.09 on the MNIST dataset, while the ISCNN-III model produced a higher SSIM of 0.7 ± 0.08. On the EMNIST dataset, the standard CNN recorded an SSIM of 0.4 ± 0.19, and the ISCNN-III model again achieved a higher SSIM of 0.5 ± 0.2. This shows that the information layer introduced into the standard CNN helps to boost the high-level features. Figure 2 focuses only on the first layer of the CNN; across the whole architecture, the InforLayer and/or InforPool introduced at every convolutional layer enhances the features at the corresponding level and improves performance. The filtered output after each layer of the standard CNN and ISCNN-III is shown in Figure S4 and Figure S5.
Comparative evaluation of ISCNN
Three variants of ISCNN, namely ISCNN-I, ISCNN-II, and ISCNN-III, are considered, whose architectures are specified in Table 1 and Table S2 for the MNIST and EMNIST datasets, respectively. In ISCNN-I, the MaxPool layers of the conventional CNN are replaced with InforPool layers; in ISCNN-II, the InforLayer is introduced at a single place only, with an exponential membership function; and ISCNN-III uses both the InforLayer and MaxPool, and combines the sigmoid and exponential gain membership functions. Table 2 summarizes the comparison of ISCNN with the conventional CNN at varying PSNR for the MNIST dataset. The highest performance is achieved by ISCNN-III. Table 3 presents the comparison between ISCNN-III and CNN on the EMNIST dataset. The results show that, in the lower PSNR range, the introduction of the InforLayer/InforPool significantly improves the performance of the conventional CNN.
Additional metrics, including precision, recall, F1, and ROC-AUC score, are also computed to analyse and compare ISCNN-III with the traditional CNN at PSNR = 6.11, as shown in Tables 4 and 5 for the MNIST and EMNIST datasets, respectively. The layer details of the ISCNN architectures are shown in Table 6 for the MNIST dataset and in Table S3 for the EMNIST dataset.
The model is evaluated using five-fold cross-validation. The test set for MNIST has 12,000 samples, which is nearly the same size as the training dataset, which has 10,000 samples. Before splitting, the training dataset is shuffled. The learning accuracy and loss of the ISCNN-III architecture (the best model) are shown in Fig. 3A.
While the performance of all ISCNN versions is comparable to that of classic CNN models when the data are clean, it is significantly better than that of the traditional CNN when the test dataset is much noisier. Figure 3B compares CNN and the ISCNN variants with increasing PSNR.
Figure 4 illustrates the confusion matrices for the MNIST dataset, comparing CNN and ISCNN-III performance at varying PSNR. Figure S6 shows the same comparison for the EMNIST dataset. When images are not contaminated by noise, the performance of the classical CNN and the information-set-theory-based CNN is almost the same. However, the performance of the classical CNN declines drastically with decreasing PSNR. A sample of one digit at two different PSNR values is shown in Figure S1. A number of wrongly classified and correctly classified samples are included in Figure S2 and Figure S3, respectively.
Inforset-based deep feedforward networks (ISNN)
The architecture of ISNN, consisting of nodes (circles) and the connections (lines) between the nodes, is shown in Fig. 5. Deep neural networks have multiple hidden layers that create deeper representations at each layer. Only two hidden layers are shown for simplicity and ease of demonstration. The output from the node on the left is connected to the node on the right through a matrix multiplication between the weights and the input, which is then passed through an appropriate nonlinear activation function. The output of a neuron after each layer is calculated as:
$${y}_{j}^{(k)}=f\left(\sum_{i=1}^{n}{w}_{ji}^{(k)}{x}_{i}\right)$$
where the superscript \((k)\) denotes the layer number, \({x}_{i}=({x}_{1},{x}_{2}\dots , {x}_{n})\) is the input feature, \({w}_{ji}\) is the weight connecting the \({i}\)th node to the \({j}\)th node, and \(f\) is a nonlinear activation function.
In ISNN, the InforLayer is applied after the source and after every dense layer. Its output is the product of the information source values and the corresponding membership function values (see Methods). For example, the information set value \({z}_{i}\) is calculated as
$${z}_{i}={x}_{i}\,{G}_{s}\left({X}_{i}\right)$$
where \({G}_{s}\left({X}_{i}\right)\) is a suitable membership function such as sigmoid, exponential, or Gaussian. The InforLayer is applied in the same way after each layer.
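The ISNN forward pass described above can be sketched in NumPy. This is an illustrative sketch only: it assumes a sigmoid membership function, ReLU dense layers with a softmax output, bias terms, and hypothetical layer sizes (784, 64, 32, 10). The names `infor_layer` and `isnn_forward` are not from the paper, and the exact layer arrangement used in the experiments is the one given in Table 7.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def infor_layer(x):
    """Information set value: source value times its sigmoid membership grade."""
    return x * sigmoid(x)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def isnn_forward(x, params):
    """params is a list of (W, b) pairs; the last pair is the softmax output layer."""
    h = infor_layer(x)                         # InforLayer after the source
    for W, b in params[:-1]:
        h = infor_layer(relu(h @ W + b))       # InforLayer after every hidden dense layer
    W, b = params[-1]
    return softmax(h @ W + b)

# Hypothetical sizes for flattened 28x28 inputs and 10 classes.
rng = np.random.default_rng(0)
sizes = [784, 64, 32, 10]
params = [(rng.normal(0.0, 0.05, (a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]
x = rng.random((4, 784))                       # a batch of four random "images"
probs = isnn_forward(x, params)                # shape (4, 10), rows sum to 1
```

Because `infor_layer` has no trainable parameters, only the dense-layer weights in `params` would be updated during training.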
Evaluation of inforset deep feedforward networks (ISNN)
The comparison of the architecture and the results with DLNN and ISNN is summarized in Tables 7 and 8, respectively. Here, the performance metric of "accuracy" is used to measure how well the model predicts the test data set from both positive and negative classes (Supplementary Information). The 'relu' activation function is used for initial dense layers, and the 'Softmax' activation function for the final layer. The 'sigmoid' is used as a membership function for InforSet calculation. With a batch size of 32, training is done over five epochs. Out of six experiments, the best outcomes for each were taken into consideration.
Description of layer I and layer III in Table 7
In ISNN, two InforLayers are introduced that do not exist in DLNN. In layer I, the InforLayer is used instead of a linear activation function, whereas layer III encapsulates the information from the output of layer II.
In Table 8, for the uncorrupted data (PSNR = Inf), the performance of both architectures on the MNIST dataset is identical. With the introduction of noise (at varying PSNR values), the performance of the conventional network deteriorates sharply, whereas ISNN exhibits robustness to noise. The table shows that ISNN maintains an accuracy of 56.17%, compared to 36.77% for DLNN, when the input image is corrupted with high noise (PSNR = 6.4, at which little information about the actual image remains).
Table 9 shows that even at PSNR = 6.44, the standard architectures DLNN and CNN yield 36.72% and 65%, respectively. One of the proposed variants yields a significantly better performance of 96.40%, which remains almost consistent even at this high noise level.
Discussion
In the present work, the capability of InforSet theory to handle uncertainty is exploited by integrating it with deep learning architectures to extract and enhance the actual information buried under the noise. The efficacy of the proposed architectures is demonstrated using the MNIST database of handwritten digits and the EMNIST Balanced dataset, which is an extensive and inclusive compilation of handwritten characters and digits. To validate the noise tolerance of the proposed approach, both datasets are degraded with varying levels of Signal-to-Noise Ratio (SNR).
The approach is general and can be easily applied to many deep networks. In the present work, the proposed technique is used to develop one variant of DLNN and three variants of CNN. In ISNN (a variant of DLNN), an InforLayer is introduced after the input layer and between the standard layers of DLNN. The three variants of ISCNN are ISCNN-I to ISCNN-III. In ISCNN-I, the InforPool layer replaces the MaxPool layer of the CNN. In this architecture, localized information is extracted through pooling of the features extracted by the CNN filters. MaxPool takes the maximum value of the window, which may provide false information when the input is corrupted. In ISCNN-II, the InforLayer is introduced just after the input layer. The filters in the CNN architecture are used for feature engineering, but if the source is corrupted, the extracted features are not effective. In ISCNN-III, both the InforPool in place of MaxPool and the InforLayer after the input layer are introduced, so that the raw data are not fed directly into the ConvLayer. As a result, effective information is extracted, which improves the quality of the features extracted by the ConvLayer filters.
All these architectures are tested with different gain functions and their combinations. In the present work, only selected membership functions are used; however, a dynamic membership function that adapts to the environment can be devised. For the uncorrupted data, the performance of the proposed architectures was identical to that of conventional architectures; however, there is a significant improvement and consistency in performance on the degraded MNIST and EMNIST data at different levels of PSNR.
It is worth noting that the proposed technique is not a denoising technique. Instead, it enhances the useful information, which in turn improves noise handling; the noise may nonetheless be reduced as a result of the procedure. The proposed method is also less computationally expensive.
One limitation of the proposed method is that Inforset layers do not contain trainable parameters. Therefore, not all of an ANN's layers can be replaced with Inforset layers; otherwise, there would be no room for learning. In the future, the Generalised Entropy Function (GEF)^{12} will be tested; it has the potential to replace the layers of a CNN, since its parameters not only collect information from the source but can also be trained to capture the relationship between input and output. Furthermore, compared to conventional architectures, the number of parameters that must be learned would be substantially reduced.
Because noise and irrelevant features in the proposed work may undergo self-suppression or self-consolation, theoretical insights from combining information set theory with the activation function can be investigated to alleviate overfitting concerns. In the future, this strategy will be applied to broader real-world problems, such as medical image segmentation. The proposed techniques can also be extended to other deep learning architectures such as Autoencoders, U-Net, and RNN, to name a few.
Methods
Information set theory
Fuzzy set theory was proposed to deal with uncertainty present in the crisp sets (conventional sets) by characterizing the imprecise, vague, or missing information. The fuzzy sets are formulated as a pair (member, membership grade), where the membership grade is defined in the interval [0,1] and captures the belongingness of the member in a fuzzy set.
The Information set (Inforset) theory employs an entropy framework to transform the fuzzy set into information values and create the information set. The aggregation of information values imparts the overall uncertainty.
Conversion of a fuzzy set into an information set
Consider an attribute \(X\) with the following set of values:
The set of values that \({x}_{i}\) indicates is denoted as
where \({I}_{\rm X}\left({x}_{i}\right)\) is the individual information source value and \({I}_{\rm X}\) is the collection of information source values from \(X\) . To create fuzzy sets, these values are segregated into K soft classes. The kth fuzzy set (\({F}_{X}^{k})\) is represented as the pair \({(I}_{\rm X}\left({x}_{i}\right), {\mu }_{X}^{k} \left({x}_{i}\right))\), where \({I}_{\rm X}\left({x}_{i}\right)\mathrm{ and }{\mu }_{X}^{k} \left({x}_{i}\right)\) are the information source values and the corresponding membership grades, respectively.
To measure the uncertainty in fuzzy sets, conventional entropy functions such as the Shannon and the Pal and Pal^{5} entropies are not suitable, as they quantify uncertainty in the probability domain rather than the possibility domain. Typically, various fuzzy entropy functions are used for fuzzy sets. However, the limitation of these functions is that they can only capture the uncertainty in membership grades and not in information source values. As a remedy, the generalized Hanman–Anirban entropy function has been developed^{8}, which combines the information source values and the associated gain into a single entity termed the “information value”.
For the fuzzy set \({F}_{X}^{k}\), the uncertainty, or information in \({I}_{\rm X}\) is formulated as:
with
where \({g}_{X}^{k}\) is the gain function representing the uncertainty in the information source value. The uncertainty is further converted into the information values (entropy values) in (6). While the gain function is data-dependent, Inforset theory provides the flexibility to choose different membership functions (Gaussian, Sigmoid, etc.) by optimizing \({a}_{X}^{k}, {b}_{X}^{k}, {c}_{X}^{k}\), and \({d}_{X}^{k}\) belonging to the kth fuzzy set. All the information values are collected using a generalized Hanman–Anirban entropy function to form the Information set,
and the sum of all the information values in \({S}_{X}^{k}\) is the effective information \({E}_{X}^{k}\). The normalized effective information \({E}_{XN}^{k}\) is obtained as
The definitions related to Information source values, Information set, and for the normalized information are discussed in^{3} in more detail.
Procedure to compute the effective information using the Gaussian membership function
Consider a data matrix \({\varvec{X}}\) of size \({\varvec{d}}\times {\varvec{m}}\).

Step 1 Computation of mean \(({\mu }_{j})\) and variance \(({\sigma }_{j})\) of the jth attribute considering only one soft class per attribute.
$$\mu_{j} = \frac{1}{d}\sum_{i = 1}^{d} x_{ij} , \quad j = 1, \ldots ,m$$(10)
$$\sigma_{j} = \sum_{i = 1}^{d} \left( {x_{ij} - \mu_{j} } \right)^{2} , \quad j = 1, \ldots ,m$$(11)
where \({{\varvec{x}}}_{{\varvec{j}}}\) is:
$${{\varvec{x}}}_{{\varvec{j}}}=\left[\begin{array}{c}{x}_{1j}\\ \vdots \\ {x}_{ij}\\ \vdots \\ {x}_{dj}\end{array}\right], \quad j=1,\dots ,m$$
Step 2 Calculation of membership function value for each information source for the jth attribute.
$${\text{G}}\left( {x_{ij} } \right) = e^{{ - \frac{1}{2}\left( {\frac{{x_{ij} - \mu_{j} }}{{\sigma_{j} }}} \right)^{2} }} , \quad i = 1, \ldots ,d; \; j = 1, \ldots ,m$$(12)
Step 3 Calculation of information values for the information source matrix \({\mathbb{X}}\).
$$S_{X} \left( {x_{ij} } \right) = x_{ij}\, {\text{G}}\left( {x_{ij} } \right), \quad i = 1, \ldots ,d; \; j = 1, \ldots ,m$$(13)
where,
$${S}_{X}=\left\{{S}_{X}\left({x}_{ij}\right)\right\}, \quad i=1,\dots ,d; \; j=1,\dots ,m$$
Step 4 Computation of effective information
$$E_{j} = \frac{1}{\left| X \right|}\sum_{i = 1}^{d} S_{X} \left( {x_{ij} } \right), \quad j = 1, \ldots ,m$$(14)
For the noisy data with varying SNR, the sigmoid and exponential membership functions are used instead of a trained generalized gain function, and they give comparable results in the experiments. These two functions are standard and can be derived from the generalized gain function.
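Steps 1 to 4 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the spread in the Gaussian membership function is taken to be the per-attribute standard deviation, the normalizer \(|X|\) in Eq. (14) is taken to be the number of rows \(d\), and a small `eps` (hypothetical) guards against division by zero for constant attributes. The function name is also hypothetical.

```python
import numpy as np

def gaussian_effective_information(X, eps=1e-12):
    """Return membership grades G, information values S, and effective information E."""
    X = np.asarray(X, dtype=np.float64)        # data matrix of size d x m
    mu = X.mean(axis=0)                        # Step 1: per-attribute mean, Eq. (10)
    sigma = X.std(axis=0)                      # Step 1: spread (assumed: standard deviation)
    G = np.exp(-0.5 * ((X - mu) / (sigma + eps)) ** 2)  # Step 2: Gaussian membership, Eq. (12)
    S = X * G                                  # Step 3: information values, Eq. (13)
    E = S.mean(axis=0)                         # Step 4: effective information (assumed |X| = d)
    return G, S, E

# Three samples (d = 3) and two attributes (m = 2); the second attribute is constant.
X = np.array([[1.0, 2.0],
              [3.0, 2.0],
              [5.0, 2.0]])
G, S, E = gaussian_effective_information(X)
```

In this toy example the value 3.0 sits exactly at the mean of the first attribute, so it receives full membership, while the outer values 1.0 and 5.0 are down-weighted.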
Connection between the information set and ‘swish’
The swish activation function is represented as:
$$\mathrm{swish}\left(x\right)=x\cdot \mathrm{sigmoid}\left(\beta x\right)=\frac{x}{1+{e}^{-\beta x}}$$(15)
Here the connection between the 'swish' activation function and the information sets can be clearly observed. In fact, the above equation is a special case of Eq. (6) with gain function \({g}_{X}^{k}\left({x}_{i}\right)=\mathrm{sigmoid}(\beta {x}_{i})\).
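The correspondence can be checked numerically. The sketch below assumes the usual definition of swish, \(x\cdot \mathrm{sigmoid}(\beta x)\), and writes the information value as the source value times a gain; `information_value` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)

def information_value(x, gain):
    """Information value: source value multiplied by its gain (membership) value."""
    return x * gain(x)

beta = 1.5
x = np.linspace(-4.0, 4.0, 9)
via_swish = swish(x, beta)
via_inforset = information_value(x, lambda v: sigmoid(beta * v))
same = np.allclose(via_swish, via_inforset)  # both routes give identical values
```

The only difference between the two routes is naming: the sigmoid factor in swish plays the role of the gain function in the information set.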
Typically, in an Artificial Neural Network (ANN), the output of every neuron captures the information in the form \(\sum {x}_{i}{w}_{i}\), similar to the Inforset formulation \(\sum_{i}{I}_{X}\left({x}_{i}\right){g}_{X}\left({x}_{i}\right)\). However, the two differ in two major respects. First, in an ANN, the weights do not follow any standard distribution function and are instead developed through training, whereas Inforset theory acquires its weights with the help of a generalized gain function. Second, the weights in ANNs are determined by the input–output relationship. Inforset theory, on the other hand, does not extract any information based on this relationship; instead, it draws information from the input alone.
Inforlayer

Step 1 Information Source Values
The information source values are attributes/features of an image, which can be represented as
$$I_{X} = \left\{ {I_{X} \left( {x_{ijk} } \right)} \right\},\quad i = 1,2,3; \;j = 1, \ldots ,d; \;k = 1, \ldots ,m$$(16) 
Step 2 Information gain
The information gain is calculated for each element of the information source in a window with the help of a generalized membership function, as shown in (4). The present work uses the commonly used sigmoid and exponential membership functions. For example, the gain value for the sigmoid membership function is calculated as:
$$G_{s} \left( {X_{ijk} } \right) = \frac{1}{{1 + \exp \left( { - x_{ijk} } \right)}}$$(17)
Step 3 Information set
The information set is obtained by multiplying information source values with the corresponding information gain
$$I_{X}^{1} \left( {X_{ijk} } \right) = x_{ijk} G_{s} \left( {X_{ijk} } \right)$$(18)
The extracted InforSets are the output of the InforLayer.
The proposed operation is general and can be applied to any multidimensional data; the steps above use a 3D image as an example.
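The three InforLayer steps amount to an element-wise operation, sketched below for a 3D image. The sketch assumes the sigmoid gain of Eq. (17) and a channels-first array layout; the function names and the array layout are illustrative choices, not part of the original description.

```python
import numpy as np

def sigmoid_gain(x):
    """Sigmoid membership (information gain), Eq. (17)."""
    return 1.0 / (1.0 + np.exp(-x))

def infor_layer(source, gain=sigmoid_gain):
    """Eq. (18): multiply each information source value by its gain."""
    source = np.asarray(source, dtype=np.float64)
    return source * gain(source)

# Step 1: information source values of a 3-channel image (random placeholder data).
rng = np.random.default_rng(0)
image = rng.random((3, 28, 28))

# Steps 2 and 3: gain and information set, computed element-wise.
info_set = infor_layer(image)   # same shape as the input
```

Since the operation is element-wise and parameter-free, the same function applies unchanged to arrays of any number of dimensions.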
Inforpool
The function of the Pooling layer is to reduce the dimension of the feature map (number of pixels) by capturing the information contained in a region. This reduces the computational complexity of the network and, in turn, speeds up the operation. The pooling operation involves no trainable parameters. However, it has hyperparameters, including the filter size, stride, and padding. The most common pooling layers are Max pooling and Average pooling. These operations with a stride of two and a filter size of 2 × 2 are shown in Fig. 6. Although these pooling operations perform well in terms of positional invariance and reduce size and complexity, they incur significant information loss.
According to our literature review, little work has been done to address information loss in pooling operations. In the Max pooling operation, the maximum (most prominent) value in the feature map's window is chosen, while the other values are discarded. In Average pooling, the average of the values in a window is taken.
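For reference, Max pooling and Average pooling with a 2 × 2 filter and a stride of two can be sketched as below. This is a plain NumPy illustration on a made-up 4 × 4 feature map (the values are arbitrary and not taken from Fig. 6), and `pool2d` is a hypothetical helper name.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, reduce=np.max):
    """Slide a size x size window over a 2D map and reduce each window to one value."""
    x = np.asarray(feature_map, dtype=np.float64)
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            window = x[r * stride:r * stride + size, c * stride:c * stride + size]
            out[r, c] = reduce(window)   # np.max keeps one value, np.mean blends all four
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]])

max_pooled = pool2d(fmap, reduce=np.max)    # [[6, 4], [7, 9]]
avg_pooled = pool2d(fmap, reduce=np.mean)   # [[3.75, 1.75], [2.5, 6.0]]
```

Max pooling discards three of the four values in each window, and Average pooling weights all four equally regardless of how reliable they are; this is the information loss that the Inforset-based pooling is meant to address.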
The Inforset based pooling captures the holistic information of a specific window weighted as per the input. Figure 7 shows the block diagram of the process.

Step 1 Information extraction
Before information extraction, a non-overlapping window of size \(m\times m \times m\) is chosen from an input matrix of size \(n\times n\times n\), which is obtained after the convolution operation (where \(m<n\)). In the figure, a \(3\times 3\times 3\) window is selected. The information in the non-overlapping window is then extracted by following the procedure explained in the Inforlayer section.

Step 2 Collective information calculation
After information extraction, the collective information contained in the selected window is captured with the help of the following equation:
$$E_{Xi} = \frac{1}{M}\mathop \sum \limits_{j = 1}^{m} I_{X}^{1} \left( {X_{ijk} } \right)$$(19)
To calculate the overall information from the entire input matrix, the window is shifted according to the chosen stride \(\mathrm{^{\prime}}k\mathrm{^{\prime}}\), and the next value of collective information is calculated by following the same procedure. After these steps, a new matrix S of size (3 × h × v) is obtained. The values of 'h' and 'v' are calculated as:
$$h=v=\left\lfloor \frac{n+2p-m}{k}\right\rfloor +1$$
where \(\mathrm{^{\prime}}n\mathrm{^{\prime}}\) is the size of the input matrix, \(\mathrm{^{\prime}}p\mathrm{^{\prime}}\) is the applied padding, \(\mathrm{^{\prime}}m\mathrm{^{\prime}}\) is the size of the window, and \(\mathrm{^{\prime}}k\mathrm{^{\prime}}\) is the applied stride. The output matrix obtained after the InforSet-based pooling operation is used as the input to subsequent layers.
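The InforPool operation can be sketched as follows. This is a hedged sketch, not the authors' implementation: it assumes a channels-first input, a sigmoid gain as in Eqs. (17) and (18), a normalizer \(M\) in Eq. (19) equal to the number of elements in each per-channel \(m\times m\) window, zero padding, and the standard pooling output size \(\lfloor (n+2p-m)/k\rfloor +1\). The function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pooled_size(n, m, k, p=0):
    """Output length for input size n, window m, stride k, padding p."""
    return (n + 2 * p - m) // k + 1

def infor_pool(x, m=2, k=2, p=0):
    """Inforset-based pooling of a (channels, n, n) array."""
    x = np.asarray(x, dtype=np.float64)
    if p > 0:
        x = np.pad(x, ((0, 0), (p, p), (p, p)))      # zero padding on spatial axes only
    channels, n_rows, n_cols = x.shape
    h = (n_rows - m) // k + 1
    v = (n_cols - m) // k + 1
    info = x * sigmoid(x)                            # Step 1: element-wise information values
    out = np.empty((channels, h, v))
    for i in range(h):
        for j in range(v):
            window = info[:, i * k:i * k + m, j * k:j * k + m]
            out[:, i, j] = window.mean(axis=(1, 2))  # Step 2: collective information per channel
    return out

x = np.ones((3, 4, 4))
pooled = infor_pool(x, m=2, k=2)   # shape (3, 2, 2); every entry equals sigmoid(1)
```

Because the sigmoid gain is element-wise, computing the information values once for the whole map gives the same result as computing them window by window.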
The proposed integration can enhance the effective information, so that noise is handled better and may be suppressed in the process. This can be analysed with the following simple example:
Take an image and suppose it is corrupted by salt-and-pepper noise. Next, take a small window for localized information, and let this window contain more black pixels than white pixels before the noise is introduced. Following the steps of Inforset theory, the Gaussian MFs are derived through the generalized information gain formalism, which, according to the theory, gives a measure of uncertainty in the information source values. An information set is then prepared, which is a collection of the information values corresponding to the original source values, computed using the Hanman–Anirban entropy function. If salt noise falls on a pixel, the source value of that pixel becomes high, but its belongingness with respect to its neighborhood is very low because the neighborhood is dominated by dark pixels. When the effective information is calculated by multiplying the information source value by its belongingness, the effect of the salt is reduced. In the case of pepper noise, on the other hand, the pixel becomes dark and its belongingness is high, so after multiplication it remains dark. In this particular case, therefore, Inforset theory extracts the actual information to some extent and suppresses the noise. Table S1 explains the significant differences between the proposed models and the most relevant conventional DL models in design and functionality.
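The salt-noise case can be illustrated numerically. The sketch below is a toy example under stated assumptions: pixel values are scaled to [0, 1], the Gaussian membership function uses the mean and standard deviation of the window itself, and the window is 3 × 3 with a single salt pixel. None of these specifics come from the paper.

```python
import numpy as np

def gaussian_membership(window, eps=1e-12):
    """Belongingness of each pixel to its window, using the window mean and spread."""
    mu = window.mean()
    sigma = window.std()
    return np.exp(-0.5 * ((window - mu) / (sigma + eps)) ** 2)

# A dark 3x3 window (all zeros) hit by one salt pixel (value 1.0) in the centre.
window = np.zeros((3, 3))
window[1, 1] = 1.0

membership = gaussian_membership(window)
info = window * membership            # information values: source times belongingness

salt_membership = membership[1, 1]    # about 0.018: the salt pixel barely belongs
max_pool = window.max()               # 1.0: Max pooling passes the salt through unchanged
avg_pool = window.mean()              # about 0.111: Average pooling dilutes it
infor_pool = info.mean()              # about 0.002: the salt is strongly suppressed
```

The dark pixels keep a high membership but contribute nothing because their source value is zero, so the pooled value stays close to the clean (dark) window, which matches the qualitative argument above.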
Data availability
The datasets analyzed in this study are available from the MNIST Database of handwritten digits at http://yann.lecun.com/exdb/mnist/. The EMNIST dataset can be obtained from Kaggle at https://www.kaggle.com/datasets/crawford/emnist.
References
Goldberger, J. & Ben-Reuven, E. Training deep neural-networks using a noise adaptation layer. In International conference on learning representations (2017).
Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, 448–456 (PMLR, 2015).
Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 1–48 (2019).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Zhang, Z. & Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Adv. Neural Inf. Process. Syst. 31 (2018).
Dodge, S. & Karam, L. Understanding how image quality affects deep neural networks. In 2016 eighth international conference on quality of multimedia experience (QoMEX) 1–6 (IEEE, 2016).
Hanmandlu, M. & Das, A. Content-based image retrieval by information theoretic measure. Def. Sci. J. 61, 415 (2011).
Zadeh, L. A., Klir, G. J. & Yuan, B. Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers (World Scientific, 1996).
Aggarwal, M. & Hanmandlu, M. Representing uncertainty with information sets. IEEE Trans. Fuzzy Syst. 24, 1–15 (2015).
Hanmandlu, M. Information sets and information processing. Def. Sci. J. 61, 405 (2011).
Pal, N. R. & Pal, S. K. Some properties of the exponential entropy. Inf. Sci. (N Y) 66, 119–137 (1992).
Medikonda, J., Bhardwaj, S. & Madasu, H. An information set-based robust text-independent speaker authentication. Soft Comput. 24, 5271–5287 (2020).
Hanmandlu, M. Robust ear based authentication using local principal independent components. Expert Syst. Appl. 40, 6478–6490 (2013).
Hanmandlu, M. Robust authentication using the unconstrained infrared face images. Expert Syst. Appl. 41, 6494–6511 (2014).
Hanmandlu, M. A new entropy function and a classifier for thermal face recognition. Eng. Appl. Artif. Intell. 36, 269–286 (2014).
Medikonda, J., Madasu, H. & Bijaya Ketan, P. Information set based features for the speed invariant gait recognition. IET Biom. 7, 269–277 (2018).
Grover, J. & Hanmandlu, M. Development of an optimal entropy classifier and prudent learning model. IEEE Trans. Artif. Intell. 3, 164–175 (2021).
Sayeed, F. & Hanmandlu, M. Properties of information sets and information processing with an application to face recognition. Knowl. Inf. Syst. 52, 485–507 (2017).
Grover, J. & Hanmandlu, M. The fusion of multispectral palmprints using the information set based features and classifier. Eng. Appl. Artif. Intell. 67, 111–125 (2018).
Grover, J. & Hanmandlu, M. New evolutionary optimization method based on information sets. Appl. Intell. 48, 3394–3410 (2018).
Ramachandran, P., Zoph, B. & Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017).
Acknowledgements
The research reported in this article was conducted at Thapar Institute of Engineering and Technology in India and Virginia Polytechnic Institute and State University in the USA. Financial support for this work was provided by the US National Institutes of Health under Grant MH110504.
Contributions
S.B. developed the method and performed comparative evaluation and analysis. S.B. and Y.Z.Wang co-wrote the manuscript. G.Y. and Y.W. edited the manuscript. All authors have discussed the work, and read, edited, and accepted the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bhardwaj, S., Wang, Y., Yu, G. et al. Information set supported deep learning architectures for improving noisy image classification. Sci Rep 13, 4417 (2023). https://doi.org/10.1038/s41598-023-31462-6