Radiogenomic classification for MGMT promoter methylation status using multi-omics fused feature space for least invasive diagnosis through mpMRI scans

Accurate radiogenomic classification of brain tumors is important to improve the standard of diagnosis, prognosis, and treatment planning for patients with glioblastoma. In this study, we propose a novel two-stage MGMT Promoter Methylation Prediction (MGMT-PMP) system that extracts latent features, fused with radiomic features, to predict the genetic subtype of glioblastoma. A novel fine-tuned deep learning architecture, namely the Deep Learning Radiomic Feature Extraction (DLRFE) module, is proposed for latent feature extraction; it fuses quantitative knowledge of the spatial distribution and size of the tumorous structure with radiomic features (GLCM, HOG, and LBP). The application of the novel rejection algorithm has been found significantly effective in selecting and isolating the negative training instances out of the original dataset. The fused feature vectors are then used for training and testing by k-NN and SVM classifiers. The 2021 RSNA Brain Tumor challenge dataset (BraTS-2021) consists of four structural mpMRIs, viz. fluid-attenuated inversion-recovery, T1-weighted, T1-weighted contrast enhancement, and T2-weighted. We evaluated the classification performance, for the very first time in published form, in terms of measures like accuracy, F1-score, and Matthews correlation coefficient. Jackknife tenfold cross-validation was used for training and testing on the BraTS-2021 dataset. The highest classification performance is (96.84 ± 0.09)%, (96.08 ± 0.10)%, and (97.44 ± 0.14)% as accuracy, sensitivity, and specificity respectively in detecting the MGMT methylation status of patients suffering from glioblastoma. Deep learning feature extraction with radiogenomic features, fusing imaging phenotypes and molecular structure and using the rejection algorithm, has been found to outperform existing approaches in detecting the MGMT methylation status of glioblastoma patients.
The approach relates the genomic variation with radiomic features forming a bridge between two areas of research that may prove useful for clinical treatment planning leading to better outcomes.

Building upon current radiogenomic analysis methods, artificial intelligence can be employed to construct complex predictive deep learning radiogenomic models 30 .
Deep convolutional neural networks are employed for a vast array of tasks, including medical image analysis 31,32 and image classification 14,33 . A deep learning architecture has encoding blocks ordered in multiple layers; the feature maps in lower layers are forwarded to subsequent layers of increasing complexity. A convolutional neural network (CNN) 34 massively reduces the number of neurons, due to sparse interaction, in comparison to shallow neural networks. The transfer learning methodology based on CNNs has been well proven for quite some time [35][36][37][38][39] and has been extensively used in the analysis of different imaging databases 37,38,40,41 , neuroimaging 42 , MRI, CT (Computed Tomography) 36 , and ultrasound images 43 . Transfer learning using CNNs based on AlexNet and GoogleNet for the ImageNet dataset is a well-known deep learning approach 44 . CNNs are extensively used in vision-related applications, including object detection 45 , classification 46 , and segmentation 47 . The combination of data pre-processing and augmentation with transfer learning can be helpful for improved classification results. In our case, since the dataset is enormous, only pre-processing appears to be of great value, along with a fine-tuned CNN architecture.
The features used as the source of training essence should have discrimination ability that can be exploited to predict the regions across the hyperplane, with a maximum confidence level, as the target label. In this context, there is a gap for efficient and robust automated systems for brain tumor detection using MRI. Hsieh et al. 48 defined brain tumor categories based on region-of-interest (ROI) selection, feature selection, and feature extraction. They used local textural features, including global histogram moments, on 107 images of gliomas (73 low-grade; 34 high-grade). The work, however, was reported on a limited dataset and lacked features based on other static feature extraction methods. Cheng et al. 49 used a T1-weighted contrast-enhanced brain MRI dataset 50 having three types of tumors (glioma, pituitary, and meningioma), experimenting with three feature extraction methods, namely the intensity histogram, bag-of-words (BoW), and gray level co-occurrence matrix (GLCM), and finding that BoW performs relatively better at a higher computational cost. The accuracy was limited by the absence of preprocessing scenarios that could lead to improved discriminative features, and the hybrid of solution spaces for the three characteristic feature sets was not explored. Similarly, Sachdeva et al. 51 extracted color and textural features from segmented ROIs, using the genetic algorithm for feature selection with an optimum fitness level, and reported an accuracy of 94.90% using a genetic algorithm-based artificial neural network (GA-ANN). However, for large datasets, the colored images have different color tints, necessitating the use of staining procedures as an essential step. Further, dynamic features need to be explored with deep learning algorithms to obtain enhanced discriminative features.
Claro et al. 52 used a hybrid feature space formed by textural features, such as Tamura (coarseness, contrast, directionality, line-likeness, regularity, and roughness), gray level run length matrix (GLRLM), histogram of oriented gradients (HOG), morphology, and local binary patterns (LBP), merging the extracted features with those of seven CNN architectures for glaucoma classification. Their feature space had 30,862 dimensions, which was squeezed by the gain ratio to rank the features according to their performance, resulting in an optimum setting for glaucoma detection. They found the GLCM descriptor with transfer learning-based features to be the most effective for their specific problem structure. The work needs to be explored on multi-parametric and multi-institutional datasets, for larger and more diverse data, with dynamic features using residual feature maps concatenated in the successive layers. Garcia et al. 53 solved the problem of imbalanced datasets by using ensemble classifiers with feature space partitioning. The parameters of the partitioning were optimized using a hybrid metaheuristic method, called GACE, which combines a genetic algorithm (GA) with a cross-entropy (CE) method. More elaborate work using generative adversarial networks (GANs) could tackle the underlying class imbalance problem. Shaban et al. 54 introduced a hybrid feature selection methodology that extracts features with optimum characteristics from COVID-19 CT images. The feature selection is based on fast and accurate selection stages. They used an enhanced version of k-NN that avoids being trapped, owing to solid heuristics in choosing the neighbors of the tested subject. The work can be explored using other ML classifiers, along with DL classification techniques, to involve features based on high-level abstraction layers.
In this research, we propose a novel two-stage MGMT Promoter Methylation Prediction (MGMT-PMP) system that precisely quantifies the image structure of GB in patients from the evaluation of FLAIR, T1w, T2, and T1Gd mpMRIs. We have selected the popular feature types, viz. GLCM, HOG, and LBP, and fused these features with novel deep learning features, forming a hybrid feature set (HFS) that differs from prior work in three aspects. First, it engages a novel Deep Learning Radiomic Feature Extraction (DLRFE) module that brings dynamic features, based on the problem structure, into the classification process, leading to promising results. Second, it provides different categories of second-order statistics and local textural features, exploiting the positive aspects of each category of feature extraction module. Third, the system is based on filtering that uses the rejection algorithm to remove redundant and irrelevant instances from the RSNA dataset, thereby improving the discrimination, or variance, and leading to quick convergence. A comparison with recent techniques is also presented for performance analysis.
The key contributions of this research work are summarized as follows:
• This work addresses the MGMT promoter methylation status affecting the efficiency of chemotherapy in GB patients, where 'MGMT+' status increases the effectiveness of chemotherapy.
• A novel two-stage prediction system for MGMT promoter methylation status that precisely quantifies the image structure of glioblastoma in patients using FLAIR, T1w, T2, and T1Gd mpMRIs.
• A novel deep learning-based feature extraction module that extracts dynamic features based on the problem structure.
www.nature.com/scientificreports/
• The hybrid feature set formed by fusing deep features with static features of GLCM, HOG, and LBP origin.
• The RSNA-ASNR-MICCAI Brain Tumor Segmentation (BraTS 2021) challenge dataset was used for radiogenomic classification with 348,642 mpMRI scans for the very first time in published form (using performance measures, viz. accuracy, F1-score, and Matthews correlation coefficient).
• The rejection algorithm is introduced for removing redundant and irrelevant instances from the dataset.
• A detailed complexity analysis of the individual and combinatorial hybrids formed by the fusion of dynamic and static feature sets.
• A comparison of the proposed work with other state-of-the-art techniques.
The paper is organized as follows: "Importance in the clinical management and the survival of GB patients" briefly details the impact of the study on the survival of GB patients with clinical management, "Material and methods" is dedicated to materials and methods, "Results and discussion" details the results and discussion, followed by the conclusions in section "Conclusions". The abbreviations used throughout this article are illustrated in Table 1.

Importance in the clinical management and the survival of GB patients
Gliomas with varying levels of symptom severity have been declared a serious threat to the central nervous system. The BraTS-2021 focuses on the molecular representation and structure of the underlying tumor in intraoperative neurosurgery and serves as preoperative ground truth using mpMRI data 18,40 . Its objective has been the localization of the brain tumor sub-regions that are microscopically distinct in structure. The identification of tumor boundaries in MRI is important in surgical treatment planning, intraoperative brain incision to monitor tumor growth, and planning the radiotherapy and chemotherapy maps (RCM) following surgical treatment. Despite the poor prognosis of GB patients, the tumor's MGMT promoter methylation status, which can be found by radiogenomic classification of mpMRI scans, is extremely important for chemotherapy response prediction. Further, owing to the proposed fine-tuned framework, using low-cost GPU-based portable machines would greatly help medical practitioners assist patients, along with the support system, to manage post-surgical treatment more effectively. Some recent studies have focused on the notion of information freshness. Yang et al. 55 offset the effects of COVID-19 by controlling the diffusion of the epidemic, introducing Age of Information (AoI), a measure for the quantification of information freshness, in an optimization scheme using artificial intelligence-based diagnostic bots. Initially, they formed a health state monitoring system where the diagnostic biosensing data was transmitted through bots using edge servers. This was followed by the derivation of the AoI problem in closed form. They also proposed algorithms for bot placement and channel selection using stochastic learning.

Material and methods
The proposed framework (Fig. 1), namely the MGMT Promoter Methylation Prediction (MGMT-PMP) system, forms a highly discriminative feature set, the HFS, in two stages. In stage one, the notion is to extract features using the DLRFE module, where the features from the last flattened layers of the convolutional neural network are acquired, whereas in stage two the second-order statistics and local textural features are extracted. The features from these stages are merged to form the HFS. The fused features are then used for radiogenomic classification by a strong ML classifier, such as the SVM or k-NN algorithm. This section covers the dataset details, preprocessing, and feature extraction, followed finally by the prediction model.

Dataset. The proposed framework was analyzed on a publicly available RSNA-MICCAI Brain Tumor
Radiogenomic Classification dataset (BraTS-2021) 56 , released for a featured code competition introduced to advance the treatment of brain cancer by predicting the state of a genetic biomarker. The BraTS-2021 is set to become a common benchmark for GB segmentation algorithms, using multi-institutional and multi-parametric Magnetic Resonance Imaging (mpMRI) images of 2000 patients suffering from glioma. The preoperative MRI images were divided into training, validation, and testing cohorts. For the classification task, the target labels, based on the MGMT promoter methylation status, are provided for 585 subjects. The testing cohort is not currently accessible, and the validation data for 87 subjects are provided without labels. We have used the training data provided with labels and further partitioned it for testing through tenfold cross-validation. The available modalities are T1-weighted (T1w), T1-weighted contrast enhancement (T1wCE), T2-weighted (T2w), and fluid-attenuated inversion-recovery (FLAIR). The two tasks that BraTS-2021 focuses on are: first, segmentation of tumor sub-regions, and second, radiogenomic classification of the MGMT promoter methylation status 18 . Some rescaled mpMRI images from the BraTS-2021 dataset are illustrated in Fig. 2, showing, from top to bottom, the sagittal, axial, and coronal views; the variation across columns represents the intra-class variance of the dataset. The specifications for training and testing instances in BraTS-2021, from the given "train" directory, are illustrated in Table 2. Similar information files leading to redundant data are removed using the rejection algorithm (RA). The trimming effect of RA on the instances is illustrated in the lower part of Table 2, where the specifications of the reduced dataset are shown ("Preprocessing").
In the absence of a testing cohort, the BraTS-2021 dataset is constituted of 348,642 training instances. The sharing of this dataset can be useful provided that its security-related issues are addressed. The large amount of data generated by heterogeneous Internet of Medical Things (IoMT) devices may be dispatched to cloud servers for processing. One study 57 identified the physical-layer security and over-centralized server problems in wireless medical sensor networks, and proposed a reliable authentication protocol using cutting-edge blockchain technology and physically unclonable functions. Further, the biometric information was handled with a fuzzy extractor scheme.
Preprocessing. Standardized preprocessing procedures have been adopted for all the mpMRI scans included in the BraTS-2021 dataset. The pre-processing included NIfTI file format conversion 58 , co-registration to the template of normal adult human brain anatomy of the MRI-based atlas SRI24 59 , resampling to 1 mm³ isotropic resolution, followed by skull stripping, isolating the cortex and cerebellum from the skull and non-brain area, as illustrated in Fig. 3. All the preprocessing phases are handled using publicly available support such as the Cancer Imaging Phenomics Toolkit (CaPTk), including the Federated Tumor Segmentation (FeTS) tool 18,60-62 .
The preprocessing channel commenced with the conversion of the DICOM file format to NIfTI files 18,58 . The files for radiogenomic classification, one of the two tasks of the BraTS-2021 dataset, are not co-registered, while for segmentation, the second task, the dataset undergoes registration to a standard template provided as NIfTI files. The NIfTI format removes the associated metadata, including all Protected Health Information (PHI), from the DICOM files. In addition, skull-stripping lessens the extent of facial reconstruction, which might otherwise subsequently be used for face recognition of the patient; it is based on a representation-learning methodology that describes the brain shape prior and is independent of the MRI input [63][64][65] .
For radiogenomic classification, the entire set of mpMRI images is preprocessed as illustrated in Fig. 3 to generate the skull-stripped volumes, finally converted from NIfTI back to DICOM format consisting of skull-stripped images. Finally, a two-step process for data de-identification, comprising the RSNA CTP (Clinical Trials Processor) anonymizer 18 and whitelisting of the outcome DICOM files, is carried out. This removes all unnecessary tags from the DICOM headers, ensuring the removal of all PHI entries. For radiogenomic classification, the entire image set is segregated into two classes, namely "0" and "1", based on the MGMT promoter methylation status, with T1w, T1wCE, FLAIR, and T2w combined in either of the classes depending on the annotation allocated by the expert radiologists and pathologists. These images are further converted from DICOM (12 bits) to jpg (8 bits) format and resized from 512 × 512 to 64 × 64 to speed up the training process, which is unavoidable for an enormous dataset, and to take care of memory issues encountered when using an average-price GPU-based portable system.
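The last conversion step can be sketched in numpy; this is a minimal illustration only (the paper does not specify the resizing interpolation, so simple nearest-neighbor decimation is assumed here, and `img12` is a hypothetical 12-bit pixel array):

```python
import numpy as np

def to_8bit_and_resize(img12, out_size=64):
    """Rescale a 12-bit image (values 0..4095) to 8 bits and downsample
    a 512 x 512 image to out_size x out_size by plain decimation."""
    img8 = np.round(img12.astype(np.float64) / 4095.0 * 255.0).astype(np.uint8)
    step = img8.shape[0] // out_size  # 512 // 64 = 8
    return img8[::step, ::step]
```

In practice an anti-aliased resampling (as in image libraries) would be preferred over bare decimation; the sketch only shows the bit-depth and size reduction described in the text.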
Pseudocode for the rejection algorithm. The rejection algorithm, illustrated in Fig. 4, has been run for each class. Its sole objective is to remove the instances that do not add discrimination to the radiogenomic classification process. The "datapath" of either of the binary classes is followed by exclusive loading of the entire image set for that class, MGMT− (Class "0") or MGMT+ (Class "1"). The number of images is fetched to act as the stopping criterion for the algorithm. The rejection algorithm checks for redundant/irrelevant images and removes them one by one from the specific class using a threshold on the sum of pixel values (T_h). After thorough investigation, T_h has been empirically found equal to zero for optimum results. Once all the images have gone through RA, the output consists of the details for each of the rejected files, with the revised datapath loading producing the improved class details. The difference in the initial instance count of a class should match the rejected file count plus the final class size (after RA). The RA resulted in 26.75% and 27.51% reductions in the number of instances for classes "0" and "1" respectively.
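The core of the rejection step can be sketched in Python as follows; function and variable names are illustrative, not the authors' code:

```python
import numpy as np

def rejection_algorithm(images, names, th=0):
    """Remove instances whose total pixel sum does not exceed the threshold T_h.

    With th = 0 (the empirically optimal value reported in the text), this
    rejects all-zero (blank) slices that add no discriminative information.
    """
    kept, rejected = [], []
    for img, name in zip(images, names):
        if img.sum() <= th:          # redundant/irrelevant instance
            rejected.append(name)
        else:
            kept.append((img, name))
    # sanity check from the text: initial count == rejected count + final size
    assert len(images) == len(rejected) + len(kept)
    return kept, rejected
```

The final assertion mirrors the bookkeeping check described above: the initial class size must equal the rejected file count plus the reduced class size.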
Deep learning-based latent shape features. We propose the DLRFE module to obtain radiomic characteristics using deep learning by feature bleeding through the fully connected layers. The parametric tuning of the module is carried out for the least memory requirement and computational overhead, culminating in maximum efficiency on average-price GPU-based computer systems. It is composed of 3 convolution blocks as illustrated in Fig. 5. The weights and biases have been kept at smaller values using L2-regularization and a dropout layer after the third block, providing an improved generalization policy. The DLRFE module is constituted of two networks: an encoder, to infer latent variables given an input image, and a fully connected network, to infer the prediction of the deep learning architecture. The input to the encoder network is an image of size 64 × 64, and the output is a classification prediction. The encoder network consists of 3 convolutional network blocks (Block-1, Block-2, and Block-3) followed by three fully connected layers. Each block consists of convolutional, ReLU, and max-pooling layers, with batch normalization after Block-2 and a dropout layer after Block-3.
The fully connected layer, after converting feature maps to a 1D vector, connects neurons to the following layers, enabling dynamic extraction of latent features. The three fully connected layers in the DLRFE module are FC1, FC2, and FC3 with 512, 64, and 2 neurons respectively. Activation is carried out by the SoftMax layer, which maps the non-normalized output of the two neurons of FC3 to a probability distribution in the range [0, 1] on a binary basis. The kth class probability is determined using the normalized exponential function: $P_k = \frac{e^{O_k}}{\sum_{m=1}^{M} e^{O_m}}$, where $O_k$ is the activation of the kth class. The cross-entropy loss (CEL) has two label sets: the target and predicted labels, represented as t(x) and p(x) respectively. The loss is defined as: $L(t, p) = -\sum_{\forall x} t(x) \log p(x)$. The updating of weights during backpropagation is carried out by minimization of the cost function given by: $-\frac{1}{J}\sum_{j=1}^{J} \ln p(t_j \mid x_j)$, where J is the size of the training set, $x_j$ represents the training sample with target label $t_j$, and $p(t_j \mid x_j)$ is the classification probability. The stochastic gradient descent approach is applied epoch-wise, with each epoch divided into mini-batches, for the minimization of the cost function. The updated weight for the (i+1)th iteration in layer l is $W_l^{i+1} = W_l^i + \Delta W_l^i$, where $\Delta W_l^i$ is the corresponding weight update. The network is trained in a circular way, feeding forward and backward iteratively.
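The SoftMax and cross-entropy computations above can be illustrated with a small numpy sketch; this is a generic restatement of the standard formulas, not the authors' implementation:

```python
import numpy as np

def softmax(o):
    """Normalized exponential: P_k = exp(O_k) / sum_m exp(O_m)."""
    e = np.exp(o - o.max())          # shift by the max for numerical stability
    return e / e.sum()

def cross_entropy(t, p):
    """L(t, p) = -sum_x t(x) * log p(x) for a one-hot target t."""
    return -np.sum(t * np.log(p))
```

For a balanced two-neuron output such as FC3's, equal activations yield equal class probabilities, and the loss vanishes only when the predicted distribution matches the one-hot target.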

Radiomic and radiogenomic features.
Radiogenomics is based on quantitative data collected from medical images having individual genomic phenotypes, and the notion is to design a prediction framework to categorize patients for clinical outcomes. The genomic fractional variation in the tumor DNA can be obtained through multiple radiomic features during radiogenomic analysis using mpMRI scans 18,66,67 . The BraTS-2021 dataset for the radiogenomic classification task is based on mpMRI scans and MGMT promoter methylation status. Many cohorts are employing this dataset, leading to machine learning-based solutions for the prediction of MGMT promoter methylation status using radiomic features based on gray-level imaging. Numerous radiomic feature extraction methods have been introduced and exploited depending on the nature of the problem structure and the corresponding solution domain. We have selected second-order statistics and textural features to build the HFS in addition to the latent shape features ("Deep learning-based latent shape features"). We have used the three most effective feature extraction modules (FEMs) for radiogenomic classification tasks corresponding to individual feature extraction strategies, namely GLCM, HOG, and LBP. Each of the modules extracts diverse features independently from each of the mpMRI scans.
GLCM features. Many researchers have employed textural features successfully and solved classification-linked problems 68,69 , including the categorization of brain tumors 40,70 . We used the GLCM-based FEM, which is based on the spatial relationship of pixels in an image. These highly discriminative features describe the texture based on the repeatability of pixel groups with specific values that exist in a two-dimensional relationship in an image 71,72 . The GLCM is a square matrix (size: N × N), where N represents the number of different gray levels in an image.
The matrix is denoted by M(i, j), where each element (i, j) of the GLCM represents the frequency of occurrence of a particular relationship between two pixel intensities in the input image: "i" represents the gray level of the pixel at location (x, y), and "j" represents the gray level of a contiguous pixel positioned at a relative distance "d" from it at an orientation "θ". The relationship between the ith and jth intensities is based on these two parameters, d and θ, in four directions (0°, 45°, 90°, 135°) with an increment of 45° and without symmetry repetition. We selected 13 Haralick features in our work, as illustrated in Table 3, showing the mathematical formulae for their calculation and comprehensive definitions 73 . These features are computed from M, where $p_{ij}$ is the (i, j)th entity of M divided by its size, and the expressions $m_i$ and $m_j$, and $\sigma_i$ and $\sigma_j$, represent the means and standard deviations of the ith row and jth column of M. The direction-averaged feature values are fused to form the GLCM-based feature vector. A schematic diagram for GLCM calculation is shown in Fig. 6. In the GLCM illustration, Fig. 6b, the top row and left column (cyan cells) are the pixel intensities present in the input matrix. The GLCM calculation (Fig. 6b) using the input matrix (Fig. 6a) illustrates the calculation of the matrix at 0° and d = 1 for the neighboring intensity pair (3,0). In Fig. 6a, a pixel with intensity '3' is paired with a pixel '0' in the pair (3,0) four times, as shown with a yellow circle in Fig. 6b. Similarly, the entire GLCM is calculated for a single orientation and adjacent pixel-intensity pairs.
HOG features. HOG features efficiently perform classification tasks due to the highly discriminative characteristics associated with their extraction procedure 74,75 . The HOG module extracts direction- and gradient-based information from the input image that helps in describing the structure of the problem.
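The co-occurrence counting described for Fig. 6 can be sketched in numpy; this toy function handles a single orientation (0°, d = 1) without symmetry, as in the illustration, and is not the feature-extraction code used in the paper:

```python
import numpy as np

def glcm_0deg_d1(img, levels=4):
    """Co-occurrence counts for horizontally adjacent pixel pairs
    (theta = 0 degrees, d = 1, no symmetric repetition)."""
    M = np.zeros((levels, levels), dtype=int)
    # pair each pixel with its right-hand neighbor and tally (i, j) occurrences
    for i, j in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        M[i, j] += 1
    return M
```

The full GLCM module would repeat this count for the four orientations, normalize M, and then derive the 13 Haralick features from the normalized matrix.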
The HOG feature extraction process is illustrated in Fig. 7a-e. Figure 7a shows an original mpMRI scan (BraTS-2021 dataset), while Fig. 7b represents the schematic of cells and a block superimposed on the original input image. Figure 7c shows the HOG descriptors with a schematic of a block and a cell, both depicted separately in Fig. 7d, while the magnified view of a single cell is shown in Fig. 7e. The feature extraction strategy consists of three activities: i. The computation of gradients by calculating the direction and magnitude at each pixel. The Sobel kernel function is used to obtain the gradients $E_x$ and $E_y$; the gradient magnitude and angle at every pixel are then $E(i, j) = \sqrt{E_x^2 + E_y^2}$ and $\theta(i, j) = \tan^{-1}(E_y / E_x)$, where θ denotes the direction of the gradient, and i and j denote rows and columns respectively.
ii. The small cells of size (r × s) are derived from the input image as shown schematically in Fig. 7b,d.

Table 3. Textural features used in the formation of HFS.

Features and definitions (Table 3; several equations could not be recovered from the source layout):
- Angular second moment: a measure of the uniformity of the distribution of grey levels in the image.
- Correlation: a measure of image linearity; it will be high if an image contains a considerable amount of linear structure.
- Variance: a measure of the dispersion of the gray-level differences at a certain distance.
- Inverse difference moment: a measure of the closeness of the distribution of GLCM elements to the GLCM diagonal.
- Sum average: the average sum of gray levels.
- Sum variance: the variance of a sum of gray levels.
- Sum entropy: the uniform distribution of a sum of gray levels has maximum entropy.
- Difference variance: the variance of a difference in gray levels.
- Difference entropy: the uniform distribution of a difference in gray levels has maximum entropy.
- Information measure of correlation 1 (normalized mutual information): $F_{12} = \frac{HXY - HXY1}{\max\{HX, HY\}}$.
- Information measure of correlation 2: the difference between joint entropy and joint entropy assuming independence.

iii. Finally, the cells are combined into overlapping blocks, each of size (p × q), and in each block a histogram of oriented gradients falling into each bin is computed, which is further subjected to normalization (L2-norm) to overcome illumination variation. The normalized vector for each block of the histogram is given by $v = T / \sqrt{\|T\|_2^2 + \epsilon^2}$, where ε is a small constant that prevents division by zero, and T denotes the non-normalized vector.
The features collected from all the normalized blocks are fused to form a feature descriptor, $x_{HOG}$, for the entire image.
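The HOG steps for a single cell and block can be sketched in numpy; this simplified version uses `np.gradient` in place of the Sobel kernel for brevity, so it is an approximation of the described procedure rather than the paper's implementation:

```python
import numpy as np

def hog_cell_histogram(cell, bins=9):
    """9-bin histogram of oriented gradients for one cell (unsigned, 0-180 deg),
    with each pixel's vote weighted by its gradient magnitude."""
    ex = np.gradient(cell.astype(float), axis=1)   # horizontal gradient E_x
    ey = np.gradient(cell.astype(float), axis=0)   # vertical gradient E_y
    mag = np.sqrt(ex ** 2 + ey ** 2)               # E = sqrt(E_x^2 + E_y^2)
    ang = np.degrees(np.arctan2(ey, ex)) % 180.0   # theta folded to [0, 180)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist

def l2_normalize_block(hists, eps=1e-6):
    """Concatenate cell histograms of a block and apply the L2-norm
    normalization v = T / sqrt(||T||^2 + eps^2)."""
    t = np.concatenate(hists)
    return t / np.sqrt(np.sum(t ** 2) + eps ** 2)
```

A purely horizontal intensity ramp, for instance, should place all of its votes in the first orientation bin, and a normalized block has (near-)unit L2 norm.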
LBP features. The local binary patterns (LBP) descriptor is another efficient feature extraction operator generating high-level characteristics; it assigns a label to each pixel of an image by thresholding its neighborhood and translating the result into a binary number 76 . LBP has the advantage of rotation invariance along with gray-level invariance 77 . This feature module encodes the relationship between a pixel and its neighbors in a circular manner by describing the local spatial structure of an image. The binary output of the LBP operator is obtained from the difference $(G_p - G_c)$ on a per-pixel basis, comparing the central pixel $(G_c)$ with the surrounding pixels $G_p$, where p is limited to the range 1..8 within a 3 × 3 receptive area 30 . The working principle of the LBP operator on a pixel is illustrated in Fig. 8. The LBP values are computed by binarization based on the difference between contiguous pixels, where two groups are formed and each element is assigned to one of them with the help of a step function. The central pixel value $LBP_{p,r}(G_c)$ is given by: $LBP_{p,r}(G_c) = \sum_{p=1}^{8} H(G_p - G_c)\, 2^{p-1}$, where the threshold function H(x) is unity for values greater than or equal to zero and zero otherwise. In the above equation, r represents the distance of the neighboring pixels from the central pixel (radius), and p indexes the pixels, excluding the central pixel, included in the process 78,79 . In Fig. 8, r = 1 and p = 1..8 have been employed with a receptive area of a 3 × 3 mini-image. The last stage converts the binary codes of zeros and ones to decimal numbers to form an LBP image 80 .
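The operator can be sketched for a single 3 × 3 patch as follows; the neighbor ordering and bit weights here are one common convention, which the text does not pin down:

```python
import numpy as np

def lbp_code(patch):
    """LBP code of the central pixel of a 3x3 patch (r = 1, p = 8 neighbors)."""
    gc = patch[1, 1]
    # clockwise neighbor order starting at the top-left; any fixed order works
    # as long as it is applied consistently across the image
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if gp - gc >= 0 else 0 for gp in neighbors]   # H(G_p - G_c)
    return sum(b << i for i, b in enumerate(bits))          # binary -> decimal
```

Sliding this function over every 3 × 3 neighborhood of the input and collecting the decimal codes yields the LBP image described in the text.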
Jackknife cross-validation. Tenfold Jackknife cross-validation is engaged in this work for parametric optimization, including the training and test data formulation. It is a commonly used approach for verifying the robustness and confidence of the system performance (Fig. 1) for model selection among candidate algorithms 81 . In this technique, the data is divided into 10 folds ($F_i$; $i = 1, 2, \ldots, 10$) or partitions. The test portion, based on a single fold, is crossed out, and the training portion consisting of (K−1) folds is partitioned into training and validation sets 82 . The diagonal folds, shown as crossed rectangles ($F_i A_i$; $i = 1, 2, \ldots, 10$), represent the test partitions, while the off-diagonal folds ($F_i A_j$; $i, j = 1, 2, \ldots, 10$ and $i \neq j$), shown as plain rectangles, represent the training partitions, which are divided into training and validation sets. The best-parameter model is used with the test data to evaluate the performance. We have used the stratified cross-validation scheme, due to the imbalanced class distribution in the dataset, to give a close approximation of the generalization accuracy ($A_i$; $i = 1, 2, \ldots, 10$). This ensures an equal proportion of instances of each class across the training and test partitions 83 . The performance estimated by cross-validation is based on each test fold accuracy $A_i \in \Re$, $i = 1, 2, \ldots, K$; the average performance is given by: $\bar{A} = \frac{1}{K}\sum_{i=1}^{K} A_i$.
Classification model. The individual features, their combinatorial hybrids, and the HFS have been employed in the classification system using SVM and k-NN algorithms as the potential ML tools to find a reliable classification solution. The SVM was optimized using linear, RBF, and polynomial kernels.
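The stratified fold construction can be sketched with scikit-learn's `StratifiedKFold`; the 60/40 class imbalance below is hypothetical, chosen only to illustrate that each fold preserves the class ratio:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# toy labels with class imbalance standing in for the BraTS-2021 label distribution
y = np.array([0] * 60 + [1] * 40)
X = np.arange(len(y)).reshape(-1, 1)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # stratification keeps the 60/40 ratio in every held-out test fold
    assert abs(np.mean(y[test_idx]) - 0.4) < 0.05
```

Averaging the per-fold test accuracies then gives the cross-validated performance estimate described above.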
A brief overview of the selected classifiers is given below. The k-NN, a proximity-based classifier, is based on the notion that the test instance q would have the maximum number of nearest neighbors in the prospective class. In this classification technique, a distance d is measured from the query q to the k nearest neighbors $x_t$ that lie in class $y_t$. The q is assigned the class $y_u$ having the maximum number of neighbors among the k nearest neighbors 84,85 . In the case of k = 1, the test instance is simply assigned the class of the single nearest neighbor. The Minkowski distance $d_{mk}$, a generalized distance metric, is given by: $d_{mk}(q, x_t) = \left(\sum_i |q_i - x_{t,i}|^r\right)^{1/r}$, where a larger value of r gives more influence to the dimension on which q differs the most. Some of the distance metrics are the Euclidean (L2-norm), Mahalanobis, and Chi-square distances, as given in Table 4. Another frequently used variant is distance-weighted voting, where q receives a vote V from each of the nearest neighbors, weighted inversely to their distance from q 85 : $V(y_u) = \sum_{t=1}^{k} \frac{1}{d(q, x_t)^z}\, \mathbb{1}(y_u, y_t)$, where $\mathbb{1}(y_u, y_t)$ is unity when both labels match and zero otherwise, and z determines the type of distance measure being adopted during classification.
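A minimal numpy sketch of the Minkowski distance and the majority-vote k-NN rule (unweighted voting, for brevity; not the library configuration used in the paper):

```python
import numpy as np

def minkowski(q, x, r=2):
    """d_mk(q, x) = (sum_i |q_i - x_i|^r)^(1/r); r = 2 gives Euclidean distance."""
    return np.sum(np.abs(q - x) ** r) ** (1.0 / r)

def knn_predict(q, X, y, k=3, r=2):
    """Assign q the majority class among its k nearest neighbors."""
    d = np.array([minkowski(q, x, r) for x in X])
    nearest = y[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]
```

Replacing the majority count with inverse-distance weights yields the distance-weighted voting variant mentioned above.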
Support vector machine classification. The SVM maps the supervised-learning instance pairs (x t , y t ), x t being the tth sample with label y t , from a data set X in the sample space S separated by a hyperplane with a maximized margin on either side 86 . The least-confident points near the hyperplane, with respect to the class label, are the support vectors. Better generalization of the model depends on keeping the margin as large as possible. An outlier (noise point) does not influence the decision boundary significantly, as its effect is essentially ignored in the training phase 87 . The SoftMax layer in a CNN, on the other hand, would be influenced by such a point due to its probability-based working principle. In other words, the SVM is favored as a strong classifier with a reduced error rate. The hyperplane function, defined for linearly separable classes, is given for the ith class by:

g i (x) = w i T x + b i

where x is the input training vector, w i T is the weight vector orthogonal to the hyperplane for the ith class, and b i is the bias of the decision plane. The distance on both sides of the hyperplane defines a margin that is maximized when the norm of the weight vector w is minimum. To find the optimal separating hyperplane, the SVM maximizes the margin by solving:

min_{w, b} (1/2) ||w||² subject to y t (w T x t + b) ≥ 1 for all t
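The three kernel variants evaluated in this work (linear, polynomial, RBF) can be compared with a short scikit-learn sketch; the synthetic data and hyperparameters below are stand-ins, not the study's tuned setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary data in place of the HFS feature vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Linear, polynomial (order 3 shown), and RBF kernels, as in the text.
scores = {}
for kernel, params in [("linear", {}), ("poly", {"degree": 3}), ("rbf", {})]:
    clf = SVC(kernel=kernel, **params).fit(X_tr, y_tr)
    scores[kernel] = clf.score(X_te, y_te)
print(scores)
```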

Performance measures. The evaluation of the model is based on performance measures including the confusion matrix, accuracy, sensitivity, specificity, precision, negative predictive value, receiver operating characteristic (ROC) curves, F 1 -score, and Matthews correlation coefficient. "TP" is the number of images detected with a specific promoter methylation status that actually have that status. "FN", also known as Type II error, is the number of images not detected with a specific promoter methylation status that truly have it, and "FP", also known as Type I error, is the number of images detected with a specific promoter methylation status that do not truly have it. "TN" is the number of images that neither have a specific promoter methylation status nor are labeled as having it by the classifier.
Sensitivity, S n , defines the usefulness of a classifier in identifying instances of the MGMT+ class. It can be computed by: S n = TP/(TP + FN) . This measure of performance is also known as Recall (R c ) or true positive rate (TPR).
Specificity, S p , defines the usefulness of a classifier in identifying instances of the MGMT− class. It can be computed by: S p = TN/(TN + FP) . The specificity is also known as true negative rate (TNR).
Precision, P r , defines the proportion of all cases testing positive that truly belong to the MGMT+ class. It can be calculated by: P r = TP/(TP + FP) . The precision is also known as positive predictive value (PPV).
Negative predictive value, NPV, defines the proportion of all cases testing negative that truly belong to the MGMT− class. It can be calculated by: NPV = TN/(TN + FN). F 1 -score, measured in the range [0, 1], is the harmonic mean of P r and R c , and it can be mathematically expressed as: F 1 -score = (2 × P r × R c )/(P r + R c ) . This measure is significant when class imbalance is present in the dataset.
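The measures above all derive from the four confusion-matrix counts, as in the following sketch (the counts in the example are arbitrary illustrative values):

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard measures derived from the binary confusion matrix."""
    s_n = tp / (tp + fn)                      # sensitivity / recall / TPR
    s_p = tn / (tn + fp)                      # specificity / TNR
    p_r = tp / (tp + fp)                      # precision / PPV
    npv = tn / (tn + fn)                      # negative predictive value
    acc = (tp + tn) / (tp + fn + fp + tn)     # accuracy
    f1 = 2 * p_r * s_n / (p_r + s_n)          # harmonic mean of Pr and Rc
    mcc = ((tp * tn - fp * fn) /
           ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5)
    return {"Sn": s_n, "Sp": s_p, "Pr": p_r, "NPV": npv,
            "Acc": acc, "F1": f1, "MCC": mcc}

m = binary_metrics(tp=90, fn=10, fp=5, tn=95)
print(m["Sn"], m["Sp"], m["Acc"])  # 0.9 0.95 0.925
```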

Results and discussion
We analyzed the proposed framework for categorizing mpMRI scans into either MGMT+ or MGMT− instances. After preprocessing, the GB images are used for latent and radiomic feature extraction, feature fusion is investigated, and the HFS so formed is forwarded to a strong classification algorithm. All experiments have been conducted using open-source libraries and standard programming tools on a Dell G7 laptop (Intel® Core™ i7 8th Generation CPU), 32 GB RAM, and a 6 GB GPU (NVIDIA GTX-1060 with 1280 CUDA cores).
Experimental setup. The setup was initiated by selecting appropriate parametric values to extract the latent and radiomic features. The experimentation has been carried out by employing tenfold Jackknife cross-validation on the BraTS-2021 dataset. The optimal heuristics selection is discussed in "Selection of optimal parameters". Contrast normalization was carried out in the range [0, 1] before the classification stage. The classification results ("Dataset") have been generated on the unseen test samples of the test folds, with labels hidden and used only for performance measurement. Sections "Performance analysis of DL-based latent feature extraction", "Analysis of parameters for Deep learning and radiomic feature methods", and "Performance analysis of individual FEMs" analyze the classification performance of the latent, individual, and hybrid features respectively. Section "Performance analysis of HFS" illustrates the time involved in feature extraction (two stages) and in classification during the training and testing phases. Section "Performance comparison" describes a comprehensive comparison of the proposed technique with existing schemes in terms of classification performance, followed by the shortcomings and future recommendations for cohorts working in areas of common interest.
Selection of optimal parameters. The classification performance of our framework, the MGMT-PMP system, depends on the tuning of numerous parameters for classification using ML classifiers. Accordingly, the analysis of optimal values for the DLRFE module, latent feature extraction, the radiomic feature modules, and the strong machine learning models is presented in detail. Only k-NN and SVM classifiers have been used, and the parameterization has been achieved based on classification accuracy. We have used these optimal values in the forthcoming sections.
k-NN model. The proposed model has been evaluated with four variants of the k-NN classifier ( k = 1, 3, 5, 7 ) to categorize the HFS into MGMT+ and MGMT− classes. In our case, the k-NN classifier has been found to perform best at k = 1, as illustrated in Table 5 for the BraTS-2021 dataset without RA. The performance of the k-NN model with RA is shown in Table 6. The confusion matrices for varying numbers of epochs are illustrated in Fig. 9 for radiogenomic classification of the BraTS-2021 dataset with the k-NN classifier with RA. The correlation between features of the same class resulted in a reduction of false events for test instances after RA was applied to the original dataset. Comparing the classification results with Table 5, it can be inferred that the application of RA improved the discriminative features (Fig. 6), thereby alleviating the characteristics based on similar or overlapping features. The confusion matrices in Fig. 9 show that excellent results are obtained from higher to lower epoch counts when the deep learning module is used alone for dynamic feature extraction and the k-NN is engaged for training and testing using these features.
The improvement in classification performance using RA has been found remarkable, as shown in Fig. 10, which is a visual representation of the quantitative analysis illustrated in Tables 5 and 6. Figure 10a explains the trends followed by the variation of training epochs employed by the DLRFE module with and without RA. Initially, the classification accuracy exhibits transitory behavior indicating a converging trend, followed by uniform behavior until the termination at 50 epochs. Figure 10b shows the ML-based classification accuracy using the k-NN model and latent features with and without RA. The plot represents on the x-axis the number of epochs used before the DL features were bled from the main architecture through the last fully connected layers ("Deep learning-based latent shape features"). It has been found that the classification accuracy is insensitive to the varying epochs, so that fewer epochs are sufficient to exploit the discriminative features derived for the proposed framework. The F 1 -score measures performance reliably in the case of imbalanced datasets such as BraTS-2021. Figure 10c illustrates the trends followed by the F 1 -score for the original and RA-based datasets with the variation of training epochs for the representation-learning-based discriminative features. It has been found that the RA-modified dataset yielded the best performance with latent features from the fifth epoch. The low-epoch features relate to underfitting of the model due to high bias. These features resulted in good generalization during machine learning, in line with Occam's razor principle that the simplest solution is the best. A higher number of epochs corresponds to training even on the noise distribution, causing overfitting and sacrificing generalization, so that the resulting model, with high variance, exhibits compromised performance on unknown instances. Figure 10d shows a similar trend for MCC, indicating the performance improvement using RA over the original dataset. Figure 10e shows the AUC (ROC) plot, with a marked improvement with RA. Figure 10f follows the P r trend, indicating the performance rise due to RA. Figure 10g shows a similar trend for R c with and without RA. A careful look at Table 5 indicates that the prediction discrimination threshold between the positive and negative sides appears to increase S n , by moving toward the latter and thus reducing FNs, as illustrated in Table 7, which shows the cross-validated (tenfold) distribution of the confusion matrix components for different epoch runs of latent feature extraction. At the same time, S p decreases due to increased FPs, by the same logic. A similar trade-off is encountered between P r and R c as between S p and S n respectively. There is a slight decrease in R c with RA due to a slight increase in FN, but at the same time S p rises from (70.48 ± 0.00)% to (97.08 ± 0.10)% (Tables 5 and 6 respectively, at epochs = 5 with k-NN for k = 1). The overall effect of RA is a balanced increase in S p and S n , i.e. (97.08 ± 0.10)% and (95.86 ± 0.14)% respectively, as illustrated in Table 6 (k-NN with k = 1, epochs = 5). It can be inferred that high S n (↓FNs) supports correct detection of positive cases, while high S p (↓FPs) supports correct identification of negative cases. Figure 10h compares the higher-to-lower NPV trends, which can be explained by the previous findings. Figure 10i is based on the NPVs with and without RA.

Table 5. Proposed system results (tenfold cross-validated) with epoch variation, depicting analysis-based performance using the k-NN classifier with k = 1 for the BraTS-2021 dataset (without RA). Significant values are given in bold.

Table 6. Proposed system results (tenfold cross-validated) with epoch variation, depicting analysis-based performance using the k-NN classifier with k = 1 for the BraTS-2021 dataset (with RA). Significant values are given in bold.
The latter has shifted the predictive threshold boundary toward the negative side so that FNs are reduced. In the RA-based NPV plot, the predictive threshold boundary between positive and negative subjects is balanced in such a way that the false events are almost equally divided across it, as illustrated in Table 7. This results in balanced PPV and NPV values. At relatively higher epoch values for extracting the deep learning features, the NPV is higher than the NPV without RA because of a relatively lower number of FNs (Table 7).

SVM model. Three important variants of the SVM classifier, using linear and polynomial kernels of order ∈ (2, 3, 4) and the radial basis function (RBF), have been experimented with to classify the HFS into MGMT+ and MGMT− classes using latent features. Polynomials of higher order (> 4) were not able to generate significant results. The results using SVM for binary classification of BraTS-2021 with RA, using latent features extracted at the fifth epoch, are illustrated in Table 8. It can be observed that the SVM-RBF has achieved the best performance among the variants for the proposed model. A further study has been carried out by changing the number of folds in the range [10, 20, …, 50] during cross-validation for both k-NN and SVM classifiers, in terms of accuracy and F 1 -score. The performance variation between different fold counts has been found insignificant and negligible. However, the time for classification rises with an increase in the number of folds. The optimum number of folds was found to be ten and is used throughout the article.
Performance analysis of DL-based latent feature extraction. The DLRFE module extracts the dynamic features from the fully connected layers using the mpMRI scans. The convolution process is based on local operators when the kernel size is small, and can be thought of as a subset of fully connected layers; the same holds for pooling. The fully connected layers, in contrast, have a global scope, with each neuron connected to every neuron in the next layer, allowing information to pass from each input to each output class; the final decision is thus based on the whole image 89 . The features in the first two fully connected layers, FC1 and FC2, are bled from the tuned architecture in the forward direction for the training and test partitions. These features are blended to form the latent features x latent . The weight distribution in the two fully connected layers is used to decide for either of the classes. The variation of the number of unknowns with a varying number of layers in the deep learning architecture is illustrated in Table 9. The parametric set finalized after empirical tuning is given in Table 10.
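The blending of FC1 and FC2 activations into x latent can be sketched as a simple concatenation; the activation sizes below are hypothetical, since the actual dimensions are set by the tuned DLRFE architecture:

```python
import numpy as np

# Hypothetical activation sizes for the two fully connected layers.
n_samples, n_fc1, n_fc2 = 8, 64, 32
rng = np.random.default_rng(0)
x_fc1 = rng.random((n_samples, n_fc1))   # features bled from FC1
x_fc2 = rng.random((n_samples, n_fc2))   # features bled from FC2

# Blend the two feature maps into one latent vector per scan.
x_latent = np.concatenate([x_fc1, x_fc2], axis=1)
print(x_latent.shape)  # (8, 96)
```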

Analysis of parameters for deep learning and radiomic feature methods. The deep learning (latent) and radiomic feature extraction, constituting Stages 1 and 2 of the proposed methodology, involve several parameters whose optimum values need to be selected before the formation of the HFS. We address the selection of appropriate parameter values through an experimental process for each category of feature extraction in a sequential manner.
Feature space visualization of deep learning-based FC1 and FC2. The t-SNE plots, a variant of Stochastic Neighborhood Embedding (SNE), for radiogenomic classification of the test cases using the deep learning-based features, namely x FC1 , x FC2 , and x latent , are illustrated in Fig. 11a-c and d-f using the 1-NN classifier ( k = 1), "Performance analysis of DL-based latent feature extraction", with the original dataset and with the reduced dataset after RA, respectively. For each of the high-dimensional feature vectors, the mapping to a lower (two-dimensional) plane is carried out in a non-linear manner. The improvement in visualization is attributed to the alleviation of the tendency of points to crowd in the central region. Although there are marked inclinations toward false events in FC1 and FC2 without RA due to overlapping regions (Fig. 11a-c), the fused DL features improved the results. Similarly, as illustrated in Fig. 11d-f using RA, the features represent the discrimination of vectors lying in either of the classes. Further improvement, by the exploitation of features, is carried out by our proposed system with the notion of finding the complex hyperplane that discriminates between the MGMT+ and MGMT− classes.
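The non-linear mapping to two dimensions described above can be sketched with scikit-learn's t-SNE; the two Gaussian clusters below are synthetic stand-ins for the latent vectors of the two classes, and the perplexity is an illustrative choice:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical latent features for test cases: two separated clusters
# standing in for the MGMT- and MGMT+ groups.
rng = np.random.default_rng(0)
x_latent = np.vstack([rng.normal(0.0, 1.0, (30, 64)),
                      rng.normal(4.0, 1.0, (30, 64))])

# Non-linear embedding of the high-dimensional vectors into a 2-D plane.
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(x_latent)
print(emb.shape)  # (60, 2)
```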
GLCM feature extraction process. We experimented with five distance values in the range [1, 5], each corresponding to a GLCM over four directions of θ, with each GLCM value taken as the directional average for a specific value of d. The independent values of θ, without averaging, resulted in four GLCMs, each corresponding to a specific direction; a total of 20 GLCMs were thus generated without direction averaging. Further, several combinations of GLCMs, and the hybrids of features thereof based on cross-averaging of GLCMs using different combinations of θ, were also investigated for the classification of mpMRI scans into MGMT+ and MGMT− classes. The performance of the GLCM features varies with the value of d. The array of offset matrices used in this experimentation to try different combinations of d and θ is shown in Fig. 12a. The relationship between d and accuracy is illustrated in Fig. 12b. The best results for our problem were found at d = 3 with direction averaging over four angles incremented by 45°, without the repeated symmetric view. Initially, the classification accuracy shows a rising trend, and finally the accuracy falls off with a local rising surge. The peak value of accuracy was found at d = 3; therefore, this value of the relative distance from the neighboring pixels has been used for the feature extraction process.
HOG feature extraction process. An investigation was carried out to tune the parameters of the feature descriptor for our specific problem of radiogenomic classification. We tried individual HOG parameters, evaluated by classification accuracy, across the number of bins and the block and cell sizes, as illustrated in Fig. 13. Figure 13a shows the effect of varying the number of bins. Initially, the classification accuracy shows a rising trend, and finally the accuracy falls off, with the peak value found on the uphill of the initial part. There is a compromise between the number of features and generalization: a higher number of features, due to a higher number of bins, resulted in dropped performance due to the curse of dimensionality. Figure 13b shows the effect of block size over the options possible for the given input image size. The larger block size yielded the best performance, with the number of bins selected as 9 and block size as 32 × 32. Figure 13c shows the plot of cell size against classification accuracy. A larger cell size, below the maximum possible value, resulted in increased classification performance. Features exploiting large-scale spatial characteristics are based on a large HOG cell size; consequently, small-scale features are lost with large cell sizes. A compromise therefore exists between the cell size and the level of detail required, depending on the inherent structure of the problem. Similarly, local illumination changes are captured at a smaller HOG block size, as important information is lost when averaging pixel intensities over a large block (with a relatively larger number of pixels). Consequently, significant changes in the local pixels can be explored by reducing the block size.
The parameters that play a key role in attaining the best individual performance were determined using the k-NN classifier, and the selected optimum values are illustrated in Table 11 out of the candidate options. The optimum results for our problem structure were found using a cell size of 32 × 32 and a block size of 2 × 2 with 9 bins, resulting in x HOG = 36.
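With those settings, the HOG descriptor length of 36 follows directly, as this scikit-image sketch shows on a toy 64 × 64 input (the random image is a stand-in for a preprocessed mpMRI slice):

```python
import numpy as np
from skimage.feature import hog

# Toy 64x64 slice; real inputs are the preprocessed mpMRI scans.
img = np.random.default_rng(0).random((64, 64))

# Parameters mirroring the reported optimum: 9 bins, 32x32-pixel cells,
# 2x2 cells per block -> one block position of 2*2*9 = 36 features.
x_hog = hog(img, orientations=9, pixels_per_cell=(32, 32),
            cells_per_block=(2, 2), feature_vector=True)
print(x_hog.size)  # 36
```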

LBP features.
We investigated the parameters of the LBP feature extraction module that significantly affect its performance. The trials were based on the number of neighbors p, the radius r, and the receptive area, to gain insight into the relationship between these parameters and the model performance for radiogenomic classification. Figure 14 shows the effect of the individual LBP parameters on radiogenomic classification. Figure 14a shows the trend of classification accuracy with p: increasing p encodes more detail about the surroundings of each pixel. The optimum classification accuracy is 70.44%, associated with a minimum number of features of 59 at r = 1 . Similarly, Fig. 14b illustrates how varying r, the boundary of the circular pattern from which the neighbors are selected, influences performance in terms of classification accuracy. The accuracy is 92.11% using the optimum value of p for r = 5 . The optimal parameters for the LBP feature extraction module are illustrated in Table 12. The experimentation for the LBP parametric analysis was carried out using the 1-NN classifier with RA, which significantly influenced the performance of the framework.

Table 11. Optimal parameters of HOG feature extraction module.
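The LBP step above can be sketched with scikit-image; the neighbor count, radius, and the uniform-pattern histogram below are illustrative choices, not necessarily the study's final settings:

```python
import numpy as np
from skimage.feature import local_binary_pattern

# Toy 8-bit slice standing in for a preprocessed mpMRI scan.
img = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)

# Uniform LBP with p neighbors on a circle of radius r; the histogram of
# pattern codes is the feature vector (p + 2 bins for the 'uniform' method).
p, r = 8, 5
codes = local_binary_pattern(img, P=p, R=r, method="uniform")
hist, _ = np.histogram(codes, bins=np.arange(p + 3), density=True)
print(hist.size)  # 10
```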

Performance analysis of individual FEMs. Table 13 illustrates the corresponding results using BraTS-2021 with and without RA. Each of the extraction modules has been found to yield reasonable performance-measuring indices. However, a close analysis of the results reveals that the proposed latent features with RA perform relatively better across all the performance metrics. Similarly, the HOG features, tuned to the structure of the problem with RA, also produce adequate classification results. Figure 15 shows a comparison of the various feature sets in terms of their size and classification accuracy. The FC2 features, from the proposed deep learning architecture, have a marked effect in comparison to their share (9.35%) in the HFS. The next best in this respect is the HOG set, with parameters optimized for our problem structure, whose 5.26% share in the HFS results in remarkable individual accuracy.
Performance analysis of HFS. This set of experiments is based on a combinatorial strategy using individual features from both stages, i.e. latent features from Stage 1 and radiomic features from Stage 2. The distribution of the HFS features into various categories, and their proportionate shares, is illustrated in Table 14. The two-stage HFS formation for the proposed radiogenomic classification framework is illustrated in Fig. 16. The feature extraction modules are fused in this trial through manifold combinations, each consisting of two, three, four, or five feature types. Each experiment is evaluated using BraTS-2021 with RA across various performance measures. The diversity in the HFS, enhanced by the application of RA, is combined by exploiting five distinct feature extraction modules. Each of the extraction modules, having its own solution domain, extracts the most favorable and discriminative characteristics from the mpMRI scans. When the features originating from the RA-based reduced BraTS-2021 dataset were fused, they collectively generated better results: an overall accuracy increase of 17.59% and 17.99% was observed for the individual features and the hybrid strategies respectively, compared with the results without RA. From another perspective, a sensitivity analysis of the epochs used for the proposed deep learning feature extraction, and consequently for the ML-based classification, is illustrated in Table 18 on the radiogenomic classification dataset with RA. Epochs beyond five only marginally influence (a 0.8% increase) the performance-measuring standards, with the radiomic modules acting as static feature extraction techniques.
Computational time analysis of the framework. It is worthwhile to analyze the proposed framework for its space and time complexity in terms of CPU time requirements. To check the performance of the framework, the CPU time has been considered for the various feature extraction modules, individually and in groups, using ML/DL classifiers after applying RA, and for training/testing on a per-image basis. The hybrid feature extraction time is simply the sum of the time taken by the individual FEMs.
The time requirements (with epochs) for deep learning feature extraction in minutes, machine learning training time (MLTrgT) in minutes, and machine learning testing time (MLTstT) in ms/image are illustrated in Fig. 17a-c as DLFET, MLTrgT, and MLTstT respectively. Figure 17a shows that the latent feature extraction time rises with epochs, and five epochs have been found optimum ("Selection of optimal parameters") for the HFS to culminate in high classification accuracy. Further, the application of RA resulted in a reduction of feature extraction time, so a time gap is observed between the two trends.
Figure 17c shows the time in milliseconds for testing on a per-image basis using the 1-NN classifier. The per-image testing-time curve always converges quickly in the case of RA. The trend of feature extraction time per image is demonstrated in Fig. 18a. Although a relatively higher extraction cost is observed for LBP, attributed to its large radius, which was found optimum considering its contribution on an individual basis ("LBP features"), the proposed system is computationally tractable, requiring only 8.67 ms for HFS extraction, owing to the small time involvement of each feature extraction module. However, the preprocessing times, the training time for deep feature extraction, and the RA times are not included in these computations, as they are incurred once and never repeated unless new test data is involved, which must follow the same norms defined earlier. Further, the higher time cost of LBP is justified by its performance rise compared to the other FEMs. Figure 18b illustrates the feature vector size for individual and hybrid features, where the latter is simply the summation of the former. Figure 18c shows the classification time for the individual FEMs and their hybrid. The classification time varies with the length of the feature vector, and the classification time of the HFS is high compared to that of the individual FEMs.
Performance comparison. The classification performance of our proposed framework has been compared with existing solutions for brain tumor classification that use well-known radiomic features on the CE-MRI dataset. The BraTS-2021 dataset, based on radiogenomic data, has, to the best of our knowledge, no published work so far. In this context, we implemented four techniques 17,40,79,90 from the contemporary literature with optimal parameters and compared them on the BraTS-2021 dataset, using accuracy as the comparative performance measure. The tabular comparison is illustrated in Table 19, which shows the relatively better performance of the proposed framework over the others. The radiomic features are texture oriented, while the latent features are problem-structure oriented. The proposed system, however, uses discriminative HFS feature vectors, wherein each module acquires knowledge associated with the problem structure. The manifold combinatorial fusion of different modules reinforces the multi-solution domain, culminating in relatively superior performance measures.
Regarding the effect of RA on radiogenomic classification accuracy, an improvement of 16.57% is observed when using the reduced dataset produced by RA. The superior performance of the proposed MGMT-PMP system with RA is based on three key facts: first, a simpler model, which reduces the complexity of the algorithm owing to the more focused, highly discriminative features obtained from the dataset through the two-stage HFS; second, a reduced computational cost due to the rejection of 27.18% of instances; and third, improved generalization on test instances thanks to more discriminative feature vectors.
Future challenges and recommendations. The main challenge for the proposed system is that it needs a clinical trial in which its output on patient data is validated against expert opinion, including a second opinion. Another challenge is to define a systematic clinical workup culminating in the management of patients with GB. Moreover, a study is required to evaluate the impact of the proposed technique on the survival rate of GB patients within an associated clinical management system.
Deep learning-based prediction strategies, serving here as a source of latent feature extraction, are complex as well as opaque, with performance metrics, viz. accuracy, F 1 -score, sensitivity, and specificity, dependent on an enormous parametric space. In this context, explainable AI (XAI) 91,92 proposes that deep learning architectures be examined as transparent white boxes, particularly for multi-modal data fusion.

Conclusions
In this research activity, we proposed a novel classification system, MGMT-PMP, bridging imaging and genomics for predicting radiogenomic classes based on the genetic variation associated with response to radiation. In this context, a novel fine-tuned CNN architecture, namely the DLRFE module, is proposed for latent feature extraction, capturing features that dynamically bleed the quantitative knowledge related to the spatial distribution and the size of the tumorous structure from the brain anatomy in mpMRI scans. This finally yields a dynamic feature vector that is used in the radiogenomic classification. Further, in the proposed scheme, several radiomic features, namely GLCM, HOG, and LBP, are extracted from mpMRI scans of the BraTS-2021 challenge dataset. The application of the novel rejection algorithm has been found very effective in selecting and isolating instances from the main dataset. The fusion of the latent features with the radiomic features, forming a hybrid feature collection as the HFS, is then classified using k-NN with different numbers of neighbors. Working with mpMRI scans, k-NN ( k = 1) resulted in 97.28% training and 96.94% test classification accuracy. We have compared the proposed technique with numerous existing brain tumor classification techniques on the BraTS-2021 dataset and observed a significant increase in radiogenomic classification performance using RA. It has been established through the results that dynamic- and static-feature fusion, using the HFS, culminated in a classification performance improvement compared to the individual features. This research study has many implications, extendable in multiple directions. First, it links genomic variation with radiomic-based data characterization algorithms, building a bridge between the two independent areas of research.
Second, it may find comprehensive use in real-time brain tumor surgery, guiding the removal of leftover tumor cells by chemotherapeutic treatment as a timely alternative to radiotherapy. Third, the proposed system can serve, without any dedicated machine, as a portable low-cost solution supporting brain surgery. Further, this study can have a great impact on the clinical management and survival of GB patients.

Data availability
All data mentioned in this article is publicly available at: https://www.kaggle.com/competitions/rsna-miccai-brain-tumor-radiogenomic-classification/data.