## Introduction

Renal cell carcinoma (RCC) ranks 15th among the most frequent cancers, according to the GLOBOCAN report1. RCC comprises 3% of all malignant neoplastic cases in adults and approximately 90% of malignant kidney tumors2. According to American Cancer Society estimates, 1 in 46 men and 1 in 80 women will be diagnosed with RCC during their lifetime. The 5-year survival rate of patients with RCC is 73.7%3. Smoking, obesity, and hypertension are important risk factors for RCC occurrence4. The incidence and mortality rates of RCC highlight the importance of screening programs and of developing reliable biomarkers for early detection of its subtypes5.

The histological subtypes of renal cancer include clear-cell RCC (ccRCC or KIRC, 60–80% of all patients), papillary RCC (pRCC or KIRP, 10–15%), chromophobe RCC (chRCC or KICH, 5–10%), and other rare subtypes (< 1%)6. KIRC is characterized by mutations in the VHL (von Hippel–Lindau) gene and the loss of chromosome 3p. The clinical severity of KIRC is higher than that of the KIRP and KICH subtypes (5-year survival rates of 55–60%, 80–90%, and 90%, respectively)7, owing to the lack of effective biomarkers for earlier detection. In the early stages, KIRC is frequently asymptomatic, and 25–30% of patients are diagnosed at the metastatic stage, which results in a high mortality rate8. KIRP is characterized by the loss of chromosome 9p and trisomy of chromosomes. In KIRP, a few subgroups of patients have a satisfactory treatment outcome, while for a wide range of patients promising treatment strategies still need to be developed9,10. Small, incidentally detected kidney masses pose a major diagnostic dilemma since a proportion of them are benign and can be managed conservatively11. KICH, a lower-risk RCC, is identified by the loss of chromosomes and poses only a slight risk to a patient if cautiously managed with routine surveillance rather than surgery11. KICH has enormous potential to be diagnosed earlier compared to other RCC subtypes due to its variable long-term outcome12. Because of these distinct clinical and biological behaviors, accurate differentiation and detection of RCC subtypes via non-invasive and precise biomarkers would help physicians make appropriate therapeutic decisions, which can range from total or partial nephrectomy to close follow-up.

It has been shown that candidate biomarkers are not reliable for the diagnosis and prognosis of different RCC subtypes13,14,15,16 and none of them can be employed in clinical application17. Furthermore, computed tomography (CT) and abdominal ultrasound have problems including high costs and low sensitivity in the diagnosis of small tumors4,18. Thus, the development of non-invasive and accurate screening for different types of RCC is a necessity. Comprehensive understanding of the molecular mechanisms of RCC subtypes is the main challenge in RCC research to identify novel and reliable molecular biomarkers.

This study was designed to introduce panels of mRNAs and miRNAs for discriminating different subtypes of RCC and to detect notable molecules that have significant pathological roles in RCC subtypes through machine learning and artificial intelligence approaches. First, we read and pre-processed the RNA-sequencing data. Next, to identify candidate features (mRNAs and miRNAs), we applied feature selection based on filter and graph algorithms. Then, to evaluate the selected candidate features, we employed a deep learning model to classify the subtypes of RCC. Finally, an association rule mining algorithm was used to detect notable features that play significant roles in triggering the molecular mechanisms underlying RCC subtypes.

## Material and method

### Material

In this study, renal cell carcinoma data, including mRNA, miRNA, and clinical data, were downloaded from the GDC portal (https://portal.gdc.cancer.gov) of The Cancer Genome Atlas (TCGA) project19. The downloaded data were used to study the molecular mechanisms of RCC subtypes, including ccRCC (KIRC), pRCC (KIRP), and chRCC (KICH). In the TCGA renal cell carcinoma project, expression values of 60,482 mRNAs and 1,881 miRNAs were reported for each patient. mRNA expression and clinical data were provided for 611 ccRCC patients, 321 pRCC patients, and 89 chRCC patients. Moreover, miRNA expression and clinical data were reported for 616 patients in the ccRCC subtype, 326 patients in the pRCC subtype, and 91 patients in the chRCC subtype. Information on each subtype is presented in more detail in Table 1. This study was conducted according to the principles of the Declaration of Helsinki (2013).

### Method

The proposed method contained five main steps: reading, preprocessing, feature selection, classification, and filtering (association rule mining), as shown in Fig. 1. In the 1st and 2nd steps, miRNA, mRNA, and clinical data were read and preprocessed using the necessary preprocessing methods. In the 3rd step, irrelevant and redundant features (mRNAs and miRNAs) were removed from the data using the proposed algorithms. In the 4th step, the deep classifier model was applied to distinguish the defined groups based on the candidate features obtained in the previous step. In the 5th step, the association rule mining algorithm was employed to identify the predominant features among the candidate features that play a crucial role in each group.

The miRNA and mRNA data were each represented as a matrix of rows (samples) and columns (features), with 1033/1881 and 1021/60,482 rows/columns, respectively. Next, redundant features were removed from the data: 2256 of the 60,482 mRNAs and 336 of the 1881 miRNAs. Then, the hold-out method was used to split the data into three parts (training, validation, and test data) containing 70%, 10%, and 20% of the samples, respectively. Finally, mRNA and miRNA values were normalized. We applied the z-score [Eq. (1)] in the feature selection step and min–max scaling to the [0 1] range [Eq. (2)] in the classification step, according to the needs of the algorithms used in these steps.

$$y= \frac{x-\mu }{\sigma }$$
(1)
$$y= \frac{1}{Max-Min}(x-Min)$$
(2)

In Eqs. (1) and (2), $$y$$ and $$x$$ are the normalized and raw feature values, respectively, and µ, σ, Min, and Max are the mean, standard deviation, minimum, and maximum of x, respectively.
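The hold-out split and the two normalizations in Eqs. (1) and (2) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the random seed, the toy expression values, and the use of the population standard deviation for σ are all assumptions.

```python
import random
import statistics

def holdout_split(n_samples, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    # The test part receives whatever remains after the first two cuts.
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

def z_score(values):
    """Eq. (1): y = (x - mu) / sigma (undefined for a constant feature)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std; an assumption
    return [(x - mu) / sigma for x in values]

def min_max(values):
    """Eq. (2): y = (x - Min) / (Max - Min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# Toy values for a single feature (hypothetical, not TCGA data).
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
train, val, test = holdout_split(1033)  # 1033 = number of miRNA samples
standardized = z_score(expression)
scaled = min_max(expression)
```

With 1033 samples this split yields 723 training, 103 validation, and 207 test indices; the exact rounding used in the study is not stated in the text.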

### Feature selection

In feature selection, we applied a new graph-based method for feature selection in mRNA and miRNA data separately20. Therefore, a suitable subset of candidate features was identified using the proposed algorithms. In miRNA data, a graph-based algorithm extracted the suitable subset of candidate features among 1545 features. However, in mRNA data, we first utilized the filter-based method for primary feature selection due to the existence of 58,226 features that could cause a high computational cost in the graph-based algorithm. Thus, the filter-based approach reduced the mRNA data dimension from 58,226 to 1000, and a graph-based algorithm extracted the suitable subset of candidate features among 1000 features. In the following, the proposed algorithms are specified in more detail.

### Filter method

Filter-based methods perform feature selection by calculating an importance measure for each feature separately. These algorithms do not use any evaluation tool such as a classifier and are therefore called classifier-independent techniques. Filter-based methods have low computational and time costs; however, they do not consider feature interactions in the feature selection process. In this sub-step, we applied a filter-based method for primary feature selection in the mRNA data. The filter-based method helps to remove some irrelevant features and reduces the dimensionality before the graph-based process. Many filter methods have been introduced and applied to data from different domains, but only a few are suitable for high dimension and low sample size (HDLSS) data21. In this regard, we used the $$AMGM$$ measure to evaluate the importance of features. The $$AMGM$$ filter has shown strong potential in HDLSS data22. The $$AMGM$$ measure is calculated by Eq. (3).

$$AMGM{:}\; {R}_{i}= \frac{{AM}_{i}}{{GM}_{i}} \in [1, +\infty )$$
(3)

In Eq. (3), $${AM}_{i}$$ and $${GM}_{i}$$ are the arithmetic mean and the geometric mean of the ith feature, as shown in Eqs. (4) and (5), respectively. $${R}_{i}$$ represents the dispersion of the ith feature among all samples. A higher $${R}_{i}$$ indicates a higher dispersion and a feature that is more relevant to the defined phenotype. When $${R}_{i}$$ is close to one, the ith feature has low relevancy to the defined phenotype.

$${AM}_{i}= {\overline{x} }_{i}= \frac{1}{n} \sum_{j=1}^{n}{x}_{ij}$$
(4)
$${GM}_{i}= {\left(\prod_{j=1}^{n}{x}_{ij}\right)}^{\frac{1}{n}}$$
(5)

If the ith feature contains zero among reported values, then $${GM}_{i}=0$$ based on Eq. (5) and $$AMGM$$ measure will be inefficient. The modified version of $$AMGM$$ is defined to avoid this problem based on Eq. (6). In the revised version, the exponential function was applied to features in the numerator and denominator of the $$AMGM$$ formula.

$$AMGM{:}\; {R}_{i}= \frac{\frac{1}{n}\sum_{j=1}^{n}\mathrm{exp}({x}_{ij})}{{\left(\prod_{j=1}^{n}\mathrm{exp}({x}_{ij})\right)}^{\frac{1}{n}}}= \frac{1}{n \times \mathrm{exp}({\overline{x} }_{i})}\sum_{j=1}^{n}\mathrm{exp}({x}_{ij})$$
(6)
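The original and the exponential form of the $$AMGM$$ measure can be sketched in a few lines. This is an illustrative sketch under assumptions: the function names and the toy matrix are hypothetical, each row of the matrix is taken to hold one feature across samples, and the exponential form is evaluated as the mean of exp(x − x̄), which is algebraically equal to Eq. (6) but less prone to overflow.

```python
import math

def amgm(values):
    """Eq. (3): arithmetic mean over geometric mean (all values must be > 0)."""
    n = len(values)
    am = sum(values) / n
    gm = math.exp(sum(math.log(x) for x in values) / n)
    return am / gm

def amgm_exp(values):
    """Eq. (6): mean(exp(x)) / exp(mean(x)), which tolerates zeros in x."""
    n = len(values)
    mean = sum(values) / n
    # exp(x - mean) gives the same ratio and avoids overflow for large x.
    return sum(math.exp(x - mean) for x in values) / n

def top_features(matrix, k):
    """Return indices of the k features with the highest Eq. (6) score.

    matrix[i] holds the values of feature i across all samples.
    """
    scores = [amgm_exp(row) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical toy matrix: three features measured in four samples.
toy = [
    [0.0, 0.0, 0.0, 0.0],  # constant feature, score exactly 1
    [0.0, 1.0, 0.0, 1.0],  # moderate dispersion
    [0.0, 3.0, 0.0, 3.0],  # high dispersion
]
selected = top_features(toy, 2)
```

A constant feature scores exactly one, the lower bound of the measure, so it is ranked last by the filter.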

### Graph method

Recently, graph-based methods have been used for feature selection20,23. These methods represent the search space as a graph of features, and the principles of graph theory are then employed to select the most relevant attributes. The graph-based method contains three sub-steps: graph representation of the feature space, community detection/graph clustering, and selection of the significant nodes in each cluster.

#### Graph representation

In the first phase, the feature set is mapped to the graph space. The graph is defined by $$\mathrm{G }= <\mathrm{F},\mathrm{ E}>$$, in which $$F=\{{F}_{1},{F}_{2},\dots , {F}_{n}\}$$ and $$E=\{\left({F}_{i}, {F}_{j}\right): {F}_{i}, {F}_{j} \epsilon F\}$$ are the sets of nodes and edges, respectively. Each feature is represented by a node, and the relationship between two features defines an edge in the graph structure. In this work, $${W}_{ij}$$ is used to measure the relationship between two features $${F}_{i}$$ and $${F}_{j}$$, as shown in Eq. (7).

$${W}_{ij}= \left\{\begin{array}{ll}\beta \times Relevancy-\left(1-\beta \right)\times Redundancy & \text{if } i\ne j \\ 1 & \text{otherwise}\end{array}\right.$$
(7)
$$Relevancy= \frac{\mathrm{AMGM}({F}_{i}) +\mathrm{AMGM}({F}_{j})}{2}$$
(8)
$$Redundancy= \left|\frac{\langle {F}_{i},{F}_{j}\rangle }{\Vert {F}_{i}\Vert \times \Vert {F}_{j}\Vert }\right|=\left|\mathrm{cos}({\theta }_{{F}_{i},{F}_{j}})\right|$$
(9)

$${W}_{ij}$$ contains two parts: relevancy and redundancy. In the relevancy measure [Eq. (8)], $$AMGM\left({F}_{i}\right)$$ and $$AMGM({F}_{j})$$ are the $$AMGM$$ values of features $${F}_{i}$$ and $${F}_{j}$$, respectively. Unlike Bakhshandeh et al., who used the symmetric uncertainty measure20, we applied the $$AMGM$$ to measure relevancy because of its high potential in HDLSS data. In the redundancy measure, the similarity of two features is calculated as the cosine similarity by Eq. (9), in which $$\langle ,\rangle$$ and $$\Vert .\Vert$$ are the inner product and Euclidean norm, respectively. Also, β ($$0<\beta <1$$) is a user-defined parameter that controls the contribution of each part.

$${\widehat{W}}_{ij}$$ is the normalized relationship between two features in the range [0 1], calculated by the SoftMax scaling function, as shown in Eq. (10). We employed $${\widehat{W}}_{ij}$$ as the weight of the edge in the graph structure. In the SoftMax scaling, $$\overline{W }$$ and $$\sigma$$ are the mean and standard deviation of the weights, respectively.

$${\widehat{W}}_{ij}= \frac{1}{1+\mathrm{exp}(- \frac{{W}_{ij}-\overline{W}}{\sigma })}$$
(10)
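Eqs. (7)–(10) can be combined into a small sketch that builds the weighted graph from feature vectors. The names below are hypothetical, the relevance scores are assumed to be precomputed $$AMGM$$ values, and the mean and standard deviation in Eq. (10) are assumed to be taken over all matrix entries including the diagonal, which the text does not specify.

```python
import math

def cosine_redundancy(u, v):
    """Eq. (9): absolute cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return abs(dot / (norm_u * norm_v))

def edge_weights(features, relevance, beta):
    """Eqs. (7)-(8): W_ij = beta*relevancy - (1-beta)*redundancy, W_ii = 1."""
    n = len(features)
    w = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rel = (relevance[i] + relevance[j]) / 2
            red = cosine_redundancy(features[i], features[j])
            w[i][j] = w[j][i] = beta * rel - (1 - beta) * red
    return w

def softmax_scale(w):
    """Eq. (10): logistic squashing of the standardized weights."""
    flat = [x for row in w for x in row]
    mean = sum(flat) / len(flat)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in flat) / len(flat))
    return [[1 / (1 + math.exp(-(x - mean) / sigma)) for x in row]
            for row in w]

# Hypothetical toy input: three features observed in two samples.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relevance = [1.2, 1.2, 1.0]  # assumed precomputed AMGM scores
w = edge_weights(features, relevance, beta=0.6)
w_hat = softmax_scale(w)
```

In this toy case the orthogonal pair (features 0 and 1) receives a larger weight than the partly redundant pair (features 0 and 2), which is the intended effect of subtracting the redundancy term.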

#### Community detection

In the second phase, we applied a community detection algorithm to cluster the graph space. A community detection algorithm performs clustering by finding groups of nodes that are densely connected internally. Nodes in the same community have highly similar properties. Community detection assists in better understanding sophisticated networks, such as genomic data networks. In recent years, researchers have introduced various community detection algorithms, and the Louvain algorithm is one of the fastest among them. The Louvain algorithm performs the clustering process by maximizing the modularity objective function, which is used to compare the quality of partitions during the community detection process24. In addition, the Louvain algorithm is simple to implement.

In a graph with n nodes, the Louvain algorithm starts with n communities by allocating each node to its own community. The algorithm then randomly selects a node, moves it from its community to another one, and calculates the modularity gain. This process is repeated until no further improvement occurs, at which point the algorithm finishes.
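The modularity objective and the node-moving loop described above can be illustrated with a deliberately simple sketch. It is not a full Louvain implementation: it covers only the local-moving phase, omits the community aggregation step, visits nodes in index order instead of at random, and recomputes modularity from scratch for clarity rather than speed. All names and the toy graph are hypothetical.

```python
def modularity(weights, communities):
    """Newman modularity Q of a partition of an undirected weighted graph.

    weights[i][j] is the symmetric edge weight (0 means no edge);
    communities[i] is the community label of node i.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += weights[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

def local_moving(weights, communities):
    """Move single nodes to neighbouring communities while Q increases."""
    communities = list(communities)
    improved = True
    while improved:
        improved = False
        for node in range(len(weights)):
            best_label = communities[node]
            best_q = modularity(weights, communities)
            neighbour_labels = {communities[j]
                                for j, w in enumerate(weights[node])
                                if w > 0 and j != node}
            for label in neighbour_labels:
                trial = communities[:]
                trial[node] = label
                q = modularity(weights, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_label, best_q = label, q
            if best_label != communities[node]:
                communities[node] = best_label
                improved = True
    return communities

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
graph = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    graph[a][b] = graph[b][a] = 1.0

start = list(range(6))  # every node starts in its own community
result = local_moving(graph, start)
```

On this toy graph the loop groups each triangle into one community, raising the modularity from a negative value for the singleton partition to 5/14.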

#### Node selection

In the third phase, we need to select the significant nodes in each community/cluster. In this regard, the Maximum Independent Set (MIS) concept from graph theory is utilized for node selection in the subgraphs. A subset of the vertex set of a graph is independent if and only if it includes no pair of adjacent vertices. Identifying the independent set with the maximum size is an NP-hard optimization problem, so an efficient exact algorithm for finding an MIS of a graph is unlikely to exist. In this study, we applied the algorithm proposed by R. Boppana et al.25 to obtain the MIS in each community/cluster. We also defined the adjacency matrix used by the R. Boppana algorithm. It is a Boolean graph matrix calculated by Eqs. (11) and (12), where γ is a user-defined parameter ($$0<\gamma <1$$).

$$Adjacency \, Matrix={\left[{a}_{ij}\right]}_{n\times n}$$
(11)
$${a}_{ij}= \left\{\begin{array}{ll}1 & {\widehat{W}}_{ij}> \gamma \text{ and } i\ne j \\ 0 & \text{otherwise}\end{array}\right.$$
(12)
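The thresholding in Eqs. (11) and (12) and the idea of picking mutually non-adjacent nodes inside one community can be sketched as below. The independent-set routine is a simple greedy heuristic used as a stand-in; it is not the Boppana algorithm cited in the text, and the function names and toy weights are hypothetical.

```python
def adjacency_matrix(w_hat, gamma):
    """Eqs. (11)-(12): a_ij = 1 when W_hat_ij > gamma and i != j, else 0."""
    n = len(w_hat)
    return [[1 if i != j and w_hat[i][j] > gamma else 0 for j in range(n)]
            for i in range(n)]

def greedy_independent_set(adjacency, nodes):
    """Greedy maximal independent set on the subgraph induced by `nodes`.

    Illustrative stand-in only; NOT the algorithm used in the study.
    """
    remaining = set(nodes)
    chosen = []
    while remaining:
        # Take the node with the fewest remaining neighbours (ties: lowest id).
        node = min(remaining,
                   key=lambda v: (sum(adjacency[v][u] for u in remaining), v))
        chosen.append(node)
        # Drop the chosen node and all of its neighbours from the pool.
        remaining -= {u for u in remaining if adjacency[node][u]} | {node}
    return chosen

# Toy normalized weights for a path 0-1-2-3 (strong links only along the path).
w_hat = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1],
    [0.1, 0.9, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
adj = adjacency_matrix(w_hat, gamma=0.3)
picked = greedy_independent_set(adj, nodes=[0, 1, 2, 3])
```

Here nodes 0 and 2 are selected: no two selected nodes are linked by an edge whose normalized weight exceeds γ.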

### Classification

In this step, we applied a classifier to evaluate the candidate features selected in the previous step. A high classification accuracy (or any other user-defined measure) indicates that the feature selection method succeeded in choosing the relevant attributes; a low value indicates that it failed to identify them.

To this end, we constructed a self-organizing deep auto-encoder model to classify the data based on the candidate features. A self-organizing deep auto-encoder is a specific type of deep auto-encoder that can determine its structure automatically, including the number of neurons and layers26. A detailed description of the self-organizing deep auto-encoder is available in the Supplementary Methods. First, the training of the deep model and the model selection were performed using the training and validation data, respectively. Next, the classification performance was estimated on the test data. The accuracy, F1-score, and AUC-ROC were used to evaluate the classifier performance; the accuracy and F1-score are defined in Eqs. (13)–(16).

$$Accuracy= \frac{TP+TN}{TP+FN+FP+TN}\times 100$$
(13)
$${F}_{1}-score= 2\frac{Precision\times Recall}{Precision+Recall}$$
(14)
$$Recall=\frac{TP}{TP+FN}$$
(15)
$$Precision=\frac{TP}{TP+FP}$$
(16)

where TP, TN, FP, and FN are True Positive, True Negative, False Positive, and False Negative, respectively. AUC-ROC is the area under the Receiver Operating Characteristic (ROC) curve.
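Eqs. (13)–(16) are stated for two classes; for three subtypes one common reading is to compute them per class in a one-vs-rest fashion. The sketch below assumes that reading, and the confusion matrix in it is invented for illustration and is not a result from this study.

```python
def one_vs_rest_counts(confusion, k):
    """Derive TP, TN, FP, FN for class k from a multi-class confusion matrix.

    confusion[r][c] = number of samples of true class r predicted as class c.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

def binary_metrics(tp, tn, fp, fn):
    """Eqs. (13)-(16): accuracy (in percent), F1-score, recall, precision."""
    accuracy = (tp + tn) / (tp + fn + fp + tn) * 100
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "f1": f1,
            "recall": recall, "precision": precision}

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
confusion = [
    [50, 2, 0],
    [3, 25, 1],
    [0, 1, 8],
]
counts = one_vs_rest_counts(confusion, 0)
metrics = binary_metrics(*counts)
```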

### Association rule mining

Association rule mining can extract useful associations among data items. An association rule is defined in the form $$A\to C$$, in which A and C are the antecedent and consequent, respectively. If we consider A as the feature(s) and C as the feature(s)/user-defined phenotype, association rule mining can reveal interesting dependencies between feature(s) and feature(s) and between feature(s) and phenotype. Therefore, it is reasonable to apply an association rule mining-based method to identify important features among the candidate features. In this regard, we applied the association rule mining algorithm to the candidate features obtained in the feature selection step. In the following, some principal concepts of association rule mining are introduced. Support, Confidence, and Lift are three important measures in association rule mining. Let $$I=\{{i}_{1}, {i}_{2}, \dots ,{i}_{d}, y\}$$ be a set of items, $$D=\{{d}_{1}, {d}_{2}, \dots , {d}_{n}\}$$ be a dataset of n instances, $$F=\{{f}_{1}, {f}_{2}, \dots , {f}_{m}\}$$ be the feature space with m features, and $$Y=\{0, 1\}$$ be the user-defined phenotype. Each $${d}_{i}$$ can be represented as a tuple $$({X}_{i}, {y}_{i})$$, where $${X}_{i}\in {f}_{1}\times {f}_{2}\times \dots \times {f}_{m}$$ and $${y}_{i}\in Y$$. Also, $$A\to C$$ is an association rule, where $$A\subset I$$, $$C\subset I$$, and $$A\cap C= \varnothing$$. The support of rule $$A\to C$$ is the probability of instances containing both A and C, as shown in Eq. (17). Support evaluates the rule's usefulness.

$$Support\left(A\to C\right)= \frac{support(A\cup C)}{n}$$
(17)

In Eq. (17), $$support\left(A\right)=\left|\left\{{d}_{i} \mid A\subseteq {X}_{i}, {d}_{i}\in D\right\}\right|$$ is the number of instances that include the itemset $$A$$. The confidence of rule $$A\to C$$ is the conditional probability of $$C$$ given $$A$$, that is, the frequency of cases with $$C$$ among all samples containing $$A$$, as shown in Eq. (18). Confidence estimates the rule's certainty.

$$Confidence\left(A\to C\right)=P\left(C|A\right)= \frac{support(A\cup C)}{support(A)}$$
(18)

The Lift of rule $$A\to C$$ determines the dependency between the occurrence of itemset $$A$$ and $$C$$. When the Lift value is more (less) than one, the occurrence of $$A$$ is positively (negatively) correlated with the occurrence of $$C$$. If the Lift value is equal to one, then $$A$$ and $$C$$ are independent. The Lift value is shown in Eq. (19).

$$Lift\left(A\to C\right)= \frac{P(A\cup C)}{P\left(A\right)P(C)}$$
(19)
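The three measures in Eqs. (17)–(19) can be computed directly from transaction counts, as in the following sketch. The item names and the five toy transactions are hypothetical; each transaction is modeled as a set that holds discretized feature items together with the subtype label.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rule_measures(antecedent, consequent, transactions):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    both = support_count(antecedent | consequent, transactions)
    sup_a = support_count(antecedent, transactions)
    sup_c = support_count(consequent, transactions)
    support = both / n                # Eq. (17)
    confidence = both / sup_a         # Eq. (18)
    lift = confidence / (sup_c / n)   # Eq. (19): P(A and C) / (P(A) * P(C))
    return support, confidence, lift

# Hypothetical toy transactions.
transactions = [
    frozenset({"g1=high", "KIRC"}),
    frozenset({"g1=high", "KIRC"}),
    frozenset({"g1=high", "KIRP"}),
    frozenset({"g1=low", "KIRP"}),
    frozenset({"g1=low", "KIRC"}),
]
support, confidence, lift = rule_measures(
    frozenset({"g1=high"}), frozenset({"KIRC"}), transactions)
```

For this toy rule the lift is slightly above one, so the antecedent and the consequent are weakly positively associated.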

#### FP-growth algorithm

Association rule mining mainly contains two phases: frequent itemset generation and rule generation. In frequent itemset generation, the algorithm generates itemsets iteratively, and an itemset whose support exceeds min_support (a user-defined threshold) is reported as a frequent itemset. In rule generation, association rules are built from the frequent itemsets. Agrawal et al. introduced the Apriori algorithm as one of the first association analysis algorithms, in which the property of frequent itemsets was applied successfully27. In recent years, researchers have proposed many improved algorithms, such as FP-Growth28,29, Apriori-Hybrid30, Fuzzy association rule31, etc.
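The two phases can be made concrete with a brute-force sketch that enumerates every candidate itemset up to a maximum length. This is intentionally the naive approach and not Apriori or FP-Growth; the function names and toy transactions are hypothetical, and an itemset is treated as frequent when its support is at least the threshold, which is an assumed convention.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_length):
    """Phase 1: count every itemset of up to max_length items (brute force)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, max_length + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                frequent[candidate] = count
    return frequent

def rules_for_consequent(frequent, label, n, min_lift):
    """Phase 2: build rules antecedent -> label from the frequent itemsets."""
    label_set = frozenset([label])
    rules = []
    for itemset, count in frequent.items():
        antecedent = itemset - label_set
        if label in itemset and antecedent:
            # Every subset of a frequent itemset is frequent, so both
            # lookups below are guaranteed to succeed.
            confidence = count / frequent[antecedent]
            lift = confidence / (frequent[label_set] / n)
            if lift >= min_lift:
                rules.append((antecedent, label, count / n, confidence, lift))
    return rules

# Hypothetical toy transactions with items "a", "b" and a subtype label.
transactions = [
    frozenset({"a", "b", "KIRC"}),
    frozenset({"a", "KIRC"}),
    frozenset({"b", "KIRP"}),
    frozenset({"a", "b", "KIRC"}),
]
frequent = frequent_itemsets(transactions, min_support=0.5, max_length=3)
rules = rules_for_consequent(frequent, "KIRC", len(transactions), min_lift=1.1)
```

Enumerating all candidates grows combinatorially with the number of items, which is the cost that tree-based methods such as FP-Growth are designed to avoid.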

In this study, we utilized the FP-Growth algorithm for association analysis. In terms of computation, storage space, and time, FP-Growth is one of the best algorithms for association analysis because it searches the itemset space only once. The pseudo-code of the FP-Growth algorithm is shown in more detail in Tables 1 and 2 of the Supplementary Methods.

## Results

We applied the following steps to the miRNA and mRNA data (Fig. 1). In the feature selection step, we applied the graph-based method to the miRNA data. First, the graph was constructed with 1545 nodes and 1,194,285 edges, and the weights of the edges were calculated based on the $$AMGM$$ and cosine similarity measures. Next, the Louvain algorithm identified 40 communities/clusters in the graph; finally, 73 candidate miRNAs were selected from these communities/clusters using the MIS algorithm. The parameters β and γ of the graph-based method were set to 0.6 and 0.3, respectively.

In the mRNA data, we first used the filter method based on the $$AMGM$$ measure to remove some irrelevant features. In this primary feature selection, we selected the 1000 top features with the highest $$AMGM$$ values, and then we employed the graph-based method on the mRNA data. The graph was composed of 1000 nodes and 504,510 edges. Next, community detection on the mRNA network graph was performed by the Louvain algorithm, which identified 73 communities/clusters. Finally, 77 candidate mRNAs were selected from the communities/clusters using the MIS algorithm, with the β and γ parameters of the graph-based method set to 0.5 and 0.3, respectively. The lists of 73 candidate miRNAs and 77 candidate mRNAs are reported in Supplementary Tables 1 and 2, respectively. These candidate mRNAs and miRNAs are also illustrated in Figs. 2a and 3a, sorted by their $$AMGM$$ measure.

We applied the classifier to evaluate the discrimination power of candidate features (mRNAs/miRNAs) among RCC subtypes. In this regard, the self-organizing deep auto-encoder was utilized for performing classification tasks. The accuracy, F1-score, and AUC of mRNA and miRNA data are reported individually in Table 2. Also, the confusion matrix and ROC curve are illustrated for miRNA and mRNA data in Figs. 2 and 3, respectively. Results indicated that machine learning-derived mRNAs (Accuracy = 92%) and miRNAs (Accuracy = 95%) panels could significantly distinguish these subtypes from each other with high accuracy.

In similar studies, all subtypes, including ccRCC, pRCC, chRCC, WT (Wilms tumor), and RT (Rhabdoid tumor), were classified based on miRNAs selected in the feature selection process. However, these subtypes differ in cancer nature and patient type. ccRCC, pRCC, and chRCC are common adult kidney cancers, whereas WT and RT are common pediatric kidney cancers. Thus, feature selection and classification over all subtypes may lead to missing information related to the pediatric kidney cancer subtypes. In addition, this difference makes it difficult to compare the classification accuracy of similar studies with that of the proposed method. Nevertheless, the accuracy of the applied methods is reported in Table 3. We did not find any similar studies on the ccRCC, pRCC, and chRCC subtypes based on mRNA data for comparison.

In the association rule mining process, candidate features were scaled into the [0 1] range using min–max normalization. Next, candidate features (mRNAs/miRNAs) were discretized into three categories (low, medium, and high levels), so that the numbers of miRNA and mRNA items were equal to 212 and 233, respectively. Then, the FP-Growth algorithm was applied to generate frequent itemsets and association rules. The parameters of the algorithm, namely min-support (frequent itemset), max-length (maximum length of a frequent itemset), and lift (association rule), were set to 0.1, 4, and 1.1, respectively.
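The discretization step turns each scaled expression value into an item such as "feature=high" before rule mining. The sketch below assumes equal-width bins on [0, 1]; the text does not state which cut points were used, and the feature names and values are hypothetical.

```python
def discretize(value):
    """Map a min-max scaled value in [0, 1] to a level (equal-width bins, assumed)."""
    if value < 1 / 3:
        return "low"
    if value < 2 / 3:
        return "medium"
    return "high"

def to_transaction(sample, subtype):
    """Turn one sample into a set of items plus its subtype label.

    sample maps a feature name to its scaled value.
    """
    items = {f"{name}={discretize(value)}" for name, value in sample.items()}
    items.add(subtype)
    return frozenset(items)

# Hypothetical sample with two scaled feature values.
transaction = to_transaction({"feat_a": 0.1, "feat_b": 0.9}, "KIRC")
```

Each resulting transaction can then be passed to a frequent itemset miner together with the min-support, max-length, and lift thresholds given above.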

To discover patterns, association rules whose consequent was equal to KIRC/KIRP/KICH were selected. In the miRNA data, the number of association rules related to KIRC/KIRP was equal to 27,635/23,198. Moreover, in the mRNA data, the number of association rules related to KIRC/KIRP was equal to 94,354/28,956. Because of the small number of samples in the KICH subtype, frequent itemsets and related association rules were not generated for it; the support count of its itemsets was less than the min-support threshold. mRNAs and miRNAs with the highest repeat count in the antecedent part of the selected association rules were considered significant features in each RCC subtype. In Figs. 4a,b and 5a,b, significant miRNAs and mRNAs are shown as a graph network based on their repeat count in the KIRC/KIRP rules. Moreover, the strength distribution of the KIRC/KIRP association rules according to their support, lift, and confidence is illustrated in Figs. 4c,d and 5c,d for miRNA and mRNA, respectively. In this regard, mRNAs and miRNAs were ranked by repeat count (Supplementary Table 3). We hypothesized that the top features with the highest repeat counts may play a fundamental role in the pathogenesis of specific subtypes.
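The repeat count can be read as the number of selected rules whose antecedent contains a given feature. The sketch below follows that reading; the rule representation as (antecedent, consequent) pairs with (feature, level) items, and the feature names, are assumptions made for illustration.

```python
from collections import Counter

def repeat_counts(rules, subtype):
    """Count how often each feature occurs in antecedents of rules -> subtype.

    Each rule is (antecedent, consequent), where the antecedent is a set of
    (feature, level) items and the consequent is a subtype label.
    """
    counts = Counter()
    for antecedent, consequent in rules:
        if consequent == subtype:
            counts.update(feature for feature, _level in antecedent)
    return counts

# Hypothetical rules; names are placeholders, not findings of the study.
rules = [
    ({("f1", "high"), ("f2", "low")}, "KIRC"),
    ({("f1", "high")}, "KIRC"),
    ({("f2", "low")}, "KIRP"),
]
kirc_counts = repeat_counts(rules, "KIRC")
top_feature = kirc_counts.most_common(1)
```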

Box plots of the five top miRNAs and mRNAs in the KIRC and KIRP subtypes are illustrated in Fig. 6. The comparison of medians, interquartile ranges, and whiskers in the box plots of significant miRNAs and mRNAs demonstrated a notable difference among all RCC subtypes. The pair plots of the five top miRNAs and mRNAs of the KIRC/KIRP rules are illustrated in Supplementary Figs. 1c,d and 2c,d, respectively. Moreover, the Spearman correlation of the top ten miRNAs and mRNAs of the KIRC/KIRP rules is shown in Supplementary Figs. 1a,b and 2a,b, respectively.

CSN7A (ENSG00000111652.8) and NMD3 (ENSG00000169251.11) were the most frequent items, with 5774 and 2306 repeat counts in the KIRC and KIRP association rules, respectively. Moreover, miR-28 and miR-125a were the most frequent items, with 2642 and 2591 repeat counts in the KIRC and KIRP association rules, respectively. We therefore examined these miRNAs and mRNAs more closely using other association rules to investigate their relations with other features, and we show these relations, derived from the association rules, as graph networks (Figs. 7 and 8). More in-depth coverage of these findings is available in the discussion section of the study.

Fig. 7a shows that hsa-miR-28, the most frequent miRNA in the KIRC association rules, has a high dependency on hsa-let-7i, hsa-miR-196a-2, hsa-miR-324, hsa-miR-198, and hsa-miR-152, in that order. Fig. 8a shows that hsa-miR-125a, the most frequent miRNA in the KIRP association rules, has a high dependency on hsa-miR-99b, hsa-miR-374b, hsa-miR-186, hsa-miR-28, and hsa-miR-32, in that order. Fig. 7b shows that CSN7A, the most frequent mRNA in the KIRC association rules, has a high dependency on ENSG00000110801.12 (PSMD9), ENSG00000126247.9 (CAPNS1), ENSG00000130560.7 (UBAC1), ENSG00000100138.12 (SNU13), and ENSG00000143344.14 (RGL1: Ral guanine nucleotide dissociation stimulator like 1), in that order. Fig. 8b shows that ENSG00000169251.11 (NMD3), the most frequent mRNA in the KIRP association rules, has a high dependency on ENSG00000126247.9 (CAPNS1), ENSG00000185085.2 (INTS5: integrator complex subunit 5), and ENSG00000163001.10 (CFAP36), in that order.

## Discussion

We carried out a comprehensive machine learning analysis of clinically significant patterns of the miRNAs and mRNAs within the RCC subtypes. A panel of 77 candidate mRNAs and a panel of 73 miRNAs could discriminate KIRC from KIRP with high accuracy. The association rule mining analysis could identify top mRNAs and miRNAs with the highest repeat counts, suggesting their possible pathological roles in each RCC subtype. The CSN7A and miR-28 along with the NMD3 and miR-125a were the most frequent itemsets in the KIRC and KIRP association rules, respectively. The roles of these mRNAs have not been studied before in the RCC. In the following sections, we present a brief discussion on the possible roles of these mRNAs and microRNAs in the pathogenesis of the KIRC and KIRP based on the published literature.

### Candidate RNAs in KIRC

CSN7A, UBAC1, PSMD9, RNF40, and Capn4 were identified by the association rule mining analysis as the candidate mRNAs with the highest repeat counts in KIRC. Machine learning approaches could thus detect novel targets in the field of KIRC and bring the UPS (ubiquitin–proteasome system) and its components (CSN7A, UBAC1, PSMD9, and RNF40) into the focus of interest. The degradation of regulatory proteins by the UPS has an important function in controlling cell cycle progression, DNA repair, response to extracellular stress, and signal transduction. Ubiquitin, the proteasome, and the ubiquitinating enzymes (ubiquitin-activating (E1), ubiquitin-conjugating (E2), and ubiquitin-ligase (E3) enzymes), along with deubiquitinating enzymes, are the key components of the UPS36. Most of the E3 ligases are cullin-RING ligases (CRLs), which determine substrate selectivity in response to specific stimuli, either to degrade substrates by the proteasome or to modify their function. The COP9 signalosome (CSN), another component of the UPS, is a multi-subunit (CSN 1–8) metalloprotease complex37 that plays roles in gene expression, cell-cycle control, and DNA-damage response38, and increases the stability of some proteins including EGFR39. Furthermore, the CSN plays a role in controlling the activity of NF-κB, an inflammatory transcriptional regulator involved in cell survival, proliferation, and transformation40,41.

VHL is the substrate recognition subunit of the E3 ubiquitin ligase complex that polyubiquitylates hydroxylated targets under normoxia42. When VHL is lost, the degradation of its specific targets, the hypoxia inducible factors (HIF1α and HIF2α), does not occur, creating a pseudohypoxic state. Both VHL deficiency and the accumulation of HIF-1 promote NF-κB activity, which subsequently stimulates an NF-κB/PI3K/AKT/TGF-β/EGFR/IKK signaling cascade, resulting in the activation of proliferation and glycolytic pathways, a pro-angiogenic and apoptosis-resistant phenotype, and highly vascularized tumors43,44. It is reported that the CSN elevates the efficacy of VHL-mediated HIF-1α recognition, ubiquitination, and degradation45.

In our deep learning analysis, CSN7A (COPS7A), the 7a subunit of the CSN, was identified as the top mRNA with the highest role in KIRC. It is reported that the CSN7A level is decreased in tumor tissues, which may be associated with the oxidative phosphorylation pathway46 and transcription-coupled nucleotide excision DNA repair47. Moreover, overexpressed CSN7A could stimulate IκBα deubiquitinylation and consequently suppress the transcriptional activity of NF-κB48. The mechanism by which CSN7A can impact KIRC has not been defined; however, CSN7A may function similarly in KIRC.

UBAC1, the second identified mRNA in this study, is a subunit of the KPC complex, an E3 ubiquitin-protein ligase. UBAC1 interacts with the proteasome and with ubiquitinated proteins such as NF-κB49,50. UBAC1 also contributes to inflammatory signal transduction pathways and affects cell proliferation and viability in keratinocytes51. The role of UBAC1 in KIRC needs to be clarified. PSMD9 was the 3rd top KIRC-related mRNA found in our rule mining analysis. PSMD9, as a part of the 26S proteasome, regulates protein degradation52. PSMD9 is overexpressed in tumor tissues and is associated with cell proliferation, hostile tumor outcome, and resistance to therapy53,54,55. Since different tumor suppressors and oncogenes are controlled by Ub- and proteasome-mediated degradation, CSN7A, UBAC1, PSMD9, and RNF40 may play important roles in the pathogenesis of KIRC. In the KIRC association rules, CSN7A has a high dependency on PSMD9, CAPNS1, UBAC1, and SNU13, in that order. Our study may open an innovative horizon for investigating the role of CSN7A, UBAC1, PSMD9, and RNF40 in the pathogenesis of KIRC.

microRNAs are critical regulators of the development and progression of RCC. They function as oncomirs or anti-oncomirs. miR-28, let-7i, miR-23b, miR-125a, and miR-22 were the top five identified miRNAs that were dysregulated in KIRC. For more details, see the discussion part of the Supplementary file.

### Candidate RNAs in KIRP

In the rule mining analysis, NMD3, ZNF41, CFAP36, FGFR1OP2, and RGL1 were identified as the top five mRNAs with the highest repeat counts in the KIRP association rules (≥ 2195).

The transcript patterns of the ribosomal proteins are tumor- and tissue-specific56 and their induced abnormal translation can provoke a malignant phenotype independent of chromatin remodeling and the deregulation of the transcriptional process57. The 60S ribosomal export protein NMD3 (NMD3) was the first identified mRNA involved in the pathogenesis of the KIRP in our study. The NMD3 is a nuclear adaptor protein that transports the 60S subunit into the cytoplasm via the nuclear pore complex and participates in the cytoplasmic maturation of 60S particles58,59. In the cytoplasm, the release of the NMD3p from 60S subunits needs a GTPase and the ribosomal protein (Rpl10p). Any mutation in these proteins leads to cytoplasmic retention of the NMD3 on pre-60S subunits, blocking ribosome assembly and biogenesis60,61. The NMD3 also exerts a significant effect on RNA biosynthesis, mainly ribosomal RNA, and consequently may impact tumorigenesis62. A direct role of the NMD3 needs to be elucidated in the KIRP.

miR-125a, miR-23b, miR-210, miR-99b, and miR-101-2 were, in that order, the top five identified miRNAs dysregulated in KIRP. The possible pathological roles of the candidate mRNAs and miRNAs identified in this study are presented in Fig. 9 and explained in more detail in the Supplementary file. Although much remains to be elucidated about the KIRP mechanism, the roles of NMD3, ZNF41, CFAP36, FGFR1OP2, and RGL1 are of considerable interest. In KIRP, NMD3 has a high dependency on CAPNS1, INTS5, and CFAP36, in that order.

The molecular features presented in this study offer new insights into the underlying mechanisms that are responsible for the initiation and progression of KIRC and KIRP. Moreover, the ability to diagnose KIRC and KIRP with a high level of certainty will help pathologists accurately differentiate the most common subtypes of RCC and thereby support appropriate clinical decision-making. Furthermore, the ability of artificial intelligence to differentiate RCC accurately may reduce unnecessary intervention rates. Our attempts to develop molecular discriminating patterns had some limitations. First, we did not evaluate the molecular mechanisms of the candidate RNAs in RCC models; in vitro and in vivo studies are needed to achieve this goal. Second, we did not validate the identified RNAs in clinical samples; future studies with large sample sizes should address this issue. We believe that adding other features from mutation, polymorphism, copy number alteration, and DNA methylation platforms would help to tackle the problem of discriminating RCC subtypes and improve their early detection.

## Conclusion

In this paper, deep learning-driven biomarkers were presented for discriminating common subtypes of RCC. Panels of 77 mRNAs and 73 miRNAs could discriminate the KIRC, KIRP, and KICH subtypes from each other with high accuracy. CSN7A and miR-28, along with NMD3 and miR-125a, were the most frequent items in the KIRC and KIRP association rules, respectively. Because of frequent mutations in protein-coding regions and an elevated burden of unfolded proteins, rapidly dividing cancer cells require an elevated protein turnover. Hence, the inhibition of UPS components appears to be a promising strategy for KIRC therapy. The mRNAs and microRNAs identified in this study can regulate signal transduction, cell cycle machinery, and apoptosis, all of which are relevant contributors to carcinogenesis and cancer progression. Therefore, they may provide further insight into the pathogenesis, diagnosis, prognosis, and molecular-targeted therapy of RCC subtypes (Fig. 9).