Introduction

The incidence of clear cell renal cell carcinoma (ccRCC) is twice as high in males as in females worldwide, and the prognosis is also worse in males1,2,3,4. Sex-based differences in ccRCC subtypes3, prognosis3, and treatment response5 have been observed, but their underlying causes remain poorly understood2,6.

Sex-related differences in cancer have been reported independent of race, ethnicity, or geographic location7,8. According to the GLOBOCAN 2020 database1, overall cancer incidence and mortality rates worldwide are 19% and 43% higher, respectively, in males than in females. This male predominance has been observed not only in kidney cancer but also in cancers of the bladder, lung, liver, stomach, and other sites1,2,7. Various studies have reported differences in gene expression and mutation frequencies between male and female cancer patients2,5,7,8,9,10, including in ccRCC11 cohorts from the USA6,12, Canada13, Europe10,14,15, and Asia10. However, sex-dependent genetic variation and its clinical utility have not been well studied. Proposed contributors to the sex disparity include environmental factors; immunological7, hormonal2,7, genetic, and pharmacokinetic16 differences between males and females; X chromosome effects7,8; and sex differences in the efficiency of immunological and genomic surveillance mechanisms7.

Despite the growing demand for personalized cancer treatment, biological sex has received little attention in clinical practice, and differences in genetic variation between the sexes remain largely unknown. With the introduction of gender medicine to oncology16,17, identifying sex-specific genetic variants has become important because such variants could serve as biomarkers for personalized treatment16,18. Understanding sex differences and incorporating them into personalized treatment strategies, rather than applying a one-size-fits-all approach to all ccRCC patients, is essential.

In this study, we applied machine learning to somatic mutation data from 417 ccRCC patients in The Cancer Genome Atlas-Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) database. We identified 68 sex-related genes in ccRCC and analyzed their association with survival. In addition, we examined regional differences by comparing the results with those from European patients (Renal Cell Cancer-European Union, RECA-EU; 422 patients) and Korean patients (Korean-KIRC; 120 patients).

Methods

Ethics approval and consent to participate

All procedures performed in this study were in accordance with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards and were approved by the Institutional Review Board of the Catholic University of Korea, Seoul St. Mary’s Hospital (approval no. 2018-2550-0008; date of approval: 20 November 2018). This retrospective genetic study and the patients’ treatment plans followed clinical guidelines and the standard of care, and the results of the genetic study did not affect the postoperative treatment plans. Written informed consent was obtained from all patients.

Feature selection and machine learning for the discovery of sex-related genes

The workflow of our study is shown in Fig. 1.

Figure 1

Workflow of the study. ccRCC clear cell renal cell carcinoma, Korean-KIRC Korean Kidney Renal Clear Cell Carcinoma, MRMR Minimum Redundancy and Maximum Relevance, RECA-EU Renal Cell Cancer-European Union, TCGA-KIRC The Cancer Genome Atlas-Kidney Renal Clear Cell Carcinoma.

We first selected 417 patients with both somatic non-silent mutation data and clinical information from TCGA-KIRC (accessed in August 2016)19. Variant annotations for 39,532 genes in these 417 patients were obtained from UCSC Xena20 as a Mutation Annotation Format (MAF) file. The cohort consisted of 271 males and 146 females. The machine learning workflow followed that of our previous study21. We used RapidMiner (version 7.3; Boston, MA, USA) for data engineering and model building. Three feature selection methods (Information Gain, the Chi-squared test, and Minimum Redundancy Maximum Relevance [MRMR]) were combined with three classifiers (Naïve Bayes, K-Nearest Neighbor [K-NN], and Support Vector Machine [SVM]), and the performance of each classifier was compared across feature selection methods using tenfold cross-validation. Sex-related genes were defined as the genes selected by machine learning that showed differences in mutation rates between males and females.
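The two univariate feature-scoring methods can be illustrated with a minimal pure-Python sketch. It assumes a binary gene-by-patient matrix (1 = at least one non-silent mutation) and sex labels; the function names and toy data are hypothetical, and the sketch is not RapidMiner's implementation. MRMR, which additionally penalizes redundancy among the selected genes, is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Information Gain of a binary mutation indicator (0/1) with respect to sex."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(feature, labels) if x == value]
        if subset:
            gain -= (len(subset) / n) * entropy(subset)
    return gain


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the (mutation status x sex) table."""
    n = len(labels)
    stat = 0.0
    for value in (0, 1):
        row_total = sum(1 for x in feature if x == value)
        for cls in set(labels):
            col_total = sum(1 for y in labels if y == cls)
            expected = row_total * col_total / n
            observed = sum(1 for x, y in zip(feature, labels) if x == value and y == cls)
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def rank_genes(matrix, labels, score):
    """Gene names sorted by decreasing score; matrix maps gene -> 0/1 vector over patients."""
    return sorted(matrix, key=lambda gene: score(matrix[gene], labels), reverse=True)


# Hypothetical toy data: gene A is mutated in every male and no female,
# gene B is mutated equally often in both sexes.
labels = ["M", "M", "F", "F"]
matrix = {"A": [1, 1, 0, 0], "B": [1, 0, 1, 0]}
print(rank_genes(matrix, labels, information_gain))  # ['A', 'B']
```

In this toy cohort, gene A receives the maximum Information Gain of 1 bit (chi-squared statistic 4), whereas gene B scores 0 under both methods.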

NGS-based ccRCC gene panel

We designed a next-generation sequencing (NGS) gene panel for ccRCC consisting of 216 genes. The panel includes 33 sex-related genes and 123 survival-specific genes identified by machine learning in our research21, 21 genes with a mutation frequency above 5% in TCGA-KIRC, 14 genes associated with solid tumors, and 26 other genes related to ccRCC.

Targeted library preparation

Genomic DNA was extracted from formalin-fixed, paraffin-embedded (FFPE) tissues for library preparation. The DNA was fragmented to approximately 250 bp using the Bioruptor Pico Sonication System (Diagenode, Belgium) and processed into indexed NGS libraries for Illumina sequencing by end repair, dA-tailing, adapter ligation, and pre-PCR. The prepared gDNA libraries were hybridized with capture probes to capture the target regions using the Celemics target enrichment kit (Celemics, Seoul, Republic of Korea). Customized capture probes were designed and chemically synthesized to hybridize to the target regions. Captured regions were further amplified by post-PCR to increase library yield. The target-captured libraries were then sequenced on an Illumina NextSeq 550 instrument (Illumina, San Diego, CA, USA) with 2 × 150 bp paired-end reads. Sequencing coverage and quality statistics for each sample are summarized in Additional file 1.

Bioinformatics analysis

Samples were sequenced on the Illumina NextSeq 550 platform. bcl2fastq version 2.19.1.403 (Illumina) was used to demultiplex the base-call files into per-sample sequence read files (FASTQ format), with all options and parameters at their default settings. Low-quality bases were removed with in-house code, and sequencing adapters were then removed with AdapterRemoval22 (version 2.2.2). All reads were aligned to the GRCh37 human reference genome with BWA-MEM (Burrows–Wheeler Aligner), which indexes the reference genome using the Burrows–Wheeler Transform so that the cost of searching each read depends on the read length rather than on the genome size. Post-alignment processing and recalibration were performed with Picard version 1.115 (http://broadinstitute.github.io/picard) and GATK23 version 4.0.4.0. Variants were called with the GATK HaplotypeCaller. All detailed parameters and options followed GATK best practices.

Datasets

For validation of the identified sex-related genes in ccRCC, two publicly available and two private datasets were used. The TCGA-KIRC (accessed on 7 April 2021)19 and RECA-EU (https://dcc.icgc.org/projects/RECA-EU; accessed on 27 May 2021) datasets provide variants and clinical information of patients with RCC. Gene sequencing data and clinical information were available for 451 and 422 patients, respectively, including 293 males and 158 females in the TCGA-KIRC dataset (Additional file 2) and 245 males and 177 females in the RECA-EU dataset (Additional file 3).

With the approval of the National Biobank of Korea, the Centers for Disease Control and Prevention, Republic of Korea, we obtained data from the Korean Chip, which contains genotype data from the general population of South Korea, through the Korea Biobank Array Project (KBN-2019-019; approval date: 21 March 2019). The project was initiated in 2014 by the Korea National Institute of Health and included 210,000 participants aged 40–69 years from the Korean Genome and Epidemiology Study24; these participants were genotyped with a customized array based on Korean genome structure, with high genomic coverage and abundant low-frequency to rare functional variants25. The Korean Chip covers more than 833,000 markers, including approximately 247,000 rare or functional variants estimated from the sequencing data of approximately 2500 Koreans. Of the 833,000 markers, 208,000 functional markers were genotyped. More than 89,000 markers are present in East Asians.

For further validation, we included 120 Korean patients with ccRCC (Korean-KIRC) diagnosed by pathologic examination after radical or partial nephrectomy at The Catholic University of Korea, Seoul St. Mary’s Hospital (Additional files 4 and 5). All participants provided signed informed consent. The cohort comprised 79 males and 41 females, all of whom underwent surgical treatment. Kidney samples were prepared from FFPE tissue and comprised 59 tumor-normal pairs and 61 tumor-only samples.

Data pre-processing

Variant call format (VCF) files storing sequence variants were processed using PLINK 1.9 (www.cog-genomics.org/plink/1.9; accessed on 15 November 2021). Variants were annotated using the Ensembl Variant Effect Predictor (version 96, http://apr2019.archive.ensembl.org/index.html; accessed on 15 November 2021), and gene annotation identifiers were added using the biomaRt package (https://bioconductor.org/packages/release/bioc/html/biomaRt.html) for R (version 4.3.1, https://www.r-project.org/)26. To focus on somatic cancer mutations in ccRCC, variants that were also identified in normal tissues were removed. Python (version 3.11.4, https://www.python.org/) was used to preprocess the raw data. Only non-synonymous mutations were considered; synonymous and intronic variants were excluded. Variants with a variant allele frequency below 2%, an alternate allele count below 5, or a read depth below 100 were excluded. Finally, variants classified as benign or likely benign according to their clinical significance in ClinVar27 were discarded.
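The exclusion criteria above can be expressed as a simple filter. The sketch below is illustrative and rests on stated assumptions: the `Variant` record, the set of consequence terms treated as non-synonymous, and the ClinVar label strings are hypothetical stand-ins for fields parsed from VEP-annotated VCF files, and each variant is assumed to carry a single consequence term.

```python
from dataclasses import dataclass

# Hypothetical set of VEP consequence terms treated as non-synonymous (an assumption,
# not the study's documented list). Synonymous and intronic terms are absent, so they are dropped.
NON_SYNONYMOUS = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "splice_acceptor_variant", "splice_donor_variant",
}
# Hypothetical normalized ClinVar clinical-significance labels to discard.
BENIGN = {"benign", "likely_benign"}


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str
    consequence: str   # single VEP consequence term (assumption)
    depth: int         # total read depth at the site
    alt_count: int     # reads supporting the alternate allele
    clinvar: str = ""  # ClinVar clinical significance, empty if absent

    @property
    def vaf(self) -> float:
        return self.alt_count / self.depth if self.depth else 0.0

    @property
    def site(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_variants(tumor, normal=()):
    """Keep tumor variants that pass the thresholds described in the text."""
    in_normal = {v.site for v in normal}
    kept = []
    for v in tumor:
        if v.site in in_normal:                   # also present in normal tissue
            continue
        if v.consequence not in NON_SYNONYMOUS:   # synonymous, intronic, etc.
            continue
        if v.vaf < 0.02 or v.alt_count < 5 or v.depth < 100:
            continue
        if v.clinvar.lower() in BENIGN:           # benign or likely benign in ClinVar
            continue
        kept.append(v)
    return kept
```

For tumor-only samples, `normal` is simply left empty, so only the annotation-based and read-count filters apply.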

Gene set enrichment analysis

Gene set enrichment analysis (GSEA) was performed using the Enrichr server28 (https://maayanlab.cloud/Enrichr/) to identify the pathways and biological processes associated with the sex-specific genes discovered in the TCGA-KIRC and Korean-KIRC databases, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) 2021 Human and Gene Ontology (GO) Biological Process 2021 libraries.
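The per-term p-value that Enrichr reports is based on Fisher's exact test of the overlap between the query genes and each gene set, which is equivalent to a one-sided hypergeometric test. The sketch below computes that p-value in pure Python; the background size, the function names, and the toy gene-set library are assumptions, and Enrichr's multiple-testing adjustment and combined score are not reproduced.

```python
from math import comb


def overrepresentation_p(overlap, query_size, set_size, background_size):
    """One-sided hypergeometric p-value P(X >= overlap).

    X counts how many of `query_size` genes drawn from a background of
    `background_size` genes fall into a gene set of size `set_size`.
    """
    total = comb(background_size, query_size)
    upper = min(query_size, set_size)
    hits = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


def enrich(query, library, background_size):
    """Rank gene sets (term -> genes) by over-representation of the query genes."""
    query = set(query)
    results = []
    for term, genes in library.items():
        shared = query & set(genes)
        if shared:
            p = overrepresentation_p(len(shared), len(query), len(set(genes)), background_size)
            results.append((term, sorted(shared), p))
    return sorted(results, key=lambda r: r[2])
```

For instance, if both query genes fall into a 3-gene set drawn from a 10-gene background, the p-value is C(3,2)·C(7,0)/C(10,2) = 3/45.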

Statistical analysis

To examine associations between sex and mutation status, Fisher’s exact test was performed with the stats module of the SciPy package (version 1.8.1) in Python (version 3.11.4) to identify sex-specific genes, i.e., genes whose mutation frequency differed significantly by sex. Fisher’s exact test was also performed using the R stats package (version 0.1.0). Survival probabilities in male and female patients were estimated by the Kaplan–Meier method and compared with the log-rank test using the Python library lifelines29. Statistical significance was set at p < 0.05 (95% confidence level).
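To make the per-gene test concrete, the sketch below implements the two-sided Fisher's exact test for a 2 × 2 table of sex (rows) by mutation status (columns), summing the probabilities of all tables with the same margins that are no more likely than the observed one. It is a dependency-free illustration with a hypothetical function name, not the study's code. `scipy.stats.fisher_exact` returns the sample odds ratio computed here, whereas R's `fisher.test` reports a conditional maximum-likelihood estimate, so odds ratios from the two tools can differ slightly.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: males, females; columns: mutated, not mutated.
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x mutated males given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one; the small
    # relative tolerance treats floating-point near-ties as ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    if b * c == 0:
        odds = float("nan") if a * d == 0 else float("inf")
    else:
        odds = (a * d) / (b * c)
    return odds, min(p, 1.0)


# Classic "lady tasting tea" table: odds ratio 9.0, two-sided p = 34/70 (about 0.486)
print(fisher_exact_2x2(3, 1, 1, 3))
```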

Results

Machine learning identified 68 sex-related genes in TCGA-KIRC

The TCGA-KIRC cohort consisted of 451 ccRCC patients, of whom 293 (65%) were male and 158 (35%) were female (Supplementary Table S1). Machine learning was used to select sex-related genes from TCGA-KIRC. Among the classifiers evaluated (Naïve Bayes, K-NN, and SVM), Naïve Bayes performed best, reaching its highest accuracy of 98.80% when combined with Information Gain, which selected 931 genes (Supplementary Table S2). The prediction accuracies of the other combinations of classifiers and feature selection methods are summarized in Supplementary Table S3.
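The evaluation protocol (a classifier scored by tenfold cross-validation) can be sketched with a Bernoulli Naive Bayes model on binary mutation indicators. This is a minimal sketch under assumptions: Laplace smoothing, a shuffled non-stratified split, hypothetical function names, and toy data; RapidMiner's Naive Bayes operator and cross-validation settings may differ.

```python
import math
import random


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli Naive Bayes on binary vectors X (1 = gene mutated) with labels y."""
    model = {}
    for cls in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == cls]
        log_prior = math.log(len(rows) / len(X))
        # Laplace-smoothed probability that each gene is mutated in this class.
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[cls] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest posterior log-probability."""
    def log_posterior(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(math.log(p if xi else 1.0 - p) for xi, p in zip(x, probs))
    return max(model, key=log_posterior)


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds of a shuffled, non-stratified split (needs len(X) >= k)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


# Hypothetical toy data: gene 0 is mutated only in males, gene 1 is uninformative.
X = [[1, i % 2] for i in range(20)] + [[0, i % 2] for i in range(20)]
y = ["M"] * 20 + ["F"] * 20
print(cross_val_accuracy(X, y))
```

Each patient is held out exactly once, so the reported accuracy reflects predictions on data the model did not see during training.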

A total of 68 sex-related genes were selected from the TCGA-KIRC database using four selection strategies. First, the top 100 genes ranked by each feature selection method (Information Gain, Chi-squared, and MRMR) were selected, and the 14 genes common to all three methods were extracted (Supplementary Table S4; Supplementary Fig. S1). Second, 13 genes located on the X chromosome were identified among the top 100 genes selected by the three feature selection methods (Supplementary Table S5; Supplementary Fig. S2). Third, 41 X-chromosome genes were identified among the 931 sex-related genes (Supplementary Table S6a,b). Finally, 14 genes selected in common by two of the feature selection methods were extracted from the top 100 genes ranked by each method (Supplementary Table S7). Among the top 20 sex-related genes extracted by each feature selection method, BAP1 ranked first in all three methods and was therefore predicted to be the most important sex-related gene in ccRCC (Table 1).
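The set operations behind these selection strategies can be sketched as follows. The sketch assumes that each feature selection method yields a score per gene (higher = more sex-related) and that X-chromosome membership is supplied as a set. Reading "selected by two feature selection methods" as "in the top k of exactly two methods", and pooling X-linked genes across the three top-k lists, are interpretations rather than the study's documented rules; the function names are hypothetical.

```python
from collections import Counter


def top_k(scores, k=100):
    """Names of the k highest-scoring genes (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda g: (-scores[g], g))[:k])


def combine_selections(rankings, x_linked, k=100):
    """rankings: method -> {gene: score}; x_linked: set of X-chromosome gene names."""
    tops = {method: top_k(scores, k) for method, scores in rankings.items()}
    # Number of methods that placed each gene in their top k.
    votes = Counter(g for genes in tops.values() for g in genes)
    return {
        "all_methods": {g for g, v in votes.items() if v == len(tops)},
        "x_linked_in_top_k": {g for g in votes if g in x_linked},
        "exactly_two_methods": {g for g, v in votes.items() if v == 2},
    }
```

The third strategy (X-linked genes among the 931 genes selected with Information Gain) is the same intersection with `x_linked`, applied to that longer list instead of the top-k sets.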

Table 1 Top 20 sex-related genes identified by feature selection methods in the TCGA-KIRC database.

Twenty-three sex-specific genes were verified by statistical analysis in TCGA-KIRC, RECA-EU and Korean-KIRC databases

The 68 sex-related genes were verified statistically in the TCGA-KIRC, RECA-EU, and Korean-KIRC databases, yielding a total of 23 sex-specific genes (Table 2).

Table 2 Comparison of 23 sex-specific genes from 68 sex-related genes in TCGA-KIRC, RECA-EU and Korean-KIRC databases.

By statistical analysis in TCGA-KIRC, we identified 19 sex-specific genes among the 68 sex-related genes (Supplementary Table S8). All of these genes except KDM5C were mutated more frequently in females or only in females (Supplementary Fig. S3). KDM5C was the only gene mutated predominantly in males (mutation frequency, male vs. female: 7.85% vs. 2.53%; odds ratio = 2.74; p = 0.023). Of the 19 sex-specific genes, nine were located on the X chromosome (AFF2, COL4A5, FAM47A, IRAK1, KDM5C, KDM6A, NHS, RTL9, and WNK3). Four genes (ASXL3, KDM5C, MAGED1, and ZMYM3) were verified as sex-specific in the RECA-EU data (Supplementary Table S9). ASXL3 mutations occurred only in males in both TCGA-KIRC (2.73%, p = 0.055) and RECA-EU (3.27%, p = 0.023) and were also more frequent in males in Korean-KIRC (male vs. female: 11.11% vs. 5.71%; p = 0.485). KDM5C mutations were likewise more frequent in males in RECA-EU (13.47% vs. 5.65%; odds ratio = 2.43; p = 0.009). In the Korean-KIRC cohort, a total of 216 genes were sequenced. We also investigated the mutation frequencies of the 68 sex-related genes using the Korean Chip, which provides genotype data for 210,000 healthy Koreans25; no mutations in the 68 genes were found in these Koreans without ccRCC. Among the 33 sex-related genes included in the panel, CLN8 was the only gene verified as sex-specific in Korean-KIRC (Supplementary Table S10). CLN8 mutations were found only in females (1.27%) in TCGA-KIRC and more frequently in females (1.13%) than in males (0.41%) in RECA-EU. However, in Korean-KIRC, CLN8 mutations were identified only in males (14.29%; odds ratio = 6.39; p = 0.024).

Survival analysis of 68 sex-related genes and 23 sex-specific genes using three databases

We conducted survival analysis of the 68 sex-related genes in the TCGA-KIRC, RECA-EU, and Korean-KIRC databases, considering a gene survival-specific when its mutation status was significantly associated with survival. In TCGA-KIRC, ASXL3, HAUS7, and NBPF10 were survival-specific only in males for OS (p = 0.017, p = 0.042, and p = 0.008, respectively). In contrast, ACSS3, ALG13, BAP1, CFP, FAM47A, JADE3, KDM6A, NCOR1P1, SCRN1, and ZNF449 were survival-specific only in females. Data for these genes are presented in Supplementary Table S11. Combined survival curves according to mutation status for the male-specific survival genes (ASXL3, HAUS7, and NBPF10) and for the female-specific survival genes in OS and DFS (ACSS3, BAP1, CFP, and FAM47A) in TCGA-KIRC are shown in Supplementary Fig. S5. Survival-specific genes showing sex differences in the RECA-EU and Korean-KIRC databases are listed in Supplementary Tables S12 and S13, respectively.
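The study estimated survival curves with lifelines (for example, its `KaplanMeierFitter` class). To show what that estimator computes, the following dependency-free sketch implements the Kaplan-Meier product-limit estimate; the function name and input layout are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up times; events: 1 if the event (death or recurrence) occurred,
    0 if the patient was censored. Returns (time, survival probability) pairs at
    each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; censored ones count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy cohort: five patients, two of them censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In the toy cohort, survival drops to 0.8, 0.6, and 0.3 at times 1, 2, and 3; the censored patient at time 2 still counts as at risk at that time.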

Differences in survival analysis by sex for 23 sex-specific genes were compared among the three databases (Table 3).

Table 3 Comparative survival analysis for 23 sex-specific genes in the TCGA-KIRC, RECA-EU and Korean-KIRC databases.

In TCGA-KIRC, four genes (ACSS3, BAP1, FAM47A, and KDM6A) were survival-specific only in females for overall survival (OS) (p = 0.0001, p = 0.004, p = 0.010, and p = 0.032, respectively) and disease-free survival (DFS) (p = 0.002, p = 0.001, p = 0.000003, and p = NA, respectively). Individual survival curves of male and female patients according to mutation status in these four sex-specific and survival-specific genes in TCGA-KIRC are shown in Supplementary Fig. S4. ASXL3 mutations were significantly associated with OS (p = 0.017) only in males. In the RECA-EU dataset, 8 of the 23 sex-specific genes (ADAM21, BAP1, COL4A5, KDM5C, KDM6A, ULK3, MAGED1, and CLN8) were survival-specific. Male-specific survival differences in OS were found for ADAM21, COL4A5, KDM5C, and CLN8 (p = 0.033, p = 0.026, p = 0.009, and p = 0.025, respectively). BAP1 and MAGED1 were female-specific in OS (p = 0.002) and DFS (p = 0.00003), respectively. KDM6A was male-specific in OS (p = 0.042) but female-specific in DFS (p = 0.012). ULK3 mutations were associated with OS in both males and females (p = 0.002 and p = 0.037, respectively). In the Korean-KIRC database, 5 of the 23 sex-specific genes (ACSS3, BAP1, KDM5C, KDM6A, and ASXL3) were survival-specific. BAP1 was female-specific in both OS (p = 0.003) and DFS (p = 0.000004). In contrast, ACSS3, KDM5C, and KDM6A were male-specific in DFS (p = 0.026, p = 0.016, and p = 0.049, respectively), and ASXL3 was male-specific in OS (p = 0.005).
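These p-values come from log-rank tests comparing mutated and wild-type patients within each sex. A minimal sketch of the two-group log-rank statistic follows; it is an illustrative re-implementation (lifelines itself provides `logrank_test` in `lifelines.statistics`), and the function name and input layout are assumptions. With one degree of freedom, the chi-squared survival function reduces to `erfc(sqrt(x / 2))`, so no statistics library is needed.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared statistic, p-value) with 1 df.

    events: 1 = event observed, 0 = censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)   # at risk in group A
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)   # at risk in group B
        d_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        d_b = sum(1 for ti, e, g in data if g == 1 and ti == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    stat = observed_minus_expected ** 2 / variance if variance else 0.0
    p = math.erfc(math.sqrt(stat / 2))  # P(chi-squared with 1 df > stat)
    return stat, p


# Hypothetical toy data: group A fails early, group B fails late.
print(logrank_test([1, 2], [1, 1], [3, 4], [1, 1]))
```

For the toy data the statistic is 49/17 (about 2.88, p ≈ 0.09); identical groups give a statistic of 0 and p = 1.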

Nine survival genes showing sex differences were compared across the three databases

Combining the above results, nine survival-specific genes (ACSS3, ALG13, ASXL3, BAP1, JADE3, KDM5C, KDM6A, NCOR1P1, and ZNF449) that were commonly identified in two or more databases were analyzed according to sex (Table 4).

Table 4 Sex-specific survival differences in 9 survival-specific genes commonly identified in TCGA-KIRC, RECA-EU and Korean-KIRC databases.

Among these nine sex-dependent survival-specific genes, ALG13, JADE3, KDM5C, KDM6A, and ZNF449 are X-linked. ASXL3 and KDM5C were identified as male-specific survival genes: ASXL3 was male-specific in TCGA-KIRC (OS, p = 0.017) and Korean-KIRC (OS, p = 0.005), and KDM5C was male-specific in RECA-EU (OS, p = 0.009) and Korean-KIRC (DFS, p = 0.016). BAP1 and NCOR1P1 were identified as female-specific survival genes. BAP1 was female-specific in all three databases: TCGA-KIRC (OS, p = 0.004; DFS, p = 0.001), RECA-EU (OS, p = 0.002), and Korean-KIRC (OS, p = 0.003; DFS, p = 0.000004). NCOR1P1 was female-specific in TCGA-KIRC (DFS, p = 0.046) and RECA-EU (DFS, p = 0.00003). Figures 2 and 3 compare survival by BAP1 mutation status between males and females in the three databases.

Figure 2

Sex differences in overall survival by BAP1 mutation status in the TCGA-KIRC, RECA-EU, and Korean-KIRC databases. Overall survival according to BAP1 mutation status in the (A) TCGA-KIRC, (B) RECA-EU, and (C) Korean-KIRC databases, compared separately for males and females. The p-values are from the log-rank test.

Figure 3

Sex differences in disease-free survival by BAP1 mutation status in the TCGA-KIRC, RECA-EU, and Korean-KIRC databases. Disease-free survival according to BAP1 mutation status in the (A) TCGA-KIRC, (B) RECA-EU, and (C) Korean-KIRC databases, compared separately for males and females. The p-values are from the log-rank test.

BAP1 mutation was associated with lower survival only in female patients. The remaining genes (ACSS3, ALG13, JADE3, KDM6A, and ZNF449) showed varying patterns of sex-specific survival across the three databases. ACSS3 was survival-specific in females (OS, p = 0.0001; DFS, p = 0.002) in TCGA-KIRC but in males (DFS, p = 0.026) in Korean-KIRC. ALG13 was female-specific (OS, p = 0.018) in TCGA-KIRC, whereas in RECA-EU it was male-specific for OS (p = 0.010) and female-specific for DFS (p = 0.0001). JADE3 was female-dependent in TCGA-KIRC (OS, p = 0.046) but male-dependent in RECA-EU (OS, p = 3.93E−54) and Korean-KIRC (OS, p = 0.00005; DFS, p = 0.00002). KDM6A showed different sex specificity across the databases: TCGA-KIRC (OS in females, p = 0.032), RECA-EU (OS in males, p = 0.042; DFS in females, p = 0.012), and Korean-KIRC (DFS in males, p = 0.049). Lastly, ZNF449 was survival-specific in females (OS, p = 0.012) in TCGA-KIRC but in males (OS, p = 0.001) in RECA-EU. The biological pathways and gene ontology terms enriched among the sex-specific genes, identified by GSEA, are presented in Supplementary Fig. S6 and Supplementary Table S14.

Discussion

Although sex differences have been reported regardless of race or region in various carcinomas including ccRCC1,7,8, biological sex has not yet been evaluated as an important clinical factor in cancer treatment. Recently, with the introduction of gender medicine into oncology5,16, researchers’ interest in identifying sex-related genetic variations and using them as therapeutic biomarkers is increasing5,6,9,30.

We identified 23 sex-specific genes by comparing 68 sex-related genes, selected from TCGA-KIRC, with the RECA-EU and Korean-KIRC data. Significant sex differences in mutation frequencies were observed across these three databases. We also analyzed sex differences in survival for the 23 sex-specific genes. Nine sex-dependent survival genes (ACSS3, ALG13, ASXL3, BAP1, JADE3, KDM5C, KDM6A, NCOR1P1, and ZNF449) were identified in at least two of the three databases, and ASXL3 and KDM5C emerged as male-specific survival genes.

We found male-specific survival differences for ASXL3 and KDM5C. ASXL3 (Additional Sex Combs Like Transcriptional Regulator 3) is known to act as an adaptor protein that links BRD4 to the BAP1 complex and regulates enhancer function in small cell lung cancer31. Tsuboyama et al. also reported that, in small cell lung cancer, ASXL3 is highly expressed and essential for cell viability and that inhibition of BAP1 dramatically destabilizes ASXL332. In this study, ASXL3 was a male-specific survival gene in OS (TCGA-KIRC, p = 0.017; Korean-KIRC, p = 0.005). However, neither sex specificity nor survival specificity of ASXL3 has previously been reported in the literature.

KDM5C (Lysine Demethylase 5C) is located on the X chromosome, escapes X-inactivation, and shows higher mRNA expression in female tissues6. In our study, KDM5C was the only gene with sex-specific mutation frequency in at least two of the three databases. Ricketts et al. reported that KDM5C mutations were more frequent in male patients (p < 0.0001) in the TCGA-KIRC, Japanese, and Chinese cohorts and were sex-specific in TCGA-KIRC (p = 0.0039) and the Chinese cohort (p = 0.0104)12. Dunford et al. also reported that loss-of-function mutations and copy number loss of KDM5C were more frequent in males (p < 0.0001)6. They identified KDM5C and KDM6A as EXITS (escape from X-inactivation tumor suppressor) genes and suggested that mutations in EXITS genes could underlie the male predominance in various cancers. KDM5C has been reported to have a higher mutation rate in males in RCC6,12 and higher expression in females in melanoma33. Enrichment analysis of the sex-specific genes against the GO and KEGG gene sets showed that KDM5C and KDM6A were significantly enriched in histone lysine demethylation (GO:0070076; p = 0.000058173). Notably, a previous ccRCC study by Guo and Zhang reported the involvement of histone demethylase activity in RCC34.

Female-specific survival differences were found for BAP1 and NCOR1P1, which were identified as female-specific survival genes in our study. The selection of BAP1 as the most important sex-related gene by all three feature selection methods agreed with our statistical verification, supporting the validity of the machine learning approach.

BAP1 (BRCA1 Associated Protein 1) encodes a deubiquitylase that functions in multiprotein complexes regulating cellular pathways, including the cell cycle, cell differentiation, apoptosis, gluconeogenesis, and the DNA damage response35. The BAP1 protein acts as a tumor suppressor and is often inactivated in ccRCC36,37. In this study, a sex difference in BAP1, with higher mutation frequency in females, was observed in TCGA-KIRC (male vs. female: 5.80% vs. 14.56%; p = 0.003) but not in RECA-EU (12.24% vs. 14.69%; p = 0.471) or Korean-KIRC (2.53% vs. 4.88%; p = 0.605). Consistent with our Korean-KIRC results, Ricketts et al. found a higher BAP1 mutation frequency in females in TCGA-KIRC (p = 0.001) but not in Japanese or Chinese cohorts12. In a systematic review with meta-analysis of ccRCC, Luchini et al.15 reported that BAP1 was mutated more often in females (p < 0.0001). Li et al.5 also reported a higher incidence of BAP1 mutations in female (15%) than in male (6.1%) ccRCC patients. In addition, BAP1 was a female-specific survival gene in our study for OS (TCGA-KIRC, p = 0.004; RECA-EU, p = 0.002; Korean-KIRC, p = 0.003) and DFS (TCGA-KIRC, p = 0.001; Korean-KIRC, p = 0.000004). Several studies have reported on the relationship between BAP1 mutation and survival in RCC. BAP1 mutation was reported to be associated with significantly poorer survival in female patients (p = 0.0021) but not in male patients (p = 0.7659)12. Manley et al. showed that BAP1 mutation in ccRCC was associated with decreased cancer-specific survival (p = 0.004) in a multivariable model38. Luchini et al.15 also reported that BAP1-mutated clear cell renal carcinomas were more frequent in females and were associated with high tumor grade (p < 0.0001) and with increased all-cause mortality, cancer-specific mortality, and risk of recurrence.

NCOR1P1, Nuclear Receptor Corepressor 1 Pseudogene 1, is predicted to enable transcription corepressor activity, and to be involved in the negative regulation of transcription by RNA polymerase II39. NCOR1 has been reported as one of the nuclear receptor co-regulators, with mutations observed in hormone-dependent cancers such as breast, ovarian, and prostate cancers40. NCOR1P1 was found to be a female-specific survival gene in DFS (TCGA-KIRC, p = 0.046; RECA-EU, p = 0.00003) in this study. However, there are no previous reports of sex specificity or survival specificity for NCOR1P1 in ccRCC or other cancers.

Among the nine sex-dependent survival genes, the remaining five (ACSS3, ALG13, JADE3, KDM6A, and ZNF449) showed sex specificities that differed across databases and lacked consistency. Whether these discrepancies arise by chance or from other factors remains unclear, and the limited information available does not sufficiently explain these contradictory results.

ACSS3 (Acyl-CoA Synthetase Short Chain Family Member 3) is located in the mitochondrial matrix and is predicted to be involved in the ketone body biosynthetic process39. Limited information is available regarding the role of ACSS3, but it has been suggested as a prognostic biomarker in gastric cancer41. In our study, ACSS3 was survival-specific in females in TCGA-KIRC (p = 0.0001 in OS; p = 0.002 in DFS), whereas it was survival-specific in males in Korean-KIRC (p = 0.026 in DFS), a contradictory result. The association of ACSS3 with sex or survival has not been reported in renal cancer, but it has been studied in other cancers. Zhou et al. reported that prostate cancer patients with lower ACSS3 expression had significantly shorter DFS in both univariate (HR 0.563, 95% CI 0.36–0.89, p = 0.013) and multivariate analyses (HR 0.575, 95% CI 0.36–0.91, p = 0.018)42.

ALG13 (ALG13 UDP-N-acetylglucosaminyltransferase subunit) is involved in protein N-glycosylation and is associated with an X-linked congenital disorder of glycosylation characterized by severe developmental delay, epilepsy, and intellectual disability. In this study, ALG13 was survival-specific in females (OS, p = 0.018) in TCGA-KIRC, whereas in RECA-EU it was survival-specific in males (OS, p = 0.010) and in females (DFS, p = 0.0001). No association of ALG13 with sex or survival has been reported in RCC. However, patients with high ALG13 expression had longer OS than those with low expression in non-small-cell lung cancer43, and ALG13 mutations in uterine corpus endometrial carcinoma were linked with better survival (p = 0.01)44.

JADE3, Jade family PHD finger 3, participates in promoting histone acetylation during the process of transcription45. JADE3 was female-dependent in TCGA-KIRC (OS, p = 0.046), whereas it was male-dependent in RECA-EU (OS, p = 3.93E−54) and in Korean-KIRC (OS, p = 0.00005 and DFS, p = 0.00002). JADE3 has not been reported in terms of sex or survival in RCC. However, JADE3 has been reported to be upregulated in colorectal cancer and highly associated with cancer progression, and patients with high JADE3 expression had a shorter 5-year OS (p = 0.005)45.

KDM6A (Lysine Demethylase 6A) is located on the X chromosome and encodes the histone lysine demethylase UTX. Although KDM6A was survival-specific in all three databases, the direction of its sex specificity was inconsistent: it was survival-specific in females in TCGA-KIRC (p = 0.032 in OS) and RECA-EU (p = 0.012 in DFS) but male-specific in RECA-EU (p = 0.042 in OS) and Korean-KIRC (p = 0.049 in DFS). Kaneko et al. described KDM6A as a prototypical sex-biasing tumor suppressor gene46, reporting that loss of KDM6A increased bladder cancer risk in female but not in male knockout mice.

ZNF449 (Zinc Finger Protein 449) encodes a nuclear protein that likely functions as a transcription factor. Zinc finger proteins are known to play important roles in cellular functions such as proliferation, differentiation, and apoptosis47. In our study, ZNF449 was survival-specific in females (OS, p = 0.012) in TCGA-KIRC but in males (OS, p = 0.001) in RECA-EU. Sex or survival specificity of ZNF449 has not been reported in RCC or other cancers.

Various factors have been suggested as causes of the sex disparity, including environmental factors; immunological7, hormonal2,7, genetic, and pharmacokinetic16 differences between males and females; X chromosome effects1,7; and differences in the efficiency of immunological and genomic surveillance mechanisms between the sexes7. Sex hormones, particularly estrogen, may play an important role in regulating cellular aging and deterioration and in protecting women from some cancers2. At the chromosomal level, X-linked mutations have been suggested to be more detrimental in male cells than in female cells, because female cells can selectively inactivate the X chromosome carrying the mutation2. In particular, KDM5C, located on the X chromosome, has been reported as an escape from X-inactivation tumor suppressor (EXITS) gene6, which may explain part of the sex difference. However, further studies are needed to elucidate the precise mechanisms of the genes that show sex differences in patients with ccRCC.

Although we analyzed genomic databases of American, European, and Korean patients with kidney cancer, this study has several limitations. First, genetic differences arising from racial factors may have contributed to the inconsistent sex specificity among the three databases. Second, differences in the cancer subtypes included in the three databases may have affected the results: TCGA-KIRC and Korean-KIRC included only ccRCC, whereas RECA-EU included various renal cancers, including ccRCC.

Conclusion

We discovered and validated sex-specific survival genes in patients with ccRCC by performing machine learning and NGS analyses of the TCGA-KIRC, RECA-EU, and Korean-KIRC databases. Genetic variants showing sex-specific survival differences were identified: female-specific survival differences were found for BAP1 and NCOR1P1, and male-specific survival differences for ASXL3 and KDM5C. These results suggest that biological sex should be considered an important predictor in ccRCC. In the era of precision medicine, sex differences should be understood and incorporated into personalized treatment, rather than relying on existing strategies applied to all patients regardless of sex. Sex-specific tailored treatments may improve patient survival in ccRCC.