Role of FRG1 in predicting the overall survivability in cancers using multivariate based optimal model

FRG1 has a role in tumorigenesis and angiogenesis. Our preliminary analysis showed that FRG1 mRNA expression is associated with overall survival (OS) in certain cancers, but the effect varies. In cervix and gastric cancers, we found a clear difference in the OS between the low and high FRG1 mRNA expression groups, but the difference was not prominent in breast, lung, and liver cancers. We hypothesized that FRG1 expression level could affect the functionality of the correlated genes or vice versa, which might mask the effect of a single gene on the OS analysis in cancer patients. We used the multivariate Cox regression, risk score, and Kaplan Meier analyses to determine OS in a multigene model. STRING, Cytoscape, HIPPIE, Gene Ontology, and DAVID (KEGG) were used to deduce FRG1 associated pathways. In breast, lung, and liver cancers, we found a distinct difference in the OS between the low and high FRG1 mRNA expression groups in the multigene model, suggesting an independent role of FRG1 in survival. Risk scores were calculated based upon regression coefficients in the multigene model. Low and high-risk score groups showed a significant difference in the FRG1 mRNA expression level and OS. HPF1, RPL34, and EXOSC9 were the most common genes present in FRG1 associated pathways across the cancer types. Validation of the effect of FRG1 mRNA expression level on these genes by qRT-PCR supports that FRG1 might be an upstream regulator of their expression. These genes may have multiple regulators, which also affect their expression, leading to the masking effect in the survival analysis. In conclusion, our study highlights the role of FRG1 in the survivability of cancer patients in tissue-specific manner and the use of multigene models in prognosis.


Material and methods
Workflow. We chose the top seven cancers with the highest incidence For this study based on Global Cancer Observatory data 8 . FRG1 co-expression data of the top 20 most correlated genes was obtained from cBioPortal 9,10 . mRNA expression and clinical datasets for all the patients in each cancer type were downloaded from Genomic Data Commons (GDC) Data Portal (Htseq-FPKM-UQ) 11 . Figure 1 shows the workflow for the complete analysis. Kaplan Meier survival analysis was performed in each cancer type to observe the effect of FRG1 mRNA expression on the survivability of patients 12 . Stratified multivariate cox regression 13 was performed to determine the association between overall patient survival and gene expression levels of FRG1 along with the 20 correlated genes (top correlated genes based on the spearman's correlation, r s ) in all the cancer types. The model was optimized by removing the least correlated genes sequentially till the FRG1 was significant. The risk score was calculated for each patient, and the patients were divided into low and high-risk groups based on the median risk score 14,15 . We created Kaplan-Meier plots to identify the difference in survival between the low and high-risk groups in different cancer types. Box plots were created to represent the FRG1 expression in both the risk groups.
Using STRING 16 and HIPPIE 17 web tools, we developed a network of FRG1 and the top correlated genes to find the known interactions at various levels. From the cancer type-specific pathways, a common pathway was identified. The effect of FRG1 expression on these common genes was validated via RT-PCR. Using DAVID (KEGG pathway) 18,19 and Gene Ontology (GO) 20 we performed gene functional enrichment analysis to identify potential biological processes, molecular function and signaling pathways involved in different cancer types.
Data sources and processing. Co-expression data of FRG1 was obtained from cBioPortal (accessed on 20 Aug 2020). In cBioPortal for each cancer type "TCGA, Firehose Legacy" database was selected for correlation analysis. In co-expression tab FRG1 correlated genes were searched using data for mRNA expression (RNA Seq V2 RSEM).
For survival analysis, data of expression profiles along with clinical data was downloaded from GDC Data Portal (up to 19 Dec 2020) for all the cancer types. Settings chosen to download the data from GDC TCGA were as follows; Data Category-Transcriptome Profiling, Data type-Gene Expression Quantification, Experimental Strategy-RNA-Seq, Primary site-Cancer Type, Program-TCGA, Workflow Types-Htseq-FPKM-UQ.

Overall survival analysis for single gene.
To test the effect of FRG1 expression on the survivability of patients in our chosen cancer types, we performed the Kaplan Meir survival analysis using the TCGA data downloaded from GDC. We did the analysis in R using the "survival" and "survminer" libraries. To determine the optimal cut-off point of FRG1 expression for KM plot, we used the surv_cutpoint() function and plotted using plot() ( Supplementary Fig. S1). KM plot were made using ggsurvplot() function.
The Kaplan-Meier plotter (KM-Plotter) 21 was used to validate the effect of FRG1 mRNA expression on OS. In the KM plotter, under "Start KM Plotter for pan-cancer" tab, dataset for specific cancer was selected. At "Gene symbol" FRG1 was given as input. Patients were split according to the cutoffs used in the TCGA data analysis.
Survival analysis and identification of prognostic genes. We analyzed the correlation between OS time and gene expression by using stratified multivariate Cox regression. The model was optimized by removing the least correlated genes until the FRG1 remained significant. A risk score was calculated for each patient based on the following equation, where n was the number of prognostic genes, exp i the expression value of gene i, and β i the regression coefficient of gene i in the Cox regression analysis. Patients were classified into high-and low-risk groups, using the median risk score as a cutoff value. Box plots were generated to compare the FRG1 mRNA expression level between the low and high-risk groups. The Log-Rank test was used to determine the statistical significance of the difference in OS between the two groups. Pathway analysis. Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) is a biological database and web resource of known and predicted protein-protein interactions (PPI). For each cancer type, a model was created using STRING PPI network data. In the model, each node is represented by a protein and, edges show physical interaction between the two proteins. The missing links between FRG1 and co-expressed genes were found using Human Integrated Protein-Protein Interaction rEference (HIPPIE), which is based on the earlier reports of FRG1 interacting proteins. Cytoscape (Version: 3.8.2) was used to visualize the networks and to find the intersection of all the pathways using the merge tool 22 . We did GO Enrichment analysis for the identification of enriched biological processes and molecular functions 20 . Database for Annotation, Visualization, and Integrated Discovery (DAVID) 19 was used to find out the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways for the set of FRG1 correlated genes in different cancer types 18 .   qRT-PCR. Total RNA was extracted using the RNeasy Mini kit (Qiagen, Germany). RNA was quantified using the NanoDrop 2000 spectrophotometer (Thermo Scientific, USA). RNA was converted into cDNA using oligo dT primer and random hexamer (in 1:3 ratio) (Verso cDNA Synthesis Kit). RT-PCR primers were designed for FRG1, HPF1, RPL34, and EXOSC9 using Primer-BLAST 23 (Supplementary Table S1). SYBR™ Green PCR Master Mix (Applied Biosystems™, Thermo Scientific, USA) was used to perform qRT-PCR in QuantStudio™ 3 Real-Time PCR System (Applied Biosystems™, Thermo Scientific, USA). The experiment was performed in triplicate for each sample and GAPDH was used as an internal control.
Statistics. For multigene model-based OS analysis multivariate Cox regression analysis was performed using SPSS (version 26) 24 . Risk score (generated via multigene model) based OS analysis was performed using Kaplan Meir analysis in SPSS. A log-rank test was used to find the statistical significance of the difference in survival between the groups. The prognostic value of the risk score was measured using a time-dependent receiver operating characteristic (ROC) curve in SPSS. Mean values were compared using Student's t-test (two-tailed, unpaired). For all the tests performed, a p-value ≤ 0.05 was considered significant.

Effect of FRG1 alone on survival in different cancer types. Kaplan-Meir survival analysis was per-
formed to determine the effect of FRG1 mRNA expression on the OS across the seven most frequent cancer types. There was a highly significant difference in the survival probability between high and low FRG1 expression groups in cervix, stomach, and prostate cancers (Fig. 2). In liver cancer, the difference in survivability was marginally significant. Although the trend was there yet, the difference was not significant in breast, lung, and colorectal cancers. We used KM plotter data (available for breast cancer, lung adenocarcinoma, cervical squamous cell carcinoma, stomach adenocarcinoma, liver hepatocellular carcinoma, and rectum adenocarcinoma) for the validation. We observed a similar trend of the effect of FRG1 mRNA expression level on the OS as in the first set ( Supplementary Fig. S2). Overall, the data suggest that FRG1 affects the survival in cancers but the extent www.nature.com/scientificreports/ of the effect is tissue specific. Analysis of FRG1 expression alone may not be enough to explain the contribution of other genes, which are affected by FRG1 directly or indirectly. Therefore, we did multigene model-based analysis in breast, lung, colorectal, and liver cancers to get a clear idea about the effect of FRG1 mRNA expression on OS.
High FRG1 expression is associated with a good prognosis in the multigene model. To determine the contribution of FRG1 on survival, the effect correlated genes was neutralized using the multivariate Cox regression model. Sub-sections below describe the multigene model for each cancer type.

Effect of FRG1 and correlated genes on survival in breast cancer. In breast carcinoma, initially,
we entered the top 20 FRG1 correlated genes (r s ≥ 0.353) (Supplementary Table S2) to generate the multivariate cox regression model in the TCGA-BRCA (The Cancer Genome Atlas Breast Invasive Carcinoma) dataset. Sequentially the least correlated genes were removed from the model until the FRG1 showed a maximum level of significant association with survival ( Table 1). The hazard ratio of FRG1 was 0.133 (95% CI 0.029-0.599, p = 0.009) for breast cancer patient's death.
To analyze the combined effect of FRG1 and the correlated genes (genes present in the final model) on the OS, for each breast cancer patient risk score was calculated. The patients were stratified into low-risk (n = 612) and high-risk (n = 611) groups based on the median risk score value. A significant (p = 2.45E-13) difference in OS was observed between the groups (Fig. 3A). The AUC (area under the ROC Curve) for this risk model was 0.645 ( Supplementary Fig. S3). In Time-dependent ROC curve analysis the value of AUC above 0.5 indicates a good prognostic performance based on risk factor in predicting the overall survival. There was significantly higher (p < 0.0001) FRG1 mRNA expression in the low-risk group compared to the high-risk group (Fig. 3B).  Table S2) and FRG1 were added in multivariate cox regression model using TCGA-MESO (The Cancer Genome Atlas Mesothelioma), TCGA-LUAD (The Cancer Genome Atlas Lung Adenocarcinoma) and TCGA-LUSC (The Cancer Genome Atlas Lung Squamous Cell Carcinoma) datasets. To investigate the prognostic effect of FRG1 on lung carcinoma patients, we applied the same strategy as described above. The final model had 17 genes where the hazard ratio of FRG1 was 0.235 (95% CI 0.074-0.742, p = 0.014) for lung cancer patient's death ( Table 2).

Effect of FRG1 and correlated genes on survival in
All the patients were stratified into low-risk (n = 559) and high-risk (n = 572) groups based on the median value of the risk score. The AUC for this risk model was 0.569 ( Supplementary Fig. S3). A significant difference (p = 1.0E−6) in OS was observed between the groups (Fig. 4A). There was significantly (p < 0.0001) high FRG1 expression in the low-risk group compared to the high-risk group (Fig. 4B). Next, to determine the effect in the multigene model, the patients were divided into the low-risk (n = 306) and high-risk groups (n = 305), based on the median risk score. The AUC for this risk model was 0.604 (Supplementary Fig. S3). A significant (p = 0.0001) difference in OS was observed between the two groups (Fig. 5A). Comparison of FRG1 mRNA expression between the high-risk and low-risk groups showed significantly (p < 0.0001) higher expression in the low-risk group (Fig. 5B).   (Table 3) where the hazard ratio of FRG1 was 0.18 (95% CI 0.034-0.948, p = 0.043) for liver cancer patient's death. Next, to determine the effect in the multigene model, the patients were divided into the low-risk group (n = 231) and high-risk group (n = 231) based on the median risk score. The AUC for this risk model was 0.616 ( Supplementary Fig. S3). A significant (p = 0.0001) difference in OS was observed between the two groups (Fig. 6A). Comparison of FRG1 mRNA expression between the high-risk group and low-risk group (Fig. 6B) showed significantly (p < 0.0001) higher expression in the low-risk group.    www.nature.com/scientificreports/ a significant decrease in the FRG1 expression (Fig. 7). From the top 20 genes correlated with FRG1 across cancer types, we found that three genes (HPF1, RPL34, and EXOSC9) were common. We hypothesized that these genes could be part of pathway/pathways in which FRG1 has a role and could affect their expression. To validate this, the expression level of these three genes was analyzed in response to FRG1 depletion in the HEK293T cell line by quantitative real-time PCR. We observed that knockdown of FRG1 led to a significant decrease in expression of HPF1 (0.68-fold, p-value = 0.011), RPL34 (0.65-fold, p-value = 0.025) and EXOSC9 (0.54-fold, p-value = 0.012) (Fig. 7). These findings suggest the effect of FRG1 in transcriptional regulation of HPF1, RPL34, and EXOSC9, which could be direct or indirect.

FRG1 may have role in multiple pathways.
To figure out the pathway/s where FRG1 may have a role we used genes that correlate with FRG1 expression and the genes that interact with FRG1 (HIPPIE database) as input in the STRING database. Individual networks for each cancer type are shown in Fig. 8. After that all the networks were merged and the intersection was obtained using the Merge tool of Cytoscape, giving us the most   www.nature.com/scientificreports/ common pathway (Fig. 8). The merged pathway had 17 nodes (MEPCE, LARP7, SUMO2, UBE2O, HECW2, RBPMS, JUN, ESR2, SART3, EXOSC8, FRG1, PARP2, C4orf27 (HPF1), EFTUD2, SNRPD3, CWC22, and AQR) and 21 edges. Functional enrichment analysis showed GO terms RNA binding, snRNA binding, and nucleic acid binding to be most frequent in molecular functional (Fig. 9). In biological processes ( Supplementary Fig. S4) we observed GO terms such as metabolic process, RNA metabolic process, and mRNA metabolic process to be the most frequent. We identified Ribosome KEGG pathway to be common in different cancer types (Supplementary  Table S4). Other RNA related pathways, namely spliceosome and RNA degradation pathways were also identified in lung cancer.

Discussion
FRG1 protein is part of human spliceosomal complex C 25 . Earlier studies primarily focused on the role of FRG1 in FSHD. However, a few studies have demonstrated the role of FRG1 in tooth germ development and angiogenesis 6,26 . Our previous research showed reduction in FRG1 protein expression in gastric cancer, colon cancer, and oral cavity cancer tissues by IHC analysis. Change in FRG1 mRNA expression affected migration and angiogenic potential of HEK293T and HUVECs, respectively. FRG1 expression perturbation in HEK293T  Our another study clearly showed the protective role of FRG1 in prostate cancer 7 . FRG1 expression was reduced in prostate tumor tissues compared to normal tissue. Depletion of FRG1 led to increased tumorigenic properties in prostate cancer cell lines and activation of p38 MAPK (mitogen-activated protein kinase) signaling. FRG1 expression affected levels of GM-CSF (Granulocyte Macrophage colony stimulating factor), PLGF (Placental Growth Factor), PDGFA (Platelet Derived Growth Factor A) and CXCL1 (Chemokine (C-X-C motif) ligand 1), which are well known for their effect on tumor progression, chemotaxis, migration and invasion [29][30][31][32][33][34] . Being part of the spliceosomal C complex, FRG1's downregulation might lead to instability and disruption in downstream processes affecting the normal mRNA levels. In concordance, recent studies have shown that the expression of splicing factors is frequently deregulated in different cancer types 35 . Role of FRG1 in survival of cancer patients is not clearly understood. There aren't many studies focusing on FRG1, hence we wanted to perform a comprehensive study in multiple cancer types to elucidate a concreate role of FRG1 in predicating the OS of cancer patients. We first elucidated the role of FRG1 alone in the OS in multiple cancer types. High FRG1 mRNA expression correlated with better survival in the cervix and gastric cancer patients. In cancer types such as breast, lung, and liver, the difference in FRG1 expression level did not affect OS significantly. We observed that the patients with low FRG1 mRNA expression were more frequent in the cervix and gastric cancers. On the contrary, just the opposite trend was observed in liver, colorectal, lung, and prostate cancers. In breast cancer, distribution was approximately equal. Expression of genes can correlate if one of them regulates the transcription of another, directly or indirectly. Upstream regulator genes may have mutation/s, resulting in the masking of independent effects of mutation/s in the downstream target. We used multigene models to nullify the influence of other genes on OS that correlate with FRG1. As expected, we observed a clear effect of FRG1 expression in breast, lung, and liver cancers also after multivariate cox regression analysis.
Segregation of the patients based on the risk score (calculated based on the multigene model) showed that low-risk patients had better OS than high-risk patients. We also observed that low-risk patients had high FRG1 levels, which confirms the role of FRG1 mRNA expression in survival. The earlier studies support our observation directly, where increased FRG1 expression affected in-vitro cell migration, invasion, and angiogenesis inversely 6 .
To further elucidate the molecular mechanism of the role of FRG1 in cells, we generated pathways. Our final model (Fig. 10) shows four types of functions where FRG1 might be involved, namely pre-mRNA processing (CWC22), mitochondrial functioning (MRPS18C, MRPL1, MRPL54, and NDUFC1), ribosomal functioning (RPL34, RPL24), and in DNA damage/repair pathway (HPF1, PARP1, SUMO2). FRG1 with CWC22 interact 36 ,  www.nature.com/scientificreports/ and they both are also part of the spliceosomal C complex 25 . Deregulation of these genes may have a direct effect on spliceosome complex functioning. Previous literature has shown the importance of CWC22 in pre-mRNA splicing 37 . CWC22 expression levels were associated with colon cancer and its silencing led to increased p53 levels 38,39 . SNRPD3 is also part of the spliceosome complex 40 . It has been found to have a regulatory effect on p53 expression in non-small cell lung cancer. It also has a role in triple-negative breast cancer cell proliferation 38,41 . In our model we found EXOSC9 to be highly correlated with FRG1 in multiple cancer types. Protein-protein interaction between FRG1 and EXOSC8 has been observed in previous studies 42 . EXOSC8 and EXOSC9 (both present in our model) are non-catalytic parts of the RNA exosome complex 43 . EXOSC8 and EXOSC9 are associated with many diseases 44,45 , but their role in cancer has recently been uncovered. EXOSC8 promoted tumor and cancer cell growth in colorectal carcinoma 46 . Reduction in EXOSC9 was associated with reducing of p-body formation in cancer cells 47 . From all these studies, we can infer that FRG1 along with EXOSC8 and EXOSC9 might play a major role in controlling RNA processing and, its depletion can affect functional RNAs. Our Functional enrichment analysis results also suggest FRG1 may be involved in RNA related biological processes and molecular functions. Another very interesting observation was the mitochondria-related genes in our model. Mitochondrial ribosomal proteins (MRPS18C, MRPL1, and MRPL54) and NDUFC1, which is a component of the mitochondrial complex 1, are related to FRG1. MRPS18C is downregulated in esophageal cancer 48 . MRPL1 is a part of the gene signature for low-grade gliomas prognosis 49 . In malignant mesothelioma (MM) and lung cancer MRPL1 was mutated 50 . Similarly, in HCC, high expression of MRPL54 was associated with better survival 51 . NDUFC1 may affect the production of ROS, which has been observed in many cancer types 52 . Similarly, RPL24 and RPL34 that are part of the cytoplasmic ribosomal complex, can affect protein production. Alteration in RPL34 expression affects non-small cell lung cancer cell proliferation 53,54 . Depletion of RPL24 inhibits cancer cell growth, which makes RPL24 a potential therapeutic target 55 . Another gene in our model, RBPMS interacts with FRG1 at protein level 56 . RBPMS has been shown as a coactivator of transcriptional activity of many genes 57 . Multiple myeloma shows drug resistance when RBPMS is silenced 58 . These observations suggest that FRG1 might control the protein synthesis as well.
FRG1 is also related to the DNA repair pathway (HPF1, PARP1, and SUMO2). HPF1 protects the DNA from damage by limiting the hyper auto modification of PARP1 required for repair 59 . PARP2 shows a direct protein-protein interaction with FRG1, but its function is unknown. SUMO2, which plays an important role in post-translational modification and affects multiple cellular processes, including DNA repair and replication 60,61 , www.nature.com/scientificreports/ has also been implicated in cancers 62,63 . FRG1 may affect DNA repair by acting as a transcriptional regulator of these genes. Overall, our analysis indicates two possibilities about FRG1's role, first being a part of the spliceosome complex and the other is by acting as a transcriptional regulator of other genes involved in various functions. To check the latter possibility, we performed qRT-PCR and found that FRG1 knockdown led to a reduction in expression levels of HPF1, EXOSC9, and RPL34. Further in-depth experiments are needed to figure out the exact role of FRG1 in tumorigenesis via the first possibility. FRG1's role as transcriptional activator or repressor may be assessed by identifying its direct binding to the promoter region of the putative target gene (EMSA, ChIP). Integrity of spliceosome complex and rate of transcription can be checked after knock out or knock down of FRG1. Immunoprecipitation assay can confirm the protien-protien interaction with spliceosome components. This study has additional limitations, such as more number of genes can be incorporated in our model. We chose the top seven cancer types with the highest incident rates; studies in other cancers can give a more in-depth understanding of the FRG1 pathway.
In conclusion, this study has clearly shown the role of FRG1 in predicting the survivability of cancer patients. The higher expression of the FRG1 gene has a protective effect. The use of the multigene models can be helpful in elucidating the effect of a specific gene in a biologically complex background.