Six novel immunoglobulin genes as biomarkers for better prognosis in triple-negative breast cancer by gene co-expression network analysis

Gene co-expression network analysis (GCNA) can detect alterations in regulatory activities in case/control comparisons. We propose a framework to detect novel genes and networks for predicting breast cancer recurrence. Thirty-four prognosis candidate genes were selected based on a literature review. Four Gene Expression Omnibus Series (GSE) microarray datasets (n = 920) were used to create gene co-expression networks based on these candidates. We applied the framework to four comparison groups according to node (+/−) and recurrence (+/−). We identified a sub-network containing two candidate genes (LST1 and IGHM) and six novel genes (IGHA1, IGHD, IGHG1, IGHG3, IGLC2, and IGLJ3) related to B cell-specific immunoglobulin. These novel genes were correlated with recurrence under the control of node status and were found to function as tumor suppressors; higher mRNA expression indicated a lower risk of recurrence (hazard ratio, HR = 0.87, p = 0.001). We created an immune index score by performing principle component analysis and divided the genes into low and high groups. This discrete index significantly predicted relapse-free survival (RFS) (high: HR = 0.77, p = 0.019; low: control). Public tool KM Plotter and TCGA-BRCA gene expression data were used to validate. We confirmed these genes are correlated with RFS and distal metastasis-free survival (DMFS) in triple-negative breast cancer (TNBC) and general breast cancer.


Results
We made comparisons between groups using GCNA with r > 0.9 and edge limit = 1. Comparison networks between cases of recurrence and no recurrence were generated, and UBE2C, MCM6 and IGHG1 were found to be highly differentially co-expressed genes. (Table 1 and Fig. 1) Common genes in the two networks were IGHA1, IGHD, IGHG3, IGLC2, and IGLJ3. Highly co-expressed genes in each of the four comparison groups classified by node (+/−) and recurrence (+/−) are shown in Table 2 and Fig. 1. Regardless of node status, highly co-expressed genes within the network of no recurrence were IGHA1, IGHD, IGHG1, IGHG3, IGLC2, and IGLJ3. Cox proportional hazard ratio regression analysis found these genes to be significantly correlated with the recurrence of BC, regardless of node status ( Table 3, Fig. 1). Logistic regression analysis revealed a significant correlation with node status, with an odds ratio (OR) range of 2.3-23 (p < 0.001). These six highly co-expressed genes for LST1 and IGHM belong to a cluster and are related to immune function (Fig. 1).
Recently, studies have found that robust levels of tumor-infiltrating lymphocytes (TILs) are associated with increased disease-free survival (DFS) and overall survival (OS) rates in triple-negative breast cancer (TNBC) patients with and without treatment. There have also been efforts to develop a standardized methodology for evaluating TILs 19 . Their presence at diagnosis is associated with a pathologic response to neoadjuvant therapy as well as increased DFS and OS following adjuvant chemotherapy in certain subtypes 20,21 .
We speculate that highly co-expressed immune-related genes can be used for the prognosis and treatment of general BC as well as TNBC, and we have established an immune response index to explore the relationships between specific genes and BC recurrence. Because these six genes were highly correlated, they were replaced with a component score by using principle component analysis (PCA). This score had a value range of −1.93~ 1.83 (mean = 0, sd = 1). Cox proportional hazard ratio regression analysis (under the control of node status) was also performed to investigate the impact of the component score on recurrence. It was found that the risk of recurrence was reduced by approximately 13% (HR = 0.87, p = 0.014) for each additional unit of the component score (Table 4). In addition, to divide the samples into high and low immune index groups, we used the 40th percentile (value: −0.5) of the component score as the group's cut-off point. Using the low group as the control, the immune index effectively distinguished recurrence status, with a hazard ratio (HR) of 0.774 (p = 0.019). (Table 4, Fig. 2).
Since TNBC is an aggressive subtype and difficult to treat, we wanted to know whether these immune-related genes are correlated by using KM Plotter online cancer survival analysis tool (http://kmplot.com/analysis/) 22 . The TNBC validation data sets consist of 255 RFS and 43 DMFS cases. All immunoglobulin-related genes were significantly associated with RFS and DMFS with the exception that IGHD was not related to RFS (Table 5).
In order to validate these genes in a larger BC data set, we used TCGA-BRCA gene expression data sets representing 1,215 tumors ( Supplementary Fig. 1). DMFS (n = 68) and RFS (TNBC (+): n = 131; TNBC (−): n = 637) samples were selected and stratified by TNBC status (negative/positive) (Supplementary Table 1). All of these BC patients had received initial treatment. Because there are no corresponding gene symbols for our six immunoglobulin-related genes in TCGA-BRCA microarrays, we validated all related immunoglobulin genes: IGLL3, IGLL1, IGSF9B, IGDCC3, IGDCC4, IGBP1, IGSF5, IGSF11, IGSF22, IGSF21, IGHMBP2, IGSF10, IGSF8,  IGSF9, IGSF6, IGSF1, IGSF3, IGFN1, and IGJ. Stage, TNM stage, PR status and node status were significantly associated with RFS and DMFS in the univariable Cox proportional-hazards regression models (Supplementary Table 2). The clinical variables were analyzed with the immunoglobulin-related genes by using multivariable  www.nature.com/scientificreports www.nature.com/scientificreports/ Cox proportional-hazards regression. Only node status was found to be significantly related to recurrence with biomarkers in the multivariable Cox proportional-hazards regression models ( Table 6). The results showed that IGDCC3, IGJ and IGSF9B were significantly associated with RFS and DMFS; IGSF3 was significantly associated with RFS; and IGSF22, IGSF6 and IGSF9 were significantly associated with DMFS in BCs (Table 6).

Discussion
We showed that without prior information, comparison of co-expression networks between case and control groups can confirm and reveal novel disease mechanisms using a systems approach. In addition, we found that co-expression networks estimated from integrated publicly available genomic studies provide more accurate and robust results than those from a single study 18 .
We found 34 candidate genes related to BC recurrence from six studies 23-28 that identified marker genes for BC prognosis. A GCN was established based on these 34 candidates, and eight sub-networks related to immune function were found using GCNA, which consisted of two candidate genes, LST1 and IGHM, and six co-expressed genes, IGHA1, IGHD, IGHG1, IGHG3, IGLC2, and IGLJ3. Studies have found the functional pathways of significant recurrent genes in BC to be associated with the immune response and sensitivity to drugs 2 indicating that the immune-related genes identified in this study may also be related to drug sensitivity. Gene function annotation was performed by using DAVID (https://david.ncifcrf.gov/home.jsp). Although a corresponding function for IGLJ3 was not identified, the Gene Ontology (GO) terms of the other five novel genes include "the immunoglobulin complex" and "circulation". These biological processes are positive regulators of B cell activation, phagocytosis recognition, engulfment, and B cell receptor signaling. B cells infiltrating a patient's BC and B cells present in the tumor-draining lymph node are clonally and functionally related. Heavy and light chains selected for tumor binding from the BC and tumor-draining lymph node (TDLN) libraries indicate a physiologic relationship that may be important to the tumor-specific immune response 29 .
Inflammatory cells and their mediators are important constituents of the tumor microenvironment, and they can affect the prognosis of various cancers, including BC 31,32 . Gene expression of immunoglobulin normally associates with lineage fidelity in B lymphocytes 33 . Growing evidence indicates that immunoglobulins are produced by mature B lymphocytes, plasma cells and BCs 34 .
The six immunoglobulin-related genes that we examined have not previously been identified as having roles in BC, but widespread evidence has shown that immunoglobulin-related genes are effective diagnostic and prognostic biomarkers for BC 31,[34][35][36][37][38][39] . Recently, many immunoglobulin superfamily (IgSF) genes were found to serve as effective prognostic biomarkers for BC 36 . In addition, immunoglobulin free light chains (FLCs) were identified as ligands in the pro-tumorigenic activation of mast cells. FLCs may be helpful in the diagnosis and prognosis of BC 31 .
The stromal immunoglobulin kappa chain (IGKC) has been validated as an immunological biomarker of prognosis and response to therapy in BC 38,39 . Immunoglobulin gamma heavy-chain marker and kappa light-chain marker allotypes are associated with humoral immunity to HER-2, a finding with potential implications for BC immunotherapy 40 . All of the evidence suggests that our six immunoglobulin-related genes have potential for use in prognosis prediction and targeted therapy.
TNBC is an aggressive disease without established targeted treatment options for patients. It represents a major challenge, and there is an urgent need for new therapeutic targets 41,42 . We sought to determine whether the immunoglobulin-related genes are associated with RFS and distal metastasis-free survival (DMFS) in TNBC samples. After validate in KM Plotter online cancer survival analysis tool 22 , we found that all immunoglobulin-related www.nature.com/scientificreports www.nature.com/scientificreports/ genes were significantly associated with RFS and DMFS with the exception that IGHD was not related to RFS (Table 5). Further validation was conducted using TCGA-BRCA gene expression data sets (Supplementary Table 1). Though there are no corresponding gene symbols for our six immunoglobulin-related genes, we still found that related immunoglobulin genes are associated with the RFS and DMFS in BC or TNBC. In summary, the results indicate that immunoglobulin-related genes play significant roles in RFS and DMFS both in general BC and TNBC. This has implications for targeted therapy for TNBC.
TILs are reported to be positively associated with improved survival 21 , particularly in TNBC, but they can also aid in the prediction of responses to neoadjuvant and adjuvant chemotherapy treatments. There have been increasing efforts to target the immune system as part of BC therapy, primarily in patients with TNBC 19 . Accordingly, we established an immune index score system with six immune-related genes. This score is a protective indicator for the recurrence of BC: as the score increases, the risk of recurrence decreases. This index may be used as a TIL-related indicator and a TNBC treatment marker in the future.
To the best of our knowledge, this is the first study showing that the immunoglobulin-related genes IGHA1, IGHD, IGHG1, IGHG3, IGLC2, and IGLJ3 serve as suppressor genes in the recurrence of general BC and TNBC patients. The validation results from the public tool KM Plotter and TGCA-BRCA confirmed their significant roles in DMFS and RFS of general BC or TNBC. Our results also show that the analysis workflow of GCNA can effectively and efficiently detect novel prognostic biomarkers of BC. These six immunoglobulin genes are warrant further study of their roles in TNBC and we are working on verifying their function in cell lines.

RFS DMFS
(h) IGHG3  www.nature.com/scientificreports www.nature.com/scientificreports/ The co-expression networks were established as follows: (1) Width of the gene connection: indicates the degree of correlation between genes by the method of Spearman correlation; a thicker connection line indicates a greater degree of correlation, and the weight of the correlation between two genes is shown by clicking the connection line. (2) The number of gene connections: to simplify the co-expression networks, the number of edges for each gene was limited to one, and the selection started from the gene with the highest correlation coefficient. (3) Colors of the gene icons and connecting lines: These denote similar gene expression patterns for genes in the same color. The R function "hclust" was used with the method set to "spearman" and "complete"; the tangent point was set to be the same cluster when the kinship distance was 1/1.5 of all lengths, and the cluster results are illustrated using the same color connecting line as in the gene networks. Connection lines in green denote neighboring genes that do not belong to the same cluster. (4) Size of the gene icon: the coefficient of variation (cv) of each gene was calculated, and the absolute value of the cv was used to represent the size of the dot; as the cv increases in value, the variation in mRNA gene expression also increases. (5) Shape of the gene icons: the 34 candidate genes are represented by diamonds; co-expressed genes are represented by circles, and significant recurrence associated co-expressed genes are represented by stars (univariable Cox proportional hazard ratio regression test, p < 0.05). (6) Frame color of gene icons: correlation between the mRNA gene expression of each gene and recurrence was analyzed by univariable Cox proportional hazard ratio regression; if 0.01 ≤ p < 0.05, then the icon frame is shown in red, and if p < 0.01, then the icon frame is shown in yellow. (7) Style of the gene icon frame: up-regulated genes are shown with dashed lines, whereas down-regulated genes are shown by solid lines.

Conclusions
We identified and validated six genes related to immune function as potential biomarkers of recurrence for both general breast cancer and TNBC. Our results suggest that GCNA can effectively and efficiently detect novel prognostic biomarkers of breast cancer.

Data Availability
All the GEO dataset are available in NCBI GEO data base (https://www.ncbi.nlm.nih.gov/geo/).