Osteosarcoma (OS) is a primary malignant bone tumor that most commonly affects children, adolescents, and young adults. It shows a predilection for the metaphysis of long bones and occurs most often in the distal femur (43%), proximal tibia (23%), or humerus (10%)1. OS is aggressive and often metastasizes to the lungs2. In the past 10 years, the incidence of OS has increased by 0.3% annually3, and it has consistently ranked as the second deadliest cancer in adolescents and children4. Despite advances in multimodal therapy, the 5-year survival rate of OS is approximately 60% to 70% and has remained stagnant over the past three decades. Patients with distant metastases still fare poorly, with a 5-year survival rate that does not exceed 20%5,6. In addition, patients with the same clinical or pathological conditions who receive the same treatment regimen may have different clinical outcomes because of genetic heterogeneity7. Therefore, in-depth exploration of the molecular mechanisms underlying the development of OS is crucial for finding effective prognostic biomarkers to guide patient risk stratification, in line with the concept of precision medicine.

In recent years, biomolecules and risk models have been used to evaluate the prognosis of OS8,9,10,11. However, they have not yet been adopted in clinical practice because of limitations such as overfitting caused by small sample sizes12. In recent decades, increasing evidence has indicated that the immune response is actively involved in OS occurrence and progression13. Immune genes act as pivotal regulators of the immune response14,15. They maintain the body's self-tolerance by strictly regulating immune function and reducing the damage inflicted on surrounding tissues16. However, OS cells may exploit these immune genes to escape the immune system and create a favorable environment for their growth13,17. Given the critical role of immune molecules in OS prognosis, these immune-related genes (IRGs) deserve further study.

Here, we identified differentially expressed IRGs (DEIRGs) between OS and normal muscle tissue samples. Subsequently, an IRGs signature that can predict the outcome of OS was constructed by using univariate Cox regression and iterative LASSO Cox regression analysis of the DEIRGs. In addition, the accuracy of the IRGs signature in predicting the prognosis of OS patients was verified in an independent cohort. Finally, we evaluated the independence, repeatability, and clinical value of the IRGs signature in different clinical subgroups. Our results reveal the prognostic value of the IRGs signature and provide a promising prognostic indicator for OS.


GO and KEGG pathways enrichment analysis of DEIRGs

In total, 604 DEIRGs were identified. GO analysis showed that the DEIRGs are involved in biological processes such as cell chemotaxis, leukocyte adhesion, and innate immune regulation. They were associated with cellular components such as the external side of the plasma membrane, MHC protein complexes, endoplasmic reticulum membranes, and phagocytic vesicles, and with molecular functions including receptor-ligand, cytokine, steroid receptor, and nuclear receptor activity (Fig. 1A). KEGG analysis indicated that the DEIRGs were mainly enriched in the following signaling pathways: chemokine, PI3K/AKT, MAPK, JAK-STAT, and natural killer (NK) cell-mediated cytotoxicity (Fig. 1B).

Figure 1

GO and KEGG analysis of DEIRGs. (A) GO enrichment analysis of DEIRGs. The color of the bar indicates p.adjust: the redder the color, the smaller the p.adjust value; the bluer the color, the larger the p.adjust value. The horizontal axis represents the number of DEIRGs under the GO term. (B) KEGG pathway enrichment analysis of DEIRGs, showing the 10 most significantly enriched pathways (p.adjust < 0.05). p.adjust: adjusted P-value; BP: biological process; CC: cellular component; MF: molecular function.

PPI network construction, hub IRGs screening, and functional similarity analysis of DEIRGs

The results of these analyses are shown in Fig. 2A. CASP3, TNFRSF10B, and HSP90 had a larger weight and a stronger correlation in the PPI network. Ten hub IRGs were obtained, namely CXCR4, CCR5, CXCL16, CCL5, CXCL12, CXCL10, CXCR3, OPRL1, S1PR1, and GAL (Fig. 2B, Table 1). To further assess how closely the hub IRGs interact, we ranked them according to their average functional similarity. CCR5, CXCL12, CXCR4, and S1PR1 were the hub genes with similarity values above the cut-off of 0.55, and CCR5, CXCL12, and CXCR4 were the most closely related genes (Fig. 2C).

Figure 2

Protein–protein interaction (PPI), hub IRGs, and functional similarity analysis of DEIRGs. (A) PPI network. The size of a node represents the clustering coefficient, the color indicates the degree, the width of the line indicates the score; the color of the line represents co-expression. (B) Hub IRGs. The hub IRGs were the top 10 DEIRGs scored by the maximum correlation coefficient. (C) Functional similarities of 10 hub IRGs. The boxes indicate the middle 50% of the similarities; the upper and lower boundaries represent the 75th and 25th percentiles. The two ends of the line represent the maximum and minimum values. The dashed line represents the cut-off value of similarity.

Table 1 Functions of 10 hub IRGs.

Identification and assessment of the prognostic signature

To identify the optimal prognostic signature of OS based on IRGs, 82 prognosis-associated IRGs were first identified by univariate Cox regression analysis of the DEIRGs. We then identified the optimal prognostic signature, which consisted of 13 prognosis-associated IRGs, via iterative LASSO Cox regression analysis (Fig. 3A, Table 2). The ROC curve showed that the accuracy of this signature in predicting OS prognosis was high (Fig. 3B, AUC = 0.918). The Kaplan–Meier curve indicated that the overall survival of patients in the high-risk group was markedly worse than that of patients in the low-risk group (Fig. 3C, p < 0.001). Based on the optimal signature, we obtained the risk score distribution (Fig. 4A), the survival status (Fig. 4B), and the expression characteristics of the immune genes in OS (Fig. 4C). The high-risk group had more deaths than the low-risk group. In addition, the expression levels of GNRH1, VEGFA, TNFRSF11B, GAL, STC2, BRAF, BMP8A, and CORT were higher in the high-risk group, whereas patients in the low-risk group expressed higher levels of PSMD10, TNFRSF21, GRN, VAV1, and SDC3.

Figure 3

Development and assessment of the prognostic signature. (A) Construction of the prognosis-associated IRGs signature. The horizontal axis represents the gene frequency and the vertical axis represents the AUC. (B) Time-dependent ROC curve for the prognosis-associated IRGs signature. The horizontal axis indicates the FPR, and the vertical axis indicates the TPR. (C) Kaplan–Meier survival curves of overall survival for the high-risk and low-risk groups. The horizontal axis represents survival time (y), and the vertical axis represents the survival rate (%). ROC: receiver operating characteristic curve; AUC: area under the curve; FPR: false positive rate; TPR: true positive rate.

Table 2 IRGs function in the prognostic signature.
Figure 4

Prognostic analyses of high-risk and low-risk patients. (A) Risk score distribution of patients in the prognosis-associated IRGs signature. (B) Survival status scatter plots for patients in the prognosis-associated IRG signature. (C) Expression patterns of risk genes in the prognosis-associated IRG signature. Red means high expression, green means low expression. OS: overall survival.

Comparison of the IRGs signature with other known prognostic biomarkers and verification in independent cohort

To determine whether the IRGs signature has a better predictive capacity for OS patient survival, we conducted receiver operating characteristic (ROC) analysis of the IRGs signature along with other known prognostic biomarkers (SP140, MALAT1, UCA1, and MIR191) in the training cohort. The results showed that the area under the curve (AUC) of the IRGs signature was higher than that of the other known biomarkers (Fig. 5A), indicating that the IRGs signature was a better prognostic biomarker and provided better stability and reliability in predicting the survival of OS patients. To further examine the prognostic value of the IRGs signature, we conducted the ROC analysis in an independent cohort (GSE39055). The results showed that the AUCs were 0.92, 0.93, and 0.89 at 1, 3, and 5 years, respectively (Fig. 5B), suggesting that the IRGs signature can also predict the survival of OS patients in other independent cohorts.

Figure 5

Comparison of the IRGs signature with other prognostic biomarkers and verification in an independent cohort. (A) Time-dependent ROC curve of the IRGs signature compared with those of other prognostic biomarkers. (B) ROC curve of the IRGs signature predicting survival in an independent cohort.

Independence of the IRGs signature in survival prediction from clinicopathological factors

An important feature of a good prognostic biomarker is that it should be independent of clinicopathological prognostic factors. Clinicopathological characteristics, such as the patient's age, sex, and metastasis status, are also considered to be main factors that determine the prognosis of OS patients. To evaluate the independence and applicability of the IRGs signature, we regrouped patients according to different clinicopathological characteristics and performed Kaplan–Meier survival analysis. The Kaplan–Meier curves showed that, regardless of sex, age, or metastasis status, the survival time of OS patients in the low-risk group was significantly longer (p < 0.05, Fig. 6A–C). These results indicate that the IRGs signature showed satisfactory applicability when patients were grouped according to different clinicopathological characteristics. Univariate and multivariate Cox regression also suggested that the signature is an independent indicator for predicting the prognosis of OS patients (Table 3).

Figure 6

Kaplan–Meier curves of patients with OS in different clinical subgroups. (A) Kaplan–Meier curve for OS patients aged < 18 years and those aged ≥ 18 years. (B) Kaplan–Meier curve of male and female patients with OS. (C) Kaplan–Meier curve of metastatic and non-metastatic OS patients.

Table 3 Univariate and multivariate Cox regression models of the IRGs signature in predicting survival.

Relationship between the prognostic signature and clinical characteristics

The relationship between the risk score based on the prognosis-associated IRGs signature and clinical characteristics, namely age, sex, and metastasis, was analyzed to further validate the accuracy of the prognostic signature. The results showed that the metastasis group had a significantly higher risk score than the non-metastasis group (Fig. 7C, p = 0.001). However, no significant association was observed between the risk score and age (Fig. 7A, p = 0.531) or sex (Fig. 7B, p = 0.485).

Figure 7

Correlation analysis between the prognostic signature and clinical characteristics. (A) Correlation between the prognosis-associated IRGs signature and age. (B) Correlation between the prognosis-associated IRGs signature and sex. (C) Correlation between the prognosis-associated IRGs signature and metastasis. The boxes indicate the middle 50% of the risk scores; the upper and lower boundaries indicate the 75th and 25th percentiles. The two ends of the violins represent the maximum and minimum values. n: number of cases of OS.


OS is the most common bone malignancy in children and adolescents, and it is also one of the main causes of cancer-related deaths in this age group18. Evidence demonstrates that the immune response shapes the tumor’s microenvironment. In particular, immune cell disorders often co-occur with tumors and are considered an essential driver of OS development19,20. In this study, we analyzed the DEIRGs between OS and control samples from the TARGET and GTEx databases to identify new prognostic biomarkers by constructing a prognostic IRGs signature.

Related studies show that chemotaxis, leukocyte adhesion, and innate immunity are dysfunctional in the OS microenvironment, thereby reducing the immune response to OS cells21,22,23. The PI3K/AKT signaling pathway24, MAPK signaling pathway25, and JAK-STAT signaling pathway26 have been extensively studied in OS, and the activation of these signaling pathways is strongly linked to the growth and metastasis of OS cells. Although natural killer cell-mediated cytotoxicity is the host’s first-line anti-cancer defense27, the immune response is a double-edged sword in the OS microenvironment, as a dysregulated immune response is conducive to the occurrence and development of tumors.

In total, we obtained 604 DEIRGs in our study. Of note, we identified 10 hub IRGs, namely CXCR3, CXCR4, CCR5, CCL5, CXCL10, CXCL12, CXCL16, OPRL1, S1PR1, and GAL. Among them, CXCR328, CXCR429, CCR530, CCL531, CXCL1632, CXCL1033, CXCL1234 and GAL35 have been widely studied in OS and are involved in the occurrence, metastasis, and angiogenesis of OS. OPRL1 encodes the receptor for nociceptin/orphanin, an endogenous opioid-related neuropeptide, and plays a key role in pain perception and nociception36,37. The high expression of OPRL1 in OS may be related to cancer pain. The product of the S1PR1 gene is a G-protein-coupled receptor. When S1PR1 binds its ligand S1P, the growth, invasion, and metastasis of lung cancer, ovarian cancer, and colon cancer are enhanced38,39,40. Hence, we speculate that S1PR1 may also be pivotal in OS. Considering the similarity between the molecular functions and cellular components of the hub IRGs, and by ranking their semantic similarity, we found that CCR5, CXCL12 and CXCR4 are the most closely related genes. The CCR5, CXCL12 and CXCR4 genes encode chemokine receptors or ligands, which play a vital role in the initiation and growth of OS29,41,42. These findings further support the reliability of our study.

Previous research has shown that IRGs are closely related to OS metastasis and prognosis43. For example, Koirala et al.44 found that immune cell infiltration and PD-L1 expression in the tumor microenvironment were independent risk factors for OS. Li Bo et al.45 reported that CXCL12 acts as a driver in OS metastasis and immune response, and that knocking down CXCL12 could effectively inhibit OS progression. Moreover, IRGs signatures have attracted widespread attention and have been used to predict metastasis and prognosis in different tumors46,47,48. Therefore, to further explore the value of IRGs in OS prognosis, we constructed a prognostic signature consisting of 13 prognosis-associated DEIRGs, which has high prognostic efficacy. High expression levels of GNRH149, BRAF50, PSMD1051 and VEGFA52 are closely correlated with the growth, metastasis, and angiogenesis of OS. High expression of GAL53,54, TNFRSF11B55 and STC256 is linked to the development and worse prognosis of prostate cancer and colorectal cancer. Abnormally high expression of BMP8A is an independent factor for the progression and poor prognosis of thyroid carcinoma57. CORT is an endogenous cyclic neuropeptide that can regulate the growth and metastasis of lung cancer and thyroid cancer58,59, and it also regulates the inflammatory response by inhibiting immune infiltration60. Granulin A (GRNA) is a 6 kDa peptide hydrolyzed from PGRN, which can effectively inhibit the growth and invasion of human hepatoma cells61. High expression of VAV1 is a positive prognostic factor for early invasive breast cancer62. Zong et al.63 found that the overexpression of SDC3 can significantly inhibit the proliferation and metastasis of mesenchymal tumor cells. Wu et al.64 found that miR20a-5p promotes the proliferation, migration, and invasion of head and neck squamous cell carcinoma by down-regulating TNFRSF21. Another study found that TNFRSF21 also plays an important role in regulating leukocyte infiltration65. Overall, the results of our analysis are consistent with those of previous studies, which further supports the prognostic value of this signature in OS.

To date, many OS prognostic molecules have been found, including MALAT19, UCA110 and miR19111. Most of these were based on single-gene prognosis studies. Existing studies have found that the occurrence and development of tumors are not caused by changes in single genes, but are the result of a series of gene changes66. In addition, single genes cannot account for the differences caused by individual heterogeneity. Most importantly, these studies did not use large samples to fully explore the relationship between genes and the prognosis of OS. In this study, 13 prognostic IRGs were identified by univariate Cox regression and iterative LASSO Cox regression analysis for the risk stratification of OS patients. Our analyses indicated that this prognostic signature has a higher predictive value than the previously reported biomarkers we compared it with. Recently, Shi et al.67 also constructed a prognostic signature that consisted of three DEGs (MYC, CPE, and LY86) in OS. However, the DEGs in their study came from metastatic and non-metastatic patients and lacked normal control samples. Therefore, the genes included in that signature may not reflect the pathological characteristics of the occurrence and development of OS. Our signature was verified in an independent validation set, in which it also showed high predictive performance. However, our research also has some limitations. First, we used normal muscle tissue as the control group, so the expression of IRGs may differ from that in normal bone tissue. In addition, because protein expression profile data for OS were lacking, we used gene expression profile data, which may not fully reflect the biological characteristics of OS, since proteins are the executors of function. Above all, large sample data sets and clinical samples are still needed to verify the accuracy of the results of this analysis.


In summary, we developed an IRGs signature that is a prognostic indicator in OS patients and further verified it in an independent cohort. Hence, the signature might serve as a potential prognostic indicator to predict the outcome of OS and facilitate personalized management of high-risk patients.

Materials and methods

Data processing and screening

The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database is an open database for childhood tumors that seeks to identify molecular changes in the occurrence and progression of pediatric cancer using an integrated genomic approach to assist researchers in developing effective treatments. The Genotype-Tissue Expression (GTEx) database68 provides transcriptome data of various normal human tissues. The Gene Expression Omnibus (GEO) database is a gene expression database created and maintained by NCBI, which contains high-throughput gene expression data and gene chip expression data submitted by research institutions around the world. We downloaded the gene expression profiles and the corresponding clinical data of 88 OS samples from the TARGET database, and obtained the gene expression profiles of 396 normal muscle tissue samples from the GTEx database as a control group. Then we used R software (Version 3.3.3) and the sva package69 to merge the raw data (CEL files) of the two sets. Subsequently, we used the limma package70 to screen DEGs between the OS tissue and normal muscle tissue. The cut-off values were |log2 fold change (log2FC)| > 1 and adj. p < 0.05. We downloaded and organized the IRGs list from the ImmPort database, selected the DEIRGs from the DEGs, and used them for our analysis.
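The screening rule described above can be illustrated with a short sketch. The authors performed this step in R with the limma package; the Python function below only mirrors the stated thresholds (|log2FC| > 1 and adj. p < 0.05) followed by intersection with an immune gene list. The function name `select_deirgs`, the gene labels, and all numeric values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def select_deirgs(
    stats: Dict[str, Tuple[float, float]],
    immune_genes: Iterable[str],
    lfc_cutoff: float = 1.0,
    padj_cutoff: float = 0.05,
) -> List[str]:
    """Return immune-related genes with |log2FC| > lfc_cutoff and adj. p < padj_cutoff.

    `stats` maps a gene symbol to (log2 fold change, adjusted p-value).
    """
    immune = set(immune_genes)
    selected = [
        gene
        for gene, (log2fc, padj) in stats.items()
        if abs(log2fc) > lfc_cutoff and padj < padj_cutoff and gene in immune
    ]
    return sorted(selected)


# Toy input, invented for illustration only (not data from the study).
toy_stats = {
    "GENE_A": (2.3, 0.001),   # passes both thresholds, immune-related
    "GENE_B": (-1.8, 0.020),  # down-regulated, passes both thresholds
    "GENE_C": (0.4, 0.001),   # fold change too small
    "GENE_D": (1.5, 0.200),   # not significant after adjustment
    "GENE_E": (3.0, 0.010),   # differentially expressed but not in the immune list
}
toy_immune = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

deirgs = select_deirgs(toy_stats, toy_immune)
print(deirgs)
```

Both up- and down-regulated genes are kept because the threshold is applied to the absolute fold change.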

Functional correlation analysis of DEIRGs

GO is a tool for annotating genes and their products, which aids the integration and utilization of biological data71. KEGG is a database integrating genomic, chemical, and systemic functional information, which provides currently known metabolic and signaling pathways72. The clusterProfiler package73 was used to perform GO and KEGG enrichment analysis on the DEIRGs; p < 0.05 was used as the cut-off value for significant gene enrichment. The Search Tool for the Retrieval of Interacting Genes online tool (STRING, Version 11.0)74 and Cytoscape software75 were used to construct the PPI network for the DEIRGs, and the hub IRGs were screened using the cytoHubba plug-in76. The hub IRGs were defined as the top 10 DEIRGs ranked by the maximum correlation standard algorithm. Based on the semantic similarity of GO terms, the GOSemSim package77 was used to compute the closeness of the relationship among the 10 hub IRGs in terms of molecular function and cellular localization, and the average functional similarity was used to rank the 10 hub IRGs78. The results were visualized with the ggplot2 package79.
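As background for the enrichment step, over-representation analysis of this kind is commonly based on a hypergeometric test: given a background of annotated genes, it asks how likely it is to see at least the observed number of query genes in a term by chance. The sketch below is a minimal stand-alone illustration of that calculation, not the clusterProfiler implementation; the function name `hypergeom_enrichment_p` and the toy counts are assumptions, and multiple-testing adjustment (p.adjust) is not included.

```python
from math import comb


def hypergeom_enrichment_p(
    n_background: int, n_term: int, n_query: int, n_overlap: int
) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes in the background set
    n_term:       background genes annotated to the term
    n_query:      genes in the query list (e.g. the DEIRGs)
    n_overlap:    query genes annotated to the term
    """
    upper = min(n_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts, invented for illustration: 20 background genes, 5 annotated
# to the term, 4 query genes of which 3 fall in the term.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p_value, 4))
```

Exact integer binomials are used so that the toy example does not depend on any external statistics library.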

Identification and assessment of the prognostic signature

To develop the optimal signature for predicting OS prognosis based on IRGs, we performed univariate Cox regression analysis on the obtained DEIRGs and selected prognosis-related IRGs with a screening criterion of p < 0.05. Next, we used the glmnet package80 to apply a machine learning algorithm, iterative LASSO Cox regression, to the prognosis-associated IRGs to construct the optimal prognostic signature. A single LASSO run depends strongly on the random seed, because cross-validation partitions the samples randomly; once the seed changes, the optimal lambda and the resulting features change as well. Iterative LASSO regression therefore runs LASSO many times and selects high-frequency features (consensus genes) according to how often each feature is chosen. The consensus genes were then included sequentially in the Cox model, and no further genes were added once the AUC of the ROC curve reached its peak. At this point, the model is optimal and contains the fewest features81. We retained the consensus genes whose selection frequency exceeded 50 across 500 LASSO regressions. We then combined the expression levels of the consensus genes into a single variable through iterative LASSO Cox regression to construct the optimal prognostic signature of OS. Next, we scored each sample with the optimal signature and divided the patients into high- and low-risk groups according to the median score. Finally, we used R software to draw a risk factor association chart to display the survival status.
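The control flow of the iterative LASSO procedure can be sketched as follows. This is a schematic in Python, not the authors' R/glmnet code: the cross-validated LASSO Cox fit and the AUC evaluation are passed in as functions, and the stand-ins used here (`toy_select_once`, `toy_auc_of`) are hypothetical and deterministic so the example is reproducible. Stopping at the first step where the AUC no longer improves is one possible reading of "after the AUC reached a peak" and is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def consensus_features(
    select_once: Callable[[int], Sequence[str]], n_runs: int, min_freq: int
) -> Tuple[List[str], Dict[str, int]]:
    """Run a seeded feature selector n_runs times and keep features chosen more than min_freq times."""
    counts: Counter = Counter()
    for seed in range(n_runs):
        counts.update(set(select_once(seed)))
    ranked = sorted(
        (g for g, c in counts.items() if c > min_freq),
        key=lambda g: (-counts[g], g),  # most frequent first, ties by name
    )
    return ranked, dict(counts)


def grow_until_auc_peaks(
    ranked: Sequence[str], auc_of: Callable[[List[str]], float]
) -> Tuple[List[str], float]:
    """Add ranked features one at a time and stop once the AUC stops improving."""
    chosen: List[str] = []
    best_auc = float("-inf")
    for gene in ranked:
        candidate = chosen + [gene]
        auc = auc_of(candidate)
        if auc <= best_auc:
            break
        chosen, best_auc = candidate, auc
    return chosen, best_auc


def toy_select_once(seed: int) -> List[str]:
    """Hypothetical stand-in for one cross-validated LASSO Cox fit with a given seed."""
    picked = ["G1", "G2"]          # stable features, selected in every run
    if seed % 2 == 0:
        picked.append("G3")        # selected in half of the runs
    if seed % 10 == 0:
        picked.append("G4")        # selected in exactly 50 of 500 runs
    if seed % 100 == 0:
        picked.append("G5")        # rarely selected
    return picked


def toy_auc_of(genes: List[str]) -> float:
    """Hypothetical AUC lookup keyed by model size, for illustration only."""
    return {1: 0.70, 2: 0.80, 3: 0.78}[len(genes)]


ranked, counts = consensus_features(toy_select_once, n_runs=500, min_freq=50)
signature, peak_auc = grow_until_auc_peaks(ranked, toy_auc_of)
print(ranked, signature, peak_auc)
```

In the toy run, `G4` is selected exactly 50 times and is therefore excluded by the strict "exceeded 50" rule, and `G3` is dropped because adding it lowers the toy AUC.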

Comparison of signature with other known prognostic biomarkers and verification in an independent cohort

Many prognostic markers for patients with OS have been reported previously. SP140 has been identified as a promising prognostic marker for OS patients8, and the expression of MALAT1 has been shown to be associated with a worse prognosis for OS patients9. UCA1 expression may be an independent prognostic indicator for predicting a poor prognosis in patients with OS10. In addition, miR191 is highly expressed in the serum of patients with osteosarcoma and is positively correlated with clinical stage11. To determine whether our signature has a better ability to predict patient survival than known biomarkers, we conducted an ROC comparative analysis of the signature and the other biomarkers. Good prognostic markers should also have a high predictive performance in other independent cohorts. To test the utility of the signature in this study, we verified it in an independent cohort (GSE39055). Details of the GSE39055 dataset are shown in Supplementary Table 1.
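To make the AUC comparison concrete, the sketch below computes a plain binary AUC as the probability that a randomly chosen patient with the event scores higher than a randomly chosen patient without it (ties count one half). This is a simplification: the study used time-dependent ROC curves on survival data, whereas this version assumes a fixed time horizon and ignores censoring. The function name `roc_auc` and all scores and labels are hypothetical.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    neg: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = died before the horizon, 0 = alive.
labels = [1, 1, 0, 1, 0, 0]
signature_scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]   # hypothetical multi-gene risk score
single_marker = [0.6, 0.2, 0.5, 0.7, 0.3, 0.4]       # hypothetical single biomarker

auc_signature = roc_auc(signature_scores, labels)
auc_marker = roc_auc(single_marker, labels)
print(round(auc_signature, 3), round(auc_marker, 3))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of the size discussed here and keeps the definition of the AUC explicit.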

Subgroup survival analysis, signature clinical value evaluation

An important feature of a good prognostic marker is that it should be independent of the currently used clinicopathological prognostic factors. To evaluate the independence and applicability of this signature, we regrouped OS patients according to different clinicopathological characteristics, and then performed Kaplan–Meier survival analysis for their subgroups. We performed univariate and multivariate Cox regressions on clinicopathological characteristics and the signature to evaluate whether the signature is an important prognostic factor.
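For readers unfamiliar with the Kaplan–Meier estimator used in the subgroup analysis, the following is a minimal sketch of the product-limit calculation for a single group. It is an illustration in Python rather than the authors' R workflow; the function name `kaplan_meier` and the follow-up times are hypothetical, and no log-rank test is included.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Toy follow-up data (years), invented for illustration only.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
print(km_curve)
```

Applying the same function separately to the high-risk and low-risk patients within each clinical subgroup would give the two curves that are compared in each panel.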

Correlation analysis of prognostic signature and clinical characteristics

To further evaluate the correlation between the risk score based on the prognosis-associated IRGs signature and clinical characteristics, we classified patients according to age, sex, and distant metastatic status. Then we used the ggstatsplot package to analyze the correlation between the risk score and these characteristics. The results were visualized with the ggplot2 package.