Abstract
This study explores the role of migrasomes in neuroblastoma, a common malignant tumor in children, and their potential impact on tumor formation. We analyzed neuroblastoma RNA-seq datasets from public databases, including GSE62564, GSE181559, TARGET, and fwr144. After data normalization and unsupervised classification using migrasome-specific molecular markers, differentially expressed genes (DEGs) were identified and subjected to functional enrichment analysis. A migrasome-associated machine learning model, MigScore, was developed using ten algorithms in 101 combinations and validated on two single-cell datasets. The model enabled immune infiltration assessment and drug compatibility prediction, highlighting the potential utility of MS-275, a histone deacetylase inhibitor. MigScore showed a significant inverse relationship with favorable clinical outcomes, pointing to a link between migrasome pathways and tumor immunogenicity. These findings suggest that migrasomes are important in neuroblastoma prognosis and open the possibility of personalized treatment strategies and improved outcomes.
Introduction
Within the intricate cellular network, the discovery of migrasomes provides a novel dimension to explore the mechanisms of intercellular interactions and communications1,2. Migrasomes, specialized membranous structures formed at the rear of migrating cells, have garnered significant attention. Not only do they contain fragments of the cell, but they also encompass a variety of cellular secretions, suggesting a pivotal role in cellular communication3,4.
In recent years, research on migrasomes primarily focused on their fundamental functions in cellular metabolism, signal transduction, and intercellular communication. However, as research deepens, there's a growing recognition of the potential role of migrasomes in disease processes, particularly in tumor formation and progression5,6. This perspective opens a new avenue to investigate the molecular mechanisms of tumors and potential therapeutic targets.
Neuroblastoma, a malignant tumor originating from embryonic sympathetic cells, is especially prevalent in children7. Despite advances in treating this disease, the prognosis remains bleak for high-risk patients8. Against this backdrop, understanding the function and role of migrasomes in neuroblastoma becomes crucial.
Thus, in this study, we employed a comprehensive array of bioinformatics techniques, including single-cell sequencing, bulk RNA-seq and machine learning, to delve into the role of migrasomes in neuroblastoma from a gene expression perspective. By intensively examining the expression patterns of migrasome-related genes across individual cell populations and aggregated tissue samples, and their correlation with patient prognosis, we aim to unveil their significant impact on neuroblastoma progression and provide valuable insights for future therapeutic strategies.
Methods
Data acquisition
We obtained the raw count matrices and corresponding clinical information for four neuroblastoma RNA-seq datasets: GSE62564, GSE181559, TARGET, and fwr144. The GSE62564 and GSE181559 datasets were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62564, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE181559). The TARGET cohort was downloaded from https://portal.gdc.cancer.gov/projects/TARGET-NBL. The fwr144 cohort was obtained from the European Genome-phenome Archive (https://ega-archive.org/datasets/EGAD00001006625). GSE62564 served as the training set, and the other three datasets served as validation sets for the development of our migrasome-associated machine learning model.
The Gene Expression Omnibus (GEO) database served as the source for the raw count matrix of the single-cell dataset GSE137804 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE137804), and the pertinent clinical details were derived from associated scientific publications. Adhering to the original authors' methodologies, we conducted data preprocessing and cell type annotation9.
For the two immunotherapy datasets, GSE91061 and PRJEB23709, we sourced the raw count matrices and related clinical data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE91061) and the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/view/PRJEB23709), respectively.
For all of the bulk RNA-seq datasets, we applied Transcripts Per Million (TPM) normalization to standardize gene expression levels. TPM accounts for sequencing depth and gene length, so that expression values are comparable across samples and datasets. We then used the ComBat function from the "sva" package to adjust for batch effects across samples10. For the single-cell RNA-seq dataset, we used the Harmony algorithm for batch effect correction and sample integration11. Harmony aligns the shared embedding of cells by learning batch-specific effects and then correcting for them, so that the integrated data better reflect the underlying biological signal.
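As an illustration of the TPM step, the following minimal Python sketch converts raw counts for a single sample into TPM values. It is not the authors' pipeline (which was run in R); the function name `tpm` and the toy counts and gene lengths are assumptions made for this example.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts for one sample into Transcripts Per Million.

    counts      -- raw read counts, one per gene
    lengths_kb  -- gene lengths in kilobases, in the same order
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Step 1: divide each count by gene length (reads per kilobase).
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(counts)
    # Step 2: rescale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rpk]


# Toy data (hypothetical): counts proportional to length give equal TPM.
sample_counts = [100, 200, 300]
gene_lengths_kb = [1.0, 2.0, 3.0]
tpm_values = tpm(sample_counts, gene_lengths_kb)
```

Because every sample sums to the same total after this step, values are on a shared scale before any batch correction is applied.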
Unsupervised classification
We reviewed the existing literature to extract molecular markers specific to migrasomes12. We then measured the expression levels of these markers within the GSE62564 dataset and applied median normalization. For unsupervised clustering, we utilized the “ConsensusClusterPlus” software package13. This method employs a consensus clustering algorithm to determine the number of clusters and their memberships, providing quantitative and visual stability evidence. Specifically, we used the k-means clustering method, which partitions the data into k clusters by minimizing the within-cluster variance. The parameters configured included reps = 1000 (number of resampling iterations) and pItem = 0.8 (proportion of items to be sampled in each iteration). The consensus cumulative distribution function (CDF) and delta area plot were employed to determine the optimal number of clusters. The CDF plot shows the cumulative distribution of consensus values, while the delta area plot shows the relative change in the area under the CDF curve, guiding the selection of the most stable number of clusters. This approach provides a reproducible framework for classifying neuroblastoma samples based on migrasome-related gene expression.
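The consensus value underlying the CDF and delta area plots can be sketched as follows. For each pair of samples, it is the fraction of resampling runs in which both samples were drawn and assigned to the same cluster. This Python sketch only illustrates that definition; the function name `consensus_matrix`, the sample names, and the three toy runs are hypothetical, and the clustering itself (k-means on an 80% subsample, repeated 1000 times in the study) is assumed to have been done already.

```python
from itertools import combinations


def consensus_matrix(items, runs):
    """Pairwise consensus values from repeated subsampled clusterings.

    items -- list of sample identifiers
    runs  -- list of dicts mapping sampled item -> cluster label;
             items left out of a subsample are absent from that dict
    """
    pairs = list(combinations(items, 2))
    together = {pair: 0 for pair in pairs}  # runs where the pair co-clustered
    sampled = {pair: 0 for pair in pairs}   # runs where both were sampled
    for labels in runs:
        for a, b in pairs:
            if a in labels and b in labels:
                sampled[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    # Pairs never sampled together have no defined consensus value.
    return {p: together[p] / sampled[p] for p in pairs if sampled[p] > 0}


# Three toy resampling runs (hypothetical labels).
samples = ["s1", "s2", "s3"]
toy_runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 0, "s2": 0},            # s3 was not sampled in this run
    {"s1": 1, "s2": 0, "s3": 1},
]
consensus = consensus_matrix(samples, toy_runs)
```

A stable clustering yields consensus values concentrated near 0 and 1, which is what the CDF plot is used to assess.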
Determination of stemness index
Within the scope of the GSE62564 dataset, we employed a one-class logistic regression model to derive the stemness score, formally termed as mRNA stemness index (mRNAsi)14. Serving as a measure of the stemness attributes inherent to samples, the mRNAsi scores, standardized between 0 and 1, provided a comparative platform, with heightened scores indicative of increased stemness.
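One common way to place raw scores on a 0 to 1 scale is min-max scaling, sketched below in Python. Whether the authors used exactly this transformation is an assumption; the function name `min_max_scale` and the raw scores are hypothetical.

```python
def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all scores are identical; scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw stemness scores for three samples.
raw_scores = [-0.4, 0.1, 0.6]
scaled_scores = min_max_scale(raw_scores)
```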
DEGs (Differentially expressed genes) identification
The "limma" package was used to identify genes differentially expressed between groups, using thresholds on fold change (> 0.5) and p-value (< 0.01)15. Subsequently, to characterize the biological processes and pathways influenced by these DEGs, the "clusterProfiler" package was used for functional enrichment analysis16.
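The thresholding step can be sketched in Python as below. The sketch assumes the fold-change cutoff applies to the absolute log fold change, which the text does not state explicitly; the function name `select_degs`, the field names, and the gene records are hypothetical stand-ins for a limma results table.

```python
def select_degs(results, min_abs_logfc=0.5, max_p=0.01):
    """Return genes passing both the fold-change and the p-value threshold.

    Assumption: the fold-change cutoff is applied to |logFC|, so both
    up- and down-regulated genes are retained.
    """
    return [
        r["gene"]
        for r in results
        if abs(r["logFC"]) > min_abs_logfc and r["p_value"] < max_p
    ]


# Hypothetical differential expression results.
results = [
    {"gene": "GENE_A", "logFC": 1.2, "p_value": 0.001},    # passes both
    {"gene": "GENE_B", "logFC": -0.8, "p_value": 0.004},   # passes both
    {"gene": "GENE_C", "logFC": 0.3, "p_value": 0.0001},   # effect too small
    {"gene": "GENE_D", "logFC": 2.0, "p_value": 0.2},      # not significant
]
degs = select_degs(results)
```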
Machine learning implementation for migrasome-score construction
To construct a robust and reliable migrasome-associated score, we combined 10 distinct machine learning algorithms into 101 algorithmic combinations. The algorithms were Random Survival Forest (RSF), Elastic Net (Enet), Lasso, Ridge, CoxBoost, SuperPC, GBM, plsRcox, StepCox and survival-SVM.
The procedural workflow encompassed:
(1) Identification of prognostic genes: We identified prognostic genes within the GSE62564 dataset using univariate Cox regression analysis, selecting genes with a p-value < 0.05.
(2) Model construction: We ran the 101 algorithm combinations on these genes to generate predictive models within a Leave-One-Out Cross-Validation (LOOCV) framework. Specifically, we used the glmnet package for Elastic Net, Lasso, and Ridge regression, the superpc package for SuperPC, the CoxBoost package for CoxBoost, and the randomForestSRC package for Random Survival Forest, all in R17,18,19,20.
(3) Model validation: Each model was subsequently validated using three datasets: GSE181559, TARGET, and fwr144. Harrell's concordance index (C-index) was calculated to evaluate model performance in every validation dataset. The survival and survcomp packages in R were used for these survival analyses and C-index calculations.
(4) Selection of the best model: The model with the highest average C-index across the validation datasets was selected. The chosen model was then examined further for stability and resilience to overfitting.
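The selection rule in step (4) can be sketched in Python as follows. The C-index values for CoxBoost + Enet are those reported in the Results; the two other entries are hypothetical placeholders added only so that the comparison has something to compare against, and the names `c_index_by_model` and `best_model` are assumptions of this sketch.

```python
from statistics import mean

# C-index per model and validation cohort.
c_index_by_model = {
    # Values reported in the Results for the selected model.
    "CoxBoost + Enet (alpha=0.1)": {
        "GSE181559": 0.699, "TARGET": 0.611, "fwr144": 0.683,
    },
    # Hypothetical placeholder models, not taken from the study.
    "placeholder model B": {
        "GSE181559": 0.660, "TARGET": 0.600, "fwr144": 0.650,
    },
    "placeholder model C": {
        "GSE181559": 0.640, "TARGET": 0.590, "fwr144": 0.620,
    },
}

# Average each model's C-index over the validation cohorts only.
mean_c_index = {
    name: mean(scores.values()) for name, scores in c_index_by_model.items()
}

# Pick the model with the highest mean validation C-index.
best_model = max(mean_c_index, key=mean_c_index.get)
```

Averaging over validation cohorts only, rather than including the training set, keeps the selection from rewarding models that merely fit GSE62564 well.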
Model evaluation
The performance of our migrasome-associated machine learning model was primarily evaluated using the concordance index (C-index). The C-index is well-suited for survival analysis as it measures the concordance between predicted risk scores and observed survival times, thus directly assessing the model's discriminative power in ranking survival outcomes. Given our focus on prognostic predictions in neuroblastoma, the C-index provides a relevant and meaningful evaluation metric. While alternative metrics such as MCC, F1 score, or AUC are valuable in classification contexts, they do not directly measure the ability to predict survival times and were therefore not used in this study.
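For readers unfamiliar with the metric, the following Python sketch computes Harrell's C-index directly from its definition. It is a simplified O(n²) illustration, not the survcomp implementation used in the study: pairs with tied survival times are ignored, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the subject with the shorter time
            # actually had the event (ties in time are skipped here).
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (hypothetical): risk decreases as survival time increases.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
c_perfect = harrell_c_index(toy_times, toy_events, toy_risk)
c_reversed = harrell_c_index(toy_times, toy_events, toy_risk[::-1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the mean validation C-index of 0.664 reported in the Results.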
Single-cell validation
For single-cell RNA-seq validation, we utilized the GSE137804 (nb-dong) and nb-jansky datasets. The raw count matrices were preprocessed, and cell type annotations were performed following the methodologies described by the original authors. The Seurat package in R was employed for downstream analysis.
The MigScore for each tumor cell was calculated using the Seurat package's AddModuleScore function21. The gene set used for calculating MigScores was derived from the best-performing machine learning model identified in our study, that is, the genes that retained significant coefficients in the prognostic analysis of neuroblastoma after validation on multiple datasets. For each cell, AddModuleScore computes the average expression of the selected genes and subtracts the average expression of control genes randomly selected from the transcriptome, yielding a standardized score. Because the score is expressed relative to control genes, it reflects the expression pattern of the migrasome-associated gene set rather than the overall expression level of the cell, providing a metric for assessing prognostic implications in neuroblastoma at single-cell resolution.
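The idea behind this score can be sketched in Python as below. This is a deliberately simplified illustration and not Seurat's implementation: Seurat additionally bins genes by average expression and draws controls from matching bins, which is omitted here. The helper names `pick_controls` and `module_score`, the control gene names, and all expression values are hypothetical; the three signature genes are ones named in the Results.

```python
import random


def pick_controls(all_genes, gene_set, n_controls, seed=0):
    """Randomly choose control genes outside the signature (no binning)."""
    rng = random.Random(seed)
    candidates = sorted(g for g in all_genes if g not in gene_set)
    return rng.sample(candidates, n_controls)


def module_score(cell_expr, gene_set, control_genes):
    """Mean expression of the signature minus mean expression of controls."""
    target_mean = sum(cell_expr[g] for g in gene_set) / len(gene_set)
    control_mean = sum(cell_expr[g] for g in control_genes) / len(control_genes)
    return target_mean - control_mean


# One hypothetical tumor cell with normalized expression values.
cell = {
    "NACC2": 3.0, "LAMB2": 2.0, "CDH5": 1.0,
    "CTRL1": 1.0, "CTRL2": 1.0, "CTRL3": 1.0,
}
signature = ["NACC2", "LAMB2", "CDH5"]
controls = pick_controls(cell.keys(), signature, n_controls=2)
score = module_score(cell, signature, controls)
```

Note that this score uses unweighted averages, so it does not apply the regression coefficients of the bulk model.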
Prediction of immune infiltration and response
To decipher the intricate stromal and immune scores within tumor samples, we employed the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm22. In addition, the MCP-counter algorithm offered an in-depth analysis of the immune cell composition within the tumor milieu23. The Immunophenoscore (IPS) algorithm furnished the overall immunophenoscore, while the Tumor Immune Dysfunction and Exclusion (TIDE) algorithm was harnessed to anticipate immune-exclusion patterns24. Execution of the ESTIMATE, MCP-counter, and IPS algorithms was orchestrated using the "IOBR" software package, whereas the TIDE was assessed through its designated online portal25.
Connectivity map (Cmap) assessment
For the GSE62564 dataset, genes showing differential expression between samples with high MigScore and those with low MigScore were identified. The top 300 genes showing either increased or decreased expression levels were gathered to create the MigScore-associated signature. The CMAP_gene_signatures file, which contains 1288 compound-related signatures, was obtained from https://www.pmgenomics.ca/bhklab/sites/default/files/downloads. These data were used to compute a compatibility score for each compound, which was scaled to the range −1 to 1 for comparison across datasets. The analytical approach followed procedures described in prior studies26.
Statistical evaluation
All statistical evaluations were executed using RStudio. For the survival assessment, the optimal cutpoint for dividing samples into groups was determined using the surv_cutpoint function from the "survminer" package. We used the Kaplan–Meier method to evaluate survival differences between the resulting groups. To compare distributions between distinct groups, the non-parametric Mann–Whitney test was applied. In all analyses, a P-value less than 0.05 was considered statistically significant.
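The Kaplan–Meier estimate used for the survival curves can be sketched in Python as follows. This is a minimal product-limit estimator for illustration only, not the R routines used in the study; the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up times
    events -- 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        n_at_risk -= removed
    return curve


# Toy cohort (hypothetical): five patients, two censored.
km_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set until their last follow-up, which is why the estimator does not simply use the fraction of patients alive.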
Ethical approval
As all data analyzed in this study were obtained from publicly available datasets, no ethics approval form was required for this manuscript.
Results
Unsupervised clustering and clinical relevance of migrasome-associated genes
To assess the clinical relevance and distribution of migrasome-related gene expression in neuroblastoma, we performed an unsupervised clustering analysis (Fig. 1). The consensus cumulative distribution function (CDF) in Fig. 1A displays a clear distinction among different potential clusters, suggesting a robust clustering scheme. The delta area curve in Fig. 1B shows a noticeable decline, indicating that the optimal number of clusters is three. The consensus matrix heatmap (Fig. 1C) visually distinguishes Cluster1, Cluster2, and Cluster3, showing strong intra-cluster consensus and inter-cluster differentiation. Figure 1D shows a t-SNE plot of the three clusters after classification. The clear separation between clusters supports the clustering and reflects differences in migrasome-related gene expression among the clusters. Survival probability comparisons among the three clusters (Fig. 1E) show significant differences. Specifically, Cluster1 and Cluster3 display a relatively better prognosis than Cluster2. The survival curves begin to deviate significantly after 2000 days, and this difference remains pronounced throughout the observation period. In summary, Cluster1, comprising 228 samples, exhibits better overall survival and a lower likelihood of MYCN amplification. Cluster2, containing 115 samples, is characterized by a higher incidence of MYCN amplification and poorer survival outcomes. Cluster3 includes 155 samples, with clinical characteristics intermediate between those of Cluster1 and Cluster2 and generally moderate survival outcomes.
Cluster correlation with genetic and clinical markers
Figure 1F showcases a heatmap of the expression of specific migrasome-related genes in each sample. Notable genes such as PIGK, TSPAN4, NDST1, and ITGB1 manifest marked expression variations among the clusters, potentially driving the differences in survival patterns. The distribution of MYCN amplification status among the clusters is represented in Fig. 1G and H. Clusters exhibit varying proportions of MYCN amplified (Amp) and non-amplified (Non-Amp) samples. Cluster2, in particular, demonstrates a heightened percentage of MYCN-amplified cases, potentially correlating with its poorer prognosis. The INSS stage distribution among clusters is delineated in Fig. 1I and J. Each cluster encapsulates a range of disease stages. Significantly, Cluster2 is devoid of stage 4S cases, indicating potential clinical and molecular distinctions within this group. Figure 1K delves into the mRNA stemness index (mRNAsi) across the clusters. It's evident that Cluster2 exhibits the highest stemness index, implying a pronounced stem-like quality in this group when compared to the other clusters. The Venn diagram in Fig. 1L provides a comprehensive overlap of differentially expressed genes among the clusters, hinting at unique as well as shared migrasome-related genetic signatures that could underpin the distinctions observed among the clusters.
Machine learning analysis illuminates prognostic genes in neuroblastoma
Building on the results in Fig. 1, we focused the machine learning analysis on the 283 intersecting genes. Of these, 50 were significantly correlated with neuroblastoma prognosis and were used as inputs for machine learning. To identify the best-performing model, we evaluated ten distinct machine learning algorithms and 101 algorithmic combinations using the GSE62564 dataset as the training set. We employed Harrell's concordance index (C-index) as the primary metric for evaluating the predictive power of each model, given its robustness in survival analysis. The models were validated on three independent datasets: GSE181559, TARGET, and fwr144. The mean C-index of each model across these validation datasets was used to select the best model. The CoxBoost + Enet (α = 0.1) model emerged as the best-performing model, with the highest mean C-index of 0.664 across the three validation datasets: 0.699 in GSE181559, 0.611 in TARGET, and 0.683 in fwr144. This mean exceeded that of other models such as Random Survival Forest (RSF) and GBM, which showed mean C-indices of 0.652 and 0.625, respectively (Supplementary Fig. 1). This model was therefore taken as providing the most reliable predictions of neuroblastoma prognosis.
Figure 2A and B examine the CoxBoost + Enet model with α = 0.1 in more detail. Figure 2A shows the trajectory of the regression coefficients. As the penalty (log lambda) increases, numerous gene coefficients approach zero, signifying their reduced importance. However, a subset retains non-zero coefficients, underscoring their relevance. Figure 2B shows the number of genes selected at varied penalty values, supporting the model's stability and resilience to overfitting. Figure 2C focuses on the genes with significant coefficients, implying a role in neuroblastoma prognosis. Prominent among these are NACC2, LAMB2, and CDH5. The magnitude and sign of each coefficient indicate the gene's positive or negative prognostic implication. The MigScore of each sample is calculated by multiplying the expression level of each gene by its corresponding coefficient (Fig. 2C) and summing the products. Figure 2D presents a survival analysis in which the GSE62564 dataset is divided into high and low MigScore groups. A stark disparity in survival trajectories is evident: samples with elevated MigScores have notably lower survival probabilities than their lower-scoring counterparts, with a hazard ratio of 12.95 (p < 0.001). The three external validation cohorts displayed survival curves with trends consistent with the training set (Fig. 2E–G).
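The per-sample score calculation can be sketched in Python as below, under the assumption that the gene-wise products are summed, as in a standard Cox linear predictor. The gene names are taken from the text, but the coefficients and expression values are hypothetical and are not the fitted values shown in Fig. 2C; the function name `mig_score` is likewise an assumption.

```python
def mig_score(expression, coefficients):
    """Weighted sum of gene expression using model coefficients.

    expression   -- dict mapping gene -> expression level for one sample
    coefficients -- dict mapping gene -> regression coefficient
    Raises KeyError if a model gene is missing from the sample.
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


# Hypothetical coefficients (NOT the fitted values from the study).
example_coefficients = {"NACC2": 0.5, "LAMB2": -0.25, "CDH5": 0.1}

# Hypothetical normalized expression for one sample.
example_sample = {"NACC2": 2.0, "LAMB2": 4.0, "CDH5": 10.0, "OTHER": 7.0}

sample_score = mig_score(example_sample, example_coefficients)
```

Genes absent from the coefficient table, such as `OTHER` here, do not contribute to the score.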
Single-cell validation of the prognostic model established from bulk-RNA-seq data
To assess the robustness and reliability of our MigScore model derived from bulk RNA-seq data, we validated it using two independent single-cell RNA-seq datasets: nb-dong (GSE137804) and nb-jansky. This validation aimed to confirm that the prognostic signatures identified in bulk samples are consistent and reproducible at the single-cell level.
The single-cell RNA-seq datasets were preprocessed and annotated according to established protocols. The cell composition of the nb-dong (GSE137804) and nb-jansky datasets is shown in Supplementary Fig. 2. The MigScore for each tumor cell was calculated using the Seurat package's AddModuleScore function. In the nb-dong dataset, we applied the MigScore calculation to the identified tumor cells. The results showed a clear distinction in MigScores between high-risk and low-risk tumor cells. Tumor cells with higher MigScores were associated with poorer prognostic indicators, consistent with our findings from the bulk RNA-seq data. MigScores were significantly higher in high-risk tumor cells, indicating a correlation between MigScore and tumor aggressiveness at the single-cell level (Fig. 3A, B).
For the nb-jansky dataset, we similarly calculated the MigScores for individual tumor cells. The UMAP visualization revealed distinct clusters of high-risk and low-risk tumor cells with differing MigScores (Fig. 3C, D). High-risk tumor cells exhibited elevated MigScores, reinforcing the model's predictive power. This validation further supports the MigScore as a metric for assessing tumor cell aggressiveness and potential prognosis in neuroblastoma.
Clinical relevance of MigScore in the GSE62564 dataset
To discern the correlation between the devised MigScore and pertinent clinical parameters, and to further understand its prognostic potential, we delved into the GSE62564 dataset. Supplementary Fig. 3A depicts the spread of MigScore across all samples, further categorized into high and low score groups. The accompanying heatmap conveys the overlap of MigScore with a plethora of clinical attributes. Significant interplay is evident between MigScore and various clusters, MYCN amplification status, INSS stage, age, overall survival (OS), disease progression, and COG risk classification. This suggests that MigScore is not an isolated metric but is intertwined with pivotal clinical attributes. Figure 4A represents the distribution of MigScore amongst three distinct clusters. A discernible difference in MigScores between the clusters is evident. MYCN amplification, a crucial genetic marker in neuroblastoma, is closely tied with MigScore. As depicted in Fig. 4B, samples with MYCN amplification (MYCN-Amp) register significantly different MigScores compared to those without amplification (MYCN-Nonamp).
Delving deeper into clinical stages, Fig. 4C unveils the MigScore distribution across various INSS stages (from st1 to st4s). With progression in stages, the variability in scores is palpable, suggesting a potential link between disease stage and MigScore.
Prognostic potential of MigScore
The time-dependent ROC curves were constructed using the "timeROC" package in R, which calculates the ROC curve for each time point by considering the censored survival data27. This method allows for the assessment of the predictive accuracy of the MigScore and other clinical parameters at various time intervals, providing a dynamic view of its prognostic value over time. The time-dependent ROC curves exhibit the performance of MigScore compared with other clinical parameters like INSS stage, MYCN status, and age over a temporal spectrum. MigScore, with its performance trajectory, underscores its potential as a dependable prognostic marker, often outshining other clinical markers in certain timeframes (Fig. 4D).
The Decision Curve Analysis (DCA) was performed using the "rmda" package in R28. The y-axis represents the net benefit, which is calculated by considering the true positive rate and false positive rate, adjusted for the harm of unnecessary treatments. The x-axis represents the risk thresholds, which are the probabilities at which a patient would opt for a treatment. Figure 4E presents decision curve analysis for different timepoints (1-Year, 4-Year, and 8-Year). The graph elucidates the net benefit of using MigScore for prognosis over other clinical markers. In all timepoints, MigScore consistently delivers promising results, advocating its applicability in clinical prognosis.
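The net benefit plotted on the y-axis follows the standard decision curve definition, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), where p_t is the risk threshold. The Python sketch below illustrates this for a binary outcome at a fixed time point; it is not the rmda implementation, it does not handle censoring, and the function name and toy data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    predicted_risk -- predicted event probabilities
    outcome        -- 1 if the event occurred, 0 otherwise
    threshold      -- risk threshold p_t, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


# Toy cohort (hypothetical): four patients, two with the event.
toy_risk = [0.9, 0.8, 0.3, 0.2]
toy_outcome = [1, 0, 1, 0]
nb_at_half = net_benefit(toy_risk, toy_outcome, 0.5)
nb_at_fifth = net_benefit(toy_risk, toy_outcome, 0.2)
```

Evaluating this quantity over a grid of thresholds, for each marker, produces the curves compared in a decision curve plot.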
Biological function exploration of MigScore groups using GSEA
To elucidate the potential biological roles and pathways associated with varying MigScores, we utilized the Gene Set Enrichment Analysis (GSEA) method. The comparison between high and low MigScore groups revealed several significantly enriched pathways. From the chromosomal standpoint, the high MigScore group exhibited associations with chr13p12, chr19p12, and chr2q11, among others. Moreover, the high MigScore group correlated with specific cancer and disease phenotypes such as bladder cancer and invasive ductal breast cancer, suggesting its potential involvement in these disease processes. Several microRNAs, including MIR365_3 and MIR3943_3, and gene targets such as FOXR2 displayed enrichment in the high MigScore group, pointing to the importance of post-transcriptional regulation and specific transcription factors in the MigScore context. Furthermore, distinct modules such as MODULE_44 and MODULE_33 appeared more prevalent in the high MigScore samples. Biological processes and cellular component associations also emerged, with notable enrichments in adaptive immune response and ribonucleoprotein complex biogenesis (Fig. 5A).
Network visualization of selected enriched pathways
To better understand the interconnectedness and crosstalk of the enriched pathways, we employed the aPEAR package (Advanced Pathway Enrichment Analysis Representation) for the autonomous visualization of pathway enrichment networks29. The aPEAR package is designed to address the redundancy and complexity often encountered in pathway enrichment analysis results by aggregating similar pathways and visualizing their interactions as a network of interconnected clusters. The high MigScore group exhibited higher pathway activity like ribosome biogenesis, sister chromatid segregation, and inner mitochondrial membrane protein complex, and lower pathway activity like exogenous antigen presentation and leukocyte cell adhesion. The size of the pathway nodes indicates the number of genes involved, and the NES (Normalized Enrichment Score) depicts the extent of enrichment in the given pathway (Fig. 5B).
Exploration of immune infiltration and predictive capability of immune response between MigScore groups
To examine differences in immune infiltration and predicted immune response between the MigScore groups, a combination of analytical methods was employed. The heatmap in Fig. 6A displays the deconvolution of immune infiltration across the two MigScore groups using three algorithms: ESTIMATE, Cibersort, and MCPCounter. The heatmap shows a distinct immune cell composition between the high and low MigScore groups. The variations in the infiltration levels of cell types such as T cells, B cells, NK cells, monocytes, and fibroblasts, among others, indicate a differential immune milieu associated with MigScore levels. The violin plots in Fig. 6B and C show the Immunophenoscore (IPS) and Tumor Immune Dysfunction and Exclusion (TIDE) metrics for the respective MigScore groups. The low MigScore group showed a higher IPS and a lower TIDE score, suggesting an environment more conducive to immune response than that of the high MigScore group. The bar graph outlines the predicted immune response across MigScore groups (Fig. 6D). The low MigScore group was predominantly predicted to be responsive (R), whereas the high group leaned towards non-responsive (NR). The stacked bar chart summarizes the overall predicted response rates: 65% of the low MigScore group were predicted responders, compared with 36% of the high group (Fig. 6E).
External validation of immunotherapy outcomes and associated survival metrics
Figure 6F shows the Kaplan–Meier survival curves for the two MigScore groups after immunotherapy in the GSE91061 dataset. The survival probabilities for the low MigScore group were markedly higher than those of the high group. Furthermore, the response rate distribution in Fig. 6G shows a 31% response rate in the low MigScore group, compared with 16% in the high group.
In the PRJEB23709 dataset, the post-immunotherapy survival curves (Fig. 6H) parallel the GSE91061 observations, with the low MigScore group showing better survival outcomes. Figure 6I further reinforces this pattern, revealing an 80% response rate for the low MigScore group in contrast to 46% in the high group.
Exploration of potential therapeutic agents between MigScore groups using the CMAP database
We used the CMAP database to identify potential therapeutic compounds for the MigScore groups. Figure 7A, B and C depict the score distributions of therapeutic agents across the GSE62564, TARGET, and fwr144 datasets, respectively.
A score closer to -1 indicates higher sensitivity, suggesting that the compound is more likely to be effective as a therapeutic agent. This scoring is based on the connectivity map (CMAP) methodology, where negative connectivity scores imply that the gene expression signature induced by the compound opposes the disease signature26. Therefore, a score near -1 reflects a strong inverse correlation, meaning the compound potentially counteracts the disease state effectively. Remarkably, MS-275 consistently emerges as the compound nearing the -1 mark across all three datasets, thereby signifying its heightened therapeutic sensitivity. Other compounds like XS109870 and phenoxybenzamine also show scores that suggest potential therapeutic efficacy, but MS-275 remains the most promising among them.
Figure 7D, focused on the GSE181559 Dataset, highlights the scores of MS-275, butein, and STOCK1N-35874. Among these, MS-275 continues its trend, thereby accentuating its potential therapeutic significance. Figure 7E showcases the aggregated therapeutic score distribution for compounds, where scores are the summation from Fig. 7A–D. Here, the lower the score, the higher the compound's cumulative sensitivity across datasets. MS-275, consistent with previous observations, retains its prominence with one of the lowest aggregated scores, emphasizing its potential therapeutic utility.
Given the prominence of MS-275 as an HDAC inhibitor, we further delve into the activity of HDAC1, casting light on its role and implications. Figure 7F elucidates the association between HDAC1-TF activity and MigScore in the GSE62564 dataset. It becomes clear that the high MigScore group displays augmented HDAC1-TF activity relative to the low MigScore group.
To further explore HDAC1 activity at the single-cell level, we utilized the nb-jansky dataset. Figure 7G offers a UMAP visualization differentiating HDAC1 activity between high-risk and low-risk tumor cells. The clusters indicate that high-risk tumor cells have more pronounced HDAC1 activity than low-risk tumor cells. Figure 7H compares HDAC1-TF activity between high-risk and low-risk tumor cells, showing considerably elevated HDAC1-TF activity in the high-risk cells.
Discussion
Our study aimed to elucidate the correlation between migrasomes and the prognosis of neuroblastoma. By leveraging an interdisciplinary approach, we identified a notable association between a heightened MigScore and suboptimal clinical outcomes among neuroblastoma patients. One defining hallmark of our research is its methodological robustness. Apart from harnessing machine learning techniques, which proved pivotal in pinpointing genes related to migration and in the subsequent prognostic prediction of treatment responses, we also incorporated multi-omics analyses, melding insights from both bulk RNA-seq and single-cell RNA-seq. This dual-pronged approach not only lent credence to our correlation analyses but also revealed nuances of cellular heterogeneity and of the overall molecular landscape of neuroblastoma.
The robustness of our approach is further supported by our use of 101 models rather than a single model, a decision rooted in ensemble learning principles. Combining algorithms draws on the predictive strengths of multiple methods, which mitigates the risk of overfitting and improves generalizability across datasets. By evaluating a broad spectrum of models, we could identify the best-performing one while gaining a comprehensive view of the predictive capabilities of the individual algorithms and their combinations. This strategy also allowed us to validate the stability and consistency of the MigScore across datasets, strengthening the credibility and applicability of our findings in diverse clinical contexts.
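Selecting the best-performing model among many candidates can be sketched as follows. This section does not state the selection criterion, so ranking candidates by their mean concordance index (C-index) across cohorts is an assumption; the model names, cohort names, and all values are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when sample i has an observed event
    and a shorter follow-up time than sample j. The pair is concordant
    when the sample that failed earlier has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def select_best_model(model_risks, cohorts):
    """Return (name, mean C-index) of the model with the highest mean C-index."""
    mean_c = {}
    for name, risks_by_cohort in model_risks.items():
        scores = [
            concordance_index(cohorts[c]["time"], cohorts[c]["event"], risks)
            for c, risks in risks_by_cohort.items()
        ]
        mean_c[name] = sum(scores) / len(scores)
    best = max(mean_c, key=mean_c.get)
    return best, mean_c[best]


# Hypothetical cohorts and risk scores; NOT taken from the study.
cohorts = {
    "cohort_1": {"time": [1, 2, 3, 4], "event": [1, 1, 0, 1]},
    "cohort_2": {"time": [2, 5, 6], "event": [1, 0, 1]},
}
model_risks = {
    "model_A": {"cohort_1": [4, 3, 2, 1], "cohort_2": [3, 2, 1]},
    "model_B": {"cohort_1": [1, 2, 3, 4], "cohort_2": [3, 2, 1]},
}

best_name, best_score = select_best_model(model_risks, cohorts)
print(best_name, best_score)
```

Averaging over cohorts rewards models that perform consistently rather than models that excel on a single dataset, which is the generalizability argument made above.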
Beyond its prognostic promise, the findings of this research have implications for neuroblastoma treatment more broadly. The complexity of neuroblastoma's pathophysiology makes it a challenging condition to manage30. While current therapeutic approaches have achieved considerable success in treating localized tumors, high-risk neuroblastoma, especially metastatic disease, remains a formidable challenge. The MigScore may help address this gap. By establishing its prognostic significance, especially its connection to survival and treatment outcomes, our research introduces a potential avenue for tailored therapeutic strategies. For example, the MigScore could help guide therapeutic decisions such as the choice between aggressive chemotherapy and more conservative approaches. Moreover, by understanding the molecular and cellular events that contribute to a high MigScore, clinicians and researchers could develop targeted therapies that address these pathways, thus improving overall treatment efficacy.
Immunotherapy for neuroblastoma remains challenging31. The tumor's immune microenvironment is often 'cold', with sparse immune cell infiltration and low immune activity, which limits the success of immunotherapies. Our analyses showed that neuroblastoma samples with lower MigScores had greater immune infiltration and superior immune responsiveness, suggesting a link between migrasome-related pathways and tumor immunogenicity. The external validation of MigScore's predictive capacity in melanoma immunotherapy datasets further suggests that it may translate to other oncological contexts. Understanding the molecular mechanisms that modulate MigScore might offer strategies to convert the 'cold' neuroblastoma microenvironment into a more responsive one, thereby improving the efficacy of immunotherapy. These observations also argue for combining MigScore with other clinical indicators when planning treatment for patients with neuroblastoma.
Turning to mechanism, our study also points to a role for the migrasome, an organelle associated with cell migration, in the metastatic tendencies of neuroblastoma. A salient finding is the therapeutic potential of MS-275, a histone deacetylase (HDAC) inhibitor. HDAC inhibitors have attracted interest in oncology because they can reshape the epigenomic architecture of malignant cells32. Previous studies report that they curb tumor cell migration and invasion, primarily by inducing histone hyperacetylation33, which reduces the expression of genes that promote cell migration and metastatic spread. Against this background, MS-275 appears promising for neuroblastoma management, and using HDAC inhibitors to target migrasome-associated pathways might yield new therapeutic strategies against metastatic neuroblastoma.
Despite the significant findings of our study, several limitations must be acknowledged. Firstly, our research primarily relied on publicly available RNA-seq datasets, which may introduce selection biases. The sample size for certain datasets, particularly those with single-cell RNA-seq data, was limited. This restriction could potentially influence the generalizability of our findings. Larger, more diverse cohorts are needed to validate the robustness of our results across different populations and settings. Secondly, while we utilized advanced machine learning techniques to develop the MigScore model, the performance of these models is contingent upon the quality and variability of the input data. Any inherent biases within the training datasets could be propagated through the model, impacting its predictive accuracy. Additionally, our study focused on transcriptomic data, which captures gene expression levels but does not account for post-transcriptional modifications, protein activity, or other regulatory mechanisms that may influence neuroblastoma progression. Future studies should incorporate proteomic and metabolomic data to provide a more comprehensive understanding of the molecular underpinnings of neuroblastoma.
Another limitation pertains to the lack of experimental validation for the identified migrasome-related genes and pathways. While our bioinformatics analyses suggest a significant role for these genes in neuroblastoma prognosis, experimental studies are necessary to confirm their functional relevance and mechanistic contributions to tumor biology. Investigating the presence and activity of migrasomes in neuroblastoma, particularly in high-risk metastatic cases, will be crucial for substantiating our findings.
Lastly, the generalizability of our findings to clinical practice remains to be established. Although our study demonstrates the potential prognostic value of the MigScore, its clinical utility must be evaluated in prospective studies involving neuroblastoma patients. Such studies should assess the feasibility of integrating MigScore into existing clinical workflows and determine its impact on treatment decision-making and patient outcomes.
Conclusion
In conclusion, our study underscores the prognostic significance of the migrasome in neuroblastoma and identifies potential therapeutic avenues, especially with regard to modulating the tumor's immune microenvironment. The integration of machine learning strengthens these findings and may support more personalized and efficient therapeutic strategies in the future.
Data availability
The GSE62564 and GSE181559 datasets were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62564, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE181559). The target cohort was downloaded from https://portal.gdc.cancer.gov/projects/TARGET-NBL. The fwr144 cohort was obtained from the European Genome-phenome Archive (https://ega-archive.org/datasets/EGAD00001006625). The raw count matrix of the single-cell dataset GSE137804 was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE137804), and the pertinent clinical details were derived from the associated publications. For the immunotherapy datasets GSE91061 and PRJEB23709, we sourced the raw count matrices and related clinical data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE91061) and the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/view/PRJEB23709), respectively.
Code availability
All original code has been deposited at Zenodo (https://doi.org/10.5281/zenodo.12564026). In addition, we have provided a supplementary table (Supplementary Table 1) that lists all the R packages and their versions used in our study.
References
Ma, L. et al. Discovery of the migrasome, an organelle mediating release of cytoplasmic contents during cell migration. Cell Res. 25(1), 24–38 (2015).
Jiang, Y. et al. Migrasomes, a new mode of intercellular communication. Cell Commun. Signal 21(1), 105 (2023).
Yu, L. Migrasomes: The knowns, the known unknowns and the unknown unknowns: A personal perspective. Sci. China Life Sci. 64(1), 162–166 (2021).
Yu, S. & Yu, L. Migrasome biogenesis and functions. FEBS J. 289(22), 7246–7254 (2022).
Zhang, Y. et al. Research progress and direction of novel organelle-migrasomes. Cancers (Basel). 15(1), 134 (2022).
Qin, Y. et al. Pan-cancer analysis identifies migrasome-related genes as a potential immunotherapeutic target: A bulk omics research and single cell sequencing validation. Front. Immunol. 13, 994828 (2022).
Smith, V. & Foster, J. High-risk neuroblastoma treatment review. Children (Basel) 5(9), 114 (2018).
Del Bufalo, F. et al. GD2-CART01 for relapsed or refractory high-risk neuroblastoma. N. Engl. J. Med. 388(14), 1284–1295 (2023).
Dong, R. et al. Single-cell characterization of malignant phenotypes and developmental trajectories of adrenal neuroblastoma. Cancer Cell 38(5), 716–733.e6 (2020).
Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28(6), 882–883 (2012).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16(12), 1289–1296 (2019).
Zhao, X. et al. Identification of markers for migrasome detection. Cell Discov. 5, 27 (2019).
Wilkerson, M. D. & Hayes, D. N. ConsensusClusterPlus: A class discovery tool with confidence assessments and item tracking. Bioinformatics 26(12), 1572–1573 (2010).
Malta, T. M. et al. Machine learning identifies stemness features associated with oncogenic dedifferentiation. Cell 173(2), 338–354.e15 (2018).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43(7), e47 (2015).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2(3), 100141 (2021).
Ishwaran, H. & Kogalur, U. B. Consistency of random survival forests. Stat. Probab. Lett. 80(13–14), 1056–1064 (2010).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010).
Bair, E. & Tibshirani, R. Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol. 2(4), E108 (2004).
Binder, H. & Schumacher, M. Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinform. 10, 18 (2009).
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42(2), 293–304 (2024).
Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17(1), 218 (2016).
Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24(10), 1550–1558 (2018).
Zeng, D. et al. IOBR: Multi-omics immuno-oncology biological research to decode tumor microenvironment and signatures. Front. Immunol. 12, 687975 (2021).
Yang, C. et al. A survey of optimal strategy for signature-based drug repositioning and an application to liver cancer. eLife https://doi.org/10.7554/eLife.71880 (2022).
Blanche, P., Dartigues, J. F. & Jacqmin-Gadda, H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat. Med. 32(30), 5381–5397 (2013).
Kerr, K. F., Brown, M. D., Zhu, K. & Janes, H. Assessing the clinical impact of risk prediction models with decision curves: Guidance for correct interpretation and appropriate use. J. Clin. Oncol. 34(21), 2534–2540 (2016).
Kerseviciute, I. & Gordevicius, J. aPEAR: An R package for autonomous visualization of pathway enrichment networks. Bioinformatics https://doi.org/10.1093/bioinformatics/btad672 (2023).
Zafar, A. et al. Molecular targeting therapies for neuroblastoma: Progress and challenges. Med. Res. Rev. 41(2), 961–1021 (2021).
Wedekind, M. F., Denton, N. L., Chen, C. Y. & Cripe, T. P. Pediatric cancer immunotherapy: Opportunities and challenges. Paediatr. Drugs 20(5), 395–408 (2018).
Li, G., Tian, Y. & Zhu, W. G. The roles of histone deacetylases and their inhibitors in cancer therapy. Front. Cell Dev. Biol. 8, 576946 (2020).
Alseksek, R. K., Ramadan, W. S., Saleh, E. & El-Awady, R. The role of HDACs in the response of cancer cells to cellular stress and the potential for therapeutic intervention. Int. J. Mol. Sci. 23(15), 8141 (2022).
Jansky, S. et al. Single-cell transcriptomic analyses provide insights into the developmental origins of neuroblastoma. Nat. Genet. 53(5), 683–693 (2021).
Acknowledgements
Special thanks go to Selina Jansky and Frank Westermann from the German Cancer Research Center for making available the processed count matrix of their single-cell dataset, along with its clinical information34.
Funding
This work was supported by grants from the Tianjin Health Technology Project (Grant no. 2022QN106) to Xin Li and the Tianjin Health Technology Project (Grant no. 2022QN105) to Yuren Xia.
Author information
Contributions
W.L. and Y.X. performed data analysis. J.W. performed original draft writing; X.L. and H.J. reviewed and edited the manuscript; X.L. performed the study concept and design; X.L. is the guarantor of this work and takes responsibility for the integrity of the data and accuracy of the data analysis. All the authors have read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, W., Xia, Y., Wang, J. et al. Prognostic significance of migrasomes in neuroblastoma through machine learning and multi-omics. Sci Rep 14, 16629 (2024). https://doi.org/10.1038/s41598-024-67682-7