Abstract
Distant metastasis (DM) is relatively uncommon in stage T1 gastric cancer (GC). This study aimed to develop and validate a predictive model for DM in stage T1 GC using machine learning (ML) algorithms. Patients with stage T1 GC from 2010 to 2017 were identified in the public Surveillance, Epidemiology and End Results (SEER) database. In addition, we collected patients with stage T1 GC admitted to the Department of Gastrointestinal Surgery of the Second Affiliated Hospital of Nanchang University from 2015 to 2017. We applied seven ML algorithms: logistic regression, random forest (RF), LASSO, support vector machine, k-nearest neighbor, naive Bayesian model and artificial neural network. An RF model for DM of T1 GC was then developed. The AUC, sensitivity, specificity, F1-score and accuracy were used to evaluate and compare the predictive performance of the RF model with that of the other models. Finally, we performed a prognostic analysis of patients who developed DM. Independent prognostic factors were analysed by univariate and multivariate regression, and Kaplan-Meier (K-M) curves were used to show differences in survival for each variable and its subgroups. A total of 2698 cases from the SEER dataset were included, 314 with DM, together with 107 hospital patients, 14 with DM. Age, T-stage, N-stage, tumour size, grade and tumour location were independent risk factors for the development of DM in stage T1 GC. Across the seven ML algorithms evaluated in the training and test sets, the RF prediction model had the best prediction performance (AUC: 0.941, Accuracy: 0.917, Recall: 0.841, Specificity: 0.927, F1-score: 0.877). The AUC in the external validation set was 0.750. Survival analysis showed that surgery (HR = 3.620, 95% CI 2.164–6.065) and adjuvant chemotherapy (HR = 2.637, 95% CI 2.067–3.365) were independent prognostic factors for survival in patients with DM from stage T1 GC.
Age, T-stage, N-stage, tumour size, grade and tumour location were independent risk factors for the development of DM in stage T1 GC. The comparison of ML algorithms showed that the RF prediction model had the best predictive performance and can accurately identify at-risk populations for further clinical screening for metastases. In addition, aggressive surgery and adjuvant chemotherapy can improve the survival rate of patients with DM.
Introduction
Gastric cancer (GC), ranking fifth in morbidity and third in mortality, is one of the most common malignant tumors of the digestive system1,2. Tumors take a relatively long time to develop and progress, and clinical manifestations and prognosis differ significantly between stages3. Early gastric cancer (EGC) means that the tumor has not invaded beyond the submucosa, which is defined as T1 stage, regardless of lymph node metastasis4. Patients with EGC can obtain a better prognosis after radical surgical resection, whereas distant organ metastasis is regarded as advanced disease and carries a poor prognosis5. Because tumors can spread by local invasion and by hematogenous and lymphatic routes throughout their development, distant metastasis (DM) might also occur in the T1 stage6.
Studies have indicated that the occurrence and development of stage IV GC is related to many factors, among which T-stage is an independent risk factor for DM7. As tumors invade deeper, the possibility of metastasis increases significantly. Since T1 invasion is superficial and the tumor is confined to the mucosa or submucosa, most scholars hold that there is little probability of distant metastasis8. It is precisely this conventional view that leads to deficiencies or neglect in the preoperative diagnosis of T1 GC, delaying the optimal time for treatment and affecting the prognosis of patients9. At present, the preoperative examination of GC mostly depends on imaging methods such as CT, but the accuracy of imaging in detecting DM is insufficient10. Accurate preoperative diagnosis and prediction of DM in GC patients are therefore crucial for guiding clinical treatment and improving prognosis.
The treatment of stage IV GC has long been controversial11. According to the treatment guidelines of the Japanese Gastric Cancer Association, the treatment of stage IV GC mainly includes radiotherapy, chemotherapy, optimal supportive treatment and palliative surgery12. It has been reported that the prognosis of stage IV GC is affected by many factors, among which the method of treatment, as well as T-stage, is an independent risk factor for prognosis7. Radical resection or endoscopic resection performs well in the treatment of early gastric cancer and usually brings a better prognosis. Once DM occurs in T1 GC, however, the treatment options and prognosis are very different13,14. Therefore, this study constructed predictive models for DM in T1 GC, selected the best-performing model among several machine learning (ML) algorithms, and further analysed the prognosis of patients with T1NxM1 gastric cancer, to better guide clinical diagnosis and treatment.
Materials and methods
Patients and samples
Patient data were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/data/) and downloaded with SEER*Stat software (version 8.3.9) under account 15962-Nov2020. The data cover 2010 to 2017, as specific T1 staging information is only available from 2010 onwards. In addition, we collected clinical data of patients with stage T1 gastric cancer admitted to the Second Affiliated Hospital of Nanchang University from January 2015 to January 2017. The study passed the hospital's ethical review (The Examination and Approval NO.Review [2018]No,(104)). Inclusion criteria: (1) diagnosed with stage T1 gastric cancer (T1aNxMx and T1bNxMx); (2) complete survival information; (3) no pre-operative radiotherapy or immunotherapy. Exclusion criteria: (1) multiple in situ tumors; (2) incomplete tumor staging; (3) incomplete information. Tumor diagnosis based on primary tumor site, grade and histology was coded according to the International Classification of Diseases for Oncology, Third Edition (ICD-O-3), and the seventh edition of the AJCC staging system was applied for tumor-node-metastasis (TNM) staging. The patient screening process is shown in Fig. 1.
Data and variables selection
In total, we considered 11 variables, divided into three categories. Population characteristic variables include sex (Male, Female) and age (< 40, 40–60, 60–80, > 80). Clinicopathological variables include tumor size (< 2 cm, 2–5 cm, > 5 cm, NA), tumor location (Fundus, Body, Antrum, Pylorus, Lesser curve, Greater curve, Overlapping, NOS), grade (Well, Moderate, Poorly, Undifferentiated, NA), M-stage (M0, M1), N-stage (N0, N1, N2, N3) and T-stage (T1a, T1b). Treatment variables include surgery, chemotherapy and radiotherapy.
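As an illustration of how categorical predictors such as those listed above might be prepared for modelling, the following sketch bins age and tumour size into the study's categories and one-hot encodes them. The function names are hypothetical, and the boundary handling (for example, whether a 60-year-old falls in 40–60 or 60–80) is an assumption, since the paper does not state it.

```python
from typing import List, Optional

AGE_LEVELS = ["<40", "40-60", "60-80", ">80"]
SIZE_LEVELS = ["<2cm", "2-5cm", ">5cm", "NA"]


def age_group(age: int) -> str:
    """Map age in years to a study category.

    Assumption: boundary ages go to the lower bin (60 -> "40-60", 80 -> "60-80").
    """
    if age < 40:
        return "<40"
    if age <= 60:
        return "40-60"
    if age <= 80:
        return "60-80"
    return ">80"


def size_group(size_cm: Optional[float]) -> str:
    """Map tumour size in cm to a study category; None means not available."""
    if size_cm is None:
        return "NA"
    if size_cm < 2:
        return "<2cm"
    if size_cm <= 5:
        return "2-5cm"
    return ">5cm"


def one_hot(value: str, levels: List[str]) -> List[int]:
    """Return a 0/1 indicator vector with a single 1 at the position of value."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels]


def encode_patient(age: int, size_cm: Optional[float]) -> List[int]:
    """Concatenate the one-hot encodings of the age and tumour-size categories."""
    return one_hot(age_group(age), AGE_LEVELS) + one_hot(size_group(size_cm), SIZE_LEVELS)
```

The remaining categorical variables (location, grade, N-stage, T-stage) could be encoded the same way by adding a level list for each.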
Statistical methods
All statistical analyses were performed with R (version 4.1.0) and SPSS 24.0. The flow of this study is shown in Fig. 2. Heat maps were drawn for correlation analysis between variables including sex, age, tumour size, grade, T-stage, N-stage and tumour location. Independent risk factors for distant metastasis from stage T1 gastric cancer were identified by logistic regression analysis. The results are represented by hazard ratios (HRs) and 95% confidence intervals (CIs). All SEER patients were randomly divided 7:3 into a training set and a test set, and hospital patients were used as the external validation set. The predictive models were developed in the training set and validated in the test set. We built models with seven ML algorithms in the training set: Logistic Regression (LR), Random Forest (RF), LASSO, Support Vector Machine (SVM), k-Nearest Neighbor (KNN), Naive Bayesian Model (NBC), and Artificial Neural Network (ANN). ROCAUC, sensitivity, specificity, F1-score and accuracy were used to compare the performance of the models. The best predictive model was then applied to the external validation set to assess its generalisation capability. For survival analysis, independent prognostic factors were analysed by univariate and multivariate regression. K-M curves were used to show differences in survival for each variable and its subgroups. For descriptive statistics, the chi-square test or Fisher's exact test was used to compare categorical variables. P < 0.05 indicated statistical significance.
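The 7:3 split and the threshold-based metrics described above can be sketched as follows. This is a minimal illustration in Python rather than the R/SPSS workflow the study used; the function names are hypothetical, and stratifying the split by outcome is an assumption (the paper states only that patients were divided randomly).

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), splitting each outcome class separately.

    Assumption: stratification by outcome; the paper only says "randomly".
    """
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity (recall), specificity, accuracy and F1 from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(y_true)),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

Under the stratification assumption, a cohort with the SEER class counts reported in this study (314 with DM, 2384 without) would yield 1889 training and 809 test patients, with 220 DM cases in the training set.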
Ethics approval and consent to participate
As the study was conducted using a public database, patient informed consent and ethical review were not required.
Results
Patient characteristics
A total of 2698 patients from the SEER database were included in this study, 314 (11.64%) with distant metastases and 2384 (88.36%) without distant metastases. The external validation set consisted of 107 patients, 14 (13.08%) with distant metastases and 93 (86.92%) without distant metastases. The SEER patients were randomised 7:3 into training and test sets. There was no statistically significant difference in age, sex, T-stage, N-stage, M-stage, chemotherapy, radiotherapy, tumour size, surgery, differentiation and primary site between the two groups (P > 0.05). Table 1 shows the general characteristics of the patients in the three groups.
Comparison and analysis of model variables
First, we performed a Pearson correlation analysis between the variables (Fig. 3a). By stepwise backward LR analysis, we identified six characteristics as independent risk factors for predicting DM (Table 2): age (P < 0.001), T-stage (P < 0.001), N-stage (P < 0.001), tumour size (P < 0.001), degree of differentiation (P = 0.002), and tumour site (P < 0.001). For the RF algorithm, the analysis of variable importance (Fig. 3b) showed that N-stage, tumour size, T-stage, grade, age and tumour location were associated with distant metastases. Notably, this was consistent with the results of the multivariate logistic regression analysis.
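For readers unfamiliar with the correlation analysis behind a heat map such as Fig. 3a, the sketch below computes pairwise Pearson coefficients for numerically coded variables. The function names and the toy integer codes are hypothetical; the coding the authors used is not given in the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Return r for every ordered pair of named columns (the heat-map cells)."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Toy data with hypothetical codes (N-stage 0-3, T-stage 0 = T1a / 1 = T1b).
toy = {
    "n_stage": [0, 0, 1, 2, 3, 1],
    "t_stage": [0, 1, 1, 1, 0, 1],
}
cells = correlation_matrix(toy)
```

Each diagonal cell equals 1 and the matrix is symmetric, which is why heat maps of this kind are often drawn as a single triangle.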
Establishment of a model for predicting distant metastasis of T1 GC
We tuned the model parameters on the training set and balanced the training data to avoid overfitting. Seven ML algorithms were applied to the balanced training set to construct prediction models, and the RF prediction model had the best prediction performance (AUC: 0.941, Accuracy: 0.917, Recall: 0.841, Specificity: 0.927, F1-score: 0.877) (Table 3, Fig. 4a). We further validated this in our test set, where the RF prediction model had a ROCAUC of 0.825, which was significantly better than the other six models (Fig. 4b). We also validated the RF prediction model using the 107 hospital patients as an external validation set (ROCAUC = 0.750) (Fig. 4c). Therefore, we believe that the RF prediction model can accurately predict the risk of developing DM in stage T1 GC.
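The ROCAUC values reported above have a simple interpretation: the probability that a randomly chosen patient with DM receives a higher predicted score than a randomly chosen patient without DM. The sketch below computes the AUC directly from that definition; the function name is hypothetical and the pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of correctly ordered pairs.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 if the scores tie, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this reading, an AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, so the drop from the training set to the test and external sets indicates how much of the training performance carries over to unseen patients.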
Prognostic analysis of patients with distant metastasis of T1 GC
To further analyze the prognosis of patients with distant metastasis in stage T1, we screened for factors that might influence prognosis by univariate and multivariate regression analysis, and displayed the results with K-M curves. Univariate analysis showed that chemotherapy (P < 0.001), surgery (P < 0.001), T stage (P < 0.022) and degree of differentiation (P < 0.035) were risk factors for prognosis (Fig. 5a–d). Multivariate regression analysis showed that surgery and chemotherapy were independent risk factors for prognosis (Table 4). Additionally, subgroup analysis suggested that surgery combined with adjuvant chemotherapy could improve the survival rate of patients (Fig. 5e).
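The K-M curves referred to above are product-limit estimates of the survival function. As a minimal sketch of how such a curve is computed (the function name and the toy follow-up data are hypothetical, not taken from the study):

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time with at least one event.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    Patients censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy follow-up in months: three deaths and two censored observations.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Computing one such curve per subgroup (for example, surgery versus no surgery) gives the kind of comparison shown in Fig. 5.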
Discussion
The prognosis of GC patients with distant metastasis is poor, with a 5-year survival rate < 5% and a median survival period of 11–18 months5. Extensive evidence has indicated that approximately 40% of patients have distant metastasis at the time of initial diagnosis of GC, and the incidence increases as the tumor progresses15,16. Because of the high 5-year survival rate of T1 patients, many scholars have overlooked the possibility of distant metastasis in these patients, especially in recent years as endoscopic treatment has gradually replaced traditional radical surgery9,17. A recent study showed that the probability of distant metastasis in patients with stage T1 is 8.17%18. Therefore, it is necessary to explore the risk factors and prognosis of distant metastasis of T1 gastric cancer. To our knowledge, this is the first study to construct a model for predicting distant metastasis of stage T1 gastric cancer through machine learning and to analyse survival and prognosis in these patients.
Previous studies have demonstrated that distant metastasis rarely occurs in T1 GC, which indicates a good prognosis for most patients with early gastric cancer9. Notably, in the present study, we found that the rate of distant metastasis in patients with T1 GC was as high as 11.64%. Thus, there is an urgent need to determine whether T1 patients already have distant metastasis at initial diagnosis. Conventional imaging tests (e.g., magnetic resonance imaging and computed tomography) can detect significant diffuse lesions, while positron emission tomography is a more reliable method of examining distant metastasis in GC, especially for detecting micrometastases. However, its use is limited by its effectiveness and practical costs19. Therefore, establishing a simple and effective prediction model can help clinicians identify high-risk patients for further examination and diagnosis.
Machine learning algorithms are a class of emerging methods that can process raw data, analyse the relationships between important variables, and support accurate decisions. One of their main strengths is their performance in predicting outcomes in large databases, which is better than that of traditional regression methods20. In this study, we analysed and compared the prediction models established by seven ML algorithms: logistic regression (LR), random forest (RF), LASSO, support vector machine (SVM), k-nearest neighbor (KNN), naive Bayesian model (NBC), and artificial neural network (ANN). First, we used the training set to construct the prediction models and evaluated the seven models using AUC, sensitivity, specificity, F1-score and accuracy, and found that the RF model had the best prediction performance (AUC: 0.941, accuracy: 0.917, recall: 0.841, specificity: 0.927, F1-score: 0.877). The test set was used to further validate the results, which showed that the RF model was the optimal model for predicting DM in stage T1 GC (AUC = 0.825). The ability of the RF model to predict DM in stage T1 gastric cancer was also confirmed in an external validation set (AUC = 0.750). The RF seems to be one of the most widely used and accurate machine learning models in clinical application research. Increasing evidence has reported that the random forest model is superior to other algorithms in dealing with data having a large number of features and highly nonlinear data, probably because the RF model uses more advanced classification decisions and different weight ratios compared to other models21,22.
This study confirmed that the random forest prediction model can accurately predict the high-risk group with distant metastasis in T1 patients, which is conducive to further clinical examination for this population to develop better diagnosis and treatment strategies.
In this study, the six most important characteristics were included in the final RF prediction model: age, T-stage, N-stage, tumor size, grade and tumor site. The results suggested that the rate of DM in younger patients (< 60 years old) is significantly higher than that in older patients (> 60 years old). Previous studies have reported that the rate of lymph node metastasis is higher in young GC patients13,23,24, and more frequent lymph node metastasis in younger patients may be one of the reasons for distant metastasis. Accumulating studies have found that tumor biology plays a crucial role in the development of disease, which may be closely related to the occurrence and development of distant metastasis25. An additional study has shown that tumor size, depth of invasion and lymph node metastasis are significantly related to advanced gastric cancer26. Similarly, in our study we found that N stage and T stage were closely associated with distant metastasis. Interestingly, the rate of distant metastasis in patients with stage T1a was significantly higher than that in patients with stage T1b. This may be because lymph node metastasis occurs first in patients with submucosal tumors, whereas hematogenous metastasis occurs later in patients with mucosal tumors, during infiltration into deeper layers. According to Japanese guidelines for the treatment of GC, patients with a tumor size > 2 cm have a significantly increased risk of metastasis and should receive radical resection for clean removal12. In addition, we found that the risk of distant metastasis increased significantly with tumor size: the risk in patients with a tumor size > 5 cm was 8–9 times higher than that in patients with a tumor size < 2 cm. In our study, tumor site was one of the independent risk factors affecting distant metastasis in patients with T1 GC. Fundus tumors are prone to distant metastasis, which might be attributed to the rich blood supply of this region, since a rich vascular supply is closely related to hematogenous metastasis. Moreover, our results showed that patients with moderately and poorly differentiated GC are more likely to develop distant metastasis than those with undifferentiated or well-differentiated GC, which may be because cancer cells have invaded surrounding tissues, capillaries and lymphatic vessels, and these moderately and poorly differentiated tissues grow faster. This finding departs from our previous understanding and requires further verification.
We also performed a prognostic survival analysis of patients with distant metastases. The results revealed that surgery (HR = 3.620, 95% CI 2.164–6.065) and adjuvant chemotherapy (HR = 2.637, 95% CI 2.067–3.365) were independent prognostic factors for survival in patients with T1 distant metastasis. This is consistent with previous research27. Surgery for primary tumors may reduce the potential burden of immunosuppressive tumors and eliminate the source of further metastasis28. Hence, for patients with T1 distant metastases, aggressive surgery combined with adjuvant chemotherapy can greatly improve prognosis and the survival rate.
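Hazard ratios and their confidence intervals are usually estimated on the log scale, so a 95% CI is symmetric around log(HR). The sketch below uses that property to recover the implied standard error and Wald z-statistic from a reported interval. It assumes the intervals are Wald intervals from a Cox-type model with a critical value of 1.96, which the paper does not state explicitly; the function name is hypothetical.

```python
import math
from typing import Tuple


def log_scale_summary(lower: float, upper: float,
                      z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Return (point estimate, SE of log HR, Wald z) implied by a CI.

    Assumption: the interval is exp(log_hr +/- z_crit * se).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_hr = (log_lo + log_hi) / 2          # midpoint on the log scale
    se = (log_hi - log_lo) / (2 * z_crit)   # half-width divided by z_crit
    return math.exp(log_hr), se, log_hr / se


# Intervals reported in this study for surgery and adjuvant chemotherapy.
surgery_hr, surgery_se, surgery_z = log_scale_summary(2.164, 6.065)
chemo_hr, chemo_se, chemo_z = log_scale_summary(2.067, 3.365)
```

Under these assumptions, the geometric mean of each pair of bounds reproduces the reported point estimate to within rounding (about 3.62 for surgery and 2.64 for chemotherapy), so the reported HRs and CIs are internally consistent.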
This study is the first to use ML algorithms to predict DM in stage T1 GC, and it establishes an accurate predictive model to help identify people at high risk of DM at an early stage in the clinic. However, this study has some limitations. First, it was a retrospective study, and the sample size of 2698 patients from 2010 to 2017 was relatively small. Second, the variables included in our study are limited, and other potential risk factors such as tumor markers, nutrition index and inflammation index are lacking, so a further model with more variables could improve the prediction accuracy.
In conclusion, we constructed and validated a prediction model of DM in patients with T1 GC using ML algorithms. The RF model had the best predictive performance and can accurately identify high-risk groups, supporting further clinical screening for metastasis. Our study also found that aggressive surgery and adjuvant chemotherapy can improve the survival rate of patients with DM.
Data availability
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
References
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71(3), 209–249 (2021).
Joshi, S. S. & Badgwell, B. D. Current treatment and recent progress in gastric cancer. CA Cancer J. Clin. 71(3), 264–279 (2021).
Ohta, H. et al. Early gastric carcinoma with special reference to macroscopic classification. Cancer 60(5), 1099–1106 (1987).
Edge, S. B. & Compton, C. C. The American Joint Committee on Cancer: The 7th edition of the AJCC cancer staging manual and the future of TNM. Ann. Surg. Oncol. 17(6), 1471–1474 (2010).
Carcas, L. P. Gastric cancer review. J. Carcinog. 13, 14 (2014).
Suhail, Y. et al. Systems biology of cancer metastasis. Cell Syst. 9(2), 109–127 (2019).
Zhang, Y. et al. A population-based analysis of distant metastasis in stage IV gastric cancer. Med. Sci. Monit. 26, e923867 (2020).
Ludwig, K., Möller, D. & Bernhardt, J. Surgical management for early stage gastric cancer. Chirurg 89(5), 347–357 (2018).
Smyth, E. C., Nilsson, M., Grabsch, H. I., van Grieken, N. C. & Lordick, F. Gastric cancer. Lancet 396(10251), 635–648 (2020).
Kwee, R. M. & Kwee, T. C. Modern imaging techniques for preoperative detection of distant metastases in gastric cancer. World J. Gastroenterol. 21(37), 10502–10509 (2015).
Smith, J. K. et al. Potential benefit of resection for stage IV gastric cancer: A national survey. J. Gastrointest. Surg. 14(11), 1660–1668 (2010).
Japanese Gastric Cancer Association. Japanese gastric cancer treatment guidelines 2018 (5th edition). Gastric Cancer 24(1), 1–21 (2021).
Zheng, X., Guo, K., Wasan, H. S. & Ruan, S. A population-based study: How to identify high-risk T1 gastric cancer patients?. Am. J. Cancer Res. 11(4), 1463–1479 (2021).
Hanada, Y. et al. Low frequency of lymph node metastases in patients in the United States with early-stage gastric cancers that fulfill Japanese endoscopic resection criteria. Clin. Gastroenterol. Hepatol. 17(9), 1763–1769 (2019).
Riihimäki, M., Hemminki, A., Sundquist, K., Sundquist, J. & Hemminki, K. Metastatic spread in patients with gastric cancer. Oncotarget 7(32), 52307–52316 (2016).
Ebinger, S. M. et al. Modest overall survival improvements from 1998 to 2009 in metastatic gastric cancer patients: A population-based SEER analysis. Gastric Cancer 19(3), 723–734 (2016).
Ono, H. et al. Guidelines for endoscopic submucosal dissection and endoscopic mucosal resection for early gastric cancer. Dig. Endosc. 28(1), 3–15 (2016).
Chen, J. et al. A clinical model to predict distant metastasis in patients with superficial gastric cancer with negative lymph node metastasis and a survival analysis for patients with metastasis. Cancer Med. 10(3), 944–955 (2021).
Kawanaka, Y. et al. Added value of pretreatment (18)F-FDG PET/CT for staging of advanced gastric cancer: Comparison with contrast-enhanced MDCT. Eur. J. Radiol. 85(5), 989–995 (2016).
Handelman, G. S. et al. eDoctor: Machine learning and the future of medicine. J. Intern. Med. 284(6), 603–619 (2018).
Pellegrino, E. et al. Machine learning random forest for predicting oncosomatic variant NGS analysis. Sci. Rep. 11(1), 21820 (2021).
Maniruzzaman, M. et al. Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms. Comput. Methods Programs Biomed. 176, 173–193 (2019).
Ji, T., Zhou, F., Wang, J. & Zi, L. Risk factors for lymph node metastasis of early gastric cancers in patients younger than 40. Medicine (Baltimore) 96(37), e7874 (2017).
Takatsu, Y. et al. Clinicopathological features of gastric cancer in young patients. Gastric Cancer 19(2), 472–478 (2016).
Li, H. et al. MTHFD1L-mediated redox homeostasis promotes tumor progression in tongue squamous cell carcinoma. Front. Oncol. 9, 1278 (2019).
Li, F. et al. Influential factors and prognostic analysis of blood vessel invasion in advanced gastric cancer. Pathol. Res. Pract. 216(3), 152727 (2020).
Song, Z., Wu, Y., Yang, J., Yang, D. & Fang, X. Progress in the treatment of advanced gastric cancer. Tumour Biol. 39(7), 1010428317714626 (2017).
Danna, E. A. et al. Surgical removal of primary tumor reverses tumor-induced immunosuppression despite the presence of metastatic disease. Cancer Res. 64(6), 2205–2211 (2004).
Acknowledgements
The authors appreciate the efforts of the staff of the Surveillance, Epidemiology, and End Results (SEER) program and thank them for the availability of public access to the SEER database.
Funding
This report was supported by the National Natural Science Foundation of China (Grant Number: 81860433 and 82103645), the Natural Science Youth Foundation of Jiangxi Province (Grant Number: 20192BAB215036), Jiangxi Province Natural Science Key R&D Project-General Project (Grant Number: 20202BBG73024), Training Plan for Academic and Technical Young Leaders of Major Disciplines in Jiangxi Province (Grant Number: 20204BCJ23021) and Jiangxi Provincial Department of Education Youth Program (Grant Number: GJJ210252).
Author information
Contributions
H.K.T. and Z.Zhen contributed to the conception and design of the paper. Z.T.L., H.L. and H.K.T. prepared the figures and tables. J.L., Y.M.C. and Z.Zhang analysed and interpreted the data. H.K.T. and Z.T.L. wrote and revised the paper. All authors have read and approved the final paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tian, H., Liu, Z., Liu, J. et al. Application of machine learning algorithm in predicting distant metastasis of T1 gastric cancer. Sci Rep 13, 5741 (2023). https://doi.org/10.1038/s41598-023-31880-6