Machine learning predicts lymph node metastasis of poorly differentiated-type intramucosal gastric cancer

This study aimed to construct a machine learning model for predicting lymph node metastasis (LNM) in patients with poorly differentiated-type (PDC) intramucosal gastric cancer. A total of 1169 patients who had undergone surgery for gastric cancer were divided into a training group and a test group at a ratio of 7:3, and LNM models were built with Python machine learning. Feature weights from the gradient boosting decision tree (GBDT) algorithm showed that the number of resected nodes, lymphovascular invasion and tumor size were the three factors with the greatest weight for LNM. In the training group, GBDT had the highest accuracy of the 7 algorithm models (0.955), and the AUC values, from high to low, were XGB (0.881), RF (0.802), GBDT (0.798), LR (0.778), XGB + LR (0.739), RF + LR (0.691) and GBDT + LR (0.626). In the test group, XGB had the highest accuracy of the 7 models (0.952), and the AUC values, from high to low, were GBDT (0.788), RF (0.765), XGB (0.762), LR (0.750), RF + LR (0.678), GBDT + LR (0.650) and XGB + LR (0.619). Single machine learning algorithms can predict LNM in poorly differentiated-type intramucosal gastric cancer, but fusion algorithms did not improve predictive performance.
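Accuracy and AUC measure different things, which is why a model can rank first on one and not the other. The minimal Python sketch below is illustrative only (it is not the study's code, and the ten-patient data are hypothetical): it computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and accuracy after thresholding the score at an assumed cut-off of 0.5.

```python
from itertools import product


def auc_mann_whitney(labels, scores):
    """AUC = probability that a random positive case outscores a random
    negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded prediction equals the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 metastasis-positive case among 10 patients.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
constant = [0.1] * 10  # a "model" that gives every patient the same score
ranked = [0.1, 0.2, 0.05, 0.3, 0.4, 0.15, 0.2, 0.6, 0.1, 0.35]

print(accuracy(labels, constant), auc_mann_whitney(labels, constant))  # 0.9 0.5
print(accuracy(labels, ranked), round(auc_mann_whitney(labels, ranked), 3))  # 0.8 0.778
```

With a rare outcome, the constant scorer reaches the higher accuracy while carrying no ranking information (AUC 0.5), whereas the second scorer ranks the positive case above most negatives but loses accuracy at the chosen threshold.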

Gastric cancer is the world's fourth most common malignancy and the second leading cause of cancer-related death 1 . With advances in endoscopic techniques and diagnostics and the global spread of gastric cancer screening, the detection rate of early gastric cancer (EGC) increases every year, especially in Japan and Korea 2,3 . Depending on tumor stage, EGC can be treated with endoscopic resection, D1 or D2 radical surgical resection, and other adjuvant medical treatments 4 . The indications and outcomes of these treatments differ. The definition of EGC considers only the depth of tumor invasion and not lymph node metastasis, even though lymph node status is an important factor in choosing an EGC treatment regimen. Accurate preoperative staging of EGC is therefore necessary to select an appropriate treatment. Studies have shown that the presence of lymph node metastasis (LNM), the number of metastatic lymph nodes and the regions involved all have important effects on EGC treatment and prognosis 5 . For more than 80% of patients with EGC, D1 or D2 radical surgery therefore entails unnecessary lymph node dissection, which increases surgical trauma and slows recovery. In recent years, endoscopic submucosal dissection (ESD) and endoscopic mucosal resection have advanced EGC treatment, with less trauma and faster postoperative recovery, so that patients can avoid the extensive trauma and long recovery of laparotomy or laparoscopic surgery. However, these approaches depend on accurately judging lymph node metastasis before surgery 6 .
In recent years, many studies have applied machine learning in medicine. For example, machine learning algorithms developed and validated on large preoperative datasets can predict length of hospital stay and patient-specific hospital costs after primary total hip arthroplasty 7 ; machine learning can predict hospital-acquired pneumonia in patients with schizophrenia 8 ; and machine learning techniques can predict 5-year survival in patients with chondrosarcoma 9 .
However, few studies have investigated the prediction of LNM in poorly differentiated early gastric cancer [10][11][12] . This study assesses clinicopathological factors for predicting LNM in intramucosal PDC. It also

Methods
Study population. No human participants were directly involved in this study; it is a secondary analysis of data from a public database. Data are available from the BioStudies public database (https://www.ebi.ac.uk/biostudies/studies?query=S-EPMC4881979), accession number S-EPMC4881979. We retrospectively analyzed data from patients diagnosed with PDC who had undergone radical gastrectomy with lymph node dissection. All included patients had pure poorly differentiated-type T1 gastric cancer (tumor invasion confined to the mucosa or submucosa).
Machine learning. Logistic regression (LR) is a widely used classification algorithm that predicts the probability of an outcome; despite the word "regression" in its name, it is a classification method, and in its standard form it is a binary (dichotomous) classifier. Random forest (RF) is a supervised learning algorithm trained with the "bagging" method, which fits multiple models on bootstrap samples of the training data and combines their predictions; the combined model is often more accurate than any single model.
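The LR and bagging descriptions above can be illustrated with a minimal, dependency-free Python sketch. It is an assumption-laden illustration rather than the authors' pipeline: the single feature (tumor size in cm), the toy data, the learning rate, the number of epochs and the 25 bootstrap models are all hypothetical. It shows LR producing a probability that is thresholded into a dichotomous class, and bagging fitting one model per bootstrap sample and averaging their outputs. A true random forest would use decision trees with random feature subsets as its base models, which this sketch omits.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=1.0, epochs=500):
    """Fit p(y=1 | x) = sigmoid(w*(x - mean) + b) by batch gradient descent
    on log-loss; centering x keeps gradient descent well conditioned."""
    mean = sum(xs) / len(xs)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * (x - mean) + b) - y  # d(log-loss)/d(logit)
            grad_w += err * (x - mean)
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mean


def predict_proba(model, x):
    w, b, mean = model
    return sigmoid(w * (x - mean) + b)


def predict_class(model, x, threshold=0.5):
    """Dichotomous output: 1 (LNM) if the probability reaches the threshold."""
    return int(predict_proba(model, x) >= threshold)


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Bagging: fit one base model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logistic([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_proba(models, x):
    """Combine the ensemble by averaging the members' probabilities."""
    return sum(predict_proba(m, x) for m in models) / len(models)


# Hypothetical toy data: tumor size (cm) and LNM status for 10 patients.
sizes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
lnm = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

lr_model = fit_logistic(sizes, lnm)
bag = fit_bagged(sizes, lnm)
print(predict_class(lr_model, 1.0), predict_class(lr_model, 5.5))  # 0 1
print(round(bagged_proba(bag, 1.0), 2), round(bagged_proba(bag, 5.5), 2))
```

Averaging probabilities is one common way to combine bagged models; majority voting on the predicted classes is another.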
XGBoost (XGB) builds regression trees sequentially on the input features: each new tree is fitted to the residuals left by the trees built so far, and the predicted value for a sample is the sum of the outputs of all trees.
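The residual-fitting idea behind XGB can be made concrete with a plain gradient-boosting sketch in Python. This is a simplified stand-in, not XGBoost itself: XGBoost additionally uses second-order gradient information and regularizes tree complexity, and the one-split trees (stumps), the learning rate, the number of trees and the toy data below are hypothetical choices for illustration. The prediction is the base value plus the scaled sum of the tree outputs.

```python
def fit_stump(xs, targets):
    """Regression stump (one split on one feature) minimizing squared error.
    Needs at least two distinct x values."""
    best = None
    for t in sorted(set(xs))[:-1]:  # split rule: x <= t goes left
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def boost(xs, ys, n_trees=30, learning_rate=0.3):
    """Squared-error boosting: each new stump is fitted to the current
    residuals; the prediction is the base value plus the shrunken sum of
    all stump outputs."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]
model = boost(xs, ys)
base_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [model(x) for x in xs])
print(round(base_mse, 3), round(boosted_mse, 3))
```

For the squared-error loss ½(y − F)², the negative gradient with respect to the current prediction F is exactly the residual y − F, so residual fitting is a special case of gradient boosting; for a binary outcome such as LNM, a log-loss would typically be used and the trees would be fitted to y − p instead.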
GBDT is an ensemble learning method that builds trees sequentially, fitting each new tree to the negative gradient of the loss function with respect to the current ensemble's predictions. Combining many trees in this way produces a strong learner with good generalizability.
Statistical analysis. Statistical analysis was conducted in R, version 3.4.3 (https://cran.r-project.org/bin/windows/base/old/3.4.3/), and machine learning modeling was performed in Python, version 3.6.5 (https
Ethics approval and consent to participate. This was a secondary data analysis study using data from the BioStudies public database.

Results
A total of 1169 patients were enrolled, of whom 61 (5.2%) had lymph node metastases. Age did not differ significantly between the training and test groups in either the lymph node metastasis group or the non-metastasis group (P = 0.281 and P = 0.115, respectively) (see Table 1). Correlation analysis showed that lymph node invasion, depth of tumor invasion and tumor size were positively correlated with LNM (Fig. 1). In addition, feature weights from the GBDT model showed that the number of resected nodes, lymphovascular invasion and tumor size were the three factors with the greatest weight for LNM (see Fig. 2).
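For context when reading the accuracy values reported for the seven models, the overall LNM rate also fixes the accuracy of a trivial classifier that predicts "no LNM" for every patient. The short Python calculation below uses only the cohort counts reported above; the class balance within the separate training and test groups may differ slightly.

```python
n_total = 1169  # patients enrolled
n_lnm = 61      # patients with lymph node metastasis

prevalence = n_lnm / n_total
baseline_accuracy = (n_total - n_lnm) / n_total  # always predict "no LNM"

print(f"LNM prevalence: {prevalence:.3f}")                    # LNM prevalence: 0.052
print(f"All-negative baseline accuracy: {baseline_accuracy:.3f}")  # All-negative baseline accuracy: 0.948
```

An all-negative classifier would therefore reach an accuracy of about 0.948 while having an AUC of 0.5, which is one reason AUC is usually preferred to accuracy when comparing models on an imbalanced outcome.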

Discussion
Current research focuses on minimally invasive surgery that maintains postoperative survival rates. The goal is to minimize surgical injury through safe and effective procedures so that patients enjoy a higher quality of life 15,16 . The reported incidence of lymph node metastasis is 2.2% to 4.2% for intramucosal (T1a) primary gastric adenocarcinoma and 9.4% to 16.1% for early (T1) primary gastric adenocarcinoma 10,12 . In our cohort, 5.2% of patients with poorly differentiated-type intramucosal gastric cancer had lymph node metastases, which is consistent with previous findings. Furthermore, the GBDT algorithm identified the number of resected nodes, lymphovascular invasion and tumor size as the three factors with the greatest weight for lymph node metastasis. Single machine learning algorithms could predict LNM in poorly differentiated-type intramucosal gastric cancer, but fusion algorithms did not improve predictive performance.
Many clinicopathological factors related to LNM in early gastric cancer have been studied 17,18 . A large-sample study in the United States showed that tumor stage, pathological type and tumor size are independent predictors of LNM in early gastric cancer 19 . Chen et al. concluded that tumor diameter ≥ 3 cm, poorly differentiated (undifferentiated) histology, mixed adenocarcinoma or signet ring cell carcinoma, submucosal invasion and vascular invasion are independent risk factors for LNM 20 . Our results corroborate this view.
www.nature.com/scientificreports/
The Japanese Gastric Cancer Association noted that the LNM rate is low for differentiated intramucosal cancers > 2 cm in diameter without ulceration and for differentiated intramucosal cancers ≤ 3 cm in diameter with ulceration, and that these could serve as absolute indications for ESD 21 . Pokala et al. concluded that early intramucosal gastric cancer with a tumor diameter < 4 cm has a low risk of LNM and can be locally resected 22 . This is consistent with the results of our study.
Submucosal cancers have a higher rate of LNM than intramucosal cancers, possibly because the submucosa of the gastric wall is rich in capillaries that are susceptible to cancer cell invasion 23,24 . Studies have shown a high rate of LNM in undifferentiated early gastric cancer 25 . As the tumor grows, invasion deepens and the LNM rate increases. The LNM rate has also been associated with lymphangitic tumor thrombus 26 . Female patients with early gastric cancer are more likely than male patients to develop lymph node metastases, presumably because of endogenous estrogen levels 27 . Another study showed that poor differentiation, submucosal invasion, large tumor size and venous or lymphatic invasion are independent risk factors for LNM 28 . Our findings are consistent with these results.
At present, the main obstacles to applying machine learning in medical practice are the lack of well-defined application scenarios and of relevant clinical data. Many published machine learning studies use only single algorithms. In this study, we also evaluated fusion algorithms (XGB + LR, RF + LR and GBDT + LR). However, in the test set the fusion algorithms performed worse than the single algorithms (AUC 0.619 to 0.678 versus 0.750 to 0.788). This suggests that applying machine learning in clinical practice requires attention to the application scenario and to the collection of relevant data.
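The fusion algorithms are not specified in detail in this section. One common way to combine GBDT with LR, and only an assumption about what "GBDT + LR" means here, is to use the leaf into which each sample falls in every boosted tree as one-hot input features for a logistic regression. The dependency-free Python sketch below illustrates that construction on one hypothetical feature with toy data; the use of one-split trees, the number of trees, the learning rates and the data are illustrative and do not reproduce the study's models.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_stump(xs, targets):
    """Regression stump on one feature: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # x <= t goes to the left leaf
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def fit_gbdt(xs, ys, n_trees=3, learning_rate=0.5):
    """Log-loss gradient boosting: each stump is fitted to y - p, the negative
    gradient of the log-loss with respect to the current logit."""
    logits = [0.0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        neg_grad = [y - sigmoid(f) for y, f in zip(ys, logits)]
        t, lv, rv = fit_stump(xs, neg_grad)
        stumps.append((t, lv, rv))
        logits = [f + learning_rate * (lv if x <= t else rv) for f, x in zip(logits, xs)]
    return stumps


def leaf_features(stumps, x):
    """One-hot leaf encoding: a (left, right) indicator pair for every stump."""
    feats = []
    for t, _, _ in stumps:
        feats.extend([1.0, 0.0] if x <= t else [0.0, 1.0])
    return feats


def fit_lr(rows, ys, lr=0.5, epochs=500):
    """Logistic regression on the leaf features by batch gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for row, y in zip(rows, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - y
            for j, xj in enumerate(row):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fused_proba(stumps, lr_model, x):
    """GBDT + LR: map x to its leaves, then score the leaves with the LR."""
    w, b = lr_model
    feats = leaf_features(stumps, x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + b)


# Hypothetical toy data: one feature (e.g. tumor size, cm) and LNM status.
sizes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
lnm = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

stumps = fit_gbdt(sizes, lnm)
fused_lr = fit_lr([leaf_features(stumps, x) for x in sizes], lnm)
print(round(fused_proba(stumps, fused_lr, 1.0), 2), round(fused_proba(stumps, fused_lr, 5.5), 2))
```

In this construction the LR stage sees only which leaves a patient lands in, so it can re-weight the partitions the trees learned but cannot recover information the trees discarded.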
This study has several limitations. First, only routine hematoxylin and eosin staining was used, so lymph node micrometastases were difficult to diagnose accurately; this matters because lymph node micrometastasis may be a key cause of recurrence after gastric cancer treatment. Second, the study included only data on tumor characteristics and no data on tumor genetics, which may have limited predictive performance. Third, the incidence of lymphatic metastasis may differ across regions, ethnic groups and treatment regimens, and the rate of lymph node metastasis in intramucosal gastric adenocarcinoma was low both in this study and in previous studies. These factors should not affect the machine learning predictions in this study; nevertheless, more multicenter prospective research is needed.

Conclusion
Single machine learning algorithms can predict LNM in poorly differentiated-type intramucosal gastric cancer, but fusion algorithms did not improve predictive performance. These findings may help guide personalized treatment of such patients.