Prediction of lung papillary adenocarcinoma-specific survival using ensemble machine learning models

Accurate prognostic prediction is crucial for treatment decision-making in lung papillary adenocarcinoma (LPADC). The aim of this study was to predict cancer-specific survival in LPADC using ensemble machine learning and classical Cox regression models, and to evaluate these models as a quantitative basis for personalized treatment recommendations. Data of patients diagnosed with LPADC (2004–2018) were extracted from the Surveillance, Epidemiology, and End Results (SEER) database and randomly divided into training and validation sets at a ratio of 7:3. Three ensemble models, namely gradient boosting survival (GBS), random survival forest (RSF), and extra survival trees (EST), and a Cox proportional hazards (CoxPH) regression model were used to construct the prognostic models. Harrell's concordance index (C-index), the integrated Brier score (IBS), and the area under the time-dependent receiver operating characteristic curve (time-dependent AUC) were used to evaluate the performance of the predictive models. A user-friendly web interface was developed to apply the models for survival prediction and treatment recommendations. A total of 3615 patients were randomly divided into the training and validation cohorts (n = 2530 and 1085, respectively). The EST, RSF, GBS, and CoxPH models showed good discriminative ability and calibration in both the training and validation cohorts (mean time-dependent AUC: > 0.84 and > 0.82; C-index: > 0.79 and > 0.77; IBS: < 0.16 and < 0.17, respectively). The RSF and GBS models were more consistent than the CoxPH model in predicting long-term survival. We implemented the developed models as a web application for deployment in clinical practice (accessible at https://shinyshine-820-lpaprediction-model-z3ubbu.streamlit.app/). All four prognostic models showed good discriminative ability and calibration. 
The RSF and GBS models exhibited the highest effectiveness among all models in predicting the long-term cancer-specific survival of patients with LPADC. This approach may facilitate the development of personalized treatment plans and prediction of prognosis for LPADC.

Lung cancer remains the leading cause of cancer-related death worldwide, accounting for approximately 1.8 million deaths [1]. In the United States of America, the 5-year survival rate of patients with lung cancer is approximately 20% [2]. Adenocarcinoma is the major histological subtype of non-small cell lung cancer [3,4]. Recent research advances have refined the classification of primary lung cancer [5]. Based on semi-quantitative assessment, the World Health Organization classified the histomorphologic growth patterns of invasive non-mucinous adenocarcinoma into five subtypes (i.e., lepidic, acinar, papillary, micropapillary, and solid) [6]. Primary lung papillary adenocarcinoma (LPADC) is a rare subtype, accounting for approximately 0.84% of all lung cancer cases [7]. This subtype may originate from glandular follicular cells and often exhibits a prominent inflammatory stromal response [8]. In the early stages of LPADC, patients may not present with clinical symptoms (e.g., cough, phlegm, and fever), and antibiotic treatment for presumed pneumonia is ineffective. Studies have investigated prognostic differences among the subtypes of lung adenocarcinoma, and this evidence highlights the importance of prognostic prediction in lung adenocarcinoma, a subtype of lung cancer with a distinct presentation [9,10].
Due to the rarity of LPADC, most available studies are case reports or small single-center investigations. The 5-year overall survival rate of patients with LPADC is less than 35%, and nomograms built from Cox proportional hazards regression models using tumor characteristics, demographic characteristics, and treatment modalities are the traditional methods used to predict survival in LPADC [11]. Previous studies have also explored the use of machine learning algorithms in the diagnosis and prognosis of small cell lung cancer [12–14]. Of note, Cox models rely on the restrictive assumption of proportional hazards. In addition, when using this approach, it is important to consider whether the assumed functional form of the association between predictors and the hazard is appropriate, and whether nonlinear effects or higher-order interactions of predictors should be included [15,16]. Machine learning offers an alternative to semi-parametric modeling that relaxes assumptions about the data-generating mechanism and can capture complex interactions and nonlinear effects among variables [17].
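For reference, the CoxPH model and its proportional hazards assumption can be written in standard notation (the symbols are generic and are not taken from the original study): $h_0(t)$ is the unspecified baseline hazard, $\mathbf{x}$ the predictor vector, and $\boldsymbol{\beta}$ the regression coefficients.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)
\quad\Longrightarrow\quad
\frac{h(t \mid \mathbf{x}_i)}{h(t \mid \mathbf{x}_j)}
  = \exp\!\left\{\boldsymbol{\beta}^{\top}(\mathbf{x}_i - \mathbf{x}_j)\right\}
```

The hazard ratio between two patients does not depend on $t$ (proportional hazards), and each predictor enters the log hazard linearly unless nonlinear terms or interactions are added explicitly. Tree ensembles can relax the linearity constraint, and forest-based methods such as RSF and EST also do not assume proportional hazards.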
Few studies have used ensemble machine learning algorithms to assess the prognosis of patients with lung adenocarcinoma, and even fewer have used the output of predictive models to aid clinical practice [18]. Therefore, this study used a sample of patients with LPADC from the Surveillance, Epidemiology, and End Results (SEER) database to develop and validate ensemble machine learning models for the prediction of LPADC cancer-specific survival (CSS). The objectives were to support clinical decision-making in LPADC and to develop a web-based calculator for estimating the individual probability of CSS for patients with LPADC. This study was reported in accordance with the TRIPOD checklist [19].

Patient selection.
The patient screening process is shown in Fig. 1. The SEER database is publicly accessible; hence, there was no requirement for additional ethical approval.
Cohort definition and variables. We randomly divided the study sample into training and validation cohorts at a 7:3 ratio. The training cohort was used to construct the models and the validation cohort to verify them. Fourteen variables from the SEER database were included in the models: demographic variables (age at diagnosis, sex, race, and marital status), tumor characteristics (laterality, TNM stage, grade, tumor size, and primary site), and treatment status (chemotherapy, surgery, and radiotherapy). X-tile software (https://medicine.yale.edu/lab/rimm/research/software/) was used to determine the optimal cut-off values for converting age at diagnosis and tumor size into categorical variables, maximizing the difference between the resulting categories [20,21]. Marital status was classified as married or other, and cancer grade as I–II, III–IV, or unknown. Primary sites in the lung were classified as lower, middle, upper, other, and not otherwise specified. Surgery of the primary site was classified into three categories: no surgery, lobectomy, and other surgery. Unordered multicategory variables were dummy-coded using the 'get_dummies' function in the pandas package. TNM stage was manually recoded according to the eighth edition of the TNM staging system. The outcome of interest was CSS, with the event defined as death due to LPADC.
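X-tile is standalone software, and its exact procedure is not reproduced here. As a hedged illustration of the underlying idea, the Python sketch below (the function names are hypothetical) scans candidate thresholds for one continuous variable and keeps the threshold that maximizes the two-sample log-rank chi-square statistic. Unlike the study, which found two cut-offs for tumor size, it searches for a single cut-off only, and it applies no correction for testing many candidate thresholds.

```python
import numpy as np


def logrank_chi2(time, event, group):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    time  : follow-up time for each patient
    event : 1 if the event (e.g., death due to LPADC) was observed, 0 if censored
    group : boolean array, True for the "high" group
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):          # distinct observed event times
        at_risk = time >= t
        n = at_risk.sum()                     # patients at risk in both groups
        n1 = (at_risk & group).sum()          # at risk in the "high" group
        d = (event & (time == t)).sum()       # events at t, both groups
        d1 = (event & (time == t) & group).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected events
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutoff(values, time, event):
    """Return the threshold c (high group: value >= c) with the largest statistic."""
    values = np.asarray(values, dtype=float)
    candidates = np.unique(values)[1:]        # keep both groups non-empty
    stats = [logrank_chi2(time, event, values >= c) for c in candidates]
    best = int(np.argmax(stats))
    return candidates[best], stats[best]


# Toy example (hypothetical data): older patients die earlier.
ages = [60, 65, 70, 75, 80, 85, 90, 95]
months = [10, 10, 10, 10, 1, 1, 1, 1]
died = [1, 1, 1, 1, 1, 1, 1, 1]
cut, stat = best_cutoff(ages, months, died)
print(cut, round(stat, 3))  # 80.0 7.0
```

A data-driven cut-off found this way tends to overstate the separation between groups in the data it was derived from, which is one reason to confirm it in an independent cohort.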

Model development.
Categorical variables were summarized as frequencies and percentages, and differences between groups were compared using the χ² test. Four prognostic models, comprising three ensemble learning models (gradient boosting survival [GBS] analysis, random survival forest [RSF], and extra survival trees [EST]) and a Cox proportional hazards regression (CoxPH) model, were used to model the CSS of patients with LPADC. The time-dependent AUC and Harrell's concordance index (C-index) were used to evaluate the discriminative ability of these models [22], and the integrated Brier score (IBS) was used to evaluate their calibration. Feature importance was computed on the training dataset using the 'PermutationImportance' function and visualized. A web-based calculator for the probability of CSS in patients with LPADC was deployed, presenting the estimated survival curves and 3-, 5-, and 10-year survival rates. All machine learning models, statistical analyses, and visualizations were implemented in Python version 3.9 (Python Software Foundation, Wilmington, DE, USA) using the scikit-survival [23], tableone [24], and eli5 packages.
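The study used the tableone package to summarize the cohorts. As an assumption-labeled alternative, the χ² comparison between the training and validation cohorts can be sketched with pandas and SciPy; the helper name, column names, and toy records below are hypothetical, not SEER data.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def cohort_balance(df, cohort_col, variables):
    """Return the chi-square p-value for each categorical variable across cohorts."""
    p_values = {}
    for var in variables:
        table = pd.crosstab(df[cohort_col], df[var])  # rows: cohorts, columns: categories
        # For 2 x 2 tables SciPy applies Yates' continuity correction by default.
        _, p_value, _, _ = chi2_contingency(table)
        p_values[var] = float(p_value)
    return p_values


# Hypothetical toy data (not SEER records).
toy = pd.DataFrame({
    "cohort": ["train"] * 6 + ["valid"] * 4,
    "grade": ["I-II", "I-II", "III-IV", "Unknown", "I-II", "III-IV",
              "I-II", "III-IV", "Unknown", "I-II"],
    "sex": ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"],
})
print(cohort_balance(toy, "cohort", ["grade", "sex"]))
```

Large p-values are consistent with comparable cohorts; with very small expected counts, as in this toy table, an exact test is usually preferred.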
Ethics statement. The SEER database is freely available to researchers; therefore, ethical review by the authors' institution was not required.

Patient characteristics.
The optimal cut-off values were 79 years for age and 28 and 52 mm for tumor size. Age was divided into two groups (< 79 and ≥ 79 years), and tumor size into four groups (< 28, 28–52, > 52 mm, and unknown). A total of 3,615 patients diagnosed with LPADC (2004–2018) were included in this analysis. After randomization, there were 2,530 and 1,085 patients in the training and validation cohorts, respectively. Overall, 86% of the patients were younger than 80 years, and the sample included slightly more females (51.6%) than males (48.4%). LPADC occurred more often in the right lung (58.6%), and 67% of patients had disease staged below T3 without regional lymph node metastases. Distant metastases were present in 23% of patients, while 60% had low-grade disease and tumors < 28 mm; most tumors were located in the lower or upper lobes (86%). Moreover, 80% and 65% of the patients did not receive radiotherapy and chemotherapy, respectively. Lobectomy was performed in more than half of the patients, other surgical procedures in 18%, and nearly 30% did not undergo surgery. The χ² test showed no significant differences in the distribution of variables between the two randomly generated cohorts, indicating that they were comparable (Table 1).
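A minimal pandas/scikit-learn sketch of the preprocessing, assuming hypothetical column names (`age`, `tumor_size`, `primary_site`) and simulated values rather than SEER data: it applies the cut-offs reported above, dummy-codes the unordered categories with `pd.get_dummies`, and performs a 7:3 random split with scikit-learn's `train_test_split`. The study does not state which splitting function or random seed was used, and placing tumors of exactly 28 or 52 mm in the middle group is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def categorize(df):
    """Convert age and tumor size into the groups reported in the study."""
    out = df.copy()
    out["age_group"] = np.where(out["age"] >= 79, ">=79", "<79")
    size = out["tumor_size"]  # mm; NaN means size not recorded
    out["size_group"] = np.select(
        [size < 28, size <= 52, size > 52],
        ["<28", "28-52", ">52"],
        default="Unknown",  # NaN fails every comparison, so it lands here
    )
    return out.drop(columns=["age", "tumor_size"])


# Simulated cohort with the study's sample size (random values, not SEER data).
rng = np.random.default_rng(0)
n = 3615
raw = pd.DataFrame({
    "age": rng.integers(40, 95, n),
    "tumor_size": rng.uniform(5, 90, n),
    "primary_site": rng.choice(["Upper", "Middle", "Lower", "Other", "NOS"], n),
})
raw.loc[raw.index % 10 == 0, "tumor_size"] = np.nan  # some sizes unknown

# One indicator column per category of each unordered variable.
encoded = pd.get_dummies(categorize(raw), columns=["age_group", "size_group", "primary_site"])

# 7:3 random split; scikit-learn rounds the test size up: ceil(0.3 * 3615) = 1085.
train, valid = train_test_split(encoded, test_size=0.3, random_state=42)
print(len(train), len(valid))  # 2530 1085
```

With 3,615 patients, this rounding rule happens to reproduce the reported cohort sizes of 2,530 and 1,085.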

Model application and performance.
To ensure comparability, we used all the features to construct and validate each model. In the training cohort, the EST model had the largest mean time-dependent AUC (0.935), followed by the RSF (0.886), GBS (0.849), and CoxPH (0.843) models. The time-dependent AUC curves in the training cohort showed that the GBS and CoxPH models progressively lost discriminative ability for long-term survival (Fig. 2A). In the validation cohort, the discriminative ability of the four models was similar, and the EST and RSF models did not maintain the performance observed in the training cohort. The mean time-dependent AUC values were 0.821, 0.825, 0.830, and 0.827 for the EST, RSF, CoxPH, and GBS models, respectively, indicating that the EST model performed worst. Over time, the RSF and GBS models performed more consistently than the other models, while the CoxPH model performed less well for long-term predictions beyond 10 years (Fig. 2B).
The C-index analysis yielded findings similar to those of the time-dependent AUC. In the training cohort, the EST model performed best (C-index: 0.850), followed by the RSF, GBS, and CoxPH models; the IBS showed a similar pattern. In the validation cohort, the CoxPH model had the largest C-index (0.783), followed by the GBS, RSF, and EST models. In the validation cohort, the RSF and GBS models had the lowest IBS (0.16), whereas the EST model had the highest (0.166) (Table 2).
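The study computed the C-index with scikit-survival. To make the metric itself concrete, the plain-Python sketch below (a hypothetical helper, not the library's implementation) counts comparable and concordant patient pairs. Implementations differ in how they treat tied follow-up times; this sketch simply skips such pairs, so results on tied data may differ slightly from library output.

```python
def harrell_c_index(time, event, risk):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i had the event and time[i] < time[j].
    It is concordant when patient i also has the higher predicted risk;
    ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy example: follow-up in months, last patient censored, risks from a hypothetical model.
months = [1, 2, 3, 4]
died = [1, 1, 1, 0]
print(harrell_c_index(months, died, [4, 3, 3, 1]))  # 5.5 of 6 comparable pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.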
Feature importance. The feature importance plot shows the contribution of each feature to the prognostic models. N2 stage, M1 stage, and no surgery occupied the top three positions in the feature importance ranking, consistently across all four models. In the CoxPH model, T4 stage and primary tumor location (lower and upper lobes) were more important than other features. In the machine learning survival models, other important features included chemotherapy, tumor size, unknown grade, and sex (Fig. 3).
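The study used eli5's 'PermutationImportance'. As an illustration of the idea only (this is not the eli5 API), the NumPy sketch below shuffles one feature at a time and records the mean drop in the C-index. The risk function and simulated data are hypothetical stand-ins for a fitted survival model and the SEER features.

```python
import numpy as np


def c_index(time, event, risk):
    """Vectorised Harrell's C-index; pairs with tied times are skipped."""
    time, event, risk = (np.asarray(a) for a in (time, event, risk))
    # comparable[i, j]: patient i had the event before patient j's follow-up ended
    comparable = event.astype(bool)[:, None] & (time[:, None] < time[None, :])
    concordant = (risk[:, None] > risk[None, :]) + 0.5 * (risk[:, None] == risk[None, :])
    return (concordant * comparable).sum() / comparable.sum()


def permutation_importance(predict_risk, X, time, event, n_repeats=10, seed=0):
    """Mean drop in C-index after randomly permuting each column of X."""
    rng = np.random.default_rng(seed)
    baseline = c_index(time, event, predict_risk(X))
    importances = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, k] = rng.permutation(X_perm[:, k])  # break feature k's link to outcome
            drops.append(baseline - c_index(time, event, predict_risk(X_perm)))
        importances[k] = np.mean(drops)
    return importances


def predict_risk(X):
    """Hypothetical fitted risk score that uses only feature 0."""
    return 1.5 * X[:, 0]


# Simulated data: feature 0 drives the hazard, feature 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
time = rng.exponential(scale=np.exp(-X[:, 0]))  # higher X[:, 0] -> shorter survival
event = rng.random(200) < 0.8                   # roughly 20% flagged as censored
print(permutation_importance(predict_risk, X, time, event))
```

Features the model does not use receive an importance of zero, while features the model relies on show a clear drop in the C-index when shuffled.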

Algorithm deployment.
The constructed models for estimating CSS in patients with LPADC were deployed as a web application. Its functionality and the visualization of its output are shown in Fig. 4. The web application, intended primarily for research or informational purposes, is publicly accessible at https://shinyshine-820-lpaprediction-model-z3ubbu.streamlit.app/.
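The calculator reports 3-, 5-, and 10-year survival read off each model's predicted survival curve. The sketch below shows that lookup, assuming the curve is available as step-function knots (follow-up times in months, with the survival probability just after each time); the helper name, times, and probabilities are hypothetical. scikit-survival's `predict_survival_function` returns step functions that can be evaluated directly, so this helper only illustrates the arithmetic.

```python
import numpy as np


def survival_at(times, surv_probs, horizons):
    """Evaluate a right-continuous step survival curve at the requested horizons."""
    times = np.asarray(times, dtype=float)
    surv_probs = np.asarray(surv_probs, dtype=float)
    # Index of the last knot at or before each horizon (-1 if none).
    idx = np.searchsorted(times, horizons, side="right") - 1
    # Before the first knot no drop has occurred yet, so S(t) = 1.
    return np.where(idx >= 0, surv_probs[np.clip(idx, 0, None)], 1.0)


# Hypothetical predicted curve for one patient (times in months).
knots = [12, 30, 48, 90, 130]
surv = [0.95, 0.88, 0.80, 0.70, 0.62]
print(survival_at(knots, surv, [36, 60, 120]))  # 3-, 5-, and 10-year CSS
```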

Discussion
The accurate prediction of survival in patients with LPADC is essential for patient counseling, follow-up, and treatment planning. Previous studies have identified multiple prognostic factors that affect the survival of patients with pulmonary papillary adenocarcinoma, including age, grade, lymph node status, tumor size, distant metastases, and surgical treatment [9,11]. Machine learning is increasingly used to predict the survival of patients with cancer [25–27], with relatively favorable results. Although CoxPH is the classical method for analyzing survival data, it assumes proportional hazards and a log-linear relationship between covariates and the hazard. Owing to continuous advances in recent years, machine learning is now widely applied in medicine [28–30]. In this study, we used ensemble machine learning models to predict CSS in patients with LPADC and obtained satisfactory results. Consistent with the findings reported by You et al., all four models developed in this study identified surgery as an important prognostic factor for patients with lung adenocarcinoma [3]. Similarly, distant metastases had an important impact on the prognosis of patients with LPADC; in line with previous analyses, patients who developed distant metastases had poorer survival than other patients [26,27]. A higher N stage also played a crucial role in the models, indicating poor prognosis [28]. Other characteristics (e.g., tumor size, grade, sex, chemotherapy, and primary site) had varying degrees of importance across models [11,23,27]. These results suggest that the selection of appropriate treatment modalities (e.g., surgery, radiotherapy, and chemotherapy) may be more important for predicting CSS in patients with LPADC than TNM staging alone.
Interestingly, the ensemble models (GBS, EST, and RSF) did not predict CSS in LPADC markedly better than the CoxPH model in the validation cohort. This suggests that machine learning approaches may offer advantages mainly where traditional models are limited. There are several possible explanations for the comparable predictive performance of the ensemble and CoxPH models in this study. Firstly, the number of predictors used to construct the models was relatively small, so the advantages of machine learning in analyzing large samples and multivariate data could not be fully realized. Secondly, the SEER database collects variables selected on the basis of clinical experience, many of which are approximately linearly related to outcomes; such data may be well suited to semi-parametric (CoxPH) models. Even so, the GBS, EST, and RSF models developed in this study matched the predictive performance of the CoxPH model under less restrictive assumptions. The web calculator was built on the training dataset, and the EST model, whose performance dropped markedly from the training to the validation cohort, may be overfitted; hence, we do not recommend it for survival prediction. In this study, the CoxPH model had poorer long-term predictive performance than the ensemble models. Therefore, the RSF model is recommended for predicting LPADC CSS beyond 10 years.
This study had several limitations. Firstly, the SEER database lacks data on established predictors of survival in patients with LPADC (e.g., chemotherapy regimens and biological markers). Secondly, owing to the retrospective design and data processing, patients with missing information were excluded, which may have introduced considerable selection bias. Thirdly, the measurement of prediction model error in this study was incomplete. Finally, the results of this study were not externally validated; although we randomly split the study sample during model development, the generalizability and reliability of the models should be further validated with external datasets. In the future, the prognostic value of this approach could be improved by adding more predictors, performing external validation, and conducting prospective studies.
In conclusion, three ensemble machine learning models and a CoxPH model were developed and evaluated for the prediction of CSS in patients with LPADC. Overall, all four models showed good discriminative ability and calibration; in particular, the RSF and GBS models showed excellent consistency in long-term prediction.

Figure 1. Screening process for the selection of patients. ICD-O-3, International Classification of Diseases for Oncology (Third Edition).

Figure 2. Time-dependent receiver operating characteristic curve for the training (A) and validation (B) cohorts.

Table 1. Clinical, pathological, and treatment characteristics of patients with lung papillary adenocarcinoma (LPADC). NOS, not otherwise specified.

Table 2. Performance of the models. CoxPH, Cox proportional hazards; EST, extra survival trees; GBS, gradient boosting survival; IBS, integrated Brier score; RSF, random survival forest.