DeepBTS: Prediction of Recurrence-free Survival of Non-small Cell Lung Cancer Using a Time-binned Deep Neural Network

Accurate prediction of non-small cell lung cancer (NSCLC) prognosis after surgery remains challenging. The Cox proportional hazard (PH) model is widely used; however, it has some limitations. In this study, we developed novel neural network models called binned time survival analysis (DeepBTS) models using 30 clinico-pathological features of surgically resected NSCLC patients (training cohort, n = 1,022; external validation cohort, n = 298). We employed the root-mean-square error (in the supervised learning model, s-DeepBTS) or negative log-likelihood (in the semi-unsupervised learning model, su-DeepBTS) as the loss function. The su-DeepBTS algorithm achieved better performance (C-index = 0.7306; AUC = 0.7677) than the other models (Cox PH: C-index = 0.7048 and AUC = 0.7390; s-DeepBTS: C-index = 0.7126 and AUC = 0.7420). The top 14 features were selected using the su-DeepBTS model as a selector and could distinguish the low- and high-risk groups in the training cohort (p = 1.86 × 10⁻¹¹) and validation cohort (p = 1.04 × 10⁻¹⁰). When trained with the optimal feature set for each model, the su-DeepBTS model could predict the prognoses of NSCLC better than the traditional model, especially in stage I patients. Follow-up studies using combined radiological, pathological imaging, and genomic data to enhance the performance of our model are ongoing.


Data processing for Cox proportional hazard (PH) model
In a deep learning model, the input dataset involves few assumptions that must be satisfied for effective model training. In the case of a Cox PH model, however, certain mandatory assumptions must be met. The most important assumption is that all individuals share the same baseline hazard function, differing only by an individual-specific scaling factor.
Because this scaling factor does not vary with time, the hazard ratio, i.e., the ratio of the hazard function of one individual to that of another,

$$\frac{h_i(t)}{h_j(t)} = \frac{h_0(t)\,\exp(\beta^\top x_i)}{h_0(t)\,\exp(\beta^\top x_j)} = \exp\!\big(\beta^\top (x_i - x_j)\big),$$

is constant for all $t$, where $h_0(t)$ is the baseline hazard function, $x_i$ is the covariate vector of individual $i$, and $\beta$ is the vector of regression coefficients. The training cohort satisfied the PH assumption, and the final feature sets for training both the deep-learning and Cox PH models contained 28 of the initial 30 features.
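For illustration, a hedged sketch of how the PH assumption can be checked with the lifelines package follows; the file name and the `RFS`/`event` column names are hypothetical, not taken from the original study.

```python
# Sketch: verifying the PH assumption with lifelines (hypothetical names).
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

df = pd.read_csv("training_cohort.csv")  # 28 features + RFS duration + event flag

cph = CoxPHFitter()
cph.fit(df, duration_col="RFS", event_col="event")

# Schoenfeld-residual-based test: features with small p-values violate the
# PH assumption and would be candidates for exclusion.
results = proportional_hazard_test(cph, df, time_transform="rank")
results.print_summary()
```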

Deep learning models based on multi-task learning

Supervised binned-time survival analysis (s-DeepBTS)
To build the s-DeepBTS model, it was necessary to construct the proper output first. For this purpose, the maximum recurrence-free survival (RFS) duration in months, $\max(\mathrm{RFS})$, was obtained and the time axis was divided into $J$ time intervals. Here we set 1 month as the time interval, such that

$$\forall j \in [\![1, J]\!],\quad I_j = [t_{j-1}, t_j),\quad \text{with } t_0 = 0 \text{ and } t_J = \lfloor \max(\mathrm{RFS}) \rfloor + 1. \tag{S4}$$
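A minimal sketch of this monthly binning, assuming RFS durations are stored in months in a NumPy array (the example values and variable names are hypothetical):

```python
# Sketch of the monthly time bins of Eq. (S4).
import numpy as np

rfs_months = np.array([3.2, 15.0, 47.6, 8.1])   # hypothetical RFS durations (months)
t_J = int(np.floor(rfs_months.max())) + 1        # t_J = floor(max(RFS)) + 1
edges = np.arange(0, t_J + 1)                    # t_0 = 0, 1, ..., t_J (1-month grid)
# k-th bin of each patient: RFS in I_k = [t_{k-1}, t_k); np.digitize is 1-based here
k = np.digitize(rfs_months, edges)
print(k)  # e.g., the patient with RFS = 3.2 falls into I_4 = [3, 4)
```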
The response variable $y_j$ of each time interval $I_j$, which refers to the survival probability at that time point, was set differently according to the follow-up status. For the patients exhibiting recurrence, the survival probability was 1 until the $k$-th bin satisfying $\mathrm{RFS} \in I_k$ and 0 after the $k$-th bin. For the censored patients, who did not show recurrence within the observation period or who were lost to follow-up, the Kaplan–Meier survival probability calculated for each time interval using all of the samples in our dataset was applied to $y_j$ from the $k$-th bin satisfying $\mathrm{RFS} \in I_k$, and $y_j$ was set to 1 until the $k$-th bin.
In summary, $y_j = 1$ until the patient exhibited recurrence or was lost to follow-up. After that, $y_j$ was simply set to 0 for relapsed patients. In the case of a censored patient, $y_j$ was calculated using the Kaplan–Meier estimator:

$$y_j = \prod_{i \le j} \left(1 - \frac{d_i}{n_i}\right),$$

where $n_i$ is the total number of samples alive without recurrence at the beginning of $I_i$ and $d_i$ is the number of recurrences observed in $I_i$. After obtaining these outputs for every subject, we built a multi-task regression model with a single-layer perceptron. To calculate the loss in each interval, we built a custom loss function that computed the root-mean of the summed squared errors between the true and predicted $y_j$ in each time interval. We used the RMSprop optimizer and a sigmoid activation function, as defined in Keras.
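The exact layer sizes and training hyperparameters are not specified above; the following is a minimal Keras sketch of the described setup (single-layer perceptron, sigmoid activation, RMSprop optimizer, custom root-mean squared-error loss over the bins), with the number of bins, epochs, and batch size as placeholders.

```python
# Minimal sketch of the s-DeepBTS setup; J and the fit arguments are placeholders.
import tensorflow as tf
from tensorflow import keras

J = 120          # hypothetical number of 1-month bins (t_J)
n_features = 28  # final feature set after preprocessing

def binned_rmse(y_true, y_pred):
    # Root-mean of the squared errors between true and predicted y_j
    # across the J time bins of each sample.
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred), axis=-1))

# Single-layer perceptron: one dense layer maps the features directly to the
# J per-bin survival probabilities, squashed into (0, 1) by a sigmoid.
model = keras.Sequential([
    keras.layers.Dense(J, activation="sigmoid", input_shape=(n_features,)),
])
model.compile(optimizer=keras.optimizers.RMSprop(), loss=binned_rmse)
# model.fit(X_train, Y_train, epochs=..., batch_size=...)  # Y_train: (n, J) targets
```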

Semi-unsupervised binned-time survival analysis (su-DeepBTS)
The supervised model is superior to the Cox PH model in that the former does not have to satisfy the PH assumption. To construct the loss function of su-DeepBTS, the likelihood function of the Cox PH model was used, and the formulae are:

Partial likelihood function:
$$L(\beta) = \prod_{i:\,\delta_i = 1} \frac{\exp(\beta^\top x_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}$$

Negative log-likelihood function:
$$-\log L(\beta) = -\sum_{i:\,\delta_i = 1} \left[\beta^\top x_i - \log \sum_{j \in R(t_i)} \exp(\beta^\top x_j)\right]$$

In these formulae, $\beta$ is the vector of regression coefficients, $R(t_i)$ is the set of at-risk samples for which an event may occur at time $t_i$, $x_i$ is the value of the explanatory variable for the individual at which the event occurred at time $t_i$, and $\sum_{j \in R(t_i)} \exp(\beta^\top x_j)$ is the sum of the risks for members of the at-risk set $R(t_i)$. Because each output node of our model refers to the risk probability at that time point, the predicted value at each node corresponds to the $\beta^\top x$ term of the above formulae. Thus, the predicted output values were used as inputs of the custom loss. This model was trained without an exact answer set, but it is not completely unsupervised because a binary event indicator is still required to compute the loss.
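The text does not give the authors' implementation; the following is a common DeepSurv-style sketch of the negative log partial likelihood as a custom Keras loss for a single risk score (in su-DeepBTS, with one output node per time bin, the same term would be applied per node). It assumes each batch is sorted by RFS in descending order, so a cumulative sum realizes the at-risk sum.

```python
# Sketch: negative log partial likelihood as a custom Keras loss (DeepSurv style).
# Assumes the batch is sorted by RFS in DESCENDING order, so the cumulative
# sum over earlier rows equals the sum over the at-risk set R(t_i).
import tensorflow as tf

def neg_log_partial_likelihood(y_true, y_pred):
    event = tf.squeeze(y_true)   # binary indicator: 1 = recurrence, 0 = censored
    risk = tf.squeeze(y_pred)    # network output, playing the role of beta^T x
    # log of the at-risk sum: log sum_{j in R(t_i)} exp(risk_j)
    log_risk_set = tf.math.log(tf.cumsum(tf.exp(risk)))
    # Only uncensored samples contribute terms to the partial likelihood.
    observed = (risk - log_risk_set) * event
    return -tf.reduce_sum(observed) / (tf.reduce_sum(event) + 1e-8)
```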

Traditional model based on proportional hazard assumption: Cox PH model
Two novel models were compared to a statistical model, the Cox PH model, using the Python package "lifelines." The dataset was fitted to the Cox PH model using the package class "CoxPHFitter." Then, using the fitted model, the RFS duration was predicted by employing the package function "predict_expectation."
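A minimal usage sketch of the lifelines API named above; `df` and the `RFS`/`event` column names are hypothetical.

```python
# Usage sketch of the lifelines classes/functions cited in the text.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("training_cohort.csv")  # hypothetical prepared dataset

cph = CoxPHFitter()
cph.fit(df, duration_col="RFS", event_col="event")  # fit the Cox PH model

# Expected RFS duration for each patient from the fitted model.
expected_rfs = cph.predict_expectation(df)
```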

External validation with the public survival related datasets
To verify the scalability of su-DeepBTS, which demonstrated the best performance on the training/external validation cohorts, additional experiments were conducted with public survival-related datasets. Among the three datasets, the German Breast Cancer Study Group¹ and Molecular Taxonomy of Breast Cancer International Consortium² datasets were obtained from the DeepSurv GitHub repository³, and the NCCTG Lung Cancer Data⁵ were available in the Rdatasets GitHub repository⁶. The su-DeepBTS and Cox PH models were compared using the C-index averaged over five iterations of five-fold cross-validation. From these results, it can be inferred that su-DeepBTS can be applied to various datasets in real-world scenarios, thereby producing valuable results.
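As an illustrative sketch of this evaluation protocol for the Cox PH side (the matching su-DeepBTS training call is omitted; the file name and `RFS`/`event` columns are hypothetical), using scikit-learn splits and the lifelines C-index:

```python
# Sketch: five iterations of five-fold cross-validation scored by the C-index.
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

df = pd.read_csv("dataset.csv")  # hypothetical public survival dataset
scores = []
for seed in range(5):  # five iterations
    kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(df, df["event"]):
        train, test = df.iloc[train_idx], df.iloc[test_idx]
        cph = CoxPHFitter().fit(train, duration_col="RFS", event_col="event")
        # Higher hazard implies shorter RFS, so negate the score for the C-index.
        pred = -cph.predict_partial_hazard(test)
        scores.append(concordance_index(test["RFS"], pred, test["event"]))
print(f"mean C-index over 25 folds: {np.mean(scores):.4f}")
```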

Supplementary Tables
Supplementary Table S1. Detailed results of 10 iterations of five-fold cross-validation with the training cohort divided into five sets while maintaining the proportion of patients with events, using all 28 features.

Supplementary Figures
Supplementary Figure 1. The erasing feature-selection method was developed to provide insight into feature importance in deep-learning models. In short, this method measures the importance of each feature based on the prediction performance of the model trained without that feature. As an example, assume that there are three features in the input data, so two cycles are required to complete the feature-importance ranking. In the first cycle, three different experiments are conducted by erasing the three features one by one. The feature whose removal maximizes the test score, in this case F3, is selected as the insignificant feature and ranked as the least important. The second cycle is performed with the feature set excluding F3. In this cycle, the average score of the current and previous cycles is used as the criterion of feature importance, to reduce the error arising from the different feature sets used in each cycle. As shown, F1 was ranked the most important feature because model performance was worst when F1 was excluded. By repeating this process, the unimportant features are eliminated one by one until only the most important feature remains, and the feature-importance ranks are determined in the reverse order of deletion. A minimal sketch of this procedure is given below.
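The sketch below omits the averaging of current- and previous-cycle scores for brevity. The helper `train_and_score(feature_subset)` is hypothetical: it should train the model on the given feature subset and return its test score (e.g., C-index).

```python
# Sketch of the erasing feature-selection procedure described in Figure 1.
def erasing_feature_selection(features, train_and_score):
    eliminated = []                      # least important first
    remaining = list(features)
    while len(remaining) > 1:
        # Erase each remaining feature in turn; the feature whose removal
        # yields the HIGHEST score is the least important in this cycle.
        scores = {f: train_and_score([g for g in remaining if g != f])
                  for f in remaining}
        least_important = max(scores, key=scores.get)
        eliminated.append(least_important)
        remaining.remove(least_important)
    eliminated.extend(remaining)         # the last survivor is the most important
    return eliminated[::-1]              # importance ranks, from most to least

# Example with the three features from the figure:
# ranking = erasing_feature_selection(["F1", "F2", "F3"], train_and_score)
```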