Iterated cross validation method for prediction of survival in diffuse large B-cell lymphoma for small size dataset

Efforts have been made to improve the risk stratification model for patients with diffuse large B-cell lymphoma (DLBCL). This study aimed to evaluate disease prognosis using machine learning models with an iterated cross validation (CV) method. A total of 122 patients with pathologically confirmed DLBCL who received rituximab-containing chemotherapy were enrolled. Contributions of clinical, laboratory, and metabolic imaging parameters from fluorine-18 fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) scans to the prognosis were evaluated using five machine learning models, namely logistic regression, random forest, support vector classifier (SVC), deep neural network (DNN), and fuzzy neural network models. Binary classification predictions for 3-year progression free survival (PFS) and 3-year overall survival (OS) were conducted. A 10-iterated fivefold CV with shuffling was conducted to estimate the predictive capability of the learning machines. The median PFS and OS were 41.0 and 43.6 months, respectively. Two indicators were found to be independent predictors of prognosis: the international prognostic index and the total metabolic tumor volume (MTVsum) from FDG PET/CT. For PFS, SVC and DNN (both with an accuracy of 71%) had the best predictive results and outperformed the other algorithms. For OS, the DNN had the best predictive result (accuracy 76%). Using clinical and metabolic parameters as input variables, machine learning methods with the iterated CV method add predictive value for PFS and OS evaluation in DLBCL patients.

www.nature.com/scientificreports/
Since the addition of rituximab (an immunoglobulin G1 monoclonal antibody against B-lymphocyte antigen CD20) to first-line chemotherapy with cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP), the 5-year survival rate of DLBCL has improved 4 . However, patients with the same international prognostic index (IPI) score may still have different outcomes because of either early relapse or refractory disease. Great efforts have been made to improve the evaluation models for risk stratification [5][6][7][8] , and more reliable prognostic predictors are urgently needed to identify patients who are more likely to have a poorer outcome 9 . Further information gained from different imaging modalities would be helpful and valuable in building a more reliable prediction model for clinical practice.
As stated above, pretreatment staging is crucial. Fluorine-18 fluorodeoxyglucose (FDG), a glucose analog, can be used to measure the degree of glucose utilization. The degree of FDG uptake detected by positron emission tomography/computed tomography (PET/CT) reflects tissue metabolism on whole-body functional images. FDG PET/CT has been widely used for pretreatment staging and assessment of treatment response in patients diagnosed with DLBCL [10][11][12] . A variety of quantitative parameters derived from the images have been reported to have potential utility in predicting prognosis or treatment outcome. The standardized uptake value (SUV) is the most commonly used parameter in FDG PET/CT and has been shown to be a significant prognostic predictor in DLBCL 13,14 . The percentage change in maximal SUV (SUVmax) between initial and delayed phase images is defined as the retention index (RI), which has been found to have significant prognostic potential for predicting overall survival (OS) in patients with DLBCL 15 . Beyond SUV and RI, the total metabolic tumor volume (MTVsum) has also been shown to be a predictor of survival outcome in many previous studies [16][17][18][19] . It was also reported that an elevated MTVsum, independent of IPI, is a predictor of shorter progression-free survival (PFS) and OS in patients with DLBCL 20 .
Learning is the process of improving behavior over time by discovering new information. When this process is carried out by a machine rather than the human brain, it is called machine learning. In machine learning, the experience acquired from existing examples helps to find the optimal solution for new problems. Owing to the rapid accumulation of ever larger volumes of raw data that traditional methods cannot handle well, the concept of big data has emerged alongside the development of information technologies. Machine learning, in which computers apply algorithms in a defined order when performing operations, is a subset of artificial intelligence. Based on the training data, machine learning algorithms can search for the optimal connection weights of a prespecified neural network model in order to make decisions and predictions. Some machine learning models, which capture sophisticated mechanisms such as high-order and non-linear interactions between predictors and responses, have shown the ability to improve overall clinical prediction in various conditions [21][22][23] .
Nonetheless, because DLBCL is a heterogeneous entity, its prognostic parameters, the total effect of these parameters, and the individual weight of each parameter remain issues deserving further research. Previous studies have reported that machine learning algorithms, using either molecular profiling data or combined clinical and genetic data, helped with disease classification, diagnosis, and prognosis prediction [24][25][26][27][28][29] . However, few reports have used modern machine learning models incorporating FDG PET/CT metabolic imaging parameters to perform outcome prediction in DLBCL.
Therefore, this study aims to develop the logistic and neural network models to predict DLBCL clinical outcome based on PFS and OS. Both clinical and metabolic parameters from FDG PET/CT scans were used as predictors. The performance of the models is also compared.

Materials and methods
Patient population. This is a retrospective study in which the medical records of malignant lymphoma patients diagnosed at the Kaohsiung Medical University Hospital were reviewed. The Institutional Review Board of Kaohsiung Medical University Hospital approved the review of the clinical data (KMUHIRB-E(I)-20180275). Patient consent was waived because all the clinical data were retrospectively collected via medical chart review, and the waiver of informed consent was also approved by the Institutional Review Board of Kaohsiung Medical University Hospital. The inclusion criteria were a recent diagnosis of histologically proven DLBCL. Patients had to be 18 years old or older without known previous or concurrent malignant disease. All patients underwent a complete pre-treatment work-up, including whole-body FDG PET/CT scan, bone marrow biopsy, clinical history, physical examination, and standard laboratory tests. All methods were carried out in accordance with relevant guidelines and regulations. The definite diagnosis was made by experienced pathologists based on the 2016 World Health Organization classification of lymphoid neoplasms. Only patients with the "diffuse large B-cell lymphoma, NOS" type were enrolled. The clinical stage was determined with the Ann Arbor staging system by multidisciplinary consensus among experts in radiology, nuclear medicine, pathology, and hematology.
Treatment and clinical course. All patients received 4-6 cycles of rituximab-containing chemotherapy, that is, rituximab combined with cyclophosphamide, doxorubicin, vincristine, and prednisolone (R-CHOP), as initial therapy. Involved field radiation therapy was administered to patients with initial clinical bulky disease or residual tumor after completion of the chemotherapy.
Patients experiencing refractory or relapsed disease were treated with salvage chemotherapy or received autologous stem cell transplantation (ASCT) with high-dose chemotherapy if clinically indicated. The therapeutic regimens and plans followed the National Comprehensive Cancer Network (NCCN) guidelines for B-cell lymphoma in effect in the year the patient was diagnosed, and the details were decided by consensus at the multidisciplinary lymphoma conference in accordance with the patient's clinical condition. The PFS was defined as the time from diagnosis to disease relapse, progression, or death related to lymphoma. The OS was defined as the time from diagnosis to death from any cause.
[...] parameters from FDG PET/CT scans) during the pre-treatment workup as the input variables for the machine learning models. The output is a binary label regarding patient clinical outcome at the 3-year timepoint. That is, the output for PFS indicates whether the patient relapsed or progressed within 3 years from the date of diagnosis, and the output for OS indicates whether the patient remained alive 3 years after the date of diagnosis. This is a binary classification problem.
First, we fitted a logistic regression (LR) model using the scikit-learn package (version 0.23.2) for Python (3.7.9). After processing, the prediction outcomes as well as the confusion matrix were computed. Second, the random forest (RF) model, which is an ensemble of decision trees with bootstrapped training samples for tree induction, was adopted. In this study, the number of trees was set to 100. Third, the SVC is the classification form of the support vector machine (SVM), which is one of the most robust prediction models. The SVC algorithm constructs a hyperplane that separates the data into two classes with the maximal soft margin. The SVC has been used in a broad range of classification and pattern recognition problems, including speech and text recognition, protein function prediction, and handwriting analysis 30 . So far, however, it has rarely been used in the field of cancer prediction and prognosis. Here, the scikit-learn package (0.23.2) was used to build the random forest model and the SVC, the latter with regularization parameter C = 1.0 and a radial basis function kernel. Fourth, the DNN model is a learning machine composed of multiple processing layers, including intermediate hidden layers. Each hidden unit applies a non-linear activation function to a linear combination of its inputs 31 . In the current study, a four-layer feedforward model was implemented with the Keras package (2.3.1), using the adaptive moment estimation (Adam) optimizer, the rectified linear unit (ReLU) activation function for the hidden layers, and the sigmoid activation function for the output layer. The DNN had a 2-6-4-1 architecture with a total of 51 parameters. Lastly, an FNN is intrinsically a fuzzy system represented as a neural network.
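As a quick consistency check on the reported network size, the following sketch counts the weights and biases of a fully connected feedforward network with layer widths 2-6-4-1. It assumes every layer is dense with one bias per unit; the function name `dense_param_count` is illustrative and does not come from the authors' code.

```python
def dense_param_count(layer_sizes):
    """Count weights plus biases in a fully connected feedforward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # n_in * n_out connection weights plus one bias per output unit
        total += n_in * n_out + n_out
    return total


# Two inputs (IPI, MTVsum), hidden layers of 6 and 4 units, one output unit.
layer_sizes = [2, 6, 4, 1]
n_params = dense_param_count(layer_sizes)
print(n_params)  # 18 + 28 + 5 = 51
```

Under these assumptions the count agrees with the 51 parameters stated for the DNN.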
Suppose the fuzzy rules in the rule base are given in the form shown below, where x is the input, y is the output, and A and B are fuzzy sets: If $x_1$ is $A_{1j}$ and $x_2$ is $A_{2j}$ and … and $x_n$ is $A_{nj}$, then $y_1$ is $B_{j1}$ and $y_2$ is $B_{j2}$ and … and $y_p$ is $B_{jp}$. In this fuzzy system, we specifically use the singleton fuzzifier, product inference engine, center-average defuzzifier, and Gaussian membership functions for the fuzzy sets. The fuzzy system can then be configured as a neural network, called an FNN. We implemented the FNN using TensorFlow (1.15.4) as a neural network layer with a total of 64 parameters. The source code for defining the FNN layer is provided in the supplementary file.
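To make the combination of singleton fuzzifier, product inference engine, center-average defuzzifier, and Gaussian membership functions concrete, the sketch below evaluates such a fuzzy system for a single output. It is an illustration under stated assumptions, not the authors' TensorFlow layer: the Gaussian form exp(-((x - c) / s)^2), the number of rules, and all parameter values are hypothetical, and the function names are invented for this example.

```python
import math


def gaussian_membership(x, center, width):
    """Degree to which x belongs to a Gaussian fuzzy set (assumed form)."""
    return math.exp(-((x - center) / width) ** 2)


def fuzzy_system_output(x, centers, widths, rule_outputs):
    """Product inference followed by center-average defuzzification.

    centers[j][i] and widths[j][i] describe the fuzzy set A_ij of rule j
    for input i; rule_outputs[j] is the center of the output set B_j.
    """
    firing = []
    for c_j, s_j in zip(centers, widths):
        strength = 1.0
        for x_i, c_ij, s_ij in zip(x, c_j, s_j):
            strength *= gaussian_membership(x_i, c_ij, s_ij)  # product inference
        firing.append(strength)
    numerator = sum(f * b for f, b in zip(firing, rule_outputs))
    denominator = sum(firing)  # can underflow to 0 for inputs far from all rules
    return numerator / denominator


# Hypothetical two-rule system on two inputs.
centers = [[0.0, 0.0], [2.0, 2.0]]
widths = [[1.0, 1.0], [1.0, 1.0]]
rule_outputs = [0.0, 1.0]

midpoint = fuzzy_system_output([1.0, 1.0], centers, widths, rule_outputs)
near_first_rule = fuzzy_system_output([0.0, 0.0], centers, widths, rule_outputs)
print(midpoint, near_first_rule)
```

In a trainable FNN the centers, widths, and rule outputs would be the learned parameters; an input equidistant from both rules yields the average of the two rule outputs, while an input at the first rule's center yields a value close to that rule's output.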
For a better and more reliable estimation of the predictive capability of the various learning machines, we used 10-iterated fivefold cross validation (CV) with shuffling. Fivefold CV was conducted and the whole process was then repeated 10 times, resulting in 50 testing results. The mean and standard deviation of the 50 testing results were then calculated to evaluate the predictive power of each model. From the 50 testing results, the confusion matrices were subsequently computed. The F1 values were also calculated to compare prediction ability.
MedCalc software (https://www.medcalc.org; 2021) was used to perform Kaplan-Meier survival analysis, Cox proportional hazards modeling, and receiver operating characteristic (ROC) analysis with the area under the curve (AUC) index. A p value < 0.05 was considered statistically significant.
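The 10-iterated fivefold CV with shuffling described above can be sketched with the standard library alone. The generator below is a minimal illustration, not the authors' implementation: the function name, the seed, and the way folds are cut from the shuffled order are assumptions, and no stratification by outcome is applied because the text does not mention it. In scikit-learn, `RepeatedKFold(n_splits=5, n_repeats=10)` provides comparable behavior.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated, shuffled K-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repetition
        folds = [order[k::n_splits] for k in range(n_splits)]  # disjoint folds
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, test_idx


# 122 patients, 5 folds, 10 repetitions.
splits = list(repeated_kfold_indices(n_samples=122))
print(len(splits))  # 50 train/test splits
```

Each of the 50 splits would be used to train a model on `train_idx` and score it on `test_idx`, giving the 50 testing results that are then summarized by their mean and standard deviation.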

Results
A total of 122 patients, 57 (46.7%) females and 65 (53.3%) males, were included in the study, with a mean age of 61.3 ± 17.0 years (… Fig. 2A). For subsequent machine learning applications, we used IPI and MTVsum as the input predictors. The IPI is the international prognostic index used clinically for lymphoma, which is an integer from 0 to 5. The MTVsum is the total metabolic tumor volume of the whole-body lesions, that is, the sum of the volumes of all tumors showing abnormal FDG uptake. The MTVsum is a positive real number. The output is a binary label regarding patient clinical outcome (progression or not for PFS, alive or not for OS) at the 3-year timepoint.
The dataset is partitioned into a training set and a testing set. We use the training set to train the model and then use the testing set to evaluate its performance. The evaluation is based on commonly used statistical indicators computed from the confusion matrix, such as accuracy and the F1 value.
For fivefold CV, the whole dataset is randomly divided into 5 disjoint sets. In each iteration of the fivefold CV, one set is specified as the testing data and the remaining sets form the training data. After training on the training data, the pre-specified metric or performance index (e.g., accuracy) is computed on the testing data. This training/testing process is performed 5 times, producing 5 testing results. The whole process is repeated 10 times, constituting the so-called 10-iterated fivefold CV, and 50 testing results of the statistical indicators are obtained. Finally, we calculate the means and standard deviations of the 50 testing results, which serve as the performance of the model.
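The evaluation step can be illustrated as follows: accuracy and the F1 value are computed from the confusion matrix of one test fold, and per-fold scores are then summarized by their mean and standard deviation. This is a sketch under assumptions. The label vectors and the two example scores are made up, the function names are illustrative, and the sample standard deviation is used because the text does not say whether the sample or population form was applied.

```python
import statistics


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 value derived from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 set to 0 when no positives occur
    return accuracy, f1


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Made-up labels for a single test fold.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1)

# Made-up accuracies from two folds, summarized as mean and SD.
mean_acc, sd_acc = summarize([0.7, 0.8])
print(mean_acc, sd_acc)
```

In the setting described in the text, `summarize` would receive the 50 per-fold values of each indicator.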

Discussion
Clinically, PET/CT using FDG as a functional tracer has been broadly used in the management of oncologic patients for several years. FDG PET/CT has been reported to play clinical roles in diagnosis, disease staging, therapeutic monitoring, and outcome prediction in patients with lymphoma 10,32-34 . Although the maximal SUV of the primary tumor has been widely demonstrated to be of prognostic value 13,14 , maximal SUV merely represents the degree of FDG uptake without conveying volumetric information. Volumetric analysis of MTV, which provides clinicians with more information than maximal SUV, has accumulated increasing evidence of clinical value, especially in predicting patient survival in several types of lymphoma [35][36][37][38] .
In DLBCL, several reports have also described the prognostic value of MTV. In stage II-III DLBCL patients 39 and in patients with bone marrow involvement 40 , Song et al. 41 concluded that MTV was superior to Ann Arbor stage in predicting patient survival. Similarly, Sasanelli et al. found that pre-treatment MTVsum is an independent predictor of clinical outcome in patients at all stages. Song et al. 42 also reported that the MTV level is an independent predictor of survival in primary gastrointestinal tract DLBCL. The MTV also helped to select patients with increased therapy response 43 , to define a poor prognosis group, and to improve predictive ability 18 when combined with molecular characteristics or early PET/CT response.
The IPI has been used to predict prognosis in DLBCL treated with doxorubicin-containing regimens over the last 2-3 decades. This score has been validated in the rituximab era as the revised IPI (R-IPI) 44 . In the current study, we also collected the R-IPI as a clinical predictive parameter. In the ROC analysis, however, the R-IPI had a lower AUC value than the IPI for both PFS and OS. Therefore, the IPI rather than the R-IPI was used for further prediction in the machine learning models. More recently, classifications based on cell of origin and molecular characteristics have allowed the identification of patient subtypes with poor prognosis.
As a subfield of artificial intelligence, machine learning denotes the development of algorithms that obtain the parameters and models optimally representing the available data. The learning process has two parts: (1) estimating the unknown parameters of a system from a known data set, and (2) predicting new outputs of the system using the estimated parameters. The amount of information to be interpreted and examined when predicting the prognosis of malignant disease has been increasing rapidly. Evidence-based medicine is based on randomized controlled trials in which large patient populations and large amounts of clinical data are handled. In the future, clinical trials may be better conducted with machine learning approaches, so that new findings can be obtained more easily from the available data. In recent years, machine learning-based clinical decision support systems have been emerging to deal with certain clinical situations 21,22,45,46 . Machine learning methods are believed to be becoming an alternative means of handling complex and large data sets and of model generation 47 .
The literature review identified several studies evaluating patient survival in DLBCL using machine learning models. Shipp et al. reported the successful prediction of patient OS via supervised machine learning algorithms applied to oligonucleotide microarray gene-expression data 29 . Another paper used a hybrid machine learning approach, combining clinical and genomic data to create a single classifier to predict patient outcomes in DLBCL 27 . Ando et al. 25 used an FNN model with transcriptional profiling data of 4 genes to predict lymphoma survival, yielding an accuracy of 73.4% compared with the 68.5% accuracy of the Cox model with 17 genes. Another article by Ando et al. 26 reported the FNN model as a powerful tool for extracting significant biological markers affecting prognosis in DLBCL. Biccler et al. 48 collected patients from Swedish and Danish cohorts and developed a new prognostic model based on machine learning approaches, which outperformed known prognostic predictors for patients with DLBCL.
In a similar study published in 2021, Pan et al. 49 established a tumor microenvironment (TME)-related prognostic signature for DLBCL patients. When combined with IPI components, it is a promising prognostic model that may not only help clarify immune responses in the DLBCL microenvironment but also indicate new clinical applications for immune therapy and individualized therapy in patients with DLBCL. In the current study, we incorporated both a clinical parameter (i.e., IPI) and a metabolic imaging parameter from FDG PET/CT scans (i.e., MTVsum) as the prognostic predictors. Additionally, both 3-year PFS and OS evaluations for 122 patients with DLBCL were performed using 5 different learning machines. For PFS, SVC and DNN had the highest prediction accuracy (71%). For OS, the DNN model had the highest prediction accuracy (76%), and the FNN and SVC models had an accuracy of 75%. These results are similar to those reported by Ando et al. 25 .
Although the current study dealt with a relatively small patient population, we developed 5 machine learning models with the iterated CV method to predict clinical outcomes in patients with DLBCL. The machine learning methods may add value to survival prediction. They may allow clinicians to pay more attention to follow-up and therapeutic strategies if high-risk patients are identified early. In the future, we hope that modern predictive modeling approaches can be applied rather than relying solely on clinically available dichotomized variables or risk scores to predict patient survival. Further studies with a prospective design, a larger patient population, and more specific histological subtypes based on different molecular or genetic presentations may be conducted. Moreover, prognostic modeling can be grounded on a combination of clinical, pathological, molecular, and metabolic imaging information.

Conclusion
In the current study using machine learning algorithms with the iterated CV method in patients with DLBCL, the best results for PFS were obtained with the SVC and DNN techniques, which outperformed the LR, RF, and FNN methods. For OS evaluation, the best results were obtained with the DNN technique, which outperformed the FNN, SVC, LR, and RF methods.

Data availability
The datasets generated and/or analyzed during the current study are not publicly available because further studies are ongoing, but are available from the corresponding author on reasonable request.