Highly Sensitive Marker Panel for Guidance in Lung Cancer Rapid Diagnostic Units

While evidence for the implementation of lung cancer screening in Europe is awaited, Rapid Diagnostic Units have been established in many hospitals to accelerate the early diagnosis of lung cancer. We sought to develop an algorithm, based on a sensitive serum marker panel, to detect lung cancer in a symptomatic population attending such a unit. Serum concentrations of Epidermal Growth Factor, sCD26, Calprotectin, Matrix Metalloproteinases −1, −7 and −9, CEA and CYFRA 21.1 were determined in 140 patients with respiratory symptoms (lung cancer patients and controls with or without benign pathology). Logistic Lasso regression was performed to derive a lung cancer prediction model, and the resulting algorithm was tested in a validation set. A classification rule based on EGF, sCD26, Calprotectin and CEA was established that reasonably discriminated lung cancer, with 97% sensitivity and 43% specificity in the training set and 91.7% sensitivity and 45.4% specificity in the validation set. Overall, the panel identified stage I non-small cell lung cancer with high sensitivity (94.7%) and detected 100% of small cell lung cancers. Our study provides a sensitive 4-marker classification algorithm for lung cancer detection to aid in the management of patients with suspected lung cancer in Rapid Diagnostic Units.

Scientific Reports | 7:41151 | DOI: 10.1038/srep41151
We previously described a three-marker panel comprising EGF (Epidermal Growth Factor), soluble CD26 (sCD26) and Calprotectin (CAL) that showed considerable capacity to identify patients at high risk for LC (83% sensitivity and 87% specificity) 19 . In this new study, five further serum markers were evaluated together with EGF, sCD26 and CAL: MMP-1, −7 and −9 (Matrix Metalloproteinases −1, −7 and −9), CEA (Carcinoembryonic Antigen) and CYFRA 21.1, with the aim of improving our previous diagnostic algorithm. These molecules cover a spectrum of biological functions implicated in cancer development and progression, as summarized in Table 1. Since this new panel is intended for use in an LC-RDU managed by consultants who receive referrals from primary care physicians, high sensitivity for detecting LC among symptomatic patients is imperative.

Results
Marker Levels in Lung Cancer and Controls. Serum levels of the 8 markers analysed are shown in Table 2, including the median and range for the control group (healthy and benign subjects) and for LC patients. After correction for multiple testing, serum concentrations of EGF, CAL, MMP-1, MMP-7, MMP-9, CEA and CYFRA 21.1 were significantly higher in LC than in controls (Mann-Whitney U test, P = 0.001 for EGF, CAL, MMP-9, CEA and CYFRA 21.1; P = 0.047 for MMP-1 and P = 0.013 for MMP-7), whereas sCD26 levels were significantly lower in LC than in controls (Mann-Whitney U test, P = 0.001).
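This section does not name the multiple-testing correction that was applied. Purely as an illustration, and assuming Holm's step-down procedure (one common familywise-error correction; the authors' actual method may differ), adjusted P-values for a family of marker comparisons could be computed as in the sketch below. The function name `holm_adjust` and the input P-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted P-values, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest P upwards
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Scale the (rank+1)-th smallest P by (m - rank) and keep the sequence monotone
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)  # adjusted P-values are capped at 1
    return adjusted


# Hypothetical raw P-values for three marker comparisons (not study data)
print(holm_adjust([0.01, 0.04, 0.03]))
```

Holm's procedure controls the familywise error rate without assuming independent tests and never rejects fewer hypotheses than a plain Bonferroni correction.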
All marker levels differed significantly between healthy controls and cancer patients (Mann-Whitney U test, P = 0.002 for EGF, sCD26, CAL, MMP-9, CEA and CYFRA 21.1; P = 0.018 for MMP-1 and MMP-7). However, the differences in MMP-1 and MMP-7 between patients with benign pathologies and cancer patients were not significant (Mann-Whitney U test, P = 0.448 for MMP-1 and P = 0.090 for MMP-7). In multivariate linear regression models adjusted for gender, age and smoking status, all markers except MMP-1 and MMP-7 remained significantly associated with LC when cancer patients were compared with either the healthy group or all controls. However, when the comparison was restricted to benign pathologies, only CAL, MMP-9 and CEA maintained a significant association with LC.
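Assuming each log-transformed marker is the outcome (as the Table 3 footnote describes for the histology and stage models), an adjusted effect of LC status with its 95% confidence interval can be sketched with ordinary least squares. This is a minimal illustration, not the authors' code: the function name, the 0/1 coding of LC status and the use of the normal quantile 1.96 instead of a t quantile are assumptions, and the data are synthetic.

```python
import numpy as np


def adjusted_effect(log_marker, lc, covariates, z=1.96):
    """OLS of a log10 marker on LC status (0/1) plus covariates.

    Returns the adjusted LC coefficient and an approximate 95% CI using the
    normal quantile z (a t quantile with n - p degrees of freedom is more exact)."""
    n = len(log_marker)
    X = np.column_stack([np.ones(n), lc, covariates])
    beta, *_ = np.linalg.lstsq(X, log_marker, rcond=None)
    resid = log_marker - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
    vcov = sigma2 * np.linalg.inv(X.T @ X)  # covariance matrix of the coefficients
    se = np.sqrt(vcov[1, 1])
    return beta[1], (beta[1] - z * se, beta[1] + z * se)


# Synthetic example (NOT study data): true adjusted LC effect of 0.3 on the log10 scale
rng = np.random.default_rng(7)
n = 500
lc = (rng.random(n) < 0.5).astype(float)
covs = rng.normal(size=(n, 3))  # stand-ins for gender, age and smoking
log_marker = 1.0 + 0.3 * lc + 0.1 * covs[:, 0] + rng.normal(scale=0.1, size=n)
est, ci = adjusted_effect(log_marker, lc, covs)
print(est, ci, 10 ** est)  # 10**est: adjusted ratio of geometric mean concentrations
```

Because the outcome is on the log10 scale, `10 ** est` reads as the adjusted ratio of geometric mean marker concentrations between LC patients and controls.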
Correlations between the eight markers were also explored using an annotated heatmap (Supplementary Figure S1). Correlation coefficients ranged from a minimum of 0.038 (EGF and CYFRA 21.1) to a maximum of 0.489 (CAL and MMP-9). Moderate correlations were also observed for several other pairs, for example EGF with CAL and with MMP-9, as well as the negative correlation between sCD26 and CAL.
The performance of the candidate markers was evaluated by means of ROC curves (Table 2). CAL showed the best potential to discriminate LC from controls (AUC 0.759), followed by CEA (AUC 0.744), CYFRA 21.1 (AUC 0.734) and MMP-9 (AUC 0.729). EGF and sCD26 exhibited AUCs of around 0.7, while MMP-1 and MMP-7 showed poor discriminatory capacity (AUC 0.597 and 0.627, respectively). In Table 3, LC cases were evaluated according to histology. After correction for multiple testing, both non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) differed significantly from controls for sCD26, CEA and CYFRA 21.1 (Mann-Whitney U test, NSCLC vs controls: P = 0.002 for sCD26, CEA and CYFRA 21.1; SCLC vs controls: P = 0.040 for sCD26, P = 0.002 for CEA and P = 0.012 for CYFRA 21.1). Levels of EGF, CAL, MMP-7 and MMP-9 differed significantly between NSCLC and controls (Mann-Whitney U test, P = 0.002 for EGF, CAL and MMP-9, P = 0.048 for MMP-7), but not between SCLC and controls (Mann-Whitney U test, P = 0.798 for EGF and MMP-9, P = 0.112 for CAL and P = 0.056 for MMP-7).
The potential of the markers to detect early-stage LC was analysed according to tumour stage (Table 3). EGF, CAL and MMP-9 were the only molecules significantly altered in stage I + II NSCLC after multiple testing correction (Mann-Whitney U test, P = 0.002 for EGF and MMP-9, P = 0.017 for CAL), suggesting their usefulness for diagnosis at the earliest stages. For late-stage NSCLC, all markers except MMP-1 displayed significant differences (Mann-Whitney U test, P = 0.002 for EGF, sCD26, CAL, MMP-9, CEA and CYFRA 21.1, P = 0.023 for MMP-7). In SCLC, markedly lower EGF, CAL and MMP-9 levels in limited-stage disease prevented its distinction from non-cancer patients. After adjustment for the common risk factors gender, age and smoking, the same associations were maintained, except that SCLC was no longer associated with sCD26 and MMP-7 was no longer significantly associated with either LC histology or NSCLC stage.

Table 1
Marker | Function in Cancer | Usefulness in LC diagnosis
Epidermal Growth Factor (EGF) | Binding of EGF to its receptor promotes tumour growth and progression 42 | Suitable discrimination of LC/NSCLC from healthy and benign lung pathologies 19,20
sCD26 | Immune regulation and co-stimulatory activities 43 | …
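The ROC AUC of a single marker equals the Mann-Whitney U statistic divided by the number of case-control pairs, which links the AUCs in Table 2 to the Mann-Whitney comparisons above. A minimal pure-Python sketch (the function name is hypothetical and the values are toy numbers); for a marker that is lower in cancer, such as sCD26, the AUC computed this way falls below 0.5 and is conventionally reported after reversing the direction (1 − AUC).

```python
def auc_mann_whitney(case_values, control_values):
    """ROC AUC as U / (n_cases * n_controls), with U the Mann-Whitney statistic.

    Tied values receive mid-ranks, so each tied case/control pair counts 1/2."""
    pooled = sorted([(v, 1) for v in case_values] + [(v, 0) for v in control_values])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n0 = len(case_values), len(control_values)
    rank_sum_cases = sum(r for r, (_, is_case) in zip(ranks, pooled) if is_case)
    u = rank_sum_cases - n1 * (n1 + 1) / 2  # Mann-Whitney U for the cases
    return u / (n1 * n0)


print(auc_mann_whitney([0.35, 0.8], [0.1, 0.4]))  # 0.75 (toy values, not study data)
```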

Association between Clinical Parameters and Marker Levels.
The association of marker concentrations with the clinical variables gender, age and smoking status is presented in Supplementary Table S1. sCD26 levels were significantly higher in women than in men (Mann-Whitney U test, P = 0.018); no other marker was influenced by gender. Older age was significantly associated with higher levels of MMP-7 and CYFRA 21.1 (Mann-Whitney U test, P < 0.001 and P = 0.004, respectively), whereas sCD26 levels decreased with age. Patients with a smoking history had significantly higher serum concentrations of EGF and MMP-7 than never-smokers (Mann-Whitney U test, P = 0.028 for EGF and P = 0.026 for MMP-7).
Multimarker Panel and Classification Algorithm for Lung Cancer.
Lasso regression was employed to simultaneously derive a multivariate marker panel and an optimal cut-off for LC, with the criterion of maximizing specificity at a predefined sensitivity of 95%. The resulting classification rule and the optimal Lasso penalization parameter are available in Supplementary Material S1 and Supplementary Figure S2, respectively, and Supplementary Table S2 includes several diagnostic measures for the proposed model. Differences between LC patients and controls in age and in the proportions of men and smokers, together with the influence of these variables on marker levels, motivated their inclusion in the model. Applying the Lasso procedure to the training set yielded a 4-marker panel composed of EGF, sCD26, CAL and CEA. For comparison, a clinical model composed of gender, age and smoking was also fitted by logistic regression.
The performance and ROC curves of this marker panel, the single markers and the clinical model for LC diagnosis are presented in Table 4 and Fig. 1, respectively. The 4-marker panel showed good capacity to discriminate LC patients from controls, with an AUC of 0.873 and 97% sensitivity and 43% specificity for LC detection at a cut-off of 0.266. This combination of markers outperformed the individual markers in terms of specificity. The clinical model showed lower discriminatory ability than the marker panel (AUC = 0.717 (0.637-0.799), DeLong test P-value < 0.0001), and at the desired sensitivity of 95% the decision rule based on it yielded a poor specificity of 26%. Assuming an LC prevalence of 44.4%, corresponding to the RDU of the Pneumology Service of EOXI Vigo, our model reached a high negative predictive value (NPV) of 94.7% and a moderate positive predictive value (PPV) of 57.6%.
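The predictive values follow from Bayes' theorem applied to the reported sensitivity, specificity and assumed prevalence. The short check below uses the rounded training-set figures from the text (97%, 43%, 44.4%) and reproduces the stated PPV and NPV; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV via Bayes' theorem for a given disease prevalence."""
    tp = sensitivity * prevalence  # true positives, as a fraction of all tested subjects
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)  # true negatives
    fn = (1 - sensitivity) * prevalence  # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Training-set figures from the text: Sn 97%, Sp 43%, LC prevalence 44.4%
ppv, npv = predictive_values(0.97, 0.43, 0.444)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 57.6%, NPV = 94.7%
```

Because the rule is tuned for sensitivity, false negatives are rare and the NPV stays high, while the modest specificity caps the PPV.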
To further verify the performance of the 4-marker panel for LC prediction, the classification algorithm developed in the training set was tested in an independent validation set. Descriptive statistics for each marker of the panel are given in Supplementary Table S3 according to histology and stage. In the validation set, using the 0.266 cut-off established in the training set, the marker panel showed an AUC of 0.837, a sensitivity of 91.7% and a slightly higher specificity of 45.4% (Table 4). The clinical model again showed inferior discriminatory capacity, with an AUC of 0.659 (0.488-0.816) (DeLong test P-value = 0.0003).

Classification Accuracy of the 4-Marker Panel for Lung Cancer and Control Subgroups.
To assess the performance of our classification rule in more detail, we examined its ability to correctly classify specific subgroups of LC patients and controls (Table 5). Training and validation populations were combined, and sensitivity for histological subgroups and stages was calculated at the fixed 0.266 cut-off (corresponding to 43% specificity). The rule identified stage I and stage II NSCLC with high sensitivity (94.7% and 100%, respectively), similar to advanced stages III and IV (95.2% and 94.6%, respectively). The most prevalent NSCLC type, adenocarcinoma (ADC), was also detected with high sensitivity (93.2%), as were squamous cell carcinoma (SqCC) and large cell carcinoma (LCC) (100% each). All patients with SCLC were likewise detected (100% sensitivity).
Among non-cancer patients, the panel correctly classified 41 of 94 controls (43.6%), with a specificity of 40.9% for healthy subjects and 46% for benign lung conditions. Table 5 also reports classification accuracy according to the CT findings, specifically when no mass was detected. When, in addition, no nodules were found, the panel correctly classified all LC cases (7/7; 100% sensitivity). Conversely, among controls with nodules, the panel correctly classified 1 of 3 (33.3% specificity).
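Sensitivity per tumour subgroup and specificity per control subgroup are simply the proportion of each subgroup that falls on the correct side of the 0.266 cut-off. A minimal sketch; the record layout and function name are hypothetical, and the toy records only reproduce the 41/94 control count, not real scores.

```python
from collections import defaultdict


def subgroup_accuracy(records, cutoff=0.266):
    """Proportion correctly classified per subgroup.

    records: iterable of (subgroup, is_cancer, score). A subject is called
    'cancer' when score > cutoff. For cancer subgroups the proportion is the
    sensitivity; for control subgroups it is the specificity.
    Returns {subgroup: (correct, total, proportion)}."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [correct, total]
    for group, is_cancer, score in records:
        called_cancer = score > cutoff
        counts[group][0] += int(called_cancer == is_cancer)
        counts[group][1] += 1
    return {g: (c, n, c / n) for g, (c, n) in counts.items()}


# Toy records with hypothetical scores, reproducing only the 41/94 control count
records = [("controls", False, 0.10)] * 41 + [("controls", False, 0.50)] * 53
print(subgroup_accuracy(records))
```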

Discussion
Classification algorithms capable of guiding clinical decision-making are a valuable tool that can help predict LC and complement CT imaging 17,18 . In previous work we described a three-marker panel for high-risk patients comprising EGF, sCD26 and CAL, with gender and age as confounders, and the involvement of these molecules in lung carcinogenesis was described [19][20][21][22][23] . Here we provide an improved classification algorithm that achieves higher sensitivity for LC in the RDU setting. In this refined algorithm, smoking status and the routinely used marker CEA were also incorporated, corroborating the diagnostic capacity of CEA, especially for late-stage tumours 4,24-26 . In brief, our approach measures EGF, sCD26, CAL and CEA to generate, for each individual, a classification score that predicts LC.
As with colorectal and breast cancer, LC could also benefit from screening programs. However, there are currently no LC screening recommendations in Europe, although the European Society of Radiology and the European Respiratory Society recommend screening within a clinical trial or in routine clinical practice at certified medical centres 6 . Instead, the strategy implemented in many European hospitals to achieve early detection is to shorten the time to diagnosis through so-called Rapid Diagnostic Units for LC [10][11][12]15,16 . We therefore set out to design a marker-based classification algorithm for use in these Units, where the priority is to detect all LC cases (high sensitivity) in order to select the patients who should immediately undergo more invasive tests.
Individual analyses highlighted the usefulness of EGF, sCD26, CAL and CEA among the 8 molecules assayed, with AUCs of 0.698-0.759 in the training set and 0.716-0.871 in the validation set, CAL performing best in both cohorts. Among the four markers, differences were more frequent between NSCLC and controls, even at early stages, as in the case of EGF and CAL. For SCLC, sCD26 and CEA were the markers that best differentiated this histological group. Combining the four markers through a multivariate logistic Lasso regression approach, which by design provided the desired sensitivity, yielded a modestly specific signature for LC detection. This strategy achieved 97% sensitivity in the training set, with a specificity of 43% for a cut-off of > 0.266. In the validation set, sensitivity was 91.7% and specificity 45.4%. This modest specificity remains of value in the clinical context of an RDU, where patients present with respiratory symptoms and/or suspected LC. Our marker panel also outperformed a clinical model based on gender, age and smoking. When training and validation cohorts were combined, the overall sensitivity for LC was 95.6%. The algorithm correctly classified 95% of NSCLC patients, including 94.7% of stage I tumours, and all SCLC cases. Among controls, overall specificity was 43.6% and was not greatly affected by the type of control. Given the clinical dilemma posed by indeterminate nodules on CT-based screening, owing to high false-positive rates 8 , we also evaluated our algorithm according to the absence or presence of nodules. All LC cases with a negative CT scan were correctly classified (100%; 6 of 6 NSCLC and the single SCLC case).
Conversely, among controls with nodules, our panel correctly classified 1 of 3 patients. It should be noted that CT data were available for all LC cases but for only 13.8% of controls (3 healthy and 10 benign cases), which limits this analysis.

Table 3. Distribution of Serum Markers in Lung Cancer by Histology and Stage and Comparison with
In recent years several multianalyte diagnostic panels have been proposed for LC, with variable patient selection criteria, such as inclusion limited to NSCLC or the absence of controls with benign pulmonary pathologies. Studies comparable to ours, at least in terms of study population, are scarce. Molina et al. 25 Other studies describe protein models combined with CT imaging techniques. Yang et al. 24 reported, for high-risk patients with no lesions on CT scan, a panel considered positive when at least one of the markers CEA, SCC, CYFRA 21.1 and Progastrin-releasing Peptide was altered, yielding a sensitivity of 76.6% and a specificity of 94.4%, although they did not report data on an independent sample set. The algorithm established by Patz et al. 27 , based on the combination of nodule size with CEA, alpha-1-antitrypsin and SCC, showed acceptable performance for classifying patients with indeterminate nodules (92% sensitivity and 74% specificity).
To date, only two blood tests based on marker panels have been translated into clinical or commercial settings. EarlyCDT-Lung, which measures autoantibodies, was developed for the early detection of LC in high-risk populations or as an adjunct to CT 28 . In clinical practice it yielded 41% sensitivity and 87% specificity. The PAULA test (Protein Assays Using Lung cancer Analytes) is a 4-marker panel comprising three tumour antigens (CEA, CA125 and CYFRA 21.1) and one autoantibody (NY-ESO-1), intended for detecting early NSCLC in high-risk patients. In a validation set, the panel discriminated NSCLC (67% early-stage) from healthy controls with a sensitivity of 77% and a specificity of 80%. However, its clinical applicability is limited because benign conditions were not included 4 . None of the cited studies targeted a sensitivity as high as ours, which would probably have come at the cost of lower specificity. In this context, we consider our 4-marker panel promising.
Our model-building procedure is based on regularized regression models, which are intended to be more flexible and more resistant to overfitting than stepwise approaches [29][30][31] . Furthermore, by design, our method identifies the model that optimizes the derived classification rule, choosing the penalty parameter and cut-off that maximize specificity while ensuring a predefined sensitivity. The adaptive nature of our method is one of the strengths of our study. Alternative two-stage techniques, which first choose a logistic model based on a reduced set of variables by minimizing the AIC or BIC and then determine a cut-off for the given classification setting, are less flexible, and in general our method outperforms them because it is specifically designed to optimize classification performance. Moreover, approaches based on exhaustive evaluation of all possible sub-models rapidly become unfeasible as the number of candidate markers increases, whereas our approach, since it relies on shrinkage, is expected to perform well in such situations.
…familywise error under multiple comparisons. c Adjusted effects and 95% confidence intervals of histology and stage on each of the log-transformed markers, considered as the outcome in a linear regression model adjusted for gender, age and smoking. *P-value statistically significant.

In Supplementary Table S4 we include two two-stage logistic regression models (selected by AIC and by BIC) and the classification rules derived from them for selected sensitivities of 90% and 95%. As observed, these models and classification rules are not uniformly optimal, and their performance varies with the classification setting. For example, the BIC-based rule performs well at the cut-off that provides 95% sensitivity, but considerably worse at the cut-off corresponding to 90% sensitivity. Conversely, the AIC-based rule outperforms both the Lasso-based and BIC-based rules at 90% sensitivity, but performs worse at higher sensitivities.
Given the complex challenge of developing an optimal diagnostic panel for LC, a proper study design is also of crucial importance. Besides the careful statistical approach described above, the inclusion of both benign and healthy individuals in the control group, as well as of the two main histological tumour types (NSCLC and SCLC), is a strong point of our study. Another important feature is that samples from all individuals were prospectively collected at their first visit to the Pneumology Service with respiratory symptoms, reflecting the clinical setting of an RDU. To refine the diagnostic algorithm we also included information on tobacco use, a well-established risk factor 32 that is usually not considered in studies developing diagnostic panels.
One of the advantages of our classification algorithm is that the panel comprises only 4 molecules, 2 of which are already measured in hospitals: CEA is routinely determined for various tumour types, and CAL is quantified for its utility in inflammatory processes of the colon 33 . This makes our 4-marker panel simple and affordable for guiding clinical decision-making and complementing CT. Additionally, we are currently developing an interactive web application, based on the Shiny web application framework for R 34 , to facilitate implementation of the classification algorithm in the biomedical community.
Regarding the limitations of the study, the number of patients was modest in both the training and validation sets, particularly for SCLC. Another possible shortcoming is the lack of more detailed information on tobacco consumption beyond smoking status, which might have further improved the diagnostic algorithm.
In summary, we defined a modestly specific 4-marker classification algorithm that provides, by design, the desired sensitivity for LC detection, conceived for symptomatic high-risk individuals referred to LC-RDUs. The next step towards clinical implementation is the validation of our panel in a large, multicentre cohort.

Methods
Study Population. Between May 2007 and January 2011, 186 patients with respiratory symptoms were prospectively recruited at the Pneumology Service of Hospital Álvaro Cunqueiro EOXI Vigo (Spain). The study population included patients finally diagnosed with LC and a control cohort comprising subjects diagnosed with benign lung disease and healthy subjects with no respiratory pathology. Exclusion criteria were relapse or progression of a previously diagnosed cancer and chemo- or radiotherapy treatment.
Clinical guidelines from the American College of Chest Physicians were followed for LC diagnosis 13,14 . Histological assessment of tumours followed the WHO criteria 35 , and staging was performed according to the 7th edition of the TNM classification 36 .

Marker Panel Selection and Classification Algorithm Development.
Marker concentrations were log10-transformed before multivariate analysis to reduce skewness. We derived a classification rule based on a multivariate combination of the studied markers using logistic Lasso regression 38 , fitted in the training set with age, gender and smoking included as fixed effects. The same procedure was used to obtain a clinical model including only age, gender and smoking. Lasso regularization penalizes the usual maximum likelihood estimates of the regression coefficients so that they are shrunk towards zero. Some of the resulting coefficients can be exactly zero, so Lasso shrinkage performs automatic variable selection. The amount of shrinkage is controlled by the penalization parameter, which is chosen to maximize the out-of-sample performance of the model according to a predefined criterion. In our algorithm, we simultaneously chose the penalty parameter and the cut-off point providing the classification rule with maximum specificity at a fixed sensitivity (95%). Specifically, we used 10-fold cross-validation in the training set: for each candidate value of the penalty parameter, the estimated coefficients were applied to the held-out data, yielding out-of-sample case probability scores for every observation of the training set. These scores were then dichotomized at the cut-off guaranteeing the desired sensitivity, and the penalty parameter that maximized specificity was selected. Further details of the Lasso procedure and the proposed algorithm are provided in Supplementary Material S1. The selected classification rule was then applied to predict the diagnosis of new individuals.
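The procedure above (Lasso fits over a grid of penalties, 10-fold cross-validated probability scores, a cut-off set for 95% sensitivity, and the penalty chosen by specificity) can be sketched in numpy. This is an illustrative re-implementation under stated assumptions, not the authors' R code: the Lasso is fitted by proximal gradient descent (ISTA) rather than the authors' unspecified solver, the columns of X are assumed to be already standardised log10 concentrations and covariates, the age/gender/smoking "fixed effects" are interpreted as unpenalised columns, folds are random rather than stratified, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_lasso_logistic(X, y, lam, penalized, n_iter=500):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    Minimises mean logistic loss + lam * sum(|beta_j|) over the penalised columns.
    `penalized` is a boolean mask over the columns of X: True for markers (shrunk),
    False for covariates left unpenalised (hypothetically age, gender, smoking).
    The intercept is never penalised. Returns [intercept, coef_1, ..., coef_p].
    """
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])
    mask = np.concatenate([[False], penalized])
    # Step 1/L, with L = ||Xb||_2^2 / (4n) bounding the Lipschitz constant of the gradient
    step = 4.0 * n / np.linalg.norm(Xb, 2) ** 2
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ beta) - y) / n
        beta = beta - step * grad
        # Proximal step: soft-threshold the penalised coefficients only
        beta[mask] = np.sign(beta[mask]) * np.maximum(np.abs(beta[mask]) - step * lam, 0.0)
    return beta


def cutoff_for_sensitivity(scores, y, target=0.95):
    """Cut-off c (calling 'cancer' when score >= c) that keeps sensitivity >= target.

    Returns (c, specificity at c)."""
    cases = np.sort(scores[y == 1])
    k = int(np.floor((1 - target) * len(cases)))  # number of cases allowed below c
    c = cases[k]
    return c, np.mean(scores[y == 0] < c)


def select_penalty_and_cutoff(X, y, penalized, lambdas, target=0.95, n_folds=10, seed=0):
    """Pick the penalty whose cross-validated rule has maximal specificity at the target sensitivity."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random (unstratified) fold labels
    best = None
    for lam in lambdas:
        oof = np.empty(len(y))  # out-of-fold probability score for every training subject
        for f in range(n_folds):
            train, test = folds != f, folds == f
            beta = fit_lasso_logistic(X[train], y[train], lam, penalized)
            oof[test] = sigmoid(beta[0] + X[test] @ beta[1:])
        cut, spec = cutoff_for_sensitivity(oof, y, target)
        if best is None or spec > best[2]:
            best = (lam, cut, spec)
    return best  # (penalty, cut-off, cross-validated specificity)


# Usage on synthetic data (NOT study data): 8 standardised log10 markers + 3 covariates
rng = np.random.default_rng(42)
X = rng.normal(size=(140, 11))
y = (rng.random(140) < sigmoid(1.5 * X[:, 0] - 1.0 * X[:, 1])).astype(int)
penalized = np.array([True] * 8 + [False] * 3)
lam, cut, spec = select_penalty_and_cutoff(X, y, penalized, lambdas=[0.005, 0.02, 0.05, 0.1])
beta = fit_lasso_logistic(X, y, lam, penalized)  # final model refitted on the whole training set
```

Setting the cut-off on out-of-fold rather than in-sample scores is what keeps the 95% sensitivity target realistic, since in-sample scores of a fitted model tend to be optimistic.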
Based on the coefficients of the regression model, the classification algorithm calculates for each new patient a single score, the estimated predicted probability (p) of having lung cancer, as a function of marker concentrations and demographic variables. A new individual is classified as cancer if p exceeds the cut-off estimated in the training set, and as non-cancer otherwise.
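For a fitted logistic model the score is p = 1 / (1 + exp(-eta)), where eta is the intercept plus the coefficient-weighted log10 marker concentrations and covariates. A minimal sketch of scoring and classifying one new patient; the coefficient values, covariate coding, patient values and dictionary layout are hypothetical placeholders (the fitted rule is given in Supplementary Material S1), and only the 0.266 cut-off comes from the text.

```python
import math


def lc_probability(markers, covariates, coef):
    """Predicted LC probability p = 1 / (1 + exp(-eta)) from a logistic model.

    markers: raw serum concentrations by name (log10-transformed here, as in the Methods)
    covariates: covariate values by name (hypothetical coding, e.g. male/smoker as 0/1)
    coef: {'intercept': b0, name: b_name, ...}; names without a coefficient contribute 0
    """
    eta = coef["intercept"]
    eta += sum(coef.get(name, 0.0) * math.log10(value) for name, value in markers.items())
    eta += sum(coef.get(name, 0.0) * value for name, value in covariates.items())
    return 1.0 / (1.0 + math.exp(-eta))


def classify(p, cutoff=0.266):
    """Call 'cancer' when p exceeds the training-set cut-off, 'non-cancer' otherwise."""
    return "cancer" if p > cutoff else "non-cancer"


# Placeholder coefficients and hypothetical patient values; NOT the fitted model
coef = {"intercept": 0.0, "EGF": 0.0, "sCD26": 0.0, "CAL": 0.0, "CEA": 0.0,
        "age": 0.0, "male": 0.0, "smoker": 0.0}
patient_markers = {"EGF": 120.0, "sCD26": 500.0, "CAL": 1500.0, "CEA": 3.0}  # units not implied
patient_covariates = {"age": 65, "male": 1, "smoker": 1}
print(classify(lc_probability(patient_markers, patient_covariates, coef)))
```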
Probability scores for the training and validation samples were obtained by applying the Lasso regression model, and the diagnostic performance of the classification rule was evaluated in terms of sensitivity, specificity and predictive values. ROC curves were constructed for both the Lasso-based marker model and the clinical model, providing the AUC, and the DeLong test was applied to compare the AUC values of these models 39 .
For the sake of comparison, we evaluated the performance of two alternative two-stage methods that first select an optimal logistic model by exhaustive sub-model evaluation, choosing the sub-model that minimizes either the Akaike information criterion (AIC) 40 or the Bayesian information criterion (BIC) 41 , and then determine the optimal cut-off guaranteeing the desired level of sensitivity.
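A sketch of such a two-stage comparator: exhaustive search over marker subsets scored by AIC or BIC, followed by a cut-off giving the target sensitivity. Assumptions of this sketch (not stated in the text): the clinical covariates are forced into every sub-model, the logistic fits use plain Newton-Raphson, the cut-off is taken from in-sample fitted probabilities, and all names are hypothetical; the data are synthetic.

```python
import itertools

import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Unpenalised logistic regression by Newton-Raphson; returns (beta, log-likelihood)."""
    Xb = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        W = p * (1 - p)
        beta = beta + np.linalg.solve(Xb.T @ (W[:, None] * Xb), Xb.T @ (y - p))
    eta = Xb @ beta
    return beta, np.sum(y * eta - np.logaddexp(0.0, eta))


def two_stage(markers, covariates, y, criterion="BIC", target=0.95):
    """Stage 1: exhaustive search over marker subsets (covariates always kept),
    keeping the sub-model with minimal AIC or BIC.
    Stage 2: cut-off on its fitted probabilities giving sensitivity >= target.
    Returns (marker subset, cut-off, fitted probabilities)."""
    n, n_markers = markers.shape
    per_param = {"AIC": 2.0, "BIC": np.log(n)}[criterion]
    best = None
    for size in range(n_markers + 1):
        for subset in itertools.combinations(range(n_markers), size):
            cols = np.array(subset, dtype=int)
            X = np.column_stack([markers[:, cols], covariates])
            beta, loglik = fit_logistic(X, y)
            ic = per_param * len(beta) - 2.0 * loglik
            if best is None or ic < best[0]:
                best = (ic, subset, X, beta)
    _, subset, X, beta = best
    probs = 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))
    cases = np.sort(probs[y == 1])
    cutoff = cases[int(np.floor((1 - target) * len(cases)))]  # score >= cutoff -> 'cancer'
    return subset, cutoff, probs


# Synthetic example (NOT study data): only marker 0 carries signal
rng = np.random.default_rng(3)
M = rng.normal(size=(300, 4))
C = rng.normal(size=(300, 2))
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-2.0 * M[:, 0]))).astype(int)
subset, cutoff, probs = two_stage(M, C, y, criterion="BIC")
```

With 8 candidate markers the search covers 2^8 = 256 sub-models, and every additional marker doubles that count, which is the scaling problem that motivates the shrinkage-based approach.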
All multivariate calculations were performed using the R program package (Wirtschafts Universität, Wien, Austria).