Main

While there are several recognised factors that predict outcome for patients diagnosed with non-small-cell lung cancer (NSCLC), our ability to risk-stratify individual patients at the time of their diagnosis is limited (Mahar et al, 2015; Detterbeck et al, 2016).

Lung cancer staging enables clustering of tumours of similar anatomic extent (Goldstraw et al, 2007). Staging informs therapy and broadly predicts outcome; however, heterogeneity within stage groups suggests that individual patient, disease, and/or treatment factors influence survival. In a recent paper, authors from the International Association for the Study of Lung Cancer (IASLC) staging project eloquently discussed the fundamental differences between stage classification and prognosis prediction (Detterbeck et al, 2016). They contrast the multiple and variable factors that influence prognosis with the deliberately static and singular aim of staging.

Lung cancer prognostication remains rudimentary, and to date no single model has demonstrated superior performance, clinical utility, or widespread global uptake. This may be due in part to a limited ability to routinely measure the proposed variables, or to a perception that such models add little to survival prediction beyond stratification by stage alone. A systematic review of 32 lung cancer prognostic models published from 1996 to 2015 identified poor study design and inconsistent analytic approaches as major barriers to meaningful clinical utility (Mahar et al, 2015). Critically, while most tools were subject to some form of internal validation, many used inappropriate methods (subject to over-optimism) and/or reported inappropriate statistical measures. External validation was performed in just one-third of the studies reviewed.

The objectives of this study were to identify baseline patient and disease variables associated with overall survival (OS) in patients with newly diagnosed NSCLC, and to derive and subsequently validate a simple and generalisable prognostic model.

Materials and methods

Population

Patients with a new histological diagnosis of NSCLC from 2012 to 2014, with prospectively collected data at the time of diagnosis (baseline) and follow-up data, were eligible for inclusion. Treatment was clinician-directed, including positron emission tomography (PET) staging for patients with potentially curable disease and/or participation in an interventional study.

The derivation cohort (D1) included patients from the Peter MacCallum Cancer Centre Thoracic Malignancies Cohort (TMC), an ethics committee-approved, single-institution, prospective observational study. Consenting patients were followed from diagnosis every 3 months until death or loss to follow-up.

The validation cohorts (V1 and V2) included patients from two other Australian metropolitan tertiary referral centres, identified through participation in the Victorian Lung Cancer Registry (VLCR) (Stirling et al, 2014). From 2012, the VLCR, an ethics committee-approved clinical quality registry, prospectively followed patients at 6 and 12 months after diagnosis and until death or loss to follow-up.

Ethics approval for this study was obtained from the Peter MacCallum Cancer Centre (study no. 14/107) and Monash University (study no. CF15/727–2015000333) ethics committees. As all sites were located in Melbourne, cohorts were cross-checked and duplicate patients were excluded from V1 or V2 as relevant.

Data

Data were prospectively collected according to predefined criteria specified for the TMC and VLCR, with the exception of mutation status, which in the validation cohorts was collected retrospectively from patient medical records.

Variables considered for the survival prediction model were: TNM stage according to the 7th edition UICC staging criteria (Sobin et al, 2009); histological subtype (adenocarcinoma, squamous cell carcinoma, large cell carcinoma, NSCLC not otherwise specified (NOS)); mutation status (epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK), and KRAS); comorbidities according to the Simplified Comorbidity Score, including the aggregate score and the individual parameters of tobacco consumption, diabetes mellitus, renal insufficiency, respiratory comorbidity (history of tuberculosis, history of pleural effusion or pneumonia, asthma, pulmonary embolism, hypoxaemia <60 mm Hg, and/or chronic obstructive pulmonary disease inducing an FEV1 <1.5 l), cardiovascular comorbidity, neoplastic comorbidity, and alcoholism (Colinet et al, 2005); Eastern Cooperative Oncology Group (ECOG) performance status (PS) (<2 vs ≥2) (Oken et al, 1982); weight loss within 3 months of diagnosis (0–10, 11–15, >15%); smoking history (current, past, never); smoking magnitude (pack-years); sex; and age.

Variables that were dichotomised included PS, to reflect the inclusion thresholds of most interventional studies and to limit subjectivity between the lower (0–1) and upper (2–3) categories, as well as weight loss (0–10 vs >10%) and smoking history (ever vs never), based on previous analyses demonstrating no incremental benefit from additional stratification (Alexander et al, 2016).

Patient status and, where applicable, date of death were established for D1 every 3 months at attendance for review or by phone call to the patient or general practitioner, and for V1 and V2 by phone follow-up every 3–6 months. Patients from all cohorts were linked with the Victorian Register for Births, Deaths and Marriages, which collects death notifications and cause of death, with data accurate to 31 December 2015.
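As an illustration of the dichotomisation rules above, a minimal Python sketch follows. The record type `Baseline`, its field names, and the function `dichotomise` are hypothetical and not part of the study's data dictionary. Weight loss is assumed to be stored as a percentage, and grouping PS 4 with PS ≥2 is an assumption, since the text only describes the 0–1 and 2–3 categories.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    # Hypothetical record layout; field names are illustrative only.
    ecog_ps: int            # ECOG performance status (0-4)
    weight_loss_pct: float  # % weight loss within 3 months of diagnosis
    smoking: str            # "current", "past" or "never"


def dichotomise(b: Baseline) -> dict:
    """Collapse baseline variables into the binary categories used for modelling."""
    return {
        # PS 0-1 vs >=2 (PS 4 grouped with >=2 here, an assumption)
        "ps_ge_2": b.ecog_ps >= 2,
        # weight loss 0-10% vs >10%
        "weight_loss_gt_10": b.weight_loss_pct > 10,
        # ever (current or past) vs never smoker
        "ever_smoker": b.smoking in ("current", "past"),
    }


print(dichotomise(Baseline(ecog_ps=1, weight_loss_pct=12.0, smoking="past")))
# {'ps_ge_2': False, 'weight_loss_gt_10': True, 'ever_smoker': True}
```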

Statistical analyses

Survival probabilities were estimated using the Kaplan–Meier method from the date of histological diagnosis until death, with living patients censored at the earlier of 31 December 2015 or the last study follow-up.
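The Kaplan–Meier product-limit estimator multiplies together, at each distinct death time, the conditional probability of surviving that time, (1 − deaths / number at risk). Below is a minimal pure-Python sketch that also builds each patient's (time, event) record under the censoring rule above. The helper names `survival_record` and `kaplan_meier` are hypothetical, follow-up is assumed to be expressed in months using an average month of 365.25/12 days, and patients censored at a death time are counted as still at risk at that time (the usual convention).

```python
from datetime import date

STUDY_CLOSE = date(2015, 12, 31)   # censoring date for living patients
DAYS_PER_MONTH = 365.25 / 12       # assumption: average month length


def survival_record(diagnosis, death=None, last_follow_up=None):
    """Return (months from diagnosis, event flag) for one patient.

    Deaths are events (1); living patients are censored (0) at the earlier
    of STUDY_CLOSE and their last study follow-up.
    """
    if death is not None:
        end, event = death, 1
    else:
        end = min(STUDY_CLOSE, last_follow_up) if last_follow_up else STUDY_CLOSE
        event = 0
    return (end - diagnosis).days / DAYS_PER_MONTH, event


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group everyone whose follow-up ends at t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # leave the risk set after time t
    return curve
```

For example, `kaplan_meier([2, 3, 3, 5, 8, 12], [1, 1, 0, 1, 0, 1])` gives survival of 5/6 at 2 months, 2/3 at 3 months, 4/9 at 5 months, and 0 at 12 months; the censoring at 8 months shrinks the risk set without changing the curve.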

The prognostic influence of variables was assessed by univariate and multivariate Cox proportional hazards regression for patients in D1 with all data fields complete. Preselected variables (TNM stage group) and variables demonstrating an association with OS in univariate analyses (P<0.2) were assessed in multivariate models. Variables retaining an association (P<0.05) were included in the final model. A secondary model excluding PS (subjective variable) and respiratory comorbidity (not routinely or consistently reported) was also evaluated.
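The two-stage selection described above can be sketched as follows. The sketch takes as input P-values from Cox models fitted elsewhere (in this study, Stata was used), so it does not fit any model itself. The function names, the variable labels, and the choice to force stage group through the first stage only are illustrative assumptions, and the P-values in the usage lines are invented.

```python
def screen_candidates(univariate_p, forced=("stage_group",), threshold=0.2):
    """Stage 1: preselected variables plus any with univariate P < threshold."""
    candidates = list(forced)
    for var, p in univariate_p.items():
        if p < threshold and var not in candidates:
            candidates.append(var)
    return candidates


def retain_final(multivariate_p, alpha=0.05):
    """Stage 2: keep variables whose multivariate P-value stays below alpha."""
    return [var for var, p in multivariate_p.items() if p < alpha]


# Invented P-values for illustration only (not study results).
uni = {"age": 0.01, "var_x": 0.15, "var_y": 0.50, "stage_group": 0.001}
print(screen_candidates(uni))    # ['stage_group', 'age', 'var_x']
multi = {"stage_group": 0.001, "age": 0.02, "var_x": 0.30}
print(retain_final(multi))       # ['stage_group', 'age']
```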

Points were allocated to each variable according to the strength of its association (the logarithm of the hazard ratio, HR) in the final model, creating a simple scoring system for the full model (the Lung Cancer Prognostic Index (LCPI)) and for the model excluding PS and respiratory comorbidity (the modified LCPI (m-LCPI)). Interactions were explored and sensitivity analyses undertaken to determine composition and points weighting. To make the scores meaningful for clinicians and patients, points were grouped; the number of groups and the point cut-offs defining them were determined subjectively, aiming for the maximal number of groups with significantly different survival outcomes while keeping confidence intervals (CI) reasonably narrow.

The LCPI and m-LCPI were applied to D1, V1, and V2 to assess internal (D1) and external (V1 and V2) validity. Missing data in V1 and V2 were handled by multiple imputation by chained equations using 50 imputations (van Buuren, 2012); graphical illustrations of survival curves used the 50th imputation. The predictive performance of a Cox model including only tumour stage was also assessed and compared against the LCPI to ascertain any additive benefit of the other included variables.

Discrimination was assessed using Harrell’s c-statistic and Gönen and Heller’s c-statistic, with values from multiply imputed data combined according to Rubin’s rules (Gönen and Heller, 2005; Marshall et al, 2009). Reporting the median of the c-statistics across imputations was considered (Marshall et al, 2009), but this approach lacks theoretical justification and we found mean and median values to be almost identical. Harrell’s c is the most commonly reported measure in other studies (facilitating comparison) but lacks a variance measure. Gönen and Heller’s c is most appropriate for censored data and is consequently lower in value than Harrell’s c, but it addresses concerns about bias in the use of Harrell’s c (Gönen and Heller, 2005).
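One common way to turn Cox coefficients into integer points, sketched below, is to divide each log HR by the log HR of a reference variable and round to the nearest integer, then sum the points and map the total onto groups. The study does not state its exact scaling or rounding rule, so this scheme, the reference choice, the hazard ratios, and the group cut-offs are all invented for illustration and are not the LCPI weights in Table 3. The function names are hypothetical.

```python
import math


def allocate_points(hazard_ratios, reference):
    """Integer points proportional to log(HR), scaled so `reference` scores 1."""
    unit = math.log(hazard_ratios[reference])
    return {var: round(math.log(hr) / unit) for var, hr in hazard_ratios.items()}


def total_score(points, present):
    """Sum the points of the risk factors a patient has."""
    return sum(points[var] for var in present)


def risk_group(score, upper_bounds):
    """Map a score to group 1, 2, ... using inclusive upper bounds per group."""
    for group, upper in enumerate(upper_bounds, start=1):
        if score <= upper:
            return group
    return len(upper_bounds) + 1


# Invented hazard ratios and cut-offs, for illustration only (not Table 3).
hrs = {"male_sex": 1.4, "ps_ge_2": 2.0, "weight_loss_gt_10": 1.5, "stage_iv": 4.0}
pts = allocate_points(hrs, reference="male_sex")
print(pts)  # {'male_sex': 1, 'ps_ge_2': 2, 'weight_loss_gt_10': 1, 'stage_iv': 4}
score = total_score(pts, ["male_sex", "stage_iv"])
print(score, risk_group(score, upper_bounds=(2, 4, 6)))  # 5 3
```

Rounding trades a little precision for a score that can be added up at the bedside, which matches the stated aim of a simple, generalisable index.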
Model calibration was assessed by plotting Cox–Snell residuals against the cumulative hazard function (Lee and Wang, 2003). All tests were two-sided using a 5% significance level and corresponding 95% CI calculated. All analyses were performed using Stata 14.0 software (StataCorp LP, College Station, TX, USA).
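Cox–Snell residuals are r_i = Ĥ0(t_i)·exp(η_i), where Ĥ0 is the estimated baseline cumulative hazard and η_i is the patient's linear predictor from the fitted model. If the model fits, the residuals behave like a censored sample from a unit exponential distribution, so the Nelson–Aalen cumulative hazard of the residuals, plotted against the residuals, should lie close to the line y = x. A minimal pure-Python sketch, assuming the linear predictors come from an already fitted Cox model and using the Breslow estimator for Ĥ0; the helper names are hypothetical.

```python
import math


def breslow_cumhaz(times, events, lp):
    """Breslow baseline cumulative hazard as [(event time, H0)] steps."""
    risk = [math.exp(x) for x in lp]
    steps, h = [], 0.0
    for s in sorted({t for t, e in zip(times, events) if e}):
        deaths = sum(1 for t, e in zip(times, events) if e and t == s)
        at_risk = sum(r for t, r in zip(times, risk) if t >= s)
        h += deaths / at_risk
        steps.append((s, h))
    return steps


def step_value(steps, t):
    """Evaluate a right-continuous step function at t (0 before the first step)."""
    value = 0.0
    for s, h in steps:
        if s > t:
            break
        value = h
    return value


def cox_snell_pairs(times, events, lp):
    """Return (residual, cumulative hazard of residuals) pairs for plotting.

    Residual r_i = H0(t_i) * exp(lp_i). The Nelson-Aalen cumulative hazard of
    the residuals equals the Breslow formula with all linear predictors set
    to 0. Points near y = x suggest adequate calibration.
    """
    h0 = breslow_cumhaz(times, events, lp)
    resid = [step_value(h0, t) * math.exp(x) for t, x in zip(times, lp)]
    return breslow_cumhaz(resid, events, [0.0] * len(resid))
```

The pairs can then be plotted with any plotting library alongside the reference line y = x; as noted in the Results, departures are expected mainly at longer follow-up, where few patients remain at risk.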

Results

Study population

Analyses included 1458 newly diagnosed NSCLC patients from three independent cohorts (Figure 1 and Table 1). Median follow-up for D1, V1, and V2 was 27 months (range 0–56), 29 months (range 0–51), and 27 months (range 0–47), respectively. During follow-up, 880 (60%) patients died; 2-year OS rates for D1, V1, and V2 were 42% (95% CI 38–46), 54% (95% CI 50–59), and 31% (95% CI 26–37), respectively.

Figure 1

Patient enrolment flow diagram. 1 No hospital follow-up beyond diagnosis and unable to be matched to state death registry (interstate patients). D1, derivation cohort; V1, validation cohort 1; V2, validation cohort 2.

Table 1 Characteristics of patients in the derivation cohort (D1) and validation cohorts (V1 and V2)

Model development – cohort D1

By univariate analyses, age (continuous variable), male sex, comorbidity (renal), stage (IIIA, IIIB, and IV), mutation status (EGFR and ALK but not KRAS), histology (adenocarcinoma), weight loss >10%, and ECOG PS ≥2 predicted OS (Supplementary Table 1). Patients not tested for mutation were more likely to have early-stage disease (P<0.01) and non-adenocarcinoma histology (P<0.01). Among patients with stage IIIB/IV adenocarcinoma, those not tested for mutation were more likely to be aged ≥75 years (P=0.01) and had a worse prognosis than mutation-negative patients (HR 1.54, 95% CI 1.07–2.23).

In multivariate analyses (Table 2), OS was similar for patients with EGFR and ALK mutations (hence combined as actionable mutation). KRAS mutation, which has no specific drug target, conferred no survival benefit, and the difference between patients not tested and those negative for mutation was small after adjusting for age (HR 1.41, 95% CI 0.96–2.03). These groups were therefore combined for subsequent analyses (no proven actionable mutation). After adjusting for mutation status, there was no survival difference between adenocarcinoma and non-adenocarcinoma histology, but NOS remained predictive of poorer prognosis.

Table 2 Predictors of survival in the derivation cohort by multivariate Cox proportional hazards regression

Using the LCPI and m-LCPI scoring systems (Table 3), each one-point increase was associated with a >2-fold increase in mortality risk (HR 2.21, 95% CI 2.07–2.37 and HR 2.08, 95% CI 1.95–2.22, respectively). The ability of the LCPI to stratify patients by prognostic group was maintained (and superior to stage alone) in subgroup analyses of early-stage patients likely to receive curative surgery or curative chemoradiotherapy (stage I–IIIA) and of advanced-stage patients likely to receive non-curative chemotherapy, radiotherapy, or molecular therapies (stage IIIB–IV) (Supplementary Figure 1).
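Assuming the score was entered into the Cox model as a single linear term (as a per-point HR implies), the per-point effect compounds multiplicatively: patients k points apart differ by HR^k. The 95% CI scales the same way, because on the log scale the interval kβ ± 1.96·k·SE is simply the per-point interval multiplied by k. A small sketch using the reported LCPI estimates; the function name `scaled_hr` is hypothetical, and rounding of the published figures limits precision.

```python
def scaled_hr(hr, lo, hi, k):
    """HR and 95% CI comparing scores k points apart under a log-linear score effect."""
    if k < 0:
        raise ValueError("k should be a non-negative score difference")
    return hr ** k, lo ** k, hi ** k


# Reported LCPI per-point estimate: HR 2.21 (95% CI 2.07-2.37).
est, lo, hi = scaled_hr(2.21, 2.07, 2.37, k=2)
print(f"HR {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # HR 4.88 (95% CI 4.28-5.62)
```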

Table 3 Weighted scores for predictive model for overall survival in newly diagnosed non-small-cell lung cancer

Model performance and validation – cohorts V1 and V2

In the validation cohorts, data were missing mostly for weight loss, PS, and respiratory comorbidity (Table 1). In V1, 40% of patients had all data available, 33% were missing one variable, 11% two variables, 9% three variables, and 7% four variables. In V2, 81% of patients had all data available, 11% were missing one variable, 3% two variables, 4% three variables, and 1% four variables.
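With 50 imputed data sets, a statistic that has a variance estimate, such as Gönen and Heller's c, is computed in each data set and then pooled with Rubin's rules: the pooled estimate is the mean across imputations, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. A minimal sketch using only the Python standard library; the function name is hypothetical and the values in the usage line are invented, not study results.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations with one variance per estimate")
    q_bar = mean(estimates)           # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Invented values for three imputations, for illustration only.
est, total_var = rubin_pool([0.70, 0.72, 0.74], [0.0001, 0.0001, 0.0001])
print(round(est, 4), round(total_var, 7))
```

The between-imputation term is what makes the pooled interval wider than any single imputed analysis, reflecting uncertainty about the missing values.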

Survival by cohort and LCPI is presented in Figure 2 and Table 4 (m-LCPI results in Supplementary Figure 2 and Supplementary Table 2). Overall, the LCPI and m-LCPI provided greater discrimination than stage alone (Harrell’s c 0.73, 0.72, and 0.68, respectively; Supplementary Figure 3). The discrimination (Harrell’s c) of the LCPI and m-LCPI, respectively, was 0.74 and 0.72 for D1, 0.72 and 0.73 for V1, and 0.71 and 0.68 for V2 (Table 4; Supplementary Table 2). Discrimination on external validation was similar in analyses of all cases with multiple imputation for missing data (Figure 2) and of complete cases only (Supplementary Figure 4). For all cohorts, Cox–Snell residuals aligned closely with the plotted LCPI cumulative hazard function up to 2 or 3 years (Supplementary Figure 5).
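Harrell's c is the proportion of usable patient pairs in which the patient with the higher risk score dies first. A pair is usable when the patient with the shorter follow-up died. With an integer score such as the LCPI, many pairs share a score, and the sketch below counts such ties as one half. This is a minimal pure-Python version for illustration only: the function name is hypothetical, and excluding pairs with tied follow-up times is an assumption, as conventions differ between implementations. In Stata, the software used here, `estat concordance` after `stcox` reports Harrell's C.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for a risk score (higher = worse prognosis)."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier patient died
        for j in range(n):
            if times[i] < times[j]:  # j outlived i (died later or censored later)
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Illustrative data: follow-up (months), death indicator, integer risk score.
print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [4, 3, 3, 1]))
```

Here five pairs are usable, four are concordant, and one has tied scores, giving 4.5 / 5 = 0.9.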

Figure 2

Overall survival curves according to Lung Cancer Prognostic Index (LCPI) in (A) derivation cohort, (B) validation cohort 1, and (C) validation cohort 2.

Table 4 Overall survival according to cohort and LCPI

Discussion

The proposed LCPI is a compound scoring system, including established and novel variables, for predicting survival following NSCLC diagnosis. The LCPI is simple and generalisable, using data easily and routinely obtained during diagnostic evaluation. Importantly, the model was developed from prospectively collected data, and we provide external validation demonstrating superior survival prediction compared with stage alone (current best practice; Harrell’s c 0.73 vs 0.68) and consistent predictive performance across two independent cohorts (Harrell’s c 0.72 and 0.71).

Developing a single model for all stage groups and histological subtypes aimed to provide broad clinical utility and encourage routine application, while maintaining relevance for individual disease stages. Increasing LCPI was correlated with advancing stage; however, by incorporating other prognostic factors, the LCPI stratified patients better than stage alone. For example, across our three cohorts, up to 22% of patients with metastatic disease had a low-risk LCPI (1–2) and 37% of patients with stage I–II disease had a high-risk LCPI (3–4). Assessment of LCPI calibration on external validation demonstrated good model fit, with deviation from the plotted cumulative hazard function observed only at longer survival times, which is not unexpected in models with censored data.

With fewer prognostic determinants, the performance of the m-LCPI was inconsistent: similar to the LCPI in V1, but comparatively lower in both D1 and V2. Notably, in V2 the m-LCPI performed the same as stage alone (Harrell’s c 0.68), suggesting that the modified model omitting PS and respiratory comorbidity offered no additional benefit in this cohort. The predictive ability of the m-LCPI was linked to disease stage; V1 had significantly more early-stage patients (46% stage I–II compared with 23% in D1 and 31% in V2), in whom poor PS and respiratory comorbidity were less common and their influence on survival prediction less relevant. Moreover, although it maintained performance in V1, the m-LCPI had a reduced ability to stratify patients based on the presence or absence of other risk factors. For example, in V1, 91% of stage I patients were classified as m-LCPI 1 vs 76% as LCPI 1, and only 5% of stage III patients were classified as m-LCPI 4 vs 17% as LCPI 4.

Direct comparison of the LCPI with alternatively proposed prognostic models by applying them to our cohorts (i.e., fitting those models to our data) was limited by inconsistencies between available and required data, a factor that also strongly influences the global uptake of any proposed model. Accordingly, we discuss LCPI performance relative to other models based on their reported performance in external validation. Although many models have been published, such comparisons were limited by a common lack of external validation among proposed prognostic models, most of which were derived from retrospective data analysis (Mahar et al, 2015).

Lung Cancer Prognostic Index discrimination on external validation (Harrell’s c 0.72 and 0.71) was similar to that of a Surveillance, Epidemiology, and End Results Program (SEER) model (Harrell’s c 0.69–0.72), while overcoming several of its limitations (Putila et al, 2011). Derivation of the SEER model on almost two decades of data (1998–2006) achieved an impressive sample size (>230 000) but introduced heterogeneity in available diagnostic modalities, for example, PET staging (Gregory et al, 2012), and in treatment algorithms, for example, molecular therapies (Sequist et al, 2007; Solomon et al, 2014). Equally, the SEER model (including tumour stage and grade, age, race, and sex) has not been tested in a more recent cohort reflecting current practice, which routinely incorporates PET and targeted therapies where appropriate. Reflecting this, mutation testing was performed more frequently in our three cohorts than in a 2010 SEER cohort (72 vs 23% for stage IV adenocarcinoma) (Enewold and Thomas, 2016). Tumour grade was included in the SEER model but deliberately omitted from ours because of reporting subjectivity and limited application in real-world clinical settings (Stang et al, 2006). Additionally, the SEER model includes only adenocarcinoma and squamous cell carcinoma and therefore would not be applicable to >10% of our study cohort and of the general NSCLC population.

A model published in 2015 using data from 1997 to 2008 also included all stage groups but used individual TNM status rather than stage groups for prediction, on the grounds that periodic updates to stage classification would negate the value of a model including stage groups (Schild et al, 2015). This model has not yet undergone prospective external validation. One advantage of using stage group is the ability to include patients with incomplete TNM staging, for example, where documentation is limited to the presence of metastasis (common in large epidemiological data sets). In addition to TNM status, that model included quality-of-life score, age, sex, PS, and smoking status. Mutation status, a relevant prognostic indicator for the 5–10% of patients harbouring actionable mutations with demonstrated survival benefits (Sequist et al, 2007; Solomon et al, 2014; Tan et al, 2017), was notably excluded, as in other models.

Global prognostic models (all stages, all treatments) are advantageous for use in clinical settings and for application to large data sets in epidemiological analyses. Predictive performance in subpopulations remains important; however, stage- and treatment-specific prognostic models with homogeneous populations would be expected to outperform generalised models such as the LCPI. As depicted in Supplementary Figure 1, application of the LCPI to early-stage (I–IIIA) and advanced-stage (IIIB–IV) subgroups demonstrated an unsurprising reduction in model performance (LCPI c-statistics: overall 0.73, stage I–IIIA 0.69, stage IIIB–IV 0.67). However, the model maintained its ability to stratify patients into survival groups and its superior predictive ability compared with stage alone (stage group c-statistics: overall 0.65, stage I–IIIA 0.62, stage IIIB–IV 0.52). The aforementioned SEER model also reported subgroup analyses (by treatment rather than stage), with c-statistics best for early stage and decreasing for advanced stages: single-modality surgery 0.77, surgery with radiation 0.69, single-modality radiotherapy 0.63, and treatment unknown 0.60 (Putila et al, 2011). Subgroup analyses were not reported for the 2015 global prognostic model discussed above (Schild et al, 2015).

Notwithstanding the stated disparities in model design and intent, the LCPI performed favourably compared with an externally validated baseline-variable model for advanced NSCLC (stage IIIB–IV) (Mandrekar et al, 2006). That model, including stage, PS, body mass index, haemoglobin level, and white cell count, achieved an external validation c-statistic of 0.65, whereas the LCPI achieved 0.72 and 0.71 in our two validation cohorts. Although developed using retrospective data and tested in small validation cohorts (n=65, n=36), a model for inoperable NSCLC (stage I–IIIB) treated with radical chemoradiotherapy demonstrated impressive discriminative ability, superior to that of the LCPI, with external validation c-statistics of 0.75 and 0.76 (Dehing-Oberije et al, 2009). Like the LCPI, that model included sex and PS, but it differed by including forced expiratory volume (vs respiratory comorbidity) and nodal status/tumour volume (vs stage group). Forced expiratory volume and tumour volume are not routinely assessed in non-curative settings, limiting the generalisability of that model.

Lung Cancer Prognostic Index variables are clinically relevant and can be obtained during the diagnostic evaluation of patients presenting with newly diagnosed NSCLC. Advancing age and male sex conferred increased mortality risk, as in other studies of early and advanced NSCLC (Wakelee et al, 2006; Agarwal et al, 2010). Although both have been linked to comorbidity burden affecting treatment deliverability and survival (Tammemagi et al, 2004), apart from respiratory disease we found no residual association between comorbidity (including aggregate scores) and survival. Given the variable prognostic ability of comorbidity measures across NSCLC cohorts (Alexander et al, 2016), possibly related to data reliability (Alexander et al, 2017, in press), we strongly advocate prospective assessment of respiratory comorbidity (and indeed all other variables) against predefined criteria. Performance status was a strong predictor of OS in this and other studies, including comparative survival estimation with complex multivariate models (Wang et al, 2015), yet its subjectivity may still be perceived as a limitation. We found that excluding respiratory comorbidity and PS (m-LCPI) reduced model predictive performance.

Inclusion of mutation status within our model is novel and provided increased stratification for patients with advanced disease. Mutation status is a globally relevant prognostic factor with available targeted treatment options, and reflex EGFR and ALK testing for advanced-stage non-squamous carcinomas in patients suitable for therapy has received international endorsement (Leighl et al, 2014). Currently including only EGFR and ALK, the LCPI may be expanded in the future to accommodate newly identified targetable mutations. We encourage pursuit of tissue for mutation testing in older patients and those presenting with poor PS, particularly where poor PS is linked to lung cancer burden, because in multivariate analyses the adverse survival of patients not tested (vs those negative for mutation) was attributable to these factors. Conversely, patient factors did not explain the adverse prognosis of patients with NSCLC NOS in our study, a finding supported by other studies (Righi et al, 2014; Tane et al, 2014).

The three cohorts in this study were representative of heterogeneous real-world patient populations. There were differences across the cohorts, possibly reflecting the referral patterns of the tertiary referral specialist cancer centre (D1) compared with major metropolitan general public hospitals (V1 and V2). Notably, our population is biased towards metropolitan centres, and there was overlap in population catchment areas. However, the ability of the LCPI to retain predictive performance across institutions and cohorts supports the generalisability of the model. Reasons for the comparatively poorer 2-year OS rate in V2 are unclear; further validation or recalibration may be required to address this uncertainty. Longer-term analyses are ongoing and will allow assessment of prediction beyond 3–4 years after diagnosis. Missing data were evident in both validation cohorts but were addressed by imputation, and reassuringly the LCPI maintained prognostic performance in both imputed and complete-case analyses. Further prospective validation, with close to perfect data capture as in our derivation cohort, is warranted. Data linkage to the state-based death registry was implemented to ensure accurate and complete mortality data; however, a small number of patient deaths may have been missed because of interstate migration. While LCPI discrimination exceeds current best practice (stage alone), further improvements may be gained from more comprehensive purpose-defined data sets. However, the pursuit of perfect prediction must be balanced against the need for relevant real-time assessment.

Baseline and treatment-based prognostic models differ fundamentally in their derivation and application. Treatment undoubtedly affects survival; however, treatment detail, particularly as patients cycle through various combination strategies, is not always easy to capture, so its inclusion in static and simple models may not be feasible. While baseline models do incorporate predictors of treatment, treatment strategy models may offer additional prognostic information, particularly as alternative treatment pathways emerge and in the setting of recurrent or progressive NSCLC, where prior and future treatments may affect outcome. Key laboratory parameters that have attracted some interest (Ay et al, 2012; Khorana et al, 2013; Gu et al, 2015) were not included in the LCPI because of limited data availability across the three cohorts, and indeed may not be feasible for use in real-world routine care. Similar consideration was given to including genomic information, which may in time contribute to prognostication; however, as there is currently no globally standardised routine testing, it was not considered appropriate to include it in a model intended for routine clinical use.

Importantly, the validated LCPI from our study is relevant to current real-world routine care, contributing additional prognostic information that may be used in conjunction with validated tools and evidence-based patient management guidelines. Specifically, the LCPI could be used to counsel patients, guide treatment decisions and clinical trial eligibility, inform comparative effectiveness analyses, adjust for confounders and case-mix when benchmarking quality of care across institutions, and standardise mortality risk for epidemiological analyses. We present the m-LCPI as a pragmatic alternative model that may be used where data availability is limited, although its inconsistent performance and potential lack of additive value over stage alone in some cohorts must be considered. Both the LCPI and m-LCPI are simple to use and include data that can easily be obtained during the diagnostic evaluation of patients with NSCLC.