Introduction

Idiopathic pulmonary fibrosis (IPF) is a fibrotic lung disease of unknown origin that is chronic, progressive, and eventually fatal. IPF mainly affects older adults, especially those in their sixth and seventh decades. Despite promising results with novel drugs such as pirfenidone and nintedanib, the mortality of patients with IPF remains high; median survival time is 2.5–3.5 years1,2,3. Furthermore, IPF disease progression is highly variable; while some IPF patients experience a reduction of symptoms over time, others are stable, experience slow worsening of respiratory symptoms or pulmonary function, or have acute exacerbation of symptoms, ultimately leading to death1. Therefore, clinicians usually face challenges in predicting the clinical course in newly diagnosed IPF patients. Precise prediction of clinical course is important for developing treatment plans and providing clinicians with accurate information that must be communicated to patients and medical teams4.

Previous studies used clinical factors (age, gender, smoking status, finger clubbing, dyspnea, 6-minute walking distance, and hospitalization), pulmonary function tests (PFTs), change in PFT, high-resolution computed tomography (HRCT) findings or scores, pulmonary hypertension, molecular biomarkers (metalloproteinase-7 and C-reactive protein [CRP]), and pathologic findings as variables in predictive models5,6,7,8,9,10,11. However, most of these predictive models are too complex for routine use and have not been validated externally.

In 2003, Wells et al.12 reported on their composite physiologic index (CPI), which was developed as a tool to reflect the morphologic extent of pulmonary fibrosis in IPF on computed tomography (CT) (r = 0.71, r2 = 0.51, P < 0.0005); it is calculated easily based on lung function parameters. CPI was a more powerful prognostic marker for mortality than either lung function or alveolar-arterial O2 gradient in surgically diagnosed IPF patients (P < 0.0005).

The gender-age-physiology (GAP) stage, which was developed by Ley et al.13 in 2012 using Fine-Gray competing-risk models, uses gender, age, and two pulmonary function results (percent predicted forced vital capacity [FVC] and percent predicted diffusing capacity of the lung for carbon monoxide [DLCO]). Ley et al. analyzed a relatively large cohort; they obtained a C-index of 69.3 in the derivation cohort (n = 228) and 68.7 in the validation cohort (n = 330).

The aim of this study was to validate and compare the predictive values of CPI and GAP stage models in the Korean population.

Methods

Patient selection

In this retrospective study, patients with newly diagnosed interstitial lung disease (ILD) were enrolled by the Korean Interstitial Lung Disease Research Group14 from January 1, 2003, through December 31, 2007. The cut-off date was December 31, 2009. The diagnosis was made by pulmonologists, radiologists, and pathologists at each hospital and was reconfirmed by the Scientific Committee of the Korean Academy of Tuberculosis and Respiratory Diseases. Fifty-four hospitals registered a total of 2,186 ILD patients. Patients with a condition other than IPF (n = 501) and patients with incomplete data (n = 423) were excluded from this study. We diagnosed IPF according to the previous international consensus statement by the American Thoracic Society (ATS), European Respiratory Society (ERS), and the American College of Chest Physicians15,16. Of the remaining 1,262 IPF patients, 430 who were lost to follow-up were also excluded. As a result, a total of 832 patients with IPF were included in this study. The clinical data (age, sex, smoking, respiratory symptoms, diagnostic method, and mortality), radiologic findings (HRCT), laboratory data (arterial blood gas analysis, CRP, antinuclear antibody [ANA], and rheumatoid factor [RF]), and physiological data (PFT) were investigated. These data were saved in a web-based registry system (www.ild.or.kr). The mean follow-up period was 22.5 ± 16.3 months.
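The patient flow described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in this section; the variable names are illustrative and are not part of the registry.

```python
# Patient flow, using only the counts reported in the text.
registered_ild = 2186      # ILD patients registered by 54 hospitals
non_ipf = 501              # excluded: condition other than IPF
incomplete_data = 423      # excluded: incomplete data
lost_to_follow_up = 430    # excluded: lost to follow-up

ipf_patients = registered_ild - non_ipf - incomplete_data
analysed = ipf_patients - lost_to_follow_up
lost_fraction = lost_to_follow_up / ipf_patients

print(f"IPF patients: {ipf_patients}")
print(f"Included in analysis: {analysed}")
print(f"Lost to follow-up: {100 * lost_fraction:.1f}% of IPF patients")
```

The result agrees with the 1,262 IPF patients and the final cohort of 832 patients reported in this section.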

Predictive model

CPI was calculated according to Wells et al.12 as follows: CPI = 91.0 − (0.65 × DLCO percentage of the predicted value [% pred]) − (0.53 × FVC % pred) + (0.34 × forced expiratory volume in 1 second [FEV1] % pred). Mura et al.11 showed that a CPI > 41.0 was significantly associated with 3-year survival in a prospective cohort (hazard ratio [HR] = 5.36, P = 0.0071), as well as in a retrospective cohort (HR = 4.20, P = 0.042). Based on their study, our patients were divided into two groups according to the calculated CPI value (≤41.0 and >41.0), and their characteristics were examined. GAP score was calculated according to Ley et al.13: gender (0–1 points), age (0–2 points), %FVC (0–2 points), and %DLCO (0–3 points). However, the category “cannot perform DLCO (3 points)” was not considered in the present study because of the retrospective nature of the study. GAP stage was determined based on the total GAP score: stage I (0–3 points), stage II (4–5 points), and stage III (6–8 points).
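As an illustration, the two indices could be computed as in the sketch below. The CPI formula, the 41.0 cut-off, and the stage boundaries are taken from the text above. The per-item GAP cut-offs (age, %FVC, and %DLCO bands) are not given in this article; the values used here are an assumption based on the commonly cited GAP index and should be verified against Ley et al.13. Function names and the example patient are hypothetical.

```python
def cpi(dlco_pct: float, fvc_pct: float, fev1_pct: float) -> float:
    """Composite physiologic index from % predicted DLCO, FVC, and FEV1."""
    return 91.0 - 0.65 * dlco_pct - 0.53 * fvc_pct + 0.34 * fev1_pct


def cpi_group(value: float, cutoff: float = 41.0) -> str:
    """Dichotomise CPI at the cut-off used in this study (<=41.0 vs >41.0)."""
    return "CPI > 41" if value > cutoff else "CPI <= 41"


def gap_score(male: bool, age: float, fvc_pct: float, dlco_pct: float) -> int:
    """GAP score without the 'cannot perform DLCO' category.

    ASSUMPTION: the item cut-offs below are not stated in this article;
    they follow the commonly cited GAP index and should be checked
    against the original publication before any real use.
    """
    score = 1 if male else 0                               # gender: 0-1
    score += 0 if age <= 60 else 1 if age <= 65 else 2     # age: 0-2
    score += 0 if fvc_pct > 75 else 1 if fvc_pct >= 50 else 2   # %FVC: 0-2
    score += 0 if dlco_pct > 55 else 1 if dlco_pct > 35 else 2  # %DLCO: 0-2
    return score


def gap_stage(score: int) -> str:
    """Stage boundaries as given in the text: I (0-3), II (4-5), III (6-8)."""
    if score <= 3:
        return "I"
    if score <= 5:
        return "II"
    return "III"


# Hypothetical patient, for illustration only.
value = cpi(dlco_pct=45, fvc_pct=70, fev1_pct=75)
score = gap_score(male=True, age=68, fvc_pct=70, dlco_pct=45)
print(round(value, 2), cpi_group(value), score, gap_stage(score))
```

Because the "cannot perform DLCO" category is omitted, the highest score this sketch can return is 7 rather than 8.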

The primary objective of this study was to compare the predictive ability of CPI and GAP stage models for 1-year, 2-year, and 3-year mortality.

Statistical analysis

Continuous variables were compared by t-test or analysis of variance (ANOVA) according to the number of groups, and these variables were presented as means ± standard deviation. Comparisons between GAP stages were performed with ANOVA, and post hoc analyses were conducted using Bonferroni’s correction. Categorical variables were analyzed by Pearson’s chi-square test and were presented as frequency (n) and percentage (%). Receiver operating characteristic (ROC) curves for 1-year, 2-year, and 3-year mortality were compared between the CPI and GAP models. All statistical analyses were performed with SPSS™ Version 22.0 (SPSS, Chicago, IL, USA). We considered an adjusted p-value less than 0.05 as statistically significant.
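For readers unfamiliar with the ROC analysis, the area under the curve (AUC) can be understood as the probability that a randomly chosen patient who died has a higher index value than a randomly chosen survivor. The sketch below computes this quantity directly in Python rather than SPSS, which was used for the actual analysis; it does not reproduce the statistical comparison between two AUCs. The data are invented for illustration and are not study data.

```python
def roc_auc(scores, died):
    """Rank-based AUC: P(score of a death > score of a survivor), ties count 0.5."""
    events = [s for s, d in zip(scores, died) if d]
    non_events = [s for s, d in zip(scores, died) if not d]
    if not events or not non_events:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Hypothetical values for six patients (not study data).
cpi_values = [55.0, 30.0, 47.5, 38.0, 62.0, 41.0]
gap_stages = [2, 1, 2, 1, 3, 2]
died_1y = [1, 0, 0, 0, 1, 1]

print("AUC, CPI:", round(roc_auc(cpi_values, died_1y), 3))
print("AUC, GAP stage:", round(roc_auc(gap_stages, died_1y), 3))
```

An AUC of 0.5 corresponds to the reference line of a ROC plot (no discrimination), and 1.0 corresponds to perfect separation of deaths from survivors. An ordinal predictor such as GAP stage produces many ties, each of which contributes 0.5 in this formulation.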

Ethics statement

The Institutional Review Board (IRB) of Seoul National University Bundang Hospital approved this study protocol (IRB approval number: B-1709/420-102). Informed patient consent was waived due to the retrospective nature of our study. All methods were performed in accordance with the Declaration of Helsinki.

Results

Demographic characteristics

The baseline characteristics of the study population are presented in Table 1. The mean age of the study population was 66.4 ± 9.3 years. Male gender was more prevalent than female gender (72.0% versus 23.0%). The most common respiratory symptom was dyspnea (68.0%), followed by cough (58.3%). However, 40 (4.8%) patients were asymptomatic. Mean smoking duration was 35.6 ± 12.7 years and mean smoking amount was 34.8 ± 20.0 pack-years. Seventy-three patients (8.8%) had decreased lung function consistent with COPD (FEV1/FVC < 0.7 and age ≥40 years). A total of 380 (45.7%) patients were diagnosed by the surgical method, and 452 (54.3%) patients were diagnosed on the basis of clinical and radiographic criteria. Mean CPI was 38.6 ± 15.5; 460 (55.3%) patients had a CPI ≤ 41, and 372 (44.7%) patients had a CPI > 41 (Table 2). One-year mortality was 17.3%, 2-year mortality was 24.3%, and 3-year mortality was 29.7%. According to GAP, there were 536 (64.4%) stage I patients, 268 (32.2%) stage II patients, and 28 (3.4%) stage III patients.

Table 1 Baseline characteristics of IPF patients (n = 832).
Table 2 Clinical, radiographic, and physiologic characteristics according to composite physiologic index (CPI).

Tables 2 and 3 show the clinical, radiologic, and physiologic characteristics according to CPI and GAP stage. As with advanced GAP stage, an elevated CPI (CPI > 41) was significantly associated with older age, low lung function, and low arterial oxygen tension (PaO2). However, male sex was significantly more predominant in the group with CPI ≤ 41 than in the group with CPI > 41. Radiologic findings were not significantly different between the two groups (CPI ≤ 41 and CPI > 41).

Table 3 Clinical, radiographic, and physiologic characteristics according to GAP stage.

Advanced GAP stage was significantly associated with older age, reduced lung function, low PaO2, and a higher mortality rate. CPI was significantly increased in patients with advanced GAP stage (P < 0.001). However, among the radiologic findings, only the percentage of honeycombing differed significantly among GAP stages (P = 0.043).

Survival according to predictive models

CPI and GAP stage significantly predicted disease progression according to the Cox proportional hazard model (Table 4). The HR increased with increased CPI score (P < 0.001; HR, 1.025; 95% confidence interval [CI], 1.017–1.034), GAP score (P < 0.001; HR, 1.332; 95% CI, 1.222–1.451), and with advanced GAP stage. The predictive value of CPI and GAP stage for 1-year, 2-year, and 3-year mortality was assessed by ROC curve analysis. Each model showed significant predictive capacity at all time points. ROC curves for all patients with IPF are shown in Fig. 1. The area under the curve (AUC) for 1-year mortality was 0.619 for GAP stage and 0.647 for CPI. The AUC for 2-year mortality was 0.625 for GAP stage and 0.647 for CPI. The AUC for 3-year mortality decreased; it was 0.610 for GAP stage and 0.639 for CPI. CPI had a higher AUC than GAP stage at all time points modelled, but the difference in AUC was not statistically significant (Table S1 [Supplementary Information]). Wells et al. reported that CPI had the greatest prognostic significance compared with lung function results or PaO2 in biopsy-proven IPF. Therefore, we investigated the GAP stage and CPI in surgically diagnosed IPF patients (Fig. S1 and Table S1 [Supplementary Information]). In surgically diagnosed IPF patients, the AUC for 1-year mortality was 0.622 for GAP stage and 0.673 for CPI. The AUC for 2-year mortality was 0.624 and 0.674, respectively. The AUC for 3-year mortality was 0.602 and 0.667, respectively. Generally, CPI was a more accurate predictor of mortality than GAP stage. In clinically diagnosed IPF patients, the two predictive models showed statistically significant AUC values, but no significant differences were found between them (Fig. S2 [Supplementary Information]). The AUC was below 0.62 in clinically diagnosed IPF patients, which was lower than the value in surgically diagnosed IPF patients.
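The hazard ratios for CPI and GAP score look very different in magnitude largely because the two scales differ. The sketch below rescales the reported CPI hazard ratio to a 10-point increase. It assumes that the reported HR refers to a 1-point increase in CPI and that the effect is log-linear, as in a standard Cox model with a continuous covariate; neither assumption is stated explicitly in the text. The function name is illustrative.

```python
import math


def rescale_hr(hr_per_unit: float, units: float) -> float:
    """Hazard ratio for a `units`-point increase, assuming a log-linear effect."""
    return math.exp(units * math.log(hr_per_unit))


# Reported values for CPI (HR and 95% CI), assumed to be per 1-point increase.
reported = {"HR": 1.025, "CI lower": 1.017, "CI upper": 1.034}

for label, value in reported.items():
    print(f"{label} per 10 CPI points: {rescale_hr(value, 10):.2f}")
```

Under these assumptions, a 10-point higher CPI corresponds to a hazard ratio of roughly 1.28 (about 1.18 to 1.40), which is easier to set beside the per-point GAP score hazard ratio of 1.332.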

Table 4 Univariate analysis of survival in idiopathic pulmonary fibrosis using Cox proportional hazard model (3-year survival).
Figure 1

Receiver operating characteristic (ROC) curves of GAP stage and CPI to predict mortality in all IPF patients (n = 832). (A) 1-year mortality, (B) 2-year mortality, and (C) 3-year mortality. Both predictive models significantly predicted mortality. CPI had a higher AUC than GAP stage for 1-year mortality (p = 0.301), 2-year mortality (p = 0.349), and 3-year mortality (p = 0.220), but the differences were not statistically significant. Note: The straight diagonal line is the reference line. AUC, area under the curve; CPI, composite physiologic index; GAP, gender (G, 0–1 point), age (A, 0–2 points), and 2 lung physiology variables (P, FVC and DLCO); IPF, idiopathic pulmonary fibrosis; ROC, receiver operating characteristic.

Discussion

CPI and GAP stage are easy to use and can be calculated in-office during the initial visit and thereafter during follow-up visits. CPI can be calculated from PFT results, and the GAP stage calculation requires only age and gender in addition to PFT results12,13. In this retrospective study, we compared the predictive value of CPI and GAP stages. Both were effective for predicting mortality. However, both models had a limited capability to provide accurate prognoses for IPF patients.

As mentioned previously, predictive models of IPF prognosis rely on numerous clinical factors, physiologic parameters, radiologic features, biomarkers, and pathologic findings5,6,7,8,9,10,11,17,18. Most models developed to date are complex. In this study, we demonstrated that two simple-to-use models (GAP stage and CPI) have important predictive values. CPI was developed in a British study and has the advantage of relying only on PFT data to predict IPF prognosis. Additionally, CPI was developed by considering the severity of emphysema, which could lead to overestimation of lung function19. However, CPI also has disadvantages: it does not consider clinical data such as age, gender, smoking history, presence of desaturation, or 6-minute walk distance. Furthermore, the value of CPI as a prognostic model is not well studied. In the original publication, this issue was examined in only 32 histologically proven subjects with the usual interstitial pneumonia pattern, leading to the hypothesis that CPI may be useful as a prognostic marker12. In this study, we tested CPI as a predictive model for 1-, 2-, and 3-year mortality in a relatively large cohort (n = 832); CPI was effective for prediction of the outcomes of IPF patients (P < 0.001, Table 4), and the AUC was approximately 0.61–0.65 in all patients. Additionally, CPI was more accurate than GAP stage, regardless of time point or diagnostic method.

GAP staging, which was developed in a study in the United States and Italy, uses the clinical data of gender and age, and two sets of physiological data. However, it does not consider HRCT findings. Coexistent emphysema can be a confounding factor because it increases %FVC, which could lead clinicians to predict a good prognosis on the basis of a low GAP score20. In our study, although GAP stage showed a lower AUC value than CPI in predicting 1-, 2-, and 3-year mortality, it was a significant predictor of mortality (Table 4). Similarly, Sharp et al.21 reported that the AUC was lower for the GAP staging system than for CPI at both 12 months and 24 months. Additionally, age showed a higher AUC than the GAP staging system at 12 months in their study. These findings could mean that gender is not a strong predictor of mortality. Furthermore, the pathophysiology of IPF is so complex that even multi-dimensional approaches might fail to predict the prognosis with sufficient accuracy22. Additional assessment of changes in functional lung capacity over time or of biomarkers could be helpful in improving predictability23.

In our study, advanced GAP stage was, as expected, significantly associated with older age, male gender, and poor lung function. However, the proportion of men was significantly lower among patients with elevated CPI (CPI > 41) than among patients with low CPI (CPI ≤ 41). This could mean that gender has a lesser effect on survival compared with age, HRCT, or PFT. Some studies have shown that gender affects survival, but others have reported no association between the two factors6,10,24,25,26. King et al.6 showed there was no significant gender difference in median survival using Kaplan–Meier analysis (P = 0.15; men, 30.0 months; CI: 19.1–44.3; women, 39.3 months; CI: 19.1–44.3). In contrast, Flaherty et al.27 showed that female gender was protective in idiopathic interstitial pneumonia (IIP), and when surgically diagnosed IPF patients were added to the analysis, female gender showed a significantly low HR. Douglas et al.28 also reported that male gender was significantly associated with a worse outcome.

CPI was more precise in surgically proven IPF patients in our study (Figs S1, S2). Generally, physicians are unwilling to biopsy IPF patients who are in medically poor condition29. As a result, surgically diagnosed IPF patients showed a significantly lower GAP index and lower CPI level than clinically diagnosed IPF patients (Supplementary Table 1). These results may mean that the predictive capabilities of CPI and the GAP stage system are more accurate in early IPF patients than in advanced IPF patients, and that these models are not very reliable for longitudinal assessment. Recently, another study demonstrated that previous multi-dimensional indices for IPF were not more powerful as prognostic markers than clinical or physiologic parameters; DLCO was a more powerful prognostic marker on longitudinal follow-up than CPI or GAP stage21. This could mean that although these two models were created taking into consideration age, gender, lung function, and CT findings, these data were not sufficient to predict the outcome because of the heterogeneity of IPF. Further risk assessments (e.g., 6-minute walking test, oxygen demand, hospitalization due to respiratory problems, pulmonary hypertension, acute exacerbation, or lung cancer) should be considered for a more precise prediction of prognosis in IPF, especially for longitudinal follow-up.

Although the GAP stage and CPI models showed significant predictive capability and a significant association with relative risk of mortality (Table 4), the AUC values of the CPI and GAP stage were low in this study, especially those for 3-year mortality with GAP stage. This may be due to ethnic differences, the complexity of the pathophysiology of IPF itself, and the uneven distribution of patients across GAP stages (low numbers in GAP stage III). Kim et al.26 demonstrated that the GAP model failed to predict 3-year mortality in Korean patients. Additionally, 430 (34.1%) patients were lost to follow-up and excluded from this study. The exclusion of this sizeable group could have resulted in selection bias and may have decreased the predictive capabilities of the two models. Furthermore, this study may have included some patients with combined pulmonary fibrosis and emphysema (CPFE). Although the difference in prognosis between IPF and CPFE is still unclear, the inclusion of patients with CPFE could affect the performance of the GAP model30,31,32.

This study had some limitations. First, our patients were diagnosed based on the 2002 ATS/ERS criteria15,16. Therefore, IPF patients who were diagnosed based on the 2011 ATS/ERS/JRS/ALAT guidelines could have a different prognosis1. However, GAP stage and CPI were also first tested in patients who were diagnosed using the 2002 guidelines. Second, we could not correlate HRCT findings with CPI score. Although CPI was also elevated in patients with advanced GAP stage in a previous study from Japan, CPI has not yet been validated in Asian patients33. Third, our study is retrospective in nature, so we excluded patients who did not undergo DLCO testing, many of whom may have belonged to the “cannot perform DLCO” category in the GAP stage model. As a result, the GAP stage III group was small compared with the others, which could result in selection bias.

In conclusion, both GAP stage and CPI showed significant capabilities to predict mortality, and CPI was more accurate than GAP stage in predicting mortality at 1, 2, and 3 years. However, the complexity of IPF and the inconsistencies in physiologic and clinical parameters limit the capability of both models to provide accurate prognoses for IPF patients. Further large-scale prospective studies are needed to investigate a more accurate predictive model.