Diagnostic and prognostic implications of 2018 guideline for the diagnosis of idiopathic pulmonary fibrosis in clinical practice

The purpose of this study was to evaluate the implications of the 2018 updated guideline for the diagnosis of idiopathic pulmonary fibrosis (IPF) in clinical practice compared with the 2011 guideline. This study involved 535 patients, including 339 with IPF and 196 with non-IPF diagnoses, and we retrospectively evaluated CT classifications of usual interstitial pneumonia (UIP) according to both guidelines. Interobserver agreement for the 2018 criteria showed moderate reliability (κ = 0.53), comparable to that for the 2011 criteria (κ = 0.56), but interobserver agreement for probable UIP was only fair (κ = 0.40). A CT pattern of indeterminate for UIP was associated with a better prognosis than the other groups (adjusted hazard ratio [HR] = 0.36, p < 0.001). Compared with possible UIP, probable UIP demonstrated a lower positive predictive value (PPV, 62.9% vs 65.8%). In the analysis of patients with CT patterns of non-definite UIP, diagnosing IPF when the CT pattern showed probable UIP with a lymphocyte count ≤ 15% in BAL fluid, and either male sex or age ≥ 60 years, showed a high specificity of 90.6% and a PPV of 80.8% in the validation cohort. The 2018 criteria provide better prognostic stratification than the 2011 criteria in patients with possible UIP. BAL fluid analysis can improve the diagnostic certainty for IPF diagnosis in patients with a probable UIP CT pattern.

The purpose of the current study was to validate the latest 2018 diagnostic guideline for IPF in a cohort of patients with fibrosing interstitial lung disease (ILD) and to evaluate its diagnostic and prognostic implications in clinical practice compared with the previous 2011 guideline. The interobserver agreement, diagnostic performance and survival outcomes were compared. Furthermore, the added diagnostic value of cellular analysis of BAL fluid was assessed in patients with CT patterns of probable UIP, indeterminate for UIP and alternative diagnoses, following the guideline.

Results
The study cohort included 535 patients (mean age 60.8 ± 9.0 years, 348 men) with fibrosing ILD (Fig. 1). The clinical characteristics of these patients are summarized in Table 1.
In the validation cohort, the addition of lymphocyte count in BAL fluid to CT pattern alone or together with either older age or male sex improved specificity, PPV and diagnostic accuracy (Table 5). Diagnosing IPF when the CT pattern showed probable UIP with a lymphocyte count ≤ 15% in BAL fluid, and either male sex or age ≥ 60 years, showed a high specificity of 90.6% (95% CI: 79.3-96.9) and a PPV of 80.8% (95% CI: 63.4-91.1). The model incorporating the result of BAL fluid analysis had consistently higher net benefits than the other models for risk thresholds > 15% in the validation cohort of non-definite UIP CT pattern and for risk thresholds > 20% in the validation cohort of probable UIP CT pattern (Fig. 3). The net reclassification improvement of the model incorporating the result of BAL fluid analysis over the model of age and sex was 0.75 (95% CI: 0.19-1.31) in the validation cohort of probable UIP CT pattern, indicating a significant improvement in classification accuracy for IPF diagnosis (p = 0.007).
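The net benefit used in the decision curve analysis is defined in the caption of Figure 3 as the proportion of true-positive results minus the proportion of false-positive results weighted by the ratio of the risk threshold to 1 minus the risk threshold. The following is a minimal Python sketch of that definition only, not the code used in this study (the analyses were run in SPSS and R). The function names and the toy data are hypothetical and serve purely as illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling a patient 'IPF' when predicted risk >= threshold.

    y_true: list of 0/1 labels (1 = IPF), hypothetical toy input.
    y_prob: list of predicted probabilities from any prediction model.
    threshold: risk threshold, strictly between 0 and 1.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true positives and false positives among patients classified positive.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Weight for false positives: odds of the risk threshold.
    weight = threshold / (1.0 - threshold)
    return tp / n - weight * fp / n


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every patient as positive."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


if __name__ == "__main__":
    # Toy labels and predicted risks (illustrative, not study data).
    labels = [1, 1, 0, 0]
    risks = [0.9, 0.4, 0.6, 0.1]
    for t in (0.15, 0.20, 0.50):
        print(t, net_benefit(labels, risks, t), net_benefit_treat_all(labels, t))
```

Plotting the net benefit of each model against a range of thresholds, together with the "treat all" and "treat none" (net benefit of zero) reference strategies, gives a decision curve of the kind shown in Fig. 3.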

Discussion
The present study showed that application of the latest 2018 diagnostic criteria for IPF proposed by ATS/ERS/JRS/ALAT resulted in the reclassification of patients categorized as having possible UIP based on 2011 criteria into two categories, probable UIP and indeterminate for UIP. Prognoses differed significantly among patients grouped by both 2018 and 2011 criteria, with patients classified as indeterminate for UIP according to 2018 criteria showing significantly better survival than the other groups. Although diagnosing IPF based on a CT pattern of probable UIP on 2018 criteria showed an increase in sensitivity, its specificity and PPV remained insufficient. The latest ATS/ERS/JRS/ALAT guideline recommends cellular analysis of BAL fluid for patients with newly detected ILD of unknown cause who are clinically suspected of having IPF and have an HRCT pattern of probable UIP, indeterminate for UIP or an alternative diagnosis.
Figure 3. Decision curve analysis of prediction models for the diagnosis of idiopathic pulmonary fibrosis in patients with non-definite CT pattern in the validation cohort. (a) Net benefits (proportion of true-positive results minus weighted proportion of false-positive results, with weight equal to the ratio of risk threshold to 1 minus risk threshold) of the combined model of probable UIP CT pattern, demographic characteristics (sex and age) plus low lymphocyte count in bronchoalveolar lavage fluid were comparable to or higher than those of models with CT pattern alone or combined with age and sex for risk thresholds > 15% in patients with CT pattern of non-definite UIP in the validation cohort. (b) Net benefits of the combined model of demographic characteristics (sex and age) plus low lymphocyte count in bronchoalveolar lavage fluid were higher than those of models with demographic characteristics alone for risk thresholds > 20% in patients with CT pattern of probable UIP in the validation cohort.
In our study, the inclusion of BAL fluid analysis increased diagnostic performance, including specificity and PPV, compared with a model that included CT pattern and the clinical features that increase the likelihood of IPF, i.e. older age and male sex. A CT pattern of probable UIP has been found to indicate a higher likelihood of UIP on biopsy 8-10 , especially when compared with a CT pattern of indeterminate for UIP (82.4% vs 54.2%; p = 0.01) 11 . Our study found, however, that the proportion of patients diagnosed with IPF did not differ significantly between patients with CT patterns of probable UIP and those with indeterminate for UIP (62.9% vs 78.6%; p = 0.05). The discrepant results might be due to the different reference outcomes used (i.e. histologic UIP vs multidisciplinary diagnosis of IPF) and to different proportions of diseases in the tested cohorts, as there were small numbers of patients with NSIP (9%) and patients with connective tissue disease were not clearly excluded in the study by Chung et al. 11 . In addition, emphysema of more than moderate degree coexisted in patients who were classified as indeterminate for UIP. Emphysema can affect the evaluation of the CT pattern and lead to an interpretation of indeterminate for UIP, ultimately increasing the proportion of IPF among patients with a CT pattern of indeterminate for UIP. However, this was not addressed in the 2018 guideline.
This result also affected the diagnostic performance: a CT pattern of probable UIP demonstrated a slightly higher specificity (63.3% vs 57.6%) but a lower PPV (62.9% vs 65.8%) than the possible UIP category based on 2011 criteria. As demonstrated in our study, the PPV varies with the study population and may not be satisfactory, and among patients with a probable UIP CT pattern there can still be significant heterogeneity of underlying disease, since a certain proportion of patients with iNSIP can show similar CT findings. This argues against the assertion that surgical lung biopsy can be forgone in patients with a probable UIP CT pattern. When fibrosing ILD without an identifiable cause is encountered in clinical practice and IPF is suspected, the most problematic differential diagnoses that should be distinguished from IPF are iNSIP and cHP. Moreover, interobserver agreement for probable UIP was only fair (κ = 0.40), which is unsatisfactory for a final confirmatory diagnostic criterion used in clinical decision making to guide further invasive diagnostic procedures. Among patients with a CT pattern of probable UIP, prognosis also differed between IPF and non-IPF (i.e. iNSIP) patients, with shorter survival time in IPF patients, which was consistent with a previous study 12 . This emphasizes that patients with a CT pattern of probable UIP are still a heterogeneous group, and that increasing the clinical likelihood of IPF is important. Diagnosing IPF based on a probable UIP CT pattern should be applied carefully, in a selected population with a high clinical likelihood of IPF. Although our study cohort included only the major diseases that can cause fibrosing ILD rather than all of them, we think that it better reflects the problems regarding the diagnosis of IPF in real clinical practice.
The discriminatory value of BAL in the real-world population is poorly defined and remains controversial even among experts [13][14][15] . In this study of an Asian cohort, adding BAL to rule out non-IPF diagnoses provided better diagnostic performance for diagnosing IPF in patients with non-definite UIP CT patterns. We found that the optimum lymphocyte count cut-off was 15%, a percentage lower than expected. BAL lymphocyte content is similar in IPF and healthy control subjects 16 , and when increased in IPF it is associated with moderate to severe alveolar septal inflammation, raising the possibility of an alternative disease, such as cHP, iNSIP or connective tissue disease-associated ILD. Many studies have demonstrated differences in BAL cell counts between lung diseases, but their variability has always been found to be great 17,18 , resulting in considerable overlap that might not allow a reliable diagnosis in individual patients regardless of statistical significance. However, BAL can be used as a safeguard and an important complementary method for the optimal selection of patients with a probable UIP CT pattern to undergo a diagnostic surgical biopsy.
Patients categorized as indeterminate for UIP were a heterogeneous group that included the majority of heavy smokers and patients with emphysema, in accord with Tzilas et al. 19 , as well as patients with early-stage ILD showing better pulmonary function. Moreover, regardless of the final diagnosis of IPF or non-IPF, patients with a CT pattern of indeterminate for UIP had a good prognosis. Our study showed that the prognosis of patients with a CT pattern of possible UIP can be better stratified based on 2018 criteria.
There are several limitations in our study. The main limitation is the lack of an external validation cohort with which to confirm our findings. However, the scarcity of well-characterised populations of IPF patients, even in tertiary centers, is well recognised. External validation in a cohort with a different prevalence of disease, or a prospective study, would help provide further evidence for the routine use of BAL. Second, although we included patients who underwent surgical lung biopsy to build a cohort with less diagnostic uncertainty, this can cause selection bias: the IPF patients in particular may include more patients with atypical features than those who did not undergo lung biopsy, which can lower the PPV in patients with a probable UIP CT pattern. However, the rate of biopsies has changed over time, and many patients with typical clinical and imaging features also underwent biopsies in the early period and are included in the study cohort. Third, we did not include all fibrosing ILDs that can be encountered in real clinical practice, and our study cohort may not fully reflect the real prevalence of fibrosing ILD. However, we included iNSIP and cHP since those are the major fibrosing ILDs that should be differentiated from IPF in clinical practice. Lastly, this was a single-center retrospective study, and all three readers, thoracic radiologists, were from the same institution, which can limit the generalizability of the results.
In conclusion, the 2018 ATS/ERS/JRS/ALAT diagnostic criteria for IPF provide a better prognostic stratification in patients with CT pattern of possible UIP than 2011 criteria. However, diagnosis of IPF based on CT

Materials and methods
Study population. This retrospective study was approved by the Institutional Review Board of Asan Medical Center (IRB number 2018-1284), which waived the requirement for written informed consent. Patients newly diagnosed with IPF and a clinically relevant control group consisting of patients with iNSIP and cHP who underwent surgical lung biopsy at Asan Medical Center, Seoul, South Korea, between July 1995 and January 2016 were included (Fig. 1). The final diagnoses were revalidated through multidisciplinary discussion by clinicians, radiologists and pathologists with reference to the 2011 ATS/ERS/JRS/ALAT guideline 20 . Clinical data (presentation, antigen exposures, smoking status, associated disease and lung function changes) and radiological and histopathological findings were discussed for the final diagnosis. A diagnosis of iNSIP was made only when the histopathologic features suggested fibrotic NSIP with compatible radiological findings 21,22 . A diagnosis of cHP was made only when the histopathological features were typical, showing the presence of poorly formed nonnecrotizing granulomas with chronic fibrosing interstitial pneumonia or airway-centered fibrosis on surgical lung biopsy 23,24 . For the diagnosis of cHP, the presence of an inciting antigen, compatible HRCT imaging, and typical histopathologic findings were considered during the multidisciplinary discussion. The results of BAL fluid analysis were not considered for the multidisciplinary diagnosis in our institution. In our center, the clinical diagnosis of IPF in patients not subjected to surgical lung biopsy is reached by first excluding other known causes of ILD, and then by the presence of a UIP pattern on HRCT. In patients who underwent surgical lung biopsy, the diagnoses are based on combinations of HRCT and the surgical lung biopsy pattern followed by multidisciplinary discussion.
The rate of biopsy in patients with IPF in our center was about 30% between 2004 and 2017, as reported in a previously published study 25 . However, the rate of biopsies changed over the long study inclusion period from 1995 to 2016, and many patients with typical clinical and imaging features also underwent biopsies in the early period of study inclusion. In IPF patients of our study cohort, a histopathological UIP pattern was present on surgical lung biopsy in patients with a CT pattern of alternative diagnosis, and either a histopathological UIP pattern or probable UIP pattern was present in patients with a CT pattern of indeterminate for UIP. Patients with auto-immune features (i.e., marked lymphoid follicles or plasmacytosis) had been excluded beforehand and were not screened for study inclusion, as we only included patients with a confirmed diagnosis of IPF, iNSIP or cHP. However, 4 patients who were finally diagnosed with a connective tissue disease and 6 patients with unclassifiable ILD were excluded during the retrospective revalidation of their diagnoses. Patients for whom histopathologic evaluation or baseline CT was unavailable and those who were proven to have isolated cellular NSIP were excluded. Clinical data, including age, sex, smoking history, the results of 6-min walk tests, pulmonary function tests, BAL fluid analysis, histopathologic findings of surgical lung biopsy, and survival were collected. BAL was performed in accordance with the guidelines 26 . All methods were performed in accordance with the relevant guidelines and regulations.
CT evaluation. HRCT scans were performed in both supine and prone positions. HRCT scans were independently evaluated by three thoracic radiologists (K.D., E.J.C. and J.C., with 18, 15 and 4 years of experience in thoracic radiology, respectively), who were blinded to clinical data and final diagnosis. HRCT scans were simultaneously classified into three categories (UIP, possible UIP or inconsistent with UIP) according to 2011 criteria, and into four categories (UIP, probable UIP, indeterminate for UIP or alternative diagnosis) according to 2018 criteria 5,20 .
Statistical analysis. All statistical analyses were performed using SPSS (IBM SPSS Statistics for Windows, Version 21.0, IBM Corp., Armonk, N.Y., USA, www.ibm.com/products/spss-statistics) and R (R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, www.R-project.org) software. To assess the differences in variables between groups, the t-test or Mann-Whitney U-test was used for continuous variables and the χ2 test was used for categorical variables. The generalized interobserver agreement for all three readers was evaluated using Fleiss' κ. Cox proportional hazards models with adjusted survival curves were used to evaluate the associations between CT pattern and survival time. The prognostic impacts were adjusted for age, sex, treatment history of anti-fibrotic agent, baseline FVC and baseline DLCO.
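For readers unfamiliar with Fleiss' κ, the following is a minimal Python sketch of the standard formula for a fixed number of readers per scan. It is an illustration only: the study's analyses were run in SPSS and R, and the function name, input layout and toy counts below are assumptions made for this example, not study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of category counts.

    counts: list of rows, one row per subject (e.g. one HRCT scan);
            counts[i][j] is the number of readers who assigned subject i
            to category j. Every row must sum to the same number of readers.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of readers")
    n_categories = len(counts[0])

    # Overall proportion of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject, then averaged over subjects.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy example: 4 scans, 3 readers, 4 categories in the order
    # (UIP, probable UIP, indeterminate for UIP, alternative diagnosis).
    toy = [
        [3, 0, 0, 0],
        [1, 2, 0, 0],
        [0, 1, 2, 0],
        [0, 0, 0, 3],
    ]
    print(fleiss_kappa(toy))
```

A value of 1 indicates perfect agreement among the readers and values near 0 indicate agreement no better than chance. The sketch is undefined when every rating falls in a single category, because the chance agreement then equals 1.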
To evaluate the added value of BAL fluid analysis for diagnosing IPF in patients with non-definite UIP CT patterns (probable UIP, indeterminate for UIP and alternative diagnosis), such patients were randomly divided into development and independent validation cohorts (3:2 ratio). The associations between an IPF diagnosis and the results of BAL fluid analysis, along with CT pattern and clinical characteristics such as old age (≥ 60 years) and male sex, were assessed by logistic regression analyses 5 . The best cutoff for cell count (%) in BAL fluid in the development cohort was determined using Youden's index 27 . The sensitivity, specificity, PPV, and NPV for the diagnosis of IPF were calculated for each regression model. Diagnostic performance was quantified by measuring the AUC, and the net reclassification improvement was evaluated to quantify the usefulness added by BAL fluid analysis 28 . The clinical utility of each prediction model was assessed by decision curve analysis, which quantifies the net benefits at different threshold probabilities 29,30 . A p value < 0.05 was considered statistically significant.
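The cutoff selection and the diagnostic measures described above can be illustrated with a short Python sketch. It assumes, as in this study, that a BAL lymphocyte percentage at or below the cutoff counts as a positive result for IPF, and it chooses the cutoff that maximizes Youden's index J = sensitivity + specificity - 1. The function names and the toy values are hypothetical, and this is not the code used for the published analysis.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)

    def ratio(num, den):
        # A measure is undefined when its denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_cutoff(values, y_true):
    """Return (cutoff, J) maximizing Youden's index for the rule 'value <= cutoff'.

    values: measurements such as BAL lymphocyte percentages.
    y_true: 0/1 labels (1 = IPF); both classes are assumed to be present.
    """
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        y_pred = [1 if v <= cutoff else 0 for v in values]
        m = diagnostic_metrics(y_true, y_pred)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # Toy lymphocyte percentages and diagnoses (illustrative, not study data).
    lymph = [5, 10, 12, 20, 30, 40]
    ipf = [1, 1, 1, 0, 0, 0]
    cutoff, j = youden_cutoff(lymph, ipf)
    print(cutoff, j)
    print(diagnostic_metrics(ipf, [1 if v <= cutoff else 0 for v in lymph]))
```

In keeping with the design described above, the cutoff would be chosen in the development cohort only and then applied unchanged to the validation cohort when computing sensitivity, specificity, PPV and NPV.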