Introduction

Chronic obstructive pulmonary disease (COPD) is a systemic disease with a great impact on patients’ lives.1,2 The majority of COPD patients are treated in primary care and are usually seen by their general practitioner at regular intervals. From a patient’s perspective, one of the most important outcomes is health-related quality of life (HRQL).3 To support management and shared decision making on lifestyle changes and treatment in COPD patients in primary care, it would be useful to predict (changes in) their future course of HRQL based on current characteristics.

Prediction models can serve different purposes. For management of COPD, one important purpose is to inform patients about the future course of their disease and assist physicians and patients with treatment decisions. In addition, these models can support selection of the required patient spectrum for scientific research on COPD4 and efficient subgroup analyses in randomised trials.5 On average, prediction using formal prediction models has been shown to be at least as accurate, more consistent and less expensive than prediction by experienced clinicians.6

The majority of prediction models developed for COPD patients predict mortality,7–18 although resource utilisation,19 health status,20 hospitalisation7,16,19 and exacerbations7,21,22 have also been used as outcomes. Although HRQL is arguably the most relevant outcome from a patient’s perspective, so far the health-activity-dyspnoea-obstruction (HADO) score seems to be the only existing score that predicts HRQL.10 Specifically, the HADO index was developed for classifying severity of COPD. It predicts HRQL, as measured by several questionnaires (the SF-36 health survey, the St George’s Respiratory Questionnaire and the chronic respiratory questionnaire), and 3-year mortality using information on dyspnoea, overall health and physical activity. The HADO score was derived in men, discriminated only moderately between patients with different levels of HRQL as measured by the chronic respiratory questionnaire (R²=0.21), and was not corrected for overoptimism.10

We describe here the development of prediction models for domain-specific and overall HRQL in primary care COPD patients. We have taken into account that deteriorations in, for example, the dyspnoea domain may trigger consideration of different therapeutic actions than similar deteriorations in, for example, the emotional domain. We use nomograms to visualise the impact of the different predictors on all outcomes: dyspnoea, fatigue, emotional function, mastery and overall HRQL at 6 and 24 months.

Materials and Methods

Study design and population characteristics

Our analyses were based on the international prospective cohort study (International Collaborative Effort on Chronic Obstructive Lung Disease: Exacerbation Risk Index Cohorts (ICE COLD ERIC)) with 409 primary care COPD patients from Switzerland and the Netherlands. Details of the study design and the baseline characteristics of the patients were published previously.23–26 Briefly, 409 COPD patients from primary care were included in 2008/2009. Of these, 66%, 25% and 9% were in GOLD stages II, III and IV, respectively, and 41%, 21%, 15% and 23% were in GOLD stages A–D, respectively. The study is ongoing and will be completed in 2014 after 5 years of follow-up. Patients are followed biannually through telephone interviews and live visits at baseline and after 2 and 4 years of follow-up. The analyses reported here are based on the 2-year follow-up data. The study has been approved by all local medical ethics committees (Academic Medical Center, University of Amsterdam, The Netherlands; Kanton of Zurich, Switzerland and Kanton of St Gallen, Switzerland) and all patients provided written informed consent.

Outcome

The outcome was COPD-specific HRQL at 6 and 24 months as measured by the self-administered chronic respiratory questionnaire (CRQ).27,28 The CRQ consists of 20 questions providing a total score and four domain-specific summary scores for dyspnoea, fatigue, emotional function and mastery, all on 7-point scales, where 1 indicates the worst and 7 the best possible score.

Candidate predictors

We used the data collected at baseline to develop the prediction models. All candidate predictors were specifically selected on the basis of their likely predictiveness and practicality in primary care. Details of the candidate predictors and the data at baseline were published previously.23,24 Table 1 summarises all candidate predictors.

Table 1 Candidate predictors and their characteristics

Nomograms

We simplified the potential application of the prediction models by creating nomograms. For reasons of readability and practicality, some predictors that were initially retained by the lasso procedure (see below) were dropped from the nomograms as they turned out to be weak predictors. We defined ‘weak’ as an effect of <5 on the upper points scale (see nomogram in Figure 3), which corresponds to a change of <0.1 on the HRQL outcome scale, clearly less than the minimal important difference of 0.5.29
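The threshold for a 'weak' predictor can be illustrated with a small calculation. The sketch below assumes that the points scale is linear in the outcome, so that 5 points correspond to 0.1 CRQ units (50 points per unit), as stated above. The function names, coefficients and predictor ranges are hypothetical and are not taken from the fitted models.

```python
# Scale implied by the text: 5 points on the upper scale ~ 0.1 on the CRQ scale,
# i.e. 50 points per 1.0 CRQ unit (assumes a linear points scale).
POINTS_PER_CRQ_UNIT = 50.0


def effect_in_points(coefficient, low, high):
    """Largest contribution of a predictor across its range [low, high], in points."""
    return abs(coefficient) * (high - low) * POINTS_PER_CRQ_UNIT


def is_weak(coefficient, low, high, threshold_points=5.0):
    """True if the predictor's full-range effect stays below the points threshold."""
    return effect_in_points(coefficient, low, high) < threshold_points


# Hypothetical predictors measured on a 0-8 scale (illustrative values only).
weak_example = is_weak(0.01, 0, 8)    # 0.08 CRQ units, about 4 points
strong_example = is_weak(0.03, 0, 8)  # 0.24 CRQ units, about 12 points
```

Under these assumptions the first hypothetical predictor would be dropped from the nomogram and the second would be kept.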

Figure 3

Nomogram for CRQ dyspnoea outcome at 6 months. FEV1, forced expiratory volume in 1 s; FT, feeling thermometer; CRQ, chronic respiratory questionnaire.

Missing data

Missing data were multiply imputed (10 times) via chained equations (mice package in R version 2.15.2).30 Only patients who were alive at 6 or 24 months were considered.

Statistical analysis

Predictor selection and model fitting were based on penalised linear regression using the “least absolute shrinkage and selection operator” (lasso).31,32 The lasso allows for automatic variable selection on the basis of predictive value (see Supplementary Appendix for details). In summary, the lasso fits the regression model with a penalty on the sum of the absolute values of the regression coefficients, such that some coefficients are set to exactly zero (deselecting the corresponding variables) whereas the others are shrunk towards smaller (absolute) values. Compared with standard backward selection, the additional shrinkage is expected to improve model performance in new patients.
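For orientation, the lasso criterion for a continuous outcome can be written as below. This is the generic textbook form, which to our understanding matches the one documented for the glmnet package with a Gaussian outcome. The notation is introduced here for illustration and is not taken from the Supplementary Appendix: y_i is the CRQ outcome of patient i, x_ij the value of candidate predictor j, n the number of patients, p the number of candidate predictors and λ ≥ 0 the penalty parameter.

```latex
(\hat{\beta}_0, \hat{\beta}) =
  \underset{\beta_0,\,\beta}{\arg\min}\;
  \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

With λ = 0 this reduces to ordinary least squares. As λ increases, more coefficients become exactly zero and the remaining ones shrink. How λ was tuned is not described in this section.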

Calibration of the final model was assessed by comparing the means of predicted values against the means of observed outcomes by deciles of predicted values. Model discrimination was expressed as explained variance (EV). EV expresses the increase in predictive performance relative to a model that does not include any predictor. For the measure of predictive value that we used, EV is the same as R². Prediction models often perform well in the data set in which they were developed, but their predictive performance may be lower when they are applied to new patients or other populations. It is possible to (partly) correct for this so-called overoptimism. Although the lasso is expected to show less overoptimism than backward elimination, we still corrected for overoptimism in calibration and discrimination performance using bootstrap resampling.33 No confidence intervals are given, because there are currently no validated methods to calculate them when bootstrap-based correction for overoptimism is used. Analyses were done using Stata (version 10.1), the R statistical computing environment (version 2.15.2) and the caret and glmnet packages in R. See Supplementary Appendix for more elaborate statistical information.
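The idea of a bootstrap correction for overoptimism can be sketched as follows. This is a minimal illustration, not the authors' analysis. It uses ordinary least squares with a single predictor instead of the lasso, simulated data instead of the cohort, and the commonly used optimism bootstrap, in which the model is refitted in each bootstrap sample and its EV in that sample is compared with its EV in the original data. All names and numbers are assumptions made for the example.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (one predictor, for brevity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def explained_variance(model, x, y):
    """1 - SSE/SST: gain over a model that predicts the mean for everyone."""
    a, b = model
    my = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


def optimism_corrected_ev(x, y, n_boot=200, seed=1):
    """Return (apparent EV, estimated optimism, optimism-corrected EV)."""
    rng = random.Random(seed)
    n = len(x)
    apparent = explained_variance(fit_line(x, y), x, y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        model = fit_line(bx, by)                    # refit in the bootstrap sample
        optimism += (explained_variance(model, bx, by)
                     - explained_variance(model, x, y))
    optimism /= n_boot
    return apparent, optimism, apparent - optimism


# Simulated baseline and follow-up scores (hypothetical, not study data).
sim = random.Random(0)
baseline = [sim.uniform(1, 7) for _ in range(80)]
followup = [1.5 + 0.6 * b + sim.gauss(0, 0.8) for b in baseline]
apparent, optimism, corrected = optimism_corrected_ev(baseline, followup)
```

The corrected value is the apparent EV minus the average optimism across bootstrap samples. In a real analysis the whole modelling procedure, including predictor selection, would be repeated in every bootstrap sample.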

Results

Table 2 shows selected predictors and their regression coefficients for all prediction models. For all CRQ outcomes, the best predictor was that particular CRQ score at baseline (regression coefficients 0.66, 0.63, 0.56 and 0.43 for dyspnoea, fatigue, emotional function and mastery, respectively). For each CRQ outcome some additional predictors were selected, such as the feeling thermometer for dyspnoea and the HADS depression score for fatigue. See Supplementary Appendix Table 1 for all regression equations.

Table 2 Selected predictors and their regression coefficients

Discrimination performance of the models

Figure 1 shows the EVs for all outcomes (four domains and overall HRQL) for 6 and 24 months. EV scores ranged from 0.23 to 0.58 and were higher for 6-month than for 24-month prediction. The mastery domain had the lowest values. See Supplementary Appendix Table 2 for all EVs.

Figure 1

Explained variance (EV) of the prediction models at 6 and 24 months for mastery, emotional function, fatigue, dyspnoea and overall health-related quality of life.

Calibration performance of the models

Figure 2 visually displays calibration for the dyspnoea outcome. The other models were similar (see Supplementary Appendix Figures 1 and 2). For at least 90% of the deciles, the predicted CRQ values did not differ from those observed by >0.5, the minimal important difference.29 Note that CRQ dyspnoea scores were relatively high, which is to be expected in this primary care cohort.
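The decile-based calibration check can be sketched in a few lines. The example below assumes only what is described in the text: patients are grouped into deciles of predicted value, the mean predicted and mean observed scores are compared per decile, and a difference of at most 0.5 (the minimal important difference) counts as acceptable. The function names and the toy data are hypothetical.

```python
def calibration_by_decile(predicted, observed, n_groups=10):
    """Mean predicted and mean observed outcome per group of predicted values."""
    pairs = sorted(zip(predicted, observed))  # order patients by predicted score
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_obs = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_pred, mean_obs))
    return rows


def share_within_mid(rows, mid=0.5):
    """Fraction of groups whose mean prediction is within +/- mid of the observed mean."""
    within = sum(1 for p, o in rows if abs(p - o) <= mid)
    return within / len(rows)


# Toy data: predictions that are systematically 0.2 CRQ points too low.
toy_predicted = [1.0 + 0.05 * i for i in range(100)]
toy_observed = [p + 0.2 for p in toy_predicted]
toy_rows = calibration_by_decile(toy_predicted, toy_observed)
toy_share = share_within_mid(toy_rows)
```

In this toy example every decile lies within the 0.5 band, so the share is 1.0. A share of at least 0.9 would correspond to the statement above.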

Figure 2

Calibration curve for the dyspnoea model at 6 months. x axis, predicted CRQ score; y axis, observed CRQ score (scores range from 1 (worst) to 7 (best)); (──) diagonal, x=y, perfect prediction; (- - -) regression line, note that predicted score > observed score up to a predicted score of 4.5, and predicted score < observed score for values above 4.5; (······) ±0.5 (minimal important difference); (■) deciles, note that all deciles remained within the 0.5 range, meaning that the average per decile is within the limits of the minimal important difference; grey numbers, all predicted values per decile. Note that CRQ dyspnoea scores are relatively high, which is expected in this primary care cohort.

Nomograms

Figure 3 shows the nomogram to predict dyspnoea at 6 months. Figure 4 illustrates how the nomogram should be used and read off. See Supplementary Appendix Figure 3 for the nomograms for the other outcomes. All predictors can be read from these nomograms, as well as their contribution to the prediction.

Figure 4

Example of using the nomogram. FEV1, forced expiratory volume in 1 s; FT, feeling thermometer; CRQ, chronic respiratory questionnaire. From each predictor scale, draw a vertical line up to the points scale (upper scale) and sum all points. Next, mark the sum on the total points scale, draw a vertical line through the outcome scale (here dyspnoea at 6 months) and read off the predicted outcome.

Discussion

Main findings

We found that COPD-specific HRQL after 6 and 24 months in primary care COPD patients could be reasonably well predicted by the corresponding domain-specific score at baseline. As expected, 6-month predictions turned out to be better than 24-month predictions, and dyspnoea, fatigue and emotional function were easier to predict than mastery. These models could be improved by adding between one and six other predictors to the strongest predictor, such as the HADS, the feeling thermometer and the other domain-specific CRQ scores. The predictions were close to the average observed values and within the limits demarcated by the minimal important difference of 0.5.29 This indicates good calibration. Explained variances (EV) were high (>0.4), with the exception of mastery, whereas the HADO score from an earlier published model had an EV of 0.21.

Candidate predictors

All candidate predictors were selected on the basis of their likely predictiveness of HRQL and their practicality in primary care settings. According to Tsiligianni et al.,34 HRQL is strongly associated with dyspnoea, depression, anxiety and exercise tolerance, all of which were candidate predictors in our study. Some potentially important but more difficult to collect predictors may have been omitted, such as the 6-minute-walk test to measure exercise capacity. We decided to use the more practical 1-minute sit-to-stand and handgrip strength tests, which are strongly associated with HRQL. With regard to assessing self-efficacy, we used three questions, which were not formally validated.

There were few missing data. In particular, only 19 out of 409 patients were lost to follow-up (4.6%) at 24 months. With regard to predictors, across eight candidate predictors <1% of values were missing and for one candidate predictor 7% of values were missing (see Table 1). All other data were complete. Multiple imputation was used for missing values.

Strengths and limitations of this study

Our study has some limitations. First, as with all prediction models based on current practice data, major changes in treatment practice may decrease the predictive value of the models. For example, if a new and effective intervention is developed and widely used in COPD care, our models could lose some predictive value. Second, we only modelled main effects and no interactions. We cannot exclude that particular combinations of two or more candidate predictors have additional predictive value over and above that provided by the individual candidate predictors. Third, although we corrected for overoptimism, the models were only validated internally, that is, within the same data set. Therefore, the models may benefit from formal validation on data from other patients (external validation).

We see the following strengths of our study. First, HRQL is one of the most important outcomes from a patient’s perspective and the primary outcome measure of the ICE COLD ERIC cohort. The CRQ, which is a validated questionnaire,27,28 enabled us to exploit its four domains and develop domain-specific models to assist (shared) decision making directed at clinically clearly demarcated outcomes. Another practical advantage of the CRQ is that it is a patient-reported outcome. Second, in both countries the patients were recruited from primary care and all candidate predictors were explicitly selected on the basis of their practicality in primary care settings. Third, all questionnaires were validated except for the self-efficacy questions. Fourth, for each country an adjudication committee of experienced general practitioners and pulmonologists assessed the exacerbations. Fifth, practical nomograms were developed facilitating the use of the models in everyday care. Finally, the use of advanced statistical methods enabled us to correct for overoptimism and increase the external validity of our models. The lasso method also reduced the size of the models keeping them as practical as possible in busy practice.

Interpretation of findings in relation to previously published work

So far, only the HADO score has been developed to predict HRQL in COPD patients. Unfortunately, the HADO score was derived in men, showed only moderate discrimination for HRQL (R²=0.21) and was not corrected for overoptimism.10

Implications for future research, policy and practice

In COPD research, many prediction models predicting death (or age at death) exist.7–18 There are a few models that predict the probability of new exacerbations as a function of patient characteristics.7,21,22 These models have in common that age and the number of exacerbations in the previous year are among the strongest predictors of the two phenomena that these models predict, respectively. From a patient’s perspective, one of the most important outcomes is (disease-specific) HRQL.3 The burden of COPD differs between patient groups; some are almost untouched by the disease whereas others can be completely handicapped. These differences can be captured by health-related quality of life, namely, that part of quality of life (ability to enjoy normal life activities) that is determined by health. HRQL includes several dimensions such as general health status (an overall evaluation of a person’s health), mental and psychological status, the ability to perform social activities and so on. COPD-specific HRQL is the potential impact of COPD on the HRQL. The GOLD guidelines recommend using repeated measurements on COPD-specific HRQL questionnaires for monitoring and follow-up. Still, performing repeated measurements alone will not improve a patient’s HRQL. A major goal in the management of patients with COPD is to ensure that the burden of the disease is as limited as possible and the COPD-specific HRQL is as good as possible.3,34

Our current study shows that COPD-specific HRQL is no exception to the rule that a previous measurement of a phenomenon usually is a strong predictor of (the probability of) the next occurrence. A prediction model as such only does what the name suggests: predict. By itself, it does not change the course of disease. The latter, arguably the most important aim, comes about by acting on the prediction appropriately. Evidence external to the prediction model studies is needed to learn which actions may change the disease course or the probabilities of any untoward events predicted.

In daily clinical practice, our models can be used to inform patients about future HRQL. Since all predictors are available in primary care, general practitioners can use the models to predict their patients’ courses in different domains of, and in overall, HRQL. All predictors in the models are medically plausible in their capacity to predict HRQL. A commonly used predictor, such as FEV1, does not seem to be a strong predictor of HRQL. Depending on the outcome per domain, general practitioners and patients can discuss and try to prioritise different treatment actions. Suppose our models predict that, in the next 6 months, a patient will clearly decline in the dyspnoea domain. It is known that pulmonary rehabilitation has a beneficial effect on all domains of HRQL and on dyspnoea in particular.35,36 According to Lacasse et al.,37 on average, pulmonary rehabilitation improves the CRQ dyspnoea score by more than 1 point, clearly exceeding the minimal important difference of 0.5. Our models can be used in daily clinical practice to show patients their expected course on (different domains of their) HRQL and, in the case of marked decline in one or more domains, they may assist the physician and patient to prioritise treatment decisions. In our example patient, one may discuss with the patient the option of a pulmonary rehabilitation programme to prevent the expected decline in the dyspnoea domain.
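How the predictions might feed into such a discussion can be sketched as follows. The example assumes that a 'marked' decline means a predicted drop from baseline of at least the minimal important difference of 0.5. This operational definition, the function name and the patient values are our own illustrative assumptions and are not part of the models.

```python
MID = 0.5  # minimal important difference on the CRQ scale


def domains_with_marked_decline(baseline, predicted, mid=MID):
    """Domains whose predicted score is at least `mid` below the baseline score."""
    return [d for d in baseline if predicted[d] - baseline[d] <= -mid]


# Hypothetical patient: baseline CRQ scores and model predictions at 6 months.
baseline_scores = {"dyspnoea": 5.0, "fatigue": 4.5,
                   "emotional function": 5.0, "mastery": 5.5}
predicted_scores = {"dyspnoea": 4.25, "fatigue": 4.5,
                    "emotional function": 4.75, "mastery": 5.5}

flagged = domains_with_marked_decline(baseline_scores, predicted_scores)
```

For this hypothetical patient only the dyspnoea domain is flagged, which could then prompt the discussion of pulmonary rehabilitation described above.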

Suppose that an effective intervention is very cheap and has no adverse events. Obviously, no prediction model is needed as all patients may receive this intervention. However, this is rare. Usually, effective interventions are (somewhat) costly and do have adverse events. In this case, a high predicted risk is usually needed to justify the use of that intervention. Prediction models may help clinicians in the decision making process.

Future studies should further validate our models in other populations with respect to discrimination and calibration. Also, data from randomised trials and meta-analyses can be incorporated or linked to our models to estimate how the prediction of HRQL is likely to change when adding treatments such as smoking cessation programmes, pulmonary rehabilitation or specific drug treatments.38,39 Finally, cost-effectiveness evaluation of prediction models for HRQL in COPD should be performed to determine whether it is worth the effort to incorporate these models into practice. After successful completion of these steps, our models can support treatment selection on the basis of the individual patient’s prognosis.

Conclusions

To predict COPD-specific HRQL in primary care COPD patients, previous HRQL is the best predictor. Asking patients explicitly about dyspnoea, fatigue, depression and coping with COPD provides additional important information about future HRQL, whereas FEV1 and some other commonly used predictors, such as exercise capacity, add little to the prediction of HRQL.