Introduction

Cystic fibrosis (CF) is an autosomal disease caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene. In 2017 the Australian CF data registry (ACFDR) recorded 3151 patients to have CF with 54% being adult, a median age of 19.6 years, and 54% male1. Significant clinical advances have prolonged the average life expectancy for patients with CF and the development of CFTR modulator therapies will also impact disease progression. Despite improved outcomes, lung disease progression will still lead to respiratory failure, and monitoring of lung function and appropriate referral for lung transplantation will remain extremely important2.

The main measure of lung function in people with CF is forced expiratory volume in one second percentage predicted (FEV1pp). One of the main goals of clinical care is to minimise the loss of lung function (decrease in FEV1pp) across the patients’ lifespan. Understanding the natural trajectory of FEV1pp in people with CF is essential to direct early intervention and prevent progression of lung damage, and to inform referral of patients for lung transplantation and novel therapeutic strategies3. Longitudinal statistical models, accounting for known predictors such as patient sex, height and genotype, and based on clinical data, provide a means to predict the course of lung function in patients and therefore inform clinical decision making4. There have been several attempts to study FEV1pp changes in people with CF using different methodologies and a recent review of articles has identified gaps and concluded that “further longitudinal studies are needed to develop statistical models to deal with nonlinear changes in lung function, quantify variability and the influence of risk factors. This should be undertaken without ignoring the censoring effect of death or lung transplantation”4.

Since 1998, the ACFDR has collected diagnostic and treatment data from more than 90 percent of the population of people with CF in Australia1. Data about a CF patient’s diagnosis, treatment and complications are collected by the registry through the treating sites so as to improve health service provision and better comprehend the treatment of CF and outcomes for patients. The ACFDR is managed by Monash University as part of a portfolio of approximately 30 clinical quality registries and is funded by Cystic Fibrosis Australia with the CF specialist centres contributing data. The main objective of our study was to provide a holistic FEV1pp modelling approach dealing with different slopes for before and after lung transplantation, handling death as an outcome and optimally dealing with the issue of non-linearity between FEV1pp and age of visit.

Methods

Data

The main source of data for our analysis was de-identified patient-level records by visit dates from the ACFDR from 2008–2019. People are registered at the time they are diagnosed with CF unless they do not consent. For minors, parental/ guardian consent is required. The ACFDR contains detailed demographic and clinical information of patients with a confirmed diagnosis of CF, and who receive clinical care at 23 specialist CF centres in Australia1. We excluded visits where patients were aged less than 6 years, as FEV1 may not be as reliable or consistently measured for these younger patients and hence the data may not be epidemiologically robust. We also excluded patients who had only one FEV1pp measurement across the study period. FEV1pp was calculated from the raw FEV1 scores using the Global Lung Initiative(GLI) established prediction equations.

To improve the quality of the registry, transplantation data linkage was conducted with the Australia and New Zealand Cardiothoracic Transplant Registry (ANZCOTR). Additional information on the type and date of the transplants was sought from the individual centres. Accurate survival information is an important outcome of the Registry, providing an understanding of the impact of quality improvements in practice and care over time. To ensure accuracy of the survival data, a linkage with the National Death Index was undertaken1. Patient names are not held by the registry, so linkage was based on probabilistic matching, using patient initials, date of birth, gender and patient’s residential postcode.

Ethics approval for this study was provided by Monash University Human Research Ethics Committee (approval number: 16397). Informed consent was obtained from individuals and families of those under 18 years of age, for sites where ethics committee required and an opt-out consent was applied to the other sites where patients had the opportunity to contact the registry and opt out from the registry. All methods were carried out in accordance with relevant guidelines and regulations.

Statistical methods

Our primary endpoint was FEV1pp, which was calculated for each patient visit where FEV1 was measured. Several different statistical approaches have been used to study the relationship between important risk factors and CF lung disease progression as measured by FEV1pp. Current statistical models include generalised linear models (GLM) (which are unable to account for within-patient correlation in the data)5,6,7,8,9,10, generalised estimating equations (GEE) (which do account for within-patient clustering, but are population averaged and therefore unable to model individual patient’s trajectories in FEV1pp)11,12,13,14,15,16,17 and linear mixed models, which address the limitations mentioned above as well as allowing both fixed effect covariates and random intercepts to be modelled (e.g. treating centre effects)18,19,20,21,22,23,24,25,26,27,28,29,30,31,32.

A linear mixed model4 was used to determine changes in FEV1pp over age of visit. A two-way random intercept term was fitted to capture both treating centre effects as well as within-patient variation in FEV1pp. A random slope term was specified at the patient level, so that each patient was allowed to have their own trajectory of FEV1pp over time. Unstructured covariance was specified to allow for distinct correlations between the random intercept and slope terms.

Univariate models were fit for each of these covariates: age of patient at visit, gender, Body Mass Index (BMI) z-score grouped in quartiles, age at diagnosis, patient genotype, lung transplant status, pancreatic insufficiency, birth cohort, insulin dependent diabetes, cirrhosis or portal hypertension, Pseudomonas aeruginosa infection and baseline variability in FEV1pp. As there was a non-linear relationship between FEV1pp and age of patient at visit, we fitted restricted cubic splines, carefully evaluating various knots and knot locations and used the Akaike information criterion (AIC) to select the optimal combination. Pancreatic insufficiency was defined as presence of clinical symptoms (fecal fat test, steatorrhoea) and administration of pancreatic replacement enzymes. P. aeruginosa infection was defined as at least one positive test during followup. Baseline variability in FEV1pp was calculated as the difference between the highest FEV1pp and the median FEV1pp during the subjects’ first 2 years of followup33.

Starting from the most significant covariate identified in the univariate analysis, we used the likelihood ratio test to see whether the inclusion of the next most significant variable helped to significantly improve the fit of the model. This was done sequentially until we evaluated all the variables. For the final multivariate model, to account for different slopes for lung transplantation, we specified separate slopes for FEV1pp before/ after transplantation by including appropriate interaction terms in the model. We plotted separate adjusted means for the variables in the final multivariate model using the margins command, where all the other covariates were kept at their mean values and we specified interaction terms between the cubic splines and the covariates.

To deal with the impact of mortality on our final multivariate results, we undertook a joint longitudinal and survival model using the ‘jmxtstcox’ in Stata which fits a cox proportional hazards model and a linear mixed model with a random intercept, assuming that they share common random intercepts34. As this is a new command still under development by Stata Corp, we opted to include the results as a supplementary analysis in our manuscript. Survival time was defined as time from birth till death or censored at 31 December 2019 for those still alive. We also undertook sensitivity analyses for the multivariate model in order to test the robustness our models, particularly relating to the impact of potential outliers and excluding lung transplantation variable from the final model. Data analysis was performed in Stata V16 (Collage Station, Tx, USA) and level of significance set at 5%.

Results

Descriptives

Out of the original 188,394 visits in the study period, after excluding patient visits before 6 years of age, and those with missing or only 1 FEV1pp measurement across the whole study period, the final number of patients included in our analysis was 3655 (100,907 visits). The median duration of follow-up between birth and last visit was 21.7 years (Interquartile range IQR: 14.9–32.2 years) for the entire cohort.

Table 1 highlights the demographic and clinical characteristics of all patients. Of the 3655 patients, slightly more than half (52.4%) were male. The mean age at last visit was 24.6 ± 12.8 years. Around 27.4% were underweight at the last visit and around 55.3% presented with normal weight. The median age of diagnosis was 0.1 years (IQR: 0.05–0.4 years). 18.1% had insulin dependent diabetes and 6.5% had cirrhosis or portal hypertension. The prevalence of P. aeruginosa infection was 50.3% while 9.6% of patients had a lung transplant and 7.8% of the entire cohort died. There were significant differences in the demographic and clinical variables across age group at visit (Table 1).

Table 1 Demographic and clinical characteristics of cystic fibrosis patients.

We found that there was a non-linear relationship between FEV1pp and age at visit (Fig. 1). There was an initial steeper non-linear decrease in FEV1pp from age 6 up till 30, and then a more gradual linear decline thereafter. Between ages 6 and 18, the rate of decline ranged from − 2.2 to − 1.8 units/ year. From 19 to 30 years, the rate varied from − 1.8 to − 0.5 units, and for those aged more than 30 years, the rate of decline was − 0.5 to 0.06 units.

Figure 1
figure 1

Restricted cubic splines showing non-linear relationship between FEV1 percent predicted and age at visit.

Univariate predictors of FEV1pp

We found that the model with random intercepts for patient and centre, along with a random slope for age at visit, provided a good fit to the data. We also found that a restricted cubic spline model for age of patient at visit with 3 knots fitted at ages 12, 18 and 30 provided a good fit to the model with the lowest AIC of 723,447 (Supplementary Information Table S1). In the univariate analysis, we found the following factors to be associated with FEV1pp: Age of visit, BMI z-score, pancreatic insufficiency, patient genotype, lung transplantation, birth cohort, insulin dependent diabetes, cirrhosis or portal hypertension, P. aeruginosa infection and baseline variability in FEV1pp (Table 2). Gender and age of diagnosis were non-significant.

Table 2 Univariate factors associated with FEV1 percent predicted.

There was a non-linear decline in FEV1pp across age of visit. We also found a dose–response relationship between the BMI z-score quartiles and FEV1pp. Those who were in the second, third and fourth quartile had a higher 3.1, 5.4 and 7.1 unit difference in FEV1pp as compared to those in the first quartile. A larger variability in baseline FEV1pp was associated with lower FEV1pp scores (− 0.5, 95% CI: − 0.7 to − 0.4), p < 0.001. Patients born in the most recent era between 1998 and 2013 had a higher FEV1pp value of 11.7 units (95% CI: 10.0 to 13.4) when compared to those born in an earlier era before 1998 (p < 0.001).

Multivariate predictors of FEV1pp

In the multivariate analysis (Table 3), the following predictors were found to be statistically and independently associated with FEV1pp: age of visit, BMI z-score, lung transplantation and interaction between pre/post transplantation and age of visit, pancreatic insufficiency, insulin dependent diabetes, cirrhosis or portal hypertension, Pseudomonas aeruginosa infection and baseline variability in FEV1pp. The magnitude of the effect sizes was reduced as compared to the univariate results, probably due to confounding. Patients in the second, third and fourth quartile of BMI z-score had higher FEV1pp values of 3, 5, and 7 when compared to those in the first quartile (p < 0.001). Pancreatic insufficiency was associated with a lower FEV1pp value of − 3.7 (95% CI: − 5.5 to − 1.9) when compared to those who were sufficient (p < 0.001). Figure 2a–d highlights the predicted slope of FEV1pp across each level of covariate from the multivariate model. We observed that the slope of FEV1pp decline was steeper for patients with insulin dependent diabetes, cirrhosis or portal hypertension, P. aeruginosa infection and pancreatic insufficiency.

Table 3 Multivariate factors associated with FEV1 percent predicted.
Figure 2
figure 2

Decline in FEV1 percent predicted by key subgroups based on the final multivariate model.

Joint modelling with mortality as the outcome showed that the results from the multivariate model generally persisted in terms of magnitude of effect as well as statistical significance (Supplementary Information Table S2). The effect of age of visit was less attenuated, while magnitude of effect for lung transplantation became larger when mortality as included in the model. There was a reduction in effect size for effects of comorbidities like diabetes and cirrhosis. The effect of pseudomonas and baseline variability in FEV1pp were hardly unchanged from the original analysis.

For the final multivariate model, the residuals were generally normally distributed, except for very slight skewness towards the tail ends of the distribution. As a sensitivity analysis, the model was re-run with the outliers (1% of observations of both ends of the distributions) excluded, and we found minimal changes to the magnitude of the coefficients (Supplementary Information Table S3). We found there was a less than 20% relative change in the magnitude of the coefficients for all variables except for cirrhosis or portal hypertension where there was a reduction in magnitude of the effect size (from − 3.4 to − 2.6). When lung transplantation covariate was excluded from the final multivariate model, minimal changes to the magnitude and direction of the effect sizes for most of the covariates in the final multivariate model were seen, except for pancreatic insufficiency (coefficient of − 4.6 versus − 3.7 in the original analysis) and baseline variability in FEV1pp (coefficient of − 0.55 versus − 0.45 in the original analysis).

Discussion

Our study has identified a number of significant demographic and clinical predictor variables associated with FEV1pp among people with CF. We have addressed some existing methodological challenges particularly relating to accounting for changes in slope of FEV1pp post lung transplantation, dealing with non-linearity through restricted cubic splines and joint modelling with mortality as an outcome.

We found that gender was not significant in the multivariate model. In a systematic review, the effect of gender was mixed, with 5 out of the 9 articles finding a steeper decline in FEV1pp among females, particularly from age 14 onwards4. However, the review did not look at gender effects post 21 years of age, presumably due to lack of follow-up.

The effect of patient genotype on FEV1pp was also not consistent in literature with only 3 out of 8 studies looking at this factor demonstrating that the homozygous F508del mutation group had a greater decline in FEV1pp compared to the heterozygous A455E group, one study showing the effect in the opposite direction and others with no effect4. In our study, patients with pancreatic insufficiency had lower FEV1pp as compared to those who were sufficient, which was consistent with what has been reported in literature4. In our study, although patient genotype was significant in the univariate model, it was no longer significant in the multivariate model. The strong relationship between BMI and FEV1pp demonstrated in our results may not be due to obstruction, but rather poor growth/ nutrition.

We used FEV1pp as an endpoint for our study as calculated from the Global Lung Function (GLI) initiative35, although alternate older methods exist36,37. We used cubic splines to visually demonstrate the non-linear relationship between FEV1pp and age as well as to identify an optimal inflexion point in age where there is a significant change in slope for FEV1pp. Restricted cubic splines can greatly increase the power of these methods to model non-linear relationships. Conditional median survival age estimates from a recent study has shown increases of 11 and 13 years for males and females given survival to age 40 in F508del homozygotes, when compared to survival from birth38. This increase in conditional survival (survivorship bias) could explain the gradual decline in FEV1pp we found among our cohort of patients aged 40 years and above.

We also chose the linear mixed model over others such as the GEE and GLM as our main aim was to study individual patient trajectories in FEV1pp, while the other models are based on population-based averages. We coded our model to allow for separate slopes for patients before and after lung transplantation and we also extended the model to jointly analyse mortality as an outcome to account for any effect of censoring in the data.

Our registry does not collect data on physical findings and symptoms which have been shown to be risk factors for FEV1pp11 and hence these could not be included in the models. As an observational study, unmeasured predictors could affect our results, but our registry is designed according to best practice guidelines and recent research and fields are consistent with international registries. Our study also does not provide a formal comparison of the various statistical techniques used to study longitudinal changes in FEV1pp among patients with CF. Such an analysis would require simulation studies to establish the generalisability of results across different countries and settings. We have opted to include a category for patients with missing data for a particular variable and included them in the analysis so as to preserve the original sample size. Although techniques to impute missing data exist, they rely on the assumption that data is missing at random, which would be problematic to assume with our registry data, since one of the main aims of the registry is to provide benchmarked outcomes to sites and we rely on sites to provide us this data.

The major strength of our paper is that this study was based on a large multi-centre registry of 23 centres across the five states in Australia. By including patients who had lung transplant, we were also able to adequately address the issue of differences in FEV1pp slopes these patients could have before/ after transplantation as compared to other studies which opt to censor the data after lung transplantation39. We were also able to adequately account for the effect of mortality as an outcome rather than a factor variable through joint modelling. Our model also adequately allowed for Centre treatment effects as well as allowed individual patients to have their own starting point FEV1pp and trend over age at visit. We also undertook a sensitivity analysis to test the robustness of key assumptions, specifically the impact of outliers and excluding lung transplantation from the final model. Our population based study provides a unique opportunity to forecast individual patient-level trajectory of FEV1pp.

Conclusion

Our study has addressed important methodological challenges in the analysis of FEV1pp among patients with CF and provide a holistic approach to analysis of such longitudinal data. This epidemiological modelling can better aid epidemiologists through providing a framework to understand the impact of new and potentially useful treatments and interventions.