Lung function over the life course of paediatric and adult patients with cystic fibrosis from a large multi-centre registry

A key measure of lung function in people with Cystic Fibrosis (CF) is Forced Expiratory Volume in the first second FEV1 percent predicted (FEV1pp). This study aimed to address challenges in identifying predictors of FEV1pp, specifically dealing with non-linearity and the censoring effect of death. Data was obtained from a large multi-centre Australian Cystic Fibrosis Data Registry (ACFDR). A linear mixed model was used to study FEV1pp as the endpoint. There were 3655 patients (52.4% male) included in our study. Restricted cubic splines were used to fit the non-linear relationship between age of visit and FEV1pp. The following predictors were found to be significant in the multivariate model: age of patient at visit, BMI z-score, age interaction with lung transplantation, insulin dependent diabetes, cirrhosis/portal hypertension, pancreatic insufficiency, Pseudomonas aeruginosa infection and baseline variability in FEV1pp. Those with P. aeruginosa infection had a lower mean difference in FEV1pp of 4.7 units, p < 0.001 compared to those who did not have the infection. Joint modelling with mortality outcome did not materially affect our findings. These models will prove useful for to study the impact of CFTR modulator therapies on rate of change of lung function among patients with CF.

Scientific Reports | (2020) 10:17421 | https://doi.org/10.1038/s41598-020-74502-1 www.nature.com/scientificreports/ a portfolio of approximately 30 clinical quality registries and is funded by Cystic Fibrosis Australia with the CF specialist centres contributing data. The main objective of our study was to provide a holistic FEV 1 pp modelling approach dealing with different slopes for before and after lung transplantation, handling death as an outcome and optimally dealing with the issue of non-linearity between FEV 1 pp and age of visit.

Methods
Data. The main source of data for our analysis was de-identified patient-level records by visit dates from the ACFDR from 2008-2019. People are registered at the time they are diagnosed with CF unless they do not consent. For minors, parental/ guardian consent is required. The ACFDR contains detailed demographic and clinical information of patients with a confirmed diagnosis of CF, and who receive clinical care at 23 specialist CF centres in Australia 1 . We excluded visits where patients were aged less than 6 years, as FEV 1 may not be as reliable or consistently measured for these younger patients and hence the data may not be epidemiologically robust. We also excluded patients who had only one FEV 1 pp measurement across the study period. FEV 1 pp was calculated from the raw FEV 1 scores using the Global Lung Initiative(GLI) established prediction equations.
To improve the quality of the registry, transplantation data linkage was conducted with the Australia and New Zealand Cardiothoracic Transplant Registry (ANZCOTR). Additional information on the type and date of the transplants was sought from the individual centres. Accurate survival information is an important outcome of the Registry, providing an understanding of the impact of quality improvements in practice and care over time.
To ensure accuracy of the survival data, a linkage with the National Death Index was undertaken 1 . Patient names are not held by the registry, so linkage was based on probabilistic matching, using patient initials, date of birth, gender and patient's residential postcode.
Ethics approval for this study was provided by Monash University Human Research Ethics Committee (approval number: 16397). Informed consent was obtained from individuals and families of those under 18 years of age, for sites where ethics committee required and an opt-out consent was applied to the other sites where patients had the opportunity to contact the registry and opt out from the registry. All methods were carried out in accordance with relevant guidelines and regulations.
Statistical methods. Our primary endpoint was FEV 1 pp, which was calculated for each patient visit where FEV 1 was measured. Several different statistical approaches have been used to study the relationship between important risk factors and CF lung disease progression as measured by FEV 1 pp. Current statistical models include generalised linear models (GLM) (which are unable to account for within-patient correlation in the data) 5-10 , generalised estimating equations (GEE) (which do account for within-patient clustering, but are population averaged and therefore unable to model individual patient's trajectories in FEV 1 pp) [11][12][13][14][15][16][17] and linear mixed models, which address the limitations mentioned above as well as allowing both fixed effect covariates and random intercepts to be modelled (e.g. treating centre effects) [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32] . A linear mixed model 4 was used to determine changes in FEV 1 pp over age of visit. A two-way random intercept term was fitted to capture both treating centre effects as well as within-patient variation in FEV 1 pp. A random slope term was specified at the patient level, so that each patient was allowed to have their own trajectory of FEV 1 pp over time. Unstructured covariance was specified to allow for distinct correlations between the random intercept and slope terms.
Univariate models were fit for each of these covariates: age of patient at visit, gender, Body Mass Index (BMI) z-score grouped in quartiles, age at diagnosis, patient genotype, lung transplant status, pancreatic insufficiency, birth cohort, insulin dependent diabetes, cirrhosis or portal hypertension, Pseudomonas aeruginosa infection and baseline variability in FEV 1 pp. As there was a non-linear relationship between FEV 1 pp and age of patient at visit, we fitted restricted cubic splines, carefully evaluating various knots and knot locations and used the Akaike information criterion (AIC) to select the optimal combination. Pancreatic insufficiency was defined as presence of clinical symptoms (fecal fat test, steatorrhoea) and administration of pancreatic replacement enzymes. P. aeruginosa infection was defined as at least one positive test during followup. Baseline variability in FEV 1 pp was calculated as the difference between the highest FEV 1 pp and the median FEV 1 pp during the subjects' first 2 years of followup 33 .
Starting from the most significant covariate identified in the univariate analysis, we used the likelihood ratio test to see whether the inclusion of the next most significant variable helped to significantly improve the fit of the model. This was done sequentially until we evaluated all the variables. For the final multivariate model, to account for different slopes for lung transplantation, we specified separate slopes for FEV 1 pp before/ after transplantation by including appropriate interaction terms in the model. We plotted separate adjusted means for the variables in the final multivariate model using the margins command, where all the other covariates were kept at their mean values and we specified interaction terms between the cubic splines and the covariates.
To deal with the impact of mortality on our final multivariate results, we undertook a joint longitudinal and survival model using the 'jmxtstcox' in Stata which fits a cox proportional hazards model and a linear mixed model with a random intercept, assuming that they share common random intercepts 34 . As this is a new command still under development by Stata Corp, we opted to include the results as a supplementary analysis in our manuscript. Survival time was defined as time from birth till death or censored at 31 December 2019 for those still alive. We also undertook sensitivity analyses for the multivariate model in order to test the robustness our models, particularly relating to the impact of potential outliers and excluding lung transplantation variable from the final model. Data analysis was performed in Stata V16 (Collage Station, Tx, USA) and level of significance set at 5%.

Results
Descriptives. Out of the original 188,394 visits in the study period, after excluding patient visits before 6 years of age, and those with missing or only 1 FEV 1 pp measurement across the whole study period, the final number of patients included in our analysis was 3655 (100,907 visits). The median duration of follow-up between birth and last visit was 21.7 years (Interquartile range IQR: 14.9-32.2 years) for the entire cohort. Table 1 highlights the demographic and clinical characteristics of all patients. Of the 3655 patients, slightly more than half (52.4%) were male. The mean age at last visit was 24.6 ± 12.8 years. Around 27.4% were underweight at the last visit and around 55.3% presented with normal weight. The median age of diagnosis was 0.1 years (IQR: 0.05-0.4 years). 18.1% had insulin dependent diabetes and 6.5% had cirrhosis or portal hypertension. The prevalence of P. aeruginosa infection was 50.3% while 9.6% of patients had a lung transplant and 7.8% of the entire cohort died. There were significant differences in the demographic and clinical variables across age group at visit (Table 1).
We found that there was a non-linear relationship between FEV 1 pp and age at visit (Fig. 1). There was an initial steeper non-linear decrease in FEV 1 pp from age 6 up till 30, and then a more gradual linear decline thereafter. Between ages 6 and 18, the rate of decline ranged from − 2.2 to − 1.8 units/ year. From 19 to 30 years, the rate varied from − 1.8 to − 0.5 units, and for those aged more than 30 years, the rate of decline was − 0.5 to 0.06 units.

Univariate predictors of FEV 1 pp.
We found that the model with random intercepts for patient and centre, along with a random slope for age at visit, provided a good fit to the data. We also found that a restricted cubic spline model for age of patient at visit with 3 knots fitted at ages 12, 18 and 30 provided a good fit to the model with the lowest AIC of 723,447 (Supplementary Information Table S1). In the univariate analysis, we found the following factors to be associated with FEV 1 pp: Age of visit, BMI z-score, pancreatic insufficiency, patient genotype, lung transplantation, birth cohort, insulin dependent diabetes, cirrhosis or portal hypertension, P. aeruginosa infection and baseline variability in FEV 1 pp (Table 2). Gender and age of diagnosis were non-significant.
There was a non-linear decline in FEV 1 pp across age of visit. We also found a dose-response relationship between the BMI z-score quartiles and FEV 1 pp. Those who were in the second, third and fourth quartile had a higher 3.1, 5.4 and 7.1 unit difference in FEV 1 pp as compared to those in the first quartile. A larger variability in baseline FEV 1 pp was associated with lower FEV 1 pp scores (− 0.5, 95% CI: − 0.7 to − 0.4), p < 0.001. Patients born in the most recent era between 1998 and 2013 had a higher FEV 1 pp value of 11.7 units (95% CI: 10.0 to 13.4) when compared to those born in an earlier era before 1998 (p < 0.001).

Multivariate predictors of FEV 1 pp.
In the multivariate analysis (Table 3), the following predictors were found to be statistically and independently associated with FEV 1 pp: age of visit, BMI z-score, lung transplantation and interaction between pre/post transplantation and age of visit, pancreatic insufficiency, insulin dependent diabetes, cirrhosis or portal hypertension, Pseudomonas aeruginosa infection and baseline variability in FEV 1 pp. The magnitude of the effect sizes was reduced as compared to the univariate results, probably due to confounding. Patients in the second, third and fourth quartile of BMI z-score had higher FEV 1 pp values of 3, 5, and 7 when compared to those in the first quartile (p < 0.001). Pancreatic insufficiency was associated with a lower FEV 1 pp value of − 3.7 (95% CI: − 5.5 to − 1.9) when compared to those who were sufficient (p < 0.001). Figure 2a-d highlights the predicted slope of FEV 1 pp across each level of covariate from the multivariate model. We observed that the slope of FEV1pp decline was steeper for patients with insulin dependent diabetes, cirrhosis or portal hypertension, P. aeruginosa infection and pancreatic insufficiency.
Joint modelling with mortality as the outcome showed that the results from the multivariate model generally persisted in terms of magnitude of effect as well as statistical significance (Supplementary Information Table S2). The effect of age of visit was less attenuated, while magnitude of effect for lung transplantation became larger when mortality as included in the model. There was a reduction in effect size for effects of comorbidities like diabetes and cirrhosis. The effect of pseudomonas and baseline variability in FEV 1 pp were hardly unchanged from the original analysis.
For the final multivariate model, the residuals were generally normally distributed, except for very slight skewness towards the tail ends of the distribution. As a sensitivity analysis, the model was re-run with the outliers (1% of observations of both ends of the distributions) excluded, and we found minimal changes to the magnitude of the coefficients (Supplementary Information Table S3). We found there was a less than 20% relative change in the magnitude of the coefficients for all variables except for cirrhosis or portal hypertension where there was a reduction in magnitude of the effect size (from − 3.4 to − 2.6). When lung transplantation covariate was excluded from the final multivariate model, minimal changes to the magnitude and direction of the effect sizes for most of the covariates in the final multivariate model were seen, except for pancreatic insufficiency (coefficient of − 4.6 versus − 3.7 in the original analysis) and baseline variability in FEV1pp (coefficient of − 0.55 versus − 0.45 in the original analysis).

Discussion
Our study has identified a number of significant demographic and clinical predictor variables associated with FEV 1 pp among people with CF. We have addressed some existing methodological challenges particularly relating to accounting for changes in slope of FEV 1 pp post lung transplantation, dealing with non-linearity through restricted cubic splines and joint modelling with mortality as an outcome. We found that gender was not significant in the multivariate model. In a systematic review, the effect of gender was mixed, with 5 out of the 9 articles finding a steeper decline in FEV 1 pp among females, particularly from age 14 onwards 4 . However, the review did not look at gender effects post 21 years of age, presumably due to lack of follow-up.
The effect of patient genotype on FEV 1 pp was also not consistent in literature with only 3 out of 8 studies looking at this factor demonstrating that the homozygous F508del mutation group had a greater decline in FEV 1 pp compared to the heterozygous A455E group, one study showing the effect in the opposite direction and others with no effect 4 . In our study, patients with pancreatic insufficiency had lower FEV 1 pp as compared to those who were sufficient, which was consistent with what has been reported in literature 4 . In our study, although patient genotype was significant in the univariate model, it was no longer significant in the multivariate model. The strong relationship between BMI and FEV1pp demonstrated in our results may not be due to obstruction, but rather poor growth/ nutrition.
We used FEV 1 pp as an endpoint for our study as calculated from the Global Lung Function (GLI) initiative 35 , although alternate older methods exist 36,37 . We used cubic splines to visually demonstrate the non-linear relationship between FEV 1 pp and age as well as to identify an optimal inflexion point in age where there is a significant change in slope for FEV 1 pp. Restricted cubic splines can greatly increase the power of these methods to model non-linear relationships. Conditional median survival age estimates from a recent study has shown increases of 11 and 13 years for males and females given survival to age 40 in F508del homozygotes, when compared to survival from birth 38 . This increase in conditional survival (survivorship bias) could explain the gradual decline in FEV 1 pp we found among our cohort of patients aged 40 years and above.
We also chose the linear mixed model over others such as the GEE and GLM as our main aim was to study individual patient trajectories in FEV 1 pp, while the other models are based on population-based averages. We coded our model to allow for separate slopes for patients before and after lung transplantation and we also extended the model to jointly analyse mortality as an outcome to account for any effect of censoring in the data.
Our registry does not collect data on physical findings and symptoms which have been shown to be risk factors for FEV 1 pp 11 and hence these could not be included in the models. As an observational study, unmeasured predictors could affect our results, but our registry is designed according to best practice guidelines and recent research and fields are consistent with international registries. Our study also does not provide a formal comparison of the various statistical techniques used to study longitudinal changes in FEV 1 pp among patients with CF. Such an analysis would require simulation studies to establish the generalisability of results across different countries and settings. We have opted to include a category for patients with missing data for a particular variable and included them in the analysis so as to preserve the original sample size. Although techniques to impute  www.nature.com/scientificreports/ missing data exist, they rely on the assumption that data is missing at random, which would be problematic to assume with our registry data, since one of the main aims of the registry is to provide benchmarked outcomes to sites and we rely on sites to provide us this data. The major strength of our paper is that this study was based on a large multi-centre registry of 23 centres across the five states in Australia. By including patients who had lung transplant, we were also able to adequately address the issue of differences in FEV 1 pp slopes these patients could have before/ after transplantation as compared to other studies which opt to censor the data after lung transplantation 39 . We were also able to adequately account for the effect of mortality as an outcome rather than a factor variable through joint modelling. Our model also adequately allowed for Centre treatment effects as well as allowed individual patients to have their own starting point FEV 1 pp and trend over age at visit. We also undertook a sensitivity analysis to test the robustness of key assumptions, specifically the impact of outliers and excluding lung transplantation from the final model. Our population based study provides a unique opportunity to forecast individual patient-level trajectory of FEV 1 pp.

Conclusion
Our study has addressed important methodological challenges in the analysis of FEV 1 pp among patients with CF and provide a holistic approach to analysis of such longitudinal data. This epidemiological modelling can better aid epidemiologists through providing a framework to understand the impact of new and potentially useful treatments and interventions.