Evaluation of a five-year predicted survival model for cystic fibrosis in later time periods

We evaluated a multivariable logistic regression model predicting 5-year survival, derived from a 1993–1997 cohort of the United States Cystic Fibrosis (CF) Foundation Patient Registry, to assess whether therapies introduced since 1993 have altered its applicability in non-overlapping cohorts from 1993–1998, 1999–2004, 2005–2010 and 2011–2016. We applied Kaplan-Meier statistics to assess unadjusted survival. We tested logistic regression model discrimination using the C-index and calibration using Hosmer-Lemeshow tests to examine original model performance and guide updating as needed. Kaplan-Meier age-adjusted 5-year probability of death in the CF population decreased substantially during 1993–2016. Patients in successive cohorts were generally healthier at entry, with higher average age, weight and lung function and fewer pulmonary exacerbations annually. CF-related diabetes prevalence, however, steadily increased. Newly derived multivariable logistic regression models for 5-year survival in the new cohorts had estimated coefficients similar to those of the original model. The original model exhibited excellent calibration and discrimination when applied to later cohorts despite improved survival and remains useful for predicting 5-year survival. All models may be used to stratify patients for new studies, and the original coefficients may serve as a baseline in searching for additional but rare events that affect survival in CF.

Survival of patients with cystic fibrosis (CF) has improved worldwide [1][2][3][4][5][6] . Improved care over the last 80 years increased the expectation of life at birth from a median of six months to approaching 50 years 2,7,8 . Patients, their families, friends and caregivers often compare individual age at death with the expectation of life at birth as a final measure of their efforts to extend a single life. However, this comparison is overly stringent and cannot account for individual circumstances that markedly change outcomes. If truly desired, one might look instead to the aggregated median age at death, which reflects the age distribution of the contemporary CF population, current mortality rates and the improving conditional survival from all ages as a result of rapidly improving therapies 3,9 . Recently, this measure rose to nearly 30 years 2 . Currently recommended [10][11][12][13] beneficial treatments include pancreatic enzymes 10,14 , airway clearance 15,16 , mucolysis [17][18][19] , inhaled antibiotics [20][21][22] , anti-inflammatory agents 23,24 , and modulators of the CF transmembrane conductance regulator (CFTR) protein [25][26][27] . Although gains have been tremendous, CF survival continues to fall short of the approximately 80-year expectation of life for the average newborn in the United States 28,29 , providing continued motivation to improve treatments 2 .
To help clinicians better understand the relative survival associations of different demographic and disease-related factors, we previously developed and tested a 5-year predictive survivorship model for CF 30 using a cohort of patients alive and enrolled on January 1, 1993 in the US CF Foundation Patient Registry (CFFPR) and followed through 1997 (Table 1). Because part of the original intent was to create a clinically useful tool, we chose logistic regression, which most readily produces a single risk assessment from multiple, easily obtained measurements at the point of care.
www.nature.com/scientificreports
The most important single variable is the forced expiratory volume in 1 second (FEV1), a measurement of airway physiology, expressed as the percent predicted FEV1 (FEV1%) estimated from each patient's age, sex, height, race and ethnicity 31,32 . Each percentage point higher FEV1% corresponds to a 4% reduction in the odds of death within 5 years on average, all other factors being equal (odds ratio, OR = 0.96, 95% Confidence Interval [CI] 0.957-0.968, P < 0.001). Burkholderia cepacia infection has the largest effect of any single factor, increasing the odds of death more than six-fold (OR = 6.20, 95% CI 3.42-11.23, P < 0.001). Fortunately, the number of patients with this infection was small in 1993 (and remains small, Table 2). The size of the B cepacia effect is partially explained by an interaction with the number of pulmonary exacerbations in the year prior to evaluation. Each observed pulmonary exacerbation signals a 59% increase in the odds of death within 5 years (OR = 1.59 per exacerbation, 95% CI 1.50-1.69, P < 0.001). The interaction term indicates that pulmonary exacerbations have a smaller expected survival effect in the presence of B cepacia infection: in infected patients, the odds ratio per exacerbation is multiplied by the interaction odds ratio (OR = 0.67, 95% CI 0.53-0.84, P = 0.001), reducing it by about a third compared with exacerbations in the absence of that infection. A diagnosis of CF-related diabetes (CFRD) has the same effect on 5-year predicted survival as a 12 percentage point reduction in FEV1% (OR = 1.63, 95% CI 1.21-2.21, P = 0.001), a finding that quantified the high impact of the diagnosis in CF 33,34 . Two variables, Staphylococcus aureus infection (OR = 0.81, 95% CI 0.64-1.02, P = 0.07) and pancreatic sufficiency (OR = 0.635, 95% CI 0.35-1.16, P = 0.14), were included in the original multivariable model despite high P values, for two reasons.
First, they are prominent features of the clinical syndrome that are useful signals of health and disease as commonly assessed in isolation at the bedside, and, second, these variables substantially improved the fit of the overall model to the data.
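The odds-ratio comparisons in this paragraph can be reproduced with a few lines of arithmetic on the log-odds scale, where effects in a logistic model add. The Python sketch below (the study itself used R) uses only the odds ratios quoted above; the constant and function names are our own. It shows why an OR of 1.63 for CFRD corresponds to roughly a 12-point FEV1% reduction, and how the interaction odds ratio combines with the per-exacerbation odds ratio in patients with B cepacia infection.

```python
import math

# Published odds ratios from the original 5-year model, as quoted in the text.
OR_FEV1_PER_POINT = 0.96   # per 1 percentage point of FEV1%
OR_CFRD = 1.63             # CF-related diabetes
OR_EXACERBATION = 1.59     # per pulmonary exacerbation in the prior year
OR_BCEP_X_EXAC = 0.67      # B cepacia x exacerbation interaction


def fev1_equivalent(or_factor: float, or_per_point: float = OR_FEV1_PER_POINT) -> float:
    """Change in FEV1% (percentage points) with the same log-odds effect as or_factor.

    A negative result means an FEV1% reduction.
    """
    return math.log(or_factor) / math.log(or_per_point)


def exacerbation_or(with_bcepacia: bool) -> float:
    """Odds ratio per exacerbation; the interaction is added on the log-odds scale."""
    log_or = math.log(OR_EXACERBATION)
    if with_bcepacia:
        log_or += math.log(OR_BCEP_X_EXAC)
    return math.exp(log_or)


print(round(fev1_equivalent(OR_CFRD), 1))  # -12.0 percentage points
print(round(exacerbation_or(False), 2))    # 1.59
print(round(exacerbation_or(True), 2))     # 1.07
```

Because the published odds ratios are rounded, the FEV1% equivalence is approximate; multiplying 1.59 by 0.67 directly gives the same combined odds ratio of about 1.07.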
Clinicians and researchers currently use the model to understand survival implications. For example, Rubin et al. 29 projected long-term survival outcomes of CFTR modulator use for patients homozygous for the F508del mutation with the original 5-year predicted survival model, using coefficients from its proportional hazards version 30 .
Since 1993, death rates with CF have dramatically decreased 2 , the majority of effective CF-specific therapies were introduced 17,18,20,21,[24][25][26][27] , and the CFFPR itself was extensively edited to improve data quality and comply with current privacy and data use practices 1 . Considering these changes, we examined shifts in the distributions of factors underlying 5-year survival and assessed prediction model usefulness. The original model incorporated commonly measured and recorded clinically important variables on which clinicians continue to focus. Thus, we evaluated the original variables in CFFPR-derived patient cohorts from later time periods, examined model discrimination and calibration and considered the need for modifications.

Methods
Data. This study was performed in accordance with Good Clinical Practice and the Declaration of Helsinki.
The University of Utah Institutional Review Board (IRB) assessed the ethics of our procedures and approved our study. After review, we requested the CFFPR 1986-2015, supplemented by 2016 outcomes data, from the US CF Foundation (Bethesda, MD, USA). CFFPR data are collected from patients or their guardians after written informed assent (if 12-18 years old) and consent. All data are acquired with local IRB approval in accredited US CF care centers and affiliate programs 2 . Study design. To assess original model performance 30 , we selected patients from the CFFPR who were alive and seen at least once in 1993, 1999, 2005 or 2011 to create new cohorts for comparison with the original cohort. Using the original patient selection criteria 30 , we created a new 1993-1997 cohort with January 1, 1993 as the time origin and follow up until December 31, 1997 or death, for comparison with the original cohort to better understand the effects of data cleaning 1 . Patients without death dates were censored on December 31, 1997.
We used first encounter dates during 1993, 1999, 2005 or 2011 as the time origins for the 1993-1998, 1999-2004, 2005-2010 and 2011-2016 cohorts, respectively, with follow up to death or last encounter within five years of the entry date. In every study cohort, loss to follow up was defined as a final recorded contact before the end of 5 years of follow up without a record of death. These individuals were included in the group remaining alive at the end of each 5-year cohort study period. To address the potential impact of loss to follow up during each cohort period, sensitivity analyses treated censored patients as having died. The 1993-1998 cohort allows assessment of using actual encounter dates for study inclusion compared with the original 1993-1997 cohort 30 . The 5-year survival model 30 includes nine variables, some of which require calculation from underlying variables (Table 1). For inclusion in a cohort, patients had to be at least 6 years old at one or more clinic encounters in the first year of each cohort (1993, 1999, 2005 or 2011). At the baseline encounter for each cohort, patients needed height, weight and FEV1 measurements, pulmonary exacerbation counts for the prior year, pancreatic sufficiency status, CFRD status and sputum culture data, including methicillin-sensitive Staphylococcus aureus (MSSA) and Burkholderia cepacia complex infections. Following our prior method 30 , for the main analysis we excluded patients who received lung transplantation before or during each cohort period; there were no other specific exclusion criteria. We repeated the entire analysis to assess sensitivity to the inclusion of patients who underwent lung transplantation during each cohort period. Initial data processing.
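The inclusion rules and the two ways of coding loss to follow up can be sketched as follows. This is a simplified illustration in Python, not the authors' R code: the Patient record and its field names are hypothetical, the 5-year window is approximated as 1826 days, and the age rule is reduced to age at the baseline encounter.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

FIVE_YEARS = timedelta(days=round(5 * 365.25))  # approximate 5-year window (1826 days)


@dataclass
class Patient:
    """Hypothetical per-cohort patient summary; field names are illustrative only."""
    entry_date: date                 # first encounter in the cohort's entry year
    age_at_entry: float              # years
    has_baseline_data: bool          # height, weight, FEV1, exacerbations, cultures, etc.
    transplant_date: Optional[date]
    death_date: Optional[date]
    last_contact: date


def eligible(p: Patient) -> bool:
    """Main-analysis inclusion: age >= 6, complete baseline data, and no lung
    transplant before or during the 5-year follow-up window."""
    end = p.entry_date + FIVE_YEARS
    transplanted = p.transplant_date is not None and p.transplant_date <= end
    return p.age_at_entry >= 6 and p.has_baseline_data and not transplanted


def died_within_5y(p: Patient, lost_as_dead: bool = False) -> int:
    """5-year death indicator.

    Patients lost to follow up (last contact before the end of follow up and no
    recorded death) count as alive in the main analysis and as dead in the
    sensitivity analysis (lost_as_dead=True).
    """
    end = p.entry_date + FIVE_YEARS
    if p.death_date is not None and p.death_date <= end:
        return 1
    lost = p.death_date is None and p.last_contact < end
    return int(lost and lost_as_dead)
```

Running the model fit once with each setting of lost_as_dead mirrors the main and sensitivity analyses described above.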
We applied National Health and Nutrition Examination Survey (NHANES) III equations 31 to the best FEV1 on or in the year prior to cohort entry to derive FEV1%, as these equations were primarily used during the study periods and in the previous publication 30 . We also recalculated FEV1% using the more recent Global Lung Initiative (GLI) equations 32 to assess the impact on results and interpretations, because the two sets of equations cover different racial and ethnic backgrounds and therefore select somewhat different patients for study. We used patient age on the date of start of follow up and the highest FEV1, worst microbial culture results and count of pulmonary exacerbations in the year prior to the start of follow up. We used reported insulin and pancreatic enzyme treatments as indicators of CFRD and pancreatic sufficiency, respectively. Weight-for-age z-score was calculated as done previously 30,35,36 .
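FEV1% is the measured FEV1 expressed as a percentage of the value predicted by a reference equation, and weight-for-age z-scores are commonly computed with the LMS method of Cole, which underlies the CDC 2000 growth charts. The sketch below assumes that method; the reference equations (NHANES III or GLI) are not implemented, and the L, M and S values in the example are placeholders rather than values from any growth chart.

```python
import math


def fev1_percent_predicted(fev1_l: float, predicted_l: float) -> float:
    """FEV1% = 100 * measured / predicted.

    predicted_l must come from a reference equation (e.g. NHANES III or GLI),
    which is not implemented here.
    """
    return 100.0 * fev1_l / predicted_l


def lms_z_score(x: float, L: float, M: float, S: float) -> float:
    """LMS-method z-score; L, M, S are age- and sex-specific reference values
    (Box-Cox power, median and coefficient of variation)."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


# Placeholder inputs for illustration only.
print(round(fev1_percent_predicted(2.4, 3.0), 1))        # 80.0
print(round(lms_z_score(33.0, -0.5, 30.0, 0.15), 2))     # 0.62
```

The L = 0 branch is the limiting case of the general formula, so both branches agree for L close to 0.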
We developed a method to identify and correct weight, height and FEV1 values that appeared incorrect for reasons such as recording values in the wrong units (for example, a height of 182 cm measured as 72 inches and reported as 72 cm) or entering wrong values (for example, misrecorded digits such as 182 cm recorded as 82 cm or as 128 cm). These types of mistakes are usually identifiable on review of patient-specific longitudinal data but often fall within physiologically normal ranges (Fig. S1). To standardize the process, we fitted generalized additive models (GAM) 37,38 to identify weight, height and FEV1 values unlikely to be correct. For each patient, we treated height, weight and FEV1 as dependent variables and age as the independent variable. For each fit, we computed the standard deviation of the residuals and used it to find the z-score of the residual for each individual data point. We removed values with absolute z-scores greater than 3 for height and 4 for weight, and repeated the fitting and removal on each successive fit until no outlying values remained. For FEV1, we did not correct values with negative z-scores because acute decreases are expected with pulmonary exacerbations; we used a z-score cutoff of 4 to identify high values likely to be incorrect without falsely flagging similar values following lung transplantation. We used the final individual GAM fits to estimate replacement values for height, weight and FEV1 values flagged as incorrect. However, we did not correct first or last values, to avoid extrapolation outside the bounds of measured data. The main analysis used data from patients with complete data sets after corrections, but we repeated the analyses three times to understand model sensitivity to using complete data sets with (1) uncorrected but physiologically plausible, (2) corrected or (3) corrected and imputed data.
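The iterative flag-and-refit procedure above can be sketched for a single patient's series as follows. As a clearly stated substitution, a cubic polynomial fitted with numpy.polyfit stands in for the per-patient GAM. The function name is hypothetical, and setting flagged first or last values to missing (rather than keeping them) is our assumption about how uncorrected boundary values are handled.

```python
import numpy as np


def flag_and_correct(age, value, z_cut=3.0, high_only=False, degree=3):
    """Iteratively flag implausible longitudinal values for one patient.

    A cubic polynomial (numpy.polyfit) stands in for the per-patient GAM.
    high_only=True flags only large positive residuals (as for FEV1).
    Returns (age, corrected, flagged), all sorted by age.
    """
    order = np.argsort(age)
    age = np.asarray(age, dtype=float)[order]
    value = np.asarray(value, dtype=float)[order]
    keep = ~np.isnan(value)
    flagged = np.zeros(value.shape, dtype=bool)
    # Refit and remove until no residual z-score exceeds the cutoff.
    while keep.sum() > degree + 2:
        coef = np.polyfit(age[keep], value[keep], degree)
        resid = value - np.polyval(coef, age)
        sd = resid[keep].std(ddof=1)
        if sd == 0:
            break
        z = resid / sd
        exceeds = (z > z_cut) if high_only else (np.abs(z) > z_cut)
        bad = keep & exceeds
        if not bad.any():
            break
        flagged |= bad
        keep &= ~bad
    # Replace flagged interior values with the final fit; first and last
    # observations are not corrected (set to missing here) to avoid extrapolation.
    coef = np.polyfit(age[keep], value[keep], degree)
    interior = np.zeros(value.shape, dtype=bool)
    interior[1:-1] = True
    corrected = value.copy()
    fix = flagged & interior
    corrected[fix] = np.polyval(coef, age[fix])
    corrected[flagged & ~interior] = np.nan
    return age, corrected, flagged


# Synthetic heights (cm) with one misrecorded value, e.g. a dropped leading digit.
ages = np.arange(10, 30)
heights = 120 + 3 * (ages - 10) - 0.05 * (ages - 10) ** 2 + 0.5 * np.sin(1.7 * ages)
heights[10] -= 100
_, fixed, flags = flag_and_correct(ages, heights, z_cut=3.0)
print(np.flatnonzero(flags))  # [10]
```

For weight the same call would use z_cut=4.0, and for FEV1 z_cut=4.0 with high_only=True, following the cutoffs described above.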
Statistical analysis. We calculated Kaplan-Meier 5-year probabilities of death for the four new cohorts, using each patient's first encounter with complete data as the time origin. We summarized each predictor variable included in the original model and tested for differences between the original 1993-1997 cohort and all other cohorts. We examined each disease characteristic in the whole CFFPR by year to understand whether changes in the distribution of values were isolated to the study cohorts 39,40 . We further evaluated year-to-year changes in CFRD prevalence with a multivariable model fitted by generalized estimating equations with an independence working correlation matrix, using CFRD as the outcome variable and age as the input variable, adjusted for FEV1%, weight-for-age z-score and CFFPR year 41,42 .
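A Kaplan-Meier estimate of the 5-year probability of death can be computed directly from follow-up times and death indicators. The following is a minimal Python sketch (the analyses themselves were run in R); times are in years from each patient's time origin, and censored patients are treated as still at risk at their censoring time, which is the usual convention.

```python
from typing import Sequence


def km_death_probability(times: Sequence[float], died: Sequence[int],
                         horizon: float = 5.0) -> float:
    """Kaplan-Meier estimate of 1 - S(horizon).

    times: follow-up in years from the time origin to death or censoring.
    died:  1 if the time is a death, 0 if censored.
    """
    # Distinct death times up to the horizon, in increasing order.
    event_times = sorted({t for t, d in zip(times, died) if d == 1 and t <= horizon})
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, d in zip(times, died) if u == t and d == 1)
        surv *= 1.0 - deaths / at_risk
    return 1.0 - surv


print(round(km_death_probability([1, 2, 2, 3, 5, 5], [1, 0, 1, 0, 0, 0]), 3))  # 0.333
```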
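We assessed model discrimination and calibration 43,44 . We used the original model 30 to calculate a prognostic risk score, the log-odds of death within 5 years, in the new cohorts, defined for individual i as

$$\mathrm{PR}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_{10} x_{10i}, \qquad (1)$$

where $b_0$ is the intercept and $b_1, \dots, b_{10}$ are the published coefficients for the nine covariables and one interaction term (Table 1). We compared the distribution of the prognostic risk score across the new cohorts. To assess model discrimination, we derived the area under the receiver operating characteristic (ROC) curve, or C-index 45,46 .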
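The prognostic risk score is a linear predictor on the log-odds scale, and for a binary 5-year outcome the C-index equals the area under the ROC curve: the proportion of (death, survivor) pairs in which the patient who died has the higher score. The sketch below uses placeholder coefficients, because the published values are in Table 1 and are not reproduced here; the function names are our own. The pairwise loop is quadratic in cohort size and is meant for illustration, not for registry-scale data.

```python
import math
from itertools import product
from typing import Sequence


def prognostic_risk_score(b0: float, b: Sequence[float], x: Sequence[float]) -> float:
    """Intercept plus the sum of coefficient x covariate products (log-odds of death)."""
    if len(b) != len(x):
        raise ValueError("need one coefficient per covariate")
    return b0 + sum(bk * xk for bk, xk in zip(b, x))


def predicted_probability(score: float) -> float:
    """Inverse logit: convert log-odds to a 5-year probability of death."""
    return 1.0 / (1.0 + math.exp(-score))


def c_index(scores: Sequence[float], died: Sequence[int]) -> float:
    """Binary-outcome C-index (ROC AUC): share of (death, survivor) pairs in which
    the patient who died has the higher score; tied scores count one half."""
    cases = [s for s, y in zip(scores, died) if y == 1]
    controls = [s for s, y in zip(scores, died) if y == 0]
    total = 0.0
    for sc, sn in product(cases, controls):
        total += 1.0 if sc > sn else (0.5 if sc == sn else 0.0)
    return total / (len(cases) * len(controls))


print(c_index([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```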
To assess model calibration, we divided subjects in each study cohort into 10 sub-groups, indexed by $g$, based on the predicted probability of death $\hat{p}_i = \exp(\mathrm{PR}_i)/\{1 + \exp(\mathrm{PR}_i)\}$ (0-0.1, 0.1-0.2, …, 0.9-1), calculated the expected number of deaths within each sub-group during follow-up,

$$E_g = \sum_{i \in g} \hat{p}_i,$$

and compared it with the observed number of deaths both graphically and using a χ² test with 9 degrees of freedom 47,48 . To further assess model calibration, we fitted a logistic regression model in each cohort, with the indicator of 5-year mortality for each patient, $Y_i$, as the outcome and the prognostic risk score as the predictor:

$$\operatorname{logit} \Pr(Y_i = 1) = \alpha_0 + \alpha_1 \mathrm{PR}_i. \qquad (2)$$

In a perfectly calibrated model, the intercept ($\alpha_0$) and slope ($\alpha_1$) from this regression should be 0 and 1, respectively 44,49 . We considered two approaches 50 to further assess and improve the performance of the original model, using the original 9 covariables and 1 interaction term, in the five new cohorts:
(1) The calibration intercept method allows a different intercept for the prognostic model in each new cohort by changing the value of $b_0$ in Eq. (1). The new intercept is $b_0 + \alpha_0$, where $\alpha_0$ is obtained by fitting model (2) with the slope $\alpha_1$ fixed at 1:

$$\operatorname{logit} \Pr(Y_i = 1) = \alpha_0 + \mathrm{PR}_i. \qquad (3)$$

An estimate $\alpha_0 < 0$ indicates that the predicted probabilities obtained from the model are systematically too high, while $\alpha_0 > 0$ indicates that they are too low. The modified prognostic risk score is

$$\mathrm{PR}_i^{\mathrm{int}} = (b_0 + \alpha_0) + b_1 x_{1i} + \dots + b_{10} x_{10i} = \alpha_0 + \mathrm{PR}_i. \qquad (4)$$

(2) The calibration intercept and slope method systematically alters parameters $b_1, \dots, b_{10}$ in Eq. (1) by a constant multiplicative factor and derives a new intercept in place of $b_0$. The calibration intercept and slope are obtained by fitting model (2) without constraints, and the prognostic risk score is

$$\mathrm{PR}_i^{\mathrm{int,slope}} = (\alpha_0 + \alpha_1 b_0) + \alpha_1 b_1 x_{1i} + \dots + \alpha_1 b_{10} x_{10i} = \alpha_0 + \alpha_1 \mathrm{PR}_i. \qquad (5)$$

Under each method, we compared expected and observed probabilities of 5-year survival in 10 risk sub-groups as described above 47 .
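The grouped comparison of expected and observed deaths can be sketched as follows, using fixed-width probability bins as described. The statistic is the usual Hosmer-Lemeshow form, the sum over g of (O_g − E_g)² / [E_g(1 − E_g/n_g)]; this exact form is our assumption, because the text does not spell it out. The p-value would then come from a χ² distribution with 9 degrees of freedom, for example via scipy.stats.chi2.sf(stat, 9).

```python
from typing import List, Sequence, Tuple


def calibration_table(p_hat: Sequence[float], died: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[int, float, int]]:
    """(n_g, E_g, O_g) for fixed-width bins of predicted probability 0-0.1, ..., 0.9-1."""
    n = [0] * n_bins
    expected = [0.0] * n_bins
    observed = [0] * n_bins
    for p, y in zip(p_hat, died):
        g = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        n[g] += 1
        expected[g] += p   # E_g: sum of predicted probabilities in the bin
        observed[g] += y   # O_g: number of observed deaths in the bin
    return list(zip(n, expected, observed))


def hosmer_lemeshow(table: Sequence[Tuple[int, float, int]]) -> float:
    """Hosmer-Lemeshow-type chi-squared statistic; bins that are empty or have
    E_g equal to 0 or n_g are skipped to avoid division by zero."""
    stat = 0.0
    for n_g, e_g, o_g in table:
        if n_g == 0 or e_g <= 0 or e_g >= n_g:
            continue
        stat += (o_g - e_g) ** 2 / (e_g * (1 - e_g / n_g))
    return stat


tab = calibration_table([0.05, 0.05, 0.55, 0.55], [0, 0, 1, 0])
print(round(hosmer_lemeshow(tab), 4))  # 0.1255
```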
We used the statistical system R to create study cohorts and perform all analyses 51 .

Results
The choice between NHANES III and GLI equations for converting measured forced expiratory volume in one second (FEV1) values to percent predicted FEV1 (FEV1%) did not substantially change the patterns of inclusions or exclusions. The primary reasons for exclusion were age less than 6 years, missing data or lung transplantation (Table S1). Missingness increased with the identification of incorrectly recorded values for height, weight and FEV1, but sensitivity analyses using uncorrected, corrected, or corrected and imputed data produced no evidence that data were not missing completely at random. Further sensitivity testing that included patients undergoing lung transplantation during each study cohort period had no substantial effect on results and no effect on interpretations.
The new 1993-1997 cohort differs little from the originally published 1993-1997 cohort. We found a small, clinically unimportant increase in FEV1% values, more frequent pancreatic sufficiency and fewer pulmonary exacerbations (Table 2), suggesting that data cleaning since the original analysis 1 does not substantially affect the applicability of the originally published model 30 .
However, the distributions of patient characteristics in the new 5-year cohorts from more recent periods differ from those of the original 1993-1997 cohort. Most changes reflect improving trends in the CFFPR (Fig. 1A-D, F-H). In contrast, age-specific CFRD prevalence increased (Fig. 1E) by about 9% per year (P < 0.001, Table S2). In multivariable analysis, this increase was not explained by increasing FEV1% (Fig. 1B) or weight-for-age z-score (Fig. 1C), which are negatively 52 and independently associated with CFRD (Table S2), and it is only incompletely explained by modestly improving detection 34 .
The distribution and range of individual prognostic risk scores (log-odds of death within 5 years) derived using the original 5-year predicted survival model (Table 1) 30 were similar between all new cohorts and the original 1993-1997 cohort (Table 2). There was no significant difference between the prognostic risk scores from the new 1993-1997 cohort and those from the original. Scores for successive cohorts tended to be lower on average. Estimated Kaplan-Meier 5-year death probabilities decreased with successive study cohorts (Fig. 2), although, without adjustment, differences could partially be due to changes in age distribution at the start of each cohort. Nevertheless, histograms of prognostic risk scores and predicted probabilities showed similar distributions with no obvious outliers (Fig. S2). Refitting the 5-year multivariable logistic regression prediction model produced new model coefficients similar to the published coefficients (Fig. 3 and Table S3), with the exception of the intercepts, indicating that associations between predictors and outcomes remained approximately stable in the new study cohorts (Fig. 4). A small percentage of patients in each cohort were lost to follow up before reaching a full 5 years of follow up (Table S1 and Fig. S3). When we treated patients lost to follow up as having died rather than alive, we obtained model coefficients and predicted probabilities similar to those derived from treating them as alive at the end of the 5 years.
When we applied the original model to new cohorts, the C-index ranged from 0.87 to 0.91, demonstrating high discriminative power (Fig. S4) similar to the C-index of 0.89 in the original 1993 validation cohort (Fig. S5). Calibration was best in the 1993-1997 validation cohort, with similar numbers of expected and observed deaths within sub-groups defined by risk scores (Table S4). Expected numbers of deaths tended to be lower than observed numbers, with differences increasing in successive cohorts; we attribute this to a bias towards over-optimistic predicted survival arising from the exclusion of transplanted patients (see the limitations section of the Discussion). Table 3 shows results from logistic regression of death within 5-year follow-up on the prognostic score, using Eq. (2). Under a well-calibrated model, we should find an intercept of 0 ($\alpha_0 = 0$) and a slope of 1 ($\alpha_1 = 1$), as seen with the 1993-1997 validation cohort. The estimated slopes are close to 1 in all new cohorts, but the intercepts are greater than zero, indicating that the predicted probabilities from the original model are too low. This also holds when the slope is fixed at 1.
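The intercept and slope of Eq. (2) are ordinary maximum-likelihood estimates from a logistic regression of the 5-year death indicator on the prognostic risk score. The pure-Python Newton-Raphson sketch below is for illustration only; any logistic regression routine would serve, including R's glm with the score as an offset when the slope is fixed at 1. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_calibration(pr: Sequence[float], y: Sequence[int], fix_slope: bool = False,
                    max_iter: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood fit of logit P(Y=1) = a0 + a1 * PR by Newton-Raphson.

    With fix_slope=True the slope is held at 1 (PR enters as an offset) and only
    a0 is estimated. Assumes the outcomes are not perfectly separated by PR.
    """
    a0, a1 = 0.0, 1.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for s, yi in zip(pr, y):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * s)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * s
            h00 += w
            h01 += w * s
            h11 += w * s * s
        if fix_slope:
            step0, step1 = g0 / h00, 0.0
        else:
            det = h00 * h11 - h01 * h01
            step0 = (h11 * g0 - h01 * g1) / det
            step1 = (h00 * g1 - h01 * g0) / det
        a0 += step0
        a1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return a0, a1
```

At convergence with the slope fixed, the predicted deaths sum to the observed deaths, so a positive a0 means the uncorrected scores predicted too few deaths, matching the interpretation given in the Methods.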
Findings from calibration assessments and similarities between coefficients in new models derived from new cohorts suggest that modifying the intercepts alone (Fig. 4) or both the intercepts and slopes in the prediction model ($b_0$ and $\alpha_1$, respectively) would improve the performance of the original model in the new cohorts. Modified intercepts alone, using $\alpha_0$ estimates from Eq. (2) with $\alpha_1$ set to 1, produced modified prognostic risk scores using Eq. (4) that improved model calibration in all new cohorts (Table S5). For the two most recent cohorts, modifying intercepts alone produced better calibration than using both new intercepts and slopes based on the $\alpha_0$ and $\alpha_1$ estimates from Eq. (2) (Table S6).
Sensitivity analyses showed similar results using data with no attempt to correct potentially incorrect values within physiologic limits (for example, accepting a physiologically plausible height of 165 cm without further testing, versus deleting a recorded height of 1,650 cm) or using data after imputation of missing height, weight and FEV1 values. Results were similar whether NHANES III 31 or GLI 32 equations were used to calculate FEV1%, although the choice of equations selects somewhat different sets of study patients because of differences in racial and ethnic coverage. For example, patients of Asian race are excluded when using NHANES III, while Hispanic ethnicity cannot be considered when using GLI to derive FEV1%, because applicable equations are lacking. Additional sensitivity analyses showed similar results when including patients who underwent lung transplantation during each cohort.
In summary, the prediction model has excellent discrimination in new cohorts. Model intercept modifications improve the calibration and accuracy of predicted probabilities of death within 5 years, especially for recent cohorts. From Eq. (4), the modified intercept for the 2011 cohort is −1.38; depending on the application, this produces the most appropriate model for current use (Original Model with Modified Intercept, Table S3).
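Applying either recalibration to the published coefficients is simple arithmetic. In the sketch below the coefficient values are placeholders, because Table 1 is not reproduced here; for the 2011 cohort, the intercept-only update would give the new intercept of −1.38 reported above while leaving the other coefficients unchanged. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def update_coefficients(b0: float, b: Sequence[float], alpha0: float,
                        alpha1: float = 1.0) -> Tuple[float, List[float]]:
    """Recalibrate a logistic prediction model.

    alpha1 == 1: intercept-only update; the new intercept is b0 + alpha0 and the
    covariate coefficients are unchanged.
    alpha1 != 1: intercept-and-slope update; every coefficient is multiplied by
    alpha1 and the new intercept is alpha0 + alpha1 * b0.
    """
    return alpha0 + alpha1 * b0, [alpha1 * bk for bk in b]


def five_year_death_probability(b0: float, b: Sequence[float], x: Sequence[float]) -> float:
    """Inverse logit of the linear predictor b0 + sum(b_k * x_k)."""
    score = b0 + sum(bk * xk for bk, xk in zip(b, x))
    return 1.0 / (1.0 + math.exp(-score))


# Placeholder coefficients and covariates, for illustration only.
new_b0, new_b = update_coefficients(-2.0, [0.5, -0.04], alpha0=0.3)
print(new_b0, new_b)
```

Keeping the covariate coefficients fixed in the intercept-only update preserves the clinical interpretation of each variable while shifting all predicted probabilities to match current mortality.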

Discussion
We evaluated a previously published 5-year predicted survival model of CF and found that it remains a useful prediction model in updated cohorts. However, performance improved with adjustment of the model intercept to account for overall improvements in mortality rates over time. Coefficients for each included variable derived from new cohorts were similar (Fig. 3 and Table S3), showing that associations between demographic factors and measures of disease state remain largely unchanged despite 5-year survival improvements over time (Fig. 2). After intercept adjustment, the original 5-year prediction model has excellent calibration (Fig. 4), unchanged clinical implications and equally good discrimination for all new patient cohorts (Fig. S4 and Fig. S5). Five-year survival probabilities for CF improved because of slowing disease progression and shifts in the distributions of most survival predictors (Fig. 1). These findings suggest that the original unmodified 5-year predicted survival model remains useful for stratifying individuals into expected survival groups for observational or interventional studies of CF 29 . Worksheets allow comparisons of survival predictions using the original and modified models (Table S7). The model with a modified intercept is more useful for applications that require precise comparisons between individual predictions and outcomes, for example, investigation of the survival impact of lung transplantation 53,54 in a setting of markedly improved survival with CF.
In the current study, all covariates included in the 5-year model except diabetes improved (Fig. 1), favoring better survival on average within each successive cohort. Some patients remain at every disease level, although the proportion of patients in the most severe disease states continues to decrease (Table 2).
Unexpectedly, diabetes was more common at every age in each succeeding year of the CFFPR (Fig. 1E). By the end of the study period, the roughly 15% increase in CFRD was associated with an increase in mortality equal to approximately 20% of the observed decrease in mortality.
Multiple mechanisms cause CFRD, including physical destruction of pancreatic β-islet cells from inflammation 55 , modifier gene influences 56,57 and CFTR dysfunction itself 58,59 . Sustained increases in CFRD prevalence (Fig. 1E) may stem partially from competing influences of modestly improving CFRD detection 2,34 , mild phenotype frequency and weight-for-age z-score. However, improving survival may allow better observation of a direct effect of CFTR dysfunction, suggesting that modulators of defective CFTR 25,26 may modify CFRD pathogenesis and that CFRD biomarkers might be novel reporters to help guide use of these new agents. CFTR modulators may treat or prevent CFRD itself, independently of lung disease.
The prediction model was fitted by logistic regression to patients with complete data, with and without methods to account for missing and incorrect data. Loss to follow up in the CFFPR was recently evaluated in a 2009-2013 cohort and involved fewer than 10% of patients 1 . We found similar rates of loss to follow up in our cohorts (Table S1, last row, and Fig. S3). Sensitivity analyses in the four cohorts (1993-1998, 1999-2004, 2005-2010 and 2011-2016) that treated patients lost to follow up as having died, rather than as alive at the end of the 5-year period, produced similar model coefficients and similar fits to the data, with no effect on the interpretation of our results. Independent evaluation found nearly complete clinical data for 2003-2009 and missingness of no more than 4.2% of death dates 60 . The high follow-up rates and low proportion of individuals lost to follow up in the CFFPR data 1,60 probably explain the lack of material differences between analyses using uncorrected and corrected data, and suggest that finding patients previously missing from the CFFPR would not substantially change the prediction model.
The high degrees of model discrimination and calibration suggest that further improving the model simply by adding new variables may be difficult. Variables of high interest in the CF community include assessments of liver disease 61 , renal dysfunction 62 and arthropathy 54 , as well as severe and acute but unusual or rare events such as massive hemoptysis 63 and pneumothorax 64 . Within the CFFPR, adding any of these variables to the current logistic regression model eliminated large numbers of patients from assessment because of missing data, sometimes exceeding 50% of the patients included in the current study. This degree of missingness introduced severe bias into the analysis of survival (not shown).
Expansion of the model with novel variables is highly desirable but must avoid introducing bias from the exclusion of large numbers of patients with missing data. Including enough relevant events to assess variables beyond those in our original multivariable model may be feasible using methods that incorporate longitudinal follow up over extended periods. Such a model could allow inclusion of sparsely collected or observed events such as pneumothorax and quantitatively relate their effects to those of more common factors such as low FEV1%. By demonstrating the stability of the 5-year predicted survival model in 5 cohorts collectively followed for over 24 consecutive years, the present work provides a foundation supporting the feasibility of such a longitudinal approach.
Some improvements to the prediction model might also be achieved by incorporating non-linear functions of continuous variables, e.g., age and FEV1%, and by treating prior pulmonary exacerbations as a categorical variable. Further, natural day-to-day fluctuations in FEV1%, which may be considered measurement error, could be assessed by other methods, such as joint modeling of the longitudinal FEV1% process and the survival process 65 .
Our study has limitations. The data are derived from non-randomly selected CFFPR participants and thus may include biases; however, these should not be greater than for our prior work 30 and should be reduced by intervening data cleaning 1 . We excluded patients too young for pulmonary function testing, possibly leading to more pessimistic survivorship estimates: with current therapies, these patients tend to start and stay in the highest categories of health and therefore contribute infrequently to deaths in CF 2 . Higher variation in recording data for some variables, such as pancreatic sufficiency status (Fig. 1D), contributes to somewhat larger standard errors in variable estimates (Fig. 3); however, extensive data cleaning did not change our results or interpretations. Methods for recording pulmonary exacerbations changed in 2005, which might have changed their impact on survivorship; however, we are confident of the steady decreases in the numbers of pulmonary exacerbations before and after 2005 (Fig. 1H). We excluded lung transplant recipients during each cohort period, thus conditioning on a future event. However, reanalysis without transplant exclusions produced similar results with no impact on interpretations. (These results are insufficient, however, to allow comment on transplant survival effects.) Excluding transplant recipients from the original 1993 cohort resulted in 5-year survival probabilities that were approximately 2% over-optimistic relative to observed survival, a bias that was not clinically meaningful in an analysis of lung transplantation survival outcomes 53 . Finally, many patients were eligible for and included in multiple study cohorts. This allowed assessment of the potential impact of intervening data cleaning for cohorts beginning in 1993. For cohorts non-overlapping in time, inclusion of all eligible patients provided the most appropriate population for understanding survival during the specific study period.
We tested the published 5-year predicted survival model of CF, derived from a 1993-1997 US cohort, in new cohorts because new treatments (including CFTR modulators for the 2005-2010 and 2011-2016 cohorts) improved observed mortality rates. Re-derived 5-year survival models in updated cohorts were similar to the original results and were stable through multiple sensitivity analyses, including reclassifying patients lost to follow up as dead and including patients who received lung transplants during each cohort. The original model maintains good calibration and discrimination in new cohorts, especially in the most recent cohort with a modified intercept alone, demonstrating the stability of both the model and the underlying disease processes of CF in the face of multiple effective therapies. CFRD is an increasingly important detection and treatment target with substantial potential for improving survival with CF. The 5-year predicted survival model in original and modified forms remains useful for disease categorization and individual prognosis, and the demonstrably stable effects of the underlying variables provide a foundation for new models incorporating extended longitudinal follow up.