Introduction

Prevention and management of chronic lung diseases such as COPD, asthma, or lung cancer are of great importance given their high prevalence and the associated economic burden on the healthcare system1,2,3,4,5. While dedicated tests are available for reliable diagnosis and monitoring of lung diseases6,7,8, accurate prediction to identify those who will eventually develop severe morbidity and mortality is currently limited. Therefore, new methods to improve risk stratification are desirable. Chest radiographs (CXR) are the most common diagnostic imaging test and are acquired in the workup of many lung diseases9. However, although most of them are without actionable radiological findings by a human reader10, especially in the early stages of lung disease, more quantitative, computer-aided analyses may provide a window into the risk and extent of lung disease beyond established methods.

With recent advances in artificial intelligence, new possibilities to automatically capture and quantify a multitude of information have become available11,12. This is particularly true in medical imaging, where deep learning (convolutional neural networks or CNNs) has demonstrated high performance in estimating the risk of mortality, incident lung cancer, or biological aging from a chest radiograph image13,14,15. These results indicate that medical imaging might be helpful to personalize risk assessment based on changes to our anatomy, even in an asymptomatic preclinical stage16,17,18,19. Moreover, using medical imaging for risk estimation may have broader applications compared to established methods in clinical care as imaging-based measures can be calculated opportunistically from existing scans acquired in daily routine13,20,21,22.

In this study, we developed a CNN (CXR-Lung-Risk) to identify individuals at high risk for lung disease mortality. The only input to the model is a single existing chest radiograph, and the output is a risk for lung disease mortality expressed in years: a model output of 75 years corresponds to the same risk of lung disease mortality as that of an average 75-year-old individual. We tested the prognostic value of CXR-Lung-Risk in three distinct clinical scenarios: an asymptomatic community population enrolled in the Prostate, Lung, Colorectal, Ovarian (PLCO) Cancer Screening Trial23,24, heavy smokers eligible for lung cancer screening CT enrolled in the National Lung Screening Trial (NLST)25, and patients with histologically confirmed early-stage (I-III) lung cancer from the Boston Lung Cancer Study (BLCS). Our findings motivate the use of deep learning to identify individuals at high risk of lung disease mortality from easily obtainable and low-cost chest radiograph images. These findings may allow for improved risk assessment of those who would benefit most from personalized prevention and treatment strategies.

Results

We developed a deep learning model to estimate the risk of lung disease mortality using a chest radiograph as the only input and independently tested the model in three held-out datasets comprising more than 15,000 individuals: I) 20% of participants (n = 10,155, median follow-up = 17.0 [IQR 14.8–19.0] years) not seen during model development from PLCO23,24. PLCO was a multicenter randomized controlled trial of chest radiography for cancer screening in asymptomatic individuals aged 55–74 years enrolled at 10 US sites from 1993 through 2001. Outcomes were assessed via annual questionnaires, communication with next of kin and the National Death Index. Cause of death was determined using ICD-9 codes. II) Participants from the NLST25 chest radiograph arm (n = 5,414; median follow-up = 11.9 [IQR 7.3–12.3] years). NLST was a randomized controlled trial that enrolled heavy smokers (≥30 pack years) aged 55–74 years for lung cancer screening via chest CT vs. chest radiograph at 21 US sites from 2002 through 2004. Similar to PLCO, outcomes were assessed via annual questionnaires, communication with next of kin and the National Death Index, and ICD-9 codes were used to determine cause of death. III) Patients from the BLCS (n = 407; median follow-up = 3.4 [IQR 1.5–7.2] years), which is an ongoing multicenter observational epidemiologic cohort registry of patients with histologically confirmed lung cancer. Mortality was verified by study staff via manual chart review and was available for lung cancer-specific mortality only. An overview of the study design and analyses is provided in Fig. 1.

Fig. 1: Overview of the study design.

a The CXR-Lung-Risk model was developed in PLCO. The only input to the model is a chest radiograph image; the model output is an estimated risk of lung disease mortality. Independent testing was performed in a held-out subset of PLCO participants, individuals enrolled in NLST and patients with histologically confirmed lung cancer from the BLCS. b The prognostic performance of the CXR-Lung-Risk model was evaluated and compared to clinical risk factors in all datasets. Source data are provided as a Source Data file. PLCO Prostate, Lung, Colorectal, Ovarian Cancer Screening Trial; NLST National Lung Screening Trial, BLCS Boston Lung Cancer Study.

PLCO had the lowest mean CXR-Lung-Risk (mean 63.0 ± 5.5 years), followed by NLST (screening-eligible heavy smokers; mean 66.1 ± 5.7 years) and then BLCS (patients with histologically confirmed lung cancer; mean 70.5 ± 6.7 years) (p < 0.001 for all comparisons). In general, CXR-Lung-Risk was significantly higher in men, in current or former smokers, and when traditional radiographic findings were present (Supplementary Figs. 1 and 2). Further detailed patient demographics for all datasets are provided in Tables 1a, b and 2, and Supplementary Table 1.

Table 1 (a) Patient demographics and clinical risk factors of PLCO and NLST participants. (b) Radiology findings of PLCO and NLST participants
Table 2 Patient demographics and clinical risk factors of BLCS patients for the entire data set stratified by CXR-Lung-Risk groups

Internal testing in PLCO: First, CXR-Lung-Risk was independently tested in the remaining held-out dataset of PLCO (n = 10,155) not seen during any part of training. Kaplan-Meier survival analysis revealed a graded association between CXR-Lung-Risk categories and lung disease mortality (Fig. 2a). The univariable hazard ratio for lung disease mortality for those with a CXR-Lung-Risk between 65 and 75 years old was 5.74 [4.69–7.01]; p < 0.001 and 31.45 [24.43–40.48]; p < 0.001 for those >75 years old compared to the reference group (CXR-Lung-Risk <65 years old). This association remained robust after adjusting for baseline demographics (chronological age, sex, race, smoking status, pack years, body mass index) and clinical risk factors (prevalent diabetes mellitus, hypertension, history of stroke, myocardial infarction and cancer) (adjusted hazard ratio for CXR-Lung-Risk between 65–75 years old was 3.52 [2.81–4.41]; p < 0.001 and 11.86 [8.64–16.27]; p < 0.001 for those >75-year-old).
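The three risk groups used throughout these analyses can be written down as a small helper. This is a minimal sketch only: the function name is hypothetical, and the handling of values exactly equal to 65 or 75 years (both assigned to the middle group here) is an assumption, since the text reports the groups only as <65, 65–75 and >75 years.

```python
def cxr_lung_risk_group(risk_years: float) -> str:
    """Map a CXR-Lung-Risk value (in years) to one of the three reported groups.

    Assumption: values of exactly 65 or 75 years fall into the middle group.
    """
    if risk_years < 65:
        return "<65"      # reference group in the hazard ratio analyses
    if risk_years <= 75:
        return "65-75"
    return ">75"
```

For example, the mean values reported for PLCO (63.0 years) and BLCS (70.5 years) would fall into the reference and middle groups, respectively.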

Fig. 2: Independent testing of the CXR-Lung-Risk model in the PLCO testing dataset and in NLST to estimate lung disease mortality.

The CXR-Lung-Risk model was independently tested in (a) the PLCO testing set (n = 10155 independent individuals) and in (b) NLST (n = 5414 independent individuals). Kaplan-Meier survival analysis shows a graded association between CXR-Lung-Risk groups and lung disease mortality. Pairwise comparison of survival curves was performed using two-sided Log-Rank tests. P-values are adjusted for multiple comparisons using the Bonferroni-Holm method. Forest plots show univariable and multivariable-adjusted hazard ratios (box) with 95% confidence intervals (error bars) for the different CXR-Lung-Risk groups. Multivariable models are adjusted for: chronological age, sex, race, smoking status, pack years, body mass index, prevalent diabetes mellitus, hypertension, history of stroke, myocardial infarction, cancer and 9 chest x-ray findings as described in the methods. Source data are provided as a Source Data file. ***p values <2*10−16. CXR chest radiograph, PLCO Prostate, Lung, Colorectal, Ovarian Cancer Screening Trial, NLST National Lung Screening Trial; y years.

To test whether CXR-Lung-Risk adds incremental value to a baseline multivariable model with the same covariates but without CXR-Lung-Risk, nested Cox proportional hazard models were compared. Adding CXR-Lung-Risk to the baseline model resulted in a modest improvement in estimating lung disease mortality compared to the baseline model alone (c-index: 0.83 [95% CI 0.81–0.85] vs. 0.81 [95% CI 0.79–0.83]).
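For readers less familiar with the c-index, the following sketch shows one common definition (Harrell's concordance index) for right-censored data. It is illustrative only and is not the code used in this study, where the analyses were run in R; the function name and the simplified handling of tied event times (such pairs are skipped) are assumptions.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the subject with the shorter
    observed time also has the higher risk score (ties in score count 0.5).

    times: follow-up times; events: 1 if death observed, 0 if censored;
    risk_scores: higher means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, pair is not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a change from 0.81 to 0.83 is described as modest.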

To account for the confounding effect of smoking, we stratified the PLCO dataset by smoking status (n = 5505 ever smokers and n = 4650 never smokers). After adjustment for the same risk factors, CXR-Lung-Risk remained independently associated with lung disease mortality in both subpopulations (Supplementary Fig. 3b, c, Supplementary Table 1).

In addition, stratified analyses by sex and chronological age (<65 years old vs. ≥65 years old) are provided in the Supplements (Supplementary Figs. 4 and 5), which revealed similar results for all investigated subgroups.

In a sensitivity analysis, we tested the association between CXR-Lung-Risk and lung cancer-specific mortality in the entire PLCO testing data set and in current or former smokers (quit <15 years ago) with a smoking history of ≥30 pack years to allow for a better comparison to individuals enrolled in NLST (see below). CXR-Lung-Risk showed a graded and independent association with lung cancer-specific mortality after adjusting for demographics and clinical risk factors (Supplementary Fig. 6a, b).

External testing in heavy smokers participating in NLST: As in the PLCO testing data set, Kaplan-Meier survival curves showed a graded association between CXR-Lung-Risk categories and lung disease mortality in NLST (Fig. 2b). Univariable hazard ratios for lung disease mortality were 3.03 [2.34–3.93]; p < 0.001 for those with a CXR-Lung-Risk between 65 and 75 years and 10.92 [8.07–14.77]; p < 0.001 for those with a CXR-Lung-Risk >75 years. Multivariable hazard ratios adjusted for the same baseline demographics and clinical risk factors as in PLCO were 2.48 [1.88–3.29]; p < 0.001 for those with a CXR-Lung-Risk between 65–75 years and 6.48 [4.52–9.31]; p < 0.001 for those with a CXR-Lung-Risk >75 years. In addition, CXR-Lung-Risk showed a modest improvement in estimating lung disease mortality when added to the multivariable model of demographics and clinical risk factors alone (c-index: 0.76 [95% CI 0.74–0.78] vs. 0.72 [95% CI 0.70–0.74]).

As for PLCO, similar results were seen in NLST participants for sex and age-stratified analyses (Supplementary Figs. 7 and 8) and lung cancer-specific mortality for the entire cohort (Supplementary Fig. 6c).

Testing in patients with early-stage lung cancer from the BLCS: Similar to asymptomatic screening individuals, CXR-Lung-Risk showed a significant graded association with lung cancer-specific mortality in patients with histologically confirmed early-stage (I-III) lung cancer in the BLCS (Fig. 3a). The univariable hazard ratio was 1.74 [1.15–2.64]; p = 0.009 for a CXR-Lung-Risk between 65–75 years and 3.30 [2.07–5.25]; p < 0.001 for a CXR-Lung-Risk >75 years. After multivariable adjustment for age, sex, race, obesity, smoking status, cancer stage, and treatment, the association for the CXR-Lung-Risk category 65–75 years was attenuated (hazard ratio: 1.28 [0.81–2.02]; p = 0.30) but remained robust for those categorized as being >75 years old (hazard ratio: 2.33 [1.36–3.99]; p = 0.002). As in the other testing data sets, a small improvement in estimating lung cancer-specific mortality was found when comparing multivariable nested Cox models with and without CXR-Lung-Risk (c-index: 0.76 [95% CI 0.72–0.80] vs. 0.75 [95% CI 0.71–0.79]).

Fig. 3: Independent testing of the CXR-Lung-Risk model in the Boston Lung Cancer Study (BLCS) to estimate lung cancer-specific mortality.

In contrast to PLCO and NLST, cause of death was only available for lung cancer but not for other lung diseases. a Kaplan-Meier survival analysis shows a graded association between CXR-Lung-Risk groups and lung cancer-specific mortality in the entire cohort (n = 407 independent individuals). Subgroup analyses stratified by chronological age (b) <65 years old (n = 194 independent individuals) vs. (c) ≥65 years old (n = 213 independent individuals) revealed that this effect seems to be driven by older patients with a chronological age ≥65 years. Pairwise comparison of survival curves was performed using two-sided Log-Rank tests. P values are adjusted for multiple comparisons using the Bonferroni-Holm method. Forest plots show univariable and multivariable-adjusted hazard ratios (box) with 95% confidence intervals (error bars) for the different CXR-Lung-Risk groups. Multivariable models are adjusted for chronological age, sex, race, obesity, smoking status, cancer stage, and treatment. Source data are provided as a Source Data file. *p value = 0.01; ***p value = 2.6*10−7; ns = nonsignificant; ***1p value = 0.0006; CXR chest radiograph, BLCS Boston Lung Cancer Study, y = years.

In a subanalysis stratified by chronological age (<65 years old vs. ≥65 years old), we found a significant association between CXR-Lung-Risk categories and lung cancer-specific mortality in chronologically older patients, which was not observed in chronologically younger patients (Fig. 3b, c).

To investigate the potential clinical impact of CXR-Lung-Risk in patients with lung cancer we calculated risk reclassification tables based on the CXR-Lung-Risk categories and chronological age (<65 years old vs. ≥65 years old) (Table 3). We found increasing mortality rates by CXR-Lung-Risk categories in both those <65 years of chronologic age and ≥65 years.

Table 3 Risk reclassification of lung cancer-specific mortality based on risk categories defined by chronological age and CXR-Lung-Risk

In a subset of BLCS patients with available lung function testing (n = 348), the proposed CXR-Lung-Risk was compared to a previously described method that estimates a lung age (Lung-Age) via a linear regression using the forced expiratory volume in the first second (FEV1), sex and height26. Lung-Age showed a modest correlation with CXR-Lung-Risk (Pearson's r = 0.45; p < 0.001; Supplementary Fig. 9a). Univariable and multivariable hazard ratios for CXR-Lung-Risk and Lung-Age are provided in Supplementary Fig. 9b, c.

Finally, the relation between CXR-Lung-Risk and FEV1 was investigated, which showed a moderate negative correlation (Pearson's r = −0.30; p < 0.001; Supplementary Fig. 10a). When adding FEV1 to a multivariable model with the same demographic and clinical risk factors as above, the association between CXR-Lung-Risk and lung cancer-specific mortality remained significant for the >75 years old category (hazard ratio: 1.99 [1.07–3.68]; p = 0.03; Supplementary Fig. 10b).

Discussion

In this study, we propose a deep-learning convolutional neural network that estimates the risk of lung disease mortality from a chest radiograph image as the only input. In three independent testing datasets, CXR-Lung-Risk discriminated individuals at high vs. low risk for lung disease mortality. In addition, CXR-Lung-Risk proved to be independent of and additive to baseline demographics (including age and smoking status), cardiovascular risk factors and traditional radiologic findings after multivariable adjustment. We observed a graded association between CXR-Lung-Risk and individual risk profiles. Higher CXR-Lung-Risk estimates were associated with risk factors like smoking, hypertension, a history of myocardial infarction and stroke as well as traditional radiologic findings. The lowest mean CXR-Lung-Risk (63.0 ± 5.5 years) was found in PLCO, an asymptomatic screening population without known lung cancer23. NLST25 (all ≥30 pack year smokers) had a higher CXR-Lung-Risk on average (mean 66.1 ± 5.7 years), while the highest CXR-Lung-Risk was found in BLCS patients with histologically confirmed lung cancer (mean 70.5 ± 6.7 years). These findings can be intuitively understood: increasing damage to the chest is associated with higher CXR-Lung-Risk regardless of other risk factors.

These findings could have clinical implications for the treating physician and the patient, as decisions on treatment allocation are strongly based on clinical risk factors such as chronological age, FEV1 and comorbidities of an individual to estimate eligibility for and tolerability of the chosen regimen8,27,28. For example, Walter et al. found that increasing age was negatively associated with the receipt of cancer-directed treatment (e.g. surgery or radiotherapy) in a cohort of more than 13,000 lung cancer patients29. Wang et al. reported in a study including more than 20,000 veterans with lung cancer that higher age was a stronger predictor for not receiving guideline-recommended treatment than the presence of comorbidities30. If CXR-Lung-Risk, a personalized risk estimate, were used rather than chronological age, this could affect treatment decisions for tumor-directed therapy. In this context, CXR-Lung-Risk might be a helpful objective decision-making tool to reduce age-related bias by the treating physician and the risk of withholding a potentially beneficial therapy in an older, but physiologically fit patient31. Furthermore, the substitution of chronological age with the biological age captured by CXR-Lung-Risk in existing risk calculators/predictors may provide more accurate decision support. For example, lung cancer prediction models (either for selection of screening candidates or cancer risk prediction of solitary pulmonary nodules) commonly include chronological age32, and substitution with CXR-Lung-Risk may improve the utility and accuracy of these clinical risk predictions and support risk reclassification beyond current methods.

Chest radiographs are the most common imaging test and are especially common in persons at risk for lung disease9. Although most chest radiographs do not show findings that require clinical interventions, there is increasing evidence that chest radiographs carry additional prognostic information beyond traditional diagnostic findings (e.g. lung consolidations or nodules). For example, in our previous work we demonstrated that deep learning can identify heavy smokers at high risk for incident lung cancer and that a deep learning biological chest x-ray age predicts longevity beyond chronological age and independent of baseline risk factors13,14. CXR-Lung-Risk is a new model that allows for identifying individuals at increased risk for mortality from various lung diseases, demonstrating that potentially relevant prognostic information captured in a chest radiograph may go unreported. Implementing a tool like CXR-Lung-Risk into the EMR or PACS to automatically extract this currently unused information could help to increase the diagnostic value of this imaging test. For example, only around 5% of eligible Americans are screened for lung cancer33,34,35. Here, the proposed model could help to flag individuals at high risk to prompt risk discussion and encourage entry into screening programs. Furthermore, it has been reported that COPD is underdiagnosed in approximately 70% of affected individuals36, which carries a risk of increased morbidity and mortality. In this context, CXR-Lung-Risk could be deployed to automatically notify the treating physician to schedule a follow-up visit for the patient to investigate possible causes and discuss potential interventions, such as a full pulmonary function test and the use of bronchodilators. As no human input is necessary, CXR-Lung-Risk could be used with minimal disruption of current clinical workflows and automatically analyze the latest radiograph of a patient at high speed and low additional cost37.
As such, CXR-Lung-Risk could serve as an early warning system to triage patients into existing screening and chronic pulmonary disease pathways, and to both provide more accurate risk assessments for those programs and increase adherence to guidelines-based therapies.

The following limitations of our study need to be considered. First, the input to the model is a raw chest radiograph. It remains unknown which alterations and findings in the image are important for the final prediction. This is a common drawback of deep learning models that may limit the acceptance by physicians and patients to use this information for clinical decision-making. However, association analysis shows correlation with clinical risk factors (e.g. smoking, age, prevalent hypertension) and traditional radiologic findings (e.g. nodules, fibrosis, emphysema), suggesting that the model identifies anatomical changes known to be correlated with increased risk. Second, the majority of participants in all datasets (development and testing) were Non-Hispanic White. Detailed analysis regarding generalizability of the model to other races and ethnic groups was not possible in our datasets and needs to be investigated in future studies. Third, testing CXR-Lung-Risk in BLCS as a potential clinical use case using existing chest radiographs obtained through routine care only comprised a relatively small hospital cohort of lung cancer patients. Whether there is a similar prognostic value for early detection/prognosis of other lung diseases such as COPD or asthma or even broader adoption remains to be seen. Fourth, the age range in PLCO was 55–74 years old, which will likely limit the value of the model in substantially younger individuals. Moreover, although CXR-Lung-Risk accurately stratified risk in lung cancer patients, it remains unknown whether this improves clinical decision-making or treatment planning. This needs to be tested in future prospective trials. In addition, many patients at increased risk for lung cancer or prevalent disease get other imaging tests, including serial computed tomography. Whether specifically tailored models can estimate prognosis using these imaging data needs to be investigated in additional studies.
Further, in PLCO and NLST there is a discrepancy between the relatively small increase in the c indices in nested model comparison and the large hazard ratios, especially in the high-risk groups (CXR-Lung-Risk >75 years), which is likely explained by the significantly different number of individuals in the different risk groups. Finally, PLCO chest radiographs were collected from 1993-2001 and available as scanned films. Whether this has an impact on model accuracy in more modern datasets was not systematically analyzed in the current study. However, independent testing in BLCS, where the most recent radiographs were acquired in 2016, showed robust performance.

In conclusion, a deep learning model can estimate risk of lung disease mortality from a chest radiograph beyond demographics, including smoking status, cardiovascular risk factors and traditional radiologic findings and may help to identify high-risk individuals in screening and cancer populations.

Methods

All analyses performed in this study comply with relevant ethical regulations. Secondary use of the PLCO, NLST and BLCS cohorts has been approved by the Mass General Brigham, Boston, Massachusetts institutional review board. All participants provided informed consent at enrollment into the original study.

The CXR-Lung-Risk model was developed in a large multicenter prospective cancer screening trial and independently tested in one internal and two external, held-out datasets not seen during any part of the development process. Results are reported for the three testing data sets only. An overview of the study design is provided in Fig. 1.

Model development

The CXR-Lung-Risk model was developed using data from the Prostate, Lung, Colorectal, Ovarian (PLCO) Cancer Screening Trial23,24, as it was the largest available dataset in the current study. PLCO was a multicenter randomized controlled trial of chest radiography for cancer screening in asymptomatic individuals aged 55–74 years enrolled at 10 US sites from 1993 through 2001. Individuals in the intervention arm received a chest radiograph at enrollment and up to 3 annual follow-up radiographs. For model development, a random sample of 80% (n = 40,643) of individuals enrolled in the intervention arm was used, including chest radiographs from all timepoints (n = 147,497). 20% of the training data was reserved for hyperparameter tuning. For model development, each radiograph exam was used as an independent sample; for testing, only baseline radiographs, defined as the initial radiograph obtained at the enrollment (T0) exam, were used. The only input to the proposed CXR-Lung-Risk model is a chest radiograph image; the output is an estimated risk of 18-year lung disease mortality (defined below) expressed in years (e.g., a CXR-Lung-Risk of 75 years means an equal risk of lung disease-related death as the average 75-year-old individual). Usually, risk probabilities are expressed in percentages, which are difficult to grasp. Therefore, we decided to express CXR-Lung-Risk in years rather than as a probability between 0-100%. In contrast to our previous work15, which was developed as a single prediction model, CXR-Lung-Risk was built as an ensemble model to reduce variance in the output38,39. The ensemble consisted of 20 CNNs. The model architecture for each CNN was chosen randomly from a set of architectures popular in medical image analysis (inceptionv4, resnet34, tiny40,41,42).
Hyperparameters for each model were randomly selected during training (Supplementary Table 2), as random hyperparameters have been shown to improve performance in ensemble learning by reducing correlation between models43. The output of these 20 models was combined into a single prediction using a LASSO regression model trained on the hyperparameter tuning dataset. LASSO regression coefficients are given in Supplementary Table 2. We found that 13 out of 20 models had a nonzero LASSO regression coefficient and were included in the final ensemble. A comparison of these 13 single models vs. the ensemble model is given in Supplementary Fig. 11.
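How a sparse set of LASSO coefficients turns the 20 member outputs into a single prediction can be sketched as a weighted sum. This is an illustration under stated assumptions, not the released implementation: the function names are hypothetical, the coefficients must be taken from the fitted model (Supplementary Table 2), and any numbers passed in below are placeholders rather than the study's values.

```python
def active_members(coefficients):
    """Indices of ensemble members that LASSO kept (nonzero coefficient)."""
    return [i for i, w in enumerate(coefficients) if w != 0.0]


def ensemble_prediction(member_outputs, coefficients, intercept=0.0):
    """Combine per-model outputs (in years) into one CXR-Lung-Risk value.

    Members with a zero coefficient contribute nothing, which is how LASSO
    effectively removes them from the final ensemble.
    """
    if len(member_outputs) != len(coefficients):
        raise ValueError("one coefficient is required per ensemble member")
    weighted = sum(w * x for w, x in zip(coefficients, member_outputs) if w != 0.0)
    return intercept + weighted
```

With three hypothetical members, outputs of 70, 60 and 80 years, coefficients of 0.5, 0.0 and 0.5, and an intercept of 1.0, the second member is dropped and the combined estimate is 76 years.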

Instead of a binary target variable of lung-related mortality, we defined age-adjusted labels reflecting the risk of lung disease mortality based on prevalent risk factors for those that did not die of a lung-related disease during follow-up. We posit that these labels are more informative than assigning a “0” for all individuals that did not die, regardless of their underlying risk profile. We define these age-adjusted labels according to the following equation:

$$\mathrm{LR}=\mathrm{CA}+(E-D)$$

where LR is the Lung-Risk label, CA is the current chronologic age, E is expected age-at-death based on US social security life tables44. D corresponds to age-at-death based on A) actual, observed age at death for those that died of a lung disease or lung cancer or B) an individual’s predicted age at death due to lung disease/lung cancer based on a survival regression model trained using data from the control (no imaging) arm of the PLCO trial (n = 77,444) for those who did not die (Supplementary Table 4). This regression model used prevalent risk factors as input to estimate the age an individual would die of lung-related disease. This model accurately estimated age at death due to lung disease with a concordance index of 0.82 (95% CI [0.817–0.825]) for all lung disease mortality and 0.84 (95% CI [0.835–0.845]) for lung cancer death. This approach was similar to our previous work in which we trained a model to estimate a general biological chest x-ray age14. Cause of death was adjudicated by the trial based on death certificates and the National Death Index. These Lung-Risk labels were only used for training. The reported results in the testing datasets are all based on the actual observed deaths during follow-up.
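The label definition above reduces to a single arithmetic step once E and D are known. The sketch below is illustrative only; the function and argument names are hypothetical, and obtaining E from life tables or D from the survival regression model is outside its scope.

```python
def lung_risk_label(current_age, expected_age_at_death, lung_death_age):
    """Age-adjusted training label LR = CA + (E - D), in years.

    current_age (CA): chronological age at the time of the radiograph.
    expected_age_at_death (E): life-table expectation for this individual.
    lung_death_age (D): observed age at lung-related death, or the age
        predicted by the survival regression model for those who did not die.
    """
    return current_age + (expected_age_at_death - lung_death_age)
```

For instance, a hypothetical 60-year-old with a life-table expectation of 82 years and an observed or predicted lung-related death at 72 years receives a label of 70 years, whereas the label equals the chronological age when D equals E.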

For PLCO and NLST participants, only the baseline radiograph was used, which was acquired upright in posterior-anterior projection. PLCO radiographs were provided as scanned films in .tif file format with protected health information redacted using black pixels. PLCO radiographs were converted to Portable Network Graphics (.png) format using ImageMagick v6.8.9-9. For NLST, all chest radiographs were available in Digital Imaging and Communications in Medicine (DICOM) format. Radiographs for BLCS patients were acquired through clinical care and available in DICOM format from the hospital’s Picture Archiving and Communications System. We converted NLST and BLCS DICOM files to .tif using DCMTK v3.6.1 and then to .png using ImageMagick to maintain consistency with the aforementioned PLCO radiographs. To increase the sample size in BLCS, both posterior-anterior and anterior-posterior radiographs were included if they were taken up to 3 months prior to histologically confirmed lung cancer diagnosis. For training, images were rescaled to 224 pixels on the short axis and randomly cropped to 224 × 224 before input into the model. This random cropping was done each time the image was fed to the model as a form of data augmentation. Additional data augmentations used for training included mixup data augmentation, up to 20 degrees of random rotation, up to 20% zoom in/out, and up to 40% brightness/contrast adjustments.
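The geometry of the rescale-then-crop step can be sketched without any imaging library. This is a minimal illustration, not the fastai pipeline used for training: the function names are hypothetical, rounding of the rescaled long side is an assumption, and only the crop coordinates are computed rather than pixel data.

```python
import random


def resize_short_side(width, height, target=224):
    """Image size after rescaling so that the shorter side equals `target`."""
    scale = target / min(width, height)
    # max() guards against rounding the short side below the target
    return max(target, round(width * scale)), max(target, round(height * scale))


def random_crop_box(width, height, size=224, rng=random):
    """(left, top, right, bottom) of a random size x size crop.

    Drawing a new box each time an image is loaded is what makes the crop
    act as data augmentation.
    """
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - size)
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size
```

A 2000 × 2500 pixel radiograph would be rescaled to 224 × 280, so the random crop can only shift along the longer axis.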

The model was trained using a mean-squared error loss function with the ADAM optimizer. The number of epochs for training was selected uniformly at random between 40 and 70. This was independently chosen for each of the 20 models. Training was done using an Ubuntu Linux workstation with an AMD 3960X 24-core CPU with 128 GB of system RAM, and a single NVIDIA RTX A6000 GPU with 48 GB of GPU RAM. The model was developed using fastai v2.5.3, PyTorch v1.10, and CUDA v11.2. The full source code and programming environment are freely available at https://aim.hms.harvard.edu/cxr-lungrisk.
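The random selection of architecture and training length for each ensemble member can be sketched as follows. This is illustrative only: the function name and dictionary keys are hypothetical, treating the epoch range as inclusive integers is an assumption, and the remaining randomized hyperparameters (Supplementary Table 2) are not reproduced here.

```python
import random

# Architecture names as listed in the text
ARCHITECTURES = ("inceptionv4", "resnet34", "tiny")


def sample_member_configs(n_models=20, min_epochs=40, max_epochs=70, seed=None):
    """Draw an independent architecture and epoch count for each member."""
    rng = random.Random(seed)
    return [
        {
            "architecture": rng.choice(ARCHITECTURES),
            # assumption: integer epochs, both bounds inclusive
            "epochs": rng.randint(min_epochs, max_epochs),
        }
        for _ in range(n_models)
    ]
```

Fixing `seed` makes a particular draw reproducible, while leaving it unset gives a fresh configuration on each call.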

The final model was an ensemble of 20 models combined using a LASSO regression. LASSO regression coefficients were fit using the glmnet package in R, and hyperparameters for the LASSO were selected based on the minimum mean-squared error in an internal 10-fold cross-validation. LASSO regression coefficients for each of the 20 models are given in Supplementary Table 2 and the performance of each of the 20 models in Supplementary Fig. 11.

Model testing: The CXR-Lung-Risk model was tested in three independent test datasets not used during any part of model development. All results are reported for the testing datasets only, based on the actual observed lung disease mortality. The first testing dataset comprised a random sample of 10,155 asymptomatic individuals from PLCO aged 55–74 (20% of individuals; median follow-up = 17.0 [IQR 14.8–19.0] years) not used for model development23. Annual questionnaires, communication with next of kin and the National Death Index were used to determine mortality. Cause of death was defined based on International Classification of Disease-9 (ICD) codes (Supplementary Table 3) (clinical trial registration number: NCT00047385). For model testing, only baseline radiographs were used.

The second testing dataset included 5,414 participants from the chest radiograph arm of the National Lung Screening Trial (NLST)25. NLST was a randomized controlled trial that enrolled heavy smokers (≥30 pack years) aged 55–74 years for lung cancer screening via chest CT vs. chest radiograph at 21 US sites from 2002 through 2004. Each participant had a baseline scan and up to 2 annual follow-up scans if no lung cancer was detected. Median follow-up time was 11.9 [IQR 7.3–12.3] years. Mortality was assessed via annual questionnaires, communication with next of kin and the National Death Index. Cause of death was determined using ICD-9 codes (Supplementary Table 3) (clinical trial registration number: NCT01696968). For this study, only the baseline chest radiograph was used.

The third testing dataset included 407 patients from the Boston Lung Cancer Study (BLCS), which is an ongoing multicenter observational epidemiologic cohort registry of patients with histologically confirmed lung cancer. For this study, only patients with early-stage (I-III) disease and a diagnosis between 2004-2016 were included. Median follow-up was 3.4 [IQR 1.5-7.2] years. Death was verified by dedicated study personnel via manual chart review. In contrast to PLCO and NLST, cause of death was only collected for lung cancer but not for other lung diseases. A consort diagram for all three study cohorts is provided in Supplementary Fig. 12.

Clinical covariates, radiographs and traditional radiographic findings: Baseline demographics and prevalent risk factors such as diabetes, hypertension or smoking status were self-reported by trial participants in PLCO23 and NLST25. For BLCS patients, all clinical covariates were extracted from the electronic medical record by dedicated study staff. Unlike PLCO and NLST, pack-years were not available in BLCS for all individuals and were not included in the analysis. Traditional radiographic findings including lung nodules, atelectasis, pleural and lung fibrosis, COPD/emphysema, opacities, cardiac abnormalities, lymphadenopathy and bone/chest wall lesions were only available for PLCO and NLST and were reported by centrally qualified radiologists for all participants.

Outcomes: The primary endpoint of this study was a composite of lung disease mortality, including lung cancer, interstitial pulmonary disease, emphysema, and COPD; the secondary endpoint was lung cancer-specific mortality. All cause-specific deaths were based on ICD-9 codes provided in the PLCO23 and NLST25 trials (Supplementary Table 3). For BLCS, endpoints were verified by manual chart review and available for lung cancer-specific mortality only.

Statistical analysis: Continuous variables are presented as mean ± standard deviation (SD) or median and interquartile range (IQR). Categorical variables are reported as frequencies and percentages. Baseline demographics were compared using Student’s t-test or the Kruskal-Wallis test for continuous variables, as appropriate. For categorical variables, the Chi-square test was conducted.

To investigate time to lung disease mortality and lung cancer-specific mortality, Kaplan-Meier survival estimates and log-rank tests were calculated. The association between CXR-Lung-Risk and time to lung disease mortality as well as lung cancer-specific mortality was assessed via univariable and multivariable Cox proportional hazards regression analysis. Multivariable models in PLCO and NLST were adjusted for the following covariates: age, sex, race, smoking status, pack years, body mass index, prevalent diabetes mellitus, hypertension, history of stroke, myocardial infarction, and cancer. For BLCS patients, the following covariates were available: age, sex, race, obesity, smoking status, cancer stage (I-III), and treatment (surgery only vs. adjuvant treatment). Additionally, in BLCS patients with available lung function testing (n = 348; FEV1 (l) = forced expiratory volume in 1 second, in liters), lung age as proposed by Morris et al.26 was calculated (lung age, women = 3.56 × height − 40 × FEV1 − 77.28; lung age, men = 2.87 × height − 31.25 × FEV1 − 39.375) and compared to the CXR-Lung-Risk proposed in this study. For all cohorts, sex-stratified analyses were performed. All p values are two-sided and considered statistically significant if below 0.05. All statistical analyses were performed in R (version 3.6.1).
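The Lung-Age regression quoted above can be written as a short function for reference. This is a sketch, not the analysis code of this study: the function name and the `sex` labels are hypothetical, and because the text does not state the unit of height, inches are assumed here.

```python
def morris_lung_age(height_in, fev1_l, sex):
    """Lung age in years from height, FEV1 (liters) and sex.

    Assumption: height is given in inches; the unit is not stated in the text.
    """
    if sex == "female":
        return 3.56 * height_in - 40.0 * fev1_l - 77.28
    if sex == "male":
        return 2.87 * height_in - 31.25 * fev1_l - 39.375
    raise ValueError("sex must be 'female' or 'male'")
```

Under these assumptions, a man who is 70 inches tall with an FEV1 of 3.0 l would have an estimated lung age of about 67.8 years, and each additional liter of FEV1 lowers the estimate by 31.25 years in men and 40 years in women.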

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.