Role of sex in lung cancer risk prediction based on single low-dose chest computed tomography

A validated open-source deep-learning algorithm called Sybil can accurately predict long-term lung cancer risk from a single low-dose chest computed tomography (LDCT). However, Sybil was trained on a majority-male cohort. Use of artificial intelligence algorithms trained on imbalanced cohorts may lead to inequitable outcomes in real-world settings. We aimed to study whether Sybil predicts lung cancer risk equally regardless of sex. We analyzed 10,573 LDCTs from 6127 consecutive lung cancer screening participants across a health system between 2015 and 2021. Sybil achieved AUCs of 0.89 (95% CI: 0.85–0.93) for females and 0.89 (95% CI: 0.85–0.94) for males at 1 year, p = 0.92. At 6 years, the AUC was 0.87 (95% CI: 0.83–0.93) for females and 0.79 (95% CI: 0.72–0.86) for males, p = 0.01. In conclusion, Sybil can accurately predict future lung cancer risk in females and males in a real-world setting and performs better in females than in males for predicting 6-year lung cancer risk.

Researchers at Massachusetts General Hospital (MGH) and Massachusetts Institute of Technology (MIT) have developed and validated an open-source deep-learning (DL) algorithm called Sybil that accurately predicts long-term lung cancer risk from a single LDCT without the need for human annotation.Sybil has the potential to inform personalized screening strategies among individuals undergoing LDCT, and decrease the risk of both under-and over-screening 21 .However, Sybil was trained on the NLST cohort, which consists of 60% male subjects.Training AI algorithms on biased samples can lead to lower performance in underrepresented groups and Sybil's performance among males and females has not been compared in a contemporary dataset.Given the known sex differences in lung cancer risk, and the importance of improving risk stratification for females, we aimed to study whether Sybil works equally well among males and females on contemporary LDCT obtained as part of routine clinical care across a large healthcare system.

Participants
With approval from the institutional review board (Mass General Brigham Institutional Review Board), this retrospective study included consecutive subjects who underwent lung cancer screening at Brigham and Women's Hospital (BWH; Boston, USA) or MGH (Boston, USA) between January 1, 2015, and June 30, 2021.The cohort from MGH matches one of the external validation sets used for Sybil 21 .The research was performed in accordance with the Declaration of Helsinki and the Health Insurance Portability and Accountability Act (HIPAA).Informed consent was obtained from all subjects and/or from their legal guardian(s).We chose these cohorts to evaluate Sybil's performance on contemporary scans obtained as part of routine clinical care.Based on the USPSTF eligibility criteria applicable at the time, patients aged 55 to 80 years with a 30 pack-year history of smoking who were either current smokers or quit smoking within the past 15 years underwent LDCT for lung cancer screening 22 .Participants without clinical follow-up to establish the presence or absence of lung cancer were excluded.We excluded studies obtained after cancer diagnosis and with a slice thickness greater than 2.5 mm to align with Sybil's training set 21 .Scans with a Lung-RADS score of 0 which indicates that part or all of the lung could not be evaluated were also excluded.

Data collection
Demographics including sex and smoking history were abstracted from the electronic medical record.LDCT scans were obtained according to standard of care according to the American College of Radiology practice guideline on lung cancer screening 5 .Image acquisition parameters were extracted from the Digital Imaging and Communications in Medicine (DICOM) header of included LDCT scans.

Lung cancer diagnosis
Participants diagnosed with lung cancer according to the institutional cancer registry within 6 years after the baseline LDCT were considered as having a confirmed lung cancer diagnosis.Those without a lung cancer diagnosis per the cancer registry and one or more negative follow-up screening LDCT scans were considered as not having lung cancer.Screening LDCT was considered negative if the Lung-RADS score was either 1 or 2 23 .

Image analysis
Sybil is a validated DL algorithm that predicts the future risk of developing lung cancer based on a single LDCT scan 21 .The model aggregates visual information across all three dimensions of the LDCT volume using a 3D convolutional neural network architecture, as previously described 21 .
For each LDCT scan, the algorithm selects the thinnest axial series.Sybil does not require additional clinical data or annotations by a radiologist.Sybil's output consists of six numbers representing the cumulative lung cancer risk per year over a period of 6 years following each LDCT.

Statistical analysis
Categorical variables are expressed as frequencies (percentages), while continuous values are expressed as mean ± standard deviation (SD) or median and interquartile range (IQR).Normality of continuous parameters was tested with the Shapiro-Wilk test.To assess Sybil's performance in the prediction of future lung cancer risk in our cohort stratified by sex, we plotted receiver operating characteristics (ROC) curves for each year up to 6 years following the LDCT.We plotted ROC curves for females and males and compared the area under the curve (AUC) values using DeLong's test.We computed bootstrapped confidence intervals (CIs) using 5000 resamples after clustering LDCTs by participant.Uno's concordance (C)-index was computed to express how likely the scan closer to a lung cancer diagnosis had a higher predicted risk in a randomly selected pair of LDCTs.A two-sided p-value of < 0.05 was considered statistically significant for all tests.All analyses were performed using statistical software R (version 4.2.2) and its packages, namely 'pROC' (version 1.18.0) and 'ggplot2' (version 3.4.2).

Baseline characteristics
We obtained 16,375 LDCTs from 8459 consecutive adult subjects (3066 LDCTs from 2107 subjects at BWH and 13,309 LDCTs from 6,352 subjects at MGH).After exclusion, 10,573 LDCTs from 6127 subjects (47.3% female, mean age 64.9 ± 6.2) were analyzed (Fig. 1).27.6% of the initial study population were excluded due to lack of follow-up.47.0% of the excluded patients were female and 53.0% were male.Excluded patients were 90.4% white,
When comparing Sybil's risk prediction between the two centers, the performance was significantly better at BWH at 1 year (0.97 [95% CI: 0.94-0.99])than at MGH (0.86 [95% CI: 0.82-0.90],p < 0.001).From the second year of follow-up onward, this difference in performance was not statistically significant (Table 3).Additional details regarding Sybil's ability to predict lung cancer by center are reported in Supplementary Table 1.

Discussion
The validated DL algorithm called Sybil accurately predicted future lung cancer risk in both females and males with AUCs of 0.89 (95% CI: 0.85-0.93)for females and 0.89 (95% CI: 0.85-0.94)for males at 1 year.For longterm lung cancer risk prediction at 6 years, Sybil performed better in females than in males with AUCs of 0.88 (95% CI: 0.83-0.93)for females and 0.79 (95% CI: 0.72-0.86)for males.This study provides a validation of the prediction of sex-stratified lung cancer incidence up to 6 years in a large cohort of participants who recently underwent lung cancer screening as part of standard of care.
The role of artificial intelligence (AI) continues to expand in medicine.However, the utility of AI-generated predictions is largely dependent on the quality and diversity of the training datasets 24 .Females have been underrepresented in clinical trials, including in the NLST 25 .The 2020 report on ' Artificial Intelligence and Gender Equality' by UNESCO (United Nations Educational, Scientific and Cultural Organization) calls for increased efforts to address the potential for sex inequities in AI 26 .Given the possibility for biases to be introduced during the development of DL models, careful evaluation is warranted using contemporary data collected from the clinical setting in which deployment is considered 27 .If uncovered, biases can be addressed by re-training and adapting the algorithm, thereby preventing future digital systems to the perpetuate disparities of the past 28 .
The current study adds to prior work in several ways.Sybil was developed on a 60% male population using LDCT scans obtained almost two decades ago as part of a clinical trial while our study population consisted of more than 10,500 contemporary LDCT scans from participants who underwent lung cancer screening LDCT as standard of care since 2015.The percentage of males in the current study is a more balanced 53% and reflects current clinical practice in our health system.In the current study, Sybil performed equally well at predicting lung cancer risk in both sexes in years 1-5 and demonstrated significantly better performance in risk prediction among females at 6 years.The difference in performance at 6 years is hypothesis generating and might reflect Sybil's capability to detect early signs of tumorigenesis.A prior study showed that lung cancer prevalence was nearly twice as high in females compared to males of similar age and smoking history 29 .Consistent with this finding, the overall incidence of lung cancer in our study population was higher in females than in males (3.7% in females vs. 2.8% in males).These results suggest that lung cancer may progress more slowly in females and might contribute to Sybil's better performance in females at 6 years.
Lung cancer remains the leading cause of cancer mortality in the United States in both sexes.However, females have different risk profiles as compared to males and over the last decades, the incidence of lung cancer has decreased far more slowly in females compared to males 16 .Moreover, lung cancer in people who never smoked is increasing world-wide, especially among women 13 .The 2021 USPSTF lung cancer screening eligibility criteria lowered the age at which one can access screening from 55 to 50 years and reduced the requisite smoking packyears from 30 to 20 5 .These changes should ideally have reduced sex disparities in eligibility by increasing the number of females eligible for lung cancer screening.However, data suggest that while more women can now access lung screening LDCT, sex disparities in eligibility still persist 14 .Given the potential inaccuracy of widely used clinical risk scores when applied to females, identifying more accurate risk stratification tools is particularly important in females, as it is in racially and ethnically minoritized groups.DL-based prediction systems for lung cancer screening have shown promise in assisting radiologists with early detection and risk assessment 2,[30][31][32] .We anticipate that DL-based algorithms will eventually be integrated into clinical practice, which raises medicolegal concerns related to liability, accuracy, patient consent, and data privacy.Addressing potential biases in DL-based algorithms rigorously, establishing clear user guidelines, and continuously evaluating these systems are essential steps to mitigate these challenges and promote the integration of DL-based prediction systems into patient management pathways.
It is important to acknowledge that our cohort predominantly comprised participants identified as White, constituting approximately 80% of the participants both in males and females.This raises concerns about the generalizability of our results to more diverse populations, as both race/ethnicity and socioeconomic status disparities are associated with lung cancer risk and might affect Sybil's performance 33,34 .Studies are required to assess Sybil's performance in more diverse populations.A considerable proportion of patients were excluded from the analysis due to the lack of clinical follow-up.While no imbalance was detected between the included and excluded groups in terms of sex and race/ethnicity, there may have unmeasured differences in these groups.
There are several limitations to this study.First, this was a retrospective study relying on routinely collected clinical data which may be subject to misclassification.Second, we did not evaluate the impact of other demographic features or the possible mediating and compounding impacts of socioeconomic status or gender, which may affect lung cancer risk.Third, this experiment was conducted using data from a single health system with limited racial and ethnic diversity.This study was limited to participants who qualified for lung cancer screening under the 2013 USPSTF criteria since CT scans were obtained between 2015 and 2021, prior to the release of the 2021 USPSTF criteria.Another limitation of our study is the limited follow-up time, which mirrors the current state of lung cancer screening.
In summary, our results did not reveal sex differences in the performance of an open access DL algorithm that predicts lung cancer risk based on a single LDCT, suggesting that Sybil can accurately predict future lung cancer risk in females and males and there is no need for retraining.For predicting long-term lung cancer risk at 6 years, Sybil performs better in females than in males.Pending validation in a more racially diverse populations and confirmation of benefit in prospective trials, both of which are underway, Sybil may eventually be deployed in clinical settings.For example, Sybil could potentially be used to risk-stratify individuals following their annual screening LDCT, thereby enabling the extension of screening interval for low-risk individuals.

Table 1 .
Characteristics of the 6127 participants of the Lung Cancer Screening Program in a health system from 2015 to 2021, per low-dose chest computed tomography (LDCT).

Table 2 .
Sybil's future lung cancer predictions per year in 2901 females with 5067 LDCT scans and 3226 males with 5506 LDCT scans between 2015 and 2021.AUC area under the curve, C-Index concordance index.Receiver operating characteristics curves displaying the ability of the Sybil algorithm to predict future lung cancer risk over 6 years following a single low-dose computed tomography scan in 2,901 females and 3,226 males who underwent lung cancer screening between 2015 and 2021.CIs for each curve can be found in Table2.AUC area under the curve.

Table 3 .
Sybil 's risk prediction for future lung cancer risk in 2901 females with 5067 LDCT scans and 3226 males with 5506 LDCT scans, by hospital.AUC area under the curve, BWH Brigham and Women's Hospital, C-Index concordance index, MGH Massachusetts General Hospital.