Introduction

Immunotherapy with immune checkpoint inhibitors (ICI) has substantially improved clinical outcomes in patients with advanced cancers such as melanoma, non-small cell lung cancer (NSCLC), bladder, renal, breast, and other cancers1,2,3,4,5,6,7. ICIs block the ability of malignant cells to escape detection through immune checkpoints such as programmed cell death protein 1/programmed cell death ligand 1 (PD-1/PD-L1) or cytotoxic T-lymphocyte associated protein 4 (CTLA-4). Blockade of these checkpoints restores host immunosurveillance in some tumors by stimulating cytotoxic T-cells to induce cancer cell apoptosis2,8,9,10,11.

Although ICIs are a paradigm-shifting breakthrough in cancer treatment, enhanced activation of the immune system can lead to immune-related adverse events (irAEs) that can result in permanent discontinuation of ICIs, severe morbidity, and even patient death12,13,14. The most severe irAEs include hypophysitis, diabetes, colitis, hepatitis, and pneumonitis, with other common irAEs including rash and thyroiditis12,15,16,17. The incidence of immune checkpoint inhibitor-mediated colitis (IMC) ranges from 1% to 25% and varies by ICI therapy18,19. The incidence of IMC is higher in patients treated with combined anti-PD-1/PD-L1 and anti-CTLA4 therapy20,21. Approximately 15–20% of patients receiving combination therapy develop severe IMC, which is the leading cause of hospitalization and treatment cessation13,14,18,20,21. Endoscopic and histological findings suggest that the presentation of IMC mimics autoimmune colitis such as ulcerative colitis (UC), a form of inflammatory bowel disease (IBD)22,23. Despite the phenotypic similarities between IBD and IMC, it is unclear whether the underlying mechanism is shared or distinct.

In this study, we sought to characterize the relationship between genetic predisposition to common types of autoimmune colitis, namely UC and Crohn’s disease (CD), and IMC in a cohort of NSCLC patients receiving ICI treatment. We first develop polygenic risk scores (PRS) for UC and CD using individuals not diagnosed with cancer at baseline in UK Biobank (UKB) and validate these PRSs in an independent dataset of cancer-free participants in the Vanderbilt University Medical Center biobank (BioVU)24. Next, we evaluate the association between each of these PRSs and the development of IMC in a cohort of patients with NSCLC receiving ICI therapy and conduct an independent replication in a cohort of patients with diverse cancer types treated with ICI therapy in BioVU24. We also investigate the association between IMC and human leukocyte antigen (HLA) alleles known to affect UC risk. Additionally, we examine the associations of IMC and of the PRSs for UC and CD with progression-free survival (PFS) and overall survival (OS).

Results

Patient characteristics

We analyzed data from 1316 participants in the GeRI cohort, who were recruited at four sites (Supplementary Table 1; see “Methods”). Approximately 50% of the cohort were men, and the mean age at lung cancer diagnosis was 65 years (±10.3). Most participants self-reported as White (69.5%), followed by Asian (6.7%) and Black (5.3%). A small proportion (9%) received combined anti-PD-1/PD-L1 and anti-CTLA-4 inhibitor therapy, and the remainder (91%) received anti-PD-1 or anti-PD-L1 inhibitor monotherapy. In the GeRI cohort, the cumulative incidence was ~4% (55 events) for all-grade IMC and ~2% (32 events) for severe IMC. The rates were similar across all study sites. The analytic strategy of our study is illustrated in Fig. 1.

Fig. 1: Overview of the analytical pipeline.

Development and validation of the polygenic risk scores (PRSs) for ulcerative colitis and Crohn’s disease were conducted in cancer-free individuals using UK Biobank and BioVU. The LDpred2 method was used to tune the parameters for the PRS for ulcerative colitis and Crohn’s disease (PRSUC, PRSCD) in 70% of the UK Biobank, using the summary statistics from the largest genome-wide association study of UC and CD. The PRSs were then tested in the remaining 30% of the UK Biobank and validated in BioVU. In the next step, the association of PRSUC and PRSCD with all-grade and severe immune checkpoint inhibitor-mediated colitis (IMC) was evaluated in a cohort of 1316 non-small cell lung cancer patients who received at least one dose of immune checkpoint inhibitor therapy. Furthermore, replication was conducted using 873 pan-cancer patients treated with immune checkpoint inhibitors obtained from BioVU. Finally, associations of all-grade and severe IMC, and of PRSUC and PRSCD, with progression-free survival (PFS) and overall survival (OS) were assessed. Figure created with BioRender.com.

Development and validation of PRS for UC and CD

We used 70% of the cancer-free UKB dataset to tune parameters for PRS using LDpred225. We then obtained effect estimates for the PRS for CD and UC in the remaining 30% (testing data). In the UKB testing data, the area under the receiver operating characteristic curve (AUROC) for the PRSUC was 0.66 (95% CI = 0.64–0.68), and the AUROC for PRSCD was 0.72 (95% CI = 0.69–0.74) (Supplementary Fig. 1). In the adjusted model, PRSUC was strongly associated with UC with an odds ratio (OR) of 1.84 per standard deviation (SD) (95% CI = 1.76–1.93, p < 1.0 × 10−12). Similarly, PRSCD was positively associated with CD with an OR of 1.83 per SD (95% CI = 1.72–1.95, p < 1.0 × 10−12). We observed a moderate correlation between the two PRSs (Pearson correlation = 0.38). Additionally, the AUROC for PRSUC on CD was 0.58 (95% CI: 0.57−0.60), while PRSCD on UC yielded an AUROC of 0.58 (95% CI: 0.56−0.59). These results suggest the presence of some shared genetic susceptibility between UC and CD. However, the distinct genetic factors influencing each phenotype remain the primary drivers of the individual PRS effects. The two PRSs were validated in another sample of cancer-free individuals from BioVU21. Similar to the UKB results, PRSCD and PRSUC were each strongly associated with their respective phenotype in BioVU. We observed an OR of 2.18 per SD (95% CI = 2.05–2.32, p < 1.0 × 10−12) for PRSCD, and an OR of 1.75 per SD (95% CI = 1.59–1.92, p < 1.0 × 10−12) for PRSUC. The AUROCs for PRSCD and PRSUC were 0.72 (95% CI = 0.70–0.73) and 0.65 (95% CI = 0.62–0.68), respectively (Supplementary Fig. 2).

PRS of autoimmune colitis as a predictor of IMC

The mean PRSUC was significantly higher in patients who developed IMC (Supplementary Fig. 3). We examined the cumulative incidence of IMC (all-grade and severe) in the top 10th percentile (high genetic risk), 10–90th percentile (average genetic risk), and lowest 10th percentile (low genetic risk) of the PRSUC. Individuals in the top 10th percentile of the PRSUC had higher rates of IMC (all-grade: p = 0.01 and severe: p = 0.03) compared to the other two categories (Fig. 2). Using the Cox proportional hazards model and adjusting for genetic ancestry, recruiting site, age, sex, cancer histology, and type of therapy, we observed that the PRSUC was significantly associated with any diagnosis of IMC in the GeRI cohort with a hazard ratio (HR) of 1.34 per SD (95% CI = 1.02–1.76, p = 0.04). For a diagnosis of severe IMC, the HR per SD was 1.62 (95% CI = 1.12–2.35, p = 0.01) (Table 1). We found no significant association between PRSCD and IMC or severe IMC (Table 1).

Fig. 2: Cumulative incidence curves of all-grade and severe immune checkpoint inhibitor-mediated colitis by polygenic risk score of ulcerative colitis in the GeRI cohort.

Cumulative incidence curves of a All-grade immune checkpoint inhibitor-mediated colitis (IMC) and b Severe IMC by categories of polygenic risk score of ulcerative colitis (PRSUC) in the entire GeRI cohort. Cumulative incidence curves are unadjusted, and PRSUC is categorized as ≤10th percentile (low genetic risk), 10–90th percentile (average genetic risk), and >90th percentile (high genetic risk). The p-values included on each plot are the results of a log-rank test for the difference between the curves (two-sided). Underneath each set of curves is the number of study participants at risk beyond that time point for each of the PRS groups. Source data are provided as a Source Data file.

Table 1 Polygenic risk score (PRS) of ulcerative colitis (UC) and Crohn’s disease (CD) as a predictor of time to development of all-grade and severe immune checkpoint inhibitor-mediated colitis (IMC) in the entire GeRI cohort, using Cox proportional hazards models and stratified analysis assessing the association between PRSUC and all-grade/severe IMC by type of therapy and lung cancer histology

Additionally, we conducted stratified analyses by type of therapy and histology of lung cancer to further characterize the association between PRSUC and IMC (all-grade and severe). For all-grade IMC, the results showed little attenuation and nominal significance when stratified by type of therapy (Table 1). However, for severe IMC we observed an HR per SD of 1.51 (95% CI = 1.01–2.27, p = 0.04) in patients receiving anti-PD1/anti-PD-L1 monotherapy versus an HR per SD of 4.31 (95% CI = 1.08–17.24, p = 0.03) in patients receiving combined therapy. Patients with adenocarcinoma had an HR per SD of 1.43 (95% CI = 1.06–1.93, p = 0.02) for all-grade IMC and an HR per SD of 2.12 (95% CI = 1.37–3.26, p = 6 × 10−04) for severe IMC. We also performed association analyses between IMC and several previously published ulcerative colitis PRSs and noted consistent trends toward association (Supplementary Table 2).

Replication of the association between PRSUC and IMC

Replication was conducted within an independent study of 873 patients from a pan-cancer cohort in BioVU24 who underwent treatment with either anti-PD1/PD-L1 monotherapy or combination ICI therapy. The characteristics of the replication cohort are shown in Supplementary Table 2. Briefly, the replication study consisted of 63% males and 37% females. Among 873 ICI-treated patients, approximately 95% of the patients received anti-PD1/PD-L1 monotherapy and 5% of patients received combined anti-PD-1/PD-L1 and anti-CTLA4 therapy. An additional 274 cancer patients were identified and were treated with anti-CTLA4 monotherapy.

The results from the analysis in the replication study are presented in Table 2. In our analysis of 873 patients, we found a trend toward association between PRSUC and all-grade IMC (OR per SD = 1.29, 95% CI = 0.98–1.69, p = 0.07). However, for PRSUC and severe IMC, we observed statistically significant replication with an OR per SD of 1.39 (95% CI = 1.02–1.90, p = 0.04, Table 2). Within our stratified analysis by type of therapy, for anti-PD1/PD-L1 monotherapy, we observed an OR per SD of 1.25 (95% CI = 0.88−1.78, p = 0.21) for all-grade IMC, while a slightly stronger association that did not reach statistical significance was seen for severe IMC (OR per SD = 1.47, 95% CI = 0.96−2.25, p = 0.08). For those receiving dual therapy, we observed an OR per SD of 2.04 (95% CI = 0.79−5.28, p = 0.14) for all-grade IMC and an OR per SD of 1.89 (95% CI = 0.74−4.86, p = 0.19) for severe IMC. Furthermore, we fitted an adjusted logistic regression model within the anti-CTLA4 monotherapy group (N = 274) and found an OR per SD of 0.92 (95% CI = 0.67–1.26, p = 0.59) for all-grade IMC. For severe IMC in the anti-CTLA4 monotherapy group, we observed an OR per SD of 1.00 (95% CI = 0.71–1.40, p = 0.99).

Table 2 Polygenic risk score (PRS) of ulcerative colitis (UC) as a predictor of all-grade and severe immune checkpoint inhibitor-mediated colitis (IMC) in the replication cohort (BioVU) and meta-analysis (GeRI and BioVU), using logistic regression model and stratified analysis assessing the association between PRSUC and all-grade/severe IMC by type of therapy

Meta-analysis of PRSUC and IMC associations in discovery and replication studies

Next, we performed a meta-analysis using fixed-effect inverse-variance weighting, combining the logistic regression models from the initial GeRI cohort and BioVU replication cohort (Table 2). Our findings show a significantly positive association between PRSUC and all-grade IMC with an ORmeta per SD of 1.35 (95% CI = 1.12–1.64, p = 2 × 10−03). Similarly, a robust association of PRSUC and severe IMC was observed with an ORmeta per SD of 1.49 (95% CI = 1.18–1.88, p = 9 × 10−04).

For patients who received anti-PD1/PD-L1 monotherapy, PRSUC demonstrated a significant association with all-grade IMC, showing an ORmeta per SD of 1.35 (95% CI = 1.07−1.69, p = 0.01). Similarly, a stronger association was observed with severe IMC, with an ORmeta per SD of 1.48 (95% CI = 1.10−1.98, p = 9 × 10−03). Among patients treated with combination or dual therapy, a trend towards association with all-grade IMC was seen (ORmeta per SD = 1.80, 95% CI = 0.95−3.41, p = 0.07); however, a robust and pronounced association was found in relation to severe IMC (ORmeta per SD = 2.20, 95% CI = 1.07−4.53, p = 0.03).

Role of known UC-HLA associations on IMC in GeRI cohort

We assessed the association between all-grade IMC and HLA markers known to be associated with ulcerative colitis26,27 (Supplementary Fig. 4). Out of 12 known UC-associated HLA markers, we observed an OR of 2.63 (95% CI = 1.08–6.40, p = 0.03) for HLA-DRB1*12:01 and all-grade IMC. However, at a false-discovery rate (FDR) < 0.05 none of the known HLA markers were associated with all-grade IMC in the GeRI cohort.
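The multiple-testing step can be illustrated with a minimal sketch. The text reports an FDR threshold but does not name the procedure, so the Benjamini-Hochberg adjustment below is an assumption, and every p-value other than the reported 0.03 is a hypothetical placeholder rather than a study result.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    Assumption: the FDR procedure is not named in the text; BH is used
    here only as a common default.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Twelve HLA markers: 0.03 is the reported p-value for HLA-DRB1*12:01;
# the other eleven values are hypothetical placeholders.
hla_p = [0.03, 0.20, 0.35, 0.41, 0.48, 0.52, 0.60, 0.66, 0.71, 0.80, 0.88, 0.95]
hla_q = benjamini_hochberg(hla_p)
significant = [q < 0.05 for q in hla_q]
```

Under these placeholder inputs, a nominal p-value of 0.03 among 12 tests does not pass an FDR threshold of 0.05, which matches the qualitative pattern described above.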

IMC and PRS of autoimmune colitis as a predictor of PFS and OS

To assess the association of IMC with clinical outcomes, we fitted Cox proportional hazards models with a 90-day treatment landmark in the GeRI cohort (Table 3 and Fig. 3). Both all-grade IMC (HR = 0.40, 95% CI = 0.24–0.66, p = 3.0 × 10−04) and severe IMC (HR = 0.23, 95% CI = 0.09–0.55, p = 9.0 × 10−04) were associated with longer OS. However, we observed no significant association between PFS and IMC (Table 3 and Supplementary Fig. 5).
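The landmark idea can be sketched as a data-preparation step. This is a generic illustration and not the study's code: the record fields (`follow_up_days`, `died`, `imc_day`) are hypothetical. Classifying exposure by IMC status at the landmark is the conventional landmark design, and the text does not confirm that the authors handled it this way.

```python
LANDMARK_DAYS = 90  # landmark stated in the text


def landmark_dataset(patients, landmark=LANDMARK_DAYS):
    """Build the analysis set for a landmark analysis.

    Assumed conventions (not confirmed by the text):
      * patients who died or were censored on or before the landmark are dropped;
      * IMC status is fixed at the landmark, so IMC arising later counts as
        unexposed;
      * follow-up time is re-measured from the landmark.
    """
    rows = []
    for p in patients:
        if p["follow_up_days"] <= landmark:
            continue  # not at risk at the landmark
        imc_by_landmark = p["imc_day"] is not None and p["imc_day"] <= landmark
        rows.append({
            "id": p["id"],
            "imc_by_landmark": imc_by_landmark,
            "time_from_landmark": p["follow_up_days"] - landmark,
            "died": p["died"],
        })
    return rows


# Hypothetical patients, for illustration only.
patients = [
    {"id": "A", "follow_up_days": 60, "died": True, "imc_day": 30},
    {"id": "B", "follow_up_days": 400, "died": False, "imc_day": 45},
    {"id": "C", "follow_up_days": 200, "died": True, "imc_day": 150},
    {"id": "D", "follow_up_days": 500, "died": True, "imc_day": None},
]
analysis_rows = landmark_dataset(patients)
```

The resulting rows would then be passed to a survival model. Dropping patient A, who died before day 90, is the step intended to reduce the bias from patients who die before they have a chance to develop IMC.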

Table 3 All-grade and severe immune checkpoint inhibitor-mediated colitis (IMC) as predictors of progression-free survival (PFS) and overall survival (OS) in the entire GeRI cohort, using Cox proportional hazards models with 90-day landmark

Fig. 3: Immune checkpoint inhibitor-mediated colitis (IMC) as a predictor of overall survival (OS) in the entire GeRI cohort.

a All-grade IMC and b Severe IMC. Kaplan–Meier survival curves are unadjusted with 90-day landmark and compare those who had an IMC (all-grade or severe) with those who did not have an IMC (No IMC). The p-values in the graph represent the log-rank p-values (two-sided), and the dotted line represents the median survival time. Underneath each set of curves is the number of study participants at risk beyond that time point for the IMC and No IMC groups. Source data are provided as a Source Data file.

Despite the association between PRSUC and IMC, PRSUC was not associated with PFS (HR per SD = 1.00, 95% CI = 0.94–1.07, p = 0.99) and OS (HR per SD = 1.01, 95% CI = 0.93–1.09, p = 0.91) in the GeRI cohort (Table 4). Similarly, we observed no association between PRSCD and PFS (HR per SD = 0.98, 95% CI = 0.91–1.05, p = 0.50) and OS (HR per SD = 1.02, 95% CI = 0.93–1.11, p = 0.68), respectively (Table 4).

Table 4 Polygenic risk scores of ulcerative colitis (PRSUC) and Crohn’s disease (PRSCD) as predictors of progression-free survival (PFS) and overall survival (OS) in the GeRI cohort, using Cox proportional hazards models

Discussion

Immune checkpoint inhibitors are part of standard regimens to treat many advanced cancers and are used in adjuvant and neoadjuvant settings for early-stage disease in multiple cancers3,4,10,28,29,30,31,32. Immune-related adverse events are common complications of ICI therapy, and there are few predictors of irAEs33,34. We sought to identify genetic predictors of immune checkpoint inhibitor-mediated colitis, which frequently results in hospitalization and ICI discontinuation and can occasionally lead to death18,19,35. Specifically, we evaluated the relationship between genetic predisposition for autoimmune colitis (UC, CD) and IMC, and found that the PRSUC can predict IMC. The association was stronger when analyses were restricted to individuals with severe IMC, which matters clinically because severe cases are the most important to identify. Furthermore, we investigated the role of HLA markers associated with UC on the development of IMC. However, we did not have HLA typing for these individuals, and therefore, the imputation of HLA was not validated. In addition, our study was not well-powered to detect the effects of many different HLA alleles after correction for multiple hypothesis testing. Future studies will need to analyze HLA effects on IMC.

Our findings contribute to our understanding of the biological underpinnings of IMC and may also impact the management of patients treated with ICIs. First, we demonstrate that IMC has some genetic overlap with UC, but we found no evidence for overlap with CD. This is notable given the correlation observed between our PRSs for UC and CD, and it indicates that the genetic factors associated with IMC align more closely with the genetic markers that are distinct to UC. Our finding is also consistent with clinical reports in which the most frequent phenotype of IMC resembles UC most closely22,23,36. Our results also suggest that as the genetic risk prediction of UC improves, the genetic risk prediction of IMC may also improve. In particular, rare variants in certain genes substantially increase the risk of UC, and we hypothesize that they may also affect IMC risk37,38,39. Prior reports on ICI-induced hypothyroidism40,41 and rash42 demonstrated that PRSs for autoimmune disorders predict irAEs, suggesting that ICI may unmask autoimmune syndromes in some genetically predisposed individuals.

We also found that individuals who developed IMC had improved survival outcomes compared with those who did not develop IMC, including in a landmark sensitivity analysis, which is concordant with previously published literature43,44,45,46,47,48. However, PRSUC and PRSCD were not associated with PFS or OS, suggesting that the genetic basis of autoimmune disease susceptibility is distinct from the genetic factors influencing survival outcomes. It has been postulated that both anti-tumor responses to ICIs and the development of irAEs reflect a robust immune response; however, one possible explanation for our finding is that the autoimmune PRSs mainly capture the cross-presentation of shared antigens, which may not be associated with clinical outcomes. This suggests that other genetic and environmental factors could be driving the association between IMC and overall survival.

Our study has several implications that may impact the care of cancer patients treated with ICIs. For example, our results suggest that germline genotyping could help select patients at high risk of IMC for clinical trials of preventative measures, such as concurrent anti-TNFα therapies or anti-integrin α4β7 antibodies49,50 started along with ICI treatment, with the aim of reducing IMC and toxicity-related early treatment cessation. Additionally, these findings may help facilitate clinical decision-making. Combination immunotherapies are more effective but are also associated with a substantially increased risk of irAEs45,51,52,53,54,55. Our stratified analysis by type of therapy demonstrated the association between PRSUC and severe IMC in individuals receiving anti-PD-1/PD-L1 and anti-CTLA4 combination therapy. Among patients who may be candidates for combination immunotherapies but have a high genetic risk based on PRSUC, oncologists may consider monotherapy, particularly in clinical situations in which the benefits of dual therapy on disease control may be modest. Conversely, patients who are at relatively low risk based on PRSUC may be better candidates for combination therapy. In addition, the use of PRSUC might also be considered to assist with treatment decisions in clinical settings where ICI therapy is approved but there is substantial clinical equipoise; for example, in the adjuvant setting for patients with resectable NSCLC56,57 and low PD-L1 expression, or in the adjuvant setting for resected stage II melanoma58. Our analysis within the anti-CTLA4 monotherapy subgroup did not reveal any significant association between PRSUC and IMC. These results should be interpreted cautiously since the sample size was limited in this subgroup. However, anti-CTLA4 monotherapy has become less common in contemporary clinical practice, with its predominant use being in combination with anti-PD1/PD-L1 therapy, and our PRSUC did predict IMC in these patients. Our initial findings were observed in a cohort of NSCLC patients. However, our replication study included patients across a broader array of cancer types and supported the generalizability of PRSUC for predicting IMC.

Although our study has important clinical implications and strengths, it also has some limitations. While PRS effectively captures established variants associated with UC, it may not account for unidentified genetic contributors (missing heritability). Nevertheless, as we unveil the missing heritability of UC, we expect to further improve the polygenic prediction of IMC. Furthermore, we developed these PRSs in a predominantly European ancestry cohort (UK Biobank) and the GeRI cohort and BioVU replication study were also predominantly of European ancestry; more work is needed to generalize these results to other ancestries. In addition, there may be other limitations to implementing PRS in the clinic including cost, rapidity of return of results, and reliability and consistency across different algorithms59,60,61,62. Although we included one replication cohort, additional studies of more ICI-treated patients will help strengthen our findings and, in particular, may give better power to evaluate HLA associations and other individual loci that may improve our understanding of the genetic similarities and potential differences between IMC and UC. For most complex traits, including autoimmune disorders and, likely for irAEs, environmental factors also play an important role. Our study does not address how environmental factors affect the risk of IMC. For example, the gut microbiome may modify susceptibility to and severity of IBD63 and, therefore, may also contribute to the susceptibility of developing IMC in cancer patients who have undergone ICI treatment. To determine the joint associations between PRSUC and environmental risk factors, further studies are necessary.

We also found an association between IMC and OS. This result could be due to survivor bias64,65, where patients who respond to therapy and are on therapy longer are at an increased risk of developing irAEs. We used a 90-day landmark analysis66 to account for this bias for both PFS and OS, although this may not completely eliminate the survivor bias.

Overall, our findings suggest a shared genetic basis between ulcerative colitis and immune checkpoint inhibitor-mediated colitis among patients undergoing ICI treatment. Prediction of IMC using genetic information should create new opportunities for better risk stratification and ultimately for better management and possibly prevention of this common and important side effect from immunotherapy.

Methods

This research complies with all relevant ethical regulations. Institutional Review Board approvals were obtained at each site individually, and written informed consent was acquired from all study participants prior to inclusion in the study.

Study population

The Genetics of immune-related adverse events and Response to Immunotherapy (GeRI) cohort comprises 1316 advanced Stage IIIB/IV NSCLC patients who received ICI therapy (PD-1 or PD-L1 inhibitors as monotherapy or in combination with CTLA-4 inhibitors and/or chemotherapy) and were recruited from four institutions: Memorial Sloan Kettering Cancer Center (MSKCC), Vanderbilt University Medical Center (VUMC), Princess Margaret Cancer Center (PM), and University of California, San Francisco (UCSF).

A total of 752 individuals were treated with ICIs at MSKCC between 2011 and 2018 and had an available blood sample. Clinical data were extracted from a manual review of medical and pharmacy records for demographics, lung cancer histology, and ICI treatment history, including detailed information on immune-related adverse events (irAEs). The VUMC cohort is composed of 267 patients who received ICI therapy at the medical center between 2009 and 2019. Patients participated in BioVU21, Vanderbilt’s biomedical repository of DNA that is linked to de-identified health records. Treatment dates and irAEs were extracted using manual chart review by a trained thoracic oncology nurse. The PM cohort included 266 advanced NSCLC patients who received ICI therapy between 2011 and 2022; all provided a blood sample and completed a questionnaire. Clinical data were manually extracted by trained abstractors and supplemented by the PM Cancer Registry. From UCSF, 31 patients who had received ICIs were identified by thoracic oncologists between 2019 and 2021 and provided either a blood or saliva sample after informed consent. Clinical data, including demographics, history of lung cancer and ICI treatment, and irAEs, were extracted after a manual review of electronic health records.

Immune checkpoint inhibitor-mediated colitis (IMC)

After the initiation of ICI therapy, immune checkpoint inhibitor-mediated colitis (IMC) was defined based on clinical chart review and documentation of IMC by the primary oncologist, gastroenterologist, and/or other clinicians treating the patient based on clinical features and/or radiologic/histologic evidence suggesting colitis due to ICI. Participants who were diagnosed with infectious causes of colitis, including Clostridium difficile or a pathogen detected on a gastrointestinal pathogen panel or ova and parasite test, were excluded. To assess the severity of IMC, we used two metrics based on the NCI Common Terminology Criteria for Adverse Events Version 5 (NCI-CTCAE) that capture grade 3 IMC or above: (i) hospitalization for management of IMC and/or (ii) permanent cessation of ICI therapy due to the adverse event.

IMC was coded as a dichotomous variable (1: all IMC, 0: no IMC) and time-to-IMC was assessed from the start of the ICI therapy to the date of onset of IMC or the date of ICI discontinuation due to IMC. Patients who did not experience IMC were censored either at the end of treatment due to any reason or last follow-up date if the treatment was ongoing. Based on the severity criteria, severe IMCs were also coded as binary variables (1: severe IMC, 0: no IMC).
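The event coding and censoring rule described above can be sketched as follows. The function and argument names are hypothetical. Using the onset date whenever one is recorded is an assumption: the text also allows the date of ICI discontinuation due to IMC as the event date, and the caller would pass that date as `imc_onset` in such cases.

```python
from datetime import date


def time_to_imc(start, imc_onset=None, treatment_end=None, last_follow_up=None):
    """Return (event, days) for the time-to-IMC outcome.

    event = 1 if IMC occurred, with time measured from ICI start to the
    IMC date; otherwise event = 0 and the patient is censored at the end
    of treatment, or at the last follow-up if treatment is ongoing.
    """
    if imc_onset is not None:
        return 1, (imc_onset - start).days
    censor_date = treatment_end if treatment_end is not None else last_follow_up
    if censor_date is None:
        raise ValueError("a censoring date is required when no IMC occurred")
    return 0, (censor_date - start).days


# Hypothetical dates, for illustration only.
with_imc = time_to_imc(date(2020, 1, 1), imc_onset=date(2020, 3, 1))
stopped = time_to_imc(date(2020, 1, 1), treatment_end=date(2020, 7, 1))
ongoing = time_to_imc(date(2020, 1, 1), last_follow_up=date(2021, 1, 1))
```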

Ascertainment of clinical outcomes

Progression-free survival (PFS) and overall survival (OS) were evaluated from the date of initiation of ICI therapy to the date of progression and death, respectively, at MSK, PM, and UCSF sites. At VUMC, time-to-discontinuation of therapy due to progression from therapy initiation was used as a surrogate. If the treatment was ongoing, patients were censored at the date of the last follow-up. The VUMC cohort is de-identified and not linked to the National Death Index; therefore, all-cause mortality (overall survival) information is unavailable for VUMC participants (n = 267).

Quality control, genotyping, and imputation of the GeRI cohort

DNA from blood or saliva was extracted and genotyped using Affymetrix Axiom Precision Medicine Diversity Array. Samples with a call rate <95% were excluded from the analysis and SNPs with missing rates >5% were also excluded from the analysis. Genetic ancestry was calculated using principal component analysis in PLINK after linkage disequilibrium pruning (R2 < 0.1). Imputation was performed using the Michigan Imputation Server with the 1000 Genomes phase3 v5 reference panel. Standard genotyping and quality control procedures were implemented. Variants with minor allele frequency <0.01 were excluded from the analysis.
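The filtering thresholds above can be illustrated on a small genotype matrix. This is a simplified sketch and not the pipeline used in the study, which relied on PLINK and the Michigan Imputation Server. The data layout (rows are samples, columns are SNPs, values are 0/1/2 effect-allele counts or `None` for a missing call) and the order of the filters are assumptions.

```python
def sample_call_rate(row):
    """Fraction of SNPs with a non-missing genotype for one sample."""
    return sum(g is not None for g in row) / len(row)


def snp_missing_rate(col):
    """Fraction of samples with a missing genotype for one SNP."""
    if not col:
        return 1.0
    return sum(g is None for g in col) / len(col)


def minor_allele_frequency(col):
    """Minor allele frequency among called genotypes coded 0/1/2."""
    called = [g for g in col if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def apply_qc(genotypes, min_call_rate=0.95, max_snp_missing=0.05, min_maf=0.01):
    """Return indices of samples and SNPs that pass the stated thresholds."""
    kept_samples = [i for i, row in enumerate(genotypes)
                    if sample_call_rate(row) >= min_call_rate]
    kept_snps = []
    for j in range(len(genotypes[0])):
        col = [genotypes[i][j] for i in kept_samples]
        if snp_missing_rate(col) > max_snp_missing:
            continue
        if minor_allele_frequency(col) < min_maf:
            continue
        kept_snps.append(j)
    return kept_samples, kept_snps


# Synthetic 40 x 40 matrix, for illustration only.
geno = [[(i + j) % 3 for j in range(40)] for i in range(40)]
for j in (0, 1, 2):
    geno[0][j] = None      # sample 0: call rate 37/40 < 95%
for i in (1, 2, 3):
    geno[i][5] = None      # SNP 5: more than 5% missing
for i in range(40):
    geno[i][7] = 0         # SNP 7: monomorphic, MAF = 0
kept_samples, kept_snps = apply_qc(geno)
```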

Development and validation of polygenic risk score (PRS) for autoimmune colitis

We developed PRSs for CD (1312 CD cases and 16,303 controls) and UC (2814 UC cases and 16,303 controls) separately using UK Biobank (UKB) data, which we divided into two parts: 70% for hyperparameter tuning and the remaining 30% for testing the PRS. We used genetic data from both the UKB Affymetrix Axiom array (89%) and the UK BiLEVE array (11%)67, which had been imputed using the Haplotype Reference Consortium and the UK10K and 1000 Genomes phase3 reference panels67. Analyses were restricted to European ancestry individuals based on self-reported White ethnicity and genetic ancestry PCs within five standard deviations of the population mean. Samples with discordant self-reported and genetic sex were excluded. Additionally, we excluded one sample from each pair of first-degree relatives. Samples with heterozygosity more than five standard deviations from the mean were also excluded from the analysis. Information from both self-report and ICD9/10 codes was used to capture CD (1312 cases) and UC (2814 cases) phenotypes in UKB.

We used the LDpred225 method to develop PRSs for CD and UC. LDpred2 estimates the posterior effect sizes based on summary statistics from genome-wide association studies while taking into account the linkage disequilibrium between variants and assuming a prior on the markers. To derive the PRSs, summary statistics were obtained from the largest previously published genome-wide association study of CD and UC68. We restricted the analysis to HapMap3 variants and used the LDpred2-auto function to estimate the posterior effect sizes for each variant. LDpred2-auto first estimates the proportion of causal variants and the heritability of the trait under evaluation. Next, it determines the posterior effect estimates for the included variants. The final PRS weights are available at the PGS Catalog (see Data Availability). Briefly, PRSUC included 744,575 variants, whereas PRSCD comprised 744,682 variants.

Each PRS was constructed using the formula: PRS = β1 × SNP1 + β2 × SNP2 + … + βn × SNPn, where βi is the posterior effect size of variant i estimated using the LDpred2-auto function and SNPi is the effect-allele count (or imputed dosage) at that variant. Each PRS was standardized to have a mean of zero and a standard deviation of 1. The association of PRSCD and PRSUC with each respective target phenotype was assessed using logistic regression models, adjusted for age at diagnosis for cases and age at enrollment for controls, sex, genotyping array, and the top 10 genetic ancestry principal components (PCs). Area under the receiver operating characteristic (AUROC) curves were calculated in the testing dataset and used to assess the overall prediction accuracy of each PRS in UKB.
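The weighted-sum formula and the standardization step can be sketched in a few lines. The weights and genotypes below are hypothetical toy values, not LDpred2 output, and the population standard deviation is used for scaling as an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def raw_prs(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(beta * g for beta, g in zip(weights, dosages))


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 (population SD)."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("cannot standardize constant scores")
    return [(s - mu) / sd for s in scores]


# Hypothetical per-variant weights and genotypes (three variants, four people).
weights = [0.10, -0.05, 0.20]
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
]
raw_scores = [raw_prs(g, weights) for g in genotypes]
z_scores = standardize(raw_scores)
```

After standardization, a one-unit change in the score corresponds to one standard deviation, which is what the "per SD" odds ratios and hazard ratios in the Results refer to.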

We validated the two PRSs in a sample of cancer-free individuals (1420 CD cases, 459 UC cases, and 20,876 controls) in the VUMC BioVU24. All analyses were restricted to individuals of European ancestry and adjusted for age, sex, and ten principal components. AUROC curves were used to estimate the predictive performance of the PRSs.
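For readers unfamiliar with AUROC, the sketch below computes it from its rank-based definition: the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. It is an unadjusted illustration on hypothetical values and does not reproduce the covariate-adjusted models or the confidence intervals reported in the Results.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of cases (label 1) and controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical standardized PRS values and case status.
example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
example_auc = auroc(example_scores, example_labels)
```

The pairwise loop is quadratic in sample size, which is acceptable for an illustration but would be replaced by a rank-based computation at biobank scale.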

Assessment of autoimmune colitis PRS to predict IMC in GeRI cohort

Using the weights generated from LDpred2 for CD and UC, we separately calculated two weighted PRSs (PRSCD, PRSUC) for the GeRI participants. The cumulative incidence of IMC (all-grade and severe) was assessed by categories of PRS percentiles. Individuals in the top 10% of the PRS distribution (PRS > 90th percentile) were classified as having high genetic risk, those in the bottom 10% (PRS ≤ 10th percentile) were classified as having low genetic risk, and those in the middle category (>10th to ≤90th percentile) were classified as having average genetic risk. Additionally, to evaluate the association of each PRS with time-to-IMC or time-to-severe IMC, we used Cox proportional hazards models, adjusted for age at diagnosis, sex, lung cancer histology, type of therapy, recruiting site, and the first 5 genetic ancestry PCs. To further understand the differential effects of type of therapy and histology on the association between PRSUC and IMC, we conducted stratified analyses by type of ICI therapy and histology of lung cancer.
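The percentile-based risk categories can be expressed directly in code. The sketch below uses the cut-offs stated above (≤10th, >10th to ≤90th, >90th percentile). The percentile interpolation rule (Python's "inclusive" method) is an assumption, since the text does not specify one, and the scores are synthetic.

```python
from statistics import quantiles


def prs_risk_groups(scores):
    """Label each score as 'low', 'average', or 'high' genetic risk."""
    # Nine decile cut points; the first and last are the 10th and 90th percentiles.
    deciles = quantiles(scores, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    labels = []
    for s in scores:
        if s <= p10:
            labels.append("low")
        elif s > p90:
            labels.append("high")
        else:
            labels.append("average")
    return labels


# Synthetic scores 0..100, for illustration only.
example_prs = [float(i) for i in range(101)]
example_groups = prs_risk_groups(example_prs)
```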

Replication of PRSUC and IMC in an independent study

We performed an independent replication to further characterize the association between PRSUC and IMC. Our replication study comprised 873 patients enrolled in BioVU24, across all cancer types, who were treated with either anti-PD-1/PD-L1 monotherapy or a combination of anti-PD-1/PD-L1 and anti-CTLA4 therapy. There was no overlap between the BioVU individuals included in the GeRI cohort (discovery) and those in the BioVU replication dataset. Immune checkpoint inhibitor-mediated colitis was ascertained by manual review of the electronic health records. An IMC case was defined as either biopsy-confirmed colitis or diarrhea in an ICI-treated patient that was not attributable to any other cause, required treatment with steroids, and subsequently improved with steroid therapy. All samples were genotyped using the Illumina Expanded Multi-Ethnic Genotyping Array (MEGA-EX) and imputed to the 1000 Genomes reference panel (version 3)24. Standard post-imputation quality control procedures were employed to exclude low-quality variants and samples. In short, samples with a call rate <95% and SNPs with missing rates >2% were excluded from the analysis. Additionally, SNPs with minor allele frequency <1%, Hardy–Weinberg p-value < 1e-06, or INFO < 0.95 were excluded.
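
The variant-level exclusion thresholds listed above can be expressed as a simple filter. The Python sketch below is only illustrative: the dictionary keys, the function name, and the toy variants are hypothetical, and it assumes a variant is excluded when it fails any one criterion. In practice such filters are usually applied with dedicated tools such as Plink2 rather than hand-written code.

```python
def passes_variant_qc(variant, max_missing=0.02, min_maf=0.01,
                      min_hwe_p=1e-6, min_info=0.95):
    """Return True if a variant survives all four exclusion criteria."""
    return (
        variant["missing_rate"] <= max_missing  # exclude missing rate > 2%
        and variant["maf"] >= min_maf           # exclude MAF < 1%
        and variant["hwe_p"] >= min_hwe_p       # exclude HWE p < 1e-06
        and variant["info"] >= min_info         # exclude INFO < 0.95
    )


# Toy variants (illustrative values only, hypothetical identifiers).
toy_variants = [
    {"id": "var_a", "missing_rate": 0.00, "maf": 0.20, "hwe_p": 0.50, "info": 0.99},
    {"id": "var_b", "missing_rate": 0.05, "maf": 0.20, "hwe_p": 0.50, "info": 0.99},
    {"id": "var_c", "missing_rate": 0.00, "maf": 0.005, "hwe_p": 0.50, "info": 0.99},
    {"id": "var_d", "missing_rate": 0.00, "maf": 0.20, "hwe_p": 1e-8, "info": 0.99},
    {"id": "var_e", "missing_rate": 0.00, "maf": 0.20, "hwe_p": 0.50, "info": 0.90},
]

kept_ids = [v["id"] for v in toy_variants if passes_variant_qc(v)]
```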

We performed unconditional logistic regression to assess the association of PRSUC with all-grade IMC and severe IMC, respectively. All models were adjusted for age at diagnosis, sex, type of therapy, and 5 principal components. In addition, we conducted logistic regression stratified by type of therapy, with models adjusted for age at diagnosis, sex, and 5 principal components. The replication dataset included an additional 274 patients who received anti-CTLA4 monotherapy, and we evaluated the association between PRSUC and IMC separately in this group.

Meta-analysis of the association between PRSUC and IMC

For the meta-analysis, we conducted standard logistic regression adjusted for age at diagnosis, sex, type of therapy, site, and 5 PCs in the GeRI study. Next, we carried out an inverse-variance weighted fixed-effect meta-analysis of our discovery and replication studies. Additionally, we conducted a meta-analysis of the results stratified by type of therapy in the GeRI cohort and the replication study from BioVU.
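
For readers unfamiliar with inverse-variance weighted fixed-effect pooling, the Python sketch below shows the standard calculation on the log-odds scale: each study is weighted by the inverse of its squared standard error. The function name is hypothetical and the input estimates are made-up numbers, not results from this study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect pooling of per-study
    log odds ratios (betas) with their standard errors (ses)."""
    if len(betas) != len(ses):
        raise ValueError("betas and ses must have the same length")
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Two hypothetical studies (illustrative values only).
pooled_beta, pooled_se, p_value = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
pooled_or = math.exp(pooled_beta)  # pooled odds ratio per SD of PRS
```

Because the two toy studies have equal standard errors, they receive equal weight and the pooled estimate is the simple average of the two log odds ratios; a more precise study would pull the pooled estimate toward its own value.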

Role of HLA markers associated with UC and CD on IMC in GeRI cohort

To elucidate the role of known UC-associated HLA markers in IMC, we performed HLA imputation using CookHLA69 and HATK70. HLA alleles were imputed at 2-field resolution against the Type 1 Diabetes Genetics Consortium reference panel71, using the nomenclature from the IPD-IMGT/HLA database v3.51. Association analysis with all-grade IMC was conducted using logistic regression models adjusted for age at diagnosis, sex, lung cancer histology, type of therapy, recruiting site, and 5 PCs. Analyses were restricted to common HLA alleles (frequency ≥0.01) known to be associated with ulcerative colitis26.

Impact of IMC and PRS of autoimmune colitis on PFS and OS in GeRI cohort

The association of IMC (all-grade and severe) with PFS and OS was examined using Cox proportional hazards models restricted to patients whose PFS and OS were longer than 90 days (90-day landmark)66. All models were adjusted for age at diagnosis, sex, lung cancer histology, type of therapy, and 5 PCs. Survival curves and rates were estimated using the Kaplan–Meier method. To investigate the association of PRSCD and PRSUC with PFS and OS, we fitted Cox proportional hazards models adjusted for age at diagnosis, sex, histology, type of therapy, and 5 PCs. All p-values are two-sided, and analyses were conducted using Plink2 and R v4.2.2 (R Foundation for Statistical Computing) with RStudio v2022.12.0.353.
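
The 90-day landmark restriction amounts to subsetting the cohort before model fitting. A minimal Python sketch is given below, assuming hypothetical field names (`pfs_days`, `os_days`) and toy patient records; it shows only the subsetting step and makes no assumption about how follow-up time or IMC status was handled after the landmark.

```python
def landmark_subset(patients, landmark_days=90):
    """Keep only patients whose PFS and OS both exceed the landmark."""
    return [
        p for p in patients
        if p["pfs_days"] > landmark_days and p["os_days"] > landmark_days
    ]


# Toy records (illustrative values only, hypothetical identifiers).
toy_patients = [
    {"id": "P1", "pfs_days": 45, "os_days": 60},    # excluded
    {"id": "P2", "pfs_days": 120, "os_days": 400},  # retained
    {"id": "P3", "pfs_days": 90, "os_days": 300},   # excluded: not > 90
    {"id": "P4", "pfs_days": 200, "os_days": 200},  # retained
]

landmark_ids = [p["id"] for p in landmark_subset(toy_patients)]
```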

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.