Immune-mediated genetic pathways resulting in pulmonary function impairment increase lung cancer susceptibility

Impaired lung function is often caused by cigarette smoking, making it challenging to disentangle its role in lung cancer susceptibility. Investigation of the shared genetic basis of these phenotypes in the UK Biobank and International Lung Cancer Consortium (29,266 cases, 56,450 controls) shows that lung cancer is genetically correlated with reduced forced expiratory volume in one second (FEV1: rg = 0.098, p = 2.3 × 10−8) and the ratio of FEV1 to forced vital capacity (FEV1/FVC: rg = 0.137, p = 2.0 × 10−12). Mendelian randomization analyses demonstrate that reduced FEV1 increases squamous cell carcinoma risk (odds ratio (OR) = 1.51, 95% confidence intervals: 1.21–1.88), while reduced FEV1/FVC increases the risk of adenocarcinoma (OR = 1.17, 1.01–1.35) and lung cancer in never smokers (OR = 1.56, 1.05–2.30). These findings support a causal role of pulmonary impairment in lung cancer etiology. Integrative analyses reveal that pulmonary function instruments, including 73 novel variants, influence lung tissue gene expression and implicate immune-related pathways in mediating the observed effects on lung carcinogenesis.

Lung cancer is the most commonly diagnosed cancer worldwide and the leading cause of cancer mortality 1 . Although tobacco smoking remains the predominant risk factor for lung cancer, clinical observations and epidemiological studies have consistently shown that individuals with airflow limitation, particularly those with chronic obstructive pulmonary disease (COPD), have a significantly higher risk of developing lung cancer [2][3][4][5][6][7] . Several lines of evidence suggest that biological processes resulting in pulmonary impairment warrant consideration as independent lung cancer risk factors, including observations that previous lung diseases influence lung cancer risk independently of tobacco use 6,[8][9][10] , and overlap in genetic susceptibility loci for lung cancer and COPD on 4q24 (FAM13A), 4q31 (HHIP), 5q32 (HTR4), the 6p21 region, and 15q25 (CHRNA3/CHRNA5) [11][12][13][14] . Inflammation and oxidative stress have been proposed as key mechanisms promoting lung carcinogenesis in individuals affected by COPD or other non-neoplastic lung pathologies 9,11,15 .
Despite an accumulation of observational findings, previous epidemiological studies have been unable to conclusively establish a causal link between indicators of impaired pulmonary function and lung cancer risk due to the interrelated nature of these conditions 7 . Lung cancer and obstructive pulmonary disease share multiple etiological factors, such as cigarette smoking, occupational inhalation hazards, and air pollution, and 50-70% of lung cancer patients present with co-existing COPD or airflow obstruction 6 . Furthermore, reverse causality remains a concern since pulmonary symptoms may be early manifestations of lung cancer or acquired lung diseases in patients whose immune system has already been compromised by undiagnosed cancer.
Disentangling the role of pulmonary impairment in lung cancer development is important from an etiological perspective, for refining disease susceptibility mechanisms, and for informing precision prevention and risk stratification strategies. In this study we comprehensively assess the shared genetic basis of impaired lung function and lung cancer risk by conducting genome-wide association analyses in the UK Biobank cohort to identify genetic determinants of three pulmonary phenotypes, forced expiratory volume in 1s (FEV 1 ), forced vital capacity (FVC), and FEV 1 /FVC. We examine the genetic correlation between pulmonary function phenotypes and lung cancer, followed by Mendelian randomization (MR) using novel genetic instruments to formally test the causal relevance of impaired pulmonary function, using the largest available dataset of 29,266 lung cancer cases and 56,450 controls from the OncoArray lung cancer collaboration 16 .

Results
Heritability and genetic correlation. Array-based, or narrow-sense, heritability (h g ) estimates for all lung phenotypes were obtained using LD score regression 17 based on summary statistics from our GWAS of the UKB cohort (n = 372,750 for FEV 1 , n = 370,638 for FVC, n = 368,817 for FEV 1 /FVC; Supplementary Fig. 1) and are presented in Table 1. Heritability estimates based on UKB-specific LD scores (n = 7,567,036 variants) were consistently lower but more precise than those based on the 1000 Genomes (1000G) Phase 3 reference population (n = 1,095,408 variants). For FEV 1 , h g = 0.163 (SE = 0.006) and h g = 0.201 (SE = 0.008), based on UKB and 1000G LD scores, respectively. Estimates for FVC were h g = 0.175 (SE = 0.007) and h g = 0.214 (SE = 0.010). Heritability was lower for FEV 1 /FVC: h g = 0.128 (SE = 0.006) and 0.157 (SE = 0.010), based on internal and 1000G reference panels, respectively. For all phenotypes, h g did not differ by smoking status and estimates were not affected by excluding the major histocompatibility complex (MHC) region.
Partitioning heritability by functional annotation identified large and statistically significant (p < 8.5 × 10 −4 ) enrichments for multiple categories (Fig. 1; Supplementary Tables 1-3). A total of 35 categories, corresponding to 22 distinct annotations, were significantly enriched for all three pulmonary phenotypes, including annotations that were not previously reported 18 . Large enrichment, defined as the proportion of heritability accounted for by a specific category relative to the proportion of SNPs in that category, was observed for elements conserved in primates 19,20 (17.6% of SNPs, 54.7-58.5% of h g ), McVicker background selection statistic 21,22 (17.8% of SNPs, 22.6-25.1% of h g ), flanking bivalent transcription start sites (TSS)/enhancers from Roadmap 20,23 (1.4% of SNPs, 11.1-13.2% of h g ), and super enhancers (16.7% of SNPs, 33.9-38.6% of h g ). We also replicated previously reported significant enrichments for histone methylation and acetylation marks H3K4me1, H3K9Ac, and H3K27Ac 18,24 .
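The enrichment definition above reduces to a simple ratio, which can be illustrated with the percentages quoted in the text. The following is a minimal sketch only: the function name and dictionary layout are illustrative assumptions, and the actual estimates (with standard errors and p-values) come from stratified LD score regression rather than from this arithmetic.

```python
def enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment = share of heritability / share of SNPs in the category."""
    if prop_snps <= 0:
        raise ValueError("prop_snps must be positive")
    return prop_h2 / prop_snps


# (share of SNPs, lowest share of h_g, highest share of h_g) across the three
# pulmonary phenotypes, copied from the percentages reported in the text.
categories = {
    "conserved in primates": (0.176, 0.547, 0.585),
    "McVicker background selection": (0.178, 0.226, 0.251),
    "flanking bivalent TSS/enhancers": (0.014, 0.111, 0.132),
    "super enhancers": (0.167, 0.339, 0.386),
}

for name, (snps, h2_low, h2_high) in categories.items():
    low = enrichment(h2_low, snps)
    high = enrichment(h2_high, snps)
    print(f"{name}: {low:.2f}- to {high:.2f}-fold enrichment")
```

Read this way, the flanking bivalent TSS/enhancer category stands out because a very small share of SNPs carries a comparatively large share of heritability.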
Genome-wide association analysis for instrument development. Based on the results of our GWAS in the UK Biobank, we identified 207 independent instruments for FEV 1 (P < 5 × 10 −8 , replication P < 0.05; LD r 2 < 0.05 within 10,000 kb), 162 for FVC, and 297 for FEV 1 /FVC. We confirmed that our findings were not affected by spirometry performance quality, with a nearly perfect correlation between effect sizes (R 2 = 0.995, p = 2.5 × 10 −196 ) in the main discovery analysis and after excluding individuals with potential blow acceptability issues (Field 3061 ≠ 0; n = 60,299). After applying these variants to the lung cancer OncoArray dataset and selecting LD proxies (r 2 > 0.90) for unavailable variants, the final set of instruments consisted of 193 variants for FEV 1 , 144 for FVC, and 264 for FEV 1 /FVC (Supplementary Data 1-3), for a total of 601 instruments. The proportion of trait variation accounted for by each set of instruments was estimated in the UKB replication sample consisting of over 110,000 individuals (Supplementary Fig. 1), and corresponded to 3.13% for FEV 1 , 2.27% for FVC, and 5.83% for FEV 1 /FVC. We also developed instruments specifically for never smokers based on a separate GWAS of this population, which yielded 76 instruments for FEV 1 , 112 for FEV 1 /FVC, and 57 for FVC, accounting for 2.06%, 4.21%, and 1.36% of phenotype variation, respectively (Supplementary Data 4-6).
After removing overlapping instruments between pulmonary phenotypes and LD-filtering (r 2 < 0.05) across the three traits, 447 of the 601 variants were associated with at least one of FEV 1 , FVC, or FEV 1 /FVC (P < 5 × 10 −8 , replication P < 0.05). We compared these 447 independent variants to the 279 lung function variants recently reported by Shrine et al. 18 based on an analysis of the UK Biobank and SpiroMeta consortium, by performing clumping with respect to these index variants (LD r 2 < 0.05 within 10,000 kb). Our set of instruments included an additional 73 independent variants, 69 outside the MHC region (Supplementary Table 5), that achieved replication at the Bonferroni-corrected threshold for each trait (maximum replication P = 2.0 × 10 −4 ).
For completeness, we also present MR estimates for the effect of impaired pulmonary function on lung cancer risk in smokers (Supplementary Table 11). Despite the larger sample size (23,223 cases and 16,964 controls) compared to never smokers, a genetically predicted 10% reduction in FEV 1 /FVC was weakly and inconsistently associated with lung cancer risk (OR IVW = 1.15, p = 0.038; OR RAPS = 1.08, p = 0.488). Genetic predisposition to FEV 1 and FVC impairment did not appear to confer an increased risk among smokers.
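For readers unfamiliar with the inverse-variance weighted (IVW) estimator reported above, a minimal sketch of its fixed-effect form is given here. It assumes per-variant summary statistics on a common scale (SNP-exposure effects `bx`, SNP-outcome log-odds `by`, and outcome standard errors `se_by`); these argument names and the toy numbers are illustrative assumptions, not values from this study, and the sketch does not reproduce the robust estimators (such as MR-RAPS) used alongside IVW.

```python
import math


def ivw_estimate(bx, by, se_by):
    """Fixed-effect IVW estimate of the causal log-odds and its standard error.

    bx    : SNP-exposure effect estimates
    by    : SNP-outcome effect estimates (log-odds ratios)
    se_by : standard errors of the SNP-outcome estimates
    """
    if not (len(bx) == len(by) == len(se_by)) or len(bx) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    inv_var = [1.0 / s ** 2 for s in se_by]
    numerator = sum(w * x * y for w, x, y in zip(inv_var, bx, by))
    denominator = sum(w * x * x for w, x in zip(inv_var, bx))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an OR with a 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy inputs for illustration only.
beta, se = ivw_estimate([0.10, 0.20], [0.05, 0.10], [0.01, 0.01])
print(odds_ratio_with_ci(beta, se))
```

The estimator is equivalent to a weighted average of the per-variant ratio estimates by/bx, which is why a few heterogeneous variants can move it noticeably.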
Extensive MR diagnostics are summarized in Supplementary  Table 12. All analyses used strong instruments (F-statistic > 40) and did not appear to be weakened by violations of the no measurement error (NOME) assumption (I 2 GX statistic > 0.97). MR Steiger test 32 was used to orient the causal effects and confirmed that instruments for pulmonary function were affecting lung cancer susceptibility, not the reverse, and this direction of effect was highly robust. No instruments were removed based on Steiger filtering. We also confirmed that none of the genetic instruments were associated with nicotine dependence phenotypes (P < 1 × 10 −5 ), such as time to first cigarette, difficulty in quitting smoking, and number of quit attempts, which were available for a subset of individuals in the UKB. All MR analyses were adequately powered, with >80% power to detect a minimum OR of 1.25 for FEV 1 and FEV 1 /FVC ( Supplementary Fig. 2). For never smokers, we had 80% power to detect a minimum OR of 1.40 for FEV 1 /FVC and 1.60 for FEV 1 .
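Two of the diagnostics mentioned above can be sketched from summary statistics alone. The per-variant F-statistic is approximated here as the squared ratio of the SNP-exposure estimate to its standard error, and I 2 GX is computed as (Q − (k − 1))/Q, where Q is Cochran's Q for the SNP-exposure estimates. This is one common formulation and is an assumption on our part; the exact variant used for Supplementary Table 12 is not stated in this section, and the inputs below are toy values.

```python
def f_statistics(bx, se_bx):
    """Approximate per-variant F-statistics as (beta / SE)^2."""
    return [(b / s) ** 2 for b, s in zip(bx, se_bx)]


def i2_gx(bx, se_bx):
    """I^2_GX: share of variation in SNP-exposure estimates not due to sampling error."""
    k = len(bx)
    if k < 2:
        raise ValueError("at least two instruments are required")
    weights = [1.0 / s ** 2 for s in se_bx]
    mean = sum(w * b for w, b in zip(weights, bx)) / sum(weights)
    q = sum(w * (b - mean) ** 2 for w, b in zip(weights, bx))
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q)


# Toy SNP-exposure estimates for illustration only.
bx = [0.10, 0.20, 0.30]
se_bx = [0.01, 0.01, 0.01]
print(min(f_statistics(bx, se_bx)), i2_gx(bx, se_bx))
```

Values of I 2 GX close to 1 indicate that measurement error in the SNP-exposure estimates is unlikely to dilute MR-Egger-type estimates materially.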
Given the genetic correlation of pulmonary phenotypes with cigarette smoking and adiposity, we conducted several sensitivity analyses to further address any potential confounding by these phenotypes. The finding for squamous [...] We confirmed that none of the genetic instruments were associated with smoking status (ever/never), cigarette pack-years (continuous), or adiposity (body fat percentage) at the P < 5 × 10 −8 level. However, several variants were associated based on a P < 1 × 10 −5 threshold (25 for FEV 1 and 18 for FEV 1 /FVC). We repeated MR analyses after removing these variants (Supplementary Table 13) and confirmed that our results remained robust for FEV 1 and squamous cell carcinoma (OR IVW = 2.02, 1.40-2.92, p = 1.9 × 10 −4 ) and FEV 1 /FVC and adenocarcinoma (OR IVW = 1.19, 1.01-1.40, p = 0.04). However, there was still significant heterogeneity among the causal effect estimates. After filtering the remaining outliers, the effect of a 10% decrease in FEV 1 /FVC on adenocarcinoma strengthened (OR IVW = 1.24, 1.08-1.43, p = 2.4 × 10 −3 ), while estimates attenuated slightly for FEV 1 and squamous carcinoma (OR IVW = 1.46, 1.14-1.87, p = 2.7 × 10 −3 ).
We also considered the possibility of residual confounding in our GWAS due to insufficient adjustment for smoking-related factors. We thus re-estimated SNP effects on FEV 1 , FVC, and FEV 1 /FVC with adjustment for continuous cigarette pack-years and years since quitting. The distribution of effect sizes did not differ between the two analyses (p > 0.05), and the correlation with our original instrument weights was strong for all phenotypes (Pearson's r ≥ 0.87, p < 1 × 10 −40 ) ( Supplementary  Fig. 3).
Functional characterization of lung function instruments. To gain insight into biological mechanisms mediating the observed effects of impaired pulmonary function on lung cancer risk, we conducted in silico analyses of functional features associated with the genetic instruments for each lung phenotype.
A total of 70 lung function instruments were mapped to genome-wide significant (p < 5.0 × 10 −8 ) protein quantitative trait loci (pQTL) affecting the plasma levels of 64 different proteins (Supplementary Data 8), based on data from the Human Plasma Proteome Atlas 36 . Many of these pQTL targets are involved in regulation of immune and inflammatory responses, such as interleukins (IL21, IL1R1, IL17RD, IL18R1), MHC class I polypeptide-related sequences, transmembrane glycoproteins expressed by natural killer cells, and members of the tumor necrosis factor receptor superfamily (TNFSF12, TNFRSF6B, TR19L). Other notable associations include NAD(P)H dehydrogenase [quinone] 1 (NQO1), a detoxification enzyme involved in protecting lung tissues in response to reactive oxygen species (ROS) and promoting p53 stability 37 . NQO1 is a target of the NFE2-related factor 2 (NRF2), a master regulator of cellular antioxidant response that has generated considerable interest as a chemoprevention target 38,39 .
Next, we analyzed genes where the lung function instruments were localized using curated pathways from the Reactome database. Significant enrichment (FDR q < 0.05) was observed only for FEV 1 /FVC instruments in never smokers, with an overrepresentation of pathways involved in adaptive immunity and cytokine signaling (Supplementary Fig. 6). Top-ranking pathways with q = 2.2 × 10 −6 included translocation of ZAP-70 to immunological synapse, phosphorylation of CD3 and TCR zeta chains, and PD-1 signaling. These findings are in line with the predominance of immune-related pQTL associations. Examining all instruments for FEV 1 and FEV 1 /FVC identified significant over-representation (FDR q < 0.05) of six immunologic signatures from the ImmuneSigDB collection 40 , including pathways implicated in host response to infection and immunization (Supplementary Fig. 7).

Discussion
Despite a substantial body of observational literature demonstrating an increased risk of lung cancer in individuals with pulmonary dysfunction [2][3][4][5][6][7]41 , confounding by shared environmental risk factors and high co-occurrence of lung cancer and airflow obstruction created uncertainty regarding the causal nature of this relationship. We comprehensively investigated this by characterizing shared genetic profiles between lung cancer and lung function, and interrogated causal hypotheses using Mendelian randomization, which overcomes many limitations of observational studies. We also provide insight into biological pathways underlying the observed associations by incorporating functional annotations into heritability analyses, assessing eQTL and pQTL effects of lung function instruments, and conducting pathway enrichment analyses.
The large sample size of the UK Biobank allowed us to successfully create instruments for three pulmonary function phenotypes, FEV 1 , FEV 1 /FVC, and FVC. Although these phenotypes are closely related, they capture different aspects of pulmonary impairment, with FEV 1 and FEV 1 /FVC used for diagnostic purposes in clinical settings. Our genetic instruments captured known and novel mechanisms involved in pulmonary function. Of the 73 novel variants identified here, many were in loci implicated in immune-related functions and pathologies. Examples include HORMAD2, which has been previously linked to inflammatory bowel disease 42,43 and tonsillitis 44 , and RIPOR1 (also known as FAM65A), which is part of a gene expression signature for atopy 45 . PIEZO1 is primarily involved in mechano-transduction and tissue differentiation during embryonic development [46][47][48] ; however, recent evidence has emerged delineating its role in optimal T-cell receptor activation and immune regulation 49 . BACH2, the new signal for FEV 1 /FVC in never smokers, is involved in alveolar macrophage function 50 , as well as selection-mediated TP53 regulation and checkpoint control 51 . The lead variant identified here is independent (r 2 < 0.05) of BACH2 loci nominally associated with lung function decline in a candidate gene study of COPD patients 52 , suggesting there may be differences in the genetic architecture of pulmonary traits in never smokers.
Our genetic correlation analyses indicate that pulmonary function shares genetic determinants with anthropometric traits and cigarette smoking. Our results are in contrast with the recent findings of Wyss et al. 24 , who did not observe statistically significant genetic correlations for any pulmonary function phenotypes with height and smoking, as well as FVC and FEV 1 /FVC, using publicly available summary statistics from the UKB and other studies of European ancestry individuals. In this respect, assessing genetic correlation within a single well-characterized population provides improved power while minimizing potential for bias and heterogeneity when combining data from multiple sources.
We observed statistically significant genetic correlations between pulmonary function impairment and lung cancer susceptibility for all lung cancer subtypes, except for never smokers. Reduced FEV 1 /FVC was significantly correlated with increased risk of lung cancer overall, squamous cell carcinoma, and adenocarcinoma. Significant genetic correlations with FEV 1 and FVC were observed for lung cancer overall, in smokers, and for tumors with squamous cell histology, but not adenocarcinoma. Jiang et al. 25 reported a similar magnitude of genetic correlation with FEV 1 /FVC, but did not observe an association with FVC, and did not assess FEV 1 . Differences in our results may be attributable to their use of GWAS summary statistics for pulmonary phenotypes from the interim UK Biobank release. Our findings demonstrate substantial overlap in the genetic architecture of obstructive and neoplastic lung disease, particularly for highly conserved variants that are likely to be subject to natural selection, and super enhancers. However, genetic correlations do not support a causal interpretation, especially considering the shared heritability with potentially confounding traits, such as smoking and obesity.
On the other hand, Mendelian randomization analyses revealed histology-specific effects of reduced FEV 1 and FEV 1 / FVC on lung cancer susceptibility, suggesting that these indicators of impaired pulmonary function may be causal risk factors. Genetic predisposition to FEV 1 impairment conferred an increased risk of lung cancer overall, particularly for squamous carcinoma. This relationship persisted after filtering potentially pleiotropic instruments and performing other sensitivity analyses, including multivariable Mendelian randomization and manual filtering of variants associated with smoking or adiposity. FEV 1 / FVC reduction appeared to increase the risk of lung adenocarcinoma, as well as lung cancer among never smokers. The latter finding is particularly compelling since it precludes confounding by smoking-related factors and demonstrates an association with the most clinically relevant pulmonary phenotype. The increased lung cancer risk in never smokers was also observed using genetic instruments developed specifically in never smokers and in sensitivity analyses using instruments from the population that also includes smokers. We hypothesize that the effects of pulmonary obstruction are mediated by chronic inflammation and immune response, which is supported by the over-representation of adaptive immunity and cytokine signaling pathways and pQTL effects among FEV 1 and FEV 1 /FVC instruments.
Examining lung eQTL effects of our genetic instruments identified additional relevant mechanisms, including gene expression of SECISBP2L and DISP2. SECISBP2L at 15q21 is essential for ciliary function 53 and has an inhibitory effect on lung tumor growth by suppressing cell proliferation and inactivation of Aurora kinase A 54 . This gene was among several susceptibility regions identified in the most recent lung cancer GWAS 16 , and now we more conclusively establish impaired pulmonary function as the mechanism mediating SECISBP2L effects on risk of lung cancer overall, particularly adenocarcinoma. Less is known about DISP2, although it has been implicated in the conserved Hedgehog signaling pathway essential for embryonic development and cell differentiation 55 .
One of the main challenges and outstanding questions in previous epidemiologic studies has been clarifying how smoking fits into the causal pathway between impaired pulmonary function and lung cancer risk. Are indicators of airway obstruction simply proxies for smoking-induced carcinogenesis? The association between reduced FEV 1 /FVC and risk of adenocarcinoma and lung cancer in never smokers observed in our Mendelian randomization analysis and in previous studies 8,9 argues against this simplistic explanation and points to alternative pathways. Chronic airway inflammation fosters a lung microenvironment with altered signaling pathways, aberrant expression of cytokines, chemokines, growth factors, and DNA damage-promoting agents, all of which promote cancer initiation 15 . This mechanism may be particularly relevant for adenocarcinoma, which is the most common lung cancer histology in never smokers, arising from the peripheral alveolar epithelium that has less direct contact with inhaled carcinogens.
Dysregulated immune function is a hallmark of lung cancer and COPD, with both diseases sharing similar inflammatory cell profiles characterized by macrophages, neutrophils, and CD4+ and CD8+ lymphocytes. Immune cells in COPD and emphysema exhibit T helper 1 (Th1)/Th17 polarization, decreased programmed death ligand-1 (PD-L1) expression in alveolar macrophages, and increased production of interferon (IFN)-γ by CD8+ T cells 56 , a phenotype believed to prevail at tumor initiation, whereas established tumors are dominated by Th2/M2-like macrophages 57 . These putative mechanisms were highlighted in our pathway analysis, with an enrichment of genes involved in IFN-γ, PD-1 and IL-1 signaling among FEV 1 /FVC genetic instruments, and over-representation of pQTL targets in these pathways. Furthermore, a study of trans-thoracically implanted tumors in an emphysema mouse model demonstrates how this pulmonary phenotype results in impaired antitumor T-cell responses at a critical point when nascent cancer cells evade detection and elimination by the immune system, resulting in enhanced tumor growth 58 .
Other relevant pathways implicating pulmonary dysfunction in lung cancer development include lung tissue destruction via matrix degrading enzymes and increased genotoxic and apoptotic stress resulting from cigarette smoke in conjunction with macrophage- and neutrophil-derived ROS 15,59 . This may explain our findings for FEV 1 and squamous carcinoma, for which cigarette smoking is a particularly dominant risk factor. Genetic predisposition to impaired FEV 1 may create a milieu that promotes malignant transformation and susceptibility to external carcinogens and tissue damage, rather than increasing the likelihood of cigarette smoking. In our analysis we attempted to isolate the former pathway from the latter by carefully instrumenting pulmonary phenotypes and confirming that they are not associated with behavioral aspects of nicotine dependence. However, residual confounding by smoking cannot be entirely precluded, given its high genetic and phenotypic correlation with FEV 1 .
The causal interpretation of our results critically depends on the validity of fundamental Mendelian randomization assumptions. We employed a range of estimation techniques with different underlying assumptions, as well as diagnostic tests, to interrogate the robustness of our results with respect to confounding, horizontal pleiotropy, and weak instrument bias. However, despite these efforts, residual confounding by related phenotypes, such as smoking, or subtle effects of population structure cannot be ruled out. In evaluating the contribution of our findings, several limitations should be acknowledged. Our approach to outlier removal based on Cochran's Q-statistic with modified second order weights may have been overly stringent; however, manually pruning based on such a large set of genetic instruments may not be feasible and may introduce additional bias, thus we feel this systematic conservative approach is justified. Furthermore, outlier removal did not have an adverse impact on instrument strength and precision of the MR analysis.
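The idea behind heterogeneity-based outlier removal can be illustrated by decomposing Cochran's Q into per-variant contributions. Note that this sketch uses simple first-order weights rather than the modified second-order weights applied in our analysis, and the flagging threshold (the chi-square critical value for 1 degree of freedom at p = 0.05) is an illustrative assumption, as are the function names and toy inputs.

```python
CHI2_1DF_P05 = 3.841  # chi-square critical value for 1 df at p = 0.05 (assumed threshold)


def q_contributions(bx, by, se_by):
    """Per-variant contributions to Cochran's Q using first-order weights."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se_by)]
    ratios = [y / x for x, y in zip(bx, by)]
    # With these weights, the weighted mean of ratio estimates equals the IVW estimate.
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return [w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios)]


def flag_outliers(bx, by, se_by, threshold=CHI2_1DF_P05):
    """Flag variants whose individual Q contribution exceeds the threshold."""
    return [q > threshold for q in q_contributions(bx, by, se_by)]


# Toy data: the fourth variant implies a much larger causal effect than the rest.
bx = [0.10, 0.10, 0.10, 0.10]
by = [0.05, 0.05, 0.05, 0.15]
se_by = [0.02, 0.02, 0.02, 0.02]
print(flag_outliers(bx, by, se_by))
```

Because the IVW estimate itself shifts when outliers are present, procedures of this kind are usually applied with care, which is part of the reason a systematic rule was preferred here over manual pruning.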
In addition to pleiotropy, selection bias may also undermine the validity of a Mendelian Randomization study, particularly in the form of collider bias, if selection is a function of the exposure or outcome. In the context of the UKB, low participation (5.5%) may have resulted in an unrepresentative study population 60,61 . Although enrollment in the cohort was not explicitly contingent on cancer status or pulmonary function, it is likely that individuals who did not complete a spirometry assessment were more likely to be smokers and have poor lung function. Simulations by Gkatzionis and Burgess 61 demonstrate that when the effect of a risk factor on selection is mild to moderate (odds of selection: 0.82-0.61), the type I error rate remains reasonable at 5.0-6.6%. The direction of the resulting bias depends on the direction and strength of the exposure (lung function)-confounder (smoking) relationship. In the context of our study, the causal effect may be underestimated since the confounder and exposure are both likely to increase non-participation or result in missing spirometry data.
Another limitation is that we did not assess the relationship between the velocity of lung function decline and lung cancer risk, which may also prove to be a risk factor and capture a different dimension of pulmonary dysfunction. Furthermore, since our study includes the largest GWAS of lung cancer cases in never smokers, this precludes a well-powered replication study in an independent European ancestry population. In addition, dichotomous stratification by smoking status does not permit an evaluation of the relationship between pulmonary impairment and lung cancer risk across more granular levels of smoking. Last, in our efforts to present the most comprehensive assessment of pulmonary function impairment and lung cancer risk, a number of analyses were conducted, and it may be possible that some inconsistently observed associations were due to chance.
Despite these limitations, important strengths of this work include the large sample size for instrument development and causal hypothesis testing. Our Mendelian randomization approach leveraged a large number of genetic instruments, including variants specifically associated with lung function in never smokers, while balancing the concerns related to genetic confounding and pleiotropy. By triangulating evidence from gene expression and plasma protein levels, we also provide a more enriched interpretation of the genetic effects of pulmonary function loci on lung cancer risk, which implicate immune-mediated pathways. Despite the small individual SNP effect sizes, combining multiple instruments revealed meaningful increases in lung cancer risk. A genetically predicted 10% reduction in FEV 1 /FVC confers an approximately 55% increased risk of lung cancer in never smokers, and a similar magnitude of effect was observed for FEV 1 and squamous carcinoma. However, effects of FEV 1 /FVC on adenocarcinoma were more modest (16-23% increase). Taken together, these findings provide more robust etiological insight than previous studies that relied on using observed lung function phenotypes directly as putatively causal factors.
As our understanding of the shared genetic and molecular pathways between lung cancer and pulmonary disease continues to evolve, identification of new susceptibility loci for pulmonary function and lung cancer risk may have important implications for future precision prevention and screening endeavors. Multiple genetic determinants of lung function are in pathways that contain druggable targets, based on our pQTL findings and previous reports 18 , which may open new avenues for chemoprevention or targeted therapies for lung cancers with an obstructive pulmonary etiology. In addition, with accumulating evidence supporting the effectiveness of low-dose computed tomography for lung cancer 62,63 , impairment in FEV 1 and FEV 1 /FVC and their genetic determinants may provide additional information for refining risk stratification and screening eligibility criteria.

Methods
Study populations. The UK Biobank (UKB) is a population-based prospective cohort of over 500,000 individuals aged 40-69 years at enrollment in 2006-2010 who completed extensive questionnaires on health-related factors, physical assessments, and provided blood samples 64 . Participants were genotyped on the UK Biobank Affymetrix Axiom array (89%) or the UK BiLEVE array (11%) 64 . Genotype imputation was performed using the Haplotype Reference Consortium data as the main reference panel as well as using the merged UK10K and 1000 Genomes (1000G) phase 3 reference panels 64 . Our analyses were restricted to individuals of predominantly European ancestry based on self-report and after excluding samples with either of the first two genetic ancestry principal components (PCs) outside of 5 standard deviations (SD) of the population mean. Samples with discordant self-reported and genetic sex were removed. Using a subset of genotyped autosomal variants with minor allele frequency (MAF) ≥0.01 and call rate ≥97%, we filtered samples with call rates <97% or heterozygosity >5 SD from the mean. First-degree relatives were identified using KING 65 and one sample from each pair was excluded, leaving a total of 413,810 individuals available for analysis.
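The ancestry outlier rule described above (either of the first two PCs more than 5 SD from the mean) can be sketched as a simple mask. This is an illustration under stated assumptions: the function names are hypothetical, the sample SD is used, and the mean and SD are computed from the supplied samples in a single pass, whereas the exact reference distribution used in practice is not specified here.

```python
import statistics


def within_sd(values, n_sd=5.0):
    """True for each value lying within n_sd sample SDs of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) <= n_sd * sd for v in values]


def ancestry_keep_mask(pc1, pc2, n_sd=5.0):
    """Keep a sample only if both PC1 and PC2 are within n_sd SDs of their means."""
    if len(pc1) != len(pc2):
        raise ValueError("pc1 and pc2 must have the same length")
    return [a and b for a, b in zip(within_sd(pc1, n_sd), within_sd(pc2, n_sd))]


# Toy data: 99 tightly clustered samples and one extreme sample on PC1.
pc1 = [0.0] * 99 + [100.0]
pc2 = [float(i % 3) for i in range(100)]
mask = ancestry_keep_mask(pc1, pc2)
print(sum(mask), "of", len(mask), "samples retained")
```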
We further excluded 36,461 individuals without spirometry data and 207 individuals who completed only one blow, for whom reproducibility could not be assessed (Supplementary Fig. 1). For the remaining subjects, we examined the difference between the maximum value per individual (referred to as the best measure) and all other blows. Values differing by more than 0.15 L were considered non-reproducible, based on standard spirometry guidelines 66 , and were excluded. Our analyses thus included 372,750 and 370,638 individuals for FEV 1 and FVC, respectively. The best per individual measure among the reproducible blows was used to derive FEV 1 /FVC, resulting in 368,817 individuals. FEV 1 and FVC values were then converted to standardized Z-scores with a mean of 0 and standard deviation (SD) of 1.
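The reproducibility rule and Z-score conversion above can be sketched as follows. One point is an explicit assumption: we read the rule as retaining an individual's best blow only when at least one other blow lies within 0.15 L of it; the text does not spell out this detail, so the sketch should not be taken as the exact procedure. Function names and the toy values are hypothetical.

```python
import statistics

REPRODUCIBILITY_LIMIT_L = 0.15  # litres, from the spirometry guideline cited in the text


def best_reproducible(blows, limit=REPRODUCIBILITY_LIMIT_L):
    """Return the best blow if another blow is within `limit` of it, else None.

    Assumed interpretation: individuals with a single blow, or whose best
    blow is not matched by any other blow within the limit, are excluded.
    """
    if len(blows) < 2:
        return None
    ordered = sorted(blows, reverse=True)
    best, second = ordered[0], ordered[1]
    return best if best - second <= limit else None


def z_scores(values):
    """Standardize values to mean 0 and (sample) SD 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Toy FEV1 blows (litres) for three hypothetical participants.
participants = {"A": [3.10, 3.20, 2.50], "B": [3.50, 3.00], "C": [2.80]}
best = {pid: best_reproducible(b) for pid, b in participants.items()}
print(best)
```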
The OncoArray Lung Cancer study has been previously described 16 . Briefly, this dataset consists of genome-wide summary statistics based on 29,266 lung cancer cases (11,273 adenocarcinoma, 7426 squamous carcinoma) and 56,450 controls of predominantly European ancestry (≥80%) assembled from studies part of the International Lung Cancer Consortium. Summary statistics from the lung cancer GWAS were adjusted for appropriate covariates, including genetic ancestry PCs, and showed no signs of genomic inflation for lung cancer overall (λ GC = 1.0035) or for any subtypes, including adenocarcinoma (λ GC = 1.0050), squamous carcinoma (λ GC = 1.0051), and lung cancer in never smokers (λ GC = 1.0060).
Informed consent was obtained from study participants in the UK Biobank and studies contributing data to the OncoArray Lung Cancer collaboration. UK Biobank received ethics approval from the Research Ethics Committee (REC reference: 11/NW/0382). Approval for OncoArray studies was obtained from each of the participating institutional research ethics review boards.
Genome-wide association analysis. Genome-wide association analyses of pulmonary function phenotypes in the UK Biobank cohort were conducted using PLINK 2.0 (October 2017 version). We excluded variants that deviated from Hardy-Weinberg equilibrium at p < 1 × 10 −5 in cancer-free individuals, or that had call rate <95% (alternate allele dosage required to be within 0.1 of the nearest hard call to be non-missing), imputation quality INFO < 0.30, or MAF < 0.005. To minimize potential for reverse causation, prevalent lung cancer cases, defined as diagnoses occurring up to 5 years before cohort entry and incident cases occurring within 2 years of enrollment, were excluded (n = 738). Linear regression models for pulmonary function phenotypes (standardized Z-scores for FEV 1 and FVC; untransformed FEV 1 /FVC ratio bounded by 0 and 1) were adjusted for age, age², sex, genotyping array and 15 PCs to permit an assessment of heritability (h g ) and genetic correlation (r g ) with height, smoking (status and pack-years), and anthropometric traits.
Heritability and genetic correlation. LD Score regression 17 was used to estimate hg for each lung phenotype and rg with lung cancer and other traits. To better capture LD patterns present in the UKB data, we generated LD scores for all variants that passed QC with MAF > 0.0001 using a random sample of 10,000 UKB participants. UKB LD scores were used to estimate hg for each lung phenotype and rg with other non-cancer traits. Genetic correlation with lung cancer was estimated using publicly available LD scores based on the 1000G phase 3 reference population (n = 1,095,408 variants).
To assess the importance of specific functional annotations in SNP-heritability, we partitioned trait-specific heritability using stratified-LDSC 67 . The analysis was performed using 86 annotations (baseline-LD model v2.1), which incorporated MAF-adjustment and other LD-related annotations, such as predicted allele age and recombination rate 20,22 . The MHC region was excluded from partitioned heritability analyses. Enrichment was considered statistically significant if p < 8.5 × 10⁻⁴, which reflects Bonferroni correction for 59 annotations (a functional category with and without a 500 bp window around it was counted as a single annotation).
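The significance threshold follows directly from dividing a family-wise error rate of 0.05 by the 59 annotations tested. The short check below reproduces that arithmetic; the variable names are illustrative.

```python
# Bonferroni-corrected threshold for partitioned heritability enrichment.
alpha = 0.05          # family-wise error rate
n_annotations = 59    # annotations counted as distinct tests
threshold = alpha / n_annotations

if __name__ == "__main__":
    print(f"{threshold:.1e}")
```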
Development of genetic instruments for pulmonary function. For the purpose of instrument development, a two-stage genome-wide analysis was employed, with a randomly sampled 70% of the cohort used for discovery and the remaining 30% reserved for replication. In addition to age, age², sex, genotyping array and 15 PCs, models were adjusted for covariates that explain a substantial proportion of variation in pulmonary phenotypes, such as smoking and height, in order to decrease the residual variance and help isolate the relevant genetic signals. Specifically, we adjusted for height, height², and cigarette pack-year categories (0, corresponding to never smokers, >0-10, >10-20, >20-30, >30-40, and >40). Other covariates, such as UKB assessment center (Field 54), use of an inhaler prior to spirometry (Field 3090), and blow acceptability (Field 3061) were considered. However, these covariates did not explain a substantial proportion of phenotype variation and had low variable importance metrics (lmg < 0.01), and thus were not included in our final models. Instruments were selected from independent associated variants (LD r² < 0.05 in a clumping window of 10,000 kb) with P < 5 × 10⁻⁸ in the discovery stage and P < 0.05 and consistent direction of effect in the replication stage. Since the primary goal of our GWAS was to develop a comprehensive set of genetic instruments, we applied a less stringent replication threshold in anticipation of subsequent filtering based on potential violation of Mendelian randomization assumptions.
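The two-stage selection rule can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes LD clumping has already been applied upstream, and the dictionary keys (`beta_disc`, `p_disc`, `beta_rep`, `p_rep`) and function name are hypothetical.

```python
def select_instruments(variants, p_discovery=5e-8, p_replication=0.05):
    """Return rsids meeting the discovery and replication criteria.

    Each element of `variants` is a dict with hypothetical keys:
    rsid, beta_disc, p_disc (discovery stage) and beta_rep, p_rep
    (replication stage). Variants are assumed to be LD-independent already.
    """
    selected = []
    for v in variants:
        # Consistent direction: discovery and replication effects share a sign.
        same_direction = v["beta_disc"] * v["beta_rep"] > 0
        if v["p_disc"] < p_discovery and v["p_rep"] < p_replication and same_direction:
            selected.append(v["rsid"])
    return selected


if __name__ == "__main__":
    toy = [
        {"rsid": "rs_a", "beta_disc": -0.04, "p_disc": 1e-12, "beta_rep": -0.03, "p_rep": 0.001},
        {"rsid": "rs_b", "beta_disc": 0.05, "p_disc": 2e-9, "beta_rep": -0.02, "p_rep": 0.01},
        {"rsid": "rs_c", "beta_disc": 0.03, "p_disc": 1e-6, "beta_rep": 0.03, "p_rep": 0.001},
    ]
    print(select_instruments(toy))
```

In this toy input only `rs_a` is retained: `rs_b` replicates in the opposite direction and `rs_c` misses the genome-wide discovery threshold.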
Mendelian randomization. Mendelian randomization (MR) analyses were carried out to investigate the potential causal relationship between impaired pulmonary function and lung cancer risk. Genetic instruments excluded multi-allelic and non-inferable palindromic variants with intermediate allele frequencies (MAF > 0.42). Odds ratios (OR) and corresponding 95% confidence intervals were obtained using the maximum likelihood and inverse variance weighted multiplicative random-effects (IVW-RE) estimators 28,29 . Effects for FEV1 and FVC were estimated for a genetically predicted 1-SD decrease in the standardized Z-score. For FEV1/FVC, we modeled cancer risk corresponding to a 10% decrease in the ratio. Sensitivity analyses included the weighted median (WM) estimator 30 , which provides unbiased estimates when up to 50% of the weights are from invalid instruments, and MR RAPS (Robust Adjusted Profile Score), which incorporates random effects and robust loss functions to limit the influence of potentially pleiotropic instruments. MR RAPS assumes balanced (mean 0) horizontal pleiotropy. In contrast to IVW-RE, MR RAPS models idiosyncratic and systematic pleiotropy effects as additive, rather than multiplicative 31 . Using MR estimation techniques with different underlying statistical models allows for a more comprehensive assessment of the robustness of our results with respect to violations of MR assumptions. We also applied the following diagnostic tests: (i) significant (p < 0.05) deviation of the MR Egger intercept (β0,Egger) from 0, as a test for directional pleiotropy 68 ; (ii) I²GX statistic < 0.90, indicative of regression dilution bias and inflation in the MR Egger pleiotropy test due to violation of the no measurement error (NOME) assumption 68 ; (iii) Cochran's Q-statistic with modified second order weights to assess heterogeneity (p-value < 0.05) indicative of (balanced) horizontal pleiotropy 69 .
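To make the IVW-RE estimator concrete, the sketch below computes an inverse variance weighted estimate from summary statistics and scales its standard error by the square root of the overdispersion Q/(K − 1). It is a simplified illustration, not the TwoSampleMR implementation used in the study: it uses first-order weights (ignoring uncertainty in the exposure effects), floors the overdispersion at 1 (a common convention assumed here), and all function and argument names are hypothetical.

```python
import math


def ivw_random_effects(beta_x, beta_y, se_y):
    """IVW estimate with a multiplicative random-effects standard error.

    beta_x: variant effects on the exposure (e.g. a lung function trait)
    beta_y: variant effects on the outcome (log odds ratios)
    se_y:   standard errors of beta_y
    Requires at least two instruments. Returns (beta, se, cochran_q).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    total_w = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se_fixed = math.sqrt(1.0 / total_w)
    # Cochran's Q measures heterogeneity of the per-variant ratio estimates.
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    phi = q / (len(ratios) - 1)
    # Multiplicative random effects: inflate the SE when phi exceeds 1.
    se = se_fixed * math.sqrt(max(1.0, phi))
    return beta, se, q


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log odds ratio and SE to (OR, lower, upper) at 95% confidence."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Toy summary statistics, not study data.
    b, s, q = ivw_random_effects([0.1, 0.2, 0.05], [0.05, 0.1, 0.025], [0.01, 0.02, 0.01])
    print(odds_ratio_ci(b, s))
```

The estimate is expressed per unit increase in the exposure; to obtain the effect of a decrease, as reported in this study, the signs of the exposure effects would be flipped before estimation.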
All statistical analyses were conducted using R (version 3.6.1). Mendelian randomization analyses were conducted using the TwoSampleMR R package (version 0.4.23).
Functional characterization of lung function instruments. To characterize functional pathways that are represented by the genetic instruments for FEV1 and FEV1/FVC, we examined effects on gene expression in lung tissues from 409 subjects from the Laval eQTL study 35 . Lung function instruments with significant (Bonferroni p-value < 0.05) eQTL effects were used as instruments to estimate the effect of gene expression on lung cancer risk. For genes with multiple eQTLs, independent variants (LD r² < 0.05) were used to obtain IVW estimates of the predicted effects of increased gene expression on lung cancer risk. For genes with a single eQTL, OR estimates were obtained using the Wald method.
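For a gene with a single eQTL, the Wald ratio is the variant's effect on the outcome divided by its effect on expression. The sketch below uses the first-order delta-method standard error, which ignores uncertainty in the eQTL effect; this simplification and the function name are assumptions made for illustration.

```python
import math


def wald_ratio(beta_eqtl, beta_outcome, se_outcome, z=1.96):
    """Wald ratio estimate for a single instrument.

    beta_eqtl:    variant effect on gene expression
    beta_outcome: variant effect on lung cancer (log odds ratio)
    se_outcome:   standard error of beta_outcome
    Returns (OR, lower, upper) per unit increase in gene expression.
    """
    beta = beta_outcome / beta_eqtl
    # First-order approximation: treats the eQTL effect as measured without error.
    se = se_outcome / abs(beta_eqtl)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Toy values, not study data.
    print(wald_ratio(beta_eqtl=0.2, beta_outcome=0.1, se_outcome=0.02))
```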
Next, we examined data from the genetic atlas of the human plasma proteome 36 , queried using PhenoScanner 70 , to assess whether any of the genetic instruments for FEV1 and FEV1/FVC had significant (p < 5 × 10⁻⁸) effects on intracellular protein levels. Last, we summarized the pathways represented by the genes where the lung function instruments were localized using pathway enrichment analysis via the Reactome database and ImmuneSigDB (collection C7 from MSigDB).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The datasets generated and/or analyzed during the current study are available from the authors on request. Genotype data for the OncoArray Consortium Lung Cancer studies have been deposited in the database of Genotypes and Phenotypes (dbGaP) under accession: phs001273.v2.p2. Readers interested in obtaining a copy of the lung cancer GWAS summary statistics can do so by completing the proposal request form at http://oncoarray.dartmouth.edu/. The UK Biobank is an open access resource, available at https://www.ukbiobank.ac.uk/researchers/. This research was conducted with approved access to UK Biobank data under application numbers 14105 and 23261. All data supporting the findings of this study are available within the article and its supplementary information files, and from the corresponding authors upon reasonable request. A reporting summary for this article is available as a Supplementary file.