Multi-ancestry genome-wide association analyses improve resolution of genes and pathways influencing lung function and chronic obstructive pulmonary disease risk

Lung-function impairment underlies chronic obstructive pulmonary disease (COPD) and predicts mortality. In the largest multi-ancestry genome-wide association meta-analysis of lung function to date, comprising 588,452 participants, we identified 1,020 independent association signals implicating 559 genes supported by ≥2 criteria from a systematic variant-to-gene mapping framework. These genes were enriched in 29 pathways. Individual variants showed heterogeneity across ancestries, age and smoking groups, and collectively as a genetic risk score showed strong association with COPD across ancestry groups. We undertook phenome-wide association studies for selected associated variants as well as trait and pathway-specific genetic risk scores to infer possible consequences of intervening in pathways underlying lung function. We highlight new putative causal variants, genes, proteins and pathways, including those targeted by existing drugs. These findings bring us closer to understanding the mechanisms underlying lung function and COPD, and should inform functional genomics experiments and potentially future COPD therapies.

Lung-function abnormality predicts mortality and is a diagnostic criterion for chronic obstructive pulmonary disease (COPD) 1, which is the most prevalent respiratory disease globally 2 and lacks disease-modifying treatments. Although smoking and other environmental risk factors for COPD are well known and genetic susceptibility is recognized, the molecular pathways underlying COPD are incompletely understood. As with other complex traits, there has been a lack of ancestral diversity in genome-wide association studies (GWAS) 3 of lung function 4-6. Multi-ancestry studies improve the power and fine-mapping resolution of GWAS and increase the prospects for prediction, prevention, diagnosis and treatment in diverse populations 3,4,7.
Understanding of the genes, proteins and pathways involved in disease-related traits underpins modern drug development. A high yield of genetic-association signals, improved signal resolution and integration with functional evidence assist confident identification of causal genes as well as the variants and pathways that impact gene function and regulation. Although datasets and in silico tools to connect GWAS signals to causal genes are improving, the findings from different datasets and tools have lacked consensus 8,9, highlighting a need for frameworks to integrate functional evidence types and compare findings 10.
Aggregation of lung-function-associated genetic variants into a genetic risk score (GRS) provides a tool for COPD prediction 5. When a GRS comprises many variants, partitioning the GRS according to the biological pathways the variants influence could provide a tool to explore their aggregated consequences across different traits through phenome-wide association studies (PheWAS). Just as PheWAS of individual genetic variants predicts the consequences of perturbations of specific protein targets, informing assessment of drug efficacy, drug safety and drug repurposing 11, PheWAS of pathway-partitioned GRS could inform the understanding of the consequences of perturbations of specific pathways. Through the largest global assembly of lung-function genomics studies to date we: (1) undertook a multi-ancestry GWAS meta-analysis of lung-function traits in 588,452 individuals to detect novel signals, improve fine mapping and estimate heterogeneity in allelic effects attributable to ancestry; (2) tested whether lung-function signals are age- or smoking-dependent, and assessed their relationship to height; (3) investigated cell-type and functional specificity of lung-function association signals; (4) fine-mapped signals through annotation-informed credible sets, integrating functional data such as respiratory cell-specific chromatin accessibility signatures; (5) applied a consensus-based framework to systematically investigate and identify putative causal genes, integrating eight locus-based or similarity-based criteria; (6) developed and applied a GRS for the ratio of forced expiratory volume in 1 s (FEV1) to forced vital capacity (FVC) in different ancestries in the UK Biobank and COPD case-control studies; and (7) applied PheWAS to individual variants, GRS for each lung-function trait and GRS partitioned by pathway. Through these approaches, we aimed to detect novel lung-function signals and putative causal genes as well as provide new insights into the mechanistic pathways underlying lung function, some of which may be amenable to drug therapy.
Article https://doi.org/10.1038/s41588-023-01314-0
and Supplementary Tables 2-4), and then applied genomic control using the linkage disequilibrium (LD) score regression intercept 12. After filtering and meta-analysis across multi-ancestry cohorts, 66.8 million variants were available in each of four lung-function traits, with genomic inflation factors λ of 1.025, 1.022, 0.984 and 0.996 for FEV1, FVC, FEV1/FVC and PEF, respectively (Supplementary Figs. 2,3 and Supplementary Table 5).

1,020 signals for lung function
After excluding eight signals associated with smoking behavior (Supplementary Table 26) and combining signals that co-localized across traits, we identified 1,020 distinct signals for lung function using a stringent threshold of P < 5 × 10^−9 (ref. 13; Fig. 1a). Of these, 713 are novel with respect to the signals and studies described in the Supplementary Note (Supplementary Table 6). These 1,020 signals explain 33.0% of FEV1/FVC heritability (21.3% for FEV1, 17.3% for FVC and 21.4% for PEF; Methods).
To facilitate fine mapping, we included larger, more diverse populations than previous lung-function GWAS. We performed multi-ancestry meta-regression with MR-MEGA 7, which incorporates axes of genetic ancestry as covariates to model heterogeneity (Methods). We then incorporated functional annotation for chromatin accessibility and transcription-factor binding sites in respiratory-relevant cells and tissues, and enriched genomic annotations 14 to weight prior causal probabilities of association for putative causal variants (Methods). Overall reductions in credible set size and higher maximum posterior probabilities of association for the most likely causal variants were evident after multi-ancestry meta-regression and after functional annotations were incorporated (Supplementary Fig. 4). Following fine mapping, 438 (43%) signals had a single putative causal variant (posterior probability > 50%) and the median credible set size was nine variants (Supplementary Note).
Of the 960 sentinels represented in ≥7 cohorts, 109 signals showed heterogeneity attributable to ancestry (P Het < 0.05; Supplementary
We examined associations of lung-function-associated SNPs in children's cohorts (Supplementary Table 8) and tested for differences in the estimated effect sizes of lung-function-associated SNPs between children and adults as well as between ever-smokers and never-smokers in EUR individuals (Methods). The effect-size estimates between children and adults were correlated (r from 0.51 for FEV1/FVC to 0.79 for FEV1; Supplementary Fig. 7), although 113 signals showed nominal evidence (P < 0.05) of age-dependent effects (more than expected, binomial P = 2.56 × 10^−13). Three signals (rs7977418 (CCDC91), rs34712979 (NPNT) and rs931794 (HYKK)) showed age-dependent effects (Bonferroni-corrected P < 4.64 × 10^−5; Supplementary Table 9). We observed nominal evidence (P < 0.05) of smoking-dependent effects for 69 of 1,020 signals (Supplementary Fig. 8), more than expected (binomial P = 0.0079). The intronic SNP rs7733410 in HTR4, a signal we previously reported for lung function 15, showed a 76.2% larger effect on FEV1 in ever-smokers compared with never-smokers (P = 4.09 × 10^−5; Supplementary Table 10). As height is a determinant of lung growth, we compared height and lung-function associations, and tested the impact of additional height adjustments for sentinel SNPs. We found no correlation between estimated effect sizes for height and lung function (Supplementary Fig. 9), and the addition of height squared and height cubed covariates had little impact on effect-size estimates (Supplementary Fig. 10).
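The "more than expected" comparison for smoking-dependent effects can be illustrated with an exact binomial tail probability. The sketch below assumes independent tests, a 5% per-signal null probability and a one-sided alternative; the function name is ours, and the text does not state which software or sidedness was used, so the result should only be expected to be of the same order as the reported P value.

```python
import math

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for x in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                   + x * log_p + (n - x) * log_q)
        total += math.exp(log_pmf)
    return total

# Smoking-dependent effects: 69 of 1,020 signals reached nominal P < 0.05.
expected = 1020 * 0.05                       # number expected by chance alone
p_excess = binomial_upper_tail(69, 1020, 0.05)
print(f"expected {expected:.0f}, observed 69, one-sided P = {p_excess:.4f}")
```

The same function could be applied to the age-dependent comparison, provided the number of signals actually tested in the children's cohorts is used for n.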

Cell-type and functional specificity
We assessed whether our association signals were enriched for regulatory or functional features in specific cell types. Using stratified LD-score regression 16, we found enrichment of all histone marks tested (H3K27ac, H3K9ac, H3K4me3 and H3K4me1) in lung- and smooth-muscle-containing cell lines (Supplementary Table 11). Using GARFIELD 17, we assessed enrichment of our signals for DNase I hypersensitivity sites and chromatin accessibility peaks, showing enrichment in a wide variety of cell types, including higher enrichment in both fetal and adult lung and blood for FEV1, FEV1/FVC and PEF as well as fibroblast enrichment for FVC (Supplementary Fig. 11a). Our signals were enriched for transcription-factor footprints in fetal lung for FEV1, FEV1/FVC and PEF, for footprints in skin for FVC and also in blood for PEF (Supplementary Fig. 11b). Genic annotation enrichment patterns were similar across all traits, with enrichment mainly in exonic, 3′ UTR and 5′ UTR regions (Supplementary Fig. 11c). For all traits, we saw enrichment for transcription start sites, weak enhancers, enhancers and promoter flanks, with cell types for weak enhancer enrichment including endothelial cells for FEV1, FEV1/FVC and PEF (Supplementary Fig. 11d). For transcription-factor binding sites, we observed a similar enrichment pattern across all of the lung-function traits, with the largest fold enrichment observed for endothelial cells (Supplementary Fig. 11e). Our signals were enriched for assay for transposase-accessible chromatin
The gray in the first eight columns indicates that at least one variant implicates the gene as causal via the evidence for that column. The last four columns indicate the level of association of the most significant variant implicating the gene as causal with respect to the FEV1/FVC decreasing allele; red indicates that this association is in the same direction of effect as the FEV1/FVC decreasing allele and blue indicates the opposite direction with the shade indicating P < the corresponding value in the legend.
To supplement our understanding of the biological pathways and clinical phenotypes influenced by lung-function-associated variants, we undertook PheWAS of selected individual variants. We selected 27 putative causal genes implicated by ≥4 criteria (20 genes) or by a single putative causal missense variant that was deleterious (five genes: ACAN, ADGRG6, SCARF2, CACNA1S and HIST1H2BE) or rare (two genes: SOS2 and ADRB2; Supplementary Table 14). We interpreted the PheWAS findings (shown in full in Supplementary Fig. 13 and Supplementary Table 15) alongside literature reviews (Supplementary Table 16) and highlight examples below.
The putative causal deleterious missense ABCA3 rs149989682 (A allele; frequency of 0.6%) variant associated with reduced FEV1/FVC was reported to cause pediatric interstitial lung disease 21. ABCA3, which is expressed in alveolar type II cells and localized to lamellar bodies, is involved in surfactant-phospholipid metabolism, and ABCA3 mutations cause severe neonatal surfactant deficiency 22. The putative causal deleterious missense GATA5 rs200383755 (C allele, frequency of 0.6%) variant associated with lower FEV1 was associated with increased asthma risk, higher blood pressure and reduced risk of benign prostatic hyperplasia (Supplementary Fig. 13i). GATA5 associations have not been previously noted in asthma GWAS, although Gata5-deficient mice show airway hyperresponsiveness 23. GATA5 encodes a transcription factor expressed in bronchial smooth muscle, bladder and prostate; a previous benign prostatic hyperplasia GWAS reported a GATA5 signal 23,24. CLDN18 was implicated by four criteria, including a mouse knockout with abnormal pulmonary alveolar epithelium morphology 25. Through calcium-independent cell adhesion, CLDN18 influences epithelial-barrier function through tight-junction-specific obliteration of the intercellular space 26. Its splice variant, CLDN18.1, is predominantly expressed in the lung 27. Reduced CLDN18 expression was reported in asthma 26. However, our PheWAS showed no association with asthma susceptibility or other traits (CLDN18 rs182770 in Supplementary Table 15). LRBA was also implicated by four criteria. Mutations resulting in LRBA deficiency cause common variable immunodeficiency-8 with autoimmunity, which can include coughing, respiratory infections, bronchiectasis and interstitial lung disease 28,29. The putative causal LRBA tolerated missense variant rs2290846 (posterior probability of 56.3%) was associated with 31 traits (false discovery rate (FDR) < 1%; Supplementary Fig. 13n and Supplementary Table 15); the G allele, associated with lower FVC and lower FEV1, was associated with lower neutrophils as well as lower risk of cholelithiasis, cholecystitis 30 and diverticular disease.
FGFR1, encoding fibroblast growth factor receptor 1, has roles in lung development and regeneration 31. Loss-of-function FGFR1 mutations cause hypogonadotropic hypogonadism 32. The T allele of rs881299, associated with lower FEV1/FVC and higher FVC, was strongly associated with higher testosterone (particularly in males) and higher sex-hormone-binding globulin (SHBG), lower body-mass index (BMI) as well as lower levels of alanine transaminase and urate (Supplementary Fig. 13w-y

rs72681869 also showed association with SHBG; in both sexes, the G allele, associated with lower FVC and lower FEV1, was associated with lower SHBG, higher alanine aminotransferase (ALT) and aspartate aminotransferase (AST), higher fat mass, HbA1c and higher systolic and diastolic blood pressure, higher urate and creatinine, and in males lower testosterone and reduced inguinal hernia risk (Supplementary Fig. 13z-bb). Mutations in SOS2 have been reported in individuals with Noonan syndrome. The A allele of rs7514261 implicating CFH, associated with lower FVC, was strongly associated with reduced risk of macular degeneration 33 as well as raised albumin (Supplementary Fig. 13g).
CACNA1S is one of several putative causal genes encoding calcium voltage-gated channel subunits in skeletal muscle (CACNA1S, CACNA1D and CACNA2D3 supported by ≥2 criteria; CACNA1C was supported by PoPS). CACNA1S mutations have been reported in hypokalemic periodic paralysis 34 and malignant hyperthermia 35. CACNA1S is strongly expressed in skeletal muscle but at much lower levels in airway smooth muscle. The common CACNA1S missense variant rs3850625 (A allele, frequency of 11.8% in EUR and 21.4% in SAS) was associated with lower FVC, lower FEV1, lower whole body fat-free mass, reduced hand grip strength as well as lower AST and creatinine levels (Supplementary Fig. 13f). CACNA1S and CACNA1D are targeted by dihydropyridine calcium channel blockers, which previously produced small improvements in lung function in asthma 36. For the low-frequency missense ADRB2 variant rs1800888 (T; 1.49% in EUR), associated with lower FEV1 and lower FEV1/FVC, the strongest PheWAS association was with increased eosinophil count (Supplementary Fig. 13d).

Druggable targets
Using the Drug Gene Interaction Database, we surveyed 559 genes supported by ≥2 criteria. ChEMBL interactions identified 292 drugs mapping to 55 genes (Supplementary Table 17), including ITGA2, which encodes integrin subunit alpha 2. The reduced expression of ITGA2 in lung tissue associated with the C allele of rs12522114 mimics vatelizumab-induced ITGA2 inhibition; this allele is associated with higher FEV1 and FEV1/FVC, indicating the potential to repurpose vatelizumab, which increases T regulatory cell populations 37, for COPD treatment.

Pathway analysis
Using ConsensusPathDB 38, we tested biological pathway enrichment for 559 causal genes supported by ≥2 criteria, highlighting pathways relevant for development, tissue integrity and remodeling (Supplementary Table 18). These include pathways not previously implicated in pathway enrichment analyses for lung function (such as PI3K-Akt signaling, integrin pathways, endochondral ossification, calcium signaling, hypertrophic cardiomyopathy and dilated cardiomyopathy) as well as those previously implicated via individual genes 5, such as TNF signaling, actin cytoskeleton, AGE-RAGE signaling, Hedgehog signaling and cancers. We found strengthened enrichment through newly identified genes in previously described pathways, such as extracellular matrix organization (34 new genes), elastic fiber formation (eight new genes) and TGF-Core (four new genes). Consistent with our ConsensusPathDB findings, Ingenuity Pathway Analysis (https://digitalinsights.qiagen.com/IPA) 39 highlighted enrichment of cardiac hypertrophy signaling and osteoarthritis pathways and also implicated pulmonary and hepatic fibrosis signaling pathways, axonal guidance and PTEN signaling as well as the upstream regulators TGFB1 and IGF-1 (Supplementary Table 19).

Multi-ancestry GRS for FEV1/FVC and COPD
We built multi-ancestry and ancestry-specific GRSs weighted by FEV1/FVC effect sizes and tested association with FEV1/FVC and COPD (GOLD stage 2-4) within groups of individuals of different ancestries in the UK Biobank (Methods). Our new GRS improved lung-function and  20), and the multi-ancestry GRS outperformed the ancestry-specific GRS in all UK Biobank ancestries. We then tested the multi-ancestry GRS in five independent COPD case-control studies (Supplementary Table 21 and Methods). Stronger COPD susceptibility associations were observed across five EUR-ancestry studies compared with a previous GRS 5 (Fig. 3c and Supplementary Table 22). In the meta-analysis of these EUR studies, the odds ratio for COPD per s.d. of GRS increase was 1.63 (95% confidence interval (CI), 1.56-1.71; P = 7.1 × 10^−93); members of the highest GRS decile had a 5.16-fold higher COPD risk than the lowest decile (95% CI, 4.14-6.42; P = 1.0 × 10^−48; Fig. 3d and Supplementary Table 23). The results for individuals in the SPIROMICS study of AFR ancestry were comparable to individuals from the UK Biobank with AFR ancestry but lower in magnitude compared with the COPDGene AFR population (Fig. 3c).
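To make the construction of a weighted GRS concrete, the sketch below computes each individual's score as the sum of risk-allele dosages multiplied by per-variant weights, then standardizes the scores so that associations can be expressed per s.d. of GRS. The function names, weights and dosages are hypothetical toy values for illustration and are not taken from the study; the study's actual pipeline is described in its Methods.

```python
from statistics import mean, pstdev

def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Rescale scores to mean 0 and s.d. 1 (population s.d.)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical toy data: 3 variants, 4 individuals.
weights = [0.02, 0.05, 0.01]   # illustrative effect sizes for the FEV1/FVC-lowering alleles
cohort = [[0, 1, 2], [1, 1, 0], [2, 2, 1], [0, 0, 0]]
raw = [genetic_risk_score(person, weights) for person in cohort]
z = standardize(raw)
print(raw, z)
```

Standardized scores such as `z` would then enter a regression model for FEV1/FVC or COPD status alongside covariates.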

PheWAS of trait-specific GRSs
To study the aggregate effects of lung-function-associated genetic variants on a wide range of diseases and disease-relevant traits, we created GRSs for FEV1, FVC, FEV1/FVC and PEF, each comprising sentinel variants (P < 5 × 10^−9) with weights estimated from the multi-ancestry meta-regression (Methods), and tested these in PheWAS. These GRS values showed distinct patterns of associations with respiratory and non-respiratory phenotypes (Supplementary Fig. 14 and Supplementary Table 24). A GRS for lower FEV1 was most strongly associated with increased risk of asthma and COPD, family history of chronic bronchitis/emphysema, lower hand grip strength, increased fat mass, increased HbA1c and type 2 diabetes risk, and elevated C-reactive protein. In addition, associations were observed with increased asthma exacerbations and lower age of onset for COPD (Supplementary Fig. 14a). The GRS for lower FEV1/FVC was associated with key respiratory phenotypes: increased risk of COPD and asthma, family history of chronic bronchitis/emphysema, increased emphysema risk, increased risk of respiratory insufficiency or respiratory failure and younger age of onset for COPD but a slightly lower risk of COPD exacerbations (Supplementary Fig. 14b). In contrast, the GRS for lower FVC was strongly associated with many traits; among the strongest associations were high C-reactive protein, increased fat mass, raised HbA1c and type 2 diabetes, raised systolic blood pressure, lower hand grip strength and raised ALT as well as increased risk of clinical codes for asthma and COPD (Supplementary Fig. 14c). Although the GRS for lower FEV1/FVC was associated with increased standing and sitting height, the GRSs for lower FEV1 and FVC were associated with increased standing height but reduced sitting height. Broadly similar phenome-wide associations were seen for the PEF and the FEV1 GRS (Supplementary Fig. 14d).

PheWAS of GRSs partitioned by pathway
Finally, we hypothesized that partitioning our lung-function GRS into pathway-specific GRSs according to the biological pathways the variants influence could inform understanding of mechanisms underlying impaired lung function, and the probable consequences of perturbing specific pathways. Informed by the above prioritization of putative causal genes and classification of these genes by pathway ('Pathway analysis' section), we undertook PheWAS for FEV1/FVC-weighted GRSs partitioned by each of the 29 pathways enriched (FDR < 10^−5) for the 559 genes implicated by ≥2 criteria (Methods). Partitioning of GRSs in this way highlighted markedly different patterns of phenome-wide associations (Supplementary Fig. 15 and Supplementary Table 25).

association with COPD clinical codes and family history of chronic bronchitis/emphysema, although the associations with other traits varied. The GRS for lower FEV1/FVC specific to elastic fiber formation was associated with increased risk of inguinal, abdominal, diaphragmatic and femoral hernia; diverticulosis; arthropathies; hallux valgus as well as genital prolapse; reduced carpal tunnel syndrome risk and BMI; and increased asthma risk (Fig. 4). In contrast, the GRS for lower FEV1/FVC specific to PI3K-Akt signaling was associated with increased asthma risk, lower IGF-1, lower liver enzymes (ALT, AST and gamma glutamyltransferase (GGT)), lower lymphocyte counts, raised eosinophils, lower fat mass and BMI, and reduced diabetes risk (Fig. 5). The GRS for lower FEV1/FVC specific to the hypertrophic cardiomyopathy pathway was associated with reduced liver enzymes (ALT and GGT) as well as lower apolipoprotein B, LDL, IGF-1 and mean platelet volume (Fig. 6). The GRS associations for lower FEV1/FVC partitioned to signal transduction were specific to respiratory traits, including asthma and emphysema (Fig. 7). Variable height associations were evident: the GRS for lower FEV1/FVC showed association with increased height when partitioned to elastic fiber formation or hypertrophic cardiomyopathy (Figs. 4 and 6), reduced height when partitioned to ESC pluripotency (Supplementary Fig. 15g) and no height association when partitioned to PI3K-Akt signaling or signal transduction (Figs. 5 and 7). We hypothesized that individuals may have high GRS for ≥1 pathways and low GRS for other pathways. Comparisons of the GRSs of individuals across pairs of pathways for each of the 29 pathways (Supplementary Fig. 16a) and in detail for the elastic fiber, PI3K-Akt signaling, hypertrophic cardiomyopathy and signal transduction pathways (Supplementary Fig. 16b) demonstrated how GRS profiles may be concordant or discordant across pathways, which could have implications for the choice of therapy.

Discussion
We present a large, ancestrally diverse lung-function GWAS and a comprehensive initiative to relate lung-function- and COPD-associated variants to functional annotations, cell types, genes and pathways. It is the first to investigate possible consequences of intervening in relevant pathways through PheWAS using pathway-partitioned GRS.
The 1,020 signals identified were enriched in functionally active regions in alveolar type 1 cells, fibroblasts, myofibroblasts, bronchial epithelial cells, and adult and fetal lung. We showed effect heterogeneity attributable to ancestry for 109 signals (including LTBP4, THSD4, EFEMP1 and MECOM), between ever-smokers and never-smokers (HTR4), and differences in effects between adults and children (including CCDC91 and NPNT). We mapped lung-function signals to 559 putatively causal genes meeting ≥2 independent criteria. Exemplar genes supported by ≥4 criteria or by deleterious or rare putative causal missense variants implicated surfactant-phospholipid metabolism, smooth-muscle function, epithelial morphology and barrier function, innate immunity, calcium signaling, adrenoceptor signaling, and lung development and regeneration. Among the pathways enriched for putative causal genes were PI3K-Akt signaling, integrin pathways, endochondral ossification, calcium signaling, hypertrophic cardiomyopathy and dilated cardiomyopathy. These pathways have not been previously implicated in lung function using GWAS.
Combined as a GRS weighted by FEV1/FVC effect size, the 1,020 variants strongly predicted COPD in the UK Biobank and in COPD case-control studies, with a more than fivefold change in risk between the highest and lowest GRS deciles. This GRS more strongly predicted FEV1/FVC and COPD across all ancestries than a previous GRS 5. Partitioning the FEV1/FVC GRS by the pathways defined by specific variants, informed by detailed, systematic variant-to-gene mapping and pathway analyses, and using our new Deep-PheWAS platform 40, illustrated unique patterns of phenotype associations for each pathway GRS. These patterns of PheWAS findings are relevant to the potential efficacy and side effects of intervention in these pathways. As a proof-of-concept, the GRS associated with lower FEV1/FVC specific to PI3K-Akt signaling was associated with increased risk of COPD but a lower risk of diabetes; PI3K inhibition impairs glucose uptake in muscle and increases hepatic gluconeogenesis, contributing to glucose intolerance and diabetes 41. The PheWAS and druggability analyses we conducted have the potential to identify drug repurposing opportunities for COPD.
The patterns of pleiotropy we show through PheWAS for individual variants, trait-specific GRS and pathway-partitioned GRS may help explain variants and pathways that increase susceptibility to more than one disease and thereby predispose to particular patterns of multimorbidity. For example, the elastic fiber pathway GRS was associated with increased risk of muscular (for example, hernias) and musculoskeletal conditions related to connective-tissue laxity. Our findings also further inform the complex relationship between height, BMI and obesity, and lung function and their genetic determinants 5,42. Lung-function and height associations were uncorrelated, and height relationships differed between GRS for different lung-function traits, and even between sitting and standing height for the same trait. The pathway-partitioned GRS analyses indicate that the relationship between genetic variants, height and lung-function traits depends on the pathways through which the variants act.
The last comprehensive attempt to map lung-function-associated variants to genes identified 107 putative causal genes, mostly through eQTLs only, and only eight genes were then implicated by ≥2 criteria 5. In contrast, we implicated 559 causal genes meeting ≥2 criteria by drawing on new data and methodologies, such as single-cell epigenome data, rare variant associations identified in sequencing data in the UK Biobank and the similarity-based approach PoPS 9. Nevertheless, our study has limitations. We focused on multi-ancestry rather than ancestry-specific signals, as the sample sizes for lung-function genomics studies in all non-EUR ancestry groups were limited, particularly for the AFR ancestries 4. Non-EUR ancestries are under-represented in genomic studies 3, constraining GWAS and PheWAS in these populations. Correcting this will require substantial global investment in suitably phenotyped and genotyped studies, with appropriate community participation and workforce development. Improved sample sizes across all ancestries would improve power in ancestry-specific studies 42 and fine mapping of multi-ancestry meta-analysis signals.
Strategies for in silico mapping of association signals to causal genes are evolving and difficult to evaluate without a reference set of fully functionally characterized lung-function-associated variants and causal genes. Our variant-to-gene mapping framework parallels one that was recently adopted 10 and could help prioritization of genes for functional experiments such as gene editing in relevant organoids with appropriate readouts to confirm mechanism. An additional limitation is that classifications of pathways may be imperfect; we used multiple pathway classifications as it is unclear which is superior across all component pathways and we present the pathway-partitioned PheWAS results as a resource to others.
In summary, our multi-ancestry study highlights new putative causal variants, genes and pathways, some of which are targeted by existing drug compounds. These findings bring us closer to understanding mechanisms underlying lung function and COPD and will inform functional genomics experiments to confirm mechanisms and consequently guide the development of therapies for impaired lung function and COPD.

GWAS in each cohort
Following cohort-level quality control of the lung-function phenotypes (Supplementary Note), all phenotypes were rank inverse-normal transformed after adjustment for age, sex, height, smoking, ancestry principal components and relatedness (mixed models in BOLT-LMM or SAIGE). Quality control of the imputation and association summary statistics in each cohort was performed by the central analysis team (Supplementary Note). We assigned each cohort to one of the five 1000 Genomes super-populations (EUR, AFR, AMR, EAS or SAS) based on self-reported ancestry, apart from the UK Biobank (57.4% of the total sample size), where we used ADMIXTURE v1.3.0 (ref. 43) to determine ancestry (Supplementary Note and Supplementary Table 4). We also acquired lung-function-association results from each cohort using untransformed phenotypes for analysis using MR-MEGA.

Meta-analysis
Before meta-analysis, association statistics in each cohort were adjusted by the LD-score regression intercept calculated in each cohort to adjust for any residual confounding (Supplementary Table 5); the appropriate ancestry-specific LD reference was used for each cohort (10,000 UK Biobank samples for EUR and 1000 Genomes Project samples for AFR, AMR, SAS and EAS). Before meta-analysis, variants with imputation INFO < 0.5 or minor-allele counts (MAC) < 3 were excluded. As transformed effects were not on comparable scales, we meta-analyzed across cohorts using sample-size weighted Z-score meta-analysis with METAL (released version 28 August 2018) 44. No genomic control was applied post meta-analysis. Following meta-analysis, variants with MAC < 20 were excluded.
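The two numerical steps in this paragraph can be sketched as follows: deflating each cohort's test statistic by its LD-score regression intercept, then combining cohorts with weights proportional to the square root of sample size, which is the usual form of a sample-size weighted Z-score scheme. The function names are ours, and the choice to apply the intercept correction only when the intercept exceeds 1 is an assumption rather than something stated in the text.

```python
import math

def intercept_corrected_z(z: float, ldsc_intercept: float) -> float:
    """Deflate a Z-score by sqrt(intercept), i.e. divide chi-square by the intercept.
    Assumption: intercepts below 1 are treated as 1 (no inflation of statistics)."""
    return z / math.sqrt(max(ldsc_intercept, 1.0))

def sample_size_weighted_z(z_scores, sample_sizes) -> float:
    """Combine per-cohort Z-scores with weights w_i = sqrt(N_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical example: three cohorts with aligned effect alleles.
cohort_z = [intercept_corrected_z(z, ic) for z, ic in [(3.1, 1.04), (2.2, 1.00), (1.5, 0.98)]]
meta_z = sample_size_weighted_z(cohort_z, [320_000, 50_000, 8_000])
print(meta_z, two_sided_p(meta_z))
```

Because the Z-scores carry direction, every cohort's effect must be aligned to the same allele before combination.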

Signal selection and conditional analysis
We chose a genome-wide significance threshold of P < 5 × 10^−9, as recommended from sequencing studies 13. We selected 2-Mb regions centered on the most significant variant for all regions containing a variant with P < 5 × 10^−9. Regions within 500 kb of each other were merged for conditional analysis. Stepwise conditional analysis was run in each region in each cohort using GCTA v1.93.2beta 45 with an ancestry-specific LD reference for each cohort (Supplementary Note), and then the conditional results were meta-analyzed across cohorts and any new conditionally independent signals with P < 5 × 10^−9 were added to our list of signals. We used moloc v0.1.0 (ref. 46) to co-localize signals across the four lung-function traits to obtain a set of distinct signals, which were then co-localized with previously reported signals to obtain a set of novel lung-function signals (Supplementary Note).
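The region definition can be sketched as a simple interval merge. The code below is a simplified illustration for one chromosome: it takes the positions of lead variants as given (the iterative selection of the most significant variant is not reproduced), builds 2-Mb windows around them and merges windows separated by 500 kb or less. The function and parameter names are ours.

```python
def merge_regions(sentinel_positions, half_width=1_000_000, merge_gap=500_000):
    """Build 2-Mb windows centered on each lead variant (base-pair positions on
    one chromosome) and merge windows whose gap is at most merge_gap."""
    intervals = sorted((max(0, p - half_width), p + half_width)
                       for p in sentinel_positions)
    merged = []
    for start, end in intervals:
        if merged and start - merged[-1][1] <= merge_gap:
            # Close enough to (or overlapping) the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical lead-variant positions on a single chromosome.
regions = merge_regions([5_000_000, 7_400_000, 20_000_000])
print(regions)
```

Each merged region would then be passed to the stepwise conditional analysis as a single unit.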

Exclusion of smoking signals from follow-up
We checked our sentinels for association with the smoking quantitative traits 'age of initiation' (n = 262,990) and 'cigarettes per day' (n = 263,954), and the binary traits 'smoking cessation' (n = 139,453 cases and n = 407,766 controls) and 'smoking initiation' (n = 557,337 cases and n = 674,754 controls) in the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) (ref. 47); for sentinels not present in GSCAN, proxies with a squared correlation coefficient (r²) > 0.8 were checked. We excluded eight lung-function signals from further analysis, which we determined to be primarily driven by smoking behavior (Supplementary Table 26), according to the following criteria: (1) P < 4.86 × 10⁻⁵ (Bonferroni-corrected 5% threshold for 1,028 signals) for association with any smoking trait and (2) the same 'risk' allele both increases smoking exposure behavior and decreases lung function.

Heritability estimate
We calculated the proportion of variance explained by the sentinels reported for each trait using the formula

$$\frac{\sum_{i=1}^{n} 2 f_i (1 - f_i) \beta_i^2}{V}$$

where n is the number of variants, f_i and β_i are the frequency and effect estimates of the ith variant from the UK Biobank European ancestry untransformed results, respectively, and V is the phenotypic variance (always one as our phenotypes were inverse-normal transformed). We assumed a heritability of 40% (refs. 48,49) to estimate the proportion of additive polygenic variance.
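As a worked illustration, the sketch below computes the variance explained by a set of variants, assuming the standard expression in which each variant contributes 2f(1 − f)β² and the sum is divided by the phenotypic variance V. The function names and the example numbers are hypothetical; the 40% heritability is the value assumed in the text.

```python
def variance_explained(freqs, betas, phenotypic_variance=1.0):
    """Sum of 2 * f * (1 - f) * beta^2 over variants, divided by V."""
    total = sum(2 * f * (1 - f) * b * b for f, b in zip(freqs, betas))
    return total / phenotypic_variance


def share_of_heritability(freqs, betas, heritability=0.40, phenotypic_variance=1.0):
    """Fraction of the assumed additive polygenic variance captured by the variants."""
    return variance_explained(freqs, betas, phenotypic_variance) / heritability
```

For example, a single variant with allele frequency 0.5 and effect 0.1 s.d. explains 0.5% of the phenotypic variance, which corresponds to 1.25% of an assumed 40% heritability.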

Ancestry-adjusted trans-ethnic meta-analysis using MR-MEGA
To improve the fine-mapping resolution using LD differences between ancestries and to estimate the heterogeneity of variant associations attributable to ancestry, we undertook multi-ancestry meta-regression using MR-MEGA v0.2 (ref. 7), which incorporates axes of genetic ancestry as covariates. MR-MEGA uses multidimensional scaling of allele frequencies across cohorts to derive principal axes of genetic variation to use for ancestry adjustment (Supplementary Note). The location of the cohorts on the first two multidimensional scaling-derived principal components, plotted in Supplementary Fig. 17, shows clustering in accordance with the assigned ancestry groups. We used four principal components for ancestry adjustment, as this captured most of the variance. MR-MEGA implements genomic control at study level; therefore, no further genomic control was applied. We ran MR-MEGA at each locus containing ≥1 signal; in the loci with multiple signals, we ran MR-MEGA multiple times, each time conditioning on all except one signal at the locus. For each sentinel, we obtained an estimated ancestry-associated (P-value_ancestry_het) and residual (P-value_residual_het) heterogeneity. In addition, MR-MEGA reports the log-transformed Bayes factor, which can be used for the construction of credible sets.
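To show how log Bayes factors translate into a credible set, the sketch below converts them to posterior probabilities and accumulates variants until the requested coverage is reached. It assumes natural-log Bayes factors and an equal prior for every variant in the region, which is a simplification relative to the annotation-informed sets described later in the Methods; the function name and variant identifiers are hypothetical.

```python
import math


def credible_set(log_bfs, coverage=0.95):
    """log_bfs: dict of variant id -> natural-log Bayes factor.

    Returns (variant, posterior) pairs, ordered by decreasing posterior,
    whose cumulative posterior probability first reaches the coverage.
    """
    max_lbf = max(log_bfs.values())
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = {v: math.exp(lbf - max_lbf) for v, lbf in log_bfs.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, v) for v, w in weights.items()), reverse=True)
    chosen, cumulative = [], 0.0
    for posterior, variant in ranked:
        chosen.append((variant, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return chosen
```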

Effects in children
To obtain unbiased effect estimates for comparison between adults and children, we first redefined 1,077 lead SNPs for lung function in the UK Biobank EUR population (n = 320,656) by selecting 1-Mb regions centered on the most significant variant for regions containing a variant with P < 5 × 10⁻⁸. For these SNPs, we then took the untransformed effect estimates from the meta-analysis of the non-UK Biobank EUR cohorts (34 cohorts for FEV₁ and FVC, n = 128,071; 33 cohorts for FEV₁/FVC, n = 123,429; 15 cohorts for PEF, n = 60,122). Next, we meta-analyzed two EUR-ancestry children's cohorts, ALSPAC and the Raine Study (age 13-15 yr, n = 6,070), to obtain effect estimates in children at the new lead SNPs. To investigate the age-dependent effects of genetic variants on lung function, we compared the effect sizes estimated in adults and children using Welch's t-test; a Bonferroni significance threshold for 1,077 tests was applied (P < 4.64 × 10⁻⁵).

Cell-type and functional specificity
Stratified LD-score regression. We tested for enrichment of regulatory features at variants overlapping four histone marks (H3K27ac, H3K9ac, H3K4me3 and H3K4me1) that are specific to adult lung, fetal lung, and peripheral blood mononuclear primary and smooth-muscle-containing cell lines (colon and stomach) using stratified LD-score regression (ref. 12). We only considered the EUR-specific meta-analysis with 39 cohorts for FVC, FEV₁ and FEV₁/FVC (17 cohorts for PEF). For the analysis of cell-type-specific annotations, we assessed statistical significance at the 0.05 level after Bonferroni correction for 60 hypotheses tested. Given that these annotations are not independent, a Bonferroni correction is conservative. We also report results with FDR < 0.05 using the Benjamini-Hochberg method.
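The two multiple-testing rules mentioned above can be contrasted with a short sketch. Both functions are generic textbook implementations written for illustration, not code from the analysis pipeline, and the function names are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag P values below alpha divided by the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure; returns one rejection flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose P value is at or below its BH threshold.
        if p_values[idx] <= rank * fdr / m:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected
```

With five hypothetical P values of 0.001, 0.012, 0.025, 0.039 and 0.6, the Bonferroni rule retains only the first, whereas the Benjamini-Hochberg rule retains the first four, which illustrates why the Bonferroni correction is described as conservative.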
Regulatory and functional enrichment using GARFIELD. We tested enrichment of SNPs at functionally annotated regions (DNase I hypersensitivity hotspots, open chromatin peaks, transcription-factor footprints and formaldehyde-assisted isolation of regulatory elements, histone modifications, chromatin segmentation states, genic annotations and transcription-factor binding sites) using GARFIELD (ref. 17). We used the EUR meta-analysis with 17 cohorts for PEF and 39 cohorts for FVC, FEV₁ and FEV₁/FVC. We applied GARFIELD to DNase I hypersensitivity hotspot annotation in 424 cell lines and primary cell types from ENCODE and Roadmap Epigenomics and derived enrichment estimates at trait-genotype association P-value thresholds of P < 5 × 10⁻⁵ and P < 5 × 10⁻⁹.
Nearby Mendelian respiratory-disease genes. We selected rare Mendelian-disease genes from ORPHANET (https://www.orpha.net/) within ±500 kb of a lung-function sentinel that were associated with respiratory terms matching a regular expression built from the following terms: respir, lung, pulm, asthma, COPD, pneum, eosin, immunodef, cili, autoimm, leukopenia, neutropenia and Alagille syndrome. We implicated the gene if it had a corresponding respiratory term match in the disease name or if the term occurs frequently in human phenotype ontology terms for that disease (Supplementary Note).
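A pattern built from the listed terms might look like the sketch below. The terms are taken from the text; combining them as simple alternatives and matching case-insensitively are assumptions, as is the function name.

```python
import re

# Terms listed in the Methods, joined as alternatives (assumed construction).
RESPIRATORY_TERMS = re.compile(
    r"respir|lung|pulm|asthma|COPD|pneum|eosin|immunodef|cili|autoimm|"
    r"leukopenia|neutropenia|Alagille syndrome",
    re.IGNORECASE,
)


def has_respiratory_term(disease_name):
    """Return True if the disease name contains any of the respiratory terms."""
    return RESPIRATORY_TERMS.search(disease_name) is not None
```

Substring matching of this kind is deliberately broad (for example, 'cili' matches 'ciliary'), so matches would still need the manual checks described in the Supplementary Note.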
Nearby mouse-knockout orthologs with a respiratory phenotype. We selected human orthologs of mouse-knockout genes with phenotypes in the 'respiratory' category, as listed in the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/), within ±500 kb of a lung-function sentinel (Supplementary Note).
PoPS. We calculated a gene-level PoPS (ref. 9) based on the assumption that if the associations enriched in genes share functional characteristics with a gene near to a lung-function signal, then that gene is more likely to be causal. The full set of gene features used in the analysis included 57,543 total features: 40,546 derived from gene expression data, 8,718 extracted from a protein-protein interaction network and 8,479 based on pathway membership. In this study we prioritized genes for all autosomal lung-function signals within a 500-kb (±250 kb) window of the sentinel and reported the top prioritized genes in the region. For the signals that did not have prioritized genes within the 500-kb window, we looked for prioritized genes using a 1-Mb (±500 kb) window (Supplementary Note).
Annotation-informed credible sets. We used the enriched annotations in respiratory-relevant cell types and tissues and enriched genic annotations (Supplementary Table 12) to create annotation-informed 95% credible sets using fGWAS based on the MR-MEGA ancestry-adjusted meta-regression results (Supplementary Note). We implicated a putative causal missense variant if it accounted for >50% of the posterior probability in the credible set and annotated these using Ensembl Variant Effect Predictor (ref. 60) to check for a deleterious effect by the SIFT, PolyPhen or CADD metrics.
Allocation of genes prioritized with ≥3 variant-to-gene criteria to lung-function biology categories. We allocated prioritized genes with ≥3 criteria to different lung-function roles (epithelial, inflammatory, peripheral lung (including alveolus and endothelial), lung remodeling (including connective tissue), chest-wall movement and lung development) based on literature reviews, including GeneCards (https://www.genecards.org) and PubMed (https://pubmed.ncbi.nlm.nih.gov). Eighteen of the genes were difficult to assign to a specific category on this basis, mainly because they were involved in generic processes such as transcriptional control in a wide variety of cell types; these are not shown in Supplementary Fig. 12 but are included in Supplementary Table 13.

Interaction with smoking
Association testing for lung-function traits (FEV₁, FVC, FEV₁/FVC and PEF) was performed separately in ever- and never-smoker subgroups and meta-analyzed across EUR-ancestry cohorts. We included untransformed phenotypes with ever- and never-smoking summary statistics (n = 28 cohorts) comprising 206,162 ever-smokers and 229,046 never-smokers. A z-test was used to compare the genetic effect between the untransformed association results for the ever- and never-smokers:

$$z = \frac{\beta_{\mathrm{ever}} - \beta_{\mathrm{never}}}{\sqrt{\mathrm{se}_{\mathrm{ever}}^{2} + \mathrm{se}_{\mathrm{never}}^{2}}}$$

where se is the standard error of the effect β. We considered any signal with P < 4.9 × 10⁻⁵ (5% Bonferroni-corrected for 1,020 signals tested) to show a significant interaction.
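The comparison can be sketched as below, assuming the usual two-sample z-test in which the difference between the ever- and never-smoker effect estimates is divided by the square root of the sum of their squared standard errors. The function names and the example values are hypothetical; the threshold is the Bonferroni-corrected value for 1,020 signals given in the text.

```python
import math
from statistics import NormalDist

BONFERRONI_THRESHOLD = 0.05 / 1020  # approximately 4.9e-5


def interaction_z(beta_ever, se_ever, beta_never, se_never):
    """z statistic for the difference between two independent effect estimates."""
    return (beta_ever - beta_never) / math.sqrt(se_ever ** 2 + se_never ** 2)


def interaction_p(z):
    """Two-sided P value under the standard normal distribution."""
    return 2 * NormalDist().cdf(-abs(z))


def is_significant_interaction(beta_ever, se_ever, beta_never, se_never):
    z = interaction_z(beta_ever, se_ever, beta_never, se_never)
    return interaction_p(z) < BONFERRONI_THRESHOLD
```

The test treats the two subgroup estimates as independent, which is reasonable here because ever- and never-smokers are non-overlapping sets of individuals.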

GRS
We selected four ancestry groups in the UK Biobank (UKB) as test datasets (SAS was excluded from GRS analyses because UKB SAS was the only cohort in the multi-ancestry analysis for SAS): UKB EUR, UKB AMR, UKB EAS and UKB AFR. All of the other cohorts except UKB SAS and Qatar Biobank were used as discovery datasets.
We repeated the multi-ancestry meta-regression (MR-MEGA), after excluding the four test GWAS, incorporating the same four axes of genetic variation as covariates to account for ancestry. Autosomal signals for each lung-function trait that were reported in the target ancestry population were included in downstream analysis for each ancestry. For ancestry j (j = EUR, AMR, EAS or AFR), we estimated ancestry-specific predicted allelic effects for the ith SNP to be used as weights in the multi-ancestry GRS by

$$\hat{\beta}_{ij} = \alpha_{0i} + \sum_{k=1}^{4} \alpha_{ki} x_{kj}$$

where x_kj is the averaged position of discovery studies with ancestry j on the kth axis of genetic variation from multi-ancestry meta-regression, and α_0i and α_ki denote the intercept and effect of the kth axis of genetic variation for the ith SNP from the multi-ancestry meta-regression.
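The weight calculation can be illustrated with a small sketch, assuming the predicted allelic effect is the meta-regression intercept plus the sum, over axes of genetic variation, of each axis effect multiplied by the ancestry's averaged position on that axis. Function names and example numbers are hypothetical.

```python
def mean_axis_positions(study_positions):
    """Average the axis coordinates of the discovery studies of one ancestry.

    study_positions: list of per-study coordinate lists, one value per axis.
    """
    n_studies = len(study_positions)
    return [sum(axis_values) / n_studies for axis_values in zip(*study_positions)]


def predicted_effect(intercept, axis_effects, axis_positions):
    """Intercept plus the sum of axis effect times averaged axis position."""
    if len(axis_effects) != len(axis_positions):
        raise ValueError("one effect is required per axis of genetic variation")
    return intercept + sum(a * x for a, x in zip(axis_effects, axis_positions))
```

In the analysis described here there are four axes, so each SNP contributes one intercept and four axis effects, and each ancestry contributes four averaged positions.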
We ran each of the ancestry-specific fixed-effect meta-analyses, after excluding the test GWAS from the ancestry group, in METAL with the inverse-variance weighting method. For comparison, SNPs used as weights in the multi-ancestry GRS were selected to build an ancestry-specific GRS for each ancestry.
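For reference, inverse-variance weighted fixed-effect meta-analysis combines per-study estimates as in the generic sketch below; it is a textbook implementation written for illustration rather than METAL's own code, and the function name is hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect pooled estimate and standard error with weights 1 / se^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se
```

Unlike the sample-size weighted scheme, this approach requires effect estimates on a common scale, which is why it is applied to untransformed or otherwise comparable estimates.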
Testing GRS in independent COPD case-control cohorts. We tested the association of the multi-ancestry GRS with COPD susceptibility in five EUR-ancestry COPD case-control studies: COPDGene (non-Hispanic white), ECLIPSE, GenKOLS, NETT/NAS and SPIROMICS (non-Hispanic EUR) (Supplementary Table 21). We also tested the association in two AFR-ancestry COPD case-control studies: COPDGene (African American) and SPIROMICS (African American) (Supplementary Table 21). Associations were tested using logistic regression models, adjusted for age, age squared, sex, height and principal components. In each COPD case-control study, we divided individuals into deciles according to their weighted GRS. For each decile, logistic models were fitted to compare the risk of COPD for members of the test decile with those in the lowest decile (that is, those with the lowest genetic risk). The results were meta-analyzed by ancestry-specific study groups using the fixed-effect model.
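The decile comparison can be illustrated with a simplified sketch. Note that it substitutes an unadjusted odds ratio from a 2 × 2 table for the covariate-adjusted logistic regression used in the analysis, so it shows only the structure of the comparison; the function names are hypothetical.

```python
def assign_deciles(scores):
    """Assign each individual a decile (1 = lowest GRS, 10 = highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def odds_ratio_vs_lowest(deciles, is_case, test_decile):
    """Unadjusted odds ratio of case status: test decile versus decile 1.

    Raises ZeroDivisionError if the test decile has no controls or the
    lowest decile has no cases.
    """
    def counts(decile):
        cases = sum(1 for d, c in zip(deciles, is_case) if d == decile and c)
        controls = sum(1 for d, c in zip(deciles, is_case) if d == decile and not c)
        return cases, controls

    test_cases, test_controls = counts(test_decile)
    ref_cases, ref_controls = counts(1)
    return (test_cases * ref_controls) / (test_controls * ref_cases)
```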

PheWAS
We used Deep-PheWAS (ref. 40), which addresses both phenotype matrix generation and efficient association testing while incorporating the following developments that are not yet available in current platforms and online resources: (1) clinically curated composite phenotypes for selected health conditions that integrate different data types (including primary and secondary care data) to study phenotypes that are not well captured by current classification trees; (2) integration of quantitative phenotypes from primary care data, such as pathology records and clinical measures; (3) clinically curated phenotype selection for traits that are extremely highly correlated and (4) GRSs. The platform includes 2,421 phenotypes in the UK Biobank, with a subset of 2,243 recommended for association testing, because some phenotypes that are generated are used solely in the definition of other phenotypes. We removed the four measures of lung function and added seven phenotypes defined in-house (P4002-6) to give 2,246 as our final maximum number of phenotypes for association. Deep-PheWAS then filters these, requiring a minimum case number; we chose to keep the default settings of a 50-case minimum for binary phenotypes and a 100-case minimum for quantitative phenotypes. After limiting to EUR ancestry and filtering for case numbers, 1,909 phenotypes were left for association analysis (Supplementary Table 27). No additional phenotypes were removed when removing pairs related up to second degree (KING kinship coefficient ≥ 0.0884).
There are five types of phenotypes within Deep-PheWAS, categorized according to the data and methods used to create them. Composite phenotypes are made using linked hospital and primary care data, including in some cases primary care prescription data, alongside any of the UK Biobank field-IDs (DFP), including self-reported non-cancer diagnosis and self-reported operations. Phecodes are defined using only linked hospital data (https://phewascatalog.org/phecodes_icd10). Formula phenotypes combine available data using bespoke R code per phenotype rather than the in-built functions of phenotype development available in Deep-PheWAS. Added phenotypes are lists of cases and controls that have been added to the PheWAS and not developed by the Deep-PheWAS phenotype matrix generation pipeline. More complete definitions for all non-added phenotypes can be found in the Deep-PheWAS description (ref. 40). All phenotypes were adjusted for age, sex and the first ten principal components.
Single-variant PheWAS. We ran 28 single-variant PheWAS across 1,909 traits (Supplementary Table 27) in up to 430,402 unrelated EUR individuals in the UK Biobank. We selected the variant with the most significant P value for each of the 20 genes with ≥4 lines of evidence for being causal (Supplementary Table 13). A further seven variants were included in single-variant PheWAS that were putatively causal (accounted for >50% posterior probability in the credible set and had a deleterious annotation; Supplementary Table 14) but in a gene that was implicated by fewer than four lines of evidence. The single-variant PheWAS was aligned to the lung-function-trait decreasing allele. Where we noted associations with testosterone and SHBG, we also undertook sex-stratified PheWAS.
Association with trait-specific GRS. We created four GRSs for the UK Biobank EUR samples, one for each trait (FEV₁, FVC, FEV₁/FVC and PEF), including all conditionally independent sentinel variants for the trait that were associated with P < 5 × 10⁻⁹, yielding 425, 372, 442 and 194 variants in each trait-specific GRS, respectively. Each of the four GRSs was weighted by the effect sizes from the multi-ancestry meta-regression for the relevant trait and then checked for association with 1,909 traits in the PheWAS.
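A weighted GRS of this kind is, in its simplest form, a weighted sum of allele dosages. The sketch below shows that calculation and a standardization step that expresses scores in s.d. units; both function names are hypothetical, the dosages are assumed to count the effect allele used for the weights, and the population standard deviation is an illustrative choice.

```python
from statistics import mean, pstdev


def weighted_grs(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores):
    """Center scores and scale them to unit standard deviation."""
    center = mean(scores)
    spread = pstdev(scores)
    return [(s - center) / spread for s in scores]
```

Standardized scores allow associations to be reported per s.d. change in the GRS, which makes results comparable across scores built from different numbers of variants.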
Association with pathway-specific GRS. We selected 29 pathways that were enriched at FDR < 10⁻⁵ for our 559 genes implicated by ≥2 lines of evidence (Supplementary Table 18). We created a weighted GRS (weights estimated from multi-ancestry meta-regression for FEV₁/FVC) for each of the 29 pathways by including for each gene in the pathway (as for 'Single-variant PheWAS') the variant with the most significant P value for the trait that implicates the gene in our variant-to-gene mapping (Supplementary Table 13). Each of the 29 GRSs was then checked for association with 1,909 traits in the PheWAS.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Fig. 1 | Study overview. a, Discovery meta-analysis. *For signals present in more than one trait, the signal is only counted once (for the most significant trait). b, Pathway analyses, GRS analyses and PheWAS studies.

Fig. 2 | 135 genes prioritized with ≥3 variant-to-gene criteria. The number of variant-to-gene criteria implicating the gene is in brackets after the gene name. The gray in the first eight columns indicates that at least one variant implicates the gene as causal via the evidence for that column. The last four columns indicate the level of association of the most significant variant implicating the gene.