Introduction

Lung cancer is the most common cancer and the leading cause of cancer-related mortality around the world1. While smoking is the leading cause of lung cancer, genetics also plays an important role: fewer than 20% of smokers develop this deadly disease in their lifetime2, and non-smokers with a family history of cancer have an increased risk of lung cancer3.

Genetic variants influencing lung-cancer risk fall into three categories: rare high-risk variants (prevalence of 1% or less), moderate-risk variants (prevalence of not more than 5%), and common low-risk variants (prevalence of more than 5%). Family-based linkage studies are most appropriate for high-risk variants with high penetrance, but they are costly to conduct because, although lung cancer is a common disease, multiple occurrences of lung cancer in a family are less common. To date, the most concrete linkage and fine-mapping studies point to a lung-cancer susceptibility locus at 6q23–25, with RGS17 as a possible culprit gene4,5,6.

Based on the “common disease and common variant” hypothesis, genome-wide association studies (GWAS) provide a powerful tool for investigating the genetic associations of a complex disease7. Over the past ten years, common genetic variations at 5p15.33 (TERT/CLPTM1L), 6p21.33 (BAT3/MSH5) and 15q25.1 (CHRNA5/CHRNA3/CHRNB4) have been identified as modifying lung cancer susceptibility in GWAS8,9,10,11,12,13 and GWAS-based meta-analyses14, 15 (eg, TERT rs2736100, CHRNA3 rs8042374, APOM rs3117582, MSH5 rs3131379, and GTF2H4 rs114596632). However, these variants explain less than 10% of the risk contribution to lung cancer16.

Candidate-gene approaches were the mainstay of genetic association studies before the GWAS era. They are relatively cost-effective and easy to perform. Over 1,000 such studies on lung cancer susceptibility have been published over the past 25 years. However, many reports conflict, and it is very challenging to find reliable associations among these highly diverse studies. As a method for systematically integrating data from multiple studies to reach a single conclusion with greater statistical power, meta-analysis is a good way to deal with diverse and fragmented studies. Although some meta-analyses have been performed on lung cancer, most are limited to a single genetic variant, several variants in a gene, or several variants across a pathway. Recent systematic meta-analyses extend the scope to all available genetic association studies of a specific disease and help to achieve a comprehensive view of the genetic contributions to that disease. Alzheimer’s disease17, breast cancer18, and colorectal cancer19 are a few good examples of systematic meta-analyses with consensus outcomes.

Establishing robust evidence of genetic predisposition to lung cancer has potential clinical utility not only for population risk stratification but also for primary prevention. The main objective of our study was to identify, consolidate, and interpret genetic associations of common variants with lung cancer using a comprehensive research synopsis and systematic meta-analysis. We attempted to systematically evaluate all published candidate-gene association studies in lung cancer following established guidelines for conducting and standardizing such field synopses20,21,22. Additionally, for variants with significant associations by meta-analysis, we applied the Venice criteria21 proposed by the Human Genome Epidemiology Network (HuGENet) to assess the credibility of the cumulative epidemiological evidence for these associations, so as to obtain more reliable results. Moreover, to gain better insight into the differences in genetic variation among populations with different characteristics, associations stratified by ethnicity, histological type, and smoking status were also examined.

Results

Among the final 1,018 eligible publications for our meta-analysis (Fig. 1), the vast majority (n = 926, 91%) were published after 1999, and 684 (67%) were published over the past decade (2006–2015) (Supplementary Fig. S1). A total of 2,910 genetic variants from 754 unique candidate genes or loci were eligible for further analyses. The included studies had a mean of 414 cases (range 13–4257) and 565 controls (range 12–55823). Among the 2,910 variants, 254 were reported in at least three independent datasets, and eight of these had been reported as top association variants with lung cancer (P < 5 × 10⁻⁸) in published GWAS8, 9, 23, 24. Therefore, our meta-analyses focused on the remaining 246 genetic variants in 138 genes or loci (Supplementary Table S1). More detailed information on the variants is presented in the Supplementary Results.

Figure 1 Flowchart of literature search and selection for meta-analyses for candidate-gene association studies of lung cancer.

Main meta-analyses

For the 246 variants, we first conducted 246 main meta-analyses, one for each variant. On average, these analyses included 6,315 subjects (range 397–71120) and combined eight studies (range 3–133) (Supplementary Table S1). The allelic model was applied to all but nine variants, for which the original studies provided insufficient data (Supplementary Table S1). Of the 246 main meta-analyses, 56 variants within 45 different genes showed nominally significant genetic associations with lung cancer (p-value < 0.05) (Table 1, Supplementary Table S2). The strength of association between each genetic variant and lung cancer, as measured by ORs, had a mean of 1.36 (range 1.08–2.55) for putative “risk” variants and 0.78 (range 0.55–0.90) for putative “protective” variants. Of the 56 main meta-analyses with significant results, 24 had little or no heterogeneity, 16 had evidence of potential bias (publication bias, small study effects, or excess significance bias), and 16 lacked robustness in the sensitivity analyses. More details of the results are presented in the Supplementary Results.

Table 1 Genetic variants with significant associations with lung cancer risk in main meta-analyses.

The credibility assessment of the cumulative epidemiological evidence rated eight genetic variants as strong (APEX1 rs1760944, AXIN2 rs2240308, CHRNA3 rs6495309, CXCR2 rs1126579, CYP2E1 rs6413432, HYKK rs931794, PON1 rs662, and REV3L rs462779) and ten as moderate (ATM rs189037, CD3EAP rs967591, CYP2A6 rs1801272, HIF1A rs11549467, PDCD5 rs1862214, PROM1 rs2240688, TP53 rs12951053, TP63 rs10937405, WWOX CNV-67048, and XRCC1 rs3213255) (Table 1, Supplementary Table S2).

In the dominant genetic model analyses (Supplementary Table S1), 44 variants showed significant associations with lung cancer risk. Seven of these had non-significant associations in the main allelic meta-analyses; interestingly, two of them (ATM rs66467 and REV3L rs465646) showed strong and moderate cumulative epidemiological evidence, respectively (Table 2, Supplementary Table S2). Under the recessive model, 39 variants showed statistically significant associations, of which ten were non-significant under the allelic model. None of these showed strong cumulative epidemiological evidence, although five variants (CASC8 rs6983267, CHRNA5 rs142774214, CYP2A6 non*4/*4, IL17A rs2275913, and XPA rs1800975) showed moderate evidence (Table 2).

Table 2 Genetic variants with significant associations with lung cancer risk under a dominant or recessive genetic model.

Subgroup meta-analyses

Ethnicity

Subgroup meta-analyses were conducted in Caucasian and Asian populations separately under each of the three genetic models (allelic, dominant, or recessive), depending on the available data (Supplementary Table S3). We found that 19 and 26 variants were significantly associated with lung cancer susceptibility in the Caucasian and Asian populations, respectively. Five variants (APEX1 rs1130409, CHRNA5 rs16969968, CLPTM1L rs402710, ERCC2 rs13181, and SOD2 rs4880) showed strong and five (CYP1A2 rs762551, CYP1B1 rs1056836, CYP2A6 rs1801272, CYP2E1 rs2031920, and XRCC1 rs1799782) showed moderate evidence in the Caucasian population (Table 3, Supplementary Table S4). For the significant variants in the Asian population, strong and moderate cumulative evidence was observed in seven (APEX1 rs1760944, CLPTM1L rs402710, CYP2E1 rs6413432, MIR146A rs2910164, MIR196A2 rs11614913, REV3L rs462779, and TERT rs2736098) and seven variants (ATM rs189037, CHRNA3 rs6495309, CYP2A6 non*4/*4, GSTT1 present/null, PROM1 rs2240688, REV3L rs465646, and WWOX CNV-67048), respectively (Table 3, Supplementary Table S4). Comparing the significant variants across ethnic groups, we found that 13 variants (AGER rs1800624, ATM rs189037, CYP2A6 non*4/*4, FASLG rs763110, IL10 rs1800872, MAPKAPK2 CNV-30450, MIR196A2 rs11614913, PROM1 rs2240688, REV3L rs462779, REV3L rs465646, VEGFA rs833061, WWOX CNV-67048, and XRCC1 rs25487) were unique to the Asian population and seven (APEX1 rs1130409, CYP1A2 rs762551, CYP2A6 rs1801272, ELANE rs351107, ELANE rs7254054, HRAS1 a VNTR variation, and MTHFR rs1801131) to the Caucasian population. Four variants (CLPTM1L rs402710, CYP1A1 rs4646903, CYP1A1 rs1048943, and GSTM1 present/null) were shared between the two groups, including one (CLPTM1L rs402710) that showed consistently strong evidence of significant association in both groups (Supplementary Fig. S2).

Table 3 Genetic variants with significant associations with lung cancer risk in subgroup meta-analyses with strong or moderate cumulative evidence.

Histological types of lung cancer

Considering the etiologic differences among subtypes of lung cancer, subgroup meta-analyses were performed for genetic variants with data available for non-small cell lung cancer (NSCLC), small cell lung cancer (SCLC), adenocarcinoma (AD), and squamous cell carcinoma (SCC) under each of the three genetic models (allelic, dominant, or recessive) (Supplementary Table S5). In the NSCLC subgroup, statistically significant associations were found for 25 variants, of which eight (CHRNA5 rs16969968, CLPTM1L rs402710, CYP2E1 rs6413432, ERCC1 rs11615, FGFR4 rs351855, HYKK rs931794, MIR146A rs2910164, and TERT rs2736098) demonstrated strong cumulative epidemiological evidence (Table 3, Supplementary Table S6). In the SCLC group, five variants showed significant associations, but all had moderate or weak cumulative evidence. Three significant variants (CHRNA5 rs16969968, CYP1A1 rs4646903, and GSTM1 present/null) were shared between the NSCLC and SCLC groups (Supplementary Fig. S3). In the AD group, 15 variants showed significant associations, four of which had strong evidence (CYP2E1 rs6413432, OGG1 rs1052133, TERT rs2736098, and TP53 rs1042522). For SCC, two of eight significant variants (CYP1A1 rs4646903 and CYP2E1 rs6413432) showed strong cumulative evidence. Four significant variants (CYP2E1 rs6413432, GSTM1 present/null, SOD2 rs4880, and TERT rs2736098) were shared between the AD and SCC groups, including one (CYP2E1 rs6413432) that showed consistently strong evidence of significant association in both groups (Supplementary Fig. S4).

Smoking status

In subgroup meta-analyses by smoking status, significant associations were found for twenty-two variants in smokers and ten variants in non-smokers. In the smoker population, the significant associations showed only moderate (APEX1 rs1760944, CYP1A1 rs4646903, CYP2A6 non*4/*4, CYP2E1 rs6413432, CYP2E1 rs2031920, GSTP1 rs1138272, and NBN rs1805794) or weak cumulative evidence, mostly due to the lack of large-scale evidence and the presence of potential biases (Table 3, Supplementary Table S8). In the non-smoker population, the significant associations had strong, moderate, or weak evidence for one (ERCC1 rs11615), six (CYP2E1 rs6413432, CYP2E1 rs2031920, ERCC2 rs13181, GSTM1 present/null, TP53 rs1042522, and XRCC1 rs3213245), and three variants, respectively. Comparing the significant variants between the two groups, seventeen were unique to the smoking population, five were unique to the non-smoking population, and five were shared between the two populations (Supplementary Fig. S5).

Functional annotations

Based on the main and subgroup meta-analyses, a total of 22 variants showed significant associations with lung cancer susceptibility with strong cumulative evidence. We further performed genomic annotation of these variants using HaploReg v4.1 25, which can help to predict functional variants. Of them, twelve variants are located in exons, two in microRNA (miRNA) genes, and the others in non-coding regions (four intronic, two intergenic, one 5′UTR, and one 3′UTR) (Table 4). Most of these variants are located within enhancer or promoter elements that are active across a wide range of tissue types (including lung cancer or normal lung tissues). Furthermore, the majority of these 22 variants have been identified as expression quantitative trait loci (eQTLs) of a number of genes in various tissue types, including normal lung tissues. The functional potential of ten non-synonymous SNPs was further predicted using PolyPhen-2 26. The variant rs351855 was predicted to be “probably damaging” to FGFR4 function. The other non-synonymous SNPs were predicted to be “benign”.

Table 4 Functional annotation of 22 variants associated with lung cancer risk with strong evidence using HaploReg v4.1 and PolyPhen-2.

Non-significant associations

Associations were non-significant for 150 variants within 98 genes under every genetic model examined (allelic, dominant, or recessive) in both main and subgroup meta-analyses (Supplementary Table S9). Among these 150 variants, the credibility of the cumulative epidemiological evidence was rated as strong, moderate, or weak for seven (ERCC1 rs16979802, ERCC1 rs2298881, ERCC1 rs735482, POLI rs3730668, PPARG rs1801282, PTGS2 rs20417, and TNF rs1799724), four (ERCC2 rs1799793, TYMS 28-bp tandem repeat, XPC rs2228000, and XRCC3 rs861539), and 139 variants, respectively (Supplementary Table S9).

Discussion

To the best of our knowledge, this systematic meta-analysis is the largest and most comprehensive assessment of the currently available literature on candidate-gene association studies in lung cancer. This study examined associations between genetic variants and lung cancer risk using data from 1,018 candidate-gene association studies including 2,910 genetic variants. The meta-analyses and evidence evaluations allowed us to identify 22 genetic variants in 21 genes with strong evidence of association with lung cancer risk. For these variants, genomic annotation provided additional evidence of putative regulatory functions, including regulatory histone modification marks, DNase I hypersensitivity, motif changes, and transcription factor binding in multiple cell types including lung tissue.

Variants in non-coding regions associated with lung cancer risk may exert their effects through transcription, mRNA stability, protein structure/function, or miRNA binding sites27. For example, the variant rs1760944 (−656T > G) in the 5′-promoter region of APEX1 28 was a significant variant (T vs. C allele, OR 1.16, 95%CI 1.08–1.25) with strong cumulative evidence. This variant is predicted to influence promoter histone marks in 24 tissues, including lung and lung cancer cell lines. A previous in vitro promoter assay found that the rs1760944 T allele had significantly lower promoter activity than the G allele, which indicated that the variant allele (T) may be associated with low transcriptional activity of APEX1 in lung cancer cells28. The variant rs6495309 in the CHRNA3/B4 intergenic region12 showed strong evidence of association with lung cancer susceptibility in our meta-analysis. This finding was consistent with the results of a previous meta-analysis performed in a Chinese population29 and a recent meta-analysis based on GWASs of lung cancer15. Additional subgroup analysis of Asians in our study also showed a risk effect for the rs6495309 C allele. This SNP overlaps with promoter histone marks and alters a regulatory motif. A functional study also demonstrated that the rs6495309 C allele significantly increased CHRNA3 expression by altering the ability of the CHRNA3 promoter to bind the transcription factor Oct-1 12. A common genetic variation, rs1126579 (C > T), located in the 3′UTR of CXCR2 (IL8RB), was associated with a reduced risk of lung cancer with strong evidence. The HaploReg tool identified rs1126579 as an eQTL for a number of genes, including CXCR2. Previous studies also reported that CXCR2 was downregulated in lung cancer tissue and might play a suppressive role in lung cancer via p53-dependent senescence30, 31. Functional data indicated that the rs1126579 variant can disrupt the binding site of miR-516a-3p and thereby increase the expression of CXCR2 30, which may also explain why rs1126579 showed a protective effect on the risk of lung cancer.

Variants falling within coding regions, especially non-synonymous SNPs, could affect protein structure, function, or expression level, which may explain their association with disease susceptibility32. For example, the non-synonymous CHRNA5 rs16969968 (Asp398Asn) causes an amino acid substitution at codon 398 of the CHRNA5 protein. The aspartic acid residue (Asp398) is located in the central part of the second intracellular loop of the CHRNA5 protein and has been reported to be highly conserved across multiple species10. The rs1042522 (Arg72Pro) variant is a common functional SNP in exon 4 of TP53, which encodes an important tumor suppressor protein. The TP53 gene is often mutated in NSCLC tumors, an early event in the development of lung cancer33. Further functional data showed that lung cancer patients carrying the 72Pro allele may have a low frequency of TP53 mutations in their tumors34. The rs351855 (Gly388Arg) variant affects the transmembrane domain of the FGFR4 protein35. This SNP resides in a conserved region and is predicted by PolyPhen to have a possibly damaging effect on FGFR4 protein function. Also, rs4880 (Ala16Val) is a non-synonymous SNP in SOD2. The valine variant can reduce enzyme activity36 and further increase oxidative stress. Rs2736098 is a synonymous SNP (Asn305Asn) in exon 2 of the TERT gene, a well-known oncogene that encodes the catalytic subunit of telomerase37. This SNP may be associated with telomere length38. Although it does not change the amino acid sequence, this SNP is located within gene regulatory elements and may alter transcription factor binding.

In addition, two SNPs with strong evidence of association with lung cancer risk are located in miRNA gene coding regions: rs2910164 (C > G) in the seed of miR-146a-3p, encoded by MIR146A, and rs11614913 (C > T) in the mature sequence of miR-196a-3p, encoded by MIR196A2 39. Both SNPs showed significant differences in miRNA expression between their alleles39, 40 and could affect the stability of the secondary hairpin structure39. One study also showed that rs2910164 can influence the interaction between miR-146a-3p and its potential target genes, and that rs11614913 can increase the affinity of miR-196a-3p for TP53 39.

Our subgroup analyses also provided important additional details of genetic associations in specific groups. The results of subgroup meta-analyses by ethnicity supported the well-recognized “racial” differences in genetic effects for complex diseases including lung cancer41 and indicated that some variants with strong evidence (eg, APEX1 rs1130409, CHRNA5 rs16969968, ERCC2 rs13181, SOD2 rs4880, and CYP2E1 rs6413432) may be associated with lung cancer risk in an ethnicity-specific manner. Previous studies have demonstrated that different histological subtypes of lung cancer have different genetic backgrounds15, 42. When cases were stratified by histological type, the associations between several variants (eg, CYP2E1 rs6413432, OGG1 rs1052133, TP53 rs1042522, and CYP1A1 rs4646903) and specific subtypes of lung cancer were supported by strong evidence. A growing number of studies demonstrate interactions between genetic variants and smoking43, 44. Our subgroup analysis also found that some variants, for example CYP1A1 rs4646903 and GSTP1 rs1695, showed significant associations with lung cancer risk in smokers but not in non-smokers.

The purpose of meta-analysis is not only to reveal genetic variants significantly associated with lung cancer risk, but also to identify variants with non-significant associations. Our study revealed that 150 variants in 98 genes had non-significant associations with lung cancer risk. However, most of these variants had weak cumulative epidemiological evidence due to insufficient statistical power (119/150) and/or strong between-study heterogeneity (73/150), and only 11 (7.3%) variants had strong or moderate cumulative evidence. Our results provide important clues for further assessment of the main effects of these variants.

Although a comprehensive and systematic approach was applied to the synopsis of genetic association studies in lung cancer, several limitations should be considered when interpreting our results. First, although available studies were searched widely and eligible studies were selected strictly according to the inclusion and exclusion criteria, some studies might have been overlooked. Our study did not include research published only as abstracts or in languages other than English. However, for most abstracts, we searched for and included relevant full-text studies reported by the same research groups. Publication bias was not identified in most meta-analyses with significant association results. Also, the proportion of studies published in languages other than English is small, so their exclusion should not have a significant influence on the main results. Second, more than 40% of the meta-analyses with a significant result had high heterogeneity (I² > 50%). Although subgroup analyses stratified by ethnicity, histology, and smoking status were performed to address the heterogeneity, other sources of heterogeneity could exist and are difficult to address because of limited available data. Third, although we tried to explore the consistency and differences in genetic associations between some variants and lung cancer risk across ethnic groups, meta-analyses stratified by ethnicity were performed only for Caucasian and Asian populations. Since very few of the included original studies were carried out in populations of other descent (e.g. African descent), the available data were not sufficient to perform subgroup meta-analyses in those populations. Additional association studies of these reported variants are needed in populations of other ethnic descent. Finally, although we conducted systematic evaluations of cumulative epidemiological evidence for variants associated with lung cancer risk, biases cannot be completely excluded in this study.

In summary, our comprehensive research synopsis and meta-analysis identified 22 variants in 21 genes with strong cumulative epidemiological evidence of significant associations with lung cancer risk. Among variants without significant associations with lung cancer, seven had strong evidence. Our findings provide useful data and important references for future studies evaluating the role of genetics in lung cancer. The identification of genetic variants robustly associated with lung cancer may help to stratify population risk more precisely and to identify target populations for primary prevention.

Methods

Selection criteria and search strategies

All methods were in accordance with the PRISMA statement, the HuGE Review Handbook (version 1.0), which specifically guides genetic reviews, and the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines20,21,22, 45.

To be included, a study had to meet the following four criteria: (1) it evaluated the association between a genetic polymorphism and lung cancer risk using a case-control, cohort, or cross-sectional design in humans; (2) lung cancer cases were diagnosed by pathological and/or histological examination; (3) it was published in English in a peer-reviewed scientific journal or online; (4) it provided sufficient information on genotype and/or allelic distributions for both cases and controls. We excluded studies with a family-based design and loci identified by GWAS at genome-wide significance (P < 5 × 10⁻⁸), since the latter have been replicated by many studies.

To identify all published association studies potentially eligible for inclusion in our meta-analysis, we performed a comprehensive literature search (Fig. 1). Two electronic databases (PubMed and EMBASE) were queried with the terms “lung cancer (as well as synonyms of lung cancer) AND associate*” for publications on or before December 31, 2014. This search yielded 41,457 publications, which were then screened for eligibility using the title, abstract, or full paper, as necessary. For publications between December 31, 2014 and November 1, 2015, we searched the same databases (PubMed and EMBASE) monthly using the previous search terms and the additional terms “lung cancer AND [gene/loci names identified in enrolled publications]”. This second search identified 4,453 additional potential publications. Furthermore, we screened the bibliographies of reviews, published meta-analyses, and articles cited in the retrieved publications. Taken together, a total of 1,018 eligible papers were finally selected, and their full-text versions were carefully reviewed for further analyses (Fig. 1).

Data management and abstraction

When multiple publications used the same or overlapping data sets, we kept the data set with the largest population or the most recent one, as recommended by Little et al.46. Forty-three publications with redundant information were then excluded. Using standard data extraction forms, we extracted detailed publication information, study design, participant characteristics, and gene and variant information. Subgroup information (ancestry, smoking status, or histological type) was also extracted separately from each study whenever possible. Ancestry was divided into four general groups (African, Asian, Caucasian, and other/mixed) based on the ancestry of at least 80% of the subjects41. If no details of ethnicity were reported, the determination was made based on the general population of the country or region where the study was done41. When a publication reported data from multiple racial groups, data for each population were extracted and analyzed separately if possible.
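The 80% ancestry rule can be illustrated with a short sketch. This is not the authors' extraction procedure, which used standard data forms; the function name `assign_ancestry` and the dictionary input are hypothetical and serve only to make the threshold explicit.

```python
def assign_ancestry(counts, threshold=0.80):
    """Return the ancestry group that covers at least `threshold` of the subjects.

    `counts` maps a group name to its number of subjects (hypothetical input
    format). If no single group reaches the threshold, the study is labelled
    "other/mixed".
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("study has no subjects")
    for group in ("African", "Asian", "Caucasian"):
        if counts.get(group, 0) / total >= threshold:
            return group
    return "other/mixed"


# A study with 850 Asian and 150 Caucasian subjects is assigned to "Asian".
print(assign_ancestry({"Asian": 850, "Caucasian": 150}))
```

A study split 600/400 between two groups would be labelled "other/mixed" under this rule.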

To avoid confusion arising from differing variant nomenclature across articles, we used the most current gene names and uniform variant identifiers (“rs” numbers) from a public single nucleotide polymorphism (SNP) database (dbSNP, http://www.ncbi.nlm.nih.gov/projects/SNP/index.html) to designate the reported variants. For articles reporting an “rs” number, we used it as given; for those without, we used bioinformatics tools such as NCBI Blast (http://www.ncbi.nlm.nih.gov/BLAST/) and UCSC In-Silico PCR (http://genome.ucsc.edu/cgi-bin/hgPcr) to find the “rs” number of the reported variant; for the remaining variants without any “rs” number, we used the common nomenclature in the original articles (eg, MPG Arg59Cys according to amino acid substitution and GSTM1 present/null according to phenotype change).

Statistical analysis

All statistical analyses were performed using Stata software (version 12.0, StataCorp 2011, TX, USA), except where indicated otherwise. All tests were two-sided, and results were considered statistically significant when the p value was 0.05 or lower, unless otherwise stated.

All variants with data from at least three sources were selected for meta-analysis18. The association between a variant and lung cancer risk was assessed by study-specific crude odds ratios (ORs) and 95% confidence intervals (CIs), pooled using a DerSimonian and Laird random-effects model47. The initial main meta-analyses assessed the variant effect using an allelic genetic model (minor allele vs. major allele) without stratification. For variants other than single-nucleotide substitutions, a conventional comparison from the publications was used to assess the effects (eg, CYP2A6 [*4 vs. non*4], MMP3 rs3025058 [5A vs. 6A], and GSTM1 [null vs. present]). When the average minor allele frequency (MAF) was greater than 50%, a rare occasion in which the major and minor alleles are flipped in different ethnic populations, we designated the minor allele from the Caucasian population as the minor allele in all analyses. For variants with sufficient genotype distribution data, we performed additional analyses based on dominant and recessive genetic models.
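As an illustration of the pooling step, the following minimal sketch computes a crude allelic OR per study and combines the log ORs with the standard DerSimonian and Laird moment estimator. The analyses in this study were run in Stata; the Python function names `allelic_log_or` and `dersimonian_laird` and the example allele counts are hypothetical, and the sketch assumes no zero cells.

```python
import math


def allelic_log_or(case_minor, case_major, ctrl_minor, ctrl_major):
    """Crude log OR (minor vs. major allele) and its variance for one study."""
    log_or = math.log((case_minor * ctrl_major) / (case_major * ctrl_minor))
    variance = 1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    return log_or, variance


def dersimonian_laird(effects, variances):
    """Pool log ORs with the DerSimonian-Laird random-effects estimator."""
    k = len(effects)
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical allele counts: (case minor, case major, control minor, control major).
studies = [(120, 280, 90, 310), (60, 140, 55, 145), (200, 400, 150, 450)]
per_study = [allelic_log_or(*s) for s in studies]
result = dersimonian_laird([y for y, _ in per_study], [v for _, v in per_study])
print(result)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect inverse-variance weights, so the two models give the same summary OR.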

Subgroup meta-analyses were also performed by ethnicity (Caucasian and Asian), histological types (SCLC, NSCLC, AD, and SCC), and smoking status (smoking and nonsmoking), if sufficient data were available.

Between-study heterogeneity was assessed by calculating the Cochran Q statistic, with a p value less than 0.10 as the significance threshold48. We also used the I² metric to assess heterogeneity49. Generally, I² < 25%, 25%–50%, and > 50% indicated mild, moderate, and strong heterogeneity, respectively.
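A minimal sketch of the I² calculation and the three-level classification follows. It uses the standard definition I² = 100% × (Q − df)/Q with df = k − 1, floored at zero; the function names are hypothetical, and the p value of the Q test (from a chi-square distribution with k − 1 degrees of freedom) is not computed here.

```python
def i_squared(q, k):
    """I² in percent from Cochran's Q across k studies, floored at zero."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def heterogeneity_label(i2):
    """Map I² (percent) to the mild / moderate / strong categories used here."""
    if i2 < 25:
        return "mild"
    if i2 <= 50:
        return "moderate"
    return "strong"


# Example: Q = 20 across 6 studies gives I² = (20 - 5) / 20 = 75%.
i2 = i_squared(20, 6)
print(i2, heterogeneity_label(i2))
```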

Publication bias was evaluated by funnel plot analysis (logOR against standard error) and Begg’s test50. A potential small study effect (a trend for smaller studies to show larger effects) was checked by the modified Egger’s test, which can lower the type I and type II error rates compared with the original Egger’s test51. We also conducted an excess significance test to examine whether there was a relative excess of formally significant findings due to potential sources of bias, such as selective analyses, selective outcome reporting, or fabricated data52.

For all variants that showed a significant association with lung cancer risk, we performed a sensitivity analysis to examine whether the significant summary ORs were robust after excluding the first published or first positive report, or after excluding studies in which controls violated Hardy-Weinberg equilibrium (HWE). We used Fisher’s exact test or the chi-square test to assess HWE among controls in each dataset.
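The HWE check among controls can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This sketch covers only the chi-square variant, not the exact test; the function name `hwe_chi_square` is hypothetical. The p value uses the identity that the chi-square survival function with one degree of freedom equals erfc(√(x/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1 - p
    if p <= 0 or q <= 0:
        raise ValueError("monomorphic sample: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value


# Controls with genotype counts 25/50/25 match HWE expectations exactly.
print(hwe_chi_square(25, 50, 25))
```

For small samples or rare alleles the exact test is generally preferable, since the chi-square approximation relies on adequate expected counts.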

Assessment of cumulative evidence

For each nominally significant result from the meta-analyses, the Venice criteria were used to assess the credibility of the cumulative epidemiological evidence21. The Venice criteria form a semi-quantitative index that grades three aspects (the amount of evidence, the extent of replication, and protection from bias) and generates a composite assessment of “strong”, “moderate”, or “weak” epidemiological credibility for an association with lung cancer risk21. Each of the three aspects was assigned one of three levels (A, B, or C)21. Briefly, the amount of evidence, which depends on the total sample size of the smallest genetic group among cases and controls in each meta-analysis, was graded as A (sample size >1000), B (sample size between 100 and 1000), or C (sample size <100). For very rare variants with a frequency of less than 0.5%, the amount of evidence was not assessed, because an A grade was unlikely to be attainable18. The extent of replication, which depends on between-study heterogeneity, was graded as A (I² < 25%), B (I² between 25% and 50%), or C (I² > 50%). Protection from bias, which considers various potential sources of bias in meta-analysis, was graded as A when there was no demonstrable bias and bias was unlikely to invalidate the association, B when there was no obvious bias but insufficient information for identifying it (eg, missing information for evaluating HWE among controls in an individual study), and C when bias was evident and/or was likely to explain the presence of the association. More specifically, a C grade was assigned if the meta-analysis had any of the following potential sources of bias: (1) the magnitude of the association was low (eg, OR <1.15 for risk effect, OR >0.87 for protective effect), with the exception of a highly consistent OR across the studies enrolled in the meta-analysis; (2) the sensitivity analysis indicated that the significant summary OR could be substantially changed; (3) a potential small study effect was present according to the modified Egger’s test (p-value < 0.10); (4) an excess of significant findings was possible (excess significance test, p-value < 0.10); (5) there was potential publication bias (Begg’s test, p-value < 0.10). With the grades from the three aspects, the credibility of the cumulative epidemiological evidence was categorized as strong (all three grades were A), moderate (any grade was B, but none was C), or weak (any grade was C).
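The grading logic described above can be summarised in a short sketch. The function names are hypothetical; the bias grade is passed in directly because it rests on a judgment across several tests rather than a single threshold, and the special case of very rare variants (frequency below 0.5%) is not handled.

```python
def grade_amount(n_smallest_group):
    """Amount of evidence, from the size of the smallest genetic group."""
    if n_smallest_group > 1000:
        return "A"
    if n_smallest_group >= 100:
        return "B"
    return "C"


def grade_replication(i2):
    """Extent of replication, from I² in percent."""
    if i2 < 25:
        return "A"
    if i2 <= 50:
        return "B"
    return "C"


def venice_credibility(amount, replication, bias):
    """Combine the three grades into strong / moderate / weak credibility."""
    grades = (amount, replication, bias)
    if "C" in grades:
        return "weak"
    if "B" in grades:
        return "moderate"
    return "strong"


# Example: 1,500 subjects in the smallest group, I² = 10%, no demonstrable bias.
print(venice_credibility(grade_amount(1500), grade_replication(10), "A"))
```

Checking for any C grade before any B grade reflects the rule that a single C is enough to rate the evidence as weak, whatever the other two grades are.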

Additionally, for the non-significant associations revealed by the meta-analyses, we evaluated the credibility of the cumulative epidemiological evidence based on three aspects: the degree of heterogeneity across studies, potential bias assessment, and statistical power. The statistical power was calculated using SNP tools53. The credibility of the cumulative epidemiological evidence for a non-significant association was categorized as strong (no or mild heterogeneity across studies [I² < 25%], no demonstrable bias, and sufficient statistical power [power >90%]), weak (heterogeneity I² > 50%, any potential bias detected, or low statistical power [power <80%]), or moderate (all other cases).
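The three-way rule for non-significant associations can be written as a small decision function. This is a sketch under the thresholds stated above; the function name and argument format (I² in percent, power as a fraction, bias as a boolean) are hypothetical.

```python
def nonsignificant_credibility(i2, bias_detected, power):
    """Rate the evidence for a non-significant association.

    i2: heterogeneity in percent; bias_detected: True if any potential bias
    was found; power: statistical power as a fraction between 0 and 1.
    """
    if i2 > 50 or bias_detected or power < 0.80:
        return "weak"
    if i2 < 25 and power > 0.90:
        return "strong"
    return "moderate"


# Example: low heterogeneity, no bias, and 95% power gives "strong".
print(nonsignificant_credibility(i2=10, bias_detected=False, power=0.95))
```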

Data Availability

All data generated or analysed during this study are included in this article and its Supplementary Information file.