Introduction

Despite several measures taken against tobacco smoking and consumption, lung cancer remains one of the leading causes of cancer-related mortality worldwide, with a low 5-year survival rate1. Epidemiological data suggest that the global lung cancer burden has risen to 2.1 million new cases and 1.8 million deaths, close to 1 in 5 cancer deaths2. A recent investigation has shown an increase in lung cancer incidence in the Indian subcontinent and East Asia3. Lung cancer incidence varies widely across geographical regions, owing to the admixture of different populations4. In India, lung cancer constitutes 5.9% of all new cancer cases and 8.1% of all cancer-related deaths in both sexes2. The northeastern state of Mizoram has the highest reported lung cancer incidence in both sexes4. Earlier reports stated that India contributes approximately one million of the total five million lung cancer deaths worldwide5, and the death toll was projected to rise to 1.5 million by 20205,6. Tobacco smoking is considered the most significant factor in lung carcinogenesis1,7. Apart from tobacco smoking, betel quid chewing8,9, diet10,11,12, biofuel exposure10,11,12,13,14,15, asbestos exposure10,11,16 and other environmental pollutants10,11,17,18 contribute to lung carcinogenesis. Earlier studies have revealed a rise in lung cancer incidence among never smokers19, particularly in women of East Asian origin20,21.

Genome-wide association studies (GWAS) in the Chinese population have identified 16 susceptibility loci (p ≤ 5.00 × 10−8) associated with lung cancer risk22,23, four of which showed evidence of association with lung cancer risk in smokers22. Similarly, a GWAS of subjects of European ancestry, with 29,266 lung cancer patients and 56,450 controls, identified 18 susceptibility loci (p ≤ 5.00 × 10−8), including 10 novel loci21. Interestingly, the associations of the 10 novel loci varied across histological subtypes: four were associated with overall lung cancer risk, while the remaining six were associated with lung adenocarcinoma21. Most GWAS have been conducted on subjects of European or Chinese descent, and the majority of the identified risk alleles have not been evaluated in populations of the Indian subcontinent, despite several candidate gene association studies24,25,26,27,28,29,30,31,32,33. Contradictory outcomes of case–control studies of the same polymorphism by different authors have made it difficult to establish the overall effect of these genes and their variants on lung cancer susceptibility in the region. Differences in genetic association across the geographical regions of the Indian subcontinent, which comprise distinct population groups, might be attributed to gene–gene and gene–environment interactions that could act as modulators of lung cancer risk34. Contradictory results may also arise from small sample sizes, heterogeneity between study samples, and racial/ethnic differences among the source populations35 within the Indian subcontinent.

Further, differences in socio-economic and cultural practices across the Indian subcontinent might contribute to diverse lifestyle habits and exposures, such as smoking, chewing of tobacco and betel quid, alcohol consumption, exposure to air pollutants, and exposure to asbestos and other occupational hazards, which in turn could modify disease risk. This underlines the importance of meta-analysis, a robust statistical method36, for assessing the pooled effect of the variant(s) on lung cancer susceptibility in this population by combining data from individual studies.

The present investigation aimed to estimate pooled association measures for candidate genetic variants reported in the Indian subcontinent, following a workflow (Supplementary Information, Fig. S1) similar to that of our previous study37. Some results of the meta-analysis were further investigated in an independent case–control sample. Significantly associated variants were compared with other populations and ethnicities worldwide (Supplementary Information, Fig. S2).

Methods

The scheme of analysis followed in this study is summarised in Supplementary Information, Fig. S2.

Identification and eligibility of studies

The current study followed the PRISMA guidelines38. PubMed, Scopus, and Web of Science were systematically searched for relevant studies using the following keywords: (SNP/SNPs/polymorphisms/single nucleotide polymorphisms/SNVs/SNV/Mutation/Variants/Genotypes/Alleles); (Lung cancer/Lung Carcinoma/Lung malignancy/Lung neoplasm); (India/Pakistan/Nepal/Bangladesh/Bhutan/Sri-Lanka/Maldives/Afghanistan). The eligibility of all identified case–control studies on lung cancer was assessed manually by two authors and rechecked by the other authors. Hardy–Weinberg Equilibrium (HWE) was assessed in the controls for all variants by a goodness-of-fit chi-square test (deviation declared at p < 0.05) using the R package ‘genetics’39. HWE could not be assessed for the GSTT1 and GSTM1 deletion polymorphisms, because the selected studies reported them without heterozygous genotype counts.
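The HWE check on control genotypes can be illustrated with a short Python sketch. It is not the R package ‘genetics’ used in the study; the function name `hwe_chisq` and the example counts are hypothetical. Expected genotype counts are derived from the allele frequency estimated in the controls, and the 1-df chi-square statistic is converted to a p-value with the identity P(χ² > x) = erfc(√(x/2)).

```python
import math

def hwe_chisq(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium (1 df).

    n_aa, n_ab, n_bb: control genotype counts (hom. reference, het., hom. variant).
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("empty sample")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the reference allele
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("HWE test needs a polymorphic sample")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control counts that fit HWE exactly (allele frequency 0.5)
chi2, pval = hwe_chisq(25, 50, 25)
```

A control group with a p-value below 0.05 would be flagged as deviating from HWE. The test needs all three genotype classes, which is why it cannot be applied to the GSTT1/GSTM1 deletion data.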

Inclusion and exclusion criteria

Studies were selected for meta-analysis according to the following inclusion criteria: (a) samples should be from populations of the countries of the Indian subcontinent; (b) genotype counts of cases and controls must be reported for each investigated genomic variant; (c) only full research articles reporting original studies were included; (d) all association studies published up to 31st December 2019 were considered; (e) studies should be published in English.

The exclusion criteria were as follows: (a’) duplicate studies using the same population; (b’) studies whose controls were inconsistent with Hardy–Weinberg Equilibrium (HWE). In addition, only variants reported in at least three independent studies on different sample sets were considered for this study.

Data extraction

Data were extracted from the literature according to the inclusion and exclusion criteria. The following data were collected from the selected studies: (1) first author's surname, (2) year of publication, (3) mean age with standard deviation, (4) sex, (5) smoking status, (6) histological types, (7) genetic polymorphisms, (8) genotype-specific case–control data, and (9) geographical region of sampling.

Genotype counts of lung cancer cases and controls were collected for all 18 variants. Genotype counts of cases stratified by histological subtype were available only for del1/GSTT1, del2/GSTM1, rs4646903/CYP1A1, and rs1048943/CYP1A1; the remaining variants lacked histological subtype-stratified genotype counts for the cases and were therefore not included in the histology-stratified analysis. For these four variants, we also extracted smoking status-stratified genotype counts, as described earlier37,40.

Study-level summary estimates and selection of genetic model

Logistic regression of lung cancer status on variant genotype was done using additive, recessive and dominant effect models (using R function ‘glm’) to obtain the study-level unadjusted odds ratio (OR), standard errors (SE) and 95% confidence intervals (95% CI). Adjustment for covariates, such as smoking and sex, could not be done as some of the selected studies did not present sufficient data.
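With a single genotype predictor and no covariates, the study-level logistic regression can be fitted directly on grouped genotype counts. The following Python sketch is a stand-in for R's `glm(..., family = binomial)`, not the authors' code; it maximises the grouped binomial likelihood by Newton–Raphson and returns the unadjusted OR, the SE of log(OR) and a Wald 95% CI. The function names and the counts are hypothetical.

```python
import math

def _score_info(b0, b1, xs, cases, controls):
    """Score vector and Fisher information of the grouped logistic model."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y, c in zip(xs, cases, controls):
        n = y + c
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = n * p * (1.0 - p)
        g0 += y - n * p
        g1 += x * (y - n * p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11

def grouped_logistic_or(xs, cases, controls, max_iter=100, tol=1e-12):
    """Fit logit P(case) = b0 + b1 * x on grouped counts; return OR, SE, 95% CI.

    xs[i] is the genotype code of group i under the chosen genetic model;
    cases[i] and controls[i] are the numbers of cases and controls in group i.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_info(b0, b1, xs, cases, controls)
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # SE of b1 from the inverse information at the final estimates
    _, _, h00, h01, h11 = _score_info(b0, b1, xs, cases, controls)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return math.exp(b1), se, math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)

# Hypothetical counts for genotypes AA, AG, GG under additive coding 0/1/2
or_, se, lo, hi = grouped_logistic_or([0, 1, 2], [40, 45, 15], [60, 35, 5])
```

Under dominant (0/1/1) or recessive (0/0/1) coding, the fitted OR reduces to the cross-product ratio of the collapsed 2 × 2 table, which is a convenient check of the fit.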

Apart from del1/GSTT1, del2/GSTM1, rs1048943/CYP1A1 and rs4646903/CYP1A1, the remaining 14 variants were analysed under three genetic models (additive, dominant and recessive), because the best-fitting model was not known a priori. For the four variants above, the genetic model was chosen according to the models used in the respective original studies: del1/GSTT1 and del2/GSTM1 were analysed under the recessive model, and rs1048943/CYP1A1 and rs4646903/CYP1A1 only under the dominant model.
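The three genetic models differ only in how the three genotypes are coded before the regression. A minimal Python sketch, assuming genotype counts are ordered as (homozygous reference, heterozygous, homozygous variant); the helper names and counts are hypothetical. For the binary dominant and recessive codings, the collapsed 2 × 2 table gives the same OR and Wald SE as an unadjusted logistic regression (Woolf's method).

```python
import math

# Genotype order assumed: (homozygous reference, heterozygous, homozygous variant)
MODEL_CODES = {
    "additive": (0, 1, 2),   # per-copy effect of the variant allele
    "dominant": (0, 1, 1),   # carriers (het + hom variant) vs non-carriers
    "recessive": (0, 0, 1),  # hom variant vs the other two genotypes
}

def collapse_2x2(cases, controls, model):
    """Collapse three-genotype counts into (a, b, c, d) for a binary genetic model.

    a/b: exposed/unexposed cases; c/d: exposed/unexposed controls.
    """
    if model == "additive":
        raise ValueError("the additive model has three levels; fit it by logistic regression")
    codes = MODEL_CODES[model]
    a = sum(n for n, x in zip(cases, codes) if x == 1)
    b = sum(n for n, x in zip(cases, codes) if x == 0)
    c = sum(n for n, x in zip(controls, codes) if x == 1)
    d = sum(n for n, x in zip(controls, codes) if x == 0)
    return a, b, c, d

def woolf_or(a, b, c, d):
    """Odds ratio, Woolf SE of log(OR) and Wald 95% CI from a 2x2 table."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), se, math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

# Hypothetical genotype counts (AA, AG, GG) for cases and controls
cases, controls = [40, 45, 15], [60, 35, 5]
dominant = woolf_or(*collapse_2x2(cases, controls, "dominant"))
recessive = woolf_or(*collapse_2x2(cases, controls, "recessive"))
```

For GSTT1 and GSTM1, where studies report only null versus present genotypes, the data already form the 2 × 2 table of the recessive comparison, so only `woolf_or` would be needed.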

Meta-analysis

Meta-analysis of the lung cancer genetic association reports from the Indian subcontinent was conducted in R software41 using the package ‘metafor’42. Study-level estimates were combined using both fixed-effect (FE) meta-analysis (inverse-variance weighting) and random-effects (RE) meta-analysis (DerSimonian–Laird method), via the ‘rma.uni’ function of ‘metafor’42. These models estimate pooled odds ratios and 95% confidence intervals (95% CI), which were used to assess the overall evidence of association (p < 0.05) of the reported variants with lung cancer risk. The Benjamini–Hochberg method was used to correct for multiple testing, with significance assessed at a false discovery rate (FDR) below 10% (pFDR < 0.1). Variants were selected based on the p-values of the FE meta-analysis; however, for variants with significant heterogeneity, the summary estimates of the RE model are more reliable43. Inter-study heterogeneity was evaluated using Cochran’s Q test (pHet < 0.1)44 and the heterogeneity index (I2)45,46,47,48.
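The pooling steps described above can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not metafor itself: study effects are log ORs with their SEs, the FE estimate uses inverse-variance weights, τ² is the DerSimonian–Laird moment estimator, I² and H² are derived from Cochran's Q, and the Benjamini–Hochberg step assumes that the FE p-values of all tested variant–model combinations form one family. All function names and the example data are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variate with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def two_sided_p(z):
    """Two-sided p-value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def meta_analysis(log_ors, ses):
    """Inverse-variance FE and DerSimonian-Laird RE pooling of study log ORs."""
    k = len(log_ors)
    if k < 2:
        raise ValueError("need at least two studies")
    df = k - 1
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    fe = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    fe_se = math.sqrt(1 / sw)
    # Cochran's Q and the derived heterogeneity indices
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, log_ors))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    h2 = max(1.0, q / df)
    # DerSimonian-Laird moment estimator of the between-study variance
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    re = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    re_se = math.sqrt(1 / sum(w_re))
    return {
        "FE_OR": math.exp(fe), "FE_p": two_sided_p(fe / fe_se),
        "RE_OR": math.exp(re), "RE_p": two_sided_p(re / re_se),
        "Q": q, "pHet": chi2_sf(q, df), "I2": 100 * i2, "H2": h2, "tau2": tau2,
    }

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical study-level log ORs and SEs for one variant
result = meta_analysis([0.1, 0.9, 0.4, 1.2], [0.15, 0.2, 0.25, 0.3])
```

In metafor the corresponding fits are `rma.uni(yi, sei = sei, method = "FE")` and `method = "DL"`, and R's `p.adjust(p, method = "BH")` gives the same adjusted values as `benjamini_hochberg`.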

Effect of histological subtypes and smoking

For histological subtypes and smoking, we performed subgroup stratified meta-analyses. First, we generated study-level summary data using logistic regression (generalised linear model; ‘glm’) of disease status on subgroup-specific genotype counts. We performed the meta-analysis within each subgroup using the methods described earlier. Finally, we performed a fixed-effect meta-regression to test for effect modification (interaction) by smoking status. For this, we used stratum (i.e., smoker/non-smoker group) as a moderator variable using the ‘rma.uni’ function in R package ‘metafor’.
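With a single binary moderator (smoker = 1, non-smoker = 0), the fixed-effect meta-regression reduces to weighted least squares on the stratum-specific log ORs. The Python sketch below is a hypothetical analogue of `rma.uni(yi, vi, mods = ~ stratum, method = "FE")`, with made-up example data; the moderator coefficient is the difference between the two subgroup-pooled log ORs, and its z-test is the test for effect modification.

```python
import math

def fe_meta_regression(yi, vi, moderator):
    """Fixed-effect meta-regression yi = b0 + b1 * m_i with weights 1 / vi.

    yi: stratum-specific log ORs; vi: their variances; moderator: 0/1 stratum
    codes (hypothetical coding: 0 = non-smoker, 1 = smoker).
    Returns (b1, SE of b1, two-sided z-test p-value).
    """
    w = [1 / v for v in vi]
    # Entries of X'WX and X'Wy for the design matrix with columns [1, m]
    s0 = sum(w)
    s1 = sum(wi * m for wi, m in zip(w, moderator))
    s2 = sum(wi * m * m for wi, m in zip(w, moderator))
    t0 = sum(wi * y for wi, y in zip(w, yi))
    t1 = sum(wi * m * y for wi, m, y in zip(w, moderator, yi))
    det = s0 * s2 - s1 * s1
    b1 = (s0 * t1 - s1 * t0) / det
    se_b1 = math.sqrt(s0 / det)
    return b1, se_b1, math.erfc(abs(b1 / se_b1) / math.sqrt(2))

# Hypothetical log ORs and variances: two non-smoker strata, then two smoker strata
yi = [0.2, 0.4, 0.8, 0.6]
vi = [0.04, 0.09, 0.05, 0.10]
b1, se_b1, p_mod = fe_meta_regression(yi, vi, [0, 0, 1, 1])
```

Because the moderator is binary, the coefficient's standard error equals √(1/W0 + 1/W1), where W0 and W1 are the summed inverse-variance weights of the two strata.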

Publication bias

Publication bias was evaluated by visual inspection of funnel plots49 for variants reported in more than five studies, together with Egger’s regression test of funnel-plot asymmetry (p < 0.05) for variants reported in more than ten studies.
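Egger's test regresses each study's standard normal deviate (log OR divided by its SE) on its precision (1/SE) by ordinary least squares; an intercept that differs from zero indicates funnel-plot asymmetry. A minimal Python sketch of this classical formulation, with hypothetical function names and data; the p-value would be read from a t distribution with k − 2 degrees of freedom (for example with `scipy.stats.t.sf`), which is left out to keep the example dependency-free.

```python
import math

def egger_test(log_ors, ses):
    """Egger's regression of (log OR / SE) on (1 / SE) by ordinary least squares.

    Returns (intercept, slope, SE of intercept, t statistic, degrees of freedom).
    """
    k = len(log_ors)
    if k < 3:
        raise ValueError("need at least three studies")
    z = [y / s for y, s in zip(log_ors, ses)]   # standard normal deviates
    prec = [1 / s for s in ses]                 # precisions
    mx = sum(prec) / k
    mz = sum(z) / k
    sxx = sum((x - mx) ** 2 for x in prec)
    sxz = sum((x - mx) * (zz - mz) for x, zz in zip(prec, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    # Residual variance on k - 2 degrees of freedom
    rss = sum((zz - intercept - slope * x) ** 2 for x, zz in zip(prec, z))
    sigma2 = rss / (k - 2)
    se_int = math.sqrt(sigma2 * (1 / k + mx ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return intercept, slope, se_int, t, k - 2

# Hypothetical studies: precisions 1..5 with deviates scattered around 1 + 0.5 * precision
prec = [1, 2, 3, 4, 5]
z_obs = [1.6, 1.8, 2.5, 3.2, 3.4]
ses = [1 / p for p in prec]
log_ors = [zz / p for zz, p in zip(z_obs, prec)]
intercept, slope, se_int, t, df = egger_test(log_ors, ses)
```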

Case–control study on the East Indian population

Lung cancer patients (N = 101) of both sexes with a recent history of tobacco smoking, covering all histological subtypes, were recruited from the Saroj Gupta Cancer Centre and Research Institute (SGCCR&I) and the Department of CHEST, IPGME&R in Kolkata. Individuals who had quit smoking ≥ 15 years before the date of recruitment were excluded. Clinico-radiologically confirmed healthy smokers aged ≥ 55 years2, selected following the NCCN guidelines50, with no history of cancer and from the same geographical region, were recruited as controls (N = 413). All participants provided informed consent for voluntary participation before sample collection, in accordance with the institutes' ethical guidelines and the Declaration of Helsinki, 1964. A detailed questionnaire was completed under medical supervision, along with the clinical data.

We conducted a case–control association study of rs1048943/CYP1A1 with lung cancer among smokers in the East Indian population, including 101 cases and 413 controls. This polymorphism was selected because, in the current meta-analysis, it remained significantly associated with lung cancer after FDR correction and showed significant heterogeneity between studies.

Genotyping

The PCR–RFLP technique was used to genotype rs1048943 (CYP1A1). The primer sequences used to amplify the 204 bp fragment harbouring rs1048943 were: CYP1A1-F: 5’-CTGTCTCCCTCTGGTTACAGGAAGC-3’ and CYP1A1-R: 5’-TTCCACCCGTTGCAGCAGGATAGC-3’. The PCR conditions were: 94 °C/5 min–(94 °C/40 s–61 °C/40 s–72 °C/40 s) × 30 cycles–72 °C/7 min–4 °C hold. Following PCR, amplicon quality was checked by 6% polyacrylamide gel electrophoresis (PAGE). The amplicons were then digested with the restriction enzyme BsrDI at 65 °C for 2 h. Logistic regression of lung cancer status on genotype was performed in R v3.4.2. The covariate-adjusted analysis included age, sex, ethnicity, smoking intensity in pack-years, alcohol consumption, tobacco and betel quid chewing, and asbestos exposure. HWE was assessed in the genotyped controls by a goodness-of-fit chi-square test (p < 0.05) using the R package ‘genetics’39.
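Covariate adjustment in the case–control analysis amounts to adding columns to the logistic regression design matrix. The sketch below is a dependency-free Python illustration of maximum-likelihood logistic regression by Newton–Raphson (IRLS), not the R code used in the study; the function names are hypothetical, and the example uses made-up counts with only the genotype column. Covariates such as age or pack-years would be appended as further columns of each row.

```python
import math

def solve(matrix, rhs):
    """Solve a small dense system matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def logistic_irls(rows, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (equivalently IRLS).

    rows: design-matrix rows, each starting with 1.0 for the intercept.
    y: 0/1 outcomes (1 = lung cancer case).
    Returns (coefficients, standard errors).
    """
    dim = len(rows[0])
    beta = [0.0] * dim
    for _ in range(max_iter):
        info = [[0.0] * dim for _ in range(dim)]
        score = [0.0] * dim
        for row, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            w = mu * (1.0 - mu)
            for j in range(dim):
                score[j] += row[j] * (yi - mu)
                for k in range(dim):
                    info[j][k] += w * row[j] * row[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # SEs from the inverse information of the last iteration (essentially unchanged at convergence)
    ses = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(dim)])[j])
           for j in range(dim)]
    return beta, ses

# Hypothetical data: columns = intercept, carrier status (dominant coding)
rows, ys = [], []
for carrier, n_case, n_ctrl in [(1, 60, 40), (0, 40, 60)]:
    rows += [[1.0, float(carrier)]] * (n_case + n_ctrl)
    ys += [1] * n_case + [0] * n_ctrl
beta, ses = logistic_irls(rows, ys)
```

Exponentiating the genotype coefficient gives the OR adjusted for whichever covariate columns are included; with no covariates, as here, it equals the crude 2 × 2 OR (2.25 for these hypothetical counts).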

Meta-analysis of significant variants in the global population

Following the same protocol, we performed a worldwide meta-analysis, including all reported populations, of the non-synonymous variant that remained associated with lung cancer in the Indian subcontinent after Benjamini–Hochberg FDR correction and showed significant between-study heterogeneity.

Ethics approval

The Ethics Committees of the Saroj Gupta Cancer Centre and Research Institute (IEC SGCCRI Ref No- 2017/MS/1; dated: 11.10.2017), IPGME&R, Kolkata (Memo No. Inst/IEC/2015/545; dated: 10.12.2015) and the University of Calcutta, Kolkata, India (Ref No: 0024/16–117/1434; dated: 24.10.2016) approved the study involving human subjects, in accordance with the regulations of the Indian Council of Medical Research (ICMR) and the Declaration of Helsinki, 1964.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent for publication

The human participants have consented to the submission of the case report to the journal.

Results

Study characteristics

Systematic searches of the databases with the search strings mentioned above returned 1060 hits, which were screened down to 39 studies according to the inclusion/exclusion criteria of this study (Supplementary Information, Fig. S3). These 39 studies covered 18 polymorphisms in 11 genes, with 7,630 cases and 8,169 controls (Table 1). Covariate-specific case–control data, particularly tobacco smoking, mean age, histological status, and geographical region of the subjects, were recorded from all 39 studies selected for meta-analysis (Supplementary Information, Table S1).

Table 1 Details of the selected studies for meta-analysis.

Meta-analysis for the association of polymorphic variants with lung cancer and its subgroups in the Indian Subcontinent

In the FE meta-analysis of the 18 variants in the Indian subcontinent, six variants were associated with overall lung cancer risk at nominal significance (p < 0.05) (Table 2): rs1048943/CYP1A1, rs1695/GSTP1 and rs4646903/CYP1A1 in the dominant model, del1/GSTT1 and del2/GSTM1 in the recessive model, and rs17037102/DKK2 in both the additive and dominant models. After Benjamini–Hochberg false discovery rate (FDR) correction, two variants (rs1048943 and rs4646903) remained significant at 10% FDR, in the dominant model only. Of these six nominal associations, two variants (rs1695 and rs1048943) showed significant heterogeneity (pHet < 0.1) by Cochran’s Q test; hence, RE summary estimates are also tabulated for these variants (Table 2). Additional smoking- and histology-stratified analyses were carried out for the nominal associations. The detailed results are summarised below:

Table 2 A comprehensive list of meta-analysis results for the overall association of the variants with lung cancer, showing the crude odds ratio (OR), 95% confidence interval (CI), pFDR (Benjamini–Hochberg false discovery rate (FDR)-corrected p-value), and heterogeneity indices H2 and I2. The genetic model and the meta-analysis model used are also indicated.

CYP1A1/rs1048943 (dominant model)

Using the FE model, we found a significant association of the variant with overall lung cancer risk (AG + GG vs. AA: OR = 2.07, 95% CI = 1.49–2.87, p = 0.00002, pFDR = 0.07, H2 = 2.17, pHet = 0.06, I2 = 53.92%) (Table 2), which remained significant after Benjamini–Hochberg FDR correction (pFDR = 0.07). Significant heterogeneity was observed for rs1048943; the RE estimates also showed a nominal association of the variant with lung cancer (AG + GG vs. AA: OR = 2.03, 95% CI = 1.23–3.30, p = 0.004) (Table 2 and Fig. 1). Inspection of the funnel plot showed no evidence of publication bias (Fig. 1).
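The heterogeneity indices reported alongside each estimate are linked to Cochran's Q. Assuming the DerSimonian–Laird definitions used by metafor, with k studies and df = k − 1:

```latex
H^2 = \frac{Q}{\mathrm{df}}, \qquad
I^2 = \max\!\left(0,\; \frac{Q - \mathrm{df}}{Q}\right), \qquad
\text{hence, for } Q > \mathrm{df}: \quad H^2 = \frac{1}{1 - I^2}.
```

As a check, I² = 53.92% for rs1048943/CYP1A1 gives H² = 1/(1 − 0.5392) ≈ 2.17, matching the value reported next to I² for this variant.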

Figure 1

(A) Forest plot depicting the odds ratios (ORs) and 95% CI of rs1048943/CYP1A1 for its association with overall lung cancer risk in the Indian subcontinent, in the random-effects model. (A’) Funnel plot showing no evidence of publication bias among the studies reporting rs1048943/CYP1A1. (B) Forest plot depicting the ORs and 95% CI of rs4646903/CYP1A1 for its association with overall lung cancer risk in the Indian subcontinent, in the fixed-effect model. (B’) Funnel plot showing no evidence of publication bias among the studies reporting rs4646903/CYP1A1. All results were obtained under the dominant model. Forest plots are shown for significant associations (*p < 0.05). The figures were generated with the ‘metafor’ package (http://www.metafor-project.org) of R software (https://cran.r-project.org/).

For rs1048943/CYP1A1, most of the signal was driven by a single study31, which could explain the observed between-study heterogeneity. A meta-analysis excluding this study still showed a nominal association of rs1048943/CYP1A1 with overall lung cancer (AG + GG vs AA: OR = 1.67, 95% CI = 1.18–2.25; p = 0.003), without significant heterogeneity.
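The sensitivity analysis above, re-pooling after excluding one influential study, generalises to a leave-one-out loop. A minimal Python sketch under the FE (inverse-variance) model; the function names and example data are hypothetical, and metafor offers a comparable built-in, `leave1out()`.

```python
import math

def fe_pool(log_ors, ses):
    """Inverse-variance fixed-effect pooled log OR and its SE."""
    w = [1 / s ** 2 for s in ses]
    est = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    return est, math.sqrt(1 / sum(w))

def leave_one_out(log_ors, ses):
    """Re-pool after dropping each study in turn.

    Returns a list of (index of dropped study, pooled OR, two-sided p-value).
    """
    results = []
    for drop in range(len(log_ors)):
        ys = [y for i, y in enumerate(log_ors) if i != drop]
        ss = [s for i, s in enumerate(ses) if i != drop]
        est, se = fe_pool(ys, ss)
        p = math.erfc(abs(est / se) / math.sqrt(2))
        results.append((drop, math.exp(est), p))
    return results

# Hypothetical log ORs: the third study is much larger in effect than the others
res = leave_one_out([0.1, 0.2, 1.5], [0.2, 0.2, 0.2])
```

A study whose removal shifts the pooled OR markedly, as reported above for reference 31, is a natural candidate source of heterogeneity.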

Association with lung cancer histological subtypes

In the FE model, rs1048943 was significantly associated with lung adenocarcinoma (AG + GG vs. AA: OR = 3.38, 95% CI = 1.63–6.25, p = 0.0001, H2 = 1.99, pHet = 0.16, I2 = 49.65%) (Table 3 and Fig. 2) and lung squamous cell carcinoma (AG + GG vs. AA: OR = 3.83, 95% CI = 2.15–6.82, p = 0.000005, H2 = 2.72, pHet = 0.07, I2 = 63.17%) (Table 3). As significant heterogeneity was observed, we also fitted the RE model, which showed a significant association of the variant with lung squamous cell carcinoma only (AG + GG vs. AA: OR = 3.91, 95% CI = 1.49–10.19, p = 0.005) (Table 3 and Fig. 2).

Table 3 Results of the histological subtype-stratified meta-analysis of reported lung cancer variants in the Indian subcontinent.
Figure 2

All panels refer to studies from the Indian subcontinent. (A) Forest plot depicting the odds ratios (ORs) and 95% CI of del1/GSTT1 for its association with lung adenocarcinoma in the recessive model. (B) Forest plot depicting the ORs and 95% CI of rs4646903/CYP1A1 for its association with squamous cell carcinoma in the dominant model. (C) Forest plot depicting the ORs and 95% CI of rs1048943/CYP1A1 for its association with adenocarcinoma in the dominant model. (C’) Forest plot depicting the ORs and 95% CI of rs1048943/CYP1A1 for its association with squamous cell carcinoma in the dominant model. Panels (A–C) show FE-model analyses, while (C’) shows the RE-model analysis. Forest plots are shown for significant associations (*p < 0.05). The figures were generated with the ‘metafor’ package (http://www.metafor-project.org) of R software (https://cran.r-project.org/).

Smoking status-stratified subgroup analysis

In the smoking status-stratified meta-analysis, rs1048943 was significantly associated with lung cancer in both the “Smoker” (AG + GG vs. AA: OR = 2.26, 95% CI = 1.44–3.53, p = 0.0004, H2 = 1.74, pHet = 0.17, I2 = 42.63%) and “Non-Smoker” (AG + GG vs. AA: OR = 1.75, 95% CI = 1.11–2.76, p = 0.02, H2 = 1.35, pHet = 0.49, I2 = 26.06%) subgroups (Table 4 and Fig. 3), with a stronger effect in smokers. However, meta-regression showed no significant effect modification by smoking (Table 5). This suggests that rs1048943 may affect the metabolism of xenobiotics to which both smokers and non-smokers are exposed, which might explain why its association is not restricted to smokers.

Table 4 The results of the subgroup analysis stratified by smoking status.
Figure 3

Forest plots depicting the odds ratios (ORs) and 95% CI of rs4646903/CYP1A1 for its association with lung cancer in the dominant model in the Indian subcontinent among (A) smokers and (A’) non-smokers, and (A’’) combined forest plot of rs4646903/CYP1A1 stratified by smoking status. Forest plots depicting the ORs and 95% CI of rs1048943/CYP1A1 for its association with lung cancer in the dominant model in the Indian subcontinent among (B) smokers and (B’) non-smokers, and (B’’) combined forest plot of rs1048943/CYP1A1 stratified by smoking status. Forest plots are shown for significant associations (*p < 0.05). The figures were generated with the ‘metafor’ package (http://www.metafor-project.org) of R software (https://cran.r-project.org/).

Table 5 The results of the effect modification of variants on lung cancer by smoking status.

CYP1A1/rs4646903 (dominant model)

We found a significant association of the variant with overall lung cancer risk without significant heterogeneity (TC + CC vs TT: OR = 1.48, 95% CI = 1.93–1.95, p = 0.005, pFDR = 0.07, H2 = 1.19, pHet = 0.84, I2 = 15.98%) (Table 2 and Fig. 1). The association remained significant after Benjamini–Hochberg FDR correction (pFDR = 0.07). We found no significant publication bias from inspection of the funnel plot (Fig. 1) or Egger’s test (p = 0.35).

Association with lung cancer histological subtypes

Our analysis found a significant association of the variant with lung squamous cell carcinoma only, without heterogeneity (TC + CC vs TT: OR = 1.82, 95% CI = 1.06–3.11, p = 0.03, H2 = 1.13, pHet = 0.71, I2 = 11.38%) (Table 3 and Fig. 2).

Smoking status-stratified subgroup analysis

The variant was significantly associated with lung cancer in the smoker subgroup only (TC + CC vs TT: OR = 2.26, 95% CI = 1.44–3.53, p = 0.0004, H2 = 1.74, pHet = 0.17, I2 = 42.63%) (Table 4 and Fig. 3). Meta-regression showed significant effect modification by smoking (p = 0.01) (Table 5).

GSTT1/del1 (recessive model)

The FE meta-analysis showed a nominal association of the variant with overall lung cancer (Null (−/−) vs Present [(+/−) + (+/+)]: OR = 1.36, 95% CI = 1.03–1.79, p = 0.028, pFDR = 0.42, H2 = 1.45, pHet = 0.49, I2 = 31.1%) without evidence of heterogeneity (Table 2 and Supplementary Information, Fig. S4). We found no significant publication bias from inspection of the funnel plot (Supplementary Information, Fig. S4) or Egger’s test (p = 0.93).

Association with lung cancer histological subtypes

Meta-analysis stratified by histological subtype showed an association of the variant with lung adenocarcinoma only (Null (−/−) vs Present [(+/−) + (+/+)]: OR = 2.14, 95% CI = 1.04–4.41, p = 0.04, H2 = 1.01, pHet = 0.91, I2 = 0.79%) (Table 3 and Fig. 2).

Smoking status-stratified subgroup analysis

We found no significant association of the variant with the smoking status-stratified subgroups, i.e. Smokers and Non-Smokers (Table 4 and Fig. 3). However, it showed significant effect modification by smoking in a meta-regression analysis (p = 0.015) (Table 5).

GSTM1/del2 (recessive model)

Using FE meta-analysis, we found a nominal association of the variant with overall lung cancer (Null (−/−) vs Present [(+/−) + (+/+)]: OR = 1.38, 95% CI = 1.09–1.75, p = 0.008, pFDR = 0.42, H2 = 1.88, pHet = 0.29, I2 = 46.9%) without significant heterogeneity (Table 2 and Supplementary Information, Fig. S5). The study by Bhardwaj et al. (2018) appeared to be an outlier but did not cause significant heterogeneity, as its small sample size gave it little weight in the pooled estimate. We found no significant publication bias from inspection of the funnel plot (Supplementary Information, Fig. S5) or Egger’s test (p = 0.93).

Association with lung cancer histological subtypes

We found no association of the variant with any of the histological subtypes included in the study (Table 3).

Smoking status-stratified subgroup analysis

We found no significant association of the variant with any smoking status-stratified subgroups, i.e. Smokers and Non-Smokers (Table 4). The meta-regression analysis showed no significant effect modification by smoking (Table 5).

GSTP1/rs1695 (dominant model)

We found a marginal association of the variant with overall lung cancer risk in the FE meta-analysis (GG + AG vs AA: OR = 1.84, 95% CI = 1.07–3.16, p = 0.03, pFDR = 0.15, H2 = 5.02, pHet = 0.007, I2 = 80.08%) (Table 2). Because of the significant heterogeneity, we also performed an RE meta-analysis, which showed no association of rs1695 with lung cancer (Table 2 and Supplementary Information, Fig. S6). Publication bias could not be reliably assessed because of the small number of studies, and subgroup analysis was not performed for lack of sufficient data.

DKK2/rs17037102 (additive and dominant models)

Similarly, in the FE meta-analysis, rs17037102 showed a marginal association with overall lung cancer risk without significant heterogeneity under both the additive (AA vs AG vs GG: OR = 1.81, 95% CI = 1.03–3.15, p = 0.04, pFDR = 0.53, H2 = 1.00, pHet = 0.99, I2 = 0.00002%) and the dominant (AA vs AG + GG: OR = 1.82, 95% CI = 1.03–3.22, p = 0.04, pFDR = 0.16, H2 = 1.00, pHet = 0.99, I2 = 0.004%) models (Table 2 and Supplementary Information, Fig. S7). Subgroup analysis and assessment of publication bias were not performed for lack of sufficient data.

Summary of the significant findings

  • GSTT1 (del1): associated with overall lung cancer risk and adenocarcinoma, and showed significant effect modification by smoking status.

  • CYP1A1 (rs4646903): associated with overall lung cancer risk, squamous cell carcinoma, and lung cancer in smokers, and showed significant effect modification by smoking status.

  • CYP1A1 (rs1048943): associated with overall lung cancer risk, with both adenocarcinoma and squamous cell carcinoma, and in both smoker and non-smoker subgroups.

  • GSTM1 (del2), GSTP1 (rs1695), and DKK2 (rs17037102): nominally associated with overall lung cancer risk in the FE model.

Association of rs1048943/CYP1A1 with lung cancer in a case–control dataset of East Indian population

Detailed demographic and clinical attributes of the East Indian study sample are summarised in Supplementary Information, Table S2. Of the two variants that remained associated in the meta-analysis after FDR correction, rs4646903/CYP1A1 and rs1048943/CYP1A1, the latter showed significant heterogeneity (H2 = 1.93, I2 = 48.32%, p = 0.092). We hypothesised that this heterogeneity might be explained by covariate-specific and subgroup-stratified analyses, since a major source of heterogeneity in crude analyses is the uneven distribution of confounders and subgroups across studies. Hence, to understand the source of heterogeneity, we genotyped this variant in a case–control dataset of smokers comprising 101 lung cancer cases and 413 healthy controls from a representative East Indian population. The effects of relevant covariates, such as age, sex and pack-years of smoking, on lung cancer risk were also evaluated.

Genotyping and quality control

We genotyped rs1048943 in our sample set; a representative image of the RFLP analysis is shown in Supplementary Information, Fig. S8. As a genotyping quality check, we assessed HWE in the controls, and the genotype distribution was consistent with HWE (p > 0.05). The variant showed no significant association with lung cancer among smokers in the dominant model (GG + GA vs AA: OR = 1.33, 95% CI = 0.825–2.16; padj = 0.24), adjusted for age, ethnicity, smoking intensity in pack-years, alcohol consumption, tobacco and betel quid chewing, and asbestos exposure (Supplementary Information, Table S3).

Covariate-stratified subgroup analysis

Using covariate-stratified subgroup analysis, we found no significant association of rs1048943 in any of the covariate subgroups (Supplementary Information, Table S4).

Association with lung cancer histological subtypes

We found a nominal association of rs1048943/CYP1A1 with lung adenocarcinoma (ADC) among smokers under the dominant model (GG + GA vs AA: OR = 1.99; 95% CI = 1.10–3.63; p = 0.024) (Supplementary Information, Table S5). An age-adjusted analysis restricted to males showed a significant association of rs1048943 (CYP1A1) with lung adenocarcinoma among smokers (OR = 2.97, 95% CI = 1.35–6.69, p = 0.007) (Supplementary Information, Table S5). Further, after adjusting for age, pack-years of smoking and ethnicity, rs1048943 remained significantly associated with adenocarcinoma in smokers (OR = 2.08, 95% CI = 1.08–4.02, p = 0.03). Interestingly, age had a negative effect (β = −0.14) on adenocarcinoma risk in smokers, suggesting that rs1048943 (CYP1A1) may confer a higher risk of lung adenocarcinoma in younger male smokers. Thus, the results indicate a potential role of rs1048943/CYP1A1 in a specific histological subtype of lung cancer, reflecting the genetic heterogeneity and variability of the disease.

Meta-analysis of rs1048943 (CYP1A1) in world population

A literature search with the specified keywords returned 2617 hits for rs1048943 (CYP1A1) published worldwide up to 31st December 2019. Adding our own case–control study brought the total to 2618. Following the inclusion/exclusion criteria, 40 studies with 10,458 cases and 10,871 controls were selected for the meta-analysis (Supplementary Information, Fig. S9). All covariate and demographic data for rs1048943 are listed in Supplementary Information, Table S6.

In the FE meta-analysis, we found a nominal association of rs1048943 with overall lung cancer risk (AG + GG vs AA: OR = 1.21, 95% CI = 1.04–1.41, p = 0.01, H2 = 1.74, pHet = 0.08, I2 = 42.67%) (Supplementary Information, Table S7 and Fig. S10). As there was evidence of heterogeneity, we also performed an RE meta-analysis, which showed no association of rs1048943 with overall lung cancer risk (AG + GG vs AA: OR = 1.20, 95% CI = 0.98–1.47, p = 0.07) (Supplementary Information, Table S7 and Fig. S10). The RE model generally has lower power than the FE model when between-study heterogeneity is present, which could explain the lack of a significant association of rs1048943/CYP1A1 with overall lung cancer. No significant publication bias was observed from inspection of the funnel plot (Supplementary Information, Fig. S10) or Egger’s test (p = 0.23). We further stratified the crude genotype counts of rs1048943/CYP1A1 by the geographical region/country of the selected studies and found a significant association (p < 0.05) of the variant with lung cancer in the Indian and Australian populations (Supplementary Information, Table S8 and Fig. S11).

To study the association of the variant in more homogeneous strata, we performed histology and smoking status–stratified subgroup analysis as given below.

Association with lung cancer histological subtypes

In the FE model, rs1048943 was significantly associated with lung adenocarcinoma (AG + GG vs AA: OR = 1.35, 95% CI = 1.03–1.77, p = 0.028, H2 = 1.98, pHet = 0.032, I2 = 49.55%) and lung squamous cell carcinoma (AG + GG vs AA: OR = 1.50, 95% CI = 1.14–1.99, p = 0.004, H2 = 2.05, pHet = 0.02, I2 = 51.33%). The RE meta-analysis showed a significant association with lung squamous cell carcinoma only (AG + GG vs AA: OR = 1.53, 95% CI = 1.02–2.30, p = 0.04) (Supplementary Information, Table S9 and Fig. S12).

Smoking status-stratified subgroup analysis

A significant association of rs1048943 was observed in the “Smoker” subgroup (AG + GG vs AA: OR = 1.57, 95% CI = 1.16–2.11, p = 0.003, Q2 = 2.33, pHet = 0.22, I² = 57.14%), whereas the association in the “Non-Smoker” subgroup was of borderline significance (AG + GG vs AA: OR = 1.39, 95% CI = 0.99–1.93, p = 0.051, Q2 = 1.41, pHet = 0.42, I² = 29.54%) (Supplementary Information, Table S10 and Fig. S13). There was no evidence of effect modification by smoking (p = 0.59) (Supplementary Information, Table S11).
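The effect-modification p-value above comes from the authors' meta-regression (Supplementary Information, Table S11). A simpler, commonly used stand-in is a z-test for the difference between two independent subgroup log ORs. The hedged sketch below applies it to the pooled smoker and non-smoker estimates reported above, recovering each standard error from its rounded 95% CI. This is a swapped-in technique rather than the authors' procedure, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # ~1.96


def se_from_ci(lower, upper):
    """Recover the SE of a log OR from the bounds of its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z975)


def subgroup_difference_p(or1, ci1, or2, ci2):
    """Two-sided z-test for a difference between two independent subgroup log ORs."""
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    z = (math.log(or1) - math.log(or2)) / math.sqrt(se1 ** 2 + se2 ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Pooled subgroup estimates for rs1048943 (AG + GG vs AA): smokers vs non-smokers
p = subgroup_difference_p(1.57, (1.16, 2.11), 1.39, (0.99, 1.93))
print(f"p = {p:.2f}")
```

With these inputs the script prints `p = 0.59`, close to the meta-regression result; because the published CI bounds are rounded, the recovered SEs (and hence p) are approximate.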

Discussion

Our study presents the first comprehensive meta-analysis of 18 variants of 11 genes across 39 studies from the Indian subcontinent, providing insight into the combined effect of each variant on overall and covariate-stratified lung cancer risk in the region. The absence of significant publication bias suggests that the results are unlikely to have been inflated by selective publication. Although GWAS data mining revealed no significant association of rs1048943/CYP1A1 with lung cancer, it showed a significant association of the CYP1A1 gene with hypertension and habitual coffee consumption. This raises the possibility that the association of the variant with lung cancer is modified by coffee consumption or tobacco smoking. The variant rs1048943/CYP1A1 has also been associated with lung cancer risk in East Asians51, consistent with the present findings. The CYP1A1 (Cytochrome P450, family 1, member A1; 15q22-24) gene encodes a phase I xenobiotic-metabolising enzyme of the endoplasmic reticulum that is present in lung tissue. The enzyme catalyses the metabolic activation of procarcinogens, including polycyclic aromatic hydrocarbons (PAHs) such as benzo[a]pyrene present in tobacco smoke, into reactive electrophilic intermediates52. These intermediates promote the formation of DNA adducts, a genotoxic effect that can lead to DNA lesions and, ultimately, lung cancer. The variant rs1048943A > G of the CYP1A1 locus causes a single amino acid substitution (Ile > Val) in the heme-binding region, which increases enzyme activity and thereby enhances the activation of procarcinogens in tobacco smoke. Through this effect on the metabolism of environmental carcinogens such as those in tobacco smoke, the variant can modify lung cancer susceptibility52.
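Egger's test, used in the results above to assess publication bias for rs1048943 (p = 0.23), regresses each study's standardised effect (log OR / SE) on its precision (1 / SE); an intercept far from zero suggests funnel-plot asymmetry, such as small-study effects. The following is a minimal sketch of the standard form of the test, assuming per-study log ORs and SEs are available. The function name `egger_test` is hypothetical, and the software the authors used is not specified in the source.

```python
import math


def egger_test(log_ors, ses):
    """Hypothetical helper: Egger's regression of (log OR / SE) on (1 / SE) by OLS.

    Returns the intercept, its standard error, the t statistic and df = k - 2.
    """
    k = len(log_ors)
    if k < 3 or k != len(ses):
        raise ValueError("need >= 3 studies, each with a standard error")
    x = [1.0 / se for se in ses]                  # precision
    z = [y / se for y, se in zip(log_ors, ses)]   # standardised effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same SE; the regression is undefined")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    df = k - 2
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    # Standard OLS formula for the standard error of the intercept
    se_int = math.sqrt(rss / df * (1.0 / k + x_bar ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "se": se_int, "t": t, "df": df}
```

A two-sided p-value can then be taken from a t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.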

The glutathione-S-transferase (GST) superfamily comprises multifunctional enzymes that catalyse the conjugation of the reduced tripeptide glutathione to various electrophilic and hydrophobic substrates, resulting in their detoxification and efficient elimination from the cell. GSTs thus help to reduce the cellular load of carcinogens accumulated through smoking. The null genotypes of the deletion polymorphisms of glutathione-S-transferase theta 1 (GSTT1) and glutathione-S-transferase mu 1 (GSTM1) are frequently associated with lung cancer, with evidence of effect modification by tobacco smoking. Because the null genotype results in a lack of the enzyme within the cell, it confers a higher risk of lung cancer. Inconsistent reports on the association of the GSTT1 (del1) and GSTM1 (del2) null genotypes have left their true effect on disease pathogenesis unclear53,54. Ethnic/racial differences in the association of the GSTT1 null genotype with lung cancer have been reported, with the frequency of the null genotype being significantly higher in Asians than in Caucasians55.

The gene Dickkopf-related protein 2 (DKK2) encodes a secreted protein of the Dickkopf family. DKK2 contains two cysteine-rich regions and is involved in embryonic development through the Wnt/β-catenin signalling pathway. Depending on the cellular context and the presence of the co-factor Kremen2, DKK2 can act either as an agonist or as an antagonist of the Wnt/β-catenin signalling pathway56. Its activity is modulated by the Wnt co-receptors low-density lipoprotein receptor-related proteins 5 (LRP5) and 6 (LRP6)57. Aberrant expression of DKK2 has been observed in many tumours, including epigenetic silencing of DKK2 expression in ovarian carcinoma58, hepatocellular carcinoma59 and renal carcinoma60. RNAi-mediated silencing of DKK2 is frequently observed in tongue squamous cell carcinoma61 and oesophageal adenocarcinoma62. These reports suggest an anti-tumour role for DKK2. However, upregulation of DKK2 promotes cell proliferation and invasion through the Wnt signalling pathway in prostate cancer63, Ewing sarcoma64 and colorectal cancer65. As these examples show, the function of DKK2 is complex and depends on the cellular context. DKK2 also promotes angiogenesis through a mechanism distinct from VEGF-dependent angiogenesis66, forming more closely interconnected vessels.

Interestingly, Dkk2-induced blood vessels consistently show higher coverage of endothelial cells (ECs) by pericytes and smooth muscle cells (SMCs), which contribute to vessel maturity and stability. Dkk2-mediated angiogenesis proceeds through a signalling cascade involving LRP6-mediated activation of APC/Asef2/Cdc42. DKK2 also promotes tumour progression by suppressing the activation of cytotoxic immune cells in colorectal carcinoma67 and NSCLC68 with APC mutations. In a recent study26, the heterozygous genotypes of rs17037102/DKK2 and rs419558/DKK2 conferred an increased risk of lung cancer, and a combination of all three genotypic variants of DKK2 conferred a four-fold increase in lung cancer risk.

The protein encoded by XRCC1 (X-ray repair cross-complementing 1; 19q13.31) mediates the efficient repair of single-strand DNA breaks caused by exposure to ionising radiation and alkylating agents. XRCC1 interacts with DNA ligase III, DNA polymerase beta and poly(ADP-ribose) polymerase to participate in base excision repair. The protein also plays a role in DNA processing during meiosis and in DNA recombination in germinal epithelial cells. Moreover, XRCC1 harbours a rare microsatellite polymorphism that is associated with varying radiosensitivity in cancer69. Polymorphisms of XRCC1, such as Arg194Trp (exon 7), Arg280His (exon 10) and Arg399Gln (exon 11), have been reported to confer an increased risk of lung cancer70,71,72, although the findings are inconsistent across populations73,74,75,76,77.

Analysis by histological subtype showed that del1/GSTT1 was associated with lung adenocarcinoma, rs4646903/CYP1A1 with lung squamous cell carcinoma, and rs1048943/CYP1A1 with both lung adenocarcinoma and lung squamous cell carcinoma. Stratifying genotypes by histological subtype, particularly with adjustment for age, pack-years of smoking and ethnicity, could therefore improve risk assessment. Identifying subtype-specific genetic risk markers would help to design targeted early detection and prevention strategies. Moreover, histotype-associated genetic markers may help to clarify the currently unknown mechanisms underlying morphological variation in lung cancer and support the development of personalised treatment modalities for subtype-specific lung cancer cases.

Furthermore, subgroup analysis of 4 variants stratified by smoking status revealed that rs1048943 of the CYP1A1 gene was significantly associated with lung cancer in both smokers and non-smokers. However, meta-regression analysis revealed no significant effect modification by smoking, implying that smoking status does not modify the association of this polymorphism with lung cancer. The variant rs4646903 of the CYP1A1 gene was associated with lung cancer in smokers only. Interestingly, meta-regression analysis revealed significant effect modification by smoking for del1 of the GSTT1 gene and rs4646903 of the CYP1A1 gene, suggesting that smoking status modifies the lung cancer risk conferred by these variants.

Meta-regression analysis showed no significant effect modification for the remaining variants, although interactions in the biological mechanisms leading to lung cancer cannot be ruled out. The variant rs1048943 could be involved in the metabolism of xenobiotics to which both smokers and non-smokers are exposed, which might explain such confounding effects. We therefore consider the variant likely to be biologically relevant in lung carcinogenesis, but more extensive analysis of covariates, including smoking, in larger samples is required to dissect its actual effect. Subgroup analyses based on other covariates, such as age, sex, ethnicity, exposure type and dose, were not performed owing to a lack of sufficient reports from the population of the Indian subcontinent. The Indian subcontinent has a highly heterogeneous population with considerable admixture among different ethnicities, which could alter the linkage disequilibrium structure of the population78 and contribute to significant heterogeneity between studies.

The variant rs1048943A > G of the CYP1A1 locus is a non-synonymous polymorphism that has been individually associated with lung cancer risk in various populations31,79,80. A case–control analysis of smokers from the East Indian population, performed after the meta-analysis, revealed no association of rs1048943A > G of CYP1A1 with overall lung cancer risk among smokers. However, rs1048943 showed a significant association with lung adenocarcinoma in smokers after adjustment for various covariates. Our case–control study thus identifies rs1048943/CYP1A1 as a histological subtype-specific variant for lung cancer in the East Indian population, which could inform personalised therapy and histology-specific drug design for lung cancer patients. The lack of association with overall risk is consistent with the outcome of the current meta-analysis: the included studies from the eastern region of the subcontinent also show a lack of association of the risk genotype (GG) of rs1048943/CYP1A1 (Table 1 and Supplementary Information, Table S1)81,82. Our replication meta-analysis of the worldwide population, with higher statistical power, supports a role for rs1048943 (CYP1A1) in conferring lung cancer risk among smokers. Interestingly, rs1048943 (CYP1A1) shows no effect modification by smoking status on lung cancer risk. This suggests either that the association in smokers arose by chance or that the variant is involved in the metabolism of other xenobiotics in both smokers and non-smokers, producing this confounding effect of smoking.

In the larger worldwide sample, rs1048943 (CYP1A1) was associated with lung squamous cell carcinoma, whereas in the East Indian case–control study it was associated with lung adenocarcinoma, suggesting a population-specific effect of the variant on different histological subtypes of lung cancer. The associations of rs1048943 (CYP1A1) observed across various populations indicate that the variant contributes to lung cancer risk in a population-specific manner, which could be critical in designing personalised treatment and precision medicine for patients of diverse populations.

Interestingly, of the 11 genes selected for meta-analysis, 5 belong to the xenobiotic metabolism pathway, 3 to the DNA repair pathway and 3 to the Wnt/β-catenin pathway, which regulate various physiological aspects of lung cancer. The xenobiotic metabolism and DNA repair pathways could be the key ‘modifier’ and ‘driver’ pathways leading to altered gene–environment interaction and lung cancer development. Genes of the xenobiotic metabolism pathway metabolise and detoxify tobacco smoke components, reducing the intracellular carcinogenic load. Some genes of this pathway, however, also bioactivate procarcinogens into potent carcinogens that readily form DNA adducts, leading to mutagenesis. Genes of the DNA repair pathway repair DNA damage induced by tobacco smoking and radiation. Detailed text mining of the available reports that met the inclusion criteria revealed associations of xenobiotic metabolism genes (XMG) and DNA repair genes (DRG) with the risk of lung cancer development.

In the current study, we could not perform subgroup and meta-regression analyses for other covariate risk factors for all variants, nor adjust for any covariates in the meta-analysis, owing to insufficient data in the selected studies. The subtype-specific polymorphic variants identified in the current meta-analysis could contribute to the development of personalised therapy and precision medicine. Identifying, through meta-analysis, genetic variants with evidence of an influence on lung cancer risk may provide new insights into the fundamental biological pathways involved in lung cancer development and guide future research. Furthermore, identifying lung cancer risk variants may help to derive risk scores for accurate population risk stratification and decision-making, which could be of value in targeting primary prevention and lung cancer screening in a population-specific manner.