Rare deleterious germline variants and risk of lung cancer

Abstract

Recent studies suggest that rare variants exhibit stronger effect sizes and might play a crucial role in the etiology of lung cancers (LC). Whole exome plus targeted sequencing of germline DNA was performed on 1045 LC cases and 885 controls in the discovery set. To unveil the inherited causal variants, we focused on rare and predicted deleterious variants and small indels enriched in cases or controls. Promising candidates were further validated in a series of 26,803 LCs and 555,107 controls. During discovery, we identified 25 rare deleterious variants associated with LC susceptibility, including 13 reported in ClinVar. Of the five validated candidates, we discovered two pathogenic variants in known LC susceptibility loci, ATM p.V2716A (Odds Ratio [OR] 19.55, 95%CI 5.04–75.6) and MPZL2 p.I24M frameshift deletion (OR 3.88, 95%CI 1.71–8.8); and three in novel LC susceptibility genes, POMC c.*28delT at 3′ UTR (OR 4.33, 95%CI 2.03–9.24), STAU2 p.N364M frameshift deletion (OR 4.48, 95%CI 1.73–11.55), and MLNR p.Q334V frameshift deletion (OR 2.69, 95%CI 1.33–5.43). The potential cancer-promoting role of selected candidate genes and variants was further supported by endogenous DNA damage assays. Our analyses led to the identification of new rare deleterious variants associated with LC susceptibility. However, in-depth mechanistic studies are still needed to evaluate the pathogenic effects of these specific alleles.

Introduction

Lung cancer (LC), the leading cause of cancer mortality in the US, has recently shown substantial drops in mortality, largely attributed to reduced smoking rates and improvements in treatment such as immunotherapy1. Prior genome-wide association studies (GWAS) identified novel genetic factors influencing LC risk, which are sometimes modulated by smoking behavior2. Notably, in the 15q25.1 region that shows the most significant and consistent genetic signal, a missense variant p.D398N and a 22-bp deletion (del) in the core promoter region of CHRNA5 have been identified that affect its function and expression3,4. Carriers of these variants find quitting smoking more difficult than noncarriers5 and may benefit from a targeted smoking cessation intervention6.

Previous studies have estimated the heritability of LC to be 18%7. Recent genetic studies suggest that rare variants (minor allele frequency [MAF] < 1%) that are functionally deleterious exhibit far larger effect sizes than common variants8,9,10, as they display signs of stronger selective pressure11,12, and could account for missing heritability unexplained by common variants11. Fewer than 3% of protein-coding single nucleotide variants (SNVs), corresponding to approximately 300 genes per genome, are predicted to result in loss of protein function (LoF) through the introduction of a stop-gain, a frameshift, or the disruption of an essential splice site13. Insertions (ins) and deletions (del), collectively indels, have been understudied, though they are the second most abundant source of human genetic variation. Selected indels have been identified as playing a key role in causing LC, such as p.E746_A750del in EGFR14,15,16.

Supporting the hypothesis that deleterious mutations will show lower MAF are recent identifications of several rare missense variants that have a moderate-to-large effect on LC risk, for example, PARK2 p.R275W (OR 5.24)17, BRCA2 p.K3326X (OR 2.47), CHEK2 p.I157T (OR 0.38)18, LTB p.L87F (OR 7.52), P3H2 p.Q185H (OR 5.39)19, DBH p.V26M20, and ATM p.L2307F (OR 8.82)21. Because of the stronger evolutionary pressure and weak linkage disequilibrium (LD) with common SNPs used in GWAS, finding these rare variants through population-based studies can be challenging22. To maximize the potential for the detection of large-effect, rare deleterious variants (SNVs and small indels ≤21 bp), we employed whole exome sequencing (WES) plus targeted sequencing on healthy controls and selected high-risk LC cases enriched with the highest genetic risk of LC, for example, early-onset or family history of LC (FHLC)7,23,24.

Results

Demographics of study subjects

As shown in Table 1, the vast majority of subjects in the discovery study, Transdisciplinary Research in Cancer of the Lung (TRICL; 1,045 LCs vs. 885 controls), and in the validation sets (26,803 LCs and 555,107 controls) were of European descent (Supplementary Fig. 1). LC cases were significantly more likely than controls to be smokers and to have higher pack-years (P-value < 0.0001). The TRICL and Genetic Epidemiology of LC Consortium (GELCC) cases were enriched for FHLC.

Table 1 Basic characteristics of LC cases and controls in the discovery and validation sets.

Identification of rare and deleterious variants in the TRICL discovery set

In the discovery set, a total of 2,182,753 variants were detected. Applying a three-step filtering method based on allele frequency (MAF < 1% in the non-Finnish European [NFE] population from the Genome Aggregation Database [gnomAD]), variant class (missense, protein-truncating, and regulatory), and functional effects (predicted deleterious and/or with clinical significance from ClinVar), we identified 67,470 rare and putatively deleterious variants: 63% missense, 16% frameshift (fs), 12% in-frame indels, 6% regulatory (untranslated region [UTR] and splice acceptor/donor), and 3% stop-gain. Single variant association analysis identified 75 potential candidates.

Given the known challenge of excessive false-positive indel detection rates caused by the high frequency of homopolymer-associated sequencing errors25,26,27,28, we subjected these 75 potential candidates to additional filtering and manual inspection using Genome Browser (Supplementary Table 1). Twenty-five of the 75 were high-confidence putative candidates (two SNVs, four ins, and 19 del). Supplementary Fig. 2 shows the variant visualization map for the candidates and variant carriers (read alignment and depth). Thirteen out of the 25 candidates (in 24 genes) reported clinical significance in ClinVar, and eight were classified as pathogenic. Also, 5/24 genes were mapped to known LC-GWAS loci, such as 3q28 TP6329, 5q31.1 TXNDC1530, 11q22.3 ATM21, 11q23.3 MPZL231, and 22q12.1 CHEK218. Three mapped in known GWAS loci for COPD/ PF (pulmonary function): 1p34.3 BMP8A32,33, 1p36.31 PHF1332, and 14q23.1 TALPID3/KIAA058634.

We next assessed the dose-effect of the 25 candidates: 16 were enriched in LCs (risk-conferring alleles) and 9 were enriched in controls (protective alleles). Compared with subjects with zero risk and protective alleles, the groups carrying one risk allele and two risk alleles (5 LCs) showed a progressively increased risk, whereas the groups carrying one protective allele and two protective alleles (6 controls) demonstrated a gradually reduced risk (Supplementary Table 2). All 6 controls harbored MOB3A p.F69_I75del, whereas 4/5 LCs harbored STAU2 p.N364M fs*67del.

When we examined the demographics of the mutation carriers, there was no significant difference in smoking (status and pack-years) or FHLC between carriers and non-carriers. Notably, 5/6 carriers of two protective alleles were male, whereas 4/5 carriers of two risk alleles were female and had adenocarcinoma (AD). Overall, age did not differ significantly between carriers and non-carriers (Supplementary Fig. 3). However, among LC cases, the age at onset of risk-allele carriers (54 yrs for carriers of two risk alleles, 62 yrs for carriers of one risk allele) was significantly younger than that of protective-allele carriers (69 yrs; Supplementary Table 2).

Further gene-environment (G×E) interaction analysis showed that two variants interacted with smoking behavior (Supplementary Table 1). Specifically, the risk variant MLNR p.Q334V fs*3del interacted with pack-years (P-value 0.0035); the protective effect associated with MOB3A p.F69_I75del was substantial and significant among males (10/11 control carriers were male, whereas 0/2 LC carriers were male; P-value 0.042), among smokers (6/11 control carriers were smokers, whereas 0/2 LC carriers were smokers; P-value 0.016), and for pack-years (P-value 0.0036). We also found that the protective variant TXNDC15 p.E9G fs*68del interacted with FHLC, as 5/7 LC carriers had FHLC, compared to 0/21 controls (P-value 0.035).

We subsequently conducted gene-based rare variant burden tests for the 24 genes harboring potential candidates. Five genes, namely MLNR, CCDC105, BMP8A, MME, and NPHP3, showed suggestive associations (Table 2). We also performed exome-wide gene-based tests; however, none showed a strong association after multiple testing corrections (Supplementary Fig. 4).

Table 2 Gene-based association tests in the TRICL study, ranked by P-value from the combined multivariate and collapsing test.

Meta-analyses of the discovery and validation sets

In the seven validation datasets, of the 25 candidates, 100% were covered by the gnomAD, 22 (88%) in TCGA, 16 (64%) in COPDGene, nine (36%) in GELCC, and nine (36%) were covered in one of the three case–control studies (OncoArray, Affymetrix, and UKB) with genotyping data. Table 3 summarizes the top five candidates with consistent associations from the meta-analysis.

Table 3 Top five hits from discovery and validation association analysis.

The topmost risk-conferring variant is a missense SNV, p.V2716A, in the phosphatidylinositol 3-kinase (PI3K) catalytic domain of ATM (Ataxia telangiectasia mutated; OMIM 607585, UniProt Q13315). This pathogenic variant (rs587782652) is exceedingly rare in the gnomAD, with MAF 0.0021% and 0.0054% in non-cancer controls and NFE population, respectively. In our combined datasets, this variant presented in 0.05% of LCs and 0.003% controls, with remarkably high effect sizes (OR 19.55, 95%CI 5.04–75.6; P-value 1.7e-05). LC carriers of this variant were predominately enriched in smokers (8/9 carriers), AD (7/9 carriers), and early-onset (6/9 carriers; mean 55 yrs). Further, four additional rare deleterious variants were observed in ATM (Fig. 1 and Supplementary Table 3). No LD is present among these variants and the candidate p.V2716A (Supplementary Table 4).
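As a consistency check on the statistics reported for ATM p.V2716A, the standard error of the log odds ratio can be recovered from the 95% confidence interval and converted into a Wald z statistic and a two-sided P-value. The sketch below assumes that the interval is a symmetric Wald interval on the log scale, which the text does not state explicitly; the function name `wald_from_ci` is illustrative only. Under that assumption, the reported OR and interval give a P-value close to the reported 1.7e-05.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log OR), the Wald z statistic, and the two-sided P-value
    from an odds ratio and its 95% confidence interval.

    Assumption: the interval is exp(log OR +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# Values reported in the text for ATM p.V2716A.
se, z, p_value = wald_from_ci(19.55, 5.04, 75.6)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p_value:.1e}")
```

The same function can be applied to the other validated candidates, for example `wald_from_ci(4.33, 2.03, 9.24)` for POMC c.*28delT.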

Fig. 1: Gene exons, protein domains, and rare deleterious variants of the candidate genes.

The top five candidate variants (red arrows): 1) POMC c.*28 deletion (del) located at target sites of several miRNAs in the 3′ UTR; 2) STAU2 p.N364M fs*67del located in the double-stranded RNA-binding motif (dsrm), and next to a phosphorylation site p.S363; 3) ATM p.V2716A located in the PI3-kinase (PI3K) catalytic domain; 4) MPZL2 p.I24M fs*22del located close to the antibody variable domain of immunoglobulins (Ig-V); 5) MLNR p.Q334V fs*3del located in the transmembrane receptor domain (TM), and close to a phosphorylation site p.S327. The colored vertical bars represent different types of variants: ClinVar pathogenic variants (bold blue: POMC W84* stop-gain, ATM Q414* stop-gain, and MPZL2 M1T* start-loss), previously reported LC-associated variants (blue: ATM P1054R and L2307F, and MPZL2 deletion rs13915729), and ClinVar variants of uncertain significance (black). Gene exons (green blocks), introns (horizontal green lines), untranslated regions (UTRs, orange blocks), and protein domains/motifs (framed rectangles) are shown. The length of the gene (kb) and protein (number of amino acids, AA) are shown to the right.

The second risk variant is c.*28delT in the 3′ UTR of POMC (Pro-opiomelanocortin; OMIM 176830, UniProt P01189). The MAF of this 2 bp del (rs756770132) was 0.086%/0.17% in gnomAD non-cancer/NFE controls, while in our dataset it was present in 0.66% of LCs and 0.15% of controls, conferring a 4-fold risk for carriers (OR 4.33, 95%CI 2.03–9.24; P-value 0.00015). Although reported as a VUS in ClinVar, this 3′ UTR del is located at a critical site computationally predicted by TargetScan35 to be a target of several miRNAs, including hsa-miR-149-3p and hsa-miR-625-5p. We also observed four additional rare deleterious variants in the TRICL set (Fig. 1 and Supplementary Table 3).

The third novel risk variant is p.N364M fs*67del in STAU2 (Staufen homolog 2; OMIM 605920, UniProt Q9NUL3). This del (rs746501298) is very rare in gnomAD (MAF 0.011%/0.0027% in non-cancer/NFE population controls), but was present in 1.02% of LCs and 0.02% of non-cancer controls (OR 4.48, 95%CI 1.73–11.55; P-value 0.0019). It was predicted to disrupt the double-stranded RNA-binding motif (DSRM; Fig. 1), which plays a critical role in RNA editing. This del is also reported in the Catalogue of Somatic Mutations In Cancer (COSMIC, # COSM253104).

The fourth and fifth variants are two pathogenic, truncating deletions, p.I24M fs*22del (rs752672077) in MPZL2 (Myelin protein zero-like protein 2, or Epithelial v-like antigen 1 [EVA1]; OMIM 604873, UniProt O60487) and p.Q334V fs*3del (rs563947699) in MLNR (Motilin receptor; OMIM 602885, UniProt O43193), with effect sizes (OR) of 3.88 (95% CI 1.71–8.8) and 2.69 (95% CI 1.33–5.43), respectively. The MPZL2 deletion is close to the immunoglobulin-like antibody variable domain (Ig-V; Fig. 1), which is involved in thymocyte development36. In gnomAD, the MAF of the MPZL2 deletion was higher in the Ashkenazi Jewish population (AJ, 0.38%) than in other populations, including NFE (0.123%), Latino (0.028%), and African (0.012%). Additionally, a start-loss p.M1T of MPZL2 was present in two LCs (Fig. 1 and Supplementary Table 3).

Other interesting candidates from the discovery (Supplementary Table 1) include 1) two VUS ins, TP63 c.*2550insT (rs772929136) and CHEK2 c.*2insC (rs749257861), both located in the 3′ UTR, for which no genotype data/coverage were available in the validation sets; and 2) a pathogenic variant with a protective effect, CHEK2 p.S428F (rs137853011), that was non-significant in the meta-analysis (OR 0.41, 95% CI 0.13–1.31, P-value 0.13).

Candidate gene prioritization

As shown in Table 2, of the 24 candidate genes, the most evolutionarily constrained (intolerant) genes, with the lowest LoF observed/expected (o/e) values, were PHF13, TP63, and STAU2, whereas the genes with the highest LC-correlated PhoRank scores were CHEK2, ATM, TP63, and MME. The most interesting protein interaction network consists of eight genes and is centered on three known DNA damage response genes, CHEK2-ATM-TP63, linking five other genes (Supplementary Fig. 5). GO enrichment analysis highlighted genes involved in replicative senescence (which triggers a DNA damage response), whereas KEGG pathway analysis revealed that genes were involved in small cell LC (Supplementary Table 5).

Endogenous DNA damage assay

Large conserved networks of E. coli and human proteins were recently discovered to promote endogenous DNA damage when overproduced37. These networks are known as DNA damageome proteins (DDPs)37. The DNA damageome also includes LoF variants that show DNA damage-up phenotypes38, most of which are not directly related to DNA repair but rather participate in the production of DNA damage. We selected six prioritized genes for the assay: CHEK2, ATM, MPZL2, MLNR, POMC, and MME. We discovered that knockdown of five genes, as well as overproduction of the mutant MLNR p.Q334V fs*3del and of wildtype POMC, promotes DNA damage. Specifically, we first used pooled small interfering RNAs (siRNAs) that minimize off-target effects, and observed significantly increased DNA damage levels (γH2AX) for 5/6 genes (Fig. 2a–c), including two well-known DNA repair genes (CHEK2 and ATM) and three newly discovered DDPs (POMC, MLNR, and MME). By contrast, the knockdown of MPZL2 did not affect DNA damage. For the three newly discovered DDPs, we further validated their DNA damage phenotypes using different individual siRNAs (Fig. 2d–f). Moreover, overproducing the mutant MLNR p.Q334V fs*3del and the wildtype POMC open reading frame (ORF) from a plasmid promoted DNA damage in the lung fibroblast-derived cell line (Fig. 2g–i).

Fig. 2: Discovery of DNA damageome genes/proteins and variants.

a siRNA knockdown endogenous DNA damage assay scheme. b Increased DNA damage (γH2AX) levels in five of the six gene knockdowns (mean ± SEM, n = 2–4), MLNR, CHEK2, POMC, ATM, and MME, compared with the non-targeting (NT) siRNA control. There is no increase in DNA damage in MPZL2 knockdown cells. c Representative flow histograms showing higher γH2AX levels in gene knockdowns. d–f MLNR, MME, and POMC knockdown by two individual siRNAs confirmed the DNA damage-up phenotypes observed with pooled siRNAs in b. DNA damage quantified by d median fluorescence intensity or e DNA-damage positive subpopulation. f Examples of flow cytometry dot plots showing the DNA-damage positive subpopulation. g Overproduction endogenous DNA damage assay scheme. h Wildtype POMC and mutant MLNR p.Q334V fs*3del overproduction promote DNA damage. GFP-Tubulin as a control. i Representative histograms of (g). *P-value < 0.05, **P-value < 0.01, n.s. not significant (P-value > 0.05).

Discussion

Our analyses led to the identification of 25 rare deleterious candidates (in 24 genes) that may be associated with LC susceptibility. Of the five validated variants, we rediscovered two pathogenic variants mapped to known LC susceptibility loci, ATM p.V2716A and MPZL2 p.I24M fs*22del; and identified three deletions in novel LC susceptibility genes, POMC 3′ UTR c.*28delT, STAU2 p.N364M fs*67del, and MLNR p.Q334V fs*3del. Our GxE analysis also suggests some of these associations may be further modified by smoking (MLNR p.Q334V fs*3del and MOB3A p.F69_I75del) and FHLC (TXNDC15 p.E9G fs*68del). Additionally, our assays of cellular DNA damage identified POMC and MLNR as part of the DNA damageome, and confirmed a double-strand break repair role of ATM.

This study confirms a robust association between LC susceptibility and ATM and identifies a new pathogenic variant, p.V2716A, that resides in the PI3K catalytic domain. We also found that this association is more evident in AD, which is consistent with several previous studies21,39,40. ATM is a critical first responder to DNA damage in the cell and is essential for genome stability. Several association studies have indicated that common variants of ATM are linked to cancer susceptibility, including LC41,42,43. Expression of the PI3K domain in ataxia-telangiectasia cells resulted in complemented radiosensitivity and reduced chromosomal breakage after irradiation44,45,46, suggesting that the PI3K domain contains much of the significant activity of ATM47. Our DNA damage assay also shows elevated DNA damage in lung fibroblasts, confirming the previous finding that ATM-defective cells accumulate more double-strand breaks48. Further, the presence of additional rare deleterious variants, together with the previously identified p.P1054R31 and p.L2307F21, strongly suggests that the ATM gene plays a role in LC susceptibility.

Another known LC locus we rediscovered is MPZL2 (also called Epithelial v-like antigen 1, EVA), through the pathogenic frameshift p.I24M fs*22del. MPZL2 is located at 11q23.3, a known GWAS locus for LC31,49 and hearing loss50,51. MPZL2 is one of the top candidate target genes at this locus based on expression quantitative trait loci (eQTL) mapping31. MPZL2 is a member of the immunoglobulin superfamily, preferentially expressed in lung and thymus epithelium, with a potential role as a favorable prognostic marker in thyroid cancer52. Interestingly, the MAF of p.I24M fs*22del in the AJ population was 5-fold higher than in the general population in gnomAD. There are several examples where rare causal variants (e.g., variants in P53, CFTR, and BRCA1/2) have higher frequencies within the AJ population53,54,55,56. In our DNA damage assay, MPZL2 expression levels did not affect endogenous DNA damage in lung fibroblasts, implying the need to investigate alternative mechanisms in future functional studies.

The most consistent and interesting findings are two new deletions: POMC 3′ UTR c.*28delT and MLNR p.Q334V fs*3del. POMC encodes a polypeptide hormone precursor that regulates energy metabolism, nicotine-induced weight loss, and immune reactions57,58,59. In particular, POMC plays a role in UV-induced DNA damage through interactions with TP53 and is associated with skin cancer susceptibility60,61,62,63,64. Abnormal expression of POMC was a poor prognostic marker for LC65,66,67,68. Using in vitro models, Derghal et al. evaluated putative miRNAs (i.e., miR-383, miR-384-3p, and miR-488) and found that they physically bind to the 3′ UTR of the mRNA and regulate POMC expression in several neuronal subtypes69. Our DNA damage assay showed that both downregulation and overproduction of wildtype POMC promote endogenous DNA damage. Whether and how c.*28delT affects POMC expression, and its putative role in LC risk, merit further mechanistic investigation. MLNR is a member of the G-protein coupled receptor 1 family and is known for regulating gastrointestinal activity70. MLNR variants and dysregulation have been implicated in lung occult small cell carcinoma, bile duct cancer71, and head and neck cancer72. Our overproduction results for MLNR p.Q334V fs*3del suggest a dominant-negative role in terms of DNA damage promotion. Collectively, these findings suggest that POMC and MLNR, while both function in multiple cellular processes, might also share effects on DNA damage.

Although the association of the pathogenic variant CHEK2 p.S428F with lower LC risk is not statistically significant in the meta-analysis, its protective effect is consistent with that of another known pathogenic low-frequency variant, CHEK2 p.I157T, which is associated with reduced risk of smoking-related cancers (lung, laryngeal, urinary, and upper aerodigestive tract)18,73,74,75. In contrast, both p.I157T and p.S428F showed an increased risk of breast cancer75,76,77,78,79. The mechanism underlying this effect remains an open question, perhaps related to smoking exposure and cell cycle checkpoint signaling/apoptosis75. STAU2 is a double-stranded RNA-binding protein and a major regulator of mRNA transport, decay, and translation80. It was reported that STAU2 downregulation enhances levels of DNA damage (γH2AX) and promotes apoptosis (PARP1 cleavage) in camptothecin-treated cells81,82. The role of STAU2 in LC requires future investigation.

A main strength of the study is the focus on LC patients with extreme phenotypes of known risk factors (i.e., early-onset, FHLC, or familial cases in high-risk families), which provides >5 times the statistical power10. Another strength is the relatively large sample size, which is, to our knowledge, by far the largest collection for LC rare variant analysis. It should be noted, however, that our study still has limited power to detect associations for ultra-rare variants and for those candidates (16/25) that could not be assessed in the validation. Third, our exome plus customized captures (50 Mb + 250 kb) in the discovery offer an efficient method for analyzing known susceptibility regions at greater depth and better coverage, particularly for indels that are often poorly captured in GWAS. Last, we have focused on the investigation of predicted LoF variants, which provide directionality of effect. Notably, 14/25 candidates we identified were frameshift deletions that result in either truncated proteins or nonsense-mediated mRNA decay. In the discovery, we observed non-coding variants that reside in regulatory regions and may influence target gene expression; however, the lack of population frequency information and insufficient coverage in the validation limit our ability to explore this aspect for some non-coding variants.

There are various challenges in using gnomAD as controls, including the lack of individual-level data, the inability to perform G×E interaction and gene-burden tests, and differences in platforms/coverage. Additionally, the proportion of non-white individuals differed between TCGA cases (27%) and gnomAD controls (30%), which could bias effect sizes in the meta-analysis. Genetic ancestry analysis shows that 90% of TCGA-LCs were inferred to be of genetic European ancestry83. However, it is possible that a small portion of European-ancestry TCGA patients are of AJ origin, given that 7% of ovarian cancer84 and 24% of endometrial cancer85 patients are of AJ heritage. It is of note that in our dataset, none of the variant allele carriers of the 25 candidates were found to have African ancestry. Therefore, we expect this potential population stratification effect on rare variant associations to be relatively small, particularly in non-Africans that have not experienced severe population bottlenecks86,87,88.

Although we demonstrated a strong joint effect of the 25 potential candidates (Supplementary Table 2), it is challenging to detect tissue-specific eQTL effects, identify mutational signatures, or construct a polygenic risk score (PRS) based on these rare or ultra-rare candidates, due to their low frequencies and weak LD with other rare variants or with common variants. We found some lung-tissue-specific eQTL variants from The Genotype-Tissue Expression project (GTEx): three SNPs for ATM, 61 SNPs for POMC, 75 SNPs for MPZL2, and 141 SNPs for STAU2; but none of them overlap or are in LD with the 25 candidates we are reporting. Future studies could integrate single-cell transcriptomic sequencing and epigenomic maps in cells and tissues relevant to LC, to establish mutation signatures (i.e., DNA mismatch repair) and explore the application of PRS to clinical care.

In conclusion, our results provide evidence that rare deleterious variants with moderate to large effect sizes, in particular ATM p.V2716A, MPZL2 p.I24M fs*22del, STAU2 p.N364M fs*67del, POMC 3′ UTR c.*28delT, and MLNR p.Q334V fs*3del, contribute to LC susceptibility. Additional targeted studies using CRISPR/Cas9 mutagenesis could be performed for each variant, to evaluate more comprehensively what its effects are on gene functions and the underlying molecular mechanisms. Future extremely large-scale multi-ancestry studies may also provide additional opportunities to assess ancestry-specific predisposing variants, and discover new genetic alterations with relatively large attributable risk for LC.

Methods

Study population in the discovery set

The discovery set included 1094 LC cases and 933 controls from the TRICL study89. All study subjects and biospecimens were collected with informed consent under institutional review board (IRB) approved protocols. Subjects were selected from four sites: Harvard School of Public Health (HSPH), International Agency for Research on Cancer (IARC), University of Liverpool, and Mount Sinai Hospital and Princess Margaret Hospital (MSH-PMH) in Toronto89. Cases were selected because they reported FHLC (first-degree) or were early-onset (<60 yrs) or had specimens available (Table 1). Never smokers were defined as persons who had smoked fewer than 100 cigarettes in their lifetimes. The ethnicities were inferred using FastPop90.

WES and variant calling in the discovery set

WES was performed using Agilent SureSelect v5 captures (50 Mb, Agilent Technologies) and a custom capture targeting known LC-GWAS regions91,92 (250 kb). Germline DNA was sequenced at the Center for Inherited Disease Research. The mean on-target coverage was 52x for each sequencing experiment and greater than 97% of on-target bases had a depth greater than 10x. Sequence reads were mapped to the human reference GRCh37/hg19 using the Burrows-Wheeler Aligner. SNVs and indels were called based on the union of raw GATK v3.3-0 and Atlas2 calls. The QC process involved the following user-definable criteria: i) low-complexity repeats and segmental duplications were filtered out; ii) quality score ≥20, depth ≥10, and AB ≥ 0.2 for heterozygous calls; iii) call rate ≥0.85; and iv) samples with abnormal heterozygosity rate, sex discordance, <95% completion rates, and unexpected relatedness (identity-by-state >10%) were filtered out.
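The genotype-level and variant-level thresholds in criteria ii) and iii) can be illustrated with a minimal sketch. Several details are assumptions not stated in the text: AB is read here as the fraction of reads supporting the alternate allele, genotypes that fail the genotype-level thresholds are counted as missing when computing the call rate, and the `GenotypeCall` record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenotypeCall:
    quality: float       # quality score of the call
    depth: int           # read depth at the site
    alt_fraction: float  # assumed meaning of AB: alt-supporting read fraction
    is_het: bool         # True for a heterozygous call


def passes_genotype_qc(call, min_quality=20, min_depth=10, min_ab=0.2):
    """Apply criterion ii): quality >= 20, depth >= 10, AB >= 0.2 for hets."""
    if call.quality < min_quality or call.depth < min_depth:
        return False
    if call.is_het and call.alt_fraction < min_ab:
        return False
    return True


def passes_variant_qc(calls: List[Optional[GenotypeCall]], min_call_rate=0.85):
    """Apply criterion iii): call rate >= 0.85 across samples.

    Assumption: missing (None) and QC-failing genotypes both count as uncalled.
    """
    if not calls:
        return False
    called = sum(1 for c in calls if c is not None and passes_genotype_qc(c))
    return called / len(calls) >= min_call_rate


good = GenotypeCall(quality=45, depth=30, alt_fraction=0.48, is_het=True)
skewed = GenotypeCall(quality=45, depth=30, alt_fraction=0.10, is_het=True)
print(passes_genotype_qc(good), passes_genotype_qc(skewed))
```

Region-based filtering (criterion i) and sample-level QC (criterion iv) need external annotations and cohort-wide statistics, so they are left out of this sketch.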

Rare variant filtering and functional annotation in the discovery set

Following variant calling, rare variants were further enriched by applying three steps: i) variants with MAF < 1% in gnomAD (NFE ancestry, v2.1); ii) variant class, including missense, protein-truncating, and regulatory; and iii) mutation effects, i.e., the variant results in protein truncation and is predicted to be deleterious by 4/6 prediction tools (SIFT, Polyphen-2, MutationTaster, MutationAssessor, FATHMM, and FATHMM-MKL). The miRNAs putatively bound to the sequence containing UTR variants were identified using TargetScan35. We additionally incorporated rare variants classified as pathogenic, likely pathogenic, or VUS in the ClinVar database, which compiles clinically observed human variants.
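The three-step filter can be sketched as a single predicate over annotated variants. The record layout (`nfe_maf`, `variant_class`, `predictions`, `clinvar`) is hypothetical, and two points are assumptions rather than the authors' stated rules: a variant absent from gnomAD is treated as rare, and step iii) is read as "protein-truncating, or called deleterious by at least 4 of the 6 tools, or listed in ClinVar as pathogenic, likely pathogenic, or VUS".

```python
TOOLS = ("SIFT", "Polyphen-2", "MutationTaster",
         "MutationAssessor", "FATHMM", "FATHMM-MKL")
KEPT_CLASSES = {"missense", "protein_truncating", "regulatory"}
KEPT_CLINVAR = {"pathogenic", "likely_pathogenic", "VUS"}


def is_rare_deleterious(variant, max_maf=0.01, min_tools=4):
    """Return True if a variant record survives all three filtering steps."""
    # Step i: rare in gnomAD NFE (assumption: missing frequency means rare).
    maf = variant.get("nfe_maf")
    if maf is not None and maf >= max_maf:
        return False
    # Step ii: keep only the three variant classes of interest.
    if variant["variant_class"] not in KEPT_CLASSES:
        return False
    # Step iii: functional effect (assumed reading, see the lead-in).
    votes = sum(1 for tool in TOOLS
                if variant.get("predictions", {}).get(tool) == "deleterious")
    predicted = variant["variant_class"] == "protein_truncating" or votes >= min_tools
    in_clinvar = variant.get("clinvar") in KEPT_CLINVAR
    return predicted or in_clinvar


# Hypothetical records, for illustration only.
example = {
    "nfe_maf": 0.0002,
    "variant_class": "missense",
    "predictions": {"SIFT": "deleterious", "Polyphen-2": "deleterious",
                    "MutationTaster": "deleterious", "FATHMM": "deleterious"},
}
print(is_rare_deleterious(example))
```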

Single variant association test in the discovery set

For variants derived from the above automated filtering schema, we conducted the association test using Fisher’s exact test. We used the Genome Browser (Golden Helix) visualization tool to verify the presence of the potential candidates in each carrier. By manual review of the variants’ coverage plot (read depth) and pile-up plot (read alignment), we ruled out low-confidence variants resulting from mapping error, strand bias, and weak exon conservation.
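For a single variant, Fisher's exact test operates on a 2×2 table of carriers and non-carriers among cases and controls. The following is a minimal standard-library sketch of the two-sided test, which sums the probabilities of all tables that are no more likely than the observed one. It is not the software used in the study, and the carrier counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    # Small relative tolerance so that ties with the observed table count.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(p_value, 1.0)


# Hypothetical counts: 12 carriers in 1045 cases, 2 carriers in 885 controls.
print(fisher_exact_two_sided(12, 1045 - 12, 2, 885 - 2))
```

Because rare variants produce tables with very small carrier counts, an exact test of this kind avoids the large-sample approximation behind a chi-square test.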

Gene–environment interaction and gene-based burden analysis in the discovery set

For the candidates identified from the association test, we performed G×E interaction analysis (i.e., age at onset, sex, smoking status, pack-years, and FHLC) using a mixed linear regression model. To measure the cumulative effect of the rare deleterious variants within a gene, we performed collapsing tests using the CMC and KBAC tests93,94.
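The idea behind a collapsing test can be shown with a deliberately simplified sketch: all rare deleterious variants in a gene are collapsed into one carrier indicator per subject, and carrier status is then compared between cases and controls. This is not the full CMC test (which combines collapsed groups in a multivariate statistic) and not KBAC (which applies kernel weights and permutation); the function names, the Haldane 0.5 correction, and the toy genotypes are assumptions made for illustration.

```python
def collapse_carriers(genotypes):
    """Map each sample to True if it carries any rare variant in the gene.

    genotypes: dict of sample id -> list of alternate-allele counts,
    one entry per rare deleterious variant in the gene.
    """
    return {sample: any(count > 0 for count in counts)
            for sample, counts in genotypes.items()}


def burden_table(carrier, is_case):
    """Build the 2x2 table (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    a = sum(1 for s in carrier if is_case[s] and carrier[s])
    b = sum(1 for s in carrier if is_case[s] and not carrier[s])
    c = sum(1 for s in carrier if not is_case[s] and carrier[s])
    d = sum(1 for s in carrier if not is_case[s] and not carrier[s])
    return a, b, c, d


def haldane_odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to every cell, which keeps it finite
    when a cell is zero (common for rare variants)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Toy data, for illustration only.
genotypes = {"s1": [0, 1, 0], "s2": [0, 0, 0], "s3": [1, 0, 0],
             "s4": [0, 0, 0], "s5": [0, 0, 2]}
is_case = {"s1": True, "s2": True, "s3": True, "s4": False, "s5": False}

table = burden_table(collapse_carriers(genotypes), is_case)
print(table, haldane_odds_ratio(*table))
```

The resulting table can be passed to an exact test, which is one common way to obtain a P-value for the collapsed indicator.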

Study populations in the validation sets and meta-analysis

The candidate variants were further examined in seven validation datasets, aggregated from different centers and across several platforms (four WES datasets and three genome-wide genotyping datasets, as shown in Table 1). We tabulated the variant carrier counts per candidate and performed meta-analyses using an inverse-variance-weighted fixed-effects model (which assumes that the true effect size is the same in all studies).

  1. GELCC study (Genetic Epidemiology of LC Consortium, 380 LCs): This included 122 familial and 258 sporadic LC cases. i) Familial LC Study Subjects (dbGaP phs000629.v1.p1). The familial cases were selected from high-risk LC families with at least two first-degree relatives affected with LC95. The GELCC study population and recruitment scheme have been described in detail previously96. Samples and data were collected by the familial LC recruitment sites of the GELCC, which included the University of Cincinnati, University of Colorado Health Science Center, Karmanos Cancer Institute at Wayne State University, Louisiana State University Health Sciences Center-New Orleans, Mayo Clinic, University of Toledo, Johns Hopkins University, and Saccomanno Research Institute. ii) Sporadic LC Study Subjects. The sporadic LC patients were selected from our previous WES study19,20, including samples from the HSPH, Baylor College of Medicine (BCM), and MD Anderson Cancer Center (MDACC). Germline DNA was sequenced utilizing NimbleGen VCRome 2.1 (Roche)19,20, and HumanOmniExpressExome (Illumina)95.

  2. TCGA (The Cancer Genome Atlas cohort, 1015 LCs): this public germline WES dataset includes non-tumor DNA from 577 AD and 438 SCC (dbGaP Phs000178.v9.p8), using Agilent SureSelect (Agilent Technologies) and NimbleGen SeqCap (Roche).

  3. COPDGene (Genetic Epidemiology of COPD Study97, 318 controls): controls were selected to be white smokers with normal lung function data (defined as post-bronchodilator Forced Expiratory Volume in 1 s [FEV1] ≥ 80% predicted, FEV1/FVC ≥ 0.7) and with smoking histories ≥10 pack-years; WES utilized NimbleGen VCRome 2.1 (Roche)19,20.

  4. GnomAD (the Genome Aggregation Database, 134,187 controls): we restricted our analyses to non-cancer individuals (excluding individuals from cancer cohort studies, such as the TCGA cohort), resulting in a data subset of 118,479 exomes and 15,708 whole genomes; multiple exome captures were utilized, including NimbleGen SeqCap (Roche), Agilent SureSelect (Agilent Technologies), and Illumina Exome BeadChip (Illumina).

  5. OncoArray case–control study (17,878 LCs vs. 13,425 controls; dbGaP phs001273): The OncoArray consortium is a network created to increase understanding of the genetic architecture of common cancers. We restricted our analyses to European descent subjects (Supplementary Fig. 1)98,99,100; participants were obtained from 29 LC studies across North America and Europe, and genotyped on the OncoArray-500K BeadChip (Illumina). There were 1162 participants in the OncoArray consortium who were also exome-sequenced in the TRICL discovery, and therefore these samples were excluded from the analysis in the validation phase.

  6. Affymetrix case–control studies (5364 LCs vs. 5724 controls; dbGaP phs001681.v1.p1). This large pooled sample was assembled from 10 independent case–control studies, which have been described previously99,101. Study participants were genotyped on an Axiom Exome Plus Array (Affymetrix)99,101, which contains a custom panel of key LC GWAS markers, and rare coding SNVs and indels102. There were 992 participants in the Affymetrix studies who were also exome-sequenced in the TRICL discovery, and therefore these samples were excluded from the analysis in the validation phase.

  7. UKB (UK Biobank cohort103; 2166 LCs vs. 401,453 controls): we restricted our analyses to non-cancer controls and LC cases; individuals were genotyped on the UK BiLEVE Axiom Array and the UK Biobank Axiom Array (Affymetrix)103,104.
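
The inverse-variance-weighted fixed-effects pooling described for the validation datasets can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the use of the Woolf standard error for the log odds ratio, and the 0.5 continuity correction applied when a cell is zero are all assumptions made for illustration, and the carrier counts in the usage lines are invented.

```python
import math

def log_or_and_se(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers,
                  correction=0.5):
    """Log odds ratio and Woolf standard error from a 2x2 carrier table.

    Assumption: if any cell is zero, `correction` is added to every cell.
    """
    cells = [case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers]
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def fixed_effects_meta(estimates, z=1.959964):
    """Pool (log_or, se) pairs with inverse-variance weights.

    Returns the pooled odds ratio and its 95% confidence limits.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se))

# Hypothetical carrier counts for one variant in two studies (not from the paper):
# (case carriers, case non-carriers, control carriers, control non-carriers)
studies = [(12, 988, 5, 1195), (8, 2158, 310, 401143)]
pooled_or, ci_low, ci_high = fixed_effects_meta([log_or_and_se(*s) for s in studies])
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Each study is weighted by the inverse of the variance of its log odds ratio, so large datasets with many carriers dominate the pooled estimate. That weighting is only appropriate under the stated assumption that all studies share a single true effect size.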

Gene prioritization based on functional annotations and protein interactions network

To further prioritize genes and candidates, we used three prioritization tools: 1) Gene evolutionary constraint to LoF variation, measured using the observed/expected (o/e) ratio from gnomAD. 2) The Phevor PhoRank algorithm105, which ranks genes based on their phenotypic relevance as defined by diverse biomedical ontologies. 3) A protein–protein interaction (PPI) network built using the STRING database106, with an interaction score cut-off of ≥0.15 (low confidence).

Functional evaluation of candidate genes using endogenous DNA damage assay

Endogenous DNA damage is proposed to drive cancers by genome instability — a hallmark of cancer37,38. To test whether knockdown or overexpression of the candidate genes or variants induces endogenous DNA damage, we performed flow cytometric assays to measure γH2AX levels, a DNA double-strand-break marker107, following siRNA knockdown and overproduction of GFP fusions of proteins of interest.

  1. Human cell lines and reagents. MRC5-SV40, a human lung fibroblast-derived cell line, was maintained in standard Dulbecco’s modified Eagle’s medium with 10% fetal bovine serum, 2 mM L-glutamine, 100 μg/mL penicillin, and 100 μg/mL streptomycin37,38. The cell line was authenticated by ATCC STR analysis and routinely checked to be mycoplasma-free. MLNR p.Q334V fs*3del, MME p.P156L fs, MPZL2 p.I24M fs*22del, and full-length wildtype POMC entry clones for Gateway cloning were synthesized, sequence-verified, and cloned into pDONR223 (Invitrogen) by GenScript. All the above clones were further subcloned into an N-terminal GFP-tagged vector (pcDNA6.2/N-EmGFP-DEST, Invitrogen), using Gateway LR Clonase II Enzyme Mix (Invitrogen). Overexpression plasmid transfections were performed using GenJet In Vitro DNA Transfection Reagent Ver. II (# SL100489, SignaGen). Non-targeting pool siRNA (D-001810-10), SMARTpool siRNAs each containing four targeting sequences of MME, MLNR, POMC, ATM, CHEK2, and MPZL2, and sets of 4 siRNAs targeting MME, MLNR, and POMC were purchased from Dharmacon. The target sequences for MME, MLNR, and POMC are as follows: #1 MME (GGAGGCUGGUUGAAACGUA), #2 MME (GAACCUAUAGGCCAGAGUA), #3 MME (AAAGAUGAGUGGAUAAGUG), #4 MME (GACAGCACCUUAAUGGAAU); #1 MLNR (GCGCUAACGUGAAGACGAU), #2 MLNR (GCGCAUCUAUCAACCCAAU), #3 MLNR (CAUCGUCGCUCUGCAACUU), #4 MLNR (GAAGAUUCGCGGAUGAUGU); #1 POMC (GACAAGCGCUACGGCGGUU), #2 POMC (CAGUGAAGGUGUACCCUAA), #3 POMC (GGCCGAGACUCCCAUGUUC), #4 POMC (CUACAAGAAGGGCGAGUGA). siRNA transfections were carried out with Lipofectamine RNAiMAX Transfection Reagent (#13778075, Invitrogen), following the manufacturer’s recommendations. SMARTpool ON-TARGETplus siRNA is designed and modified for greater specificity, reducing off-target effects by up to 90% through a dual-strand modification.

  2. Real-time quantitative reverse transcription PCR (RT-qPCR). Knockdown efficiency was quantified by RT-qPCR and is shown in Supplementary Fig. 6. The RNeasy mini kit (Qiagen #74106) was used to extract total RNA from cells 72 h post siRNA transfection or protein overproduction. 300 ng of total RNA from each sample was used to synthesize cDNA with the Superscript III first-strand synthesis system (Invitrogen, #18080051). The qPCR reactions were performed using iTaq Universal SYBR Green Supermix (BioRad #172-5121) on a QuantStudio 3 Real-Time PCR System (Applied Biosystems). For each gene, three replicates were analyzed and the average threshold cycle (Ct) was calculated. The relative expression levels were calculated with the 2^(−ΔΔCt) method108. Primers used included GAPDH (housekeeping gene) forward: CAA TGA CCC CTT CAT TGA CC; GAPDH reverse: GAT CTC GCT CCT GGA AGA TG; POMC forward: GCC AGT GTC AGG ACC TCA C; POMC reverse: GGG AAC ATG GGA GTC TCG G; CHEK2 forward: TCT CGG GAG TCG GAT GTT GAG; CHEK2 reverse: CCT GAG TGG ACA CTG TCT CTA A; ATM forward: GGC TAT TCA GTG TGC GAG ACA; ATM reverse: TGG CTC CTT TCG GAT GAT GGA; MPZL2 forward: TTA ATG GGA CAG ATG CTC GGT; MPZL2 reverse: AAG ACA CCC GGT CCT TAA ACC; MME forward: AGA AGA AAC AGC GAT GGA CTC C; MME reverse: CAT AGA GTG CGA TCA TTG TCA CA; MLNR forward (siRNA): CTG AGC GCA TCT ATC AAC CCA; MLNR reverse (siRNA): TCC CAT CGT CTT CAC GTT AGC; MLNR forward (overexpression): GTG GTG ACC GTG ATG CTG AT; MLNR reverse (overexpression): AGC AGG ATG AGT AGG TCG GA.

  3. Flow-cytometric DNA damage assays. Sensitive DNA damage assays by flow cytometry were performed as previously described37,38. γH2AX primary antibody (Sigma, Catalog #05-636) and goat anti-mouse secondary antibody, Alexa Fluor 647 (Thermo Fisher, Catalog #A21236) were used to stain cells. Stained cells were then analyzed on a BD LSRFortessa flow cytometer, and FCS files were analyzed with FlowJo 10.5 software. For siRNA experiments, cells were collected 72 h post transfection and median fluorescence intensity was quantified. To quantify the DNA-damage-positive subpopulations, the γH2AX threshold was set so that 0.5% of the mock cells fell within the positive gate, as previously demonstrated. The percentage of γH2AX-positive cells in each sample was calculated and compared to its corresponding non-targeting siRNA control. For overproduction experiments, mock-transfected cells were used to set the gates that define GFP-positive and γH2AX-positive cells, again with 0.5% of the mock cells gated as γH2AX-positive. The DNA-damage ratios after 72 h of protein overproduction were calculated as described. Briefly, the damage ratio is defined as (Q2/Q3)/(Q1/Q4), where Q2 is the portion of transfected, γH2AX-positive cells; Q3 is the portion of transfected, γH2AX-negative cells; Q1 is the portion of untransfected, γH2AX-positive cells; and Q4 is the portion of untransfected, γH2AX-negative cells. The DNA damage ratios for candidate protein overproduction were compared with that of GFP-Tubulin, as previously described.
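
The two quantities defined in the steps above, the 2^(−ΔΔCt) relative expression and the flow-cytometric damage ratio (Q2/Q3)/(Q1/Q4), can be written out as a short sketch. The function and parameter names are hypothetical, the Ct values and quadrant percentages in the usage lines are invented for illustration, and the sketch assumes that averaged Ct values and quadrant fractions have already been exported from the instrument software.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target gene) - Ct(reference gene, e.g. GAPDH), per condition
    ddCt = dCt(sample) - dCt(control, e.g. non-targeting siRNA)
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    return 2.0 ** -(d_ct_sample - d_ct_control)

def damage_ratio(q1, q2, q3, q4):
    """DNA-damage ratio (Q2/Q3)/(Q1/Q4) from the four flow-cytometry quadrants.

    q1: untransfected, gammaH2AX-positive    q2: transfected, gammaH2AX-positive
    q3: transfected, gammaH2AX-negative      q4: untransfected, gammaH2AX-negative
    Inputs may be fractions or percentages, as long as all four share one scale.
    """
    if q1 <= 0 or q3 <= 0 or q4 <= 0:
        raise ValueError("Q1, Q3 and Q4 must be positive to form the ratio")
    return (q2 / q3) / (q1 / q4)

# Invented example values, not taken from the study:
knockdown_level = relative_expression(26.0, 18.0, 24.0, 18.0)  # ddCt = 2 cycles
ratio = damage_ratio(q1=0.5, q2=2.0, q3=38.0, q4=59.5)
print(knockdown_level, ratio)
```

A damage ratio above 1 means that γH2AX-positive cells are enriched among transfected cells relative to untransfected cells in the same sample. Normalizing within the sample in this way makes the readout less sensitive to differences in transfection efficiency between constructs.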

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data generated and/or analyzed during the related study are described in the figshare metadata record: https://doi.org/10.6084/m9.figshare.13280387 (ref. 109). The data that support the findings of this study are available via the dbGaP (database of Genotypes and Phenotypes) repository. The data are controlled-access, so interested parties will need to request access; information on how to do so can be found on the pages linked below. The accession numbers are https://identifiers.org/dbgap:phs000878.v2.p1 (ref. 110) for the Transdisciplinary Research in Cancer of the Lung (TRICL) study, https://identifiers.org/dbgap:phs001273.v1.p1 (ref. 111) for the OncoArray study, https://identifiers.org/dbgap:phs001681.v1.p1 (ref. 112) for the Affymetrix study, https://identifiers.org/dbgap:phs000629.v1.p1 (ref. 113) for part of the Genetic Epidemiology of Lung Cancer Consortium (GELCC) study, and https://identifiers.org/dbgap:phs000178.v9.p8 (ref. 114) for The Cancer Genome Atlas (TCGA) study. Two files are not publicly available in order to protect patient privacy. These are: ‘TRICL WES.xlsx’ (underlying Supplementary Table 2 and Supplementary Fig. 3) and ‘TRICL WES.bam’ (underlying Supplementary Fig. 2). These data are only available to authorized researchers who have submitted an IRB application. Please email the corresponding author for access. Data underlying Supplementary Table 5 and Supplementary Fig. 5 are publicly available from the STRING (Search Tool for the Retrieval of Interacting Genes) website: http://string-db.org/. The file used in this study was ‘Protein-Protein Interaction Networks Functional Enrichment Analysis-STRING.txt’.

Other datasets used in this study are available as follows: the UKB dataset is accessible to approved researchers, upon application, through ukbgene at www.ukbiobank.ac.uk. The GnomAD dataset can be downloaded from the Genome Aggregation Database at https://gnomad.broadinstitute.org/.

References

  1. Rizvi, N. A. et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124–128 (2015).

  2. Bosse, Y. & Amos, C. I. A decade of GWAS results in lung cancer. Cancer Epidemiol. Biomark. Prev. 27, 363–379 (2018).

  3. Wei, C. et al. A case-control study of a sex-specific association between a 15q25 variant and lung cancer risk. Cancer Epidemiol. Biomark. Prev. 20, 2603–2609 (2011).

  4. Bierut, L. J. et al. Variants in nicotinic receptors and risk for nicotine dependence. Am. J. Psychiatry 165, 1163–1171 (2008).

  5. Chen, L. S. et al. CHRNA5 risk variant predicts delayed smoking cessation and earlier lung cancer diagnosis–a meta-analysis. J. Natl Cancer Inst. 107, djv100 (2015).

  6. Chen, L. S. et al. Interplay of genetic risk factors (CHRNA5-CHRNA3-CHRNB4) and cessation treatments in smoking cessation success. Am. J. Psychiatry 169, 735–742 (2012).

  7. Mucci, L. A. et al. Familial risk and heritability of cancer among twins in Nordic countries. JAMA 315, 68–76 (2016).

  8. Kang, G., Lin, D., Hakonarson, H. & Chen, J. Two-stage extreme phenotype sequencing design for discovering and testing common and rare genetic variants: efficiency and power. Hum. Hered. 73, 139–147 (2012).

  9. Lamina, C. Digging into the extremes: a useful approach for the analysis of rare variants with continuous traits? BMC Proc. 5(Suppl. 9), S105 (2011).

  10. Li, D., Lewinger, J. P., Gauderman, W. J., Murcray, C. E. & Conti, D. Using extreme phenotype sampling to identify the rare causal variants of quantitative traits in association studies. Genet. Epidemiol. 35, 790–799 (2011).

  11. Gorlov, I. P., Gorlova, O. Y., Sunyaev, S. R., Spitz, M. R. & Amos, C. I. Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am. J. Hum. Genet. 82, 100–112 (2008).

  12. Gorlov, I. P., Gorlova, O. Y., Frazier, M. L., Spitz, M. R. & Amos, C. I. Evolutionary evidence of the effect of rare variants on disease etiology. Clin. Genet. 79, 199–206 (2011).

  13. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).

  14. Choi, Y. W. et al. EGFR exon 19 deletion is associated with favorable overall survival after first-line gefitinib therapy in advanced non-small cell lung cancer patients. Am. J. Clin. Oncol. 41, 385–390 (2018).

  15. Sequist, L. V. et al. First-line gefitinib in patients with advanced non-small-cell lung cancer harboring somatic EGFR mutations. J. Clin. Oncol. 26, 2442–2449 (2008).

  16. Tian, Y. et al. Different subtypes of EGFR exon19 mutation can affect prognosis of patients with non-small cell lung adenocarcinoma. PLoS ONE 13, e0201682 (2018).

  17. Xiong, D. et al. A recurrent mutation in PARK2 is associated with familial lung cancer. Am. J. Hum. Genet. 96, 301–308 (2015).

  18. Wang, Y. et al. Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer. Nat. Genet. 46, 736–741 (2014).

  19. Liu, Y. et al. Rare variants in known susceptibility loci and their contribution to risk of lung cancer. J. Thorac. Oncol. 13, 1483–1495 (2018).

  20. Liu, Y. et al. Focused analysis of exome sequencing data for rare germline mutations in familial and sporadic lung cancer. J. Thorac. Oncol. 11, 52–61 (2016).

  21. Ji, X. et al. Protein-altering germline mutations implicate novel genes related to lung cancer development. Nat. Commun. 11, 2220 (2020).

  22. Peng, B., Li, B., Han, Y. & Amos, C. I. Power analysis for case-control association studies of samples with known family histories. Hum. Genet. 127, 699–704 (2010).

  23. Osann, K. E. Lung cancer in women: the importance of smoking, family history of cancer, and medical history of respiratory disease. Cancer Res. 51, 4893–4897 (1991).

  24. Cote, M. L. et al. Increased risk of lung cancer in individuals with a family history of the disease: a pooled analysis from the International Lung Cancer Consortium. Eur. J. Cancer 48, 1957–1968 (2012).

  25. Loman, N. J. et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30, 434–439 (2012).

  26. Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).

  27. Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 12, R112 (2011).

  28. Balzer, S., Malde, K. & Jonassen, I. Systematic exploration of error sources in pyrosequencing flowgram data. Bioinformatics 27, i304–i309 (2011).

  29. Wang, Y. et al. Deciphering associations for lung cancer risk through imputation and analysis of 12,316 cases and 16,831 controls. Eur. J. Hum. Genet. 23, 1723–1728 (2015).

  30. Dong, J. et al. Association analyses identify multiple new lung cancer susceptibility loci and their interactions with smoking in the Chinese population. Nat. Genet. 44, 895–899 (2012).

  31. McKay, J. D. et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet. 49, 1126–1132 (2017).

  32. Shrine, N. et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 51, 481–493 (2019).

  33. Zhu, Z. et al. Genetic overlap of chronic obstructive pulmonary disease and cardiovascular disease-related traits: a large-scale genome-wide cross-trait analysis. Respir. Res. 20, 64 (2019).

  34. Kichaev, G. et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 104, 65–75 (2019).

  35. Lewis, B. P., Burge, C. B. & Bartel, D. P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15–20 (2005).

  36. Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell Proteom. 13, 397–406 (2014).

  37. Xia, J. et al. Bacteria-to-human protein networks reveal origins of endogenous DNA damage. Cell 176, 127–143 e124 (2019).

  38. Bosse, Y. et al. Transcriptome-wide association study reveals candidate causal genes for lung cancer. Int. J. Cancer 146, 1862–1878 (2020).

  39. Selvan, M. E. et al. Inherited rare, deleterious variants in ATM increase lung adenocarcinoma risk. J. Thorac. Oncol. 15, 1871–1879 (2020).

  40. Parry, E. M. et al. Germline mutations in DNA repair genes in lung adenocarcinoma. J. Thorac. Oncol. 12, 1673–1678 (2017).

  41. Yang, H. et al. ATM sequence variants associate with susceptibility to non-small cell lung cancer. Int. J. Cancer 121, 2254–2259 (2007).

  42. Lo, Y. L. et al. ATM polymorphisms and risk of lung cancer among never smokers. Lung Cancer 69, 148–154 (2010).

  43. Hsia, T. C. et al. Effects of ataxia telangiectasia mutated (ATM) genotypes and smoking habits on lung cancer risk in Taiwan. Anticancer Res. 33, 4067–4071 (2013).

  44. Chenevix-Trench, G. et al. Dominant negative ATM mutations in breast cancer families. J. Natl Cancer Inst. 94, 205–215 (2002).

  45. Morgan, S. E., Lovly, C., Pandita, T. K., Shiloh, Y. & Kastan, M. B. Fragments of ATM which have dominant-negative or complementing activity. Mol. Cell Biol. 17, 2020–2029 (1997).

  46. Bakkenist, C. J. & Kastan, M. B. DNA damage activates ATM through intermolecular autophosphorylation and dimer dissociation. Nature 421, 499–506 (2003).

  47. Scott, S. P. et al. Missense mutations but not allelic variants alter the function of ATM by dominant interference in patients with breast cancer. Proc. Natl Acad. Sci. USA 99, 925–930 (2002).

  48. Kuhne, M. et al. A double-strand break repair defect in ATM-deficient cells contributes to radiosensitivity. Cancer Res. 64, 500–508 (2004).

  49. Dai, J. et al. Genome-wide association study of INDELs identified four novel susceptibility loci associated with lung cancer risk. Int. J. Cancer 146, 2855–2864 (2020).

  50. Bademci, G. et al. MPZL2 is a novel gene associated with autosomal recessive nonsyndromic moderate hearing loss. Hum. Genet. 137, 479–486 (2018).

  51. Wesdorp, M. et al. MPZL2, encoding the epithelial junctional protein myelin protein zero-like 2, is essential for hearing in man and mouse. Am. J. Hum. Genet. 103, 74–88 (2018).

  52. Guttinger, M. et al. Epithelial V-like antigen (EVA), a novel member of the immunoglobulin superfamily, expressed in embryonic epithelia with a potential role as homotypic adhesion molecule in thymus histogenesis. J. Cell Biol. 141, 1061–1071 (1998).

  53. Einhorn, Y. et al. Differential analysis of mutations in the Jewish population and their implications for diseases. Genet. Res. 99, e3 (2017).

  54. Shi, L. et al. Comprehensive population screening in the Ashkenazi Jewish population for recurrent disease-causing variants. Clin. Genet. 91, 599–604 (2017).

  55. Kerem, B., Chiba-Falek, O. & Kerem, E. Cystic fibrosis in Jews: frequency and mutation distribution. Genet. Test. 1, 35–39 (1997).

  56. Powers, J. et al. A rare TP53 mutation predominant in Ashkenazi Jews confers risk of multiple cancers. Cancer Res. 80, 3732–3744 (2020).

  57. Picciotto, M. R. & Mineur, Y. S. Molecules and circuits involved in nicotine addiction: the many faces of smoking. Neuropharmacology 76 Pt B, 545–553 (2014).

  58. Huang, H., Xu, Y. & van den Pol, A. N. Nicotine excites hypothalamic arcuate anorexigenic proopiomelanocortin neurons and orexigenic neuropeptide Y neurons: similarities and differences. J. Neurophysiol. 106, 1191–1202 (2011).

  59. Mineur, Y. S. et al. Nicotine decreases food intake through activation of POMC neurons. Science 332, 1330–1332 (2011).

  60. Wenczl, E. et al. (Pheo)melanin photosensitizes UVA-induced DNA damage in cultured human melanocytes. J. Invest. Dermatol. 111, 678–682 (1998).

  61. Cui, R. et al. Central role of p53 in the suntan response and pathologic hyperpigmentation. Cell 128, 853–864 (2007).

  62. Suzuki, I. et al. Increase of pro-opiomelanocortin mRNA prior to tyrosinase, tyrosinase-related protein 1, dopachrome tautomerase, Pmel-17/gp100, and P-protein mRNA in human skin after ultraviolet B irradiation. J. Invest. Dermatol. 118, 73–78 (2002).

  63. Slominski, A., Tobin, D. J. & Paus, R. Does p53 regulate skin pigmentation by controlling proopiomelanocortin gene transcription? Pigment Cell Res. 20, 307–308; author reply 309–310 (2007).

  64. Krude, H. et al. Severe early-onset obesity, adrenal insufficiency and red hair pigmentation caused by POMC mutations in humans. Nat. Genet. 19, 155–157 (1998).

  65. Tsai, H. E. et al. Downregulation of hepatoma-derived growth factor contributes to retarded lung metastasis via inhibition of epithelial-mesenchymal transition by systemic POMC gene delivery in melanoma. Mol. Cancer Ther. 12, 1016–1025 (2013).

  66. Stovold, R. et al. Neuroendocrine and epithelial phenotypes in small-cell lung cancer: implications for metastasis and survival in patients. Br. J. Cancer 108, 1704–1711 (2013).

  67. Meredith, S. L. et al. Irradiation decreases the neuroendocrine biomarker pro-opiomelanocortin in small cell lung cancer cells in vitro and in vivo. PLoS ONE 11, e0148404 (2016).

  68. Hao, L., Zhao, X., Zhang, B., Li, C. & Wang, C. Positive expression of pro-opiomelanocortin (POMC) is a novel independent poor prognostic marker in surgically resected non-small cell lung cancer. Tumour Biol. 36, 1811–1817 (2015).

  69. Derghal, A. et al. Leptin modulates the expression of miRNAs-targeting POMC mRNA by the JAK2-STAT3 and PI3K-Akt pathways. J. Clin. Med. 8, 2213–2224 (2019).

  70. Feighner, S. D. et al. Receptor for motilin identified in the human gastrointestinal system. Science 284, 2184–2188 (1999).

  71. Xu, H. L. et al. Variants in motilin, somatostatin and their receptor genes and risk of biliary tract cancers and stones in Shanghai, China. Meta Gene 2, 418–426 (2014).

  72. Misawa, K. et al. Neuropeptide receptor genes GHSR and NMUR1 are candidate epigenetic biomarkers and predictors for surgically treated patients with oropharyngeal cancer. Sci. Rep. 10, 1007 (2020).

  73. Delahaye-Sourdeix, M. et al. A rare truncating BRCA2 variant and genetic susceptibility to upper aerodigestive tract cancer. J. Natl Cancer Inst. 107, djv037 (2015).

  74. Cybulski, C. et al. Constitutional CHEK2 mutations are associated with a decreased risk of lung and laryngeal cancers. Carcinogenesis 29, 762–765 (2008).

  75. Brennan, P. et al. Uncommon CHEK2 mis-sense variant and reduced risk of tobacco-related cancers: case control study. Hum. Mol. Genet. 16, 1794–1801 (2007).

  76. Shaag, A. et al. Functional and genomic approaches reveal an ancient CHEK2 allele associated with breast cancer in the Ashkenazi Jewish population. Hum. Mol. Genet. 14, 555–563 (2005).

  77. Roeb, W., Higgins, J. & King, M. C. Response to DNA damage of CHEK2 missense mutations in familial breast cancer. Hum. Mol. Genet. 21, 2738–2744 (2012).

  78. Kilpivaara, O. et al. CHEK2 variant I157T may be associated with increased breast cancer risk. Int. J. Cancer 111, 543–547 (2004).

  79. Apostolou, P. & Papasotiriou, I. Current perspectives on CHEK2 mutations in breast cancer. Breast Cancer 9, 331–335 (2017).

  80. Furic, L., Maher-Laporte, M. & DesGroseillers, L. A genome-wide approach identifies distinct but overlapping subsets of cellular mRNAs associated with Staufen1- and Staufen2-containing ribonucleoprotein complexes. RNA 14, 324–335 (2008).

  81. Zhang, X. et al. The downregulation of the RNA-binding protein Staufen2 in response to DNA damage promotes apoptosis. Nucleic Acids Res. 44, 3695–3712 (2016).

  82. Conde, L., Beaujois, R. & DesGroseillers, L. STAU2 protein level is controlled by caspases and the CHK1 pathway and regulates cell cycle progression in the non-transformed hTERT-RPE1 cells. Preprint from Research Square, https://doi.org/10.21203/rs.21203.rs-60003/v21201 PPR: PPR206819 (2020).

  83. Yuan, J. et al. Integrated analysis of genetic ancestry and genomic alterations across cancers. Cancer Cell 34, 549–560.e549 (2018).

  84. Yang, D. et al. Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. JAMA 306, 1557–1565 (2011).

  85. Cadoo, K. A. Understanding inherited risk in unselected newly diagnosed patients with endometrial cancer. JCO Precis. Oncol. 3, 473–474 (2019).

  86. O’Connor, T. D. et al. Fine-scale patterns of population stratification confound rare variant association tests. PLoS ONE 8, e65834 (2013).

  87. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  88. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  89. Wang, Z. et al. Multi-omics analysis reveals a HIF network and Hub gene EPAS1 associated with lung adenocarcinoma. EBioMedicine, 93–101 (2018).

  90. Li, Y. et al. FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data. BMC Bioinform. 17, 122 (2016).

  91. Bainbridge, M. N. et al. Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol. 12, R68 (2011).

  92. Lupski, J. R. et al. Exome sequencing resolves apparent incidental findings and reveals further complexity of SH3TC2 variant alleles causing Charcot-Marie-Tooth neuropathy. Genome Med. 5, 57 (2013).

  93. Li, B. & Leal, S. M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008).

  94. Liu, D. J. & Leal, S. M. A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet. 6, e1001156 (2010).

  95. Musolf, A. M. et al. Whole exome sequencing of highly aggregated lung cancer families reveals linked loci for increased cancer risk on chromosomes 12q, 7p, and 4q. Cancer Epidemiol. Biomark. Prev. 29, 434–442 (2020).

  96. Liu, P. et al. Familial aggregation of common sequence variants on 15q24-25.1 in lung cancer. J. Natl Cancer Inst. 100, 1326–1330 (2008).

  97. Regan, E. A. et al. Genetic epidemiology of COPD (COPDGene) study design. COPD 7, 32–43 (2010).

  98. Ji, X. et al. Identification of susceptibility pathways for the role of chromosome 15q25.1 in modifying lung cancer risk. Nat. Commun. 9, 3221 (2018).

  99. Li, Y. et al. Genetic interaction analysis among oncogenesis-related genes revealed novel genes and networks in lung cancer development. Oncotarget 10, 1760–1774 (2019).

  100. Byun, J. et al. Genome-wide association study of familial lung cancer. Carcinogenesis 39, 1135–1140 (2018).

  101. Kachuri, L. et al. Fine mapping of chromosome 5p15.33 based on a targeted deep sequencing and high density genotyping identifies novel lung cancer susceptibility loci. Carcinogenesis 37, 96–105 (2016).

  102. Zuzarte, P. C. et al. A two-dimensional pooling strategy for rare variant detection on next-generation sequencing platforms. PLoS ONE 9, e93455 (2014).

  103. Matthews, P. M. & Sudlow, C. The UK Biobank. Brain 138, 3463–3465 (2015).

  104. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  105. Singleton, M. V. et al. Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am. J. Hum. Genet. 94, 599–610 (2014).

  106. Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).

  107. Kinner, A., Wu, W., Staudt, C. & Iliakis, G. Gamma-H2AX in recognition and signaling of DNA double-strand breaks in the context of chromatin. Nucleic Acids Res. 36, 5678–5694 (2008).

  108. Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25, 402–408 (2001).

  109. Liu, Y. Metadata record for the manuscript: rare deleterious germline variants and risk of lung cancer. figshare https://doi.org/10.6084/m9.figshare.13280387 (2020).

  110. Transdisciplinary Research Into Cancer of the Lung (TRICL) - Exome Plus Targeted Sequencing. dbGaP https://identifiers.org/dbgap:phs000878.v2.p1.

  111. Oncoarray Consortium - Lung Cancer Studies. dbGaP https://identifiers.org/dbgap:phs001273.v1.p1.

  112. Transdisciplinary Research Into Cancer of the Lung (TRICL) – Affymetrix. dbGaP https://identifiers.org/dbgap:phs001681.v1.p1.

  113. Genetic Epidemiology of Lung Cancer Consortium GWAS of Familial Lung Cancer. dbGaP https://identifiers.org/dbgap:phs000629.v1.p1.

  114. National Institutes of Health The Cancer Genome Atlas (TCGA). dbGaP https://identifiers.org/dbgap:phs000178.v9.p8.

Acknowledgements

We would like to thank all individuals who participated in this study. This work was supported by grants from the National Institutes of Health (R01CA127219, R01CA141769, R01CA060691, R01CA87895, R01CA80127, R01CA84354, R01CA134682, R01CA134433, R01CA074386, R01CA092824, R01CA250905, R01HL113264, R01HL082487, R01HL110883, R03CA77118, P20GM103534, P30CA125123, P30CA023108, P30CA022453, P30ES006096, P50CA090578, U01CA243483, U01HL089856, U01HL089897, U01CA76293, U19CA148127, U01CA209414, K07CA181480, N01-HG-65404, HHSN268200782096C, HHSN261201300011I, HHSN268201100011, HHSN268201200007C, DP1-CA174424, DP1-AG072751, CA125123, RR024574), the Intramural Research Program of the National Human Genome Research Institute (JEB-W), and the Herrick Foundation. Dr. Amos is an Established Research Scholar of the Cancer Prevention and Research Institute of Texas (RR170048). We also acknowledge the Cytometry and Cell Sorting Core, supported by the Cancer Prevention and Research Institute of Texas Core Facility (RP180672). At Toronto, the study is supported by the Canadian Cancer Society Research Institute (#020214) to R. H., the Ontario Institute for Cancer Research to R. H., and the Alan Brown Chair to G. L. and the Lusi Wong Programs at the Princess Margaret Hospital Foundation. The Liverpool Lung Project is supported by the Roy Castle Lung Cancer Foundation.

Author information

Contributions

Drafted the Paper: Y.L., J.X., and C.I.A. Project Coordination: C.I.A., R.J.H., D.C.C., and P.B. Statistical Analysis: Y.L., S.T., X.X., D.Z., C.W.P., C.M.L., and C.I.A. Genetic validation analysis: Y.L., S.T., X.X., C.C., Y.Li., J.B., D.Z., W.H., C.W.P., C.M.L., and C.I.A. Functional DNA damage assay analysis: J.X., Z.S., and S.M.R. Sample collection, exome sequencing, and development of the epidemiological studies: J.M., M.R.S., M.E.S., F.K., C.M.L., A.G.S., I.I.W., M.H.C., E.K.S., J.B.W., S.M.P., M.A., E.K., C.G., D.M., M.Y., M.dA., P.Y., T, M.P.A.D., J.L., B.S., D.Z., A.M., V.J., I.H., D.M., J.S., G.S., P.B., G.L., J.K.F., R.J.H., D.C.C., and C.I.A.

Corresponding author

Correspondence to Christopher I. Amos.

Ethics declarations

Competing interests

E.K.S. reports institutional grant funding from Bayer and GlaxoSmithKline. The other authors declare no conflicts of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, Y., Xia, J., McKay, J. et al. Rare deleterious germline variants and risk of lung cancer. npj Precis. Onc. 5, 12 (2021). https://doi.org/10.1038/s41698-021-00146-7

  • DOI: https://doi.org/10.1038/s41698-021-00146-7
