Rare deleterious germline variants and risk of lung cancer

Liu, Yanhong; Xia, Jun; McKay, James; Tsavachidis, Spiridon; Xiao, Xiangjun; Spitz, Margaret R.; Cheng, Chao; Byun, Jinyoung; Hong, Wei; Li, Yafang; Zhu, Dakai; Song, Zhuoyi; Rosenberg, Susan M.; Scheurer, Michael E.; Kheradmand, Farrah; Pikielny, Claudio W.; Lusk, Christine M.; Schwartz, Ann G.; Wistuba, Ignacio I.; Cho, Michael H.; Silverman, Edwin K.; Bailey-Wilson, Joan; Pinney, Susan M.; Anderson, Marshall; Kupert, Elena; Gaba, Colette; Mandal, Diptasri; You, Ming; de Andrade, Mariza; Yang, Ping; Liloglou, Triantafillos; Davies, Michael P. A.; Lissowska, Jolanta; Swiatkowska, Beata; Zaridze, David; Mukeria, Anush; Janout, Vladimir; Holcatova, Ivana; Mates, Dana; Stojsic, Jelena; Scelo, Ghislaine; Brennan, Paul; Liu, Geoffrey; Field, John K.; Hung, Rayjean J.; Christiani, David C.; Amos, Christopher I.

doi:10.1038/s41698-021-00146-7

Article
Open access
Published: 16 February 2021

npj Precision Oncology volume 5, Article number: 12 (2021)

Abstract

Recent studies suggest that rare variants exhibit stronger effect sizes and might play a crucial role in the etiology of lung cancers (LC). Whole exome plus targeted sequencing of germline DNA was performed on 1045 LC cases and 885 controls in the discovery set. To unveil the inherited causal variants, we focused on rare and predicted deleterious variants and small indels enriched in cases or controls. Promising candidates were further validated in a series of 26,803 LCs and 555,107 controls. During discovery, we identified 25 rare deleterious variants associated with LC susceptibility, including 13 reported in ClinVar. Of the five validated candidates, we discovered two pathogenic variants in known LC susceptibility loci, ATM p.V2716A (Odds Ratio [OR] 19.55, 95%CI 5.04–75.6) and MPZL2 p.I24M frameshift deletion (OR 3.88, 95%CI 1.71–8.8); and three in novel LC susceptibility genes, POMC c.*28delT at 3′ UTR (OR 4.33, 95%CI 2.03–9.24), STAU2 p.N364M frameshift deletion (OR 4.48, 95%CI 1.73–11.55), and MLNR p.Q334V frameshift deletion (OR 2.69, 95%CI 1.33–5.43). The potential cancer-promoting role of selected candidate genes and variants was further supported by endogenous DNA damage assays. Our analyses led to the identification of new rare deleterious variants associated with LC susceptibility. However, in-depth mechanistic studies are still needed to evaluate the pathogenic effects of these specific alleles.

Introduction

Lung cancer (LC), the leading cause of cancer mortality in the US, has recently shown substantial drops in mortality, largely attributed to reduced smoking rates and improvements in treatment, such as immunotherapy¹. Prior genome-wide association studies (GWAS) identified novel genetic factors influencing LC risk, which are sometimes modulated by smoking behavior². Notably, in the 15q25.1 region that shows the most significant and consistent genetic signal, a missense variant, p.D398N, and a 22-bp deletion (del) in the core promoter region of CHRNA5 have been identified that affect the gene's function and expression^3,4. Carriers of these variants find quitting smoking more difficult than noncarriers⁵ and may benefit from a targeted smoking cessation intervention⁶.

Previous studies have estimated the heritability of LC to be 18%⁷. Recent genetic studies suggest that rare variants (minor allele frequency [MAF] < 1%) that are functionally deleterious exhibit far larger effect sizes than common variants^8,9,10, as they display signs of stronger selective pressure^11,12, and could account for missing heritability unexplained by common variants¹¹. Fewer than 3% of protein-coding single nucleotide variants (SNVs), corresponding to approximately 300 genes per genome, are predicted to result in loss of protein function (LoF) through the introduction of stop-gain, frameshift, or the disruption of an essential splice site¹³. Insertions (ins) and deletions (del), collectively termed indels, have been understudied, though they are the second most abundant source of human genetic variation. Selected indels have been identified as playing a key role in causing LC, such as p.E746_A750del in EGFR^14,15,16.

The hypothesis that deleterious mutations show lower MAF is supported by recent identifications of several rare missense variants that have a moderate-to-large effect on LC risk, for example, PARK2 p.R275W (OR 5.24)¹⁷, BRCA2 p.K3326X (OR 2.47), CHEK2 p.I157T (OR 0.38)¹⁸, LTB p.L87F (OR 7.52), P3H2 p.Q185H (OR 5.39)¹⁹, DBH p.V26M²⁰, and ATM p.L2307F (OR 8.82)²¹. Because of the stronger evolutionary pressure and weak linkage disequilibrium (LD) with common SNPs used in GWAS, finding these rare variants through population-based studies can be challenging²². To maximize the potential for the detection of large-effect, rare deleterious variants (SNVs and small indels ≤21 bp), we employed whole exome sequencing (WES) plus targeted sequencing on healthy controls and on LC cases selected for features indicating high genetic risk of LC, for example, early onset or family history of LC (FHLC)^7,23,24.

Results

Demographics of study subjects

As shown in Table 1, subjects in the discovery study, Transdisciplinary Research in Cancer of the Lung (TRICL; 1,045 LCs vs. 885 controls), and in the validation sets (26,803 LCs and 555,107 controls) were primarily of European descent (Supplementary Fig. 1). LC cases were significantly more likely than controls to be smokers and had higher pack-years (P-value < 0.0001). The TRICL and Genetic Epidemiology of LC Consortium (GELCC) cases were enriched for FHLC.

Table 1 Basic characteristics of LC cases and controls in the discovery and validation sets.

Identification of rare and deleterious variants in the TRICL discovery set

In the discovery set, a total of 2,182,753 variants were detected. Applying a three-step filtering method based on allele frequency (MAF < 1% in the non-Finnish European [NFE] population from the Genome Aggregation Database [gnomAD]), variant class (missense, protein-truncating, and regulatory), and functional effects (predicted deleterious and/or with clinical significance from ClinVar), we identified 67,470 rare and putatively deleterious variants: 63% missense, 16% frameshift (fs), 12% in-frame indels, 6% regulatory (untranslated region [UTR] and splice acceptor/donor), and 3% stop-gain. Single variant association analysis identified 75 potential candidates.

Given the known challenge of excessive false-positive indel detection rates caused by the high frequency of homopolymer-associated sequencing errors^25,26,27,28, we subjected these 75 potential candidates to additional filtering and manual inspection using Genome Browser (Supplementary Table 1). Twenty-five of the 75 were high-confidence putative candidates (two SNVs, four ins, and 19 del). Supplementary Fig. 2 shows the variant visualization map for the candidates and variant carriers (read alignment and depth). Thirteen out of the 25 candidates (in 24 genes) reported clinical significance in ClinVar, and eight were classified as pathogenic. Also, 5/24 genes were mapped to known LC-GWAS loci, such as 3q28 TP63²⁹, 5q31.1 TXNDC15³⁰, 11q22.3 ATM²¹, 11q23.3 MPZL2³¹, and 22q12.1 CHEK2¹⁸. Three mapped in known GWAS loci for COPD/ PF (pulmonary function): 1p34.3 BMP8A^32,33, 1p36.31 PHF13³², and 14q23.1 TALPID3/KIAA0586³⁴.

We next assessed the dose-effect of the 25 candidates: 16 were enriched in LCs (risk-conferring alleles) and 9 were enriched in controls (protective alleles). Compared with subjects with zero risk- and protective-alleles, the groups carrying one and two risk-alleles (5 LCs) showed a progressively increased risk, whereas the groups carrying one and two protective-alleles (6 controls) showed a gradually reduced risk (Supplementary Table 2). All 6 controls harbored MOB3A p.F69_I75del, whereas 4/5 LCs harbored STAU2 p.N364M fs*67del.

Among the demographics examined, there was no significant difference in smoking (status and pack-years) or FHLC between mutation carriers and non-carriers. Notably, 5/6 two-protective-alleles carriers were male, whereas 4/5 two-risk-alleles carriers were female and had adenocarcinoma (AD). Overall, age did not differ significantly between carriers and non-carriers (Supplementary Fig. 3). However, in LC cases, age at onset in risk-allele carriers (54 yrs for two-risk-alleles carriers, 62 yrs for one-risk-allele carriers) was significantly younger than in protective-allele carriers (69 yrs; Supplementary Table 2).

Further gene-environment (G×E) interaction analysis showed that two variants interacted with smoking behavior (Supplementary Table 1). Specifically, the risk variant MLNR p.Q334V fs*3del interacted with pack-years (P-value 0.0035), and the protective effect associated with MOB3A p.F69_I75del differed significantly by sex (10/11 control carriers were male, whereas 0/2 LC carriers were male; P-value 0.042), smoking status (6/11 control carriers were smokers, whereas 0/2 LC carriers were smokers; P-value 0.016), and pack-years (P-value 0.0036). We also found that the protective variant TXNDC15 p.E9G fs*68del interacted with FHLC: 5/7 LC carriers had FHLC, compared with 0/21 control carriers (P-value 0.035).

We subsequently conducted gene-based rare variant burden tests for the 24 genes harboring potential candidates; five genes, namely MLNR, CCDC105, BMP8A, MME, and NPHP3, showed suggestive associations (Table 2). We also performed exome-wide gene-based tests; however, none showed a strong association after multiple testing correction (Supplementary Fig. 4).

Table 2 Gene-based association tests in the TRICL study, ranked by P-value from the combined multivariate and collapsing test.

Meta-analyses of the discovery and validation sets

Across the seven validation datasets, all 25 candidates were covered in gnomAD, 22 (88%) in TCGA, 16 (64%) in COPDGene, nine (36%) in GELCC, and nine (36%) in at least one of the three case–control studies with genotyping data (OncoArray, Affymetrix, and UKB). Table 3 summarizes the top five candidates with consistent associations from the meta-analysis.

Table 3 Top five hits from discovery and validation association analysis.

The topmost risk-conferring variant is a missense SNV, p.V2716A, in the phosphatidylinositol 3-kinase (PI3K) catalytic domain of ATM (Ataxia telangiectasia mutated; OMIM 607585, UniProt Q13315). This pathogenic variant (rs587782652) is exceedingly rare in gnomAD, with MAF 0.0021% and 0.0054% in non-cancer controls and the NFE population, respectively. In our combined datasets, this variant was present in 0.05% of LCs and 0.003% of controls, with a remarkably large effect size (OR 19.55, 95%CI 5.04–75.6; P-value 1.7e-05). LC carriers of this variant were predominantly smokers (8/9 carriers), AD cases (7/9 carriers), and early-onset cases (6/9 carriers; mean 55 yrs). Further, four additional rare deleterious variants were observed in ATM (Fig. 1 and Supplementary Table 3). No LD is present between these variants and the candidate p.V2716A (Supplementary Table 4).

**Fig. 1: Gene exons, protein domains, and rare deleterious variants of the candidate genes.**

The second risk variant is c.*28delT in the 3′ UTR of POMC (Pro-opiomelanocortin; OMIM 176830, UniProt P01189). The MAF of this 2 bp del (rs756770132) was 0.086%/0.17% in gnomAD non-cancer/NFE controls, whereas in our dataset it was present in 0.66% of LCs and 0.15% of controls, conferring a 4-fold risk for carriers (OR 4.33, 95%CI 2.03–9.24; P-value 0.00015). Although reported as a VUS in ClinVar, this 3′ UTR del is located in a critical site computationally predicted by TargetScan³⁵ to be a target of several miRNAs, including hsa-miR-149-3p and hsa-mir-625-5p. We also observed four additional rare deleterious variants in the TRICL set (Fig. 1 and Supplementary Table 3).

The third novel risk variant is p.N364M fs*67del in STAU2 (Staufen homolog 2; OMIM 605920, UniProt Q9NUL3). This del (rs746501298) is very rare in gnomAD (MAF 0.011%/0.0027% in non-cancer/NFE population controls), but was present in 1.02% of LCs and 0.02% of non-cancer controls (OR 4.48, 95%CI 1.73–11.55; P-value 0.0019). It was predicted to disrupt the double-stranded RNA-binding motif (DSRM; Fig. 1), which plays a critical role in RNA editing. This del is also reported in the Catalogue of Somatic Mutations In Cancer (COSMIC; COSM253104).

The fourth and fifth variants are two pathogenic, truncating deletions, p.I24M fs*22del (rs752672077) in MPZL2 (Myelin protein zero-like protein 2, or Epithelial v-like antigen 1 [EVA1]; OMIM 604873, UniProt O60487) and p.Q334V fs*3del (rs563947699) in MLNR (Motilin receptor; OMIM 602885, UniProt O43193), with effect sizes (OR) of 3.88 (95% CI 1.71–8.8) and 2.69 (95% CI 1.33–5.43), respectively. The MPZL2 deletion was close to the Immunoglobulin-like antibody Variable domain (Ig-V; Fig. 1), which is involved in thymocyte development³⁶. In gnomAD, the MAF of the MPZL2 deletion was higher in the Ashkenazi Jewish population (AJ, 0.38%) than in other populations, including NFE (0.123%), Latino (0.028%), and African (0.012%). Additionally, a start-loss variant, p.M1T, of MPZL2 was present in two LCs (Fig. 1 and Supplementary Table 3).

Other interesting candidates from the discovery set (Supplementary Table 1) include 1) two VUS insertions, TP63 c.*2550insT (rs772929136) and CHEK2 c.*2insC (rs749257861), both located in the 3′ UTR, for which no genotype data or coverage were available in the validation sets; and 2) a pathogenic variant with a protective effect, CHEK2 p.S428F (rs137853011), which was non-significant in the meta-analysis (OR 0.41, 95% CI 0.13–1.31, P-value 0.13).

Candidate gene prioritization

As shown in Table 2, of the 24 candidate genes, the most evolutionarily constrained (LoF-intolerant) genes, with the lowest LoF observed/expected (o/e) values, were PHF13, TP63, and STAU2, whereas the genes with the highest LC-correlated PhoRank scores were CHEK2, ATM, TP63, and MME. The most interesting protein interaction network consists of eight genes and is centered on three known DNA damage response genes, CHEK2-ATM-TP63, linking five other genes (Supplementary Fig. 5). GO enrichment analysis highlighted genes involved in replicative senescence (which triggers a DNA damage response), whereas KEGG pathway analysis revealed that genes were involved in small cell LC (Supplementary Table 5).

Endogenous DNA damage assay

Large conserved networks of E. coli and human proteins were recently discovered to promote endogenous DNA damage when overproduced³⁷. These networks are known as DNA damageome proteins (DDPs)³⁷. The DNA damageome also includes LoF variants that show DNA damage-up phenotypes³⁸, most of which are not directly related to DNA repair but rather participate in producing DNA damage. We selected six prioritized genes for the assay: CHEK2, ATM, MPZL2, MLNR, POMC, and MME. We found that knockdown of five genes, as well as overproduction of the mutant MLNR p.Q334V fs*3del and of wildtype POMC, promotes DNA damage. Specifically, we first used pooled small interfering RNAs (siRNAs) that minimize off-target effects, and observed significantly increased DNA damage levels (γH2AX) for 5/6 genes (Fig. 2a–c), including two well-known DNA repair genes (CHEK2 and ATM) and three newly discovered DDPs (POMC, MLNR, and MME). By contrast, the knockdown of MPZL2 did not affect DNA damage. For the three newly discovered DDPs, we further validated their DNA damage phenotypes using different individual siRNAs (Fig. 2d–f). Moreover, overproducing the mutant MLNR p.Q334V fs*3del and the wildtype POMC open reading frame (ORF) from a plasmid promoted DNA damage in the lung fibroblast-derived cell line (Fig. 2g–i).

**Fig. 2: Discovery of DNA damageome genes/proteins and variants.**

Discussion

Our analyses led to the identification of 25 rare deleterious candidates (in 24 genes) that may be associated with LC susceptibility. Of the five validated variants, we rediscovered two pathogenic variants mapped to known LC susceptibility loci, ATM p.V2716A and MPZL2 p.I24M fs*22del; and identified three deletions in novel LC susceptibility genes, POMC 3′ UTR c.*28delT, STAU2 p.N364M fs*67del, and MLNR p.Q334V fs*3del. Our GxE analysis also suggests some of these associations may be further modified by smoking (MLNR p.Q334V fs*3del and MOB3A p.F69_I75del) and FHLC (TXNDC15 p.E9G fs*68del). Additionally, our assays of cellular DNA damage identified POMC and MLNR as part of the DNA damageome, and confirmed a double-strand break repair role of ATM.

This study confirms a robust association between LC susceptibility and ATM and identifies a new pathogenic variant, p.V2716A, that resides in the PI3K catalytic domain. We also found that this association is more evident in AD, which is consistent with several previous studies^21,39,40. ATM is a critical first responder to DNA damage in the cell and is essential for genome stability. Several association studies have indicated that common variants of ATM are linked to cancer susceptibility, including LC^41,42,43. Expression of the PI3K domain in ataxia-telangiectasia cells complemented radiosensitivity and reduced chromosomal breakage after irradiation^44,45,46, suggesting that the PI3K domain carries much of the significant activity of ATM⁴⁷. Our DNA damage assay also shows elevated DNA damage in lung fibroblasts, confirming the previous finding that ATM-defective cells accumulate more double-strand breaks⁴⁸. Further, the presence of additional rare deleterious variants, together with the previously identified p.P1054R³¹ and p.L2307F²¹, strongly suggests that the ATM gene plays a role in LC susceptibility.

Another known LC locus we rediscovered is MPZL2 (also called Epithelial v-like antigen 1, EVA1), with the pathogenic frameshift p.I24M fs*22del. MPZL2 is located at 11q23.3, a known GWAS locus for LC^31,49 and hearing loss^50,51. MPZL2 is one of the top candidate target genes at this locus based on expression quantitative trait loci (eQTL) mapping³¹. MPZL2 is a member of the immunoglobulin superfamily, preferentially expressed in lung and thymus epithelium, with a potential role as a favorable prognostic marker in thyroid cancer⁵². Interestingly, the MAF of p.I24M fs*22del in the AJ population was 5-fold higher than in the general population in gnomAD. There are several examples where rare causal variants (e.g., variants in P53, CFTR, and BRCA1/2) have higher frequencies within the AJ population^53,54,55,56. In our DNA damage assay, MPZL2 expression levels did not affect endogenous DNA damage in lung fibroblasts, implying the need to investigate alternative mechanisms in future functional studies.

The most consistent and interesting findings are two new deletions: POMC 3′ UTR c.*28delT and MLNR p.Q334V fs*3del. POMC encodes a polypeptide hormone precursor that regulates energy metabolism, nicotine-induced weight loss, and immune reactions^57,58,59. In particular, POMC plays a role in UV-induced DNA damage through interactions with TP53 and is associated with skin cancer susceptibility^60,61,62,63,64. Abnormal expression of POMC was a poor prognostic marker for LC^65,66,67,68. Using in vitro models, Derghal et al. evaluated putative miRNAs (i.e., miR-383, miR-384-3p, and miR-488) and found that they physically bind to the 3′ UTR of the mRNA and regulate POMC expression in several neuronal subtypes⁶⁹. Our DNA damage assay showed that both downregulation and overproduction of wildtype POMC promote endogenous DNA damage. Whether and how c.*28delT affects POMC expression, and its putative role in LC risk, merit further mechanistic investigation. MLNR is a member of the G-protein coupled receptor 1 family and is known for regulating gastrointestinal activity⁷⁰. MLNR variants and dysregulation have been implicated in lung occult small cell carcinoma, bile duct cancer⁷¹, and head and neck cancer⁷². Our overproduction results for MLNR p.Q334V fs*3del suggest a dominant-negative role in terms of DNA damage promotion. Collectively, these findings suggest that POMC and MLNR, while both function in multiple cellular processes, might also share effects on DNA damage.

Although the association of the pathogenic variant CHEK2 p.S428F with lower LC risk is not statistically significant in the meta-analysis, its protective effect is consistent with that of another known pathogenic low-frequency variant, CHEK2 p.I157T, which is associated with reduced risk of smoking-related cancers (lung, laryngeal, urinary, and upper aerodigestive tract)^18,73,74,75. In contrast, both p.I157T and p.S428F showed an increased risk of breast cancer^75,76,77,78,79. The mechanism underlying this opposing effect remains an open question and is perhaps related to smoking exposure and cell cycle checkpoint signaling/apoptosis⁷⁵. STAU2 is a double-stranded RNA-binding protein and a major regulator of mRNA transport, decay, and translation⁸⁰. It was reported that STAU2 downregulation enhances levels of DNA damage (γH2AX) and promotes apoptosis (PARP1 cleavage) in camptothecin-treated cells^81,82. The role of STAU2 in LC requires future investigation.

A main strength of the study is the focus on LC patients with extreme phenotypes of known risk factors (i.e., early-onset, FHLC, or familial cases in high-risk families), which provides >5 times the statistical power¹⁰. Another strength is the relatively large sample size; to our knowledge, this is by far the largest LC rare variant analysis. It should be noted, however, that our study still has limited power to detect associations for ultra-rare variants and for those candidates (16/25) that could not be assessed in the validation sets. Third, our exome plus customized capture design (50 Mb + 250 kb) in the discovery set offers an efficient method for analyzing known susceptibility regions at greater depth and better coverage, particularly for indels that are often poorly captured in GWAS. Last, we have focused on the investigation of predicted LoF variants, which provide directionality of effect. Notably, 14/25 candidates we identified were frameshift deletions that result in either truncated proteins or nonsense-mediated mRNA decay. In the discovery set, we observed non-coding variants residing in regulatory regions that may influence target gene expression; however, the lack of population frequency information and insufficient coverage in the validation sets limit our ability to explore this aspect for some non-coding variants.

Using gnomAD as controls poses several challenges, including the lack of individual-level data, the inability to perform G×E interaction or gene-burden tests, and differences in platforms and coverage. Additionally, the proportion of non-white individuals differed somewhat between TCGA cases (27%) and gnomAD controls (30%), which could bias effect sizes in the meta-analysis. Genetic ancestry analysis shows that 90% of TCGA LCs were inferred to be of European genetic ancestry⁸³. However, it is possible that a small portion of European-ancestry TCGA patients are of AJ origin, given that 7% of ovarian cancer⁸⁴ and 24% of endometrial cancer⁸⁵ cases are of AJ heritage. It is of note that in our dataset, none of the variant allele carriers of the 25 candidates were found to have African ancestry. Therefore, we expect this potential population stratification effect on rare variant associations to be relatively small, particularly in non-Africans that have not experienced severe population bottlenecks^86,87,88.

Although we demonstrated a strong joint effect of the 25 potential candidates (Supplementary Table 2), it is challenging to detect tissue-specific eQTL effects, identify mutational signatures, or construct a polygenic risk score (PRS) based on these rare or ultra-rare candidates, owing to their low frequencies and weak LD with other rare variants or with common variants. We found some lung-tissue-specific eQTL variants from The Genotype-Tissue Expression project (GTEx): three SNPs for ATM, 61 SNPs for POMC, 75 SNPs for MPZL2, and 141 SNPs for STAU2; but none of them overlap or are in LD with the 25 candidates we are reporting. Future studies could integrate single-cell transcriptomic sequencing and epigenomic maps in cells and tissues relevant to LC to establish mutation signatures (e.g., DNA mismatch repair) and explore the application of PRS to clinical care.

In conclusion, our results provide evidence that rare deleterious variants with moderate to large effect sizes, in particular ATM p.V2716A, MPZL2 p.I24M fs*22del, STAU2 p.N364M fs*67del, POMC 3′ UTR c.*28delT, and MLNR p.Q334V fs*3del, contribute to LC susceptibility. Additional targeted studies using CRISPR/Cas9 mutagenesis could be performed for each variant, to evaluate more comprehensively what its effects are on gene functions and the underlying molecular mechanisms. Future extremely large-scale multi-ancestry studies may also provide additional opportunities to assess ancestry-specific predisposing variants, and discover new genetic alterations with relatively large attributable risk for LC.

Methods

Study population in the discovery set

The discovery set included 1094 LC cases and 933 controls from the TRICL study⁸⁹. All study subjects and biospecimens were collected with informed consent under institutional review board (IRB) approved protocols. Subjects were selected from four sites: Harvard School of Public Health (HSPH), International Agency for Research on Cancer (IARC), University of Liverpool, and Mount Sinai Hospital and Princess Margaret Hospital (MSH-PMH) in Toronto⁸⁹. Cases were selected because they reported FHLC (first-degree) or were early-onset (<60 yrs) or had specimens available (Table 1). Never smokers were defined as persons who had smoked fewer than 100 cigarettes in their lifetimes. The ethnicities were inferred using FastPop⁹⁰.

WES and variant calling in the discovery set

WES was performed using Agilent SureSelect v5 capture (50 Mb, Agilent Technologies) plus a custom capture targeting known LC-GWAS regions^91,92 (250 kb). Germline DNA was sequenced at the Center for Inherited Disease Research. The mean on-target coverage was 52x for each sequencing experiment, and greater than 97% of on-target bases had a depth greater than 10x. Sequence reads were mapped to the human reference GRCh37/hg19 using the Burrows-Wheeler Aligner. SNVs and indels were called based on the union of raw GATK v3.3-0 and Atlas2 calls. The QC process applied the following user-definable criteria: i) low-complexity repeats and segmental duplications were filtered out; ii) quality score ≥20, depth ≥10, and allele balance (AB) ≥ 0.2 for heterozygous calls; iii) call rate ≥0.85; and iv) samples with abnormal heterozygosity rate, sex discordance, <95% completion rates, or unexpected relatedness (identity-by-state >10%) were filtered out.
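The genotype-level thresholds in criteria ii) and iii) can be illustrated with a minimal Python sketch. The `Call` record layout, the definition of allele balance as alternate-allele reads divided by depth, and the choice to compute call rate over calls that pass the per-call filters are assumptions made for illustration; they are not taken from the study's pipeline.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the QC criteria above.
MIN_QUALITY = 20
MIN_DEPTH = 10
MIN_ALLELE_BALANCE = 0.2
MIN_CALL_RATE = 0.85


@dataclass
class Call:
    """One genotype call for one sample at one site (hypothetical layout)."""
    genotype: str   # "0/0", "0/1", "1/1", or "./." for a missing call
    quality: float  # genotype quality score
    depth: int      # total read depth at the site
    alt_reads: int  # reads supporting the alternate allele


def passes_call_qc(call: Call) -> bool:
    """Apply the quality, depth, and heterozygote allele-balance thresholds."""
    if call.genotype == "./.":
        return False
    if call.quality < MIN_QUALITY or call.depth < MIN_DEPTH:
        return False
    if call.genotype == "0/1":
        # Assumption: AB = alt reads / total depth (depth >= 10 here, so no zero division).
        return call.alt_reads / call.depth >= MIN_ALLELE_BALANCE
    return True


def site_call_rate(calls: List[Call]) -> float:
    """Fraction of samples with a call that passes the per-call filters."""
    if not calls:
        return 0.0
    return sum(passes_call_qc(c) for c in calls) / len(calls)


def site_passes_qc(calls: List[Call]) -> bool:
    """Keep a variant site only if its call rate reaches the threshold."""
    return site_call_rate(calls) >= MIN_CALL_RATE
```

Sample-level exclusions (heterozygosity, sex discordance, relatedness) and the repeat-region mask in criteria i) and iv) are not covered by this sketch.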

Rare variant filtering and functional annotation in the discovery set

Following variant calling, rare variants were further enriched in three steps: i) MAF < 1% in gnomAD (NFE ancestry, v2.1); ii) variant class, including missense, protein-truncating, and regulatory; and iii) mutation effect, i.e., the variant results in protein truncation and/or is predicted to be deleterious by 4/6 prediction tools (SIFT, Polyphen-2, MutationTaster, MutationAssessor, FATHMM, and FATHMM-MKL). The miRNAs putatively bound to the sequence containing UTR variants were identified using TargetScan³⁵. We additionally incorporated rare variants classified as pathogenic, likely pathogenic, or VUS in the ClinVar database, which compiles clinically observed human variants.
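As a rough illustration of the three-step filter, the following Python sketch applies the thresholds stated above to a hypothetical `Variant` record. The field names, the class labels, and the way the three criteria are combined (truncating, or at least 4 of 6 tools, or a retained ClinVar classification) are assumptions for the example; the exact logic of the study's pipeline is not specified in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Illustrative labels; the real annotation vocabulary is an assumption.
KEPT_CLASSES = {"missense", "frameshift", "stop_gain", "inframe_indel", "splice", "utr"}
TRUNCATING = {"frameshift", "stop_gain", "splice"}
CLINVAR_KEPT = {"pathogenic", "likely_pathogenic", "vus"}
TOOLS = ("SIFT", "Polyphen-2", "MutationTaster", "MutationAssessor", "FATHMM", "FATHMM-MKL")
MAF_CUTOFF = 0.01       # MAF < 1% in gnomAD NFE
MIN_TOOL_VOTES = 4      # deleterious by at least 4 of the 6 tools


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    nfe_maf: float                       # gnomAD NFE minor allele frequency
    variant_class: str                   # one of the labels above, or another class
    tool_calls: Dict[str, bool] = field(default_factory=dict)  # tool -> deleterious?
    clinvar: Optional[str] = None        # ClinVar classification, if any


def is_rare(v: Variant) -> bool:
    return v.nfe_maf < MAF_CUTOFF


def is_kept_class(v: Variant) -> bool:
    return v.variant_class in KEPT_CLASSES


def is_putatively_deleterious(v: Variant) -> bool:
    if v.variant_class in TRUNCATING:
        return True
    votes = sum(1 for tool in TOOLS if v.tool_calls.get(tool, False))
    return votes >= MIN_TOOL_VOTES or v.clinvar in CLINVAR_KEPT


def passes_filter(v: Variant) -> bool:
    """All three steps must pass for a variant to be retained."""
    return is_rare(v) and is_kept_class(v) and is_putatively_deleterious(v)
```

Applying the frequency check first is a design choice that discards most variants cheaply before the annotation-based checks run.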

Single variant association test in the discovery set

For variants derived from the above automated filtering schema, we conducted the association test using Fisher’s exact test. We used the Genome Browser (Golden Helix) visualization tool to verify the presence of the potential candidates in each carrier. By manual review of the variants’ coverage plot (read depth) and pile-up plot (read alignment), we ruled out low-confidence variants resulting from mapping error, strand bias, and weak exon conservation.
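For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 carrier table can be sketched in plain Python as below. This is a generic textbook implementation based on the hypergeometric distribution, not the software used in the study, and the carrier counts in the usage note are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table
        carriers   non-carriers
    cases    a          b
    controls c          d
    computed by summing hypergeometric probabilities of all tables that are
    no more likely than the observed one (margins held fixed)."""
    cases = a + b
    controls = c + d
    carriers = a + c
    denom = comb(cases + controls, carriers)

    def prob(x: int) -> float:
        # Probability that exactly x of the carriers are cases.
        return comb(cases, x) * comb(controls, carriers - x) / denom

    lowest = max(0, carriers - controls)
    highest = min(cases, carriers)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)
```

For example, with the discovery-set sizes of 1045 cases and 885 controls, a hypothetical variant seen in 10 cases and no controls would be tested as `fisher_exact_two_sided(10, 1035, 0, 885)`.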

Gene–environment interaction and gene-based burden analysis in the discovery set

For the candidates identified from the association test, we tested G×E interactions (with age at onset, sex, smoking status, pack-years, and FHLC) using a mixed linear regression model. To measure the cumulative effect of the rare deleterious variants within a gene, we performed collapsing tests using the CMC and KBAC tests^93,94.
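The collapsing idea behind gene-based burden tests can be shown with a short Python sketch: each subject is reduced to a carrier or non-carrier of any qualifying variant in the gene, and carrier counts are then compared between cases and controls. This shows only the collapsing step with a simple odds ratio; it is not an implementation of the CMC or KBAC statistics, and the input layout and sample identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def collapse_carriers(genotypes: Dict[str, Dict[str, int]]) -> Set[str]:
    """genotypes maps variant id -> {sample id: alternate-allele count}.
    Returns the samples carrying at least one alternate allele at any variant."""
    carriers: Set[str] = set()
    for per_sample in genotypes.values():
        carriers.update(s for s, n_alt in per_sample.items() if n_alt > 0)
    return carriers


def burden_table(
    genotypes: Dict[str, Dict[str, int]], cases: Set[str], controls: Set[str]
) -> Tuple[int, int, int, int]:
    """2x2 counts: (carrier cases, non-carrier cases,
                    carrier controls, non-carrier controls)."""
    carriers = collapse_carriers(genotypes)
    a = len(carriers & cases)
    c = len(carriers & controls)
    return a, len(cases) - a, c, len(controls) - c


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio for the collapsed table; 0.5 is added to every cell
    (Haldane-Anscombe correction) only when a cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)
```

The resulting 2×2 table can be passed to any exact or asymptotic test of association; the choice of test is left open here.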

Study populations in the validation sets and meta-analysis

The candidate variants were further examined in seven validation datasets, aggregated from different centers and across several platforms (four WES datasets and three genome-wide genotyping datasets, as shown in Table 1). We tabulated the variant carrier counts per candidate and performed meta-analyses using an inverse-variance-weighted fixed-effects model, which assumes that the true effect size is the same in all studies.

1.
GELCC study (Genetic Epidemiology of LC Consortium, 380 LCs): This included 122 familial and 258 sporadic LC cases. i) Familial LC Study Subjects (dbGaP phs000629.v1.p1). The familial cases were selected from high-risk LC families with at least two first-degree relatives affected with LC⁹⁵. The GELCC study population and recruitment scheme have been described in detail previously⁹⁶. Samples and data were collected by the familial LC recruitment sites of the GELCC, that included the University of Cincinnati, University of Colorado Health Science Center, Karmanos Cancer Institute at Wayne State University, Louisiana State University Health Sciences Center-New Orleans, Mayo Clinic, University of Toledo, Johns Hopkins University, and Saccomanno Research Institute. ii) Sporadic LC Study Subjects. The sporadic LC patients were selected from our previous WES study^19,20, including samples from the HSPH, Baylor College of Medicine (BCM), and MD Anderson Cancer Center (MDACC). Germline DNA was sequenced utilizing NimbleGen VCRome 2.1 (Roche)^19,20, and HumanOmniExpressExome (Illumina)⁹⁵.
2.
TCGA (The Cancer Genome Atlas cohort, 1015 LCs): this public germline WES dataset includes non-tumor DNA from 577 AD and 438 SCC (dbGaP Phs000178.v9.p8), using Agilent SureSelect (Agilent Technologies) and NimbleGen SeqCap (Roche).
3. COPDGene (Genetic Epidemiology of COPD Study⁹⁷, 318 controls): controls were selected to be white smokers with normal lung function (defined as post-bronchodilator forced expiratory volume in 1 s [FEV₁] ≥ 80% predicted and FEV₁/FVC ≥ 0.7) and with smoking histories ≥10 pack-years; WES utilized NimbleGen VCRome 2.1 (Roche)^19,20.
4. GnomAD (the Genome Aggregation Database, 134,187 controls): we restricted our analyses to non-cancer individuals (excluding individuals from cancer cohort studies, such as the TCGA cohort), resulting in a data subset of 118,479 exomes and 15,708 whole genomes; multiple exome capture kits were utilized, including NimbleGen SeqCap (Roche), Agilent SureSelect (Agilent Technologies), and Illumina Exome BeadChip (Illumina).
5. OncoArray case–control study (17,878 LCs vs. 13,425 controls; dbGaP phs001273): The OncoArray consortium is a network created to increase understanding of the genetic architecture of common cancers. We restricted our analyses to European descent subjects (Supplementary Fig. 1)^98,99,100; participants were obtained from 29 LC studies across North America and Europe, and genotyped on OncoArray-500K BeadChip (Illumina). There were 1162 participants in the OncoArray consortium who were also exome-sequenced in the TRICL discovery, and therefore these samples were excluded from the analysis in the validation phase.
6. Affymetrix case–control studies (5364 LCs vs. 5724 controls; dbGaP phs001681.v1.p1). This large pooled sample was assembled from 10 independent case–control studies, which have been described elsewhere^99,101. Study participants were genotyped on an Axiom Exome Plus Array (Affymetrix)^99,101, which contains a custom panel of key LC GWAS markers as well as rare coding SNVs and indels¹⁰². There were 992 participants in the Affymetrix studies who were also exome-sequenced in the TRICL discovery, and these samples were therefore excluded from the analysis in the validation phase.
7. UKB (UK Biobank cohort¹⁰³; 2166 LCs vs. 401,453 controls): we restricted our analyses to non-cancer controls and LC cases; individuals were genotyped on the UK BiLEVE Axiom Array and the UK Biobank Axiom Array (Affymetrix)^103,104.
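The inverse-variance-weighted fixed-effects meta-analysis used to combine the validation datasets can be sketched as follows. The sketch assumes each study contributes a log odds ratio and its standard error; each study is weighted by the inverse of its variance, and the pooled standard error is the square root of the inverse of the summed weights. The function name and the example numbers are hypothetical and do not reproduce any result from the study.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool per-study effects with inverse-variance weights.

    betas: per-study effect estimates (e.g., log odds ratios).
    ses:   per-study standard errors (all must be > 0).
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se ** 2) for se in ses]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical inputs: two studies with different precision.
example_beta, example_se, example_z, example_p = fixed_effects_meta(
    betas=[0.5, 0.3], ses=[0.1, 0.2]
)
pooled_odds_ratio = math.exp(example_beta)
```

Because the model assumes one shared true effect, more precise studies dominate the pooled estimate; a random-effects model would be the usual alternative when between-study heterogeneity is expected.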

Gene prioritization based on functional annotations and protein interactions network

To better prioritize genes and candidates, we used three prioritization tools: 1) gene evolutionary constraint against LoF variation, using the observed/expected (o/e) ratio from gnomAD; 2) the Phevor PhoRank algorithm¹⁰⁵, which ranks genes based on their phenotypic relevance as defined by diverse biomedical ontologies; and 3) the protein–protein interaction (PPI) network from the STRING database¹⁰⁶, with an interaction score cut-off ≥0.15 (low confidence).
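Applying the PPI score cut-off amounts to a simple edge filter, sketched below in Python. The sketch assumes interactions are available as (protein A, protein B, score) tuples with scores on a 0 to 1 scale; files downloaded from STRING typically store the combined score multiplied by 1000, in which case the equivalent threshold would be 150. The function names, placeholder gene labels, and scores are hypothetical.

```python
def filter_interactions(edges, min_score=0.15):
    """Keep interactions whose score meets the cut-off (0 to 1 scale)."""
    return [(a, b, score) for a, b, score in edges if score >= min_score]


def build_neighbors(edges):
    """Build an undirected adjacency map from the retained interactions."""
    neighbors = {}
    for a, b, _score in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


# Placeholder edges (hypothetical labels and scores, not real STRING data).
toy_edges = [
    ("GENE_A", "GENE_B", 0.40),
    ("GENE_A", "GENE_C", 0.10),  # below the cut-off, dropped
    ("GENE_B", "GENE_D", 0.15),  # exactly at the cut-off, kept
]
kept_edges = filter_interactions(toy_edges, min_score=0.15)
toy_neighbors = build_neighbors(kept_edges)
```

A low-confidence threshold such as 0.15 keeps many weak links, so the resulting network is best read as hypothesis-generating rather than as evidence of direct interaction.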

Functional evaluation of candidate genes using endogenous DNA damage assay

Endogenous DNA damage is proposed to drive cancers by genome instability — a hallmark of cancer^37,38. To test whether knockdown or overexpression of the candidate genes or variants induces endogenous DNA damage, we performed flow cytometric assays to measure γH2AX levels, a DNA double-strand-break marker¹⁰⁷, following siRNA knockdown and overproduction of GFP fusions of proteins of interest.

1. Human cell lines and reagents. MRC5-SV40, a human lung fibroblast-derived cell line, was maintained in standard Dulbecco’s modified Eagle’s medium with 10% fetal bovine serum, 2 mM L-glutamine, 100 μg/mL penicillin, and 100 μg/mL streptomycin^37,38. The cell line was authenticated by ATCC STR analysis and routinely checked to be mycoplasma-free. MLNR p.Q334V fs*3del, MME p.P156L fs, MPZL2 p.I24M fs*22del, and full-length wildtype POMC entry clones for Gateway cloning were synthesized, sequence-verified, and cloned into pDONR223 (Invitrogen) by GenScript. All the above clones were further subcloned into an N-terminal GFP-tagged vector (pcDNA6.2/N-EmGFP-DEST, Invitrogen), using Gateway LR Clonase II Enzyme Mix (Invitrogen). Overexpression plasmid transfections were performed using GenJet In Vitro DNA Transfection Reagent Ver. II (# SL100489, SignaGen). Non-targeting pool siRNA (D-001810-10), SMARTpool siRNAs each containing four targeting sequences of MME, MLNR, POMC, ATM, CHEK2, and MPZL2, and sets of 4 siRNAs targeting MME, MLNR, and POMC were purchased from Dharmacon. The target sequences for MME, MLNR, and POMC are as follows: #1 MME (GGAGGCUGGUUGAAACGUA), #2 MME (GAACCUAUAGGCCAGAGUA), #3 MME (AAAGAUGAGUGGAUAAGUG), #4 MME (GACAGCACCUUAAUGGAAU); #1 MLNR (GCGCUAACGUGAAGACGAU), #2 MLNR (GCGCAUCUAUCAACCCAAU), #3 MLNR (CAUCGUCGCUCUGCAACUU), #4 MLNR (GAAGAUUCGCGGAUGAUGU); #1 POMC (GACAAGCGCUACGGCGGUU), #2 POMC (CAGUGAAGGUGUACCCUAA), #3 POMC (GGCCGAGACUCCCAUGUUC), #4 POMC (CUACAAGAAGGGCGAGUGA). siRNA transfections were carried out with Lipofectamine RNAiMAX Transfection Reagent (#13778075, Invitrogen), following the manufacturer’s recommendations. SMARTpool ON-TARGETplus siRNA is designed with a dual-strand modification for greater specificity, reducing off-target effects by up to 90%.
2. Real-time quantitative reverse transcription PCR (RT-qPCR). Knockdown efficiency was quantified by RT-qPCR and is shown in Supplementary Fig. 6. The RNeasy mini kit (Qiagen #74106) was used to extract total RNA from cells 72 h after siRNA transfection or protein overproduction. 300 ng of total RNA from each sample was used to synthesize cDNA with the Superscript III first-strand synthesis system (Invitrogen, #18080051). The qPCR reactions were performed using iTaq Universal SYBR Green Supermix (BioRad #172-5121) on a QuantStudio 3 Real-Time PCR System (Applied Biosystems). For each gene, three replicates were analyzed and the average threshold cycle (Ct) was calculated. The relative expression levels were calculated with the 2^(−ΔΔCt) method¹⁰⁸. Primers used included GAPDH (housekeeping gene) forward: CAA TGA CCC CTT CAT TGA CC; GAPDH reverse: GAT CTC GCT CCT GGA AGA TG; POMC forward: GCC AGT GTC AGG ACC TCA C; POMC reverse: GGG AAC ATG GGA GTC TCG G; CHEK2 forward: TCT CGG GAG TCG GAT GTT GAG; CHEK2 reverse: CCT GAG TGG ACA CTG TCT CTA A; ATM forward: GGC TAT TCA GTG TGC GAG ACA; ATM reverse: TGG CTC CTT TCG GAT GAT GGA; MPZL2 forward: TTA ATG GGA CAG ATG CTC GGT; MPZL2 reverse: AAG ACA CCC GGT CCT TAA ACC; MME forward: AGA AGA AAC AGC GAT GGA CTC C; MME reverse: CAT AGA GTG CGA TCA TTG TCA CA; MLNR forward (siRNA): CTG AGC GCA TCT ATC AAC CCA; MLNR reverse (siRNA): TCC CAT CGT CTT CAC GTT AGC; MLNR forward (overexpression): GTG GTG ACC GTG ATG CTG AT; MLNR reverse (overexpression): AGC AGG ATG AGT AGG TCG GA.
3. Flow-cytometric DNA damage assays. Sensitive DNA damage assays by flow cytometry were performed as previously described^37,38. γH2AX primary antibody (Sigma, Catalog #05-636) and goat anti-mouse secondary antibody, Alexa Fluor 647 (Thermo Fisher, Catalog #A21236) were used to stain cells. Stained cells were then analyzed on a BD LSRFortessa flow cytometer. FCS files were analyzed with FlowJo 10.5 software. For siRNA experiments, cells were collected 72 h post transfection and median fluorescence intensity was quantified. In addition, to quantify the DNA-damage-positive subpopulations, 0.5% of the mock cells were gated as the γH2AX threshold as previously demonstrated. The percentage of γH2AX-positive cells in each sample was calculated and compared to its corresponding non-targeting siRNA control. For overproduction experiments, mock-transfected cells were used to set the gates to determine the GFP- and γH2AX-positive cells. 0.5% of the mock cells were gated as the γH2AX threshold. The DNA-damage ratios after 72 h of protein overproduction were calculated as described. Briefly, the damage ratio is defined as (Q2/Q3)/(Q1/Q4), where Q2 is the portion of transfected, γH2AX-positive cells; Q3 is the portion of transfected, γH2AX-negative cells; Q1 is the portion of untransfected, γH2AX-positive cells; and Q4 is the portion of untransfected, γH2AX-negative cells. The DNA damage ratios by candidate protein overproduction were compared with GFP-Tubulin as previously described.
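Two calculations in the steps above lend themselves to a short worked sketch in Python: relative expression by the ΔΔCt method, where ΔCt is the target Ct minus the housekeeping (GAPDH) Ct and expression is 2 raised to the power −(ΔCt of the treated sample − ΔCt of the control sample), and the flow-cytometric damage ratio (Q2/Q3)/(Q1/Q4). The function names and all numeric inputs are hypothetical illustrations, not values from the study.

```python
def mean(values):
    """Average of replicate measurements."""
    return sum(values) / len(values)


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression by the delta-delta-Ct method.

    Each argument is a list of replicate Ct values; "ref" is the
    housekeeping gene measured in the same sample.
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


def damage_ratio(q1, q2, q3, q4):
    """DNA-damage ratio (Q2/Q3)/(Q1/Q4) from quadrant fractions.

    q1: untransfected, gammaH2AX-positive   q2: transfected, gammaH2AX-positive
    q3: transfected, gammaH2AX-negative     q4: untransfected, gammaH2AX-negative
    """
    if min(q1, q3, q4) <= 0:
        raise ValueError("q1, q3, and q4 must be positive")
    return (q2 / q3) / (q1 / q4)


# Hypothetical knockdown: target Ct rises by 2 cycles relative to the control.
knockdown_level = relative_expression(
    ct_target_treated=[26.0, 26.0, 26.0], ct_ref_treated=[18.0, 18.0, 18.0],
    ct_target_control=[24.0, 24.0, 24.0], ct_ref_control=[18.0, 18.0, 18.0],
)
# Hypothetical quadrant percentages from one overproduction sample.
example_ratio = damage_ratio(q1=0.5, q2=2.0, q3=40.0, q4=100.0)
```

A damage ratio above 1 indicates that transfected cells are enriched for γH2AX-positive events relative to untransfected cells in the same sample, which is why the ratio is then compared against a neutral control protein.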

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data generated and/or analyzed during the related study are described in the figshare metadata record: https://doi.org/10.6084/m9.figshare.13280387¹⁰⁹. The data that support the findings of this study are available via the dbGaP (database of genotypes and phenotypes) repository. The data are controlled-access, so interested parties will need to request access; information on how to do so can be found on the pages linked below. The accession numbers are https://identifiers.org/dbgap:phs000878.v2.p1¹¹⁰ for the Transdisciplinary Research in Cancer of the Lung (TRICL) study, https://identifiers.org/dbgap:phs001273.v1.p1¹¹¹ for the OncoArray study, https://identifiers.org/dbgap:phs001681.v1.p1¹¹² for the Affymetrix study, https://identifiers.org/dbgap:phs000629.v1.p1¹¹³ for part of the Genetic Epidemiology of Lung Cancer Consortium (GELCC) study, and https://identifiers.org/dbgap:phs000178.v9.p8¹¹⁴ for The Cancer Genome Atlas (TCGA) study. Two files are not publicly available in order to protect patient privacy. These are: ‘TRICL WES.xlsx’ (underlying Supplementary Table 2 and Supplementary Fig. 3) and ‘TRICL WES.bam’ (underlying Supplementary Fig. 2). These data are only available to authorized researchers who have submitted an IRB application. Please email the corresponding author for access. Data underlying Supplementary Table 5 and Supplementary Fig. 5 are publicly available from the STRING (Search Tool for the Retrieval of Interacting Genes) website: http://string-db.org/. The file used in this study was ‘Protein-Protein Interaction Networks Functional Enrichment Analysis-STRING.txt’.

Other datasets used in this study are available as follows: the UKB dataset is accessible to approved researchers through ukbgene at www.ukbiobank.ac.uk, and the GnomAD dataset can be downloaded from the Genome Aggregation Database at https://gnomad.broadinstitute.org/.

References

Rizvi, N. A. et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124–128 (2015).
Bosse, Y. & Amos, C. I. A decade of GWAS results in lung cancer. Cancer Epidemiol. Biomark. Prev. 27, 363–379 (2018).
Wei, C. et al. A case-control study of a sex-specific association between a 15q25 variant and lung cancer risk. Cancer Epidemiol. Biomark. Prev. 20, 2603–2609 (2011).
Bierut, L. J. et al. Variants in nicotinic receptors and risk for nicotine dependence. Am. J. Psychiatry 165, 1163–1171 (2008).
Chen, L. S. et al. CHRNA5 risk variant predicts delayed smoking cessation and earlier lung cancer diagnosis–a meta-analysis. J. Natl Cancer Inst. 107, djv100 (2015).
Chen, L. S. et al. Interplay of genetic risk factors (CHRNA5-CHRNA3-CHRNB4) and cessation treatments in smoking cessation success. Am. J. Psychiatry 169, 735–742 (2012).
Mucci, L. A. et al. Familial risk and heritability of cancer among twins in Nordic countries. JAMA 315, 68–76 (2016).
Kang, G., Lin, D., Hakonarson, H. & Chen, J. Two-stage extreme phenotype sequencing design for discovering and testing common and rare genetic variants: efficiency and power. Hum. Hered. 73, 139–147 (2012).
Lamina, C. Digging into the extremes: a useful approach for the analysis of rare variants with continuous traits? BMC Proc. 5(Suppl. 9), S105 (2011).
Li, D., Lewinger, J. P., Gauderman, W. J., Murcray, C. E. & Conti, D. Using extreme phenotype sampling to identify the rare causal variants of quantitative traits in association studies. Genet. Epidemiol. 35, 790–799 (2011).
Gorlov, I. P., Gorlova, O. Y., Sunyaev, S. R., Spitz, M. R. & Amos, C. I. Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am. J. Hum. Genet. 82, 100–112 (2008).
Gorlov, I. P., Gorlova, O. Y., Frazier, M. L., Spitz, M. R. & Amos, C. I. Evolutionary evidence of the effect of rare variants on disease etiology. Clin. Genet. 79, 199–206 (2011).
Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
Choi, Y. W. et al. EGFR exon 19 deletion is associated with favorable overall survival after first-line gefitinib therapy in advanced non-small cell lung cancer patients. Am. J. Clin. Oncol. 41, 385–390 (2018).
Sequist, L. V. et al. First-line gefitinib in patients with advanced non-small-cell lung cancer harboring somatic EGFR mutations. J. Clin. Oncol. 26, 2442–2449 (2008).
Tian, Y. et al. Different subtypes of EGFR exon19 mutation can affect prognosis of patients with non-small cell lung adenocarcinoma. PLoS ONE 13, e0201682 (2018).
Xiong, D. et al. A recurrent mutation in PARK2 is associated with familial lung cancer. Am. J. Hum. Genet. 96, 301–308 (2015).
Wang, Y. et al. Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer. Nat. Genet. 46, 736–741 (2014).
Liu, Y. et al. Rare variants in known susceptibility loci and their contribution to risk of lung cancer. J. Thorac. Oncol. 13, 1483–1495 (2018).
Liu, Y. et al. Focused analysis of exome sequencing data for rare germline mutations in familial and sporadic lung cancer. J. Thorac. Oncol. 11, 52–61 (2016).
Ji, X. et al. Protein-altering germline mutations implicate novel genes related to lung cancer development. Nat. Commun. 11, 2220 (2020).
Peng, B., Li, B., Han, Y. & Amos, C. I. Power analysis for case-control association studies of samples with known family histories. Hum. Genet. 127, 699–704 (2010).
Osann, K. E. Lung cancer in women: the importance of smoking, family history of cancer, and medical history of respiratory disease. Cancer Res. 51, 4893–4897 (1991).
Cote, M. L. et al. Increased risk of lung cancer in individuals with a family history of the disease: a pooled analysis from the International Lung Cancer Consortium. Eur. J. Cancer 48, 1957–1968 (2012).
Loman, N. J. et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30, 434–439 (2012).
Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).
Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 12, R112 (2011).
Balzer, S., Malde, K. & Jonassen, I. Systematic exploration of error sources in pyrosequencing flowgram data. Bioinformatics 27, i304–i309 (2011).
Wang, Y. et al. Deciphering associations for lung cancer risk through imputation and analysis of 12,316 cases and 16,831 controls. Eur. J. Hum. Genet. 23, 1723–1728 (2015).
Dong, J. et al. Association analyses identify multiple new lung cancer susceptibility loci and their interactions with smoking in the Chinese population. Nat. Genet. 44, 895–899 (2012).
McKay, J. D. et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet. 49, 1126–1132 (2017).
Shrine, N. et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 51, 481–493 (2019).
Zhu, Z. et al. Genetic overlap of chronic obstructive pulmonary disease and cardiovascular disease-related traits: a large-scale genome-wide cross-trait analysis. Respir. Res. 20, 64 (2019).
Kichaev, G. et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 104, 65–75 (2019).
Lewis, B. P., Burge, C. B. & Bartel, D. P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15–20 (2005).
Fagerberg, L. et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell Proteom. 13, 397–406 (2014).
Xia, J. et al. Bacteria-to-human protein networks reveal origins of endogenous DNA damage. Cell 176, 127–143 e124 (2019).
Bosse, Y. et al. Transcriptome-wide association study reveals candidate causal genes for lung cancer. Int. J. Cancer 146, 1862–1878 (2020).
Selvan, M. E. et al. Inherited rare, deleterious variants in ATM increase lung adenocarcinoma risk. J. Thorac. Oncol. 15, 1871–1879 (2020).
Parry, E. M. et al. Germline mutations in DNA repair genes in lung adenocarcinoma. J. Thorac. Oncol. 12, 1673–1678 (2017).
Yang, H. et al. ATM sequence variants associate with susceptibility to non-small cell lung cancer. Int. J. Cancer 121, 2254–2259 (2007).
Lo, Y. L. et al. ATM polymorphisms and risk of lung cancer among never smokers. Lung Cancer 69, 148–154 (2010).
Hsia, T. C. et al. Effects of ataxia telangiectasia mutated (ATM) genotypes and smoking habits on lung cancer risk in Taiwan. Anticancer Res. 33, 4067–4071 (2013).
Chenevix-Trench, G. et al. Dominant negative ATM mutations in breast cancer families. J. Natl Cancer Inst. 94, 205–215 (2002).
Morgan, S. E., Lovly, C., Pandita, T. K., Shiloh, Y. & Kastan, M. B. Fragments of ATM which have dominant-negative or complementing activity. Mol. Cell Biol. 17, 2020–2029 (1997).
Bakkenist, C. J. & Kastan, M. B. DNA damage activates ATM through intermolecular autophosphorylation and dimer dissociation. Nature 421, 499–506 (2003).
Scott, S. P. et al. Missense mutations but not allelic variants alter the function of ATM by dominant interference in patients with breast cancer. Proc. Natl Acad. Sci. USA 99, 925–930 (2002).
Kuhne, M. et al. A double-strand break repair defect in ATM-deficient cells contributes to radiosensitivity. Cancer Res. 64, 500–508 (2004).
Dai, J. et al. Genome-wide association study of INDELs identified four novel susceptibility loci associated with lung cancer risk. Int. J. Cancer 146, 2855–2864 (2020).
Bademci, G. et al. MPZL2 is a novel gene associated with autosomal recessive nonsyndromic moderate hearing loss. Hum. Genet. 137, 479–486 (2018).
Wesdorp, M. et al. MPZL2, encoding the epithelial junctional protein myelin protein zero-like 2, is essential for hearing in man and mouse. Am. J. Hum. Genet. 103, 74–88 (2018).
Guttinger, M. et al. Epithelial V-like antigen (EVA), a novel member of the immunoglobulin superfamily, expressed in embryonic epithelia with a potential role as homotypic adhesion molecule in thymus histogenesis. J. Cell Biol. 141, 1061–1071 (1998).
Einhorn, Y. et al. Differential analysis of mutations in the Jewish population and their implications for diseases. Genet. Res. 99, e3 (2017).
Shi, L. et al. Comprehensive population screening in the Ashkenazi Jewish population for recurrent disease-causing variants. Clin. Genet. 91, 599–604 (2017).
Kerem, B., Chiba-Falek, O. & Kerem, E. Cystic fibrosis in Jews: frequency and mutation distribution. Genet. Test. 1, 35–39 (1997).
Powers, J. et al. A rare TP53 mutation predominant in Ashkenazi Jews confers risk of multiple cancers. Cancer Res. 80, 3732–3744 (2020).
Picciotto, M. R. & Mineur, Y. S. Molecules and circuits involved in nicotine addiction: the many faces of smoking. Neuropharmacology 76 Pt B, 545–553 (2014).
Huang, H., Xu, Y. & van den Pol, A. N. Nicotine excites hypothalamic arcuate anorexigenic proopiomelanocortin neurons and orexigenic neuropeptide Y neurons: similarities and differences. J. Neurophysiol. 106, 1191–1202 (2011).
Mineur, Y. S. et al. Nicotine decreases food intake through activation of POMC neurons. Science 332, 1330–1332 (2011).
Wenczl, E. et al. (Pheo)melanin photosensitizes UVA-induced DNA damage in cultured human melanocytes. J. Invest. Dermatol. 111, 678–682 (1998).
Cui, R. et al. Central role of p53 in the suntan response and pathologic hyperpigmentation. Cell 128, 853–864 (2007).
Suzuki, I. et al. Increase of pro-opiomelanocortin mRNA prior to tyrosinase, tyrosinase-related protein 1, dopachrome tautomerase, Pmel-17/gp100, and P-protein mRNA in human skin after ultraviolet B irradiation. J. Invest. Dermatol. 118, 73–78 (2002).
Slominski, A., Tobin, D. J. & Paus, R. Does p53 regulate skin pigmentation by controlling proopiomelanocortin gene transcription? Pigment Cell Res. 20, 307–308 (2007). author reply 309-310.
Krude, H. et al. Severe early-onset obesity, adrenal insufficiency and red hair pigmentation caused by POMC mutations in humans. Nat. Genet. 19, 155–157 (1998).
Tsai, H. E. et al. Downregulation of hepatoma-derived growth factor contributes to retarded lung metastasis via inhibition of epithelial-mesenchymal transition by systemic POMC gene delivery in melanoma. Mol. Cancer Ther. 12, 1016–1025 (2013).
Stovold, R. et al. Neuroendocrine and epithelial phenotypes in small-cell lung cancer: implications for metastasis and survival in patients. Br. J. Cancer 108, 1704–1711 (2013).
Meredith, S. L. et al. Irradiation decreases the neuroendocrine biomarker pro-opiomelanocortin in small cell lung cancer cells in vitro and in vivo. PLoS ONE 11, e0148404 (2016).
Hao, L., Zhao, X., Zhang, B., Li, C. & Wang, C. Positive expression of pro-opiomelanocortin (POMC) is a novel independent poor prognostic marker in surgically resected non-small cell lung cancer. Tumour Biol. 36, 1811–1817 (2015).
Derghal, A. et al. Leptin modulates the expression of miRNAs-targeting POMC mRNA by the JAK2-STAT3 and PI3K-Akt pathways. J. Clin. Med. 8, 2213–2224 (2019).
Feighner, S. D. et al. Receptor for motilin identified in the human gastrointestinal system. Science 284, 2184–2188 (1999).
Xu, H. L. et al. Variants in motilin, somatostatin and their receptor genes and risk of biliary tract cancers and stones in Shanghai, China. Meta Gene 2, 418–426 (2014).
Misawa, K. et al. Neuropeptide receptor genes GHSR and NMUR1 are candidate epigenetic biomarkers and predictors for surgically treated patients with oropharyngeal cancer. Sci. Rep. 10, 1007 (2020).
Delahaye-Sourdeix, M. et al. A rare truncating BRCA2 variant and genetic susceptibility to upper aerodigestive tract cancer. J. Natl Cancer Inst. 107, djv037 (2015).
Cybulski, C. et al. Constitutional CHEK2 mutations are associated with a decreased risk of lung and laryngeal cancers. Carcinogenesis 29, 762–765 (2008).
Brennan, P. et al. Uncommon CHEK2 mis-sense variant and reduced risk of tobacco-related cancers: case control study. Hum. Mol. Genet. 16, 1794–1801 (2007).
Shaag, A. et al. Functional and genomic approaches reveal an ancient CHEK2 allele associated with breast cancer in the Ashkenazi Jewish population. Hum. Mol. Genet. 14, 555–563 (2005).
Roeb, W., Higgins, J. & King, M. C. Response to DNA damage of CHEK2 missense mutations in familial breast cancer. Hum. Mol. Genet. 21, 2738–2744 (2012).
Kilpivaara, O. et al. CHEK2 variant I157T may be associated with increased breast cancer risk. Int. J. Cancer 111, 543–547 (2004).
Apostolou, P. & Papasotiriou, I. Current perspectives on CHEK2 mutations in breast cancer. Breast Cancer 9, 331–335 (2017).
Furic, L., Maher-Laporte, M. & DesGroseillers, L. A genome-wide approach identifies distinct but overlapping subsets of cellular mRNAs associated with Staufen1- and Staufen2-containing ribonucleoprotein complexes. RNA 14, 324–335 (2008).
Zhang, X. et al. The downregulation of the RNA-binding protein Staufen2 in response to DNA damage promotes apoptosis. Nucleic Acids Res. 44, 3695–3712 (2016).
Conde, L., Beaujois, R. & DesGroseillers, L. STAU2 protein level is controlled by caspases and the CHK1 pathway and regulates cell cycle progression in the non-transformed hTERT-RPE1 cells. Preprint from Research Square, https://doi.org/10.21203/rs.21203.rs-60003/v21201 PPR: PPR206819 (2020).
Yuan, J. et al. Integrated analysis of genetic ancestry and genomic alterations across cancers. Cancer Cell 34, 549–560.e549 (2018).
Yang, D. et al. Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. JAMA 306, 1557–1565 (2011).
Cadoo, K. A. Understanding inherited risk in unselected newly diagnosed patients with endometrial cancer. JCO Precis. Oncol. 3, 473–474 (2019).
O’Connor, T. D. et al. Fine-scale patterns of population stratification confound rare variant association tests. PLoS ONE 8, e65834 (2013).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Wang, Z. et al. Multi-omics analysis reveals a HIF network and Hub gene EPAS1 associated with lung adenocarcinoma. EBioMedicine, 93–101 (2018).
Li, Y. et al. FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data. BMC Bioinform. 17, 122 (2016).
Bainbridge, M. N. et al. Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol. 12, R68 (2011).
Lupski, J. R. et al. Exome sequencing resolves apparent incidental findings and reveals further complexity of SH3TC2 variant alleles causing Charcot-Marie-Tooth neuropathy. Genome Med. 5, 57 (2013).
Li, B. & Leal, S. M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008).
Liu, D. J. & Leal, S. M. A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet. 6, e1001156 (2010).
Musolf, A. M. et al. Whole exome sequencing of highly aggregated lung cancer families reveals linked loci for increased cancer risk on chromosomes 12q, 7p, and 4q. Cancer Epidemiol. Biomark. Prev. 29, 434–442 (2020).
Liu, P. et al. Familial aggregation of common sequence variants on 15q24-25.1 in lung cancer. J. Natl Cancer Inst. 100, 1326–1330 (2008).
Regan, E. A. et al. Genetic epidemiology of COPD (COPDGene) study design. COPD 7, 32–43 (2010).
Ji, X. et al. Identification of susceptibility pathways for the role of chromosome 15q25.1 in modifying lung cancer risk. Nat. Commun. 9, 3221 (2018).
Li, Y. et al. Genetic interaction analysis among oncogenesis-related genes revealed novel genes and networks in lung cancer development. Oncotarget 10, 1760–1774 (2019).
Byun, J. et al. Genome-wide association study of familial lung cancer. Carcinogenesis 39, 1135–1140 (2018).
Kachuri, L. et al. Fine mapping of chromosome 5p15.33 based on a targeted deep sequencing and high density genotyping identifies novel lung cancer susceptibility loci. Carcinogenesis 37, 96–105 (2016).
Zuzarte, P. C. et al. A two-dimensional pooling strategy for rare variant detection on next-generation sequencing platforms. PLoS ONE 9, e93455 (2014).
Matthews, P. M. & Sudlow, C. The UK Biobank. Brain 138, 3463–3465 (2015).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Singleton, M. V. et al. Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am. J. Hum. Genet. 94, 599–610 (2014).
Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
Kinner, A., Wu, W., Staudt, C. & Iliakis, G. Gamma-H2AX in recognition and signaling of DNA double-strand breaks in the context of chromatin. Nucleic Acids Res. 36, 5678–5694 (2008).
Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25, 402–408 (2001).
Article CAS PubMed Google Scholar
Liu, Y. Metadata record for the manuscript: rare deleterious germline variants and risk of lung cancer. figshare https://doi.org/10.6084/m9.figshare.13280387 (2020).
Transdisciplinary Research Into Cancer of the Lung (TRICL) - Exome Plus Targeted Sequencing. dbGaP https://identifiers.org/dbgap:phs000878.v2.p1.
Oncoarray Consortium - Lung Cancer Studies. dbGaP https://identifiers.org/dbgap:phs001273.v1.p1.
Transdisciplinary Research Into Cancer of the Lung (TRICL) – Affymetrix. dbGaP https://identifiers.org/dbgap:phs001681.v1.p1.
Genetic Epidemiology of Lung Cancer Consortium GWAS of Familial Lung Cancer. dbGaP https://identifiers.org/dbgap:phs000629.v1.p1.
National Institutes of Health The Cancer Genome Atlas (TCGA). dbGaP https://identifiers.org/dbgap:phs000178.v9.p8.

Acknowledgements

We would like to thank all individuals who participated in this study. This work was supported by grants from the National Institutes of Health (R01CA127219, R01CA141769, R01CA060691, R01CA87895, R01CA80127, R01CA84354, R01CA134682, R01CA134433, R01CA074386, R01CA092824, R01CA250905, R01HL113264, R01HL082487, R01HL110883, R03CA77118, P20GM103534, P30CA125123, P30CA023108, P30CA022453, P30ES006096, P50CA090578, U01CA243483, U01HL089856, U01HL089897, U01CA76293, U19CA148127, U01CA209414, K07CA181480, N01-HG-65404, HHSN268200782096C, HHSN261201300011I, HHSN268201100011, HHSN268201200007C, DP1-CA174424, DP1-AG072751, CA125123, RR024574), the Intramural Research Program of the National Human Genome Research Institute (JEB-W), and the Herrick Foundation. Dr. Amos is an Established Research Scholar of the Cancer Prevention and Research Institute of Texas (RR170048). We also want to acknowledge the Cytometry and Cell Sorting Core support provided by the Cancer Prevention and Research Institute of Texas Core Facility (RP180672). At Toronto, the study is supported by the Canadian Cancer Society Research Institute (#020214) to R. H., the Ontario Institute for Cancer Research to R. H., and the Alan Brown Chair to G. L. and Lusi Wong Programs at the Princess Margaret Hospital Foundation. The Liverpool Lung Project is supported by the Roy Castle Lung Cancer Foundation.

Author information

These authors contributed equally: Yanhong Liu, Jun Xia.

Authors and Affiliations

Dan L. Duncan Comprehensive Cancer Center, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
Yanhong Liu, Spiridon Tsavachidis, Margaret R. Spitz, Chao Cheng, Jinyoung Byun, Yafang Li, Michael E. Scheurer, Farrah Kheradmand & Christopher I. Amos
Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
Jun Xia, Xiangjun Xiao, Chao Cheng, Jinyoung Byun, Wei Hong, Yafang Li, Dakai Zhu, Zhuoyi Song & Christopher I. Amos
International Agency for Research on Cancer, Lyon, France
James McKay, Ghislaine Scelo & Paul Brennan
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Susan M. Rosenberg
Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
Michael E. Scheurer
Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX, USA
Farrah Kheradmand
Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
Claudio W. Pikielny
Karmanos Cancer Institute, Wayne State University, Detroit, MI, USA
Christine M. Lusk & Ann G. Schwartz
Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Ignacio I. Wistuba
Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Michael H. Cho & Edwin K. Silverman
National Human Genome Research Institute, Bethesda, MD, USA
Joan Bailey-Wilson
University of Cincinnati College of Medicine, Cincinnati, OH, USA
Susan M. Pinney, Marshall Anderson & Elena Kupert
The University of Toledo College of Medicine, Toledo, OH, USA
Colette Gaba
Louisiana State University Health Sciences Center, New Orleans, LA, USA
Diptasri Mandal
Medical College of Wisconsin, Milwaukee, WI, USA
Ming You
Mayo Clinic College of Medicine, Rochester, MN, USA
Mariza de Andrade
Mayo Clinic College of Medicine, Scottsdale, AZ, USA
Ping Yang
Roy Castle Lung Cancer Research Programme, The University of Liverpool, Department of Molecular and Clinical Cancer Medicine, Liverpool, UK
Triantafillos Liloglou, Michael P. A. Davies & John K. Field
M. Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
Jolanta Lissowska
Nofer Institute of Occupational Medicine, Department of Environmental Epidemiology, Lodz, Poland
Beata Swiatkowska
Russian N.N. Blokhin Cancer Research Centre, Moscow, Russian Federation
David Zaridze & Anush Mukeria
Faculty of Health Sciences, Palacky University, Olomouc, Czech Republic
Vladimir Janout
Institute of Public Health and Preventive Medicine, Charles University, 2nd Faculty of Medicine, Prague, Czech Republic
Ivana Holcatova
National Institute of Public Health, Bucharest, Romania
Dana Mates
Department of Thoracopulmonary Pathology, Service of Pathology, Clinical Center of Serbia, Belgrade, Serbia
Jelena Stojsic
Princess Margaret Cancer Center, Toronto, ON, Canada
Geoffrey Liu
Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
Rayjean J. Hung
Harvard University T. H. Chan School of Public Health, Boston, MA, USA
David C. Christiani

Authors

Yanhong Liu
Jun Xia
James McKay
Spiridon Tsavachidis
Xiangjun Xiao
Margaret R. Spitz
Chao Cheng
Jinyoung Byun
Wei Hong
Yafang Li
Dakai Zhu
Zhuoyi Song
Susan M. Rosenberg
Michael E. Scheurer
Farrah Kheradmand
Claudio W. Pikielny
Christine M. Lusk
Ann G. Schwartz
Ignacio I. Wistuba
Michael H. Cho
Edwin K. Silverman
Joan Bailey-Wilson
Susan M. Pinney
Marshall Anderson
Elena Kupert
Colette Gaba
Diptasri Mandal
Ming You
Mariza de Andrade
Ping Yang
Triantafillos Liloglou
Michael P. A. Davies
Jolanta Lissowska
Beata Swiatkowska
David Zaridze
Anush Mukeria
Vladimir Janout
Ivana Holcatova
Dana Mates
Jelena Stojsic
Ghislaine Scelo
Paul Brennan
Geoffrey Liu
John K. Field
Rayjean J. Hung
David C. Christiani
Christopher I. Amos

Contributions

Drafted the paper: Y.L., J.X., and C.I.A. Project coordination: C.I.A., R.J.H., D.C.C., and P.B. Statistical analysis: Y.L., S.T., X.X., D.Z., C.W.P., C.M.L., and C.I.A. Genetic validation analysis: Y.L., S.T., X.X., C.C., Y.Li., J.B., D.Z., W.H., C.W.P., C.M.L., and C.I.A. Functional DNA damage assay analysis: J.X., Z.S., and S.M.R. Sample collection, exome sequencing, and development of the epidemiological studies: J.M., M.R.S., M.E.S., F.K., C.M.L., A.G.S., I.I.W., M.H.C., E.K.S., J.B.W., S.M.P., M.A., E.K., C.G., D.M., M.Y., M.dA., P.Y., T.L., M.P.A.D., J.L., B.S., D.Z., A.M., V.J., I.H., D.M., J.S., G.S., P.B., G.L., J.K.F., R.J.H., D.C.C., and C.I.A.

Corresponding author

Correspondence to Christopher I. Amos.

Ethics declarations

Competing interests

E.K.S. reports institutional grant funding from Bayer and GlaxoSmithKline. The other authors declare no conflicts of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Supplementary Figures and Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, Y., Xia, J., McKay, J. et al. Rare deleterious germline variants and risk of lung cancer. npj Precis. Onc. 5, 12 (2021). https://doi.org/10.1038/s41698-021-00146-7

Received: 27 May 2020
Accepted: 11 December 2020
Published: 16 February 2021
DOI: https://doi.org/10.1038/s41698-021-00146-7
