Clinical and genomic features of Chinese lung cancer patients with germline mutations

The germline mutation landscape in Chinese lung cancer patients has not been well defined. In this study, sequencing data of 1,021 cancer genes from 1,794 Chinese lung cancer patients were analyzed. A total of 111 pathogenic or likely pathogenic germline mutations were identified, a significantly higher rate than in non-cancer individuals (111/1794 vs. 84/10,588, p < 2.2e-16). BRCA1/2 germline mutations are associated with earlier age of onset (median 52.5 vs. 60 years, p = 0.008). Among 29 cancer predisposition genes with germline mutations detected in the Chinese cohort and/or the TCGA lung cancer cohort, only 11 are identified in both cohorts, and BRCA2 mutations are significantly more common in the Chinese cohort (p = 0.015). Chinese patients with germline mutations have a different prevalence of somatic KRAS, MET exon 14 skipping and TP53 mutations compared to those without. Our findings suggest potential ethnic and etiologic differences between Western and Asian lung cancer patients.

Many human cancers could be inheritable. Over 100 genes, mostly tumor suppressor genes, have been identified as accountable for inheritable cancers, a phenomenon termed genetic predisposition, as exemplified by germline mutations in BRCA1 and BRCA2 for predisposition to breast and ovarian cancers and in mismatch repair (MMR) genes for cancers associated with Lynch syndrome 1,2 . In addition, patients with these mutations may have distinct biological and clinical features that are managed differently. For example, ovarian cancer patients with BRCA1 or BRCA2 mutations benefit particularly from the poly ADP-ribose polymerase inhibitor olaparib, while solid tumors with MMR mutations have demonstrated high response rates to immune checkpoint blockade 3,4 . However, these well-known genes only account for a small fraction of the genetic burden in cancers, and the genetic alterations that may be responsible for predisposition to many potentially inheritable cancers are largely unknown.
Lung cancer is the leading cause of cancer-related death worldwide. It has long been known that a family history of lung cancer is associated with increased risk of lung cancer in both smokers and never smokers [5][6][7] , suggesting a potential genetic predisposition for lung cancer development. Well-defined, high-penetrance, hereditary lung cancer syndromes are uncommon. Recent pan-cancer studies have demonstrated that 3.5-8.5% of lung cancers harbor likely pathogenic germline mutations 8,9 . Several well-known predisposition variants, including those in BRCA2 and CHEK2, have been found to have a strong association with lung cancer risk 10 , and rare pathogenic germline mutations in genes of the Fanconi anemia pathway also contribute to the risk of squamous lung cancers 11 . However, all these pioneering studies are based on western populations, and the germline mutation landscape in Asian lung cancer patients remains largely unknown. Given the distinct genomic landscape of Asian lung cancer patients 12 , it is reasonable to speculate that genetic predisposition variants may differ between Asian lung cancer patients and their western counterparts. Furthermore, the current standard of care for treatment of metastatic lung cancers is based upon the determination of actionable somatic driver gene mutations 13 . Recent studies have demonstrated that germline mutations can co-occur or be mutually exclusive with somatic cancer gene alterations 8 , but little attention has been paid to the somatic mutational landscape in the setting of co-occurring pathogenic germline mutations.
In this study, we analyzed the next generation sequencing (NGS) data of 1021 cancer genes from 1794 Chinese lung cancer patients with the intent to delineate the germline mutational landscape in Chinese lung cancer patients as well as clinical and genomic features of these patients with germline mutations. Pathogenic or likely pathogenic (P/LP) germline mutations of 35 cancer genes were identified in 106 of the 1794 Chinese patients (5.91%). BRCA1/2 germline mutations are associated with younger age. Prevalence of somatic mutations in KRAS, MET exon 14 skipping and TP53 is different in patients with P/LP germline mutations compared to those without.

Results
Germline mutation landscape of Chinese lung cancer patients. Germline DNA and paired tumor DNA were subjected to NGS of 1021 cancer genes with an average sequencing depth of 285× (36×−441×) in germline DNA and 1248× (56×−4626×) in tumor DNA, respectively (Fig. 1). Comparison of the single-nucleotide polymorphism (SNP) data from the current cohort to that of individuals in the 1000 Genomes Project phase 3 (n = 2054) 14,15 revealed that the mean pairwise F-statistics (fixation indices, Fst) difference was significant between the lung cancer patients in the current cohort and African (Fst = 0.07), European (Fst = 0.06), South Asian (Fst = 0.04) and Admixed American (Fst = 0.04) populations; however, the SNP architecture of the lung cancer patients in the current cohort was almost identical to that of the East Asian (ASN) individuals (Fst = 0.00) (Fig. 2a). Furthermore, principal component analysis (PCA) using SNPs from the 1000 Genomes Project also demonstrated that the lung cancer patients from this study clustered with East Asians and were clearly separated from other ethnic populations (Fig. 2b). Taken together, these data suggest that these 1794 Chinese lung cancer patients are likely genetically typical of the Chinese population.
A total of 111 pathogenic/likely pathogenic (P/LP) germline mutations from 35 known cancer susceptibility genes were identified in 106 (5.91%) patients according to American College of Medical Genetics and Genomics (ACMG) 2015 guideline 16 . One hundred and one of the 106 patients carried one P/LP germline mutation and five patients harbored two P/LP germline gene mutations (Supplementary data 1). The demographic, clinical and pathological features and the prevalence of germline mutations are shown in Table 1. The most commonly mutated gene in this Chinese lung cancer cohort was BRCA2 in 14 patients. In addition, BRCA1 germline mutations were identified in four patients (Fig. 3a). Seventeen of the 18 BRCA1/2 mutations have been reported in public database (Clinvar or BRCA Share 17 ) or previous studies on breast cancers 18 . A novel frameshift mutation, BRCA2: c.5163_5164delCA (p.N1721Kfs*5), was identified and defined as a P/LP mutation based on ACMG guideline 16 . Other frequently mutated genes included FANCA in nine patients, RAD51D in seven patients, ATM in seven patients, MUTYH in six patients, and TP53 in five patients, etc.
To illustrate the potential association between these P/LP germline mutations and lung carcinogenesis in this cohort, we annotated the P/LP mutations from recently published whole genome sequencing data of non-cancer individuals enrolled in the China Metabolic Analytics Project (ChinaMAP) (n = 10,588), a study on the impact of genetic architecture on metabolic diseases 19 . Based on the same 2015 ACMG guideline under the same filtering criteria (see Methods), 84 P/LP germline mutations were identified in the same 35 cancer predisposition genes, significantly lower than that in the lung cancer cohort (84/10,588 (0.80%) vs. 111/1794 (6.1%), p < 2.2e−16, Chi-square test). We then compared the allele frequency (AF) of the germline mutations of each gene in lung cancer patients to that in the non-cancer individuals. There were 17 genes with P/LP germline mutations detected in ≥3 patients in this lung cancer cohort. As shown in Table 2, 16 of the 17 genes had a higher AF of P/LP germline mutations (significantly higher in 11 genes) in the lung cancer patients than in non-cancer individuals in the ChinaMAP study, indicating an enrichment of these germline mutations in lung cancer patients. Furthermore, of the 106 patients with germline mutations, we were able to collect and analyze tumor samples from 59 patients. Loss of heterozygosity (LOH) of the second allele was found in 8 (12.9%) tumors (Supplementary Table 1) and loss-of-function mutations in the other allele were found in three additional tumors (Supplementary Table 2), for a total of 11 (18.6%) patients showing evidence of second-hit events, comparable to that in the western patient population 20 . Taken together, these data suggest that the P/LP germline mutations identified in the current study may be associated with increased risk of lung cancer development in the Chinese population.
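The cohort-level comparison above (111/1794 vs. 84/10,588) can be reproduced approximately with a 2×2 Pearson chi-square test. The following is a minimal sketch, assuming no continuity correction and treating each P/LP mutation as one event; the function name `chi_square_2x2` is hypothetical, and the authors' exact procedure in SPSS or GraphPad Prism may differ in detail.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from the text: 111 events in 1794 patients vs. 84 in 10,588 controls.
lung_events, lung_total = 111, 1794
ctrl_events, ctrl_total = 84, 10588

chi2, p = chi_square_2x2(
    lung_events, lung_total - lung_events,
    ctrl_events, ctrl_total - ctrl_events,
)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```

Under these assumptions the p value falls far below the 2.2e−16 threshold reported in the text, which is the smallest value many statistics packages print by default.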
Germline landscape between Asian and western lung cancers.
To understand whether germline landscape differs between Asian and western lung cancer population, we compared our results with the germline mutation data derived from the Cancer Genome Atlas (TCGA) lung adenocarcinoma (LUAD), and lung squamous carcinoma (LUSC) cohorts 9 . Overall, there were 75 cancer predisposition genes (Supplementary Table 3 Table 5) likely due to the very low event rates in both cohorts.
P/LP germline mutations in BRCA1/2 and TP53 may be associated with early onset of lung cancer. Next, we sought to investigate whether the lung cancers with P/LP germline mutations have distinct clinical features compared to lung cancers without P/LP germline mutations. The germline mutation rate was not associated with gender, age, histology or stage (IV vs. I-III) in either univariate or multivariate analysis (Supplementary Table 6). However, there was a trend that germline mutations were more common in younger patients, consistent with a report in a Western lung cancer population 20 . The prevalence of P/LP germline mutations in patients under 40 was 8.57% vs. 5.29% in patients over 40 (p = 0.218, Chi-square test), and the prevalence plateaued after 55 years old (Fig. 3b). This trend appeared to be primarily driven by patients with germline mutations in BRCA1/2, who were significantly younger than patients without germline mutations (median of 52.5 vs. 60 yrs, p = 0.008, Mann-Whitney test) or patients with other germline mutations (median, 52.5 vs. 62.5, p = 0.016, Mann-Whitney test) (Fig. 3c). In addition, there were five patients in our cohort who were identified to carry P/LP germline mutations in TP53. The median age of these five patients was 43 years old, younger than patients without germline mutations (60 yrs, p = 0.07, Mann-Whitney test) (Fig. 3c) or patients with other germline mutations (62.5 years old, p = 0.16, Mann-Whitney test), although the differences did not reach statistical significance, likely due to small sample size. On the other hand, patients with P/LP germline PMS2 mutations were older than patients without germline mutations (median, 74 vs. 60 yrs, p = 0.021, Mann-Whitney test), or patients with other P/LP germline mutations (median, 74 vs. 59.5 yrs, p = 0.006, Mann-Whitney test) (Fig. 3c).
Fig. 3 b Bar plot and lines show the frequency of germline variants in patients under a certain age (bars) and the frequency in female and male patients (lines). c The panels show the age of onset for patients without germline mutations (n = 1611 patients) (light brown dots) and patients with different germline genes (n = 104 patients) (dark brown dots). Horizontal lines indicate median age. P value is calculated by the Mann-Whitney test (*p = 0.021, **p = 0.008).
Somatic mutation landscape in non-small cell lung cancers with germline mutation. We next investigated whether lung cancers with P/LP germline mutations have a distinct somatic mutational landscape. To avoid false negative results ascribed to low tumor content in the specimens, 224 patients (including 187 with liquid biopsy samples and 37 with FFPE samples) without any somatic mutation detected were excluded from this analysis (Fig. 1). Furthermore, since 634 tumor specimens were formalin-fixed paraffin-embedded (FFPE) specimens, which are known to be associated with a higher incidence of sequencing artifacts than fresh tissues, strict filtering criteria (see Methods for details) were applied for somatic mutation calls. DNA degradation increases with the age of FFPE blocks, so the quality of DNA depends heavily on the storage time of FFPE specimens, and the artifact rate is rather low if the FFPE blocks are <1 year old 21 . Fortunately, 551/634 (87%) FFPE specimens utilized in this study were collected within one year before DNA extraction. Nevertheless, we sought to determine the sequencing data quality before further analyses. Since FFPE sequencing artifacts usually present with a low log odds (LOD) score (usually < 10), low VAF (usually < 10%), and predominantly "C > T/G > A" transitions 22 , we assessed the LOD scores and the proportion of "C > T/G > A" transitions for all mutations included in this study. When looking specifically at these "high-risk" features, only 8 of 4784 (0.2%) mutations from FFPE specimens were C > T/G > A transitions with LOD score < 10 and VAF < 10%. Furthermore, the overall proportion of C > T transitions was similar for mutations identified from FFPE specimens compared to those from fresh tissue specimens (29.3% vs. 29.17%, p = 0.951, Chi-square test). Taken together, these data suggest that the impact of FFPE artifacts was minimal in this study.
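The "high-risk" FFPE artifact check described above combines three conditions: a C > T/G > A transition, LOD score < 10 and VAF < 10%. A minimal sketch of that flagging logic follows; the `Mutation` record, the function names and the example calls are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    ref: str      # reference base
    alt: str      # alternate base
    lod: float    # log odds score reported by the caller
    vaf: float    # variant allele frequency as a fraction (0.08 means 8%)


# C>T and its reverse-complement representation G>A.
FFPE_TRANSITIONS = {("C", "T"), ("G", "A")}


def is_high_risk_ffpe(m, lod_cutoff=10.0, vaf_cutoff=0.10):
    """True when a call shows all three artifact-like features at once."""
    return (
        (m.ref, m.alt) in FFPE_TRANSITIONS
        and m.lod < lod_cutoff
        and m.vaf < vaf_cutoff
    )


def high_risk_fraction(mutations):
    """Fraction of calls flagged as high-risk (0.0 for an empty list)."""
    if not mutations:
        return 0.0
    return sum(is_high_risk_ffpe(m) for m in mutations) / len(mutations)


# Hypothetical calls for illustration only.
calls = [
    Mutation("C", "T", lod=6.5, vaf=0.04),   # all three features: flagged
    Mutation("C", "T", lod=35.0, vaf=0.31),  # strong evidence: not flagged
    Mutation("A", "G", lod=8.0, vaf=0.05),   # wrong substitution type
    Mutation("G", "A", lod=9.0, vaf=0.20),   # VAF too high to be flagged
]
print(high_risk_fraction(calls))
```

Applied to the counts in the text, 8 flagged calls out of 4784 correspond to roughly 0.2% of FFPE-derived mutations.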
Overall, the frequently mutated genes and tumor mutation burden (TMB, 5/Mb vs. 5/Mb, p = 0.841, Mann-Whitney test) were similar between patients with P/LP germline mutations and those without germline mutations (Fig. 4a). Interestingly, the group of patients with P/LP germline mutations were significantly enriched for somatic mutations in MET (7/92, 7.6% vs. 43 (Table 3). Otherwise, the prevalence of somatic mutations in other commonly mutated cancer genes in Asian lung cancer patients including EGFR (44.6% (41/92) in patients with germline mutations vs. 46.7% (675/1434) in patients without, p = 0.720) was comparable in patients with P/LP germline mutations and those without. These data suggested that although the common cancer gene mutations are similar between lung cancers with and without P/LP germline mutations, there might be genetic constraints in certain patients with cancer predisposition germline mutations.
Germline mutations may have impact on mutagenesis of lung cancers. We next sought to explore whether germline mutations could have impacted the mutagenesis in this cohort of lung  Fig. 1). These data imply that germline mutations may contribute to tumorigenesis by inducing related mutation types. However, comprehensive studies with data at the whole exome sequencing level are warranted to validate these findings.

Discussion
Understanding genetic predisposition is critical for screening, prevention and treatment of patients with germline pathogenic alterations. Genome-wide analyses have offered new evidence on cancer pathogenic germline variants 10 27 . Our study provided the first set of data on the clinical and genomic features of Chinese lung cancer patients with P/LP germline predisposition mutations using a relatively large gene panel and revealed that a substantial proportion (5.91%) of Chinese lung cancer patients could carry P/LP germline mutations. Considering the large number of lung cancer patients in China, these carriers represent a very large patient population. As expected, certain genetic predisposition genes were shared between Chinese lung cancer patients and western lung cancer patients from TCGA. However, the prevalence of P/LP germline mutations was different in these two lung cancer patient cohorts. Some genetic predisposition genes were unique to the Chinese or the TCGA lung cancer cohort, implying a potential difference in genetic influences and/or exposures between these two patient populations. One caveat is that compared to TCGA cohorts, patients in the current Chinese cohort were younger, with more female patients and more stage IV disease (Supplementary Table 9). These important differences could have potentially confounded the observed higher incidence of P/LP germline mutations in the Chinese cohort. However, in the current cohort, the incidence of P/LP mutations did not seem to correlate with gender, age, histology or stage (Supplementary Table 6). Similarly, in a study on a western lung cancer cohort, the incidence of P/LP genetic mutations was not different between histologies 20 . Taken together, these data imply that distinct ethnic background and possibly exposure history may be the main reasons for the observed differences in germline mutations between Chinese and Western lung cancer patients.
With the modest sample size fully acknowledged, we attempted to address the question of whether lung cancer patients carrying predisposition germline mutations have unique clinical and molecular features. Of particular interest, the age of onset was significantly younger in patients with germline mutations in BRCA1/2. These observations were consistent with a previous pan-cancer analysis 8 and studies on hereditary breast cancer and colorectal cancer 28,29 . Similarly, patients with P/LP germline mutations in TP53 also appeared to be younger than patients without P/LP germline mutations or patients with other germline mutations, although the difference did not reach statistical significance, likely due to small sample size (Fig. 3c). This is in line with previous findings that germline mutations in TP53 are associated with early onset of various cancers in patients with Li-Fraumeni syndrome 30,31 . These data, if validated, advocate for screening of lung cancer at a younger age in individuals with certain cancer predisposition germline mutations.
Another interesting finding was the lack of association between P/LP germline mutations and EGFR mutations, which are well documented to be far more prevalent in Asian lung cancer patients than in western populations. The exact mechanisms remain unknown, but different genetic backgrounds in different ethnic populations are thought to be one of the potential reasons. Germline EGFR mutations such as T790M have been reported in hereditary lung cancers 32 . In the above-mentioned study on 12,833 Chinese lung cancer patients, germline EGFR mutations were identified in 14 patients (0.11%) 26 . Interestingly, germline EGFR T790M mutation was identified in only 1 of 5675 (0.02%) Chinese lung cancer patients carrying somatic EGFR mutations, much lower than the 1-4% in EGFR-mutant Caucasian lung cancer patients 33,34 , further highlighting the potential ethnic and etiologic differences between Chinese and western patient populations. In the current study, we did not detect any EGFR germline mutations in the 1794 lung cancer patients. Furthermore, the prevalence of somatic EGFR mutations did not appear to be associated with P/LP germline mutations. Taken together, these data suggest that the genetic basis for germline EGFR mutations and somatic EGFR mutations in Chinese lung cancer patients is different and that P/LP germline mutations are unlikely to account for predisposition to somatic mutations in EGFR in Chinese lung cancer patients.
One major caveat is that in many studies including the current study, P/LP germline mutations were annotated according to the ACMG guidelines, which were mainly based on the data and experience from Caucasian patients. Because of the distinct genetic background, these guidelines may not always apply to other ethnic populations. In the current study, for example, MUTYH, a gene encoding a DNA glycosylase involved in oxidative DNA damage repair and associated with heritable predisposition to various cancers, particularly colorectal cancer 35,36 , demonstrated similar AF between lung cancer patients and non-cancer individuals, suggesting that the germline mutations in MUTYH were not associated with lung cancer risk. It is worth noting that a MUTYH variant, c.934-2 A > G (rs77542170), is defined as a "P/LP" mutation based on ACMG guidelines. Indeed, the AF of MUTYH c.934-2 A > G in non-cancer individuals from the Genome Aggregation Database (gnomAD) was only 0.11% (allele count (AC): 312 of 282,820), in line with this annotation. However, 307 of the 312 AC were from East Asians (EAS), with an AF of 1.5% (307/19952) in EAS, 13.6 times higher than that of the whole gnomAD population (P = 2.2E−16). In addition, there were five EAS individuals harboring homozygous alleles of this variant (gnomAD, https://gnomad.broadinstitute.org). Moreover, two studies on Japanese patients reported that the AF of this MUTYH c.934-2 A > G mutation in gastric cancer patients 37 and colorectal cancer patients 38 was no different from that in non-cancer individuals. These results suggest that MUTYH c.934-2 A > G (rs77542170) is most likely not a pathogenic germline mutation for EAS individuals. These findings highlight the profound impact of ethnicity on defining P/LP germline mutations and emphasize the importance of taking ethnicity into consideration when annotating P/LP germline mutations.
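The population-specific allele frequencies quoted for MUTYH c.934-2 A > G follow directly from allele counts divided by the number of alleles genotyped. The sketch below recomputes them from the counts in the text; the helper name `allele_frequency` is hypothetical, and the total of 282,820 alleles is the reading of the overall gnomAD count that is consistent with the stated 0.11%.

```python
def allele_frequency(allele_count, allele_number):
    """Allele frequency = observed alternate alleles / total alleles genotyped."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


# Counts quoted in the text for MUTYH c.934-2 A>G (rs77542170).
af_all = allele_frequency(312, 282_820)   # whole database
af_eas = allele_frequency(307, 19_952)    # East Asian subset

fold_enrichment = af_eas / af_all         # EAS frequency relative to overall
eas_share_of_carriers = 307 / 312         # fraction of all alternate alleles seen in EAS

print(f"overall AF = {af_all:.2%}")
print(f"EAS AF     = {af_eas:.2%}")
print(f"fold       = {fold_enrichment:.1f}")
print(f"EAS share  = {eas_share_of_carriers:.1%}")
```

The fold value computed from unrounded frequencies differs slightly from a ratio of the rounded percentages, which is why such comparisons are usually reported from the raw counts.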
Furthermore, because we only included P/LP germline mutations based on ACMG guidelines, there may be other cancer predisposition genes or mutations unique to Asian patients that we were not able to identify in the current study. There have been efforts to fill this void. For example, the China food and drug institute established a standard database based on the interpretation of genetic variation in Chinese population with the goal to establish a reference system for performance evaluation of BRCA genetic testing 39 . Future studies on large cohort of Asian cancer patients using more comprehensive panel, ideally at exome level are warranted to establish clinical germline database for Asian cancer patients.
The majority of studies on genetic predisposition are primarily based on the association between the presence of certain germline mutations and cancer incidences. In our study, we found that the frequencies of P/LP germline mutations were significantly higher in lung cancer patients than the 10,588 non-cancer Chinese individuals (Table 2). In addition, evidence of second-hit of genes with P/LP germline mutations was found in 18.6% of tumors. These results indicated the connection between these germline mutations and lung carcinogenesis. Other bioinformatics approaches, such as mutational signature analysis 40 , homologous recombination deficiency score 41 analysis etc. could potentially provide further support to the contribution of these germline mutations to lung cancer development. Unfortunately, our data was limited by small numbers of mutations from panel sequencing for such analyses. Nevertheless, these association-based studies have served as the bases for establishing guidelines such as ACMG, which are of value to determine strategies for screening and prevention of certain cancers. However, from cancer biology standpoint, presence of a mutation does not necessarily mean it is causative. Functional studies including genetic animal models are eventually needed to determine the impact of certain germline mutations on carcinogenesis.
One inherent limitation of our study, as a retrospective real-world data mining study, is that clinical information including smoking history, treatment response and survival data was not available for many patients, which precluded us from exploring some very important questions such as the impact of P/LP germline mutations on mutational signatures, treatment response and prognosis. Nevertheless, as the first study of its kind, our data suggest that a substantial proportion of Chinese lung cancer patients may carry germline mutations. These patients with germline mutations may have distinct clinical and molecular features, and the genes accounting for lung cancer predisposition in Asian patients may be different from those in western populations. These results highlight again the need for future prospective studies on larger cohorts of Asian patients to identify cancer predisposition genes unique to Asian populations as well as to define the clinical and genomic features of cancer patients with germline mutations for precise cancer prevention, screening and treatment.

Methods
Patient cohort and samples. Cohort in this study encompassed 1794 lung cancer patients (Supplementary data 2), who were subjected to target capture NGS of 1021 cancer genes in tumor DNA and paired germline DNA as part of the clinical care. The study was approved by the Ethics Committee of Hunan Cancer Hospital and all participants signed a written informed consent.
Sample processing, DNA extraction and quantification. Liquid biopsy samples (peripheral blood, ascitic effusion, pleural effusion, pericardial effusion and cerebrospinal fluid) were collected in Streck vacutainer tubes (Omaha, NE) and processed within 48 h to separate the supernatant by centrifugation at 1600 g for 10 min. Buffy coat from peripheral blood was kept for DNA extraction as germline control. The supernatant was transferred to microcentrifuge tubes and further centrifuged at 16,000 g for 10 min to remove remaining cell debris. Separated liquid biopsy samples and buffy coat were stored at −80°C until DNA extraction. Cell-free DNA (cfDNA) was isolated from the separated liquid biopsy samples using a QIAamp Circulating Nucleic Acid Kit (Qiagen, Hilden, Germany). Buffy coat DNA and FFPE tumor tissue DNA were extracted using the DNeasy Blood & Tissue Kit (Qiagen). DNA concentration was measured using a Qubit fluorometer 3.0 (Life Technologies) and the Qubit dsDNA HS (High Sensitivity) Assay Kit (Invitrogen, Carlsbad, CA, USA). The cfDNA size distribution was evaluated using an Agilent 2100 BioAnalyzer and a DNA HS kit (Agilent Technologies, Santa Clara, CA, USA). The sample quality was assessed based on the following criteria: total amount ≥30 ng for cfDNA samples or ≥100 ng for tumor FFPE samples; fragment length for cfDNA samples distributed with a dominant peak at approximately 170 bp 42,43 .
Library construction, target enrichment and sequencing. Before library construction, DNA from buffy coat peripheral blood lymphocytes (PBL) or from FFPE samples was sheared to 200-300 bp fragments using a Covaris S2 ultrasonicator (Covaris, Woburn, MA, USA). Indexed Illumina next-generation sequencing (NGS) libraries were prepared from PBL DNA, tumor DNA, and liquid biopsy DNA using the KAPA Library Preparation Kit (Kapa Biosystems, Wilmington, MA, USA). The regions of 1021 genes frequently mutated in solid tumors (Supplementary data 3) were enriched using a custom SeqCap EZ Library (Integrated DNA Technology, Coralville, IA, USA). Capture hybridization was performed using the manufacturer's protocol. Following hybrid selection, the captured DNA fragments were amplified and then pooled to generate several multiplex libraries. Of note, for liquid biopsy samples, duplex sequencing based on unique identifier (UID) tags was applied to build bidirectional consensus reads, filtering out recurrent errors, correcting errors mostly introduced by PCR or sequencing, and adjusting the base quality. Finally, the libraries were sequenced on a NovaSeq 6000 or HiSeq 3000 Sequencing System (Illumina, San Diego, CA) with 2 × 101 bp paired-end reads. The TruSeq PE Cluster Generation Kit V3 and TruSeq SBS Kit V3 (Illumina, San Diego, CA, USA) were used according to the manufacturer's recommendations 44 .
Each barcoded dataset was separated. Burrows-Wheeler Aligner was used to map reads to the reference genome GRCh37/hg19. GATK (Version 3.6) 45 (HaplotypeCaller in single-sample mode with duplicate and unmapped reads removed, using default parameters) was used to detect single-nucleotide variants (SNVs) and small insertions and deletions (indels) from germline DNA samples extracted from blood.
Variants in 94 genes (selected from the Genetic Testing Registry 46 (GTR, www.ncbi.nlm.nih.gov/gtr/) and NCCN Genetic/Familial High-Risk Assessment guidelines 46,47 ) (Supplementary Table 10) were included for further annotation. The following filtering criteria were applied. (1) A minimal mapping quality of 25 was used to ensure high quality reads. (2) Only germline mutations that met the following criteria were included: A. Sequencing depth at the targets >50× (78×-510×, mean 265× for the data used in this study); and B. Variant allele frequency (VAF) > 25% (25%-55%, mean 47.3% for the data used in this study). (3) All germline mutations were manually verified using the IGV browser. Common SNPs present in ≥1% of the population in the 1000 Genomes, ExAC and ExAC Asian databases were filtered out. Variants were matched with those in ClinVar, HGMD or an in-house database, and then were manually confirmed and annotated as pathogenic, likely pathogenic, uncertain significance, likely benign or benign according to the 2015 ACMG guideline 16 . Various types of evidence classified as PVS1 (pathogenic very strong 1), PS2 (pathogenic strong 2), PS3 (pathogenic strong 3), PM6 (pathogenic moderate 6) and BS3 (benign strong 3) in the ACMG guideline were confirmed according to the recommendations for application of the ACMG guideline [48][49][50] by the Clinical Genome Resource (ClinGen) (https://www.clinicalgenome.org/).
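The automated part of the germline filtering criteria above can be summarized as a single predicate. The following is a minimal sketch under stated assumptions: the `GermlineCall` record and its field names are hypothetical, mapping quality is represented by one summary value per call, and the manual IGV review and ACMG classification steps are not modelled.

```python
from dataclasses import dataclass


@dataclass
class GermlineCall:
    gene: str
    mapq: int                 # summary mapping quality of supporting reads (assumption)
    depth: int                # sequencing depth at the variant site
    vaf: float                # variant allele frequency as a fraction
    max_population_af: float  # highest AF across 1000 Genomes, ExAC and ExAC Asian


def passes_germline_filters(call, candidate_genes):
    """Apply the numeric thresholds stated in the Methods text."""
    return (
        call.gene in candidate_genes        # restricted to the 94-gene list
        and call.mapq >= 25                 # minimal mapping quality of 25
        and call.depth > 50                 # depth > 50x
        and call.vaf > 0.25                 # VAF > 25%
        and call.max_population_af < 0.01   # drop SNPs present in >= 1% of a population
    )


# Hypothetical gene subset and calls, for illustration only.
genes = {"BRCA1", "BRCA2", "TP53", "MUTYH"}
kept = GermlineCall("BRCA2", mapq=60, depth=265, vaf=0.47, max_population_af=0.0001)
low_vaf = GermlineCall("BRCA2", mapq=60, depth=265, vaf=0.12, max_population_af=0.0001)
common = GermlineCall("MUTYH", mapq=60, depth=300, vaf=0.50, max_population_af=0.015)

print([passes_germline_filters(c, genes) for c in (kept, low_vaf, common)])
```

A VAF floor of 25% is what separates heterozygous germline variants from low-fraction signals such as clonal hematopoiesis or contamination, which is why the low-VAF example is rejected even at high depth.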
Genetic architecture of Chinese lung cancer patients. SNP data from the 1000 Genomes Project phase 3 (n = 2054) were utilized. The following criteria were employed to select the SNPs covered in this 1021 cancer gene panel: minor allele frequency ≥ 1% (common and low-frequency variants), genotyping rate ≥ 90%, Hardy-Weinberg equilibrium P > 0.000001, and removal of one SNP from each pair with r2 ≥ 0.5 (in windows of 50 SNPs with steps of 5 SNPs). The mean pairwise Fst differences between the Chinese lung cancer patients and different ethnic populations in the 1000 Genomes Project (1KGP) were calculated using EIGENSOFT (Version 7.2.1). Principal component analysis (PCA) was performed with the final set of autosomal bi-allelic SNPs using PLINK 51 (Version 1.9) and EIGENSOFT 52,53 (Version 7.2.1).
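To make the Fst comparison concrete, the sketch below computes a pairwise Fst from per-SNP allele frequencies using a Hudson-style ratio-of-averages estimator. This is an illustrative assumption: the study reports using EIGENSOFT, whose internal computation may differ in detail, and the function name `hudson_fst` and the example frequencies are hypothetical.

```python
def hudson_fst(freqs_a, freqs_b, n_a, n_b):
    """Pairwise Fst between two populations (Hudson-style, ratio of averages).

    freqs_a, freqs_b: per-SNP allele frequencies in each population.
    n_a, n_b: number of sampled chromosomes in each population.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n_a - 1)
            - p2 * (1 - p2) / (n_b - 1)
        )
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Hypothetical frequencies: one pair of nearly identical populations, one diverged pair.
similar = hudson_fst([0.10, 0.30, 0.50], [0.10, 0.30, 0.50], n_a=2000, n_b=2000)
diverged = hudson_fst([0.10, 0.30, 0.50], [0.40, 0.05, 0.80], n_a=2000, n_b=2000)
print(f"similar:  {similar:.4f}")
print(f"diverged: {diverged:.4f}")
```

Values near zero, as reported for the East Asian comparison, indicate that allele frequencies in the two groups are essentially indistinguishable, while larger values reflect greater divergence.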
Somatic sequencing data analysis. A minimal mapping quality of 25 was required to ensure high-quality somatic reads. Somatic SNVs in tumor DNA were called using MuTect (Version 1.4) and NChot 54 . GATK (Version 3.6) 45 was used to identify indels (Supplementary data 4). Somatic mutations that met the following criteria were included for further analyses: average sequencing depth >100× in germline DNA and >500× in tumor DNA (1000× in ctDNA), minimal VAF > 1% in tumor DNA (0.5% in ctDNA), a ratio of AF in case/control (tumor/germline) >3, and at least 4 supportive reads (both tissue and ctDNA). For hotspot mutations (EGFR 19del, EGFR L858R, EGFR T790M, KRAS G12, MET exon 14 skipping, BRAF V600E etc.), the requirements were: sequencing depth >20, 3 (for SNV) or 2 (for indel) supportive reads, and a ratio of AF in case/control (tumor/germline) >3. In addition, all mutations were manually verified with the IGV browser. Somatic copy-number variations (CNV) were identified with CONTRA (Version 2.0.8) and calculated as the ratio of adjusted depth between tumor DNA and germline DNA (Supplementary data 5). Loss of heterozygosity (LOH) was analyzed with Facets (Version 1.0.1) 55 . For structural variations (SV) (Supplementary data 6), probes were designed to capture selected exons and introns of the RET, ALK, ROS1, and NTRK1 oncogenes based on previously reported SVs. An in-house algorithm was used to identify split reads and discordant read pairs. In addition, mutations associated with clonal hematopoiesis were filtered out 56 . The R package "YAPSA" was applied to deconstruct signatures from the combined SNVs of each group. Signatures (AC) contributed by over 3% of SNVs were displayed.
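The numeric inclusion criteria for somatic calls, including the relaxed thresholds for hotspot mutations, can be sketched as follows. The `SomaticCall` record and function names are hypothetical; the sketch assumes that the hotspot depth threshold refers to tumor depth, that "1000 in ctDNA" means depth > 1000×, and that a variant with zero germline VAF satisfies the tumor/germline ratio requirement. Manual IGV review is not modelled.

```python
from dataclasses import dataclass


@dataclass
class SomaticCall:
    tumor_depth: int
    germline_depth: int
    tumor_vaf: float       # fraction, e.g. 0.05 for 5%
    germline_vaf: float
    alt_reads: int         # reads supporting the variant in tumor DNA
    is_ctdna: bool = False
    is_hotspot: bool = False
    is_indel: bool = False


def af_ratio_ok(call, min_ratio=3.0):
    """Tumor/germline AF ratio > 3; absent from germline counts as passing (assumption)."""
    if call.germline_vaf == 0:
        return call.tumor_vaf > 0
    return call.tumor_vaf / call.germline_vaf > min_ratio


def passes_somatic_filters(call):
    if call.is_hotspot:
        # Relaxed criteria for known hotspot mutations.
        min_reads = 2 if call.is_indel else 3
        return call.tumor_depth > 20 and call.alt_reads >= min_reads and af_ratio_ok(call)
    min_tumor_depth = 1000 if call.is_ctdna else 500
    min_vaf = 0.005 if call.is_ctdna else 0.01
    return (
        call.germline_depth > 100
        and call.tumor_depth > min_tumor_depth
        and call.tumor_vaf > min_vaf
        and call.alt_reads >= 4
        and af_ratio_ok(call)
    )


# Hypothetical calls for illustration only.
tissue_call = SomaticCall(800, 200, 0.05, 0.0, 40)
ctdna_shallow = SomaticCall(800, 200, 0.05, 0.0, 40, is_ctdna=True)
hotspot_indel = SomaticCall(30, 50, 0.10, 0.0, 2, is_hotspot=True, is_indel=True)
germline_like = SomaticCall(800, 200, 0.30, 0.25, 240)

for c in (tissue_call, ctdna_shallow, hotspot_indel, germline_like):
    print(passes_somatic_filters(c))
```

The ratio check is what removes variants that are present at similar fractions in the matched germline sample, as in the last example.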
Calculation of tumor mutation burden. The tumor mutation burden (TMB) was calculated as the number of non-silent somatic mutations (non-synonymous SNV, indel and splice ± 2) per mega-base (1 Mb) of coding genomic regions sequenced (1 Mb for this 1021 panel). To avoid the false negative results that may confound the TMB calculation, only samples carrying at least one mutation with VAF > 0.03 (in tissue sample) or >0.005 (in ctDNA sample) were included. Other specimens were classified as TMB-unevaluable. CNV or SV was not included for TMB calculation.
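The TMB definition above reduces to a count divided by the sequenced coding size, gated by an evaluability check. A minimal sketch follows; the function name, argument names and example values are hypothetical, and the 1 Mb panel size is the figure stated in the text.

```python
PANEL_CODING_MB = 1.0  # coding region covered by the 1021-gene panel, per the text


def tumor_mutation_burden(nonsilent_count, all_vafs, is_ctdna, panel_mb=PANEL_CODING_MB):
    """Return TMB in mutations/Mb, or None when the sample is TMB-unevaluable.

    nonsilent_count: number of non-silent somatic mutations
                     (non-synonymous SNVs, indels, splice +/- 2); CNV and SV excluded.
    all_vafs: VAFs (fractions) of the somatic mutations detected in the sample.
    """
    min_vaf = 0.005 if is_ctdna else 0.03
    # A sample is evaluable only if at least one mutation exceeds the VAF floor.
    if not any(v > min_vaf for v in all_vafs):
        return None
    return nonsilent_count / panel_mb


# Hypothetical samples for illustration only.
print(tumor_mutation_burden(5, [0.12, 0.02], is_ctdna=False))
print(tumor_mutation_burden(3, [0.02, 0.01], is_ctdna=False))
print(tumor_mutation_burden(3, [0.02, 0.01], is_ctdna=True))
```

Returning `None` rather than zero keeps low-tumor-content samples out of group comparisons instead of counting them as low-TMB tumors.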
Statistical analysis. Mann-Whitney test was employed to compare age and TMB between groups. The Chi-square test or Fisher's exact test was performed to test frequency between groups. All statistical analysis was performed with SPSS (v.23.0; STATA, College Station, TX, USA) or GraphPad Prism (v. 6.0; GraphPad Software, La Jolla, CA, USA) software. Statistical significance was defined as a two-sided p value of <0.05.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Patient deidentified clinical and mutation data (both germline and somatic mutations) are provided in Supplementary data 1-6. The Fastq data of all samples were deposited in the GSA-Human (Genome Sequence Archive for Human in BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, http://gsa.big.ac.cn/gsa-human, https://ngdc.cncb.ac.cn/gsa-human/browse/HRA001610). The data are available under controlled access and may be requested by completing the application form via the GSA-Human System. Data acquisition is granted by the corresponding Data Access Committee. The approximate response time for access requests is about 2 weeks. Additional guidance is available on the GSA-Human System website [https://ngdc.cncb.ac.cn/gsa-human/document/GSA-Human_Request_Guide_for_Users_us.pdf]. The reference genome used in this study was GRCh37/hg19. SNP data from the 1000 Genomes Project phase 3 were used for SNP architecture analysis (https://www.internationalgenome.org/data-portal/data-collection/phase-3).
Germline mutation data from the Cancer Genome Atlas (https://www.cell.com/cell/fulltext/S0092-8674(18)30363-5) were used to compare the landscape of germline mutations. The summary information from the China Metabolic Analytics Project (ChinaMAP) was used as the non-cancer population to illustrate the potential association between germline mutations and lung carcinogenesis (https://www.nature.com/articles/s41422-020-0322-9); all variants can be accessed through the ChinaMAP browser (www.mBiobank.com). A complete list of germline mutations can be found in Supplementary Data 1. Epidemiological information can be found in Supplementary Data 2. The gene list and regions in the sequencing panel can be found in Supplementary Data 3. Somatic variation, copy-number variation and structural variation can be found in Supplementary Data 4-6.