Comprehensive genomic profile of Chinese lung cancer patients and mutation characteristics of individuals resistant to icotinib/gefitinib

Lung cancer is the leading cause of cancer-related death worldwide. Precise treatment based on next-generation sequencing technology has shown advantages in the diagnosis and treatment of lung cancer. This cohort study included 371 lung cancer patients. The lung cancer subtype was related to the smoking status and sex of the patients. The most commonly mutated genes were TP53 (62%), EGFR (55%), and KRAS (11%). The mutation frequencies of EGFR, TP53, PIK3CA, NFE2L2, KMT2D, FGFR1, CCND1, and CDKN2A were significantly different between lung adenocarcinoma and lung squamous cell carcinoma. We identified age-associated mutations in ALK, ERBB2, KMT2D, RBM10, NRAS, NF1, PIK3CA, MET, PBRM1, LRP2, and CDKN2B; smoking-associated mutations in CDKN2A, FAT1, FGFR1, NFE2L2, CCNE1, CCND1, SMARCA4, KEAP1, KMT2C, and STK11; tumor stage-associated mutations in ARFRP1, AURKA, and CBFB; and sex-associated mutations in EGFR. Tumor mutational burden (TMB) was associated with tumor subtype, age, sex, and smoking status. TMB-associated mutations included CDKN2A, LRP1B, LRP2, TP53, and EGFR. EGFR amplification was commonly detected in patients with acquired icotinib/gefitinib resistance. DNMT3A and NOTCH4 mutations may be associated with the benefit of icotinib/gefitinib treatment.


Scientific Reports | (2020) 10:20243 | https://doi.org/10.1038/s41598-020-76791-y | www.nature.com/scientificreports/

history, age, and tumor stage. Considering the small number of SCLC and LASC cases, we excluded them from the correlation analysis. In LUSC, the proportion of patients with a smoking history was higher than that of patients who never smoked, while in LUAD the proportion of patients who never smoked was higher than that of patients with a smoking history. Statistical analysis showed that smoking history was correlated with tumor subtype (Fig. 1A). In addition, we found that the proportion of male patients was higher than that of female patients in LUSC, while the proportion of female patients was higher than that of male patients in LUAD. Statistical analysis showed that the sex of patients was correlated with tumor subtype (Fig. 1B). Meanwhile, we found that the proportion of stage IV tumors was high in LUAD, while the proportion of stage I tumors was high in LUSC. Statistical analysis showed a significant association between tumor stage and tumor subtype (Fig. 1C). In addition, we found that the majority of patients were near 60 years old across different tumor subtypes. Our results showed that there was no correlation between tumor subtype and patient age (Fig. 1D).
The co-occurrence and mutual exclusion of gene mutations can influence prognosis. For this reason, we performed a co-mutation analysis of this cohort. Our results showed that mutations in KMT2C, APC, CDKN2A, RB1, and EGFR co-occurred with TP53 mutations, while mutations in MDM2 and KRAS were mutually exclusive […]

Mutational landscape of 371 Chinese lung cancer patients. The X-axis represents each patient tissue sample and the Y-axis represents each mutated gene. The bar graph above shows the tumor mutational burden (TMB) value of each sample, and the bar graph on the right shows the mutation frequency of each mutated gene. Statistical distribution of variation types is shown in the right column. Green represents substitution/indel, red represents gene amplification, blue represents gene homozygous deletion, yellow represents fusion/rearrangement, and purple represents truncation mutations.

Differences between lung adenocarcinoma and lung squamous cell carcinoma. In this study, NSCLC represented nearly 90% of cases and mainly consisted of LUAD and LUSC. As shown in Fig. 4, there were many differences in the molecular characteristics of LUAD and LUSC. The most commonly mutated genes were EGFR, TP53, and KRAS in LUAD, and TP53, PIK3CA, CDKN2A, EGFR, CCND1, NFE2L2, FAM135B, and FGFR1 in LUSC (Fig. 4A,B). In LUAD, the main mutation type of PIK3CA was SNV; in LUSC, it was mainly gene amplification. ALK fusion; RBM10 truncation; MET, TERT, NKX2-1, SDHA, CDK4, and MDM2 amplifications; and CDKN2A deletion were mainly identified in LUAD (Fig. 4A). CCND1, FGFR1, FGF3/4/13, and SOX2 amplifications were mainly identified in LUSC (Fig. 4B). Statistical analysis showed that the mutation frequency of EGFR was higher in LUAD than in LUSC (P = 0.0015), while the mutation frequencies of TP53 (P = 0.064), PIK3CA (P = 0.00014), NFE2L2 (P = 0.0038), KMT2D (P = 0.0066), FGFR1 (P = 0.023), CCND1 (P = 0.033), and CDKN2A (P = 0.035) were higher in LUSC than in LUAD (Fig. 5).
Age-related gene mutations in Chinese lung cancer patients. We examined the correlation between patient age and gene mutations. The results showed that patients with mutations in ALK, ERBB2, or KMT2D were younger than those without these mutations, while patients with mutations in RBM10, NRAS, NF1, PIK3CA, MET, PBRM1, LRP2, NFE2L2, or CDKN2B were older than those without these mutations. Statistical analysis showed that mutations in these genes were significantly associated with patient age (Fig. 6).
Correlations between mutated genes and smoking status, tumor stage, and sex in Chinese lung cancer patients. Based on the smoking status data, we analyzed the correlation between mutated genes and the smoking status of patients. The most commonly mutated genes in smokers included TP53, EGFR, KRAS, CDKN2A, LRP1B, ALK, BCL2L11, KEAP1, KMT2C, PIK3CA, and STK11. The most commonly mutated genes in non-smokers were EGFR, TP53, RB1, SDHA, RBM10, and TERT (Table S2). TP53 and EGFR mutations frequently occurred in both smokers and non-smokers. Based on statistical analysis, the frequency of EGFR mutations was significantly higher in non-smokers than in smokers (Fig. 7A). The mutation frequencies of CDKN2A, FAT1, FGFR1, NFE2L2, CCNE1, CCND1, SMARCA4, KEAP1, KMT2C, and STK11 were significantly higher in smokers than in non-smokers (Fig. 7A).
According to the information on tumor stage, we combined 80 cases of stage I and II into a group, and 230 cases of stage III and IV into another group. Statistical analysis showed that TP53 and RB1 mutation frequencies were significantly higher in cases with tumor stages III and IV than in those with tumor stages I and II, while the mutation frequencies of TAF1, LRP1B, SDHA, CBFB, BRIP1, and SMAD4 were significantly higher in cases with tumor stages I and II than in those with tumor stages III and IV (Fig. 7B).

Figure 3.
Co-occurrence analysis of Chinese lung cancer patients. Green represents co-occurring mutations; pink represents mutually exclusive mutations. ▪ P < 0.05, *P < 0.01.
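As an illustration of how the direction of a pairwise co-mutation relationship can be read from a cohort, the following minimal sketch tabulates two genes across samples and computes an odds ratio. The sample sets are hypothetical toy data, not data from this study, and the helper names (`pair_table`, `odds_ratio`, `tendency`) and the 0.5 continuity correction are assumptions made for this sketch; the text does not state which statistic or test was used for Fig. 3.

```python
from itertools import combinations


def pair_table(samples, gene_a, gene_b):
    """Return (both, a_only, b_only, neither) counts for two genes."""
    both = a_only = b_only = neither = 0
    for mutated in samples:
        has_a, has_b = gene_a in mutated, gene_b in mutated
        if has_a and has_b:
            both += 1
        elif has_a:
            a_only += 1
        elif has_b:
            b_only += 1
        else:
            neither += 1
    return both, a_only, b_only, neither


def odds_ratio(both, a_only, b_only, neither):
    """Odds ratio with 0.5 added to each cell so empty cells do not divide by zero."""
    return ((both + 0.5) * (neither + 0.5)) / ((a_only + 0.5) * (b_only + 0.5))


def tendency(or_value):
    """Label the direction of association only; this is not a significance test."""
    if or_value > 1:
        return "co-occurrence"
    if or_value < 1:
        return "mutual exclusivity"
    return "no tendency"


# Hypothetical toy cohort: each set lists the mutated genes in one sample.
toy = [
    {"TP53", "RB1"}, {"TP53", "RB1"}, {"TP53", "RB1"},
    {"EGFR"}, {"EGFR"}, {"KRAS"}, {"KRAS"},
    {"EGFR", "TP53"}, set(), set(),
]

for gene_a, gene_b in combinations(["TP53", "RB1", "EGFR", "KRAS"], 2):
    table = pair_table(toy, gene_a, gene_b)
    print(gene_a, gene_b, table, tendency(odds_ratio(*table)))
```

An odds ratio above 1 points toward co-occurrence and one below 1 toward mutual exclusivity; a significance test on the same 2x2 table would be needed before marking a pair as in the figure.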

Correlations between tumor mutation burden and clinical characteristics and mutated genes.
We measured the available TMB in 216 cases to explore the relationship between TMB and clinical characteristics, and between TMB and clinically relevant mutations. The median TMB was 5.0 muts/Mb (range, 0-55.7 muts/Mb) (Table 1). TMB-H was identified in 49 cases (22.7%, 49/216) and TMB-L in 167 cases (77.3%, 167/216). The median TMB in LUSC was higher than that in LUAD. Statistical analysis showed a significant association between TMB and tumor subtype (Fig. 8A). According to the age distribution of patients, we found that the TMB value increased gradually with age. Statistical analysis showed a positive correlation between age and TMB value in lung cancer (Fig. 8B). In this cohort, we also found that the median TMB was higher in males than in females (7 mutations/Mb vs 4.3 mutations/Mb). The median TMB of smoking and non-smoking patients was 8.5 mutations/Mb and 3.8 mutations/Mb, respectively. Statistical analysis showed a significant association between TMB and the sex and smoking status of patients (Fig. 8C,D). Based on clinical relevance, we also analyzed TMB-related mutations. For each tested gene, patients were divided into mutant and wild-type groups. Statistical analysis showed that mutations in CDKN2A, LRP1B, LRP2, TP53, and EGFR were significantly associated with TMB. Among these five genes, mutations in CDKN2A, LRP1B, LRP2, and TP53 were associated with high TMB, while EGFR mutations were associated with low TMB (Fig. 9).
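The mutant versus wild-type comparison of TMB described above can be sketched as follows. The text does not name the statistical test that was used, so this example substitutes a two-sided permutation test on the difference in median TMB as one possible choice. The patient identifiers, TMB values, and mutation calls are hypothetical, and `permutation_p` is a helper written for this sketch.

```python
import random
from statistics import median


def permutation_p(mutant, wild_type, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in median TMB."""
    rng = random.Random(seed)
    observed = abs(median(mutant) - median(wild_type))
    pooled = list(mutant) + list(wild_type)
    k = len(mutant)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        if abs(median(pooled[:k]) - median(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical TMB values in muts/Mb; these are NOT data from the study.
tmb = {
    "P01": 9.5, "P02": 10.2, "P03": 12.4, "P04": 15.0, "P05": 8.7, "P06": 20.1,
    "P07": 1.0, "P08": 2.1, "P09": 3.2, "P10": 4.3, "P11": 5.0, "P12": 3.8,
}
# Hypothetical set of patients carrying a mutation in the gene of interest.
mutated = {"P01", "P02", "P03", "P04", "P05", "P06"}

mutant = [value for patient, value in tmb.items() if patient in mutated]
wild_type = [value for patient, value in tmb.items() if patient not in mutated]

p_value = permutation_p(mutant, wild_type)
print(median(mutant), median(wild_type), p_value)
```

Repeating this per gene yields one p-value per tested gene; with many genes, a multiple-testing correction would normally be considered.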
Characterization of EGFR mutations in patients resistant to icotinib/gefitinib. In this cohort, 203 patients harbored EGFR mutations, with 77 patients receiving EGFR-TKI treatment. Of the patients who received this treatment, we followed up 29 patients who were treated with icotinib (375 mg/day) or gefitinib (250 mg/day). Among them, 22 patients developed disease progression within 6 months and were considered drug resistant, while 7 benefited from the therapy for more than 6 months and were considered drug sensitive. A total of 55 EGFR alterations were detected in these 29 patients, including 8 L858R, 16 T790M, 19 19del, and 12 EGFR amplifications. Among the drug-resistant patients, 12 had the T790M mutation. The patients who did not have this mutation included 2 patients with the L858R mutation (one of whom harbored ERBB2 amplification), 2 patients with EGFR amplification, 4 patients with the 19del mutation (one of whom harbored ERBB2 amplification), and 2 patients with both the 19del mutation and EGFR amplification. Among the drug-sensitive patients, 4 had the T790M mutation and 3 did not (2 carried 19del and 1 carried the L858R mutation) (Table 2).
In addition to the T790M mutation, we found that the proportion of EGFR amplification in patients with drug resistance was higher than that in patients with drug sensitivity (40% vs 0%). We also analyzed mutations other than EGFR in the followed-up patients. We found that the frequencies of DNMT3A and NOTCH4 mutations were lower in the icotinib/gefitinib-resistant patients than in the drug-sensitive patients (0% vs 28.6%, P = 0.052, for both) (Fig. 10).
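The reported comparison of 0% versus 28.6% can be checked with a small worked example. The sketch below assumes that 28.6% corresponds to 2 of the 7 drug-sensitive patients and 0% to 0 of the 22 drug-resistant patients, and it assumes a two-sided Fisher's exact test; the text does not state which test produced P = 0.052, so this is only one plausible reconstruction. The function `fisher_exact_two_sided` is written for this sketch using the hypergeometric distribution.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x first-column counts in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: resistant, sensitive. Columns: mutated, wild type.
# Assumed counts: 0/22 resistant and 2/7 sensitive patients carry the mutation.
p_value = fisher_exact_two_sided(0, 22, 2, 5)
print(round(p_value, 3))  # 0.052
```

Under these assumed counts the exact p-value is 21/406, which rounds to the reported 0.052 and sits just above the conventional 0.05 threshold.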

Discussion
Lung cancer, which is multi-factorial and has various histological subtypes, is one of the most dangerous malignant tumors to human health and life. In recent years, the incidence and mortality rates of lung cancer have increased significantly in many countries 1,30 , with the incidence in women increasing annually 31 . In addition, men are more likely to develop LUSC, while women are more likely to develop LUAD 32 . Regarding risk factors, smoking is one of the most important for lung cancer. Smoking has been reported to be significantly associated with LUSC 33 . Our results also supported that smoking was significantly associated with LUSC and LASC.
A total of 371 lung cancer patients (187 males and 184 females) were included in this study. Most of them were LUAD patients, and the proportion of different sexes in this group was similar. However, in LUSC patients, the proportion of males was higher than females. This might be due to the high proportion of smokers among male patients. Moreover, the incidence rate of lung cancer increases with age 34 . However, the median age of patients in this study was approximately 60 years and there was no significant difference in age distribution among different cancer subtypes, indicating that there was no correlation between tumor subtypes and patient age.
The continuous development of NGS technology facilitates the analysis of the landscape of cancer mutations. LUAD and LUSC are the two major subtypes of lung cancer and previous studies have shown that they have different molecular characteristics. The most common mutations in LUAD were TP53, KRAS, KEAP1, STK11, EGFR, NF1, and BRAF 4 ; in LUSC, they were TP53, MLL2 (KMT2D), CDKN2A, PIK3CA, KEAP1 and NFE2L2 5 . […] 36 . Similarly, we found co-mutations of EGFR, TP53, and […] 37 . Mutually exclusive mutations of EGFR and KRAS may imply a potential opportunity to benefit from TKI therapy. However, there was no co-mutation of KRAS with STK11 and KEAP1. The inactivation of TP53 and RB1 is a molecular characteristic of SCLC 38 . In this study, 8 out of 11 SCLC patients harbored a co-mutation of TP53 and RB1. All these results support the previously reported molecular features of lung cancer. Furthermore, we found significantly different mutational frequencies of NFE2L2 and KMT2D. NFE2L2 is an important gene involved in the regulation of the cell response to oxidative damage and chemotherapy 5 . A previous report suggested that the NFE2L2 mutation may be a biomarker for the special treatment of LUSC. Another study reported that the KMT2D mutation correlates with poor prognosis in NSCLC 39 . In this way, the high frequency of KMT2D mutations may indicate a poor prognosis of LUSC. However, the small number of LUSC samples is a limitation of this study and larger cohorts are needed to elucidate this association.
Age is an important factor for lung cancer and is often considered when selecting treatment 40 . With the increasing proportion of young lung cancer patients 41 , more attention has been devoted to their diagnosis. Sacher et al. focused on the GAs of young lung cancer patients and identified that mutations in EGFR, ALK, and ERBB2 tend to occur in younger NSCLC patients 42 . According to different age groups, Jiang et al. reported that mutations in EGFR and TP53 were associated with age in Chinese NSCLC patients 43 . In contrast to that study, we did not detect an association between age and TP53 or EGFR mutations. However, we detected a correlation between age and ALK and ERBB2 mutations, similar to the results of Sacher et al. 42 , which supports the reliability of our analysis. Furthermore, we identified associations between younger patients and NRAS and KMT2D mutations, and between elderly patients and RBM10, NF1, PIK3CA, MET, PBRM1, LRP2, and CDKN2B mutations. These results contribute to the age-associated gene alteration data in lung cancer.
The mutational profile is different in smoking and nonsmoking patients. Mutations in EGFR, ROS1, and ALK mainly occur in nonsmoking patients, while mutations in KRAS, TP53, BRAF, JAK2, and JAK3 mainly occur in […]

(C) Differences in mutational frequency of EGFR between male and female lung cancer patients. *P < 0.05, **P < 0.01, and ***P < 0.001.

[…] This discrepancy may be caused by regional differences among populations. Previous studies have shown that female patients have a lower risk of cancer progression than male patients 45,46 . Sex-related biomarkers could indicate specific treatment options. Similar to previous studies, we found that the mutational frequency of EGFR was significantly higher in female patients than in male patients 4 . Tumor, lymph node and metastasis (TNM) staging is often used in treatment decisions and prognosis prediction of lung cancer patients 47 . For NSCLC, stages I-II are considered early stages and are normally treated with surgery, while stages III-IV are advanced stages and are normally treated with concurrent chemoradiotherapy 48 . Our results showed a correlation between TP53 and RB1 mutations and tumor stages III-IV. TP53 and RB1 are important regulators of cell cycle progression. TP53 is the most commonly mutated gene in human cancers, and both TP53 and RB1 mutations are reported to be associated with poor prognosis of lung cancer patients 49-52 . Our results indicate that these mutations may predict the prognosis of Chinese lung cancer patients.

Moreover, we showed a significant association between early tumor stage and mutations in TAF1, LRP1B, SDHA, CBFB, BRIP1, and SMAD4. Chen et al. reported that LRP1B mutation was associated with better survival in NSCLC patients treated with immunotherapy 53 . Additionally, SMAD4 expression is associated with survival of patients with lung and pancreatic cancers 54 , while SDHA is considered a tumor suppressor gene of paraganglioma 55 . In this study, we report for the first time a correlation between SDHA mutation and tumor stage, indicating its potential predictive value. Although the correlation between TAF1, BRIP1, and CBFB mutations and prognosis has been reported, only have been reported in lung cancer 56-58 . Our results suggest that these genes may be related to prognosis in Chinese lung cancer patients. However, studies with a longer follow-up period are needed to elucidate this relationship.

TMB is a new biomarker that may further guide the selection of checkpoint inhibitors (CPI) for patients 59 . A certain correlation between TMB and clinical characteristics has been reported. Wang et al. reported that the predictive power of TMB in lung cancer immunotherapy response was significantly better for women than for men 60 . In addition, it has been reported that increased TMB is associated with increased age in many types of […] results showed a significant association between TMB and tumor subtype. However, we also detected a correlation between smoking status and tumor subtype, indicating that the association between tumor subtype and TMB may be caused by the smoking status. Moreover, TMB is associated with known DNA mismatch repair pathway genes (MSH2, MSH6, MLH1, and PMS2) and DNA polymerases (POLE) 61 . In this study, we did not detect a correlation between TMB and these genes.
However, statistical analysis showed a significant association between TMB and mutations in EGFR, TP53, LRP1B, LRP2, and CDKN2A, suggesting potential biomarkers for the prognosis of Chinese lung cancer patients. In particular, TP53 mutation status may be a useful biomarker for predicting the response to immunotherapy in different cancer types 65 . Owada-Ozaki et al. reported that high TMB is associated with poor prognosis in NSCLC 66 . These studies support our results.
EGFR-mutated lung cancer is a distinct molecular subgroup of lung cancer in which most patients benefit from treatment with EGFR-TKIs 67 . The clinical course of EGFR-mutant lung cancer is significantly heterogeneous, and acquisition of the EGFR T790M mutation is the most frequent reason for resistance to first- and second-generation EGFR-TKIs 68 . Activation of survival pathways through other receptor tyrosine kinases or alternative downstream components, such as MET amplification, ERBB2 amplification, and IGF1R activation, is the main EGFR-independent reason for EGFR-TKI resistance 69,70 .
Besides 19del and L858R, EGFR amplification also frequently occurs in lung cancer. Recently, Chen et al. reported that an EGFR-amplified cervical squamous cell carcinoma patient benefited from afatinib therapy 71 . EGFR amplification was also reported to be associated with better OS, PFS, CR, and PR in LUAD patients treated with erlotinib 72 . In this study, 29 patients were treated with icotinib/gefitinib. In addition to the T790M mutation or ERBB2 amplification, we found a higher proportion of EGFR amplification in EGFR-TKI-resistant patients than in EGFR-TKI-sensitive patients. It is important to consider whether EGFR amplification is associated with the rapid development of icotinib/gefitinib resistance.
However, 6 patients with 19del or L858R mutations also rapidly developed EGFR-TKI resistance. We found a high mutational proportion of DNMT3A and NOTCH4 in these patients. DNMT3A plays an important role in methylation status. The Notch signaling pathway has an important regulatory role in a variety of tumor stem cells. Mutations in DNMT3A and NOTCH4 have been reported to be associated with better prognosis in patients with LUAD and NSCLC, respectively 73,74 . Similarly, the EGFR-TKI sensitivity of patients with these mutations indicates a good prognosis. This suggests that DNMT3A and NOTCH4 mutations may be potential biomarkers to predict sensitivity to EGFR-TKIs.
In conclusion, we identified the comprehensive genomic features of 371 Chinese lung cancer patients and found that sex and smoking status were significantly associated with lung cancer subtype. Furthermore, we detected that certain gene mutations were associated with age, smoking status, tumor stage, and TMB value. We also suggested a series of biomarkers for potential therapy and prognosis, and indicated that EGFR amplification, DNMT3A mutation, and NOTCH4 mutation may be used to predict EGFR-TKI resistance. Together, our research contributes to the comprehensive understanding of lung cancer molecular features and provides evidence for the development and application of precise therapeutic strategies for Chinese lung cancer patients.

Data availability
The datasets used and analyzed in this study are available from the corresponding author upon reasonable request.

License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.