Actionable Mutation Profiles of Non-Small Cell Lung Cancer patients from Vietnamese population

Comprehensive profiling of actionable mutations in non-small cell lung cancer (NSCLC) is vital to guide targeted therapy, thereby improving the survival rate of patients. Despite the high incidence and mortality rate of NSCLC in Vietnam, the actionable mutation profiles of Vietnamese patients have not been thoroughly examined. Here, we employed massively parallel sequencing to identify alterations in major driver genes (EGFR, KRAS, NRAS, BRAF, ALK and ROS1) in 350 Vietnamese NSCLC patients. We showed that the Vietnamese NSCLC patients exhibited mutations most frequently in EGFR (35.4%) and KRAS (22.6%), followed by ALK (6.6%), ROS1 (3.1%), BRAF (2.3%) and NRAS (0.6%). Interestingly, the cohort of Vietnamese patients with advanced adenocarcinoma had a higher prevalence of EGFR mutations than the Caucasian MSK-IMPACT cohort. Compared to the East Asian cohort, it had a lower EGFR but higher KRAS mutation prevalence. We found that KRAS mutations were more commonly detected in male patients, while EGFR mutations were more frequently found in female patients. Moreover, younger patients (<61 years) had a higher prevalence of genetic rearrangements in ALK or ROS1. In conclusion, our study revealed the mutation profiles of 6 driver genes in the largest cohort of NSCLC patients in Vietnam to date, highlighting significant differences in mutation prevalence compared with other cohorts.

Lung cancer is the most common malignancy and the leading cause of cancer-related deaths worldwide (18.4% of total cancer deaths), with non-small cell lung cancer (NSCLC) being the most common subtype, accounting for approximately 85% of all diagnosed cases 1,2 . The majority of NSCLC patients present with advanced disease at diagnosis and thus have a poor prognosis 2,3 . It is well established that acquired genetic alterations in certain driver genes result in tumour growth and invasiveness, and that patients harboring certain mutations may benefit from targeted therapies 4,5 . Indeed, a randomized clinical trial reported that advanced NSCLC patients harboring activating mutations in EGFR, one of the major driver genes of NSCLC, exhibited a longer progression-free period when treated with a tyrosine kinase inhibitor (TKI), gefitinib, compared to those treated with standard platinum-based chemotherapy 6 . However, patients treated with TKI drugs can acquire secondary resistance mutations, in which case a new treatment regimen is needed to maintain the therapeutic effect 7,8 . In addition to EGFR, NSCLC patients carrying ALK or ROS1 rearrangement were shown to respond well to a different TKI drug,

Results
Clinical characteristics of Vietnamese NSCLC patients. The cohort in this study comprised 350 patients diagnosed with NSCLC by clinical histology at 4 hospitals in Vietnam, with a higher percentage of male than female patients (60.6% versus 37.4%, p < 0.05) and a median age of 61 years, ranging from 24 to 89 years (Table 1). The majority of patients (289 cases, 82.6%) were classified into advanced stages (III-IV), while 12 patients (3.4%) were in early stages (I-II) and 49 cases (14%) were missing information on clinical stage (Table 1). Adenocarcinoma (AC) was the most common NSCLC subtype (241 cases, 68.9%), while squamous cell carcinoma (SCC) was confirmed in 25 patients (7.1%). Additionally, 84 cases (24%) were of either unknown or uncharacterized subtypes (Table 1).

Mutation profiles of driver genes in Vietnamese NSCLC patients.
In the present study, we developed a targeted capture sequencing assay to analyse genetic alterations in formalin-fixed paraffin-embedded (FFPE) tissue biopsy specimens of NSCLC patients. We first validated our assay by comparing its performance with a commercial droplet digital PCR (ddPCR) assay (Bio-Rad) for detecting three major EGFR mutations (L858R, del19 and T790M) in 40 tissue samples randomly selected from our cohort. When ddPCR results were used as the reference standard, our targeted capture sequencing assay exhibited a sensitivity of 81.8% (11/13), a specificity of 100% (27/27) and a concordance rate of 95% (38/40) (Table 2, Table S2). The two cases (LBL021 and L10021) that were positive for the del19 mutation by ddPCR but missed by our assay had relatively low variant allele frequencies (VAF) of 0.5% and 3.9%, respectively, below the limit of detection of our assay (4%) (Table S2). Hence, these results confirmed that our targeted capture sequencing assay achieved precise identification of mutations with VAF >4% in FFPE tissue samples. Therefore, this assay was subsequently used to identify genetic alterations in six major driver genes of NSCLC, namely KRAS, EGFR, NRAS, BRAF, ALK and ROS1, in the cohort of 350 NSCLC patients. Among the 350 patients successfully sequenced, 232 (66.3%) cases were found to carry at least one clinically relevant genetic alteration (according to ClinVar) in the tested driver genes, while the remaining 118 cases (33.7%) were negative for these mutations (Fig. 1A). EGFR (32.3%) and KRAS (20%) were the most frequently mutated driver genes, followed by ALK (5.4%), ROS1 (2.9%), BRAF (1.1%) and NRAS (0.6%) (Fig. 1A). Although mutations in driver genes such as EGFR, KRAS and ALK were reported to be mutually exclusive in the majority of NSCLC patients 20 , we detected 14 cases (4%) carrying mutations in more than one driver gene (Fig. 1B).
Of those, the co-occurrence of mutations in EGFR and KRAS was the most common (6 cases), including one case carrying concurrent mutations in 3 driver genes (EGFR, KRAS and BRAF). EGFR mutation was also found in 3 cases with
Collectively, our data indicated that mutations in EGFR and KRAS are the two most common events, together occurring in more than half of all tested NSCLC patients in Vietnam, followed by less frequent mutations in the other tested genes.
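The limit-of-detection reasoning in the validation above can be made concrete with a small sketch. It computes the variant allele frequency (VAF) as variant-supporting reads over total reads and compares it with the 4% limit of detection stated in the text. The class name `VariantCall`, the function `is_reportable`, the read counts and the choice to treat a VAF exactly at the limit as reportable are illustrative assumptions, not part of the published assay.

```python
from dataclasses import dataclass

# 4% VAF: the assay's limit of detection as stated in the text.
LIMIT_OF_DETECTION = 0.04


@dataclass
class VariantCall:
    """Hypothetical container for one called variant (names are illustrative)."""
    sample_id: str
    gene: str
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # all reads covering the position

    @property
    def vaf(self) -> float:
        """Variant allele frequency = variant-supporting reads / total reads."""
        if self.total_reads <= 0:
            raise ValueError("total_reads must be positive")
        return self.alt_reads / self.total_reads


def is_reportable(call: VariantCall, lod: float = LIMIT_OF_DETECTION) -> bool:
    """Return True when the VAF reaches the limit of detection.

    Assumption: a VAF exactly equal to the limit counts as reportable.
    """
    return call.vaf >= lod


# Hypothetical read counts chosen to mirror VAFs of 0.5%, 3.9% and 12%.
calls = [
    VariantCall("sample_A", "EGFR", alt_reads=5, total_reads=1000),
    VariantCall("sample_B", "EGFR", alt_reads=39, total_reads=1000),
    VariantCall("sample_C", "EGFR", alt_reads=120, total_reads=1000),
]
reportable = [c.sample_id for c in calls if is_reportable(c)]
```

With these made-up counts only the third call passes the threshold, which is the same pattern as the two low-VAF del19 cases that ddPCR detected and the sequencing assay missed.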

Comparison of mutation profiles among three NSCLC cohorts.
To put the mutation profiles found in the Vietnamese cohort into a global context, we compared the prevalence of mutations in Vietnamese NSCLC patients with that found in two other cohorts: the Caucasian MSK-IMPACT cohort established by the Memorial Sloan Kettering (MSK) Cancer Center 21,22 and an East Asian (China) cohort retrieved from a study by Wang et al. 23 (Table 3). Since the majority of patients in the Vietnamese cohort were diagnosed with the AC subtype in advanced stages (III-IV), we selectively retrieved data from patients with comparable histology and tumour stage from the other two cohorts. After matching on histology and stage, the Vietnamese cohort had fewer female patients than the other two cohorts. Additionally, patients in this cohort were slightly younger than those in the MSK-IMPACT cohort (median age: 61 versus 64, p < 0.001, Table 3) but older than those in the East Asian cohort (median age: 61 versus 58, p < 0.001, Table 3).
The prevalence of mutations in EGFR was significantly higher in the Vietnamese cohort than in the MSK-IMPACT cohort (37.7% versus 29.1%, p < 0.05, Table 3 and Fig. 3) but markedly lower than in the East Asian cohort (37.7% versus 73.4%, p < 0.00001, Table 3 and Fig. 3). Interestingly, the prevalence of KRAS mutations in the Vietnamese cohort was comparable to that of the MSK-IMPACT cohort, while it was significantly higher than that of East Asians (21.4% versus 9.1%, p < 0.0001, Table 3, Fig. 3). Apart from KRAS and EGFR mutations, mutation frequencies of the remaining tested genes showed no significant differences between the Vietnamese cohort and the other two cohorts (Table 3, Fig. 3). In summary, the cohort of Vietnamese NSCLC patients showed specific characteristics that set it apart from the two other cohorts. It had a higher prevalence of EGFR mutations than the Caucasian MSK-IMPACT cohort but a lower prevalence than the East Asian cohort, while its KRAS mutation prevalence was higher than that of the East Asian cohort.

Correlation between mutation prevalence and clinicopathological features of Vietnamese NSCLC patients.
Previous studies have reported significant associations between the prevalence of driver mutations and patients' clinicopathological features 24-29 . However, the results are often inconsistent across different studies. Here, we examined such associations in the Vietnamese NSCLC cohort.
Gender. Gender status was available for 343 patients, including 131 female and 212 male patients (1:1.6 ratio); the 7 patients with unknown sex were excluded from the gender association analysis. We performed a Chi-squared (χ²) test to investigate the association between patients' gender and mutation prevalence. Consistent with previous studies, we found that EGFR mutations were more commonly detected in Vietnamese female patients than in male patients (48.1% versus 26.9%, p < 0.00001, Table 4). Conversely, KRAS mutation frequency was significantly higher in male than in female patients (30.7% versus 9.2%, p < 0.0001, Table 4). Other driver genes, including NRAS, BRAF, ALK and ROS1, did not show any significant correlation with gender.
Age. Patient age was available for 344 patients, ranging from 24 to 89 years old. When the median age (61 years) was used as a cutoff value, KRAS mutations were more frequently detected in elderly patients aged over 61 years than in younger individuals (25.3% versus 20.1%, p < 0.05, Table 4). In contrast, the younger group showed a higher prevalence of ALK (9.8% versus 3.5%, p < 0.05) and ROS1 (5.2% versus 1.2%, p < 0.05) mutations (Table 4). There was no significant correlation between patient age and mutation prevalence of the other driver genes, including EGFR, NRAS and BRAF (Table 4).
Smoking status. Among the 185 cases with known smoking status, we detected a statistically significant correlation between EGFR and KRAS mutation prevalence and smoking status (Table 4). Our data indicated that non-smokers showed a significantly higher frequency of EGFR mutation (51.9% versus 29.6%, p < 0.01) but a lower frequency of KRAS mutation (6.9% versus 29.6%, p < 0.001) than smokers.
Histology. Adenocarcinoma (AC) was diagnosed in 241 patients, making it the most common histological type (81%), while squamous cell carcinoma (SCC) was identified in 25 cases (8.4%). We detected a significant association between histological type and EGFR mutation, with a higher prevalence in the AC group than in the SCC group (35.7% versus 16%, p < 0.05) (Table 4).

Discussion
NSCLC is the most common type of lung cancer, with high rates of acquired somatic mutations 2 . Comprehensive profiling of clinically relevant mutations is of great importance in clinical practice for designing optimal targeted therapy as well as understanding drug resistance mechanisms 30,31 . Given the diversity of mutation constitution in different populations, the primary objective of our study was to examine the mutation profiles of major druggable driver genes in Vietnamese NSCLC patients. To this end, we performed targeted capture sequencing on 350 tumour tissue samples from Vietnamese patients with NSCLC and analyzed their genomic alterations in the six most common driver genes recommended by ASCO and the National Comprehensive Cancer Network for mutation testing in NSCLC patients 17 .
Our data showed that 66.3% of patients in the Vietnamese cohort harbour at least one alteration in the six tested driver genes. Our findings were consistent with our previous study, which used the same panel of genes and reported a similar mutation rate of 63.6% in a smaller cohort of 59 Vietnamese NSCLC patients 32 . The mutation profiles of Vietnamese NSCLC patients also exhibited certain common features of NSCLC patients previously reported 33,34 . Firstly, mutations in EGFR and KRAS were the most common, accounting for more than 50% of total cases, while mutations in ALK, BRAF, NRAS and ROS1 were rarer. This trend was also reported in a Chinese cohort by Zhuang et al. 33 and in Caucasian populations by Campbell et al. 34 . Secondly, the mutation sites within the driver genes identified in this cohort were similar to those reported in other populations, including these most common mutation sites: EGFR exon 19 deletion (del19) and exon 21 (L858R) 35,36 , KRAS exon 2 (G12C) 37 , BRAF V600E 38 and ALK-EML4 fusion 39 . The frequencies of ALK (5.4%) and ROS1 (2.9%) mutations determined in our study (Fig. 1) are comparable to previously published studies reporting frequencies of 5.05% and 1%-2% for ALK and ROS1, respectively 40,41 .

Figure 3. Comparison of driver gene mutation frequencies between the Vietnamese NSCLC cohort and the Caucasian and East Asian cohorts. The mutation frequency of each driver gene in the Vietnamese cohort was calculated among 220 patients with adenocarcinoma (AC) in late stages (III-IV), taking into account cases with co-mutation. For the Caucasian cohort, data were obtained from the MSK-IMPACT cohort (764 lung cancer cases with AC subtypes in metastatic stages (III-IV); Asian patients were excluded). For the East Asian cohort, data were retrieved from a recent report profiling a similar panel of driver mutations in a Chinese cohort of 361 patients with AC in late stages (III-IV). NT: mutations were not tested.
EGFR del19, EGFR L858R, BRAF V600E, ALK-EML4 and ROS1 fusion mutations combined (139 cases) accounted for 39.7% of cases in the Vietnamese cohort. These are known activating mutations that have been clinically proven to be sensitive to treatment with available TKI drugs 5,6,11,39 . Thus, our findings suggested that approximately 40% of Vietnamese patients would carry such mutations and therefore would benefit from available targeted drugs. Among the 6 patients currently known to be on TKI therapy (Table 1), 5 carried activating EGFR mutations (EGFR del19, EGFR L858R) and one was positive for both EGFR L858R and ALK-EML4 fusion.
In contrast, some patients with activating EGFR mutations were found to develop an acquired resistance mutation, T790M 42,43 . Consistent with previous studies, we found that 7 out of 8 T790M cases were also positive for either L858R or del19 mutations, and we suspected that these patients were under treatment with TKI drugs, although we could not obtain treatment data for these cases. In addition to T790M, ins20 mutations are also known as resistance mutations 44,45 . They were detected in 8 cases in our cohort, and most of them (6/8 cases) did not co-exist with either of the activating EGFR mutations L858R and del19, suggesting that ins20 mutations are likely primary inactivating mutations rather than acquired resistance mutations. Although a significant proportion of Vietnamese NSCLC patients were identified as carrying KRAS mutations, drugs directly targeting KRAS-mutated NSCLC are still under clinical evaluation 46 .
Although driver gene mutations in EGFR, KRAS, BRAF and ALK were initially reported to be mutually exclusive events in NSCLC patients 20,47 , we detected 14 cases (4%) harbouring concurrent alterations among the tested driver genes. These mutations might either coexist in the same tumour cell or belong to different tumour cell lines. The proportion of cases with co-mutations varied among studies. A recent study by Zhuang et al. 33 involving a cohort of 3774 Chinese NSCLC patients reported a lower co-mutation rate of 1.67% in 5 tested driver genes (EGFR, KRAS, ALK, ROS1 and BRAF), while 5% of patients in another cohort of 1,000 NSCLC patients at the NCI's Lung Cancer Mutation Consortium were reported to harbour concomitant driver gene mutations 48 . However, these studies together with our study consistently reported EGFR/KRAS (6/14 cases in our study) as the most common co-mutation event in NSCLC patients 33,48 . The identification of patients with such co-mutations is of clinical importance since these concurrent mutations represent a distinct subset of patients and may have a significant impact on treatment outcomes. In this regard, previous studies showed that patients carrying EGFR/ALK co-mutations varied in their sensitivity to EGFR and ALK TKIs and that the choice between these two classes of TKI drugs as first-line treatment for these patients is still being debated 49 . Hence, further studies are required to investigate the clinical activity and drug sensitivity of different co-mutation subsets in order to develop suitable treatment approaches.
To identify Vietnamese-specific mutation profiles in NSCLC patients, we selected patients with comparable histology (AC) and tumour stage (stage III-IV) from the East Asian cohort (China) 23 and the MSK-IMPACT cohort, which mainly consists of Caucasian patients 21,22 . The prevalence of EGFR mutations among Vietnamese NSCLC patients was markedly lower than in the East Asian cohort (37.7% versus 73.4%, p < 0.00001, Table 3) but significantly higher than in the MSK-IMPACT cohort (37.7% versus 29.1%, p < 0.05, Table 3), confirming previous reports that EGFR mutations are more prevalent in Asian patients than in Caucasian patients 50 . Of note, the percentage of Vietnamese patients with KRAS mutations, including those with concurrent mutations, was comparable to the MSK-IMPACT cohort but significantly higher than the published data for East Asians (21.4% versus 9.1%, p < 0.0001). Hence, our results demonstrated that NSCLC patients from the Vietnamese population exhibit a unique mutation constitution, suggesting that ethnic composition might contribute to the observed variation in mutation profiles. Furthermore, Nguyen et al. 32 reported a remarkably higher frequency of KRAS mutations in Vietnamese patients living in Vietnam than in those living in the USA (24.4% versus 4.5%), suggesting that geographic and socioeconomic disparities might also contribute to the variation in mutation frequencies in different cohorts. Although the clinical significance and mechanisms driving these variations are unclear, the unique mutation profiles should be taken into consideration when prioritizing research programs aiming to develop new treatment strategies for Vietnamese NSCLC patients.
KRAS mutations were detected in 30% of NSCLC patients who were non-responsive to TKI treatment 51 . Hence, the high prevalence of KRAS mutations in the Vietnamese NSCLC cohort might have a negative impact on clinical outcomes. However, the use of KRAS mutation status as a negative predictive marker of TKI therapy remains controversial due to inconsistent results obtained from subsequent meta-analysis studies 52-55 . Interestingly, KRAS mutations, particularly the most prevalent subtype G12C, when co-existing with PD-L1 expression in patients' tumours, were shown to be associated with poor prognosis 56 , supporting the potential benefit of KRAS mutation testing in selecting patients for immunotherapy using checkpoint inhibitors.
We further investigated correlations between mutation prevalence and patients' major clinical characteristics. Consistent with previous studies 25,50 , we observed that EGFR mutations were more prevalent in female patients, non-smokers and those with the AC histological subtype. Unlike EGFR mutations, KRAS mutations were more commonly detected in male patients and showed a significant correlation with patients' age, with a higher prevalence in older patients. It is possible that the high prevalence of KRAS mutations in Vietnamese male patients may be responsible for their higher mortality rate. Previous studies reported that KRAS mutations arise more frequently in smokers than in non-smokers 25,57 . Consistently, we observed such a correlation in our study, indicating that the high frequency of KRAS mutations in our cohort could be attributable to the high prevalence of smoking in the Vietnamese population. In addition, we found that ALK and ROS1 rearrangements were more common in younger patients than in elderly patients, which is consistent with previous studies 28,29 . Taken together, our data revealed several significant correlations between driver gene mutation prevalence and patients' clinical characteristics.
There are a few limitations in our study. Firstly, although the panel of driver genes used in this study was chosen based on ASCO guidelines 5 , we did not take into account mutations in other driver genes such as PIK3CA 58 , AKT1 59 and ERBB2 60 , which were previously reported to co-exist with the detected mutations and may have significant clinical impact. Secondly, when comparing the mutation profiles of the Vietnamese cohort with the MSK-IMPACT and East Asian cohorts, we selected patients with AC in late stages (stage III-IV) to exclude the confounding effects of histological subtype and tumour stage, which vary among the three cohorts. However, there were still differences in patients' age at diagnosis and gender ratio between the Vietnamese and the other two cohorts. These factors were identified to be significantly associated with EGFR and KRAS mutation prevalence and thus might have affected the analysis of the mutation profiles. Future large-scale studies are required to assess whether these confounders contribute to the variations in mutation profile between the Vietnamese population and other races.
In conclusion, our study revealed the mutation profiles of multiple driver genes in the largest cohort of NSCLC patients in Vietnam to date. Our data highlighted several subsets of Vietnamese NSCLC patients carrying specific mutations that would benefit from future studies to provide more suitable treatment options.

Material and Methods
Tumour tissues. We studied 350 formalin-fixed, paraffin-embedded tumour specimens from NSCLC patients treated at Pham Ngoc Thach hospital, Cho Ray hospital, Ha Noi Oncology hospital and Vietnam National cancer hospital. The tumour-rich areas of the tissues, containing at least 20% tumour cells as identified by hematoxylin and eosin staining, were micro-dissected. Written informed consent was obtained from all patients. Clinical characteristics of all patients are summarised in Table S1. This study was approved by the Ethics Committee of the University of Medicine and Pharmacy at Ho Chi Minh City, Vietnam (Ethic number: 027/DHYD-HD) and the Medical Genetics Institute. All methods were performed in accordance with the relevant guidelines and regulations.
DNA isolation. DNA
Massively parallel sequencing. DNA fragmentation and library preparation were performed using the NEBNext Ultra II FS DNA library prep kit (New England Biolabs, USA) following the manufacturer's instructions. DNA library concentrations were quantified with a QuantiFluor dsDNA system (Promega, USA). Equal amounts of libraries (150 ng per sample) were pooled together and hybridized with xGen Lockdown probes for the six targeted genes EGFR, KRAS, NRAS, BRAF, ALK and ROS1 (IDT DNA, USA). For ALK and ROS1, customized probes (Table S3) for intron regions were designed and mixed with probes for exon regions at equal concentration. Sequencing was run using NextSeq 500/550 High Output kits v2 (150 cycles) on an Illumina NextSeq 550 system (Illumina, USA) with a minimum target coverage of 100×. In cases where the mean coverage in the targeted regions was lower than 100×, extra sequencing was performed to increase the mean coverage to the expected range. The mean coverage in the target regions for all samples was approximately 129×.
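The coverage rule described above (extra sequencing whenever the mean coverage over the targeted regions falls below 100×) can be sketched as a simple check. The function names and the per-base depth values below are hypothetical; how depths were actually extracted from the alignments is not described in the text.

```python
from statistics import mean
from typing import Sequence

# Minimum mean target coverage stated in the text (100x).
MIN_MEAN_COVERAGE = 100.0


def mean_target_coverage(depths: Sequence[int]) -> float:
    """Mean read depth over the targeted positions (hypothetical input)."""
    if not depths:
        raise ValueError("depths must not be empty")
    return mean(depths)


def needs_extra_sequencing(depths: Sequence[int],
                           minimum: float = MIN_MEAN_COVERAGE) -> bool:
    """True when the mean target coverage is below the required minimum."""
    return mean_target_coverage(depths) < minimum


# Made-up per-base depths for two samples.
low_sample = [80, 90, 100]      # mean 90x, below the threshold
good_sample = [120, 130, 140]   # mean 130x, above the threshold
```

A sample such as `low_sample` would be flagged for an additional sequencing run, while `good_sample` would proceed directly to variant calling.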
Variant calling using Mutect2 and Factera. Each FFPE sample was barcoded with dual indexes in the P7 and P5 primers. Paired-end (PE) reads were generated by the bcl2fastq package (Illumina) and aligned to the human genome (hg38) using the BWA package 61 . Duplicate reads were marked using MarkDuplicates from Picard tools (http://broadinstitute.github.io/picard/). Somatic variants were called using the Mutect2 package 62 . A custom pipeline with calls to the BWA, Picard and Samtools packages was built to perform the above-mentioned analysis steps 63 . For detection of ALK and ROS1 rearrangements, fusion variant calling was performed using Factera v1.4.4 with default parameters 64 .
ddPCR method. A four-step ddPCR procedure was performed using reagents and equipment from Bio-Rad (unless otherwise stated) following the manufacturer's instructions 65 . Briefly, the PCR mix was first prepared by mixing 1× ddPCR Supermix for Probes, primers and probes (IDT DNA) and DNA template (0.8 or 1.6 ng). Next, 20 µl of the PCR mix was transferred into the Droplet Generator DG8™ Cartridge, followed by 70 µl of the Droplet Generation Oil, before placing it in a QX100™ Droplet Generator to generate droplets. Subsequently, the droplets were transferred to a 96-well plate, which was placed in a thermal cycler (C1000 Touch, Bio-Rad) for PCR amplification. The PCR thermal program was as follows: 95 °C for 10 min, then 40 successive cycles of amplification (94 °C for 30 sec; 55 °C for 60 sec) and 98 °C for 10 min. Lastly, the droplet reading was acquired by the QX 200 Droplet reader and analyzed using the QuantaSoft Software. Positive and negative droplets were assigned based on the fluorescence threshold, which was set as previously described by Deprez et al. 66 .
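The variant-calling steps described earlier in this section (alignment, duplicate marking, somatic calling) might be chained roughly as in the sketch below. This is not the authors' custom pipeline, which is not shown in the text. The function name `build_pipeline_commands`, all file names, the read-group string and the use of the `gatk` wrapper to run MarkDuplicates and Mutect2 in tumour-only mode are assumptions made for illustration. The function only builds the command lines and does not execute them.

```python
from pathlib import Path
from typing import List


def build_pipeline_commands(sample: str, fastq_r1: str, fastq_r2: str,
                            reference: str, outdir: str) -> List[List[str]]:
    """Build (but do not run) one possible command sequence per sample.

    All paths and the read-group layout are hypothetical. The reference
    FASTA is assumed to be indexed already (bwa index, .fai and .dict).
    """
    out = Path(outdir)
    sam = out / f"{sample}.sam"
    sorted_bam = out / f"{sample}.sorted.bam"
    dedup_bam = out / f"{sample}.dedup.bam"
    metrics = out / f"{sample}.dup_metrics.txt"
    vcf = out / f"{sample}.mutect2.vcf.gz"
    # bwa expects a literal backslash-t between read-group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. Align paired-end reads to the reference genome.
        ["bwa", "mem", "-R", read_group, "-o", str(sam),
         reference, fastq_r1, fastq_r2],
        # 2. Coordinate-sort the alignments.
        ["samtools", "sort", "-o", str(sorted_bam), str(sam)],
        # 3. Mark duplicate reads (Picard MarkDuplicates via the gatk wrapper).
        ["gatk", "MarkDuplicates", "-I", str(sorted_bam),
         "-O", str(dedup_bam), "-M", str(metrics)],
        # 4. Index the duplicate-marked BAM.
        ["samtools", "index", str(dedup_bam)],
        # 5. Call somatic variants in tumour-only mode.
        ["gatk", "Mutect2", "-R", reference, "-I", str(dedup_bam),
         "-O", str(vcf)],
    ]


commands = build_pipeline_commands(
    "sample_A", "sample_A_R1.fastq.gz", "sample_A_R2.fastq.gz",
    "hg38.fa", "results")
```

Each inner list could be passed to `subprocess.run(cmd, check=True)` in order. Fusion calling with Factera is left out of the sketch because its invocation is not described beyond "default parameters".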

Statistical analysis.
Pearson's chi-squared (χ²) test (sample size >5) or Fisher's exact test (sample size ≤5) was performed using the 'Social Science Statistics' website (http://www.socscistatistics.com) to assess the association between two categorical variables (Tables 3 and 4). Bonferroni correction was applied when multiple comparisons were performed (Table 3).
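For readers who want to reproduce this kind of analysis offline, the following sketch implements the three ingredients named above for a 2×2 table: Pearson's χ² test without continuity correction, a two-sided Fisher's exact test, and Bonferroni adjustment. It uses only the Python standard library. The function names are illustrative, the rule for switching tests is interpreted here as "any observed cell count of 5 or fewer", which is an assumption about the text, and the numerical output is not guaranteed to match the website used in the study.

```python
import math
from typing import List, Sequence, Tuple

Table = Tuple[Tuple[int, int], Tuple[int, int]]  # ((a, b), (c, d))


def chi2_test_2x2(table: Table) -> Tuple[float, float]:
    """Pearson chi-squared statistic and p-value (1 df, no Yates correction)."""
    (a, b), (c, d) = table
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        raise ValueError("every row and column total must be positive")
    n = r1 + r2
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(table: Table) -> float:
    """Two-sided Fisher's exact p-value: sum of all tables with the same
    margins whose probability does not exceed that of the observed table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x: int) -> float:
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def association_test(table: Table) -> Tuple[str, float]:
    """Pick the test by the smallest observed cell count (assumed rule)."""
    (a, b), (c, d) = table
    if min(a, b, c, d) <= 5:
        return "fisher", fisher_exact_2x2(table)
    return "chi2", chi2_test_2x2(table)[1]


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: rows are two patient groups, columns are
# mutation-positive and mutation-negative cases.
test_name, p_value = association_test(((10, 20), (30, 40)))
```

A common alternative is to choose Fisher's exact test based on expected rather than observed cell counts; the observed-count rule is used here only because it is the closest reading of the description above.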