Mutational profile of Brazilian lung adenocarcinoma unveils association of EGFR mutations with high Asian ancestry and independent prognostic role of KRAS mutations

Lung cancer is the deadliest cancer worldwide. The mutational frequency of the EGFR and KRAS genes in lung adenocarcinoma varies worldwide according to ethnicity and smoking status. The impact of EGFR and KRAS mutations in Brazilian lung cancer remains poorly explored. Thus, we investigated the frequency of EGFR and KRAS mutations in a large Brazilian series of lung adenocarcinoma together with patients’ genetic ancestry and clinicopathological and sociodemographic characteristics. The mutational frequency was 22.7% for EGFR and 20.4% for KRAS. The average ancestry proportions were 73.1% for EUR, 13.1% for AFR, 6.5% for AME and 7.3% for ASN. EGFR mutations were independently associated with never-smoker status, high Asian ancestry, and better performance status. KRAS mutations were independently associated with tobacco exposure and non-Asian ancestry. EGFR exon 20 mutations were associated with worse outcome. The Cox regression model indicated a worse outcome for patients who were older at diagnosis (>61 y) or who had a solid histological subtype, weight loss (>10%), worse performance status (≥2), KRAS mutations, or EGFR mutations not treated with TKi. In conclusion, we assessed the clinicopathological and ethnic impact of EGFR and KRAS mutations in the largest reported series of Brazilian lung adenocarcinomas. These findings can support future clinical strategies for Brazilian lung cancer patients.

All EGFR and KRAS mutations were mutually exclusive.

Genetic ancestry component and its association with EGFR and KRAS mutations.
We further assessed the ancestry background using an AIM-INDEL panel, which allowed us to estimate the AFR, EUR, ASN and AME ancestral proportions in 427 out of 444 patients (Fig. 2). The average ancestry proportions for all individuals were 73.1% for EUR, 13.1% for AFR, 6.5% for AME and 7.3% for ASN (Fig. 2). Ancestry proportions were further categorized as low, intermediate and high based on tertiles (Supplementary Table 3). Most of our patients self-declared as white (Table 1), and likewise most of our cases presented a high EUR background (Fig. 2). We then correlated genetic ancestry with the molecular features. EGFR mutations were associated with high ASN ancestry (p = 0.03; Supplementary Table 4). In the multivariate analysis, a high ASN background was independently associated with the presence of EGFR mutations [OR = 2.01 (1.09-3.71); p = 0.03; Table 2]. Conversely, a low ASN background was independently associated with the presence of KRAS mutations [OR = 1.93 (1.06-3.52); p = 0.03; Table 2].
The remaining ancestry components, AFR, EUR, and AME, were not associated with the presence of EGFR or KRAS mutations (Supplementary Table 4).
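As a reading aid, the reported odds ratios and their intervals can be checked for internal consistency with the reported p-values. The sketch below assumes that the bracketed ranges are 95% Wald confidence intervals, symmetric on the log scale, which the text does not state explicitly; the function name `wald_from_ci` is ours. Under that assumption, both associations give a two-sided p-value close to the reported 0.03.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover SE of log(OR), the Wald z statistic and a two-sided p-value
    from an odds ratio and its (assumed) 95% Wald confidence interval."""
    # A Wald interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values as reported in the text (Table 2 of the paper).
reported = {
    "EGFR, high ASN": (2.01, 1.09, 3.71),
    "KRAS, low ASN": (1.93, 1.06, 3.52),
}

results = {label: wald_from_ci(*vals) for label, vals in reported.items()}
for label, (se, z, p) in results.items():
    print(f"{label}: SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

This is only a back-calculation from rounded figures, so small discrepancies with the published p-values would be expected.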
Concerning KRAS status, the presence of mutations was associated with smoking (p < 0.0001). Age, self-reported race, disease staging, metastasis at diagnosis, weight loss at diagnosis, and differentiation grade were not associated with the presence of KRAS mutations (Supplementary Table 4). A multivariate analysis identified the following variables as independently associated with the presence of KRAS mutations: tobacco exposure (current: OR = 3.42; p = 0.001; former: OR = 3.74; p < 0.0001; Table 2) and low Asian ancestry (OR = 1.93; p = 0.03; Table 2).
Since most patients were diagnosed at disease stage IV, and these patients exhibited a very distinct outcome from those at stages I, II and III (Supplementary Fig. 2), we analyzed only this group of patients in the multivariate analysis of disease outcome (Table 3). Unfavorable outcome was independently associated with age at diagnosis above 61 years (OR = 1.45; p = 0.01), solid histological subtype (OR = 1.91; p < 0.0001), increased weight loss (OR = 1.72; p = 0.006), and ECOG PS 2 and 3 or 4 (OR = 2.43 and OR = 6.28; p = 0.03 and p < 0.0001, respectively) (Table 3). Metastasis to the central nervous system at diagnosis was considered a risk factor for an unfavorable outcome (Table 3). Clinical outcome was not independently associated with self-reported race or alcohol consumption (Table 3).
The Cox regression analysis indicated that the presence of EGFR mutations in TKi non-treated patients was independently associated with unfavorable outcome (OR = 3.79; p = 0.001; median OS TKi non-treated = 6.7 months; median OS TKi-treated = 19.9 months) (Table 3 and Supplementary Fig. 2). Of note, among the EGFR-mutated patients who were not treated with TKi (n = 30), 15 received best supportive care (BSC), 9 received only chemotherapy, and 6 had localized disease and underwent local treatment.
In addition, the Cox regression analysis also showed that the presence of KRAS mutations was independently associated with unfavorable outcome (OR = 2.93; p < 0.0001; Table 3).
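The median overall survival figures quoted above are the kind of quantity read off a Kaplan-Meier curve. The following is a minimal sketch of that estimator and of how a median is taken from it. The follow-up times and event flags are hypothetical toy values, not data from this study, and the function names are ours.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  : follow-up in months
    events : 1 = death observed, 0 = censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored cases at t both leave the risk set afterwards.
        at_risk -= n_at_t
        i += n_at_t
    return curve

def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up data in months (NOT from the study).
times = [2, 4, 4, 6, 7, 9, 12, 15, 20, 24]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
print(curve)
print("median OS (months):", median_survival(curve))
```

Comparing two such curves (for example TKi-treated versus non-treated) would additionally require a log-rank test, which is not shown here.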

Discussion
Genetic testing is crucial for molecular-targeted therapies in NSCLC. In Brazil, fewer than half of cancer patients are tested for therapeutic targets, and the public health system does not cover the great majority of molecular tests for NSCLC 29. Conversely, at the Barretos Cancer Hospital, a non-profit cancer center where 100% of patients are from the public health system 30 [...] small number of cases. Nevertheless, the solid histological subtype had been previously associated with a worse prognosis 25,31.
Although the role of EGFR has been well established in the last few years, data on Brazilian populations remain limited. In the present study, the frequency of EGFR mutations was 22.7%. Previous findings from smaller Brazilian cohorts described frequencies of EGFR mutations ranging from 22% (27/125) to 30% (63/207) 25,26. The most recurrent EGFR mutation in the present adenocarcinoma series was a deletion in exon 19, followed by a substitution in exon 21 (p.L858R), similar to what was previously reported in Brazilian patients 25. These mutations are known to be sensitive to TKi. Interestingly, in the present study, the EGFR-mutated patients who were not treated with TKi, due to poor PS at diagnosis or death before receiving the result of the molecular test, presented a worse outcome than the TKi-treated EGFR-mutated patients, supporting the clinical benefit of TKi in EGFR-mutated patients. As expected, the most recurrent EGFR resistance mutation was p.Tyr790Met, and further EGFR resistance mutations were mostly located in exon 20 32. Accordingly, EGFR-mutated patients harboring mutations located in exon 20 presented lower overall survival and a compromised response to TKi.
Several commercial assays are available for EGFR testing, such as Cobas (Roche) and Therascreen (Qiagen), among others, which are real-time PCR based and designed to detect the major mutations reported in the literature. In this context, we can hypothesize that approximately 13% of the mutations identified in our series, mainly located in exons 20 and 21, would not be detected by these widely used commercial assays. Thus, these results emphasize the importance of knowing the mutational profile of each population to better guide the methodology used in routine practice.
We next interrogated the impact of EGFR status on patients' clinicopathological features. In a multivariate analysis, we observed that EGFR-mutated cases were associated with never-smoker status, better PS, and higher Asian ancestry, and marginally with female gender. These results are in accordance with the literature 16,26,33-35. Interestingly, the association of EGFR mutations with higher Asian ancestry in our Brazilian cases reflects the admixture of Asian background in this population, probably due to the Japanese/Korean/Chinese immigration wave of the 1940s.
We observed KRAS mutations in 20.4% of lung adenocarcinomas. This frequency is in accordance with the international literature, in which it varies from 15% to 33% of cases 21,33,34,36. Likewise, our results are in line with the two previous Brazilian reports, which showed 15% (30/207) and 26% (33/115) of KRAS mutations in lung cancer 25,26. As previously reported 26, we found that KRAS mutations were more frequent in patients who reported tobacco exposure. We also observed an independent association of KRAS mutation with a lower Asian background. Importantly, in our series, KRAS mutation was an independent factor for unfavorable outcome, supporting the prognostic value of KRAS mutations. The prognostic role of KRAS in lung cancer is not consensual, with divergent reports 37,38. Recently, it was reported that KRAS mutation-induced upregulation of PD-L1, through p-ERK, mediates immune escape in lung adenocarcinoma and induces apoptosis of CD3-positive T cells, effects that were reversed by anti-PD-L1 or ERK inhibition 39. In addition, it was reported that specific KRAS mutations could affect the immune microenvironment of lung adenocarcinoma patients and thereby the efficacy of immune checkpoint inhibitors, implying that stratification of patients for immunotherapy should be tailored to the specific mutant KRAS variant and the tumor microenvironment 24. Thus, following the advent of immunotherapy, KRAS mutations have gained new relevance, with promising clinical utility.
The present study has some limitations, mainly due to its retrospective nature. Patients were not treated uniformly, which hampers proper outcome analysis, and only patients diagnosed at stage IV were included in the survival analysis, since patients diagnosed at stages I, II, and III presented distinct outcomes compared with stage IV.
In conclusion, this is the largest study assessing EGFR and KRAS mutation status in the Brazilian lung adenocarcinoma population. EGFR mutation was associated with Asian ancestry background, confirming the known geographic disparities. In our series, KRAS mutation was an independent prognostic factor. Overall, these data provide important information about the role of some of the most important driver genes and about tailored treatment for lung adenocarcinoma in the Brazilian population.

Materials and Methods. Study population and design.
This is a retrospective study conducted at the Center for Molecular Diagnosis, with patients diagnosed at Barretos Cancer Hospital from 2011 to 2014. Overall, 496 NSCLC cases that underwent surgical resection or core biopsy were histopathologically re-evaluated. Of these, 52 cases with non-adenocarcinoma histology were excluded from further analysis. A subset of this series was previously published and tested for ALK translocations 40. The major clinicopathological features of the 444 lung adenocarcinomas are summarized in Table 1. Overall, 232 patients were male (52%) and 212 female (48%), with an average age at diagnosis of 61 years (22-87 years). Most of the patients self-reported as white (79%), were current or former smokers (77%) and were not alcohol consumers (61%). Most patients were diagnosed at stage IV (74%), and among these patients most presented metastasis at multiple sites (61%). ECOG PS 1 was the most prevalent status at diagnosis, and weight loss was observed in half of the patients (50.6%). The most predominant histological subtype was acinar, followed by solid, papillary, lepidic and mucinous, but for about a quarter of the tumors (27%) it was not possible to determine the histological subtype (Table 1).
Because the present study is based on a retrospective series, patients were not treated uniformly. Detailed information about treatment regimens is described in the supplementary material (Supplementary Tables 6-9).
The present study was approved by the Barretos Cancer Hospital IRB (Project n°. 630/2012), which granted an exemption from informed consent due to the retrospective nature of the study and because most of the patients were deceased. All methods were performed in accordance with the relevant guidelines and regulations.
DNA isolation. Serial 10 μm unstained sections of FFPE blocks were cut for DNA isolation, and one hematoxylin and eosin-stained (H&E) section was taken for pathological evaluation and selection of the tumor area, as previously reported 41. Briefly, sections were heated at 80 °C, and serial washes with xylene and ethanol (100%, 70% and 50%) were performed for paraffin removal. Then, sections were macrodissected using a sterile needle and carefully collected into a microtube. Next, DNA was isolated from FFPE tissues using the QIAamp DNA Micro Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. DNA concentration and quality were evaluated by Nanodrop 2000 (Thermo Scientific, Wilmington, USA). DNA samples were diluted to a final concentration of 50 ng/μL and stored at −20 °C for further molecular analysis.
Mutational analysis for EGFR and KRAS hotspot regions. The hotspot regions of the EGFR (exons 18, 19, 20 and 21) and KRAS (exon 2, codons 12 and 13) genes were analyzed by PCR followed by direct sequencing, as previously described 9,42. Briefly, PCR was performed with 50 ng of DNA in a final volume of 15 µL, using 10 µM of both forward and reverse primers and 7.5 µL of the HotStart master mix (Qiagen, Hilden, Germany), following the manufacturer's instructions. PCR conditions were as follows: 96 °C for 15 minutes; 40 cycles of 96 °C for 45 seconds, 56.5 °C for 45 seconds and 72 °C for 45 seconds; and a final extension at 72 °C for 10 minutes.
Amplification was checked by agarose gel electrophoresis, and PCR products were purified by enzymatic reaction (ExoSAP-IT, ThermoFisher Scientific). Next, direct sequencing was carried out using the BigDye Terminator v3.1 Cycle Sequencing kit (ThermoFisher Scientific) with the following conditions: 97 °C for 3 minutes, then 28 cycles of 96 °C for 10 seconds, 50 °C for 5 seconds, and 60 °C for 4 minutes. Sequencing products were purified using BigDye XTerminator (ThermoFisher Scientific) and analyzed on a 3500 Genetic Analyzer, ABI capillary electrophoresis system (Applied Biosystems). Sequences were captured by the SeqScape software (Applied Biosystems) and manually compared to reference sequences collected from GenBank (EGFR: NG_007726.3; KRAS: NG_007524.1). All mutations were confirmed twice.
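For bench planning, the two thermal programs described above can be written down as data and their programmed hold times summed. This is a sketch only: the class and variable names are ours, ramp times between temperatures are ignored, and the reaction-volume line assumes that the DNA is pipetted directly from the 50 ng/µL working stock, which the text does not state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int

@dataclass(frozen=True)
class Program:
    initial: Step
    cycles: int
    cycle_steps: Tuple[Step, ...]
    final: Optional[Step] = None

    def hold_seconds(self) -> int:
        """Sum of programmed hold times, excluding ramping between steps."""
        total = self.initial.seconds
        total += self.cycles * sum(s.seconds for s in self.cycle_steps)
        if self.final is not None:
            total += self.final.seconds
        return total

# Amplification program as stated in the text.
pcr = Program(
    initial=Step(96, 15 * 60),
    cycles=40,
    cycle_steps=(Step(96, 45), Step(56.5, 45), Step(72, 45)),
    final=Step(72, 10 * 60),
)

# Cycle-sequencing program as stated in the text (no final extension given).
sequencing = Program(
    initial=Step(97, 3 * 60),
    cycles=28,
    cycle_steps=(Step(96, 10), Step(50, 5), Step(60, 4 * 60)),
)

# Reaction volumes (assumption: DNA taken from the 50 ng/uL working stock).
dna_ul = 50 / 50.0               # 50 ng of DNA at 50 ng/uL
master_mix_ul = 7.5
remaining_ul = 15.0 - master_mix_ul - dna_ul  # left for primers and water

print("PCR hold time (min):", pcr.hold_seconds() / 60)
print("Sequencing hold time (min):", sequencing.hold_seconds() / 60)
print("Volume left for primers and water (uL):", remaining_ul)
```

Actual run times on a thermocycler would be longer than these totals because of heating and cooling between steps.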
Cobas ® EGFR Mutation. A subset of cases was processed using the Cobas ® DNA Sample Preparation Kit for manual sample preparation and the Cobas z 480 analyzer for automated amplification and detection following Cobas ® EGFR Mutation Test kit manual instructions.
Ancestry analysis. The ancestry analysis was performed using a set of 46 ancestry-informative markers (AIMs) among the most informative INDELs for Native American (AME), European (EUR), African (AFR), and East Asian (ASN) population groups as previously published 43. Primer sequences and PCR conditions were previously described 43,44. Multiplex PCR was performed and the amplified products were submitted to fragment analysis on a 3500xL Genetic Analyzer, ABI capillary electrophoresis system, according to the manufacturer's instructions. The electropherograms were analyzed and genotypes were automatically assigned using GeneMapper v4.1 (Applied Biosystems).
Ancestry proportions were then assessed using the Structure v2.3.3 software 45,46, considering the four major population groups as possible contributors to the current genetic background of the Brazilian population. Data from the Human Genome Diversity Panel (HGDP-CEPH), previously shown to have no significant departures from Hardy-Weinberg equilibrium and linkage equilibrium 43, were employed as a reference for the ancestral populations, and a supervised analysis was performed to estimate ancestry membership proportions of the individuals in the present study. Structure runs with K = 4 consisted of 100,000 burn-in steps followed by 100,000 Markov Chain Monte Carlo iterations. The option 'Use population Information to test for migrants' was used with the Admixture model, considering allele frequencies correlated, and updating allele frequencies using only individuals with POPFLAG = 1.
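For readers who run Structure from parameter files rather than the graphical front end, the settings described above might be collected as in the sketch below. The mapping from the described options to parameter names (MAXPOPS, BURNIN, NUMREPS, NOADMIX, USEPOPINFO, FREQSCORR, PFROMPOPFLAGONLY) is our assumption based on the Structure 2.3 parameter files and is not taken from the authors' configuration; data-layout parameters such as the number of individuals and loci are deliberately omitted.

```python
# Assumed mapping of the described run settings to Structure parameter names.
# This is an illustration, not the authors' actual parameter files.
settings = {
    "MAXPOPS": 4,           # K = 4 (EUR, AFR, AME, ASN)
    "BURNIN": 100000,       # burn-in steps
    "NUMREPS": 100000,      # MCMC iterations after burn-in
    "NOADMIX": 0,           # 0 = admixture model
    "USEPOPINFO": 1,        # use population information (supervised analysis)
    "FREQSCORR": 1,         # correlated allele frequencies
    "PFROMPOPFLAGONLY": 1,  # update allele frequencies from POPFLAG = 1 only
}

def render_params(params):
    """Render settings as '#define NAME VALUE' lines, the parameter-file style."""
    return "\n".join(f"#define {name} {value}" for name, value in params.items())

rendered = render_params(settings)
print(rendered)
```

In such a setup the reference HGDP-CEPH individuals would carry POPFLAG = 1 and the study patients POPFLAG = 0, so that only the reference panel informs the allele frequencies.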

Statistical analysis.
Univariate and multivariate analyses were used to determine whether the mutations were significantly associated with clinicopathological factors. Ancestry proportions were defined as categorical variables according to Lima-Costa et al. 47. The significance of the multivariate analysis for association with the presence of the mutations was assessed by the Wald test. Survival analysis was carried out using the Kaplan-Meier method and the log-rank test. Multivariate survival analysis was performed using the Cox proportional hazards model to determine which variables had a significant effect on overall survival. All statistical analyses were conducted using SPSS 21.0 (IBM Corp, Armonk, NY, USA) and the level of significance was 5%.
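The categorization of ancestry proportions into low, intermediate and high groups can be illustrated with a short sketch. It assumes cut-offs at the sample tertiles; the actual cut-off values used in the study are those of Supplementary Table 3 and are not reproduced here. The proportions below are hypothetical toy values, and `tertile_labels` is our own helper name.

```python
from statistics import quantiles

def tertile_labels(values):
    """Label each value as low / intermediate / high using sample tertiles."""
    cut_low, cut_high = quantiles(values, n=3)  # two cut points
    labels = []
    for v in values:
        if v <= cut_low:
            labels.append("low")
        elif v <= cut_high:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels, (cut_low, cut_high)

# Hypothetical ASN ancestry proportions for nine individuals (NOT study data).
asn = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.15, 0.30]

labels, cuts = tertile_labels(asn)
for proportion, label in zip(asn, labels):
    print(f"{proportion:.2f} -> {label}")
```

The resulting three-level variable is what would then enter a logistic model as a categorical predictor, with one level chosen as the reference.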

Data Availability
The data that support the findings of this study are available from Dr. Rui Manuel Reis, but restrictions apply to the availability of these data, which were used under ethics committee approval for the current study and so are not publicly available because they contain patients' personal data. Data are, however, available from the authors upon reasonable request and with the permission of Dr. Rui Manuel Reis (Scientific and Executive Director of the Molecular Oncology Research Center, Barretos Cancer Hospital).