Profile of esophageal squamous cell carcinoma mutations in Brazilian patients

Esophageal cancer is an aggressive tumor that has a high rate of incidence and mortality worldwide. It is the 10th most frequent type in Brazil, being squamous cell carcinoma (ESCC) the predominant subtype. There is currently an incessant search to identify the frequently altered genes associated with esophageal squamous cell carcinoma biology that could be druggable. This study aimed to analyze the somatic mutation profile of a large panel of cancer-related genes in Brazilian ESCC. In a series of 46 ESCC diagnoses at Barretos Cancer Hospital, DNA isolated from paired fresh-frozen and blood tissue, a panel of 150 cancer-related genes was analyzed by next-generation sequencing. The genes with the highest frequency of mutations were TP53 (39/46, 84.8%), followed by NOTCH1 (7/46, 15.2%), NFE2L2 (5/46, 10.8%), RB1 (3/46, 6.5%), PTEN (3/46, 6.5%), CDKN2A (3/46, 6.5%), PTCH1 (2/46, 4.3%) and PIK3CA (2/46, 4.3%). There was no significant association between molecular and patients’ clinicopathological features. Applying an evolutionary action score of p53 (EAp53), we observed that 14 (35.9%) TP53 mutations were classified as high-risk, yet no association with overall survival was observed. Concluding, this the largest mutation profile of Brazilian ESCC patients, which helps in the elucidation of the major cancer-related genes in this population.

Description of the mutation profile. We sequenced the whole coding region of 150 cancer-related genes in 46 cases of esophageal squamous cell carcinoma. The mean read depth of sequencing was 911 × per gene and 310.9 × per variant. We found a mean of 1.9 driver mutations per patient (range 0-7), and we identified at least one driver somatic variant in 42 tumor samples. Driver mutations in single genes were found in 41.3% of tumors (19/46), whereas 26.1% (12/46) showed driver mutations in two genes, 17.4% (8/46) in three genes, 6.5% (3/46) in four or more genes and 8.7% (4/46) showed driver mutations in none of the genes analyzed. In total, 25 genes were found to harbor driver somatic mutations (Fig. 1). A complete list of variants (missense, frameshift, nonsense, in-frame, and splice mutations) identified is presented in Supplementary Table S1.
Five cases (11%) showed somatic mutations in the NFE2L2 gene (Figs. 1, 2 and Supplementary Table S1). The mutations were all missense, being p.Arg34Gln and p.Val32Gly in 2 cases, and p.Glu79Lys and p.Leu30Phe, which was present in one case each (Fig. 2B). Three tumors (7%) exhibited somatic changes in the RB1 gene ( Fig. 1; Supplementary Table S1). The mutations were two frameshifts and one splice site (Fig. 2D). The CDKN2A gene was mutated in 3 cases (7%) ( Fig. 1; Supplementary Table S1). The types of changes identified in the CDKN2A gene were two nonsense and one frameshift (Fig. 2E). We also observed the PTCH1 gene's somatic mutation in two tumors (2%) ( Fig. 1; Supplementary Table S1). The types of changes identified were one missense and one nonsense. Moreover, we observed mutations in TERT, STAT3, DDX3X, CDK12, and BAP1 genes in one case each (Supplementary Table S1).
NOTCH signaling pathway alterations. The Notch signaling pathway involving the genes NOTCH1 and FBXW7 was altered in 6 cases (13%) ( Fig. 1; Supplementary Table S1). The most affected gene was NOTCH1 (11%), and the types of changes identified were 3 missense, one frameshift, and one splice site (Fig. 2C). Only one missense mutation was identified in FBXW7.   www.nature.com/scientificreports/ PIK3-AKT signaling pathway alterations. The PIK3-AKT signaling pathway involving the genes PTEN, PIK3CA, and TSC2 was altered in 6 (13%) cases ( Fig. 1; Supplementary Table S1). PTEN was the gene with the most mutations identified with alterations in 3 cases (7%). The types of changes identified were one missense, one nonsense, and one translocation (Fig. 2F). Mutations in the PIK3CA gene were identified in 2 cases (4%), at hotspot codons p.Glu542Lys and p.Glu545Lys in one case each. The TSC2 gene was also mutated in one case (2%), and the alteration was a p.Thr831Met.
MAPK signaling pathway alterations. The MAPK signaling pathway, with genes KRAS, NF1, and BRAF, was altered in 3 ESCC (7%) ( Fig. 1; Supplementary Table S1). All three genes were changed in one case each (2%). All changes were missense type. The type of alterations in the KRAS gene was the p.Gly12Ser; in the NF1 gene, it was p.Pro2115Leu, and in the BRAF gene, it was p.Arg260Cys, outside the known hotspot V600 region.
Genes involved in the DNA repair. Of the 25 mutated genes, only 2 (MSH6 and PALB2) were involved in the DNA repair ( Fig. 1; Supplementary Table S1).
RTKs signaling pathway alterations. All four genes of the RTKs signaling pathway had a mutation each (2%). The RET gene is the missense p.Arg1050Gln mutation; the EPHA7 gene harbor the splice site p.Val777Leu mutation and the ERBB2 gene have a missense mutation. In the KIT gene, the only mutation was for p.Val825Gly.
Association of mutation profile and clinicopathological features. Next, we performed an analysis of the association between clinicopathological features and mutation status. Due to the small number of cases analyzed, this analysis was performed only for genes harboring more than 10% of mutations, namely TP53, NFE2L2, and NOTCH1 (Table 3). No significant association was found (Table 3). We also analyzed the association of TP53 categories of (EAp53 risk score and disruptive and non-disruptive) with patients' clinicopathological features, yet, no significant association was found (Table 4).
In addition, a Kaplan-Meier survival analysis was performed, and no association was observed with TP53, NFE2L2, and NOTCH1, neither with TP53 EAp53 score and disruptive and non-disruptive categories (Table 5 and Fig. 3).

Discussion
In the present study, we performed, for the first time, the somatic mutational profile of a panel of 150 cancerrelated genes in a series of 46 Brazilian esophageal squamous cell carcinomas. We identified 25 genes with alterations, the genes with the highest frequencies being TP53, NOTCH1, and NFE2L2.
The frequency of mutation observed was compared with the literature, and similar frequencies were observed with other populations ( Table 6). As expected, the TP53 gene was the highest mutated (84.8%, 39 cases). In our study population, its frequency was similar to that reported in other studies such as TCGA [14][15][16][17][22][23][24][25] (Table 6) and higher than previously reported in the Brazilian ESCC population (34-40%) 18,24,26 . This discrepancy with previous Brazilian studies is probably due to the distinct methodologies used since Sanger sequencing and hotspot TP53 regions were used on those studies, at variance with all TP53 coding regions and NGS used in the present study. We also applied for the first time a risk-system score based on Evolutionary Trace (ET) method proposed and validated in head and neck tumors (EAp53) 19 to evaluate its impact in ESCC. We showed that 36% of TP53 mutation in squamous cell carcinoma would be considered high-risk; however, there was no significant association between the clinical and pathological condition of the patient and the prognosis. This lack of significance could be related to the limited number of cases analyzed, and further studies are needed to validate our findings. We also classified TP53 mutations in disruptive and non-disruptive, as reported by Poeta et al. 20 and Molina et al. 21 , for head and neck and lung cancer, respectively. We observed that 54% were disruptive, yet no significant statistical association was observed with the clinical-pathological characteristics and patient prognosis.
Another important gene of the cell cycle is the CDKN2A gene, also known as p16, which the loss of its function can result from homozygous deletions and mutations. According to the TCGA and genomic studies, the significant alteration is homozygous deletion, leading to 76% loss of function of the CDKN2A gene in ESCC 17 . At variance with high copy number alterations, CDKN2A mutation frequency is much lower, ranging from 3-8% (Table 6).
The second most mutated gene was NOTCH1 (11%), which is part of the NOTCH signaling pathway, important to regulate the cell cycle senescence 27 . In esophageal squamous cell carcinoma, the frequency of mutations in this gene varies from 8 to 19% (Table 6), so our population is within the reported range. These findings are in concordance with the ESCC2 TCGA molecular subtype, observed in the 15 patients from South America (Brazil) analyzed 17 .
The NFE2L2 gene encodes the NRF2 protein, a transcription factor regulating the expression of antioxidant proteins that protect against damage caused by injuries and inflammation 28  www.nature.com/scientificreports/ by chemoradiation leading to resistance to treatment [29][30][31] . The frequency of mutations in the NRF2 gene has been reported in 3-17% of ESCC (Table 6). Driver mutations in NRF2 are believed to be late events in the process of developing esophageal squamous cell carcinoma 22,32 . In our population, we observed a frequency of 10.8% of alterations in this gene, and in accordance with TCGA, it is associated with the ESCC1 subtype, typical of squamous carcinomas associated with tobacco exposure 17 . The PI3K-AKT signaling is a complex pathway that regulates cell growth, proliferation, motility, apoptosis, and cell growth and is often related to the development of esophageal squamous cell carcinoma through driver mutations in the PIK3CA gene 15,16,33 . The frequency of mutations reported in the literature varied from 4 in the present to 13% (Table 6). Recently our group analyzed another series of 38 ESCC and observed a frequency of 10.5% of PIK3CA mutations 34 . The majority of the mutations are present in the hotspot exons 9 (E542K and E545K) and 20 (H1047R) [34][35][36] . (Supplementary Table S1). Another gene of this PI3K-AKT pathway is the PTEN, which is frequently mutated in some tumor types, such as melanomas and glioblastomas, yet its mutation rate in ESCC is lower, varying from 1 to 6.5%, as observed in our Brazilian series (Table 6). Of note, the TCGA ESCC3 subgroup is characterized by upregulation of this pathway, and in our study, it was alternated in 13% (PTEN, PIK3CA, and TSC2). www.nature.com/scientificreports/ The KRAS gene is an essential biomarker in cancer, mainly because it predicts the efficacy in therapies targeting the growth factor EGFR in tumors such as colorectal cancer 37,38 . According to the TCGA, the frequency of mutations in the KRAS gene is low, 7%, which is in line with the results obtained in our study population, which was 2% 17 .
ESCC is usually diagnosed late, and the minority of patients can benefit from treatments such as chemotherapy and radiation therapy 39 . Target therapies are important approaches for several tumors, including ESCC 39 . Significantly, in the present work, the identification of patients harboring PIK3CA mutations could potentially benefit from PI3K and mTOR inhibitors, such as buparlisib, alpelisib, and everolimus 40,41 . Moreover, the recent development of anti-KRAS agents, such as sotorasib and adagrasib 42,43 , can bring some hope for patients with KRAS mutations ( Table 6).
The present study has several limitations, being the relatively small number of cases analyzed the major issue, and it could explain the lack of significant association of mutation status and patients' clinical-pathological features. Additionally, a limited panel of 150 cancer-related genes, not whole-genome nor whole-exome sequencing, was performed, so a complete picture of the mutated landscape is lacking. Therefore, further studies with a larger population and broader mutation analysis are needed. Despite these issues, we performed paired germline/ tumor analysis, and it is the first to the somatic landscape of an admixture population such as the Brazilian ESCC population. Our findings align with the frequencies reported in other populations, namely Occidental and Asian, and will contribute to understanding the mutational profile of esophageal squamous cell carcinoma in Brazil.

Material and methods
Tissue samples. Forty-six patients diagnosed with esophageal squamous cell carcinoma treated at the Barretos Cancer Hospital's upper-digestive department, Barretos, SP, Brazil, were evaluated. The main clinicopathological features were collected from patients' medical records.
The tumor and blood samples were obtained from biopsy or surgery and immediately processed and stored at − 80 °C in the Barretos Cancer Hospital Biobank. The present study was approved by the Barretos Cancer Hospital Institutional Review Board (Project No. 1.454.967/2016), and all patients included signed an Informed Consent Form. All methods were performed following the relevant guidelines and regulations. DNA isolation. Tumor DNA was isolated from fresh-frozen tissue using QIAsymphony DNA Mini Kit following the Tissue_200 protocol for automated isolation in the QIAsymphony (QIAGEN, Hilden, Germany). DNA from leukocytes of peripheral blood was isolated using the QIAmp DNA Blood Mini Kit (QIAGEN,

MUT (n = 39) WT (n = 7)
p-value www.nature.com/scientificreports/  www.nature.com/scientificreports/ , and data pre-processing was performed following the recommended best practices, i.e., alignment against the human genome reference build GRCh37 using Burrows-Wheeler Aligner (BWA, version 0.7.13), duplicates were marked, and a further base quality score recalibration step was applied 45 . The VarScan2 algorithm called the somatic variants 46 . The variants with artifacts due to indel reads at their position, or less than 10% or more than 90% of variant supporting reads on one strand, were removed. The  www.nature.com/scientificreports/ variants were further filtered to remove those with fewer than ten reads covering the variant and less than 5% of the variant allele frequency. A second algorithm was applied using the somatic SNV and indell caller MuTect2 from GATK. MuTect2 combines the somatic genotyping engine of the original MuTect 47 with the assembly-based machinery of Haplo-typeCaller provided by GATK 48 , detecting somatic mutations using a Bayesian classifier approach. The variants were detected by comparing the likelihood of the site to be to sequencing noise and filtered by the alterations of the normal paired control (blood) from the same patient, by a pool of normals of 291 local samples using the same NGS technology and according to the allelic frequencies provided by the Genome Aggregation Database (gnomAD) datasets 49 to reduce miscalled germline calls. In this study, only variants found by both variant callers (VarScan2 and MuTect2) were considered. Finally, the annotation of variants was done using Variant Effect Prediction 50 .

MUT (n = 5) WT (n = 41)
To identify driver mutations in tumors, we applied the Cancer Genome Interpreter-CGI 51 . Briefly, CGI annotates potential driver mutations detected in tumors by identifying known tumorigenic variants and classifying variants of unknown significance via OncodriveMUT 51 . After the CGI classification, we maintained variants classified as tumorigenic and variants predicted as Tier1 or Tier2 for tumorigenesis. Variants that were not classified as cancer driver mutation or not predicted as Tier1 or Tier2 as driver by the OncodriveMut algorithm were excluded. Therefore, mutations identified as polymorphism (high allele frequency) or predicted as neutral or passenger for oncogenesis and found in DNA sequence outside coding regions were excluded.
Classification of TP53 mutations. To assess the impact of TP53 missense mutations, we used a risksystem score based on the Evolutionary Trace (ET) method proposed and validated in head and neck tumors by Neskey et al., called evolutionary action score of p53 (EAp53) (http:// mammo th. bcm. tmc. edu/ cgi-bin/ panos/ EAp53. cgi) 19 . In this system, mutations are scored from 0 to 100, with higher scores representing more deleterious mutations. A threshold score of 75 is used to classify variants as low-risk (EAp score < 75) or as high-risk (EAp score > 75). Mutations classified as high risk was associated with a poor prognosis, decreased survival, and increased development of distant metastases in head and neck tumors 19,52 .
Additionally, we evaluated a second classification proposed for head and neck tumors by Poeta et al. 20 and validated in lung cancer by Molina et al. 21 . According to this system, TP53 mutations are divided into "disruptive" and "non-disruptive". Disruptive mutations are stop-codon all over the coding region and missense mutations within the L2 and L3 sites, codons 163-195 and 236-251 with an amino acid polarity shift. Non-disruptive mutations are missense mutations within the L2 and L3 sites and do not change in polarity between the amino acids.
Statistical analysis. Characterization of the study population was analyzed through frequency tables for qualitative variables and measures of central tendency and dispersion (mean, standard deviation, minimum, and maximum) for the quantitative variables. Regarding the clinicopathological association analyzes with profile mutation (TP53, NFE2L2, and NOTCH1) and classification of TP53 mutations (EAp53 score and disruptive and non-disruptive), we used the Mann-Whitney test for age, and for other categorical variables, the Chi-square test or Fisher's exact test.

Data availability
Data that support the findings are available upon reasonable request and with the permission of Dr. Rui Manuel Reis.