Somatic Mutations and Genetic Variants of NOTCH1 in Head and Neck Squamous Cell Carcinoma Occurrence and Development

A number of genetic variants have been associated with cancer occurrence; however, it may be the acquired somatic mutations (SMs) that drive cancer development. This study investigates the potential SMs and related genetic variants associated with the occurrence and development of head and neck squamous cell carcinoma (HNSCC). We identified several SMs in NOTCH1 from whole-exome sequencing and validated them in a 13-year cohort of 128 HNSCC patients using high-resolution melting analysis and resequencing. Patients with NOTCH1 SMs showed higher 5-year recurrence (P = 0.0013) and a lower survival proportion (P = 0.0447) when the risk-associated SMs were analysed by Cox proportional hazards models. Interestingly, the NOTCH1 variant rs139994842, which is in linkage with the SMs, is associated with HNSCC risk (OR = 3.46); the risk increases when SMs in NOTCH1 are involved (OR = 7.74) and increases further when SMs occur in conjunction with betel quid chewing (OR = 32.11), which is a related independent environmental risk factor after adjusting for substance use (alcohol, betel quid, cigarettes) and age. The findings indicate that betel quid chewing is highly associated with NOTCH1 SMs (especially with changes in EGF-like domains), and that rs139994842 may serve as an early predictive and prognostic biomarker for the occurrence and development of HNSCC.

The NOTCH1 protein is a single-pass transmembrane receptor known to exist in a wide range of tissues and organisms 7. A major part of NOTCH1 is its extracellular region, comprising 36 EGF-like domains that contain Ca2+-binding consensus sequences. The inactivation of NOTCH1 has been linked to squamous cell differentiation and HNSCC 8, and NOTCH1 regulation of squamous epithelium differentiation is also suggested by studies using cultured cervical and oesophageal keratinocytes 9,10. Although NOTCH1 has been proposed to act as either an oncogene or a tumour suppressor gene in human cancer development 11, in HNSCC 4-6 it may be a tumour suppressor gene, as in cutaneous squamous cell carcinomas.
Taiwan, India and Melanesia are areas where betel chewing is prevalent, whereas Central and Eastern European countries, such as France, Germany and Hungary, are areas of heavy alcohol drinking 12-14. Epidemiological studies have linked HNSCC with the use of alcohol, betel quid and cigarettes and with various genes 3,13,15, and tobacco has been studied as a contributing factor to somatic mutations (SMs) of NOTCH1 in HNSCC 5,6. This study therefore investigates the potential SMs, related genetic variants and environmental (substance use) risk factors that are associated with the occurrence and development of HNSCC. The effects of SMs are analysed in terms of recurrence and survival from HNSCC.

Results
Patient characteristics in SM validation. Table 1 shows the clinical characteristics of the 128 HNSCC patients used for validation of SMs. Among these, 23 (18%) had NOTCH1 SMs. The mean ages of patients with and without NOTCH1 SMs were 52.8 and 51.3 years, respectively. No significant difference was observed for cancer site, stage or adjuvant therapy (radiotherapy and chemotherapy). However, differences were observed for malignant tumour recurrence (P = 0.02) and fatality rate (borderline, P = 0.06).
Structural characteristics of NOTCH1 SMs. Twenty-four SMs distributed across 34 exons of the NOTCH1 gene were found in the cancerous tissues of 23 HNSCC patients (Table S1). Twenty-one SMs (87.5%), including 15 (71.4%) at Ca2+-binding sites and 6 (28.6%) at non-Ca2+-binding sites, were located in EGF-like domains of the NOTCH1 extracellular region (Fig. 1a). By mutation category, there were 22 alterations comprising 19 point mutations, 1 single-base deletion and 2 mononucleotide insertions. With regard to novelty, 4 SMs were already listed in the COSMIC v73 database and 18 SMs were identified for the first time in this study. To elucidate the relationship between the NOTCH1 SMs and functional diversity, the structural consequences of the respective SMs in the protein were assessed. Figure 2a presents the detailed positions of 19 SMs in EGF-like domains. Three NOTCH1 SMs were outside EGF-like domains: 1 in the LNR region, 1 in the TM region and 1 in the RAM domain (Fig. 2b).
In silico prediction of functional impact of NOTCH1 SMs. Functionally, 22 of the 24 SMs (91.7%) detected in 23 HNSCC patients were non-synonymous mutations, comprising 7 novel nonsense and frameshift SMs (31.8%) and 15 missense mutations (68.2%) (Fig. 1b). NOTCH1 is regarded as a tumour suppressor in HNSCC because these missense SMs frequently had the potential to inactivate the protein or were located in domains where they affected conserved residues of NOTCH1 (Fig. 2b). Furthermore, these SMs have the potential to induce persistent NOTCH1 functional defects and to alter a capacity of NOTCH1 that is indispensable for its interaction with ligands. The effects might be similar to those of NOTCH1 downregulation.
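The proportions quoted in the two preceding paragraphs can be re-derived from the reported counts. The short Python sketch below only repeats that arithmetic; the variable names are illustrative, and the counts are copied from the text rather than recomputed from Table S1.

```python
# Counts as reported in the text (not re-derived from raw data).
total_sms = 24
non_synonymous = 22
truncating = 7        # nonsense + frameshift SMs
missense = 15
in_egf = 21           # SMs located in EGF-like domains
egf_ca_binding = 15   # at Ca2+-binding sites
egf_non_ca = 6        # at non-Ca2+-binding sites


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "non-synonymous / all SMs": pct(non_synonymous, total_sms),
    "truncating / non-synonymous": pct(truncating, non_synonymous),
    "missense / non-synonymous": pct(missense, non_synonymous),
    "EGF-like / all SMs": pct(in_egf, total_sms),
    "Ca2+-binding / EGF-like": pct(egf_ca_binding, in_egf),
    "non-Ca2+-binding / EGF-like": pct(egf_non_ca, in_egf),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

Each percentage uses the denominator implied by the surrounding sentence (all 24 SMs, the 22 non-synonymous SMs, or the 21 SMs in EGF-like domains).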
To quantify the extent to which the HNSCC phenotype can be explained by a destructive effect on protein structure or function, these SMs were mapped onto the known 3D structure of the NOTCH1 protein (Fig. S4). Table 2). The Kaplan-Meier survival curves for patients with and without NOTCH1 SMs revealed significantly different 5-year relapse-free recurrence and survival curves (Fig. 3; P = 0.0013 and P = 0.0447, respectively). Multivariate regression analysis demonstrated that NOTCH1 SMs [hazard ratio (HR) = 3.2, P < 0.01] are an independent prognostic factor associated with 5-year disease-free recurrence in HNSCC patients; the HR increased to 5.2 (P < 0.01) after controlling for age at surgery, disease status (cancer site and stage), and adjuvant therapies (radiotherapy and chemotherapy) (see also Table 2). Similar results were obtained for the 10-year disease-free survival analysis (Fig. S5). Moreover, after controlling for age at surgery, disease status and adjuvant therapies, NOTCH1 SMs (HR = 5.2, P < 0.01) were a prognostic factor for the 5-year disease-free survival of HNSCC patients.
SMs-associated genetic polymorphism and environmental risk factors in a case-control study. The mean ages of the patients with HNSCC and the controls were 53.8 and 50.8 years, respectively (Table S3).
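For readers unfamiliar with the Kaplan-Meier estimate referred to above, the following is a minimal pure-Python sketch of the product-limit estimator. It is not the software used in this study, and the follow-up times and event indicators are hypothetical toy values, not patient data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if the event (recurrence or death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed at time t
        n_leaving = 0  # events + censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical toy cohort (months of follow-up, event indicator).
times = [6, 12, 12, 20, 30, 48, 60, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2} months, S(t) = {s:.3f}")
```

Patients censored at a given time are counted as still at risk at that time, which is the usual convention. Comparing two such curves (with and without SMs) would additionally require a log-rank test, which is not shown here.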

Figure 2. Somatic mutations distributed across the region of NOTCH1 receptor in 23 HNSCC patients.
(a) An alignment of the 36 tandem EGF-like domains of human NOTCH1 extracted from the UniProt protein database and generated by Align tools using the Clustal Omega programme according to the EGF-like repeat consensus. Each line represents a conserved EGF-like domain, with the consensus site for Ca2+-dependent binding (shaded yellow) and non-Ca2+ binding (shaded green) among the 36 EGF-like repeats in the extracellular domains of a "triple-stranded" fold structure model. Red highlighting indicates the six conserved cysteine residues of the EGF-like domain that form consensus disulfide bonds. Blue and green boxes show the somatic mutations identified in this study of 124 HNSCC patients. Grey, red and purple shading in boxes show synonymous, missense and nonsense somatic mutations in the EGF-like domain, respectively. The symbol "I" indicates a frameshift mutation. (b) Schematic diagram of the domain organization of human NOTCH1 generated by the SMART database, including 36 tandem EGF-like repeats (yellow and green indicate the Ca2+-dependent and non-Ca2+-binding domains, respectively; rectangle), 3 Lin-12/Notch repeats (LNR; colour green; rectangle) and 2 hetero-dimerization domains (HD; colour grey; rectangle) determined as negative regulatory regions, and a short transmembrane segment (TM; colour blue; arc). The Notch intracellular domain (NICD) contains the recombination signal-binding protein 1 for J (RBP-J) association molecule (RAM; colour red; rectangle), Ankyrin repeats (ANK; colour orange; rectangle), the transcriptional activation domain (TAD; colour deep blue; rectangle) and the proline, glutamic acid, serine/threonine-rich motif (PEST; colour brown; rectangle). Each colour bar represents a NOTCH1 somatic mutation in an HNSCC individual, with the mutation type indicated by the same colour as in (a).

Discussion
We observed that a high fraction (68.2%) of HNSCC-related NOTCH1 SMs are missense mutations located at functionally conserved residues within or close to the extracellular region of ligand interaction. A smaller fraction (31.8%) are nonsense and frameshift SMs that correspond to truncated NOTCH1 proteins lacking the C-terminal Notch intracellular domain (NICD), which may affect transactivation of target genes. Reduced NOTCH1 expression influences the terminal differentiation of squamous epithelial cells and leads to immature epithelia, suggesting an essential role in maintaining epithelial integrity 8. An increased risk of skin cancer 17 has also been reported with gamma-secretase inhibitors, developed for Alzheimer's disease, that target the signalling pathway downstream of NOTCH1. These data are consistent with NOTCH1 functioning as a tumour suppressor gene in HNSCC occurrence 4. According to our findings, NOTCH1 SMs were not only associated with higher risks of cancer recurrence and lower survival in the 5-year (Fig. 3) and 10-year (Fig. S5) Kaplan-Meier estimates but also had significant predictive power in multivariate Cox regression for both cancer recurrence and death after controlling for patient- and hospital-level confounders (Table 2). We further found that the NOTCH1 genetic variant rs139994842 was associated with five SMs of NOTCH1 and could be used to predict the risk of HNSCC. Since SM generation is random, the biological reason for this association remains to be investigated in future studies.
The HNSCC risk associated with rs139994842 is elevated further by betel quid (BQ) chewing, which is an independent risk factor 13,18,19 that accounts for 79% of oral cancer and 18% of laryngeal cancer occurrence 20. The typical BQ is a mixture of areca nut, betel leaf and slaked lime and, in some parts of the world, includes tobacco as an ingredient. BQ is classified as a group 1 carcinogen to humans 21, with an estimated 600 million users in the world 22. A commercial formulation in Taiwan comprises areca nut, betel leaf (or inflorescences) and slaked lime. Preparations involving only areca nut, or BQ containing tobacco, are rarely consumed in Taiwan. The substance use associated with HNSCC in the Central and Eastern European countries may be heavy alcohol drinking 14. We found that BQ chewing is significantly associated with HNSCC and NOTCH1 exome SMs, while alcohol drinking is associated with HNSCC in patients without NOTCH1 SMs (Table 3). Possibly, several NOTCH1 SMs increase the mutagenic effects of BQ, but not of alcohol. The effect of cigarette smoking was masked by that of betel chewing, which had a stronger effect in this study.
In conclusion, our findings are consistent with previous reports associating NOTCH1 SMs with HNSCC 7,8,20,23,24. Furthermore, we show that BQ chewing is strongly linked to the development of HNSCC through NOTCH1 SMs. These SMs are largely located in EGF-like domains, where they may compromise NOTCH1 function and increase HNSCC recurrence and fatality, suggesting that NOTCH1 performs a tumour suppressive role in HNSCC. Although rs139994842 is a germline variant, we show that it may statistically serve as an early predictive and prognostic biomarker for the occurrence and development of HNSCC. This information can be used in prevention, surveillance of patients at risk, and early detection for reducing morbidity and mortality from HNSCC.

Table 3. NOTCH1 genetic variant (rs139994842) linked to somatic mutations in NOTCH1 is associated with betel quid and HNSCC occurrence using logistic regression adjusted for age and substance use covariates. r: Correlation coefficient between polymorphisms and somatic mutation (P value = 0.0004). D': The coefficient of linkage disequilibrium between SMs and rs139994842. * Adjusted for substance use. ** Adjusted for substance use and age.
Scientific Reports | 6:24014 | DOI: 10.1038/srep24014

Methods
Patients and tissue specimens. Paired tissues (cancerous and normal marginal sections) were obtained from 3 male HNSCC patients at Kaohsiung Medical University Hospital (KMUH) for whole-exome SM discovery. To validate these SMs, we recruited 128 male HNSCC patients with high-quality paired-tissue DNA (< 6% of whom received adjuvant radiotherapy and/or chemotherapy before surgery) from China Medical University Hospital (CMUH) between November 2000 and March 2012 (13 years of follow-up). To investigate the association with substance use, 282 male patients diagnosed with HNSCC and 282 matched controls were recruited from KMUH for a case-control study. The three HNSCC cohorts are mutually independent; an overview is provided in Fig. S1.
Whole blood was obtained from volunteers with written informed consent. Information about socio-demographic factors, anthropometric parameters, medical history, medications, and substance use (alcohol, betel quid (BQ), and cigarettes) was carefully recorded. Details regarding alcohol, BQ and cigarette use included types consumed, age at initial use, daily consumption, frequency of use, years of use, and achievement of abstinence 18. The use of alcohol, BQ, and cigarettes was recorded in the newly diagnosed HNSCC patients at a first-time interview. An individual who had used alcohol, BQ, or cigarettes was defined as a drinker, chewer, or smoker, respectively. Genomic DNA was extracted from peripheral blood samples in the case-control study using the Puregene DNA isolation kit (Gentra Systems, Minneapolis, MN). This study was approved by the institutional review boards of KMUH and CMUH, committee on human subjects and biospecimen unitization committee (KMUH-DC-101-0402 and CMUH-HBB102-007). All methods were carried out in accordance with the approved guidelines.

SMs screening and validation. A whole-exome sequencing (WES) discovery platform was used to screen for candidate SMs in the paired-tissue DNA of 3 HNSCC patients. DNA was quantified with a Qubit Fluorometer (Thermo Fisher Scientific). The whole-exome regions were captured using the SureSelect Target Enrichment System (Agilent). A total of 6.5 gigabases of sequence data was generated by next generation sequencing (NGS) using the Solexa HiSeq 2000 sequencing system (Illumina). The NGS procedures 25 of data cleaning, alignment, variant calling, and annotation are described in Fig. S1.
The raw WES data generated by the massively parallel sequencing platform required 80-fold enrichment for all prepared cancer-normal pair libraries. Reads that contained sequencing adaptors and low-quality reads with more than five unknown bases were removed. The high-quality reads were aligned to the UCSC human reference genome (hg19) using two software tools, BWA 26 and Bowtie2 27. To identify potential variants, local realignment of BWA-aligned reads was conducted using the Genome Analysis Toolkit (GATK) 28. The raw lists of potential variants were then annotated, individually analysed, validated, and converted into prevalent variant call formats using VCFtools 29. Potential SMs were detected in the matched non-tumorous HNSCC samples and the loci in exon regions. Another strategy was to directly compare sequences from the tumour and matched normal tissue during discovery or validation. Two applications were used to reveal tumour-specific mutations: MuTect 30 and SomaticSniper 31. A Bayesian comparison was then performed to detect SMs with various allele fractions. The ANNOVAR 32 tool was used to annotate the functions of these variants, to elucidate their effects on genes, and to obtain other information about known variants reported in the 1000 Genomes Project 33 and dbSNP 34 databases. Suitable specific primers were designed to verify potential SMs using Sanger sequencing, and the candidate SMs were surveyed with the Mutation Surveyor software (Fig. S2; version 4.0.6, Softgenetics, State College, PA) 35. The novelty of SMs was assessed using the Catalogue of Somatic Mutations in Cancer (COSMIC v.73) 36.
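The read-cleaning rule described above (discard reads containing adaptor sequence, and reads with more than five unknown bases) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study; the adaptor string, the function name and the example reads are placeholders chosen for illustration only.

```python
MAX_UNKNOWN_BASES = 5        # threshold stated in the text
ADAPTOR = "AGATCGGAAGAGC"    # placeholder adaptor sequence (assumption)


def keep_read(seq, adaptor=ADAPTOR, max_unknown=MAX_UNKNOWN_BASES):
    """Return True if a read passes both cleaning rules.

    A read is discarded if it contains the adaptor sequence or if it has
    more than `max_unknown` unknown bases (written as 'N').
    """
    seq = seq.upper()
    if adaptor in seq:
        return False
    return seq.count("N") <= max_unknown


# Hypothetical reads for illustration.
reads = [
    "ACGTACGTACGTACGTACGT",          # clean read, kept
    "ACGTNNNNNNACGTACGTAC",          # six unknown bases, removed
    "ACGTACGT" + ADAPTOR + "ACGT",   # adaptor contamination, removed
]
clean_reads = [r for r in reads if keep_read(r)]
print(f"{len(clean_reads)} of {len(reads)} reads retained")
```

In practice, adaptor matching is usually tolerant of mismatches and partial overlaps at read ends; the exact substring match here is a simplification.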

Detection and validation of NOTCH1 SMs with high-resolution melting. All hotspots of NOTCH1 exome SMs and genetic variants were identified from 128 male HNSCC patients using high-resolution melting (HRM) analysis 37 and verified by Sanger resequencing (see also Fig. S3). PCR reactions for the NOTCH1 gene were performed in duplicate in a 15 μl final volume using a Type-it HRM PCR Kit (Qiagen, Hilden, Germany). A 1× HRM PCR master mix containing HotStar Taq Plus DNA polymerase, Type-it HRM PCR buffer, Q-solution, dNTPs and EvaGreen dye was prepared with 15 ng DNA and 0.66 μM of each primer. HRM assays were conducted with a LightCycler® 480 Instrument (Roche Diagnostics) and analysed with LightCycler® 480 Gene Scanning Software Ver. 1.5 (Roche Diagnostics). With the SYBR Green I filter (533 nm), the PCR programme consisted of an initial denaturation-activation step at 95 °C for 10 min and a 40-cycle programme for detecting the NOTCH1 gene (denaturation at 95 °C for 10 s, annealing at 63 °C for 35 s, and elongation at 72 °C for 10 s), with fluorescence read in single acquisition mode. The melting programme included denaturing at 95 °C for 1 min, annealing at 40 °C for 1 min, and subsequent melting with continuous fluorescence reading from 55 to 90 °C at a rate of 25 acquisitions per °C. The curve plotted for each duplicate DNA sample was reproducible in terms of both shape and peak height. To verify the results of the HRM analysis, Sanger DNA sequencing was performed for all amplicons with an abnormal melting curve and for some of the amplicons with a normal melting curve (Table S2).
Genotyping of NOTCH1 SMs-related SNPs. Based on the NOTCH1 SM discovery, the SNPs closest to the SMs with linkage disequilibrium (LD) > 0.9 and allele frequencies > 1% were included in a case-control study. Only one potential SNP (rs139994842) was genotyped, using the Sequenom MassARRAY System (San Diego, CA) at the Academia Sinica National Genotyping Center (Taipei, Taiwan).
In silico prediction of NOTCH1 SMs in EGF-like domains. Fig. S4 shows a three-dimensional (3D) protein structure to provide insight into protein function. The crystal structure of the EGF11-13 repeats (PDB ID: 2VJ3) includes the ligand-binding site and an almost linear domain arrangement 38. The O-glycan is observed in an interaction between the disaccharide on NOTCH1 and protein side chains in its ligand using the 3D structure PDB ID: 4XL1 39 in the Ca2+-stabilized EGF-like domains and the NMR structure PDB ID: 1TOZ 40.
Statistical analysis. Clinical characteristics were analysed using a Chi-square test. The odds ratios of cancer recurrence and death, unadjusted or adjusted for age at surgery, disease status (site and stage of cancer), or adjuvant therapies (radiotherapy and chemotherapy), were calculated using logistic regression models. The Kaplan-Meier method was used to estimate 5-year and 10-year relapse-free survival and recurrence rates. Differences in recurrence and survival proportions between patients with and without NOTCH1 SMs were tested by a log-rank test. A multivariate Cox proportional-hazards regression analysis evaluated NOTCH1 SMs as a prognostic factor associated with recurrence and survival of HNSCC patients. Clinical factors (age at surgery, cancer site, cancer stage, radiotherapy and chemotherapy) were analysed as potential covariates in the models. To identify which germline genetic variant contributed to a detectable SM, a logistic regression analysis was performed to estimate the association between the germline variant and SMs. All tests were two-tailed and a P value < 0.05 was considered statistically significant.
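As a quick arithmetic check on the programme above, the following Python sketch tallies the programmed hold time of the amplification step, the number of fluorescence acquisitions in the melt, and the amount of each primer per reaction. The variable names are illustrative, and ramp times between temperatures are ignored because they are not given in the text.

```python
# Amplification programme, as stated in the text.
initial_denaturation_s = 10 * 60
n_cycles = 40
cycle_steps_s = {"denaturation": 10, "annealing": 35, "elongation": 10}

# Programmed hold time only; instrument ramp times are not included.
amplification_s = n_cycles * sum(cycle_steps_s.values())
total_hold_s = initial_denaturation_s + amplification_s

# Melting step: continuous reading from 55 to 90 degrees C.
melt_start_c, melt_end_c = 55, 90
acquisitions_per_c = 25
n_acquisitions = (melt_end_c - melt_start_c) * acquisitions_per_c

# Primer amount per reaction: concentration (uM) x volume (ul) = pmol.
primer_um = 0.66
reaction_volume_ul = 15
primer_pmol = primer_um * reaction_volume_ul

print(f"programmed hold time: {total_hold_s} s")
print(f"melt-curve acquisitions: {n_acquisitions}")
print(f"each primer per reaction: {primer_pmol:.1f} pmol")
```

The acquisition count shows why HRM can resolve small shifts in melting behaviour: each amplicon is sampled at 0.04 °C intervals across the 35 °C melting window.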
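The selection rule above can be made concrete with the standard definitions of the linkage disequilibrium coefficients D, D' and r for two biallelic loci. The Python sketch below is illustrative only: the frequencies are hypothetical, and applying the 0.9 threshold to D' (rather than to r) is an assumption, since the text does not state which measure was thresholded.

```python
import math


def ld_stats(p_a, p_b, p_ab):
    """Return (D, D', r) for two biallelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the A-B haplotype
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


def passes_filter(d_prime, allele_freq, ld_min=0.9, freq_min=0.01):
    """Selection rule from the text; thresholding D' is an assumption."""
    return d_prime > ld_min and allele_freq > freq_min


# Hypothetical frequencies for illustration only.
d, d_prime, r = ld_stats(p_a=0.10, p_b=0.10, p_ab=0.10)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r = {r:.3f}")
print("included in case-control study:", passes_filter(d_prime, 0.10))
```

D' is scaled by the maximum D attainable for the given allele frequencies, so it can equal 1 even when r is well below 1; the two measures coincide only when allele frequencies at both loci match.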
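As an illustration of the odds ratio, the sketch below computes an unadjusted odds ratio and a Woolf (log-based) 95% confidence interval from a 2 × 2 table. This is a simplification: the adjusted odds ratios reported in this study come from logistic regression with covariates, which this sketch does not reproduce. The counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval.

    a : exposed cases        b : unexposed cases
    c : exposed controls     d : unexposed controls
    z : normal quantile (1.96 gives an approximate 95% interval)
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts for illustration only (not study data).
estimate, lower, upper = odds_ratio(a=20, b=80, c=10, d=90)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 corresponds to a two-tailed P value below 0.05 under the same normal approximation; the formula requires all four cell counts to be non-zero.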