Analysis of head and neck carcinoma progression reveals novel and relevant stage-specific changes associated with immortalisation and malignancy

We report changes in the genomic landscape in the development of head and neck squamous cell carcinomas HNSCC from potentially premalignant lesions (PPOLS) to malignancy and lymph node metastases. Likely pathological mutations predominantly involved a relatively small set of genes reported previously (TP53, KMT2D, CDKN2A, PIK3CA, NOTCH1 and FAT1) but also other predicted cancer drivers (MGA, PABPC3, NR4A2, NCOR1 and MACF1). Notably, all these mutations arise early and are present in PPOLs. The most frequent genetic changes, which follow acquisition of immortality and loss of senescence, are of consistent somatic copy number alterations (SCNAs) involving chromosomal regions enriched for genes in known and previously unreported cancer-related pathways. We mapped the evolution of SCNAs in HNSCC progression. One of the earliest SCNAs involved deletions of CSMD1 (8p23.2). CSMD1 deletions or promoter hypermethylation were present in all of the immortal PPOLs and occurred at high frequency in the immortal HNSCC cell lines. Modulation of CSMD1 in cell lines revealed significant suppression of proliferation and invasion by forced expression, and significant stimulation of invasion by knockdown of expression. Known cancer drivers NOTCH1, PPP6C, RAC1, EIF4G1, PIK3CA showed significant increase in frequency of SCNA in transition from PPOLs to HNSCC that correlated with their expression. In the later stages of progression, HNSCC with and without nodal metastases showed some clear differences including high copy number gains of CCND1, hsa-miR-548k and TP63 in the metastases group.


Top Panel: Expression profile of NCKAP5 in primary HNSCC and normal oral mucosa.
The y-axis represents the change in the relative fold expression of the tumour samples over the mean value of the normal tissue. A relative decrease in expression of NCKAP5 was seen in 48% (11/23) of primary HNSCCs when compared with their adjacent normal mucosa samples. However, significant reduction was observed in only 26% (6/23) of primary HNSCCs Bottom Panel: Expression profile of NCKAP5 in primary HNSCCs compared with the normal tissue. The x-axis represents the primary HNSCC samples in the study. The y-axis represents the change in the relative fold expression compared to the mean expression in the normal samples. The mean expression values for the normal samples is averaged and normalized to 1 (Line on horizontal x-axis at value 1) against which, expression values for the         Genes showing significant correlation with gene expression in integrative analyses after correction for multiple testing (adj. p<0.05), are indicated by double asterisks (**); genes showing nominal significance (p<0.05) only, are indicated by a single asterisk (*).

Further Information: Correlating gene expression to somatic copy number alterations
To identify potential drivers of somatic copy number alterations we considered the correlation between gene expression and copy number for each gene across all samples. This analysis is based on the hypothesis that a consistent phenotypic effect should be observed on the gene expression of CNA driver genes, while passenger genes might remain unaffected.
To do so the gene expression data provided by (Hunter et al., 2006) was used and integrated to our copy number data resulting in a dataset of 29 samples with both gene expression and copy number values (18 PPOL and 11 HNSCC cell lines). We used PANP (Warren P., Bioconductor) to reduce the number of genes considered for further analysis to 8255 unique genes expressed in at least one sample.
Before correlating the SCNA and gene expression values corrections were applied for polyploidy and heterogeneity.
When considered a by-product of instability rather than a response of biological significance, the copy number (CN) values were altered to remove the ubiquitous chromosomal amplification observed in polyploid samples. To do so, all the CN values inferred for SNPs in chromosome arms with mean CN larger than 2.5 were reduced by one unit. The reduction was however rejected in the cases of heterozygous copy neutral calls (CN2 LOH0), copy losses (CN1) and homozygous deletions (CN0). These states were not altered.
Due to heterogeneity, the gene expression values obtained did not represent the expression of the CN-altered cells solely. For each genomic region, a non-negligible proportion of cells do not harbour any alteration. The proportion of cells with normal heterozygous copy number in each region was estimated using OncoSNP (Yau C., et al., 2010). When investigating the correlation between SCNAs and mRNA expression the weighted mean CN and LOH values of this mixture were used.
The resulting correlation coefficients across genes displayed a tendency for positive correlation of expression with copy number and negative correlation with LOH (i.e. higher gene expression in amplified regions and lower expression in LOH regions). Furthermore an analysis of autocorrelation of the correlation coefficients along the genome revealed significant level of autocorrelation across the positively correlated genes in the CN analysis, but no auto-correlation in the negatively correlated genes. The reverse was observed in the LOH analysis. Thus only genes displaying higher expression when amplified or lower expression when undergoing loss of heterozygosity were further considered. The significance of the correlation of the filtered list were corrected for multiple testing using a Sidak procedure resulting in 484 CN vs GE and 130 LOH vs GE genes (51 genes overlapping) with p-value below 0.05.

Identifying regions of differential CN and LOH between PPOLS an HNSCC
In order to find regions of copy number alterations typical of PPOLS or HNSCC samples, we examine the common genetic pattern of each group and searched for significant differences.
In the case of differential CN events we considered the contingency table displaying the number of samples by group (i.e. PPOL/HNSCC) and amplification status (i.e. amplification/neutral/deletion) and performing a Chi-squared test for each unique genomic region. The same was performed to identify differential LOHs by considering the LOH status instead of the amplification status. This yielded 208 significant CN regions and 192 significant LOH regions (p < 0.05).
Integrating differentially altered regions with significant CN vs GE genes to identify candidate driver genes The genes with expression identified as responding significantly to underlying alterations, both CNA and LOH, are suggested as potential driver genes of the regions with distinct patterns of alteration between PPOL and HNSCC.
The overlap of those two lists identified 23 CN regions with 61 potential driver genes and 6 LOH regions with 6 potential driver genes. Those are proposed as candidate differentiators of carcinoma versus dysplasia. Foundation Trust (CMFT) pathology database. All cases reviewed by a consultant pathologist to confirm the diagnosis. PPOLs were categorised into high grade (HG) including moderate and high dysplasia and low grade (LG) including mild dysplasia, using the information from the patient report and an additional review of the histology by a consultant pathologist. All samples were coded and anonymised.

SUPPLEMENTARY
Sample panel consisted of 42 HNSCC, 19 HG PPOLs biopsies and 24 LG PPOLs. Normal epithelium adjacent to HNSCC within 39 of the 42 tumour blocks was also analysed. Clinical parameters from all of the HNSCC cases were retrieved from each case report. These included the patient sex and age along with the tumour differentiation, stage, lymph node invasion, lympho-vascular invasion, and peri-neural invasion status.
Standard protocols for immunohistochemistry were followed with an optimised, rabbit polyclonal IgG antibody to Claudin 1, (Abcam, Cambridge, UK) at a dilution of 1:500. IHC staining was performed on a Ventana Benchmark XT Automated Immunostaining Module (Ventana Medical Systems, Tucson, AZ). All other reagents used were also supplied from Ventana Medical Systems. Optimisation of Claudin 1 also involved the used of Protease 1 (Ventana, 760-2018). Positive (normal human colon) and negative (minus the primary antibody) controls were processed with each batch.
Intensity and area of immunostaining of Claudin 1 was measured using a semi-automated image analysis method. This incorporated a standard light microscope attached to a CRI Nuance spectral analyzer (CRI, Woburn, MA), which was supported by Nuance Alpha, 1.6.2.368 software. This system used spectral unmixing of the IHC stained sections to identify and measure the areas of DAB positive staining. This method of IHC quantification has been as validated in recent publications (Das, et al. 2004, Taylor andLevenson 2006). Five random, conventional bright-field image cubes of the epithelium in each IHC stained section were collected at 10nm wavelength intervals from 480 nm (blue) to 700 nm (red) at a 40x magnification. In addition 38 of the 42 HNSCC tissue slides also had five random image cubes taken of its adjacent normal epithelium.
The DAB spectral profile of each image cube was unmixed from the haematoxylin spectral profile in each cube using Nuance 2.6.0 software (CRI, Woburn, MA), creating a composite false colour image cube. This enabled the intensity of the DAB staining per pixel of each positively stained site within the image cube to be measured as an average optical density (OD). The average of the ODs recorded in all five images cubes taken per slide was calculated to produce an overall OD score for that particular tissue sample.
Further description of this methodology can be found in Byers et al 2008.
Statistical analysis was performed using the statistical software packages Minitab 16 2010 (Minitab Inc, USA) and Medcalc version 12.3.0 (Medcalc Software, Belgium). For all of the statistical tests a p value of < 0.05 was considered to be statistically significant. Basic statistical analysis and Anderson-Darling tests were performed to test the data's normality in Minitab. The data sets were not normally distributed and so non-parametric statistical analysis was performed. The significance of differences in Claudin 1 immunostaining optical density between the different sample groups was tested for significance using the Kruskal Wallis test for non-parametric data and post-hoc analysis with Mann-Whitney U test. Spearman's rank correlation coefficient for non-parametric data was used to test for significance of the correlation between Claudin 1 immunostaining and progression towards HNSCC and clinical parameters.

Analysis of BCL2L1 Expression in Primary HNSCC and PPO Materials and methods
HNSCC and PPOLs diagnosed between 2012 and 2014 were retrieved from the Central Manchester Foundation Trust (CMFT) pathology database. All samples were coded and anonymised. Sample panel consisted of 36 HNSCC, 29 HG PPOLs biopsies, 27 LG PPOLs, 10 ILP and 10 FEP. Clinical parameters from all of the HNSCC cases were retrieved from each case report. These included the patient sex and age along with the tumour differentiation, stage, lymph node invasion, lympho-vascular invasion, and perineural invasion status.
All IHC staining was performed on the Ventana Benchmark XT automated IHC staining module with reagents also supplied by Ventana Medical Systems. Heat mediated antigen retrieval was performed for 36 minutes. Bcl-XL antibody was prepared at a 1/400 dilution and incubated at room temperature for 36 minutes.
Bcl-XL IHC Staining intensity was scored using the CRI Nuance Multi-spectral Imaging System. The DAB and haematoxylin spectral profiles were identified and unmixed in order to accurately measure the intensity of the DAB substrate reaction of bound primary antibody to Bcl-XL only. Five random, brightfield image cubes in each stained section were collected at 10nm wavelength intervals from 500 nm to 720 nm at a 40x magnification. For each cube area, the intensity of the DAB positive staining per pixel was measured as an average optical density (OD) value. The average OD recorded in all five image cubes taken per slide were calculated to produce an overall OD score for that particular tissue block.
Statistical analysis was performed using the statistical software package Minitab version 17.2.1 (Minitab Inc, USA). For all of the statistical tests a p value of < 0.05 was considered to be statistically significant, allowing the null hypothesis to be rejected and the alternative hypothesis to be accepted.