Head and neck squamous cell carcinomas (HNSCCs) affect 600,000 patients per year worldwide1. They are characterized by phenotypic, aetiological, biological and clinical heterogeneity. Smoking is implicated in the rise of HNSCC in developing countries, and human papillomavirus (HPV) is emerging as an important factor in the rise of oropharyngeal tumours affecting non-smokers in developed countries2. Despite surgery, radiation and chemotherapy, approximately half of all patients will die of the disease. Risk stratification for HNSCC is by anatomic site, stage and histological characteristics of the tumour. Except for HPV status, the numerous molecular and clinical risk factors that have been investigated have limited clinical utility.

Published genome-wide profiling studies of HNSCC3,4 are limited to single platforms. To generate an integrated genomic annotation of molecular alterations in HNSCC, The Cancer Genome Atlas (TCGA) has undertaken a comprehensive multi-platform characterization of 500 tumours with the a priori hypothesis of detecting somatic variants present in at least 5% of samples. Here, we report the results for analyses from the first 279 patients with complete data.

Samples and clinical data

The cohort consists primarily of tumours from the oral cavity (n = 172 out of 279, 62%), oropharynx (n = 33 out of 279, 12%), and laryngeal sites (n = 72 out of 279, 26%) (Supplementary Information section 1, Supplementary Table 1.1 and Supplementary Data 1.1). Most patients were male (n = 203 out of 279, 73%) and heavy smokers (mean pack years = 51). Samples were classified as HPV-positive using an empiric definition of >1,000 mapped RNA sequencing (RNA-Seq) reads, primarily aligning to viral genes E6 and E7 (Supplementary Information section 1.2 and Supplementary Fig. 1.1). The HPV status by mapping of RNA-Seq reads was concordant with the genomic, sequencing and molecular data, and indicated that 36 tumours were HPV(+) and 243 were HPV(−) (Supplementary Information section 1.2, Supplementary Fig. 1.1 and Supplementary Data 1.2). Of 33 oropharyngeal tumours, 64% were positive for HPV, compared to 6% of 246 non-oropharyngeal tumours. Molecular HPV signatures were identified using microRNA (miRNA), DNA methylation, gene expression and somatic nucleotide substitutions (Supplementary Information section 1.2 and Supplementary Figs 1.1–1.3). HPV(+) tumours exhibited infrequent mutations in TP53 or genetic alterations in CDKN2A. We evaluated outcome by site, stage, HPV status, molecular subtypes and putative biomarkers (Supplementary Information section 1.3 and Supplementary Figs 1.4 and 1.5). Patients with HPV(+) tumours and, interestingly, patients with HPV(−), TP53 wild-type tumours demonstrated favourable outcomes compared to patients with TP53-mutant and 11q13/CCND1-amplified tumours.
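
The empiric HPV definition above amounts to a threshold rule on mapped viral read counts. The following is a minimal sketch of that rule only, not the pipeline used in the study: the function names and the example read counts are hypothetical, and the sketch ignores the additional observation that reads aligned primarily to E6 and E7.

```python
from typing import Dict

# Empiric cut-off stated in the text: more than 1,000 mapped RNA-Seq reads.
HPV_READ_THRESHOLD = 1000


def is_hpv_positive(mapped_viral_reads: int,
                    threshold: int = HPV_READ_THRESHOLD) -> bool:
    """Call a sample HPV(+) when its viral read count strictly exceeds the threshold."""
    return mapped_viral_reads > threshold


def hpv_positive_fraction(read_counts: Dict[str, int]) -> float:
    """Fraction of samples called HPV(+) in a mapping of sample ID to read count."""
    if not read_counts:
        raise ValueError("read_counts must contain at least one sample")
    positives = sum(is_hpv_positive(n) for n in read_counts.values())
    return positives / len(read_counts)


# Hypothetical read counts, for illustration only (not data from the study).
example_counts = {
    "sample_A": 0,
    "sample_B": 1000,    # exactly at the cut-off, so not called positive
    "sample_C": 1001,
    "sample_D": 250000,
}

calls = {sample: is_hpv_positive(n) for sample, n in example_counts.items()}
print(calls)
print(hpv_positive_fraction(example_counts))
```

Because the definition is a strict inequality, a sample with exactly 1,000 mapped reads is treated as HPV(−) in this sketch.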

DNA and RNA structural alterations

Most tumours demonstrated copy number alterations (CNAs) including losses of 3p and 8p, and gains of 3q, 5p and 8q chromosomal regions (Fig. 1a, Supplementary Fig. 2.1 and Supplementary Information section 2) resembling lung squamous cell carcinomas (LUSCs)5 (Fig. 1a and Supplementary Figs 2.1 and 2.2). HNSCC genomes showed high instability with a mean of 141 CNAs (amplifications or deletions) from microarray data and 62 structural aberrations (chromosomal fusions) per tumour by ‘high coverage’ whole-genome sequencing (n = 29) (Supplementary Information section 2.2). We observed 39 regions of recurrent copy number loss and 23 regions of recurrent copy number gain (q < 0.1, Supplementary Data 2.1 and 2.2). Both HPV(+) and (−) tumours contained recurrent focal amplifications for 3q26/28, a region involving squamous lineage transcription factors TP63 and SOX2 and the oncogene PIK3CA (Fig. 1b and Supplementary Fig. 2.3).

Figure 1: DNA copy number alterations.
a, Copy number alterations by anatomic site and HPV status for squamous cancers. Lung squamous cell carcinoma (LUSC, n = 358) and cervical squamous cell carcinoma (CESC, n = 114). b, Unsupervised analysis of copy number alteration of HNSCC (n = 279) with associated characteristics. The rectangle indicates chromosome 7 amplifications in the purple cluster. NA, not available.

HPV(+) tumours were distinguished by novel recurrent deletions (n = 5 out of 36, 14%) and truncating mutations (n = 3 out of 36, 8%) of TNF receptor-associated factor 3 (TRAF3) (Supplementary Figs 2.3 and 2.4, and Supplementary Data 2.1). TRAF3 is implicated in innate and acquired anti-viral responses6 including Epstein–Barr, HPV and human immunodeficiency virus (HIV)7,8,9, while loss promotes aberrant NF-κB signalling10. Although TRAF3 inactivation has been reported in haematological malignancies and nasopharyngeal carcinoma11,12, to our knowledge this is the first evidence linking TRAF3 to HPV-associated carcinomas. HPV(+) tumours were also notable for focal amplification of E2F1 and an intact 9p21.3 region containing the CDKN2A gene commonly deleted in HPV(−) tumours.

HPV(−) tumours featured novel co-amplifications of 11q13 (CCND1, FADD and CTTN) and 11q22 (BIRC2 and YAP1), which also contain genes implicated in cell death/NF-κB and Hippo pathways. HPV(−) tumours featured novel focal deletions in the nuclear set domain gene (NSD1) and tumour suppressor genes (for example, FAT1, NOTCH1, SMAD4 and CDKN2A; Supplementary Fig. 2.3). Recurrent focal amplifications in receptor tyrosine kinases (for example, EGFR, ERBB2 and FGFR1) also predominated in HPV(−) tumours. Notably, unsupervised clustering analysis of CNAs identified a mutually exclusive subset of predominantly oral cavity tumours with reduced CNAs, a pattern recently described in cancer as ‘M’ class (tumours driven by mutation rather than CNA)13 (Fig. 1b). This subset in particular contained a new three-gene pattern of activating mutations in HRAS, frequently with inactivating CASP8 mutations, and wild-type TP53. We confirmed a previously reported favourable clinical outcome in tumours with few CNAs14. The three-gene constellation of wild-type TP53 with mutant HRAS and CASP8 suggested an alternative tumorigenesis pathway involving RAS and/or alterations in cell death/NF-κB15. Unsupervised analysis also suggested that clustering was a function of chromosome 7 amplification (including the EGFR locus) in a manner that largely excluded HPV(+) tumours.

To detect additional structural alterations, we interrogated whole-genome and RNA-Seq data (Supplementary Information section 3, Supplementary Data 3.1 and Supplementary Fig. 3.1). Known fusion oncogenes reported in solid tumours including those involving the ALK, ROS or RET genes were not observed in HNSCC. Previously reported FGFR3-TACC3 fusions were present in two HPV(+) tumours (Supplementary Fig. 3.2). Only 1 out of 279 patients showed evidence of the type III isoform of EGFR (vIII), previously described in HNSCC16 (Supplementary Fig. 3.3). Although our investigation did not identify additional novel oncogenic fusions, several tumours demonstrated exon 1 of EGFR or FGFR3 fused to non-recurrent partners, suggesting potential promoter swaps for the partner genes (Supplementary Data 3.1). A low prevalence of an alternative MET transcript with skipped exon 14 was identified in two HPV(−) tumours (Supplementary Fig. 3.4); this finding was reported to be an activating event in non-small cell lung cancer17. Structural alterations (homozygous deletions, intra- and inter-chromosomal fusions) were more commonly associated with loss of function in tumour suppressor genes, most prominently CDKN2A (Supplementary Figs 3.5 and 3.6), followed by TP53, RB1, NOTCH1 and FAT1 (Supplementary Figs 3.7–3.9), than with protein-coding fusion events. RNA-Seq data (Supplementary Data 3.3) demonstrated evidence of alternative splicing in genes not previously described in HNSCC including kallikrein 12 (KLK12) (Supplementary Fig. 3.11), as well as genes such as TP63 with known importance in HNSCC (Supplementary Fig. 3.12).

By DNA analysis, most HPV(+) tumours demonstrated clear evidence of host genome integration, usually in a single genomic location per sample and almost always in association with amplifications of the host genome (Supplementary Fig. 3.10 and Supplementary Data 3.2). Interrogation of RNA transcripts confirmed transcription across the viral–human integration locus. However, none of the genes involved were recurrent, suggesting no single driver mechanism related to HPV integration. Similarly, none of the integration sites involved the MYC gene as reported in HPV(+) cell lines18.

Somatic mutations

Whole-exome sequencing identified somatically mutated genes, many located in regions of CNAs and annotated in the COSMIC database19 (Fig. 2). The mean sequencing coverage across targeted bases was 95×, with 82% of target bases above 30× coverage. In 279 samples, 12,159 synonymous somatic variants, 37,061 non-synonymous somatic variants, and 2,579 germline single base substitutions from the single nucleotide polymorphism database (dbSNP)20 were detected (Supplementary Information section 4). Targeted re-sequencing of 394 unique regions (Supplementary Fig. 4.1) validated 99% of mutations. Interrogation of RNA for expression of the mutated alleles confirmed the variant in 86% of cases (Supplementary Information section 3.2 and Supplementary Fig. 3.1). In contrast to previous reports, the mutation rates did not differ by HPV status, although transversions at CpG sites were more frequent in HPV(−) tumours and a predominance of TpC mutations were noted in HPV(+) cases3 (Supplementary Fig. 1.1). Mutations were statistically enriched in 11 genes (Fig. 2). Among inactivating mutations (premature termination of the protein by nonsense, frameshift or splice-site mutations), four genes segregated exclusively or predominantly in HPV(−) tumours. Two were associated with cell cycle and survival (CDKN2A (P < 0.01) and TP53 (P < 0.01)) and two were linked to Wnt/β-catenin signalling (FAT1 (P < 0.01) and AJUBA (P = 0.14))21,22. We observed TP53 mutation among HPV(−) samples at higher rates (86%) than have been previously reported19, while only 1 out of 36 HPV(+) cases had a non-synonymous TP53 mutation. Previously unreported somatic mutations and deletions of AJUBA were primarily 5′ inactivating events and clustered missense mutations in the functional LIM domain (Supplementary Fig. 4.2). AJUBA is a centrosomal protein that regulates cell division, vertebrate ciliogenesis and left–right axis determination23. 
Additionally, AJUBA is subject to EGFR-RAS-MAPK-dependent phosphorylation and implicated in Hippo growth and regeneration pathways conserved from Drosophila to mammals24,25, in ataxia-telangiectasia mutated (ATM) and ATM and Rad-3-related (ATR)-mediated DNA damage response26, and tumour invasion and migration27.

Figure 2: Significantly mutated genes in HNSCC.
Significantly mutated genes (rows; identified using the MutSigCV algorithm; q < 0.1) are ordered by q value; additional genes with trends towards significance are also shown. Samples (columns, n = 279) are arranged to emphasize mutual exclusivity among mutations. Left, mutation percentage in TCGA. Right, mutation percentage in COSMIC (‘upper aerodigestive tract’ tissue). Top, overall number of mutations per megabase. Colour coding indicates mutation type.

A frequently mutated novel gene, the nuclear receptor binding SET domain protein 1 (NSD1), was identified in 33 HNSCCs. Alterations included inactivating mutations (n = 29) and focal homozygous deletions (n = 4). NSD1 is a histone 3 Lys 36 (H3K36) methyltransferase, similar to SETD2, which is frequently mutated in the clear cell variant of renal cell carcinoma, and associated with DNA hypomethylation28. Germline carriers of inactivating mutations in NSD1 are associated with craniofacial abnormalities (Sotos syndrome), and malignancies including squamous carcinoma, implicating NSD1 as a tumour suppressor gene29. Interestingly, NSD1 functions as an oncogene when fused to nucleoporin-98 (NUP98) t(5;11)(q35;p15.5) in haematological cancers with increased H3K36 trimethylation levels at HOXA genes and accompanying transcriptional activation30. Translocations involving other dedicated H3K36 methyltransferase genes including WHSC1 (also known as MMSET and NSD2) are reported in 20% of multiple myelomas. By contrast, NSD1 loss has been associated with sporadic non-melanoma skin cancers31. Significant inactivating mutations were found in genes linked to squamous differentiation including in NOTCH1 (19%), and other non-significant family members (NOTCH2 9%, and NOTCH3 5%, q > 0.1, non-significant), and the TP63 target gene ZNF750 (4%, q > 0.1, non-significant), which falls in a significantly deleted peak at 17q25.3. The analysis identified additional mutations including TRAF3, RB1 and NFE2L2, among others with q values < 1 (non-significant). The frequently mutated apoptosis gene CASP8 displayed clustered missense and other inactivating mutations in the first death effector, intron and caspase peptidase domains. Statistically significant mutations in KMT2D (also known as MLL2) and HLA-A could contribute to defective immunosurveillance. Of known oncogenes, only PIK3CA achieved statistical significance (q < 0.01). 
Approximately one-quarter of the mutated PIK3CA cases displayed concurrent amplification, with an additional 20% of tumours containing focal amplification without evidence of mutation. Seventy-three per cent of PIK3CA mutations localized to Glu542Lys, Glu545Lys and His1047Arg/Leu hotspots that promote activation, with the remaining mutations of uncertain function. Recurrent activating mutations of HRAS in the GTPase domain in residues 11–13 approached statistical significance (q = 0.2).
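
Significance throughout this section is reported as q values with a cut-off of q < 0.1. As a hedged illustration of how such a cut-off behaves, the sketch below assumes that q denotes a Benjamini-Hochberg-adjusted P value, which the text does not state explicitly. The gene labels and P values are hypothetical placeholders and are not results from the study.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted values in the same order as the input."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significant_genes(p_by_gene: Dict[str, float], q_cutoff: float = 0.1) -> List[str]:
    """Names of genes whose adjusted value falls below the cut-off."""
    genes = list(p_by_gene)
    q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, q_values) if q < q_cutoff]


# Hypothetical P values for placeholder genes (assumption, not study data).
example_p = {"GENE_A": 0.0001, "GENE_B": 0.02, "GENE_C": 0.2, "GENE_D": 0.9}
print(significant_genes(example_p))
```

Under this assumption, a gene reported with q = 0.2 would fall outside a q < 0.1 cut-off even when its unadjusted P value is small, which is consistent with a finding being described as approaching significance.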

We extended our unsupervised genome-wide analysis of significantly mutated genes as well as genes reported in COSMIC to a subgroup analysis by anatomic sites, tumour versus normal status, HPV status and four previously validated gene expression subtypes32,33 (Supplementary Information section 5, Supplementary Figs 5.1–5.4 and Supplementary Data 4.1 and 5.1–5.4). Additional mutations included TRAF3, RB1 and NFE2L2, among others with q values < 1, and we observed statistical evidence for mutations of HRAS (q = 0 in COSMIC subset) and other genes. Sporadic inactivating mutations and deletions of TGFBR2 were identified primarily in oral cavity tumours, consistent with its role in promoting squamous tumorigenesis in mouse models34. Investigating COSMIC database mutations focused attention on the significant deletion peak at 4q31.3 containing the gene FBXW7, a ubiquitin ligase targeting cyclin E and NOTCH genes, in which we identified mutations that included recurrent Arg505Gly/Leu substitutions (n = 14). Genes with at least one identical mutation previously reported in COSMIC include SCN9A, CHEK2, PTCH1 and PIK3R1. We further focused on somatic alterations and protein expression that represent plausible therapeutic targets (Fig. 3, Supplementary Information sections 6 and 7, Supplementary Figs 6.1 and 6.2 and Supplementary Data 6.1 and 6.2).

Figure 3: Candidate therapeutic targets and driver oncogenic events.
Alteration events for key genes are displayed by sample (n = 279). TSG, tumour suppressor gene.

Integrated genome analysis and pathways

Correlative genetic alteration analysis identified numerous pairwise significant findings (Supplementary Information section 7 and Supplementary Fig. 7.1). In particular, co-amplification of 11q13 containing CCND1, FADD and CTTN and a narrow segment of 11q22 containing the genes with equal evidence for YAP1 and BIRC2 was further characterized (Supplementary Fig. 7.2). Chromosome 11q22 was focally but rarely amplified in the absence of co-amplification of 11q13. This novel finding suggests that the selection pressure for this co-amplification stems from the interaction of BIRC2 with FADD and the caspase cascade that inhibits cell death. Notably, the vast majority of tumours with the 11q13 amplification had large deletions in the telomeric region of 11q22, including other genes known to be important in cell death in cancer such as ATM and CASP1, 4, 5 and 12. Amplification of 11q13 was anti-correlated with CASP8 mutations, suggesting an alternative function of CASP8 and FADD in cell death/NF-κB activation35.

We investigated whether clinical factors, single gene alterations and statistically significant pairwise gene correlations (Supplementary Fig. 7.1 and Supplementary Data 7.1) might segregate previously defined molecular subtypes and/or anatomic sub-sites (Supplementary Data 5.1–5.4). We confirmed reported gene expression subtypes (atypical (24%), mesenchymal (27%), basal (31%) and classical (18%)), and assessed the subtypes for enrichment of somatic alterations32,33 (Supplementary Data 7.1). Notably, TP53 mutation, CDKN2A loss of function, chromosome 3q amplification, alteration of oxidative stress genes (KEAP1, NFE2L2 or CUL3), heavy smoking history (Supplementary Table 1.1) and larynx sub-site co-occurred in most classical subtype tumours (Fig. 4a and Supplementary Information section 7.2), similar to LUSC5 (Supplementary Figs 5.1 and 5.2). Collectively, these findings suggest that the NFE2L2 oxidative stress pathway is a tobacco-related signature across anatomic tumour sites. By contrast, the basal subtype demonstrated inactivation of NOTCH1 with intact oxidative stress signalling and fewer alterations of chromosome 3q. Analysis of the 3q locus highlighted a marked decrease of SOX2 expression in basal tumours relative to all other HNSCC and tumour adjacent normal samples (Supplementary Fig. 5.4), supporting the interaction of transcription factors SOX2, TP63, NFE2L2 and NOTCH1 as driving differences between expression subtypes. Additionally, the basal subtype included most tumours with the HRAS–CASP8 co-mutation and most co-amplified 11q13/q22 tumours. These findings along with HRAS mutations implicate disrupted cell death as a major alteration in this subtype36 (Supplementary Fig. 7.2). The atypical subtype was characterized by a lack of chromosome 7 amplifications (Supplementary Fig. 5.3) and by enrichment of HPV(+) tumours with activating mutations in exon 9 of PIK3CA, which contains the helical domain.
By contrast, the mesenchymal subtype showed high levels of alteration in innate immunity genes, in particular high expression of natural killer cell marker CD56 and a low frequency of HLA class I mutations (Supplementary Fig. 7.3). Among the significantly mutated genes, TP53 (P < 0.001), CASP8 (P = 0.01), NSD1 (P = 0.01) and CDKN2A (P = 0.06) were the most differentially mutated across anatomic sites (Supplementary Data 4.1). Most CASP8 mutations (22 out of 24, 92%) were in oral cavity tumours, whereas TP53, NSD1 and CDKN2A demonstrated decreased mutation rates in oropharyngeal tumours relative to other sites.
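
Associations of this kind between an alteration and a sample grouping can be assessed with a two-sided Fisher's exact test on a 2 × 2 table, the test named in the Fig. 4 legend. The sketch below is a minimal standard-library implementation applied to the CASP8 counts quoted above (22 of 24 mutated tumours in the oral cavity, 172 oral cavity tumours among 279). Collapsing the sites into oral cavity versus all others is an assumption made here for illustration, so the result is not expected to reproduce the P values reported across anatomic sites.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Rows: CASP8 mutated / CASP8 wild-type; columns: oral cavity / other sites.
casp8_mut_oral, casp8_mut_other = 22, 24 - 22
casp8_wt_oral = 172 - casp8_mut_oral            # remaining oral cavity tumours
casp8_wt_other = (279 - 172) - casp8_mut_other  # remaining non-oral tumours

p = fisher_exact_two_sided(casp8_mut_oral, casp8_mut_other,
                           casp8_wt_oral, casp8_wt_other)
print(f"Two-sided Fisher's exact P = {p:.3g}")
```

The two-sided P value here sums all tables that are no more probable than the observed one, which is one common convention; other definitions of the two-sided test can give slightly different values.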

Figure 4: Integrated analysis of genomic alterations.
a, b, Samples (n = 279) are displayed in columns and grouped by gene expression (a) or methylation (b) subtype (sub.). Unadjusted two-sided Fisher’s exact test P values assess the association of each genomic alteration. Methylation probe location of CpG islands, shores and shelves are shown on the left of b. Annotation shows HPV status and subtype (16, 33 and 35). CN, copy number.

Unsupervised analysis of gene expression by HPV status and of reverse-phase protein arrays (Supplementary Information section 6), DNA methylation (Supplementary Information section 8), and miRNA platforms (Supplementary Information section 9, Supplementary Table 9.1, Supplementary Figs 7.4–7.9 and Supplementary Data 7.1) showed high correlation across platforms (P < 0.01; Fig. 4, Supplementary Information section 7.9 and Supplementary Data 7.2) and coordinated alterations of genes including the epithelial–mesenchymal transition signature37 (Supplementary Figs 7.4, 7.7 and 7.8). However, within the broader cross-platform agreement, individual unsupervised clustering of miRNA, reverse-phase protein arrays and DNA methylation data provided insight into the association of molecular subtypes with single gene alterations. The most notable example was the detection of hypomethylation and loss-of-function mutations of NSD1, and wild-type NOTCH1 in atypical and classical gene expression subtypes (Fig. 4b and Supplementary Table 7.1).

Supervised analyses detected genomic features (miRNA, gene expression and DNA methylation) associated with anatomic site (Supplementary Figs 5.5–5.8 and Supplementary Data 5.1–5.4). A supervised integrated analysis identified target genes that are inversely regulated by miRNAs in HNSCC (Supplementary Information section 7.4). Among these miRNA–messenger RNA networks, let-7c-5p and miR-100-5p exhibited a correlation between low copy number and expression. Let-7c-5p and miR-100-5p were decreased in tumours compared to normal tissue (Supplementary Fig. 7.10). For these miRNAs, deletion was highly associated with increased expression of target genes, including the cell cycle regulator CDK6, transcription factor E2F1 (ref. 38), mitosis regulator PLK1 (ref. 39), and transcription factor HMGA2 (ref. 40; Supplementary Fig. 7.10 and Supplementary Tables 7.2 and 7.3).

Integrative bioinformatics analysis identified a limited number of pathways targeted by frequent genome alterations (Fig. 5, Supplementary Information section 7, Supplementary Figs 7.11–7.15 and Supplementary Data 7.3). Among receptor tyrosine kinases, EGFR/ERBB2 or FGFR1/3 alterations are the most frequent. Among downstream targets of the receptor tyrosine kinase (RTK)/RAS/phosphatidylinositol-3-OH kinase (PI(3)K) pathway, PIK3CA dominates with occasional HRAS and PTEN alterations. Further downstream, nearly every tumour has alteration of genes governing the cell cycle. The tumour suppressors TP53 and CDKN2A, oncogenes CCND1 and MYC, and the newly identified miRNA let-7c, are most often altered in HPV(−) tumours, whereas viral genes E6, E7 and E2F1 predominate in HPV(+) cases. In addition, we report frequent alterations in genes involved in cell death, NF-κB-mediated survival, or immunity pathways15,35. Co-amplification of FADD ± BIRC2, or CASP8 ± HRAS mutations define exclusive HPV(−) subsets, whereas TRAF3 loss characterizes an HPV(+) subset. These alterations along with PIK3CA and TP63 converge on NF-κB transcription factors that promote cell survival, migration, inflammation and angiogenesis41,42. Furthermore, TRAF3 and/or HLA loss are implicated in deregulation of innate antiviral and adaptive anti-tumour immunity43,44. Further alterations of NOTCH, TP63 and other genes in HPV(−) tumours (FAT1 and AJUBA) recently linked functionally to β-catenin (CTNNB1) are also detected21,22,45. Finally, we highlight a previously underappreciated role for a key transcription factor regulator of oxidative stress, NFE2L2, and its protein complex partners CUL3 and KEAP1 in HPV(−) HNSCCs.

Figure 5: Deregulation of signalling pathways and transcription factors.
Key affected pathways, components and inferred functions are summarized in the main text and Supplementary Information section 7 for n = 279 samples. The frequencies (%) of genetic alterations for HPV(−) and HPV(+) tumours are shown separately within sub-panels and highlighted. Also see Supplementary Fig. 7.15. Pathway alterations include homozygous deletions, focal amplifications and somatic mutations. Activated and inactivated pathways/genes, and activating or inhibitory symbols, are based on predicted effects of genome alterations and/or pathway functions.


The TCGA study represents the most comprehensive integrative genomic analysis of HNSCC. Loss of TRAF3, activating mutations of PIK3CA, and amplification of E2F1 in HPV(+) oropharyngeal cancers point to aberrant activation of NF-κB, other oncogenic pathways, and cell cycle, as critical in the pathogenesis and development of new targeted therapies for these tumours. In HPV(−) HNSCCs, mutually exclusive subsets containing amplicons on 11q with CCND1, FADD, BIRC2 and YAP1, or concurrent mutations of CASP8 with HRAS, also target cell cycle, death, NF-κB and other oncogenic pathways. Recent studies predict that the inactivation of AJUBA, as well as FAT1 and NOTCH1, may converge to uncheck Wnt/β-catenin signalling, implicated in deregulation of cell polarity and differentiation. The 3q amplicon found in both HPV(+) and (−) HNSCCs includes transcription factors TP63, SOX2 and signal molecule PIK3CA, which are also implicated in homeostasis of epithelial stem cells and differentiation. Among these, the biological function and agents targeting BIRCs, PI(3)K, Wnt/β-catenin and NOTCH are under investigation. Collectively, these findings provide new insights into HNSCC and suggest that shared and unique alterations might be leveraged to accelerate progress in prevention and therapy across tumour types.