Head and neck squamous cell carcinoma (HNSCC) includes a subset of cancers driven by human papillomavirus (HPV). Here we use single-cell RNA-seq to profile both HPV-positive and HPV-negative oropharyngeal tumors, uncovering a high level of cellular diversity within and between tumors. First, we detect diverse chromosomal aberrations within individual tumors, suggesting genomic instability and enabling the identification of malignant cells even at pathologically negative margins. Second, we uncover diversity with respect to HNSCC subtypes and other cellular states such as the cell cycle, senescence and epithelial-mesenchymal transitions. Third, we find heterogeneity in viral gene expression within HPV-positive tumors. HPV expression is lost or repressed in a subset of cells, which are associated with a decrease in HPV-associated cell cycle phenotypes, decreased response to treatment, increased invasion and poor prognosis. These findings suggest that HPV expression diversity must be considered during diagnosis and treatment of HPV-positive tumors, with important prognostic ramifications.
All scRNA-seq data produced by this study are available through the Gene Expression Omnibus with GEO accession GSE182227. TCGA bulk RNA-seq and clinical data for head and neck and cervical cancer are available through the Broad Genome Data Analysis Center Firehose (https://gdac.broadinstitute.org/). Single-cell datasets reanalyzed to compare proliferation rates are available through the Gene Expression Omnibus with accession numbers GSE150430 (nasopharyngeal carcinoma), GSE131907 (lung carcinoma), GSE132465, GSE132257, GSE144735 (CRC), GSE125449 (HCC), through the Chinese National Centre for Bioinformation Genome Sequence Archive (CNCB-GSA) with accession: CRA001160 (PDAC) and through EMBL-EBI ArrayExpress with accession numbers E-MTAB-8107 (breast, ovarian, colorectal cancer), E-MTAB-6149 (lung) and E-MTAB-6653 (lung). Cell line data used for validation analysis are available through GEO with accession number GSE157220. The NSCLC dataset used to validate finding malignant cells in normal samples is deposited as an NCBI BioProject with accession number PRJNA591860. Source data are provided with this paper.
R code and functions for the analysis are available at https://github.com/micmin3/HPV_OPSCC_Analysis.
This work was supported by the V Foundation (S.V.P.), Cancer Research Foundation (S.V.P.), Emerson Collective Cancer Research Fund (S.V.P.), Barnes Jewish Hospital Foundation (S.V.P.), NCI 1K08CA237732 (S.V.P.), Doris Duke Fund to Retain Clinician Scientists (S.V.P.), Doris Duke Clinician Scientist Development Award (S.V.P.), NIDCD T32DC000022 (T.F.B.), the Swedish Society of Medicine (M.M.), Washington University Department of Medicine Faculty Diversity Award (J.S.F.), Israel Science Foundation (I.T.), Rising Tide Foundation (S.V.P. and I.T.), Mexican Friends New Generation Grant (I.T.), Mauricio Schwarz (I.T.), Zuckerman STEM Leadership Program (I.T.) and The Dr Celia Zwillenberg-Fridman and Dr Lutz Zwillenberg Career Development Chair (I.T.). The funding sources had no involvement in the design, conduct and reporting of the research.
I.T. is a member of the Scientific Advisory Board (SAB) of Immunitas Therapeutics. All other authors report no competing interests.
Peer review information
Nature Genetics thanks David Sidransky, Lisa Mirabello, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
(a) UMAP of all cells (n = 70,970) colored by expression of selected marker genes. (b) UMAP of immune cells (n = 22,818) colored by expression of selected marker genes. (c) Histologic sections of two representative HPV+ (p16+) and HPV- (p16-) oropharynx tumors (OP34 and OP12), stained by H&E (top) and p16 (bottom). Staining was repeated three independent times with similar results. Scale bar = 100 μm. (d) UMAP of all cells colored by detection of at least one read from HPV16 genes. (e) Dot plot showing variability in expression of HPV genes (rows) across patients (columns). The last column summarizes all HPV-negative tumors. The top row shows the sum of HPV gene expression per patient (HPVtotal). The size of each dot represents the fraction of cells with at least one read for that gene in each patient, while the color represents the fraction of HPV reads in one patient that reflect the corresponding gene. For the latter metric, HPVtotal is set to 1.
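The two dot-plot metrics described in panel (e) can be illustrated with a minimal sketch. This is not the authors' R code; the function name `hpv_dotplot_metrics` and the input layout (one dictionary of read counts per cell) are assumptions made purely for illustration.

```python
from typing import Dict, List


def hpv_dotplot_metrics(cells: List[Dict[str, int]], genes: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-patient dot-plot metrics for HPV genes.

    cells: one dict of HPV read counts per cell (hypothetical input layout).
    genes: HPV gene names to summarize.
    Returns, per gene, the fraction of cells with at least one read (dot size)
    and the fraction of all HPV reads assigned to that gene (dot color).
    """
    n_cells = len(cells)
    total_reads = sum(cell.get(g, 0) for cell in cells for g in genes)
    metrics: Dict[str, Dict[str, float]] = {}
    for g in genes:
        detected = sum(1 for cell in cells if cell.get(g, 0) > 0)
        gene_reads = sum(cell.get(g, 0) for cell in cells)
        metrics[g] = {
            "frac_cells": detected / n_cells if n_cells else 0.0,
            "frac_reads": gene_reads / total_reads if total_reads else 0.0,
        }
    # HPVtotal row: cells with at least one read from any HPV gene;
    # its read fraction is fixed at 1, as stated in the legend.
    any_detected = sum(1 for cell in cells if any(cell.get(g, 0) > 0 for g in genes))
    metrics["HPVtotal"] = {
        "frac_cells": any_detected / n_cells if n_cells else 0.0,
        "frac_reads": 1.0,
    }
    return metrics
```

Calling the function once per patient (or, for Extended Data panels that compare subclones, once per subclone) would give the values needed to size and color each dot.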
(a) Average CNA profiles of malignant cells, normal epithelial cells and fibroblasts/endothelial cells used as reference for each patient. Each row is a cell subset within a patient. Rows are ordered by cell subset and patient ID. Columns are chromosomal positions. For each row and chromosome, the chromosome was split into five bins. (b) UMAP of all cells colored by HPV-positive tumor score. (c) CNA signal and correlation scatter plot of OP17. Cells are colored by their expression of the HPV-positive tumor score. (d) Violin plots showing expression of the OP9 mesenchymal signature (left panel) and the TCGA HNSCC mesenchymal signature (right panel) in four subsets of cells; 300 cells were randomly sampled from each subset to ensure equal-sized groups. (e) Dot plot showing variability in HPV gene expression between subclones in one patient, OP4. The size of each dot represents the fraction of cells with at least one read for that gene in each subclone, while the color represents the fraction of HPV reads in one subclone that reflect the corresponding gene. For the latter metric, HPVtotal is set to 1.
(a) Histologic section of the lateral margin from OP34, stained by H&E. A piece of mucosa was taken beyond this histologically clear (pathologically negative) margin for scRNA-seq (labeled ‘margin’). Staining was repeated three independent times with similar results. Scale bar = 1000 μm. (b) CNA signal and correlation scatter plot of OP34. Cells are colored by their expression of the HPV-positive tumor score. Epithelial cells from the margin sample are circled. (c) CNA plot of OP34. Cells were randomly sampled from all subclones in equal numbers to ensure equal-sized groups. Column at the right shows the origin of cells from the tumor core and margin samples. (d) Heatmap of differentially expressed genes in the three epithelial cell subsets of lung adenocarcinoma sample TH179 – normal epithelial cells, invasive malignant cells and malignant cells from the tumor core. Rows are genes, columns are cells. Cells were randomly sampled from the normal and core subsets to ensure equal-sized groups. (e) CNA plot of lung adenocarcinoma sample TH179. Column at the right shows the origin of cells from the tumor core and margin samples. (f) HPV expression in normal epithelial cells. Violin plots showing values for CNA signal and CNA correlation for the 51 HPV-positive and 779 HPV-negative nonmalignant epithelial cells from HPV-positive patients, as well as for 830 randomly sampled cancer cells from the same patients, with one cancer cell sampled from the matching patient for each nonmalignant epithelial cell. (g) Volcano plot of differentially expressed genes between nonmalignant epithelial cells (defined by lack of CNAs) with or without HPV expression. P-value derived from two-sided t-test adjusted for multiple comparisons.
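The patient-matched sampling described in panel (f) can be sketched as follows. The function name, the input layout and the choice to sample with replacement are all assumptions; the legend does not state whether sampling was with or without replacement.

```python
import random
from typing import Dict, List


def sample_matched_cancer_cells(
    nonmalignant_patients: List[str],
    cancer_cells_by_patient: Dict[str, List[str]],
    seed: int = 0,
) -> List[str]:
    """Draw one cancer cell per nonmalignant epithelial cell, from the same patient.

    nonmalignant_patients: patient ID of each nonmalignant epithelial cell.
    cancer_cells_by_patient: cancer cell IDs grouped by patient ID.
    Sampling is with replacement (an assumption, see lead-in).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [rng.choice(cancer_cells_by_patient[p]) for p in nonmalignant_patients]
```

The result has the same length and the same per-patient composition as the nonmalignant set, so differences in CNA signal between the two groups are not driven by which patients contributed the cells.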
(a) Heatmap showing relative expression of differentially expressed genes (rows) across all tumor samples (columns). Selected genes include the top 50 preferentially expressed genes from each tumor. (b) Hierarchical clustering of ‘pseudobulk’ tumor profiles (defined by averaging all malignant cells per sample). Shown are Pearson correlations, ordered by the clustering of samples. Bottom panels show additional tumor characteristics with the same tumor ordering as in the heatmap, including (from top to bottom): the percentage of cells with detected HPV reads, the clinical HPV status (defined by p16 staining), three TCGA subtype scores, and scores for all meta-programs defined in Fig. 3c, d. (c) UMAP of all malignant cells, colored by mRNA expression of CDKN2A (encoding for p16). OP19 is circled.
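The 'pseudobulk' profiles and Pearson correlations in panel (b) can be illustrated with a short sketch. It assumes each cell is a list of expression values over a shared gene order; the function names are hypothetical, and the hierarchical clustering used to order the samples is not reproduced here.

```python
from math import sqrt
from typing import Dict, List


def pseudobulk(cells_by_sample: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Average the expression vectors of all malignant cells in each sample."""
    profiles = {}
    for sample, cells in cells_by_sample.items():
        n_cells = len(cells)
        n_genes = len(cells[0])
        profiles[sample] = [sum(cell[j] for cell in cells) / n_cells for j in range(n_genes)]
    return profiles


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(profiles: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """All pairwise Pearson correlations between pseudobulk profiles."""
    return {a: {b: pearson(pa, pb) for b, pb in profiles.items()} for a, pa in profiles.items()}
```

The resulting matrix is what a clustering routine would take as input to produce the sample ordering shown in the heatmap.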
Extended Data Fig. 5 Heterogeneity among common cell types in the OPSCC microenvironment, related to Fig. 3.
For each of the common cell types in the OPSCC microenvironment (endothelial cells, fibroblasts, macrophages, T cells, B cells, and myofibroblasts), the corresponding panel shows meta-programs, as defined using the same approach as performed for malignant cells and shown in Fig. 3d. Shown are the relative expression levels of meta-program genes (rows) in all cells of the corresponding cell types (columns). Top panels indicate the patient of origin for all cells.
(a) Percentage of cells positive for E6 or E7 in RNA ISH analyses (n = 4 tumors; shown are mean and standard error across nine regions per tumor). Percentage of HPVon cells by scRNA-seq (bottom) correlates with RNA ISH values (P < 0.01, ANOVA). (b) IHC of representative HPV-positive (OP5, OP6, OP33, and OP35) and HPV-negative (OP19) tumors and normal tonsil stained for malignant-cell specific marker p16 (pink) and viral E6 protein (brown). Similar results were obtained in three independent experiments. White arrowheads denote p16 positivity without E6 expression. Scale bars: Low magnification = 10 mm (tonsil, OP5, OP6), 5 mm (OP19), 7.5 mm (OP35); intermediate magnification = 1000 μm; highest magnification = 250 μm. (c) Enriched MSigDB Hallmark gene-sets among genes significantly overexpressed in HPVon versus HPVoff cells. X-axis: fraction of significantly upregulated genes in the gene set. (d) Differential expression of all analyzed genes between HPV-related classes of malignant cells. X-axis: difference between HPVon and HPVneg cells; Y-axis: difference between HPVon and HPVoff cells, averaged across all HPV-positive patients. Genes are colored by their assignment to meta-program (right legend). CDKN2A (p16, highlighted in red) was not significantly different between HPVon versus HPVoff cells, but was the most overexpressed gene in HPVon cells compared to HPVneg cells. (e) For three meta-programs (panels), cells were divided into 10 bins of equal size, ranked by average expression from low (left) to high (right). Y-axis: mean ratio of cells belonging to an HPV subset versus the expected number assuming random distribution across bins. Error bars reflect SEM based on 100 re-sampling runs (n = 5 patients for HPVneg, n = 11 patients for HPVon and HPVoff). P-values are based on chi-square test. (f) Fractions of cycling cells, EpiSen-high cells and HPVon cells across genetic subclones. Subclones with a high fraction of HPVon cells tend to also have higher proliferation (P < 0.05 for correlations in OP13, OP33 and OP35). (g) G1/S (X-axis) and G2/M (Y-axis) scores of all malignant cells, colored by the percentage of cycling cells among their neighbors (20 closest cells in this plot).
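The binned enrichment in panel (e) can be sketched as an observed-versus-expected ratio per bin. The function name and inputs are hypothetical, and the 100 re-sampling runs used for the error bars and the chi-square test are not reproduced here.

```python
from typing import List


def bin_enrichment(scores: List[float], in_subset: List[bool], n_bins: int = 10) -> List[float]:
    """Observed/expected ratio of subset cells in equal-sized bins ranked by score.

    scores: meta-program score per cell.
    in_subset: whether each cell belongs to the HPV subset of interest.
    A ratio of 1 means the subset is represented as expected under a random
    distribution across bins; values above 1 indicate enrichment.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # low to high
    n = len(order)
    overall_fraction = sum(in_subset) / n
    ratios = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        observed = sum(in_subset[i] for i in members)
        expected = overall_fraction * len(members)
        ratios.append(observed / expected if expected else float("nan"))
    return ratios


# Toy example: the 10 lowest-scoring of 20 cells belong to the subset.
example = bin_enrichment(list(range(20)), [True] * 10 + [False] * 10, n_bins=4)
# example == [2.0, 2.0, 0.0, 0.0]
```

In the toy example the subset is twofold enriched in the two lowest bins and absent from the two highest.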
(a) HPV expression and G1/S gene expression across cervical squamous cell carcinoma TCGA samples. Shown are residuals after regression (Supplementary Table 3). (b) Variability in HPV expression between cell lines. Dot size and color represent fraction of cells with at least one read and fraction of HPV reads that reflect the corresponding gene, respectively. (c) Cells were divided into 5 bins by average G1/S expression from low (left) to high (right). Y-axis: mean ratio of cells in an HPV subset versus expected number assuming random distribution. Error bars are SEM by 100 resampling runs. P-value based on chi-square test. (d) Immunocytochemistry of 93VU147T cells probed with Ki67 (red), p16 (green), and DAPI (blue). Scale bar = 100 μm. (e) Percentage of Ki67 positive cells among p16 positive and negative cells. 50 cells were counted across four fields (n = 4). (f) Relative expression of E6 and E7 in non-target control (NT) compared to E6 or E7 CRISPRi knockdown (KD) 93VU147T (left) or SCC47 (right) lines (n = 3; P < 0.0001, t-test). (g) Relative expression of p16 in same lines as in (f). Data are presented as mean ± SEM. There was no change in p16 upon E6 or E7 knockdown (n = 3). (h) Relative expression of E6 and E7 among HPVon and HPVoff single clones derived from 93VU147T (left) and SCC47 (right) after three weeks of culture and numerous passages. HPVon and HPVoff clones maintained relatively high and low expression states (n = 3; P < 0.005, t-test). (i) Relative expression of p16 in same clones as in (h). (j) Proportion of cycling cells in HPVon and HPVoff single clones in 93VU147T (left) and SCC47 (right) by flow cytometry (n = 3; P < 0.05, t-test). (k) Relative proliferation of HPVon single clones from 93VU147T (left) and SCC47 (right) cultured under normal growth conditions (+FBS) or serum starvation (-FBS) for 48 hours. Proliferation was reduced with serum starvation (n = 5; P < 0.001, t-test). (l) Relative expression of E6 and E7 in HPVon single clones in 93VU147T (left) and SCC47 (right) under normal growth conditions (+FBS) or serum starvation (-FBS) for 48 hours (n = 3).
(a) HPV copies per genome of E6 and E7 (normalized to albumin) for HPVon and HPVoff single clones from 93VU147T (left) and SCC47 (right). (b) DNA ISH (DNAScope) of representative HPV-positive (OP14, OP20, OP33, and OP35) and HPV-negative (OP16) tumors for viral E6 (left) and E7 (right) DNA (red) with immunofluorescence co-staining for regions of tumor as marked by p16 protein (green) and nuclei by DAPI (blue). HPV-positive tumors display p16 positive malignant cells with homogenous E6 and E7 DNA signal. HPV-negative tumors do not have signal for p16 protein or E6 or E7 DNA. Scale bar = 1000 μm. (c) Percentage of cells positive for E6 or E7 DNA among p16 positive malignant cells in DNA ISH analyses (n = 4 tumors, five areas per tumor). Nearly all p16 positive malignant cells demonstrated E6 or E7 DNA signal. (d) Relative expression of E6 and E7 in 93VU147T cells treated with vehicle or tazemetostat (n = 3). None of the doses significantly affected cell viability. (e) Relative expression of E6 and E7 in SCC47 cells treated with vehicle or escalating concentrations of decitabine (n = 3). None of the doses significantly affected cell viability. (f) Relative expression of E6 and E7 in HPVon and HPVoff single clones from 93VU147T (left) and SCC47 (right) treated with tazemetostat, decitabine, or vehicle. HPVon clones show reduction in E6 and E7 expression upon tazemetostat or decitabine treatment compared to HPVoff clones (n = 3; P < 0.00001, t-test). (g) Proportion of viable cells after treatment of SCC47 HPVon and HPVoff single-cell clones with cisplatin, relative to cells treated with vehicle (dashed line). HPVon clones were more susceptible to cisplatin compared to HPVoff clones (n = 5; P < 0.00001, t-test). (h) Invasion of HPVon and HPVoff single clones from 93VU147T (top) and SCC47 (bottom). Scale bar = 100 μm. (i) Relative invasion of HPVon and HPVoff single clones from 93VU147T (left) and SCC47 (right) cells. HPVoff cells were more invasive than HPVon cells (n = 4; P < 0.05, t-test). (j) Improved disease-free survival in HPVhigh compared to HPVlow samples, among TCGA p16+ oropharyngeal samples (n = 28; P = 0.05). (k) Top: percentage of p16 positive malignant cells (by IHC) and proportion of HPVon cells (by scRNA-seq). Bottom: p16 staining from tumors with low (OP9), intermediate (OP35) and high (OP20) proportions of HPVon cells. There was no correlation between HPVon proportion and percentage of p16 positive cells (n = 10 tumors). Scale bar = 100 μm.
Cite this article
Puram, S.V., Mints, M., Pal, A. et al. Cellular states are coupled to genomic and viral heterogeneity in HPV-related oropharyngeal carcinoma. Nat Genet 55, 640–650 (2023). https://doi.org/10.1038/s41588-023-01357-3