Case control study comparing the HPV genome in patients with oral cavity squamous cell carcinoma to normal patients using metagenomic shotgun sequencing

The aim of this study was to carry out a case control study comparing the HPV genome in patients with oral cavity squamous cell carcinoma (OC-SCC) to normal patients using metagenomic shotgun sequencing. We recruited 50 OC-SCC cases, each of which was then matched with a control patient by age, gender, race, smoking status and alcohol status. DNA was extracted from oral wash samples from all patients and whole genome shotgun sequencing was performed. The raw sequence data were cleaned, reads were aligned with the human genome (GRCh38), nonhuman reads were identified and HPV genotypes were then identified using HPViewer. In the 50 patients with OC-SCC, the most common subsite was tongue in 26 (52%). All patients were treated with primary resection and neck dissection. All but 2 tumors were negative on p16 immunohistochemistry. There were no statistically significant differences between the cases and controls in terms of gender, age, race/ethnicity, alcohol drinking, and cigarette smoking. There was no statistically significant difference between the cancer samples and control samples in the nonhuman DNA reads (medians 4,228,072 vs. 5,719,715, P value = 0.324). HPV was detected in 5 cases (10%) of OC-SCC (genotypes 10, 16, 98) but only 1 tumor sample (genotype 16) yielded a number of reads high enough to suggest a role in the etiology of OC-SCC. HPV was detected in 4 control patients (genotypes 16, 22, 76, 200) but all had only 1–2 HPV reads per human genome. Genotypes of HPV are rarely found in patients with oral cancer.


Scientific Reports | (2021) 11:3867 | https://doi.org/10.1038/s41598-021-83197-x | www.nature.com/scientificreports/

smoking since 1975 (from ~40 to 20%) has caused only a moderate change in the incidence of oral cancer 4,5. This indicates a paradigm shift in the cause of oral cancer and the need to search for other risk factors. High risk genotypes of human papillomavirus (HPV), genotypes 16, 18 and 33, now account for 60-80% of all oropharyngeal squamous cell cancers [6][7][8][9][10][11]. This has led to the hypothesis that some genotypes of HPV may also be responsible for the changing epidemiology of oral cancer. Although the prevalence of the high risk genotypes HPV 16, 18 and 33 varies greatly across multiple studies on oral cancer 12, it is now accepted that these high risk genotypes are unlikely to be responsible. With regard to other genotypes of HPV, over the past 10 years an increasing number of HPV types have been found in oral samples 13,14. The currently available HPV detection kits detect only a limited number of HPV genotypes. Since there are over 200 different genotypes of HPV, we hypothesized that genotypes not covered by these kits may be responsible for some cases of oral cancer. The aim of our study was therefore to carry out a case control study in 50 oral cancer patients and 50 matched control patients to detect all 200 genotypes of HPV using next generation sequencing.

Methods
To examine the hypothesis that oral HPV is associated with OC-SCC, we performed a case control study with mouthwash samples from 50 patients with OC-SCC and 50 subjects with no oral lesions. Total genomic DNA was extracted from cell pellets of the mouthwash samples. Subject recruitment, sample collection, data generation and analysis are detailed below.

Recruitment of human subjects for oral cavity squamous cell carcinoma cases and matched controls.
This case-control study was approved by the Institutional Review Board of Memorial Sloan Kettering Cancer Center (IRB 15-256) and New York University School of Medicine (i15-00389). All methods were performed in accordance with the relevant guidelines and regulations. Informed consent was obtained from all patients in the study population. From Memorial Sloan Kettering Cancer Center (MSKCC), we recruited 50 oral cavity squamous cell carcinoma (OC-SCC) cases. Each case was then matched with a control patient by age, gender, race, smoking status and alcohol status. Overall, 100 subjects (50 cases and 50 controls) were enrolled in this study. The controls comprised patients with thyroid nodules (benign or malignant). These patients had a complete head and neck examination, including flexible laryngoscopy, and were found to have no evidence of oral cancer. In patients with oral cancer, OC-SCC was confirmed by histological examination of biopsy specimens. Pathological grade and stage of OC-SCC were determined by histopathological examination at the time of surgical resection. Demographic and clinical information was collected for each patient.

Detection of HPV in OC-SCC tumor samples. High risk (HR) HPV infection was determined by tissue HR HPV PCR and p16 immunohistochemistry on tumor tissue of OC-SCC patients. Ang et al. have reported that the expression of p16INK4a by immunohistochemistry correlated well (kappa = 0.80; 95% CI 0.73-0.87) with the presence of HPV DNA in tumors 15. Immunohistochemistry is cheaper and easier to carry out than ISH and PCR, and immunostaining of tumor sections for p16INK4a is therefore now used as an indirect marker for HPV status in clinical pathology laboratories around the world 16. In prospective randomized trials on treatment of patients with HPV related oropharyngeal cancer, p16 immunohistochemistry is now used as the surrogate marker for HPV positivity in the USA. Rarely, some p16 positive tumors may not be HPV related. The addition of HPV PCR to the detection methodology would increase specificity, as described by Prigge et al. 17. In our study, all pathology specimens were examined by a single pathologist specialized in head and neck pathology (NK). p16 immunohistochemistry was performed as follows: four-micrometer tumor sections were deparaffinized and, after heat-induced epitope retrieval, immunohistochemistry for p16INK4a was performed with a primary antibody dilution of 1:7 as per the manufacturer's protocol (CINtec Histology Kit, catalog #9517, Roche mtm Laboratories AG, Heidelberg, Germany). Cases with nuclear and cytoplasmic immunolabeling in at least 70% of the tumor cells were considered positive for p16. In patients with available tissue, HR HPV PCR was done to confirm the results of the p16 immunohistochemistry.
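The 70% positivity rule and the kappa statistic quoted above can be illustrated with a minimal Python sketch. The function names and the 2x2 counts in the usage example are hypothetical and are not data from this study or from Ang et al.; the kappa formula is the standard Cohen's kappa for two binary ratings.

```python
def is_p16_positive(fraction_labeled, threshold=0.70):
    """Apply the rule stated in the text: positive when at least 70% of
    tumor cells show nuclear and cytoplasmic immunolabeling."""
    return fraction_labeled >= threshold


def cohens_kappa(both_pos, p16_only, pcr_only, both_neg):
    """Cohen's kappa for agreement between p16 IHC and HPV PCR.

    Arguments are the four cells of a 2x2 agreement table
    (hypothetical counts; not taken from the study).
    """
    n = both_pos + p16_only + pcr_only + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance, from the marginal totals
    expected = (
        (both_pos + p16_only) * (both_pos + pcr_only)
        + (pcr_only + both_neg) * (p16_only + both_neg)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate table: every rating is identical
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(is_p16_positive(0.75))
    print(cohens_kappa(both_pos=20, p16_only=5, pcr_only=10, both_neg=15))
```

Kappa corrects the raw agreement rate for agreement expected by chance, which is why it is preferred over simple percent concordance when one category (here, HPV negative) dominates.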

Detection of all 200 genotypes of HPV in mouthwash samples of OC-SCC patients and control patients using metagenomic shotgun sequencing (MSS).
The detection of HPV in oral rinse samples and saliva samples using PCR has been reported as being a sensitive and specific method for the detection of HPV related oropharyngeal cancer [18][19][20][21][22][23][24][25]. This technique has been reported as a potential screening method for HPV oropharynx cancer [18][19][20][21][22] and also for the detection of persistent or recurrent disease following treatment in patients with HPV related oropharyngeal cancer [23][24][25]. We therefore used oral rinse specimens from patients with oral cavity cancer and patients with no head and neck cancer for the detection of HPV genotypes. The workflow for the collection and processing of samples and detection of HPV by DNA sequencing is shown in Fig. 1 and detailed as follows:
Mouthwash sample collection, processing, storage and DNA extraction. The participants rinsed their mouth vigorously with 10 ml sterile saline for 30 s and the mouthwash was then collected in a 50 cc Falcon conical tube. After centrifugation at 3120×g for 20 min, supernatants were decanted and the cell pellets were transferred into a 2-ml Eppendorf tube and stored in a −80 °C freezer for further study. All oral rinse specimens were taken prior to surgical resection of the OC-SCC. These samples were de-identified and coded. Using the MoBio method, we successfully extracted DNA from all 100 oral wash samples. DNA yield was measured by the Nanodrop method.
Library preparation and sample sequencing. DNA fragmentation, shotgun metagenomic library construction and sequencing were carried out at BGI Americas Corp (Cambridge, MA) using a Kapa kit and the Illumina HiSeq X Ten, with 100 samples pooled into 8 lanes.
Raw sequence data quality control. The Illumina sequencing process supplied raw sequence reads in fastq format and assigned each base a quality score (Phred score) describing base-call accuracy. The raw sequence quality was reviewed using FastQC software, and any adapters and low-quality reads were removed using Trimmomatic. Low-quality bases were trimmed as follows: leading low-quality or N bases (quality < 25) were removed; trailing low-quality or N bases (quality < 25) were removed; and each read was scanned with a 4-base wide sliding window and cut when the average quality per base dropped below 25 (Fig. 2).
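The trimming criteria described above can be sketched in a few lines of Python. This is a simplified illustration, not Trimmomatic's implementation: it assumes Phred+33 encoded quality strings, cuts at the start of the first failing window, and uses hypothetical function names (`phred33_scores`, `trim_read`).

```python
def phred33_scores(qual):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(ch) - 33 for ch in qual]


def trim_read(seq, qual, min_q=25, window=4):
    """Simplified quality trimming with the thresholds given in the text.

    1. Drop leading bases that are N or have quality < min_q.
    2. Drop trailing bases that are N or have quality < min_q.
    3. Scan with a sliding window and cut the read at the start of the
       first window whose mean quality falls below min_q.
    """
    scores = phred33_scores(qual)

    start = 0
    while start < len(seq) and (scores[start] < min_q or seq[start] == "N"):
        start += 1

    end = len(seq)
    while end > start and (scores[end - 1] < min_q or seq[end - 1] == "N"):
        end -= 1

    cut = end
    for i in range(start, end - window + 1):
        if sum(scores[i:i + window]) / window < min_q:
            cut = i
            break

    return seq[start:cut], qual[start:cut]


if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2 under Phred+33
    print(trim_read("ACGTACGTAC", "#IIIIIII##"))
    print(trim_read("ACGTACGTAC", "IIII5555II"))
```

In practice a minimum-length filter is usually applied after trimming so that very short remnants are discarded; the text does not state one, so none is assumed here.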
Detecting and genotyping HPV in mouthwash samples obtained from patients with OC-SCC and healthy controls. We compared HPV prevalence and abundance between the 50 patients with OC-SCC and 50 subjects with no oral lesions. The nonhuman DNA reads were searched for HPV DNA using HPViewer (Supplementary Fig. 3). We previously developed a pipeline to identify HPV reads generated by MSS based on sequence similarity to the genome sequences of HPV prototypes and used it in a survey of HPV in healthy subjects 26. Since then, we have made several improvements to allow more accurate detection and classification of HPV reads in human samples in a new software program, HPViewer 27. Briefly, we found that HPV genomes not only share a large amount of homologous sequence among different HPV types but also share extensive simple repeats with the human genome and some bacterial genomes. The inter-type homologous sequences cause errors in HPV genotyping, and the repeats shared with human and bacterial genomes can cause human or bacterial reads to be mistaken for HPV DNA. In HPViewer, these shared regions in the reference HPV genomes are masked to minimize these errors. We also replaced BLAST in the old pipeline with Bowtie2, an

Results
Patient characteristics of cases and controls. Patient demographics are shown in Table 1. There were no statistically significant differences between the cases and controls in terms of gender, age, race/ethnicity, alcohol drinking, and cigarette smoking. In the 50 patients with OC-SCC, the subsite was tongue in 26 (52%), floor of mouth in 8 (16%), lower gum in 7 (14%) and upper gum in 4 (8%) (Table 2). All patients were treated with primary resection and neck dissection.
Tumor characteristics of OC-SCC. Pathology details are shown in Table 2. Of the 50 OC-SCC patients, 37 (74%) had a pathological T1/T2 tumor and 21 (42%) had a pathologically positive neck. The majority of primary tumors were either well (20%) or moderately differentiated (64%). All but 3 tumors were negative on p16 immunohistochemistry. Of 26 samples tested by HPV PCR, 25 were negative for high risk HPV and only 1 was positive. The positive HPV PCR case was also positive on p16 immunohistochemistry. (Fig. 2). There was no statistically significant difference between the cancer samples and control samples in the nonhuman DNA reads (medians 4,228,072 vs. 5,719,715; P value = 0.324, Mann-Whitney test; Fig. 3).
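A comparison of this kind can be sketched as a two-sided Mann-Whitney U test in plain Python. The sketch below is an assumption about the form of the test, not the authors' analysis code: it uses the tie-corrected normal approximation without continuity correction, and the read counts in the usage example are hypothetical placeholders, not the study data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign average ranks to tied values and accumulate the tie correction
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1

    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return min(u1, u2), 1.0  # all values identical

    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return min(u1, u2), p_value


if __name__ == "__main__":
    # Hypothetical nonhuman read counts per sample, for illustration only
    cases = [4100000, 3900000, 5200000, 4300000, 6100000]
    controls = [5700000, 4800000, 6500000, 5100000, 4000000]
    u_stat, p = mann_whitney_u(cases, controls)
    print(u_stat, p)
```

A rank-based test is a natural choice here because per-sample read counts are typically skewed, so medians rather than means are reported.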

Detecting and genotyping HPV in mouthwash samples obtained from patients with OC-SCC and controls. Using HPViewer, HPV was detected in five cases (10%) of OC-SCC and four controls (8%) (Table 3). The raw data for HPV reads are accessible at http://www.ncbi.nlm.nih.gov/bioproject/692713 using the BioProject ID PRJNA692713. Of the 5 OC-SCC cases, only 1 tumor sample (sample 90) yielded a considerable number of HPV reads, suggesting a role in the etiology of OC-SCC in this patient. This was genotype 16. This patient was also positive for HPV on tissue PCR and positive on p16 immunohistochemistry. The location of the tumor was the anterior two-thirds of the oral tongue. The other 4 tumor cases yielded only about 1-2 HPV reads per human genome. These were genotypes HPV 10, 16 (2 cases) and 98. All 4 patients were negative on p16 tissue immunohistochemistry and tissue HPV PCR. Of the 4 control patients, all had only 1-2 HPV reads per human genome. These were genotypes HPV 16, 22, 76, and 200.

Discussion
Over the past 20 years the epidemiology of oral cancer has been changing. Traditionally, oral cancer has been caused by smoking and heavy alcohol consumption 3. There has been a steady decline in the use of cigarettes and alcohol in the population 4. Despite this, the incidence of oral cancer has failed to decline. New studies show that there is an increasing number of patients who do not smoke or drink alcohol excessively but still develop oral cancer 5. These patients tend to be younger, with an increased frequency in females. The cause of oral cancer in these patients remains an enigma. This change in epidemiology has occurred over the same time period as the change in epidemiology of oropharyngeal cancer. In oropharyngeal cancer it is now recognized that the causative agents are high risk genotypes of the human papillomavirus, notably HPV 16, 18, and 33 12. This has led to several studies being carried out on oral cancer patients to identify high risk HPV genotypes, either in saliva or in tumor tissue specimens. The results of these studies have been highly variable, with some studies showing little association whereas others have reported a strong association 12. However, it is now generally accepted that the high risk genotypes HPV 16, 18, and 33 are unlikely to be responsible for the change in epidemiology of oral cancer. There are 200 different genotypes of HPV. The traditional HPV detection kits/methods cover only a limited number of high/low risk (HLR) HPV genotypes [28][29][30][31][32]. Most of these traditional HPV detection methods are PCR-based and detect 14 genotypes (HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68). Since these detection kits do not cover all 200 different genotypes of HPV, it is possible that they are failing to identify other genotypes of HPV which may be responsible for causing oral cancer.
The limited coverage of current commercial HPV detection kits can be overcome by metagenomic shotgun sequencing (MSS). This is a non-selective approach that, in theory, permits the identification of all HPV sequences. Recently, MSS has been used to detect HPV in some human samples and to identify several novel HPV types [33][34][35][36][37][38]. In particular, a study of condyloma samples shown to be negative for HPV by traditional PCR revealed the ability of MSS to identify many putative HPV sequences 33. Using MSS, we surveyed HPV distribution in various body sites of 103 healthy human subjects and found that the majority of the 109 HPV types detected could not be detected using the widely used commercial kits and do not belong to the HLR HPV types 26. Interestingly, the HPV types detected have strong organ tropism, and the oral HPV community is different from that of the vagina. These findings raised the possibility that the oral HPV types that are invisible to the traditional detection methods contribute to the etiology of human diseases, such as OC-SCC. The aim of our study was to carry out a case control study in 50 oral cancer patients and 50 matched control patients to detect all 200 genotypes of HPV by MSS. In our study, the cancer patients and control patients were well matched in terms of age, gender, and smoking and alcohol status. We used oral rinse samples from each patient and extracted DNA from cell pellets prior to sequencing. DNA extracted from oral rinse samples has been reported to be a sensitive and specific method for    41 .
There is much interest in identifying factors responsible for oral cancer in these patients. It is possible that other viruses such as herpes simplex, herpes zoster and Epstein-Barr virus may be responsible, though research on these common viruses has not shown any association to date 40. Further research is ongoing to identify bacteria which may be responsible. Studies on the oral microbiome have recently been published suggesting that specific bacteria may be responsible 19,42,43. Our own group recently identified that the periodontal pathogens Fusobacterium, Prevotella and Alloprevotella were enriched, while commensal Streptococcus was depleted, in OC-SCC in nonsmoking patients with premalignant oral cavity lesions as well as oral cancer 19. Clearly this is an area which requires much research effort to provide new insight into this devastating disease.
In conclusion, our study suggests no role for HPV in the aetiology of oral cavity cancer. However, our patient population is fairly homogeneous (88% white ethnicity) and all patients were from the USA. It is possible that there may be geographic or ethnic differences in the role of HPV across populations, and we therefore cannot extrapolate from a single 50-patient study. Further research in different geographic and ethnic populations is needed. New research is also needed to explore other infectious agents such as bacteria or viruses that may be responsible for oral cancer. Although we saw few HPV reads in our metagenomic data, it is important to analyse the metagenomic sequences for other nonhuman reads from other viruses, bacteria or even fungi. It is possible these other microbes may have an association with OC-SCC pathogenesis. Carrying out such a comprehensive analysis requires complex bioinformatics as well as validation studies. These studies are currently underway.