Whole-Genome Analysis of Human Papillomavirus Types 16, 18, and 58 Isolated from Cervical Precancer and Cancer Samples in Chinese Women

Human papillomavirus (HPV) types 16, 18 and 58 are ranked the top three high-risk HPV types for cervical intraepithelial neoplasia (CIN) and invasive carcinoma. We aimed to evaluate the diversity of HPV16, HPV18, and HPV58 genetic variants by HPV capture technology combined with next generation sequencing. 295, 73, and 148 variations were observed in 51 HPV16, 7 HPV18, and 11 HPV58 genomes, respectively. HPV16 isolates were predominantly of the A variant lineage, and sublineage A4 (Asian) was the most common. However, there were no significant differences in the distribution of HPV16 A1–3 and A4 variants between CIN1-, CIN2/3, and cervical cancer groups. The 7 HPV18 genomes were assigned to the A3/A4 and A1 sublineages. Of the 11 HPV58 genomes, the most predominant variant sublineages were A2, followed by A1 and B2. The majority of HPV16/18 samples containing contiguous genomic deletions were found to harbor HPV integration. Some T-cell epitope sequences in HPV16 E6 and E7 showed considerable divergence from the prototype NC_001526, suggesting their importance in immunotherapy of HPV-associated carcinomas. In conclusion, sequence diversity and phylogenies of HPV16, 18, and 58 provide the basis for future studies of discrete viral evolution, epidemiology, pathogenicity, and the differences in response to vaccines.

Recently, Mirabello L et al. 7 reported the largest study to date of HPV16 variant lineages and the risk of cervical precancer/cancer in 3200 women primarily from white, non-Hispanic population and first showed that the variant lineages that are often grouped are heterogeneous in pathogenicity. However, HPV16 variant lineages have co-evolved with specific human populations. Most women in their study had an HPV16 A1-3 variant lineage infection and only four cases of A4 sublineage 7 . Therefore, the relationship between HPV16 A4 sublineage and the risk of cervical precancer/cancer in Chinese populations living in China requires further study. To date, no studies on the role of HPV18 and HPV58 (sub)lineages in predicting cancer risk based on available complete genome sequences were reported, due to the whole-genome sequences of HPV16, HPV18, and HPV58 isolated from China being still limited.
Additionally, HPV integration essentially contributes to HPV-mediated neoplastic transformation and previous HPV variant studies lack high-resolution HPV breakpoints data. It is intriguing to determine which HPV variant lineage is preferentially integrated into the host genome. Recently a new method based on HPV-targeted hybrid capture together with flanking genomic sequences was described by our group 13 . Using this approach, HPV type, as well as variant and HPV integration in 47 tissue specimens with cervical cancer 13 and 166 cervical biopsy specimens with normal cervical epithelium and CIN 14 were analyzed, and 53 HPV16-, 12 HPV58-, and 7 HPV18-positive samples were identified.
In this study, variations of HPV16, HPV18, and HPV58 at the whole genome level were identified and the evolutionary phylogenies were described, aiming to provide basic data for future studies on their discrete evolution, epidemiology, pathogenicity and different responses to vaccines.

Results
Mutation and phylogenetic analysis of HPV16 sequences. Of the 53 HPV16-positive samples, 51 HPV16 genomes were assembled from CIN 1-(n = 8), CIN 2/3 (n = 15), and cervical cancer group (n = 28) (Supplementary Table S1). HPV16 genomes in T10 and T21 failed to be assembled, mainly due to the poor coverage across the genome. 43 of the 51 specimens successfully retrieved HPV16 full-length sequences, while 6 samples with cervical cancer and 2 samples with CIN 2/3 contained contiguous amplicons with very low or no sequence reads, which were interpreted as genome deletions. The deleted region often included fragments within the E2, E4, E5, and L2 genes, whereas the E6, E7, and LCR regions generated high sequence read numbers. Based on our previous HPV integration data 13,14 , we confirmed that 5 of the 6 cervical cancer samples and 2 CIN 2/3 samples with deletions harbored HPV integration (Supplementary Table S1).
Mutation and phylogenetic analysis of HPV18 sequences. Seven HPV18 genome sequences were obtained from CIN 1-(n = 3) and cervical cancer groups (n = 4) (Supplementary Table S2). The 3 isolates from CIN 1-samples generated complete genome sequences, while the 4 isolates from cervical cancer samples contained contiguous amplicons with very low or no sequence reads, which were interpreted as genome deletions. The deleted region often included fragments within the E2, E4, E5, L2, and L1 genes, whereas the E6, E7, and LCR regions generated high sequence read numbers. On the basis of validated HPV integration breakpoints in our previous study 13 , we found that 3 of the 4 cervical cancer samples with deletions harbored HPV integration (Supplementary Table S2).
The phylogenetic analysis was conducted based on multiple nucleotide sequence alignments of whole genomes, including the unique 7 HPV18 genomes in this study and the 10 published representative HPV18 genomes for viral variant sublineages. Analyses on whole HPV18 genomes indicated that all the 7 viral strains were mapped to the HPV18 A lineage (Fig. 2). Of the 7 HPV18-positive samples, the predominant variant sublineages were A3/A4 (57.14%, 4/7), followed by A1 (42.86%, 3/7).
The same sequences in T11, T19, and T30 were observed, therefore, 9 unique HPV58 genomes were obtained. The phylogenetic analysis was conducted based on multiple nucleotide sequence alignments of whole genomes, including the 9 newly available unique HPV58 genomes in this study and 8 previously published representative HPV58 genomes for viral variant lineages and sublineages. Analyses on whole HPV58 genomes indicated that 7 viral strains were closely related to HPV58 lineage A and 2 viral strains were closely related to HPV58 lineage B (Fig. 3). Of the 11 HPV58-positive samples, the most predominant variant lineages were A2 (45.45%, 5/11), followed by A1 (36.36%, 4/11), and B2 (18.18%, 2/11) (Supplementary Table S3).
To determine which HPV variant lineage is preferentially integrated into the host genome, the ratio of integration in different HPV 16 variant sublineages was investigated. Out of the 17 HPV16 A1-3 isolates, 47% (8/17) were integrated into the host genome. Of the 33 HPV16 A4 isolates, 48% (16/33) were integrated into the host genome (Table 3).
Amino acid changes in CD4 + and CD8 + T-cell epitopes of HPV16 E6 and E7. The extensive genetic diversity within HPV16 E6 and E7 can have a significant impact on immune recognition of these antigens. A comprehensive investigation into the immunological impact of HPV16 E6 and E7 sequence polymorphism is essential for optimizing the adoptive cell therapy approaches and vaccine development strategies. According to the epitopes specific for both CD4 + and CD8 + T cells defined and reviewed in previous publications [15][16][17][18][19][20][21][22] , amino acid changes were found in 5 CD4 + epitopes and 5 CD8 + epitopes of HPV16 E6 protein, 5 CD4 + epitopes of HPV16 E7 protein (Supplementary Tables S4 and S5). Some of the nonsynonymous mutations were affecting multiple epitopes. For example, a C-to-T substitution at coordinate 790 (NC_001526) resulted in the change of residue 77 (R → C) in HPV16 E7 in only one HPV16 isolate, where CD4 + epitopes GQA (E7 43-77), CDS (E7 61-80), STH (E7 71-85), and IRT (E7 76-86) were located. A G-to-A substitution at coordinate 176 resulted in the change of residue 32 (D → N) in HPV16 E6 in 4 HPV16 isolates, a T-to-G substitution at coordinate 178 resulted in the change of residue 32 (D → E) in HPV16 E6 in 33 HPV16 isolates, and a T-to-C substitution at coordinate 183 resulted in the change of residue 34 (I → T) in HPV16 E6 in only 1 HPV16 isolate, where CD8 + epitopes TIH (E6 29-37), TIH (E6 29-38), and HDI (E6 31-38) were located. The positions of the nonsynonymous changes located in the epitopes are illustrated in Fig. 4 and tabulated in Supplementary Tables S4 and S5.

Discussion
It was well recognized that the association of HPV and cervical carcinogenesis was based on the presence of HPV DNA sequence 23 . With the development and improvement of assays for HPV detection and analysis, the underlying nucleotide changes could be comprehensively studied. Here, we adopted HPV capture technology combined   Table 3. Association between HPV16 A variants and HPV16 integration.
with next generation sequencing not only to detect the complete genome sequences, but also to identify HPV integration. In the current study, de novo assembly was successfully performed for 69 HPV genomes, including 51 HPV16 genomes, 7 HPV18 genomes, and 11 HPV58 genomes. We described the genome variability and evolutionary phylogeny of HPV16, HPV18, and HPV58. Our whole genome sequence-based data enabled us to make several novel observations that could not be available with sequencing of limited regions. With this dataset, we were able to find more SNPs and define HPV lineages and sublineages across the genome. The diversity of the ORFs/regions varies by HPV types. Each ORF/region differs in sequence diversity, for HPV16, from most variable to least variable: NCR > LCR/E5 > E4/L2 > E2/E7/E6 > L1/E1; for HPV18, NCR > L2/E2/E4 > LCR/L1 > E1/E6/E7; for HPV58, NCR > E5/E7/E4 > LCR/L2 > L1/E2 > E1/E6. Our observation supported that L1 is a conserved region of HPV; however, there are still a number of SNPs which may affect the efficacy of vaccination to some extent. It is worth noting that, both Cervarix ® and Gardasil ® are virus-like particles (VLPs) of L1, which are licensed for use as prophylactic vaccines for HPV in many countries worldwide. Our work showed genetic variability of L1, making it essential to take into account the HPV16 variant lineages or population stratification when developing vaccines 24 .
It is well known that HPV16 variant lineages have co-evolved with specific human populations and are most prevalent in specific geographic regions. Our results confirmed that A4 lineages account for the majority of HPV16 isolates obtained from Chinese populations. Compared with women with an HPV16 A1-A3 European lineage infection, women with an HPV16 A4 Asian lineage infection had no significantly increased risk for cervical cancer. However, Hang et al. reported that A4 variants appear to predict higher risk of cervical cancer among HPV16-positive women 25 . Thus, further study is required to determine whether A4/Asian variants represent a higher risk for cervical cancer. Additionally, functional studies regarding HPV polymorphisms across the HPV16 genome of A4/Asian variants should be conducted to explore the biological evidence of carcinogenicity.
Some previous studies investigating HPV16 full length sequences in cervical specimens have shown that the contiguous deletions identified to be highly associated with cancer are suggestive of a pattern of HPV integration 26,27 . However, HPV integration breakpoints were not available due to the limitation of the methods in these studies. Through analyzing the corresponding HPV integration breakpoints validated via Sanger sequencing in our previous study 13 , 87.5% (7/8) of the 8 HPV16 samples and 75.0% (3/4) of the 4 HPV18 samples containing contiguous genomic deletions were found to harbor HPV integration, indicating that these deletions are indeed signatures of HPV integration. HPV DNA can integrate into the host genome, sometimes in precancer and especially in cancers 28 . The deletion patterns observed in our study indicated that the HPV16/18 DNA in the specimens with CIN 2/3 or cervical cancer may exist in the pure integrated form. Moreover, out of the retrieved 43 full-length HPV16 specimens, 39.5% (17/43) were also found to harbor HPV integration breakpoints broadly distributed across the whole genome (Supplementary Table S1), revealing a mixture of episomal and integrated forms in these samples. It is of significant interest to observe that small deletions of 177 to 706 bp in size for HPV16 and a 24-bp deletion for HPV18 failed to be amplified by PCR amplification, which further validated the hypothesis that these deletions are due to the HPV DNA integration into the host genome in these samples. On the contrary, a 135-bp deletion in CIN 3-6 (7770 bp in length) and a 30-bp deletion in CIN 3-3 (7875 bp in length) existed in biopsies and were confirmed by PCR amplification and Sanger sequencing. This minor region of HPV deletion has not been reported in HPV genomes before.
The HPV-encoded early proteins, especially oncoproteins E6 and E7, are promising tumor-specific antigens to be targeted by immunotherapeutic approach since they are consistently expressed in HPV-associated malignancies and precancerous lesions. Indeed, adoptive transfer of E6/E7-targeted T cells is able to induce regression of cervical cancer 29,30 . In this study, sequence analysis of the gene encoding E6 and E7 in HPV isolates from 51 HPV-positive specimens has revealed considerable E6 and E7 sequence divergence from the prototype NC_001526. Importantly, T cells recognition of E6 and E7 epitopes might be greatly affected by this sequence polymorphism, which implies that HPV variant-specific epitopes are warranted to expand and activate functional adoptive T cells.
Further studies with complete sequencing of HPV genomes from large population-based and case-control studies of cervical precancer and cancer are required to understand viral carcinogenesis and possibly to improve preventive and therapeutic strategies in the future. . Amino acid changes in CD4 + and CD8 + specific T cell epitopes of HPV16 E6 and E7 proteins. Amino acid changes in at least one of the 51 HPV16 isolates at CD4 + and CD8 + specific T cell epitopes are marked with hollow and solid arrows, respectively. Stacking arrows indicate that the amino acid change is in a peptide which serves as both CD4 + and CD8 + epitopes.