Identifying the HLA DRB1-DQB1 molecules and predicting epitopes associated with high-risk HPV infection clearance and redetection

Several determining factors are involved in HPV infection outcomes; human leukocyte antigen (HLA) polymorphisms have been described as related factors. This study has ascertained the effect of genetic variation on HLA-DRB1 and DQB1 genes on HPV-16/-18/-31/-33/-45 and -58 clearance and redetection in Colombian women. PCR and qPCR were used for viral identification and the Illumina MiSeq system was used for HLA-typing of cervical samples (n = 276). Survival models were adjusted for identifying alleles/haplotypes related to HPV clearance/redetection; L1/L2 protein-epitope binding to MHC-II molecules was also predicted. Significant associations suggested effects favouring or hampering clearance/redetection events depending on the viral type involved in infection, e.g. just DRB1*12:01:01G favoured HPV-16 (coeff: 4.8) and HPV-45 clearance (coeff: 12.65) whilst HPV-18 (coeff: 2E-15), HPV-31 (coeff: 8E-17) and HPV-58 hindered elimination (coeff: 1E-14). An effect was only observed for some alelles when configured as haplotypes, e.g. DRB1*04:07:01G (having the greatest frequency in the target population) was associated with DQB1*02:01:1G or *03:02:03. Epitope prediction identified 23 clearance-related peptides and 29 were redetection-related; eight might have been related to HPV-16/-18 and -58 persistence and one to HPV-18 elimination. HLA allele/haplotype relationship with the course of HPV infection (clearance/redetection) depended on the infecting HPV type, in line with the specific viral epitopes displayed.

This study was aimed at identifying HLA-DRB1 and DQB1 alleles related to the clearance and redetection of the 6 HPV types having the greatest distribution in Colombia (HR-HPV-16, -18, -31, -33, -45 and -58) in a cohort of Colombian women using next generation sequencing (NGS) for HLA typing and quantitative PCR assay for viral detection. L1 and L2 protein peptides fitting into alleles were analysed for predicting which of them might have been related to infection events regarding each viral type. Such information is relevant to understanding specific infections' natural history and the genetic factors modulating them. The results should prove useful in identifying inmunological biomarkers enabling establishing HPV infection suceptibility and its clinical course.

Results
Selecting the study population. Two hundred and seventy-six women were considered eligible for this study; 12 were excluded as complete information was missing (lack of information regarding HPV or HLA typing). Cohort follow-up lasted 32.3 months; 206 women were visited four times and 59 women a fifth time during the follow-up period. Table 1 describes the target population and its sociodemographic characteristics. Regarding type-specific detection, HPV-16 had the greatest prevalence, followed by HPV-18; however, variations regarding specific type distribution were found during follow-up (shown in Supplementary Fig. S1).
HLA DRB1 and DQB1 allele and haplotype frequencies and the clinical course of HR-HPV infection. Supplementary Tables S1 and S2 give HLA DRB1 and DQB1 allele and haplotype frequencies. The results regarding type-specific infection clearance for 219 women out of the 276 included in this study have already been reported 23 ; Table 2 gives the data for the women included in this study (n = 264). Supplementary Tables S3 and S4 give sociodemographic variable distribution, risk factors and cytopathology result concerning the three outcomes considered (clearance, persistence and redetection).
Evaluating time-related redetection percentage revealed that most viral types (HPV-18, -33, -45 and -58) became positive again after 5 months' non-detection (Fig. 1). There was a lower percentage of redetection of positive cases at the end of follow-up for HPV-16 (6.95%) and HPV-31 (9.64%) compared to a higher non-redetection percentage for HPV-58 (66.8%) and HPV-33 (51.82%) at the end of follow-up (Fig. 1). The highest clearance rate was observed for HPV-33 (6.99) whilst HPV-18 (6.27) and HPV-31 (6.31) had the highest redetection rates (Table 2).  Table 2. Clearance and redetection rates for the 6 HR-HPV types. HPV: human papillomavirus, 95%CI: 95% confidence interval. Rates per 100 people/month. a Percentage clearance and redetection events identified in the total amount of infections and those proving negative during follow-up. n/a: not applicable, right-censored data.  www.nature.com/scientificreports www.nature.com/scientificreports/ Identifying HLA alleles and haplotypes related to HR-HPV infection's clinical course. Calculating multicollinearity between the variables (evaluated by variance inflation factor (VIF) and tolerance) revealed no collinearity between the variables included in the model. Cox multivariate and log-normal models were thus adjusted for identifying alleles and haplotypes related to infection clearance/persistence (Supplementary Tables S5 to S8) and redetection (Supplementary Tables S9 to S11) for each HPV type evaluated.
Alleles/haplotypes having associations favouring clearance/redetection were found (considered in the model as having greater and earlier probability of occurrence) as well as hindering associations (considered as lower probability and later occurrence).
Sixty-three DRB1 and nine DQB1 allele associations with HR-HPV infection clearance were found (Supplementary Tables S5 and S6) of which 47 for DRB1 (represented by 20 alleles) and 6 for DQB1 (represented by 3 alleles) continued being significant after p values (p ≤ 0.001) had been corrected ( Fig. 2 and Supplementary Table S12). Fifteen associations were identified favouring clearance (greater probability or earlier occurrence) and 38 hindering it (lower probability or later occurrence) (Fig. 2).
The HPV-33 model could not be adjusted for infection redetection due to the few (n = 9) events found for this viral type in the study population (Supplementary Table S9); 88 associations were found for the remaining viral types (Supplementary Table S9 and S13). Forty-five associations continued being significant for DRB1 (23 alleles) and five for DQB1 (3 alleles) after the p values had been corrected (p ≤ 0.001), 17 of these associations favouring infection redetection and 33 hindering it ( Fig. 2 and Supplementary Table S13).
Some alleles and haplotypes were found to be related to a single HR-HPV type and others were associated with more than just one HR-HPV type. Concerning the latter, allele/haplotype associations were found which were www.nature.com/scientificreports www.nature.com/scientificreports/ consistent amongst the different HPV types (i.e. in the same sense, favouring or hindering a particular event), e.g. DRB1* 04:11:01 favouring HPV-31, -45 and -58 redetection. Different associations were found depending on HR-HPV type (favouring and hindering a particular event), e.g. DRB1*03:02:01 was associated with favouring HPV-31 and -33 clearance, whilst HPV-16, -45 and -58 hindered clearance (Fig. 2).
Allele/haplotype associations were found having the same sense as the event (clearance and/or redetection) for the same HPV type, i.e. an allele/haplotype was associated with hindering clearance of an HPV type and also associated with a greater probability of redetection of the same viral type. For example, DRB1*11:02:01 was associated with a lower probability of HPV-31 clearance and earlier redetection of the same viral type. DRB1*03:02:01-DQB1*04:02:01 was associated with earlier HPV-31 clearance and its later redetection (Figs. 2 and 3).

Discussion
Differences were seen throughout follow-up regarding infection patterns for all 6 HR-HPV types, this being consistent with previous studies [23][24][25] . However, differences amongst studies have been reported, mainly due to variations concerning persistence, clearance and redetection 26 , inclusion of prevalent and incident infection patterns 24 and target population characteristics (host risk factors).
This study's findings regarding infection clearance correlated with those of a prior cohort study carried out with part of the population in this study 23 . Differences concerning redetection rates were seen to depend on the HR-HPV type; the percentages and rates reported for every viral type were similar to those reported previously 27,28 . However, redetection events could not be differentiated between new infection and/or latent infection due to study design 29 .
Previous studies evaluating HPV generic redetection have reported cumulative percentages of up to 23.9% regarding new viral identification 29,30 . Cumulative post-clearance redetection in this study, including all types of HR-HPV analysed, was 7.42% and cumulative redetection following non-detection was 22.10% (Table 2). However, comparing redetection and clearance rates amongst studies is difficult regarding the different designs and definitions used regarding follow-up duration and the type of cohort 28,31 .  www.nature.com/scientificreports www.nature.com/scientificreports/ Significant associations between HLA class II alleles/haplotypes and outcomes regarding infection (clearance/persistence, redetection) were found to be positive (greater probability or earlier occurrence of an event) or negative (lower probability or later occurrence of an event) (Figs. 2 and 3). The alleles/haplotypes favourably associated with viral clearance and hindering redetection could have been related to a lower risk of CC (given the lower risk of infection and viral persistence), whilst associations hindering clearance and favouring redetection  www.nature.com/scientificreports www.nature.com/scientificreports/ could have been related to a greater risk of CC (greater risk of infection and persistence). Previous cohort results (regarding just infection by HPV-16 and -18) would seem to support such inferences and the results presented here (Supplementary Tables S12 to S15) [19][20][21][22] .
An immune response against HPV plays an essential role in determining such infections' clinical course and the natural history of CC. Women having alleles/haplotypes negatively associated with clearance in this study would probably have had a lower viral peptide presentation for activating an immune response, thus favouring persistence and thereby increasing the risk of developing CC 32,33 . Immunoinformatics led to identifying viral peptides which could be considered factors favouring viral persistence in a host since they were related to a lower probability/later occurrence of a clearance event and the greater probability/earlier occurrence of redetection (Tables 3 and 4).
Alleles/haplotypes favouring redetection (greater probability or earlier occurrence), could have favoured HPV replication due to lower antigen presentation capability favouring the replication of such latent infections in a host, so that only when the amount of copies exceeded the detection threshold could they have been identified and diagnosed 29 . Epitope prediction thus contributes to identifying key host factors involved in the response to infection and could therefore be considered for designing therapeutic tools for HPV infection control (Tables 3 and 4).
A joint effect was observed in this population which could be considered as the average of each allele's individual effect or as a secondary effect regarding possible allele interaction 17,22 . The forgoing considers that some haplotypes were found having consistent effects with that found for the alleles constituting them, haplotypes with alleles not associated independently and haplotypes formed by an associated allele and another non-associated allele.
Future studies should consider additional variables such as changes regarding the amount of sexual partners through follow-up and changes in viral load, thereby broadening understanding of outcomes and supporting the conclusions, mainly regarding this study's redetection events. Although biological assays are required to support the bioinformatics findings described here, this has been the first study predicting peptides which could favour or hinder viral persistence (in viral types different to HPV-16 and -18) and demonstrating their potential usefulness as therapeutic anti-HPV peptide vaccines.
This study has thus reported, for the first time, alleles and haplotypes (typed by NGS) associated with clearance/persistence and redetection events, as well as L1 and L2 epitopes from the six most frequently occurring HPV types which are responsible for around 85% of CC cases. The information obtained through this study provides relevant knowledge for understanding the genetic component in the immune response against HR-HPV types and the natural history of CC. The associations described here enable constructing the bases for future studies aimed at evaluating the impact and effectiveness of anti-HPV vaccines and treatments (current and future), considering each population's genetic particularities.

Materials and Methods
Study population. The study's target population consisted of a cohort of women who had been participating in a Fundación Instituto de Inmunología de Colombia (FIDIC) multicentre study evaluating 6 HR-HPV types' persistence, clearance and reinfection 23 . The women had been attending hospitals in the Colombian cities of Bogotá, Girardot and Chaparral between January 2007 and March 2010; they voluntarily participated in the study and declared that they had not changed their place of residence in at least two years after the start of the study. Detailed information regarding the study population and the procedures related to the 6 HR-HPV types' detection and quantification have been described in previous publications 5,23 .
Women having had at least three follow-ups (6 ± 3-month intervals) and real-time PCR viral identification results were included in the study. Women whose cervical samples did not have the minimum amount of DNA and/or required quality for HLA-DRB1 and DQB typing were excluded from the analysis ( Supplementary  Fig. S2A). All the methods were performed in accordance with the Helsinki declaration and the Colombian Ministry of Health and Social Protection guidelines, as approved by the Universidad Nacional de Colombia's Faculty of Medicine's Research Ethics Committee (resolution 004-067-18). All the women who agreed to participate in the study signed an informed consent form. www.nature.com/scientificreports www.nature.com/scientificreports/ following previously described procedures 23,37,38 . The human β-globin gene was amplified by PCR for evaluating DNA integrity, using GH20/PC04 primers. This was followed by using two sets of generic primers targeting a region of the L1 gene for the generic detection of HPV (GP5 +/GP6 + and MY09/MY11) (Supplementary Table S20) 37,38 .

DNA extraction.
Samples proving positive by at least one generic primer set were typed using primers targeting a region located on the E5, E6 and E7 genes; these were specific for HR-HPV-16, -18, -31, -33, -45 or -58 (Supplementary  Table S20). All amplification products were visualised on 2% agarose gels. Synthetic genes from early HPV-18, -31, -45 and -58 regions of interest and samples proving positive and confirmed by Sanger sequencing for HPV-16 and -33 were used as positive controls. DNA-free water was used as negative control 37,38 .
Real-time PCR detection was used for determining the amount of viral copies and the CFX96 Touch qPCR detection system was used for analysis; the primers and TaqMan probes used here are described in Supplementary  Table S21. 1:10 (10 11 -10 6 ) serial dilutions were obtained from plasmid DNA (known concentration) from each viral type and the HMBS gene for making the calibration curve. The human HMBS gene was used for validating DNA integrity and determining the amount of viral copies per cell. The samples were analysed for the aforementioned six HR-HPV types, involving absolute (total HPV copies) and relative quantification (HPV copies /cell = HPV copies /(HMBS/2 copies)) of each type's load. Six dilutions of plasmid DNA were included for each type and included as controls for HMBS in each analysis along with a negative NTC control (no template control) 5,23 .
HLA typing. Illumina MiSeq (San Diego, CA, USA; Histogenetics, Ossining, NY, USA) was used for typing HLA-DRB1 and DQB1 l molecules from exons 2 and 3 from every loci (3x resolution) from good quality gDNA samples 39,40 . The IPD-IMGT/HLA database (https://www.ebi.ac.uk/ipd/imgt/hla), published in January 2018 (3.31.0), was used for assigning alleles. WHO Nomenclature Committee for Factors of the HLA System guidelines were followed when reporting alleles, using National Marrow Donor Program (NMDP) codes and G codes for ambiguous alleles (i.e. those having an identical nucleotide sequence at the antigen recognition site in exon 2) 41 .

Statistical analysis.
Frequencies and percentages were used for qualitative variables and measures of central tendency (median and mean) for quantitative ones, along with their measures of dispersion (interquartile ranges (IQR) and standard deviation (SD).
An expectation-maximisation (EM) algorithm was used for obtaining HLA-DRB1 and DQB1 allele and haplotype frequencies. Persistence was considered as being the detection of the same type of HPV in at least two consecutive visits, whilst clearance was determined as no viral detection in two consecutive samples after a positive detection when analysing infection by each of the 6 HPV types (-16, -18, -31, -33, -45 and -58). Redetection was taken as being positive for the same viral type following its prior non-detection (regardless of infection clearance) (Supplementary Fig. S2B) 23,42,43 . As some women had had more than one positive HPV-related event during follow-up, specific HR-HPV type infections were taken as the unit of analysis (i.e. not the women).
Cox-proportional hazard and accelerated failure time (AFT) models were adjusted when the assumption of proportionality was not met to evaluate allele and HLA-DRB1 and DQB1 relationship with type-specific infection (clearance/redetection) outcome 44 . Multicollinearity between the variables included in the models was also evaluated by VIF and tolerance.
Every AFT models' goodness of fit was evaluated in line with Akaike information criterion (AIC) and Bayesian information criterion (BIC), selecting the model having the best fit (lowest AIC and BIC values) 45 . Hazard ratios (HR) and time ratios (TR) were reported, depending on the model used 46,47 . Independent models were run for each HPV type. TR values less than 1 denoted an earlier occurrence of an event, whilst values greater than 1 indicated a later occurrence of such event 46,47 .
A p < 0.200 value in univariate analysis was taken when selecting the independent variables for adjusting the models and the change in the crude estimator when added to the models. Origin, age, the amount of sexual partners, a background of abortion, coinfection (i.e. infection by 2 or more HR-HPV types) and absolute viral load (categorised as low: ≤9.99E + 5, middle: from 1.00E + 6 to 9.99E + 9 and high: ≥1.00E + 10 23 ) were taken as independent variables.
The Bonferroni method was used for correcting each model's raw p values, considering the multiple alleles and haplotypes identified in the study population and that there was no a priori hypothesis concerning the associated alleles/haplotypes 20,48 . Two-tailed tests were used for hypotheses testing (0.05 significance) and STATA14 software was used for the aforementioned analysis.
Predicting epitopes for HLA-II alleles. The Immune Epitope Database (IEDB) was used for predicting binding peptides (20 aa long) for every HLA-II allele 49 , using the Technical University of Denmark's Systems Biology Department's Center for Biological Sequence Analysis' NetMHCIIpan 3.2 server prediction method 50 . Peptides predicted to have a strong binding threshold (<5% Rank) were sought manually by aligning each HR-HPV type's complete L1 or L2 capsid protein reference sequences (GenBank codes: HPV16-L1: ANA05539.

Data availability
The datasets produced and/or analysed during this study are available from the corresponding author on reasonable request.