Genetic divergence of HIV-1 B subtype in Italy over the years 2003–2016 and impact on CTL escape prevalence

HIV-1 is characterized by high genetic variability, with implications for spread, and immune-escape selection. Here, the genetic modification of HIV-1 B subtype over time was evaluated on 3,328 pol and 1,152 V3 sequences belonging to B subtype and collected from individuals diagnosed in Italy between 2003 and 2016. Sequences were analyzed for genetic-distance from consensus-B (Tajima-Nei), non-synonymous and synonymous rates (dN and dS), CTL escapes, and intra-host evolution over four time-spans (2003–2006, 2007–2009, 2010–2012, 2013–2016). Genetic-distance increased over time for both pol and V3 sequences (P < 0.0001 and 0.0003). Similar results were obtained for dN and dS. Entropy-value significantly increased at 16 pol and two V3 amino acid positions. Seven of them were CTL escape positions (protease: 71; reverse-transcriptase: 35, 162, 177, 202, 207, 211). Sequences with ≥3 CTL escapes increased from 36.1% in 2003–2006 to 54.0% in 2013–2016 (P < 0.0001), and showed better intra-host adaptation than those containing ≤2 CTL escapes (intra-host evolution: 3.0 × 10−3 [2.9 × 10−3–3.1 × 10−3] vs. 4.3 × 10−3 [4.0 × 10−3–5.0 × 10−3], P[LRT] < 0.0001[21.09]). These data provide evidence of still ongoing modifications, involving CTL escape mutations, in circulating HIV-1 B subtype in Italy. These modifications might affect the process of HIV-1 adaptation to the host, as suggested by the slow intra-host evolution characterizing viruses with a high number of CTL escapes.

To date, literature on HIV-1 evolutionary dynamics focused attention on the spread of drug resistance and on the potential that drug resistant mutations, together with compensatory mutations, had on increased fitness and virulence in patients never exposed to or starting cART [13][14][15] . Much has also been done in the field of CD8+ cytotoxic T-lymphocyte (CTL) related regions, their intra-host evolution and their relationship to clinical outcome [16][17][18] .
On the other hand, little is known regarding the extent of HIV-1 genetic modification over time, and regarding the role that CTL-mediated selection pressure has in inter-host HIV-1 evolution, even if some evidence has recently emerged 19,20 . Moreover, if and how fixation of CTL escapes may influence viral load and immunological parameters in the first phases of infection is still largely unknown.
Thus, in the present study, we investigated the extent of genetic divergence, CTL escape prevalence and intra-host evolution of the most prevalent HIV-1 subtype, the B form, in a cohort of HIV-1 infected patients diagnosed in Italy between 2003 and 2016.

Results
Study population. A total of 3,328 HIV-1 B subtype infected individuals with at least one polymerase (pol) sequence at HIV-1 diagnosis (range of weeks between HIV-1 diagnosis and pol sequencing: 0.8-5.0) were retrieved for this study.  (Table 1).

Role of epidemiological confounders in influencing HIV-1 B subtype variability and CTL escapes.
We next sought to investigate whether changes in risk factors, in patients' origin, and in sequences involved in transmission clusters (TCs) across time might interfere with HIV-1 B subtype increased variability. Overall results suggest that the HIV-1 B subtype increased variability and the spread of CTL escapes are not predominantly correlated with expanding clades over time.
We next investigated whether variation in the proportion of risk factors, and patients' origin across time might influence the HIV-1 B subtype increased variability. To do so, we restricted the analysis to a more homogenous population, composed of pol (N = 1,308) and V3 (N = 449) sequences belonging to MSM of Italian origin. The results confirmed the progressive and significant increase in genetic divergence from consensus B over time for   ).
Rates of HIV-1 intra-host evolution (95% confidence interval, CI) were consistently lower in pairs containing ≥3 CTL escapes than in those containing ≤2 CTL escapes ( . Consistent with these data, the selective   These findings might suggest a better HIV-1 intra-host adaptation for most recent viruses driving a higher number of CTL escapes.
Correlation between pol CTL escape mutants and viro-immunological parameters. By analysing a subgroup of 781 individuals characterized by recent infection or plasma viral load >10,000 copies/ml and CD4+T cells >500 cells/mm 3 , no significant correlations were found between CTL escapes and plasma viral load or CD4+T cell counts.
On the other hand, CD4/CD8 ratio significantly decreased in relation to the increased number of CTL escapes (from 1.64 ± 0.68 in patients infected by a virus with 0 CTL escapes to 0.69 ± 0.05 in patients infected by a virus with ≥5 CTL escapes, ρ = −0.100, P = 0.027) ( Table 5). The negative correlations between the number of CTL escapes in pol and lower CD4/CD8 ratio were confirmed also when only the recent infections were analysed (1.02 ± 0.18 for 0 CTL escapes vs. 0.74 ± 0.06 for ≥3 CTL escapes, ρ = −0.15, P = 0.067) ( Table 6).

Discussion
Our analysis based on HIV-1 B subtype pol (N = 3,328) and V3 (N = 1,152) sequences collected from a large number of HIV-1 newly diagnosed individuals in Italy in the time frame 2003-2016 indicates a progressive and significant increase in genetic and amino acid distance over time (with 18 amino acid positions characterized by entropy increase) for the most disseminated HIV-1 subtype in the world. This finding is also confirmed when only the subgroup of recently infected individuals is taken into account, confirming the higher variability of the new circulating viral strains belonging to HIV-1 B subtype compared with the older.
Despite the short time since its emergence (dated in 1944), HIV-1 B has generated successful epidemics in many countries in five continents, and the spread continues 21 . Due to the global extension of the HIV-1 B epidemic [22][23][24] , and the high evolutionary rates characterizing HIV 25 , a progressive diversity over time for this subtype has been reported both in North America 20    Of note, prevalence of CTL escapes significantly increased in the last time frame at five of these seven positions (PR: A71V; RT: S162ACQR, D177E, I202V, and R211K). Whereas entropy at positions 35 and 207 increased, V35I and Q207GE escapes did not change their prevalence concomitantly (V35I and Q207G/E prevalence: 18.8% and 19.1% in 2003-2006 vs. 16.8% and 19.4% in 2013-2016, P = 0.672 and P = 0.119, respectively). The reason for the entropy increase at these 2 positions may thus reside in other amino acid mutations not involved in CTL escapes, such as V35LTM and Q207HNADRK, whose prevalence increased from 7.9% and 8.1% in 2003-2006 to 13.4% and 10.7% in 2013-2016, respectively (P = 0.034 and P = 0.048). The role of these amino acid mutations is not clearly demonstrated, and thus further studies might be necessary to define their impact on HIV-1 B evolution and pathogenesis.
Notwithstanding the increase in sequence diversity observed over time in our dataset, we have to consider that HIV variation and immune escape process are also limited by structural and functional constraints 28,29 . In particular, the virus would have fewer viable options to generate escape variants in conserved protein regions because of the fitness cost required 30 . However, HIV overcomes this problem thanks to specific mechanisms that compensate for the fitness costs. For example, studies on gag polyprotein demonstrated the selection of compensatory mutations, which can restore the viral fitness. These mutations can occur both within 31 and outside the epitope, but nearby in the three-dimensional protein structure 32,33 .
In this respect, the increased prevalence of some amino acid variants in pol and their correlation in mutational pathways (such as for S162ACQR and D177E) may suggest a compensatory role played by these mutations, which can therefore explain their frequent occurrence, rather than by chance in the circulating viral population. Its effect may positively influence the fixation of a specific CTL escape at population level 34 . In line with this, the prevalence of sequences with ≥3CTL escapes progressively increased in both overall population and recent infections, reaching 54.0% and 56.2% in 2013-2016, respectively. This may represent an evolutionary advantage with consequences for disease progression, infectivity, transmissibility, and response to antiviral treatments 35 .
Moreover, the ability of HIV-1 immune escape mutations to accumulate rapidly could undermine host antiviral immunity [36][37][38] , and thus in turn have immunological consequences at population level. In this respect, our study found that the higher number of CTL escapes in pol frequently correlated with lower CD4/CD8 ratio (delta CD4/CD8 ratio = −0.95) at HIV-1 diagnosis, suggesting an increased pathogenic potential of these viral strains. However, these data need to be interpreted with caution, especially because of the lack of HLA information.
The impossibility to match CTL escape variants with HLA types of the host also represents a limitation for the estimation of the CTL escape prevalence. It is known that CTL escape prevalence may differ between HLA mismatched and HLA matched people 39,40 , and that the fixation of some HIV-1 polymorphisms at population level strongly depends on population characteristics 41 . To partially overcome the lack of HLA information, we have focused attention on those CTL escapes that significantly increased over time, such as the PR A71V and the RT D177E (related to HLA-B*1503), the RT S162ACQR (related to HLA-A*1101 -A*8601, -A*0301, -B*38 and B*0702), I202V (related to HLA-B4001), and R211K (related to HLA-A*2501). By matching these data with the prevalence of HLA in Italian population, we found that all the mentioned HLA are well represented in Italy. In particular, the HLA-A*1101, -A*2501 and -A*0301 represent 16% of the overall HLA-A locus, while the HLA-B*5101, -B*0702, -B*4001, and -B*03 16% of the overall HLA-B locus 42 . Thus, the spread of the previously mentioned CTL escape polymorphisms at the population level in Italy may be explained by the relative diffusion of matched HLA in the general Italian population.
However, the persistence of CTL escapes is not always dependent on matched HLA. Specific escape mutations persist upon transmission to HLA unmatched hosts, perhaps owing to a lack of fitness costs 43,44 . A recent mathematical model shows that the median time to escape in HLA matched individuals across the 26 epitopes considered is 8 years, and the reversion rates is a little slower (median time to reversion: 6.5 years) with no evidence of reversion in 56% of CTL epitopes, suggesting that CTL escapes are able to spread rapidly at population level independently of HLA 45 .
The role of CTL escapes in the process of HIV adaptation to the host was also confirmed by the estimation of the intra-host evolution. In particular, by analysing a subgroup of individuals with two pol sequences available, we found that the rate of intra-host evolution and the consequent selective pressure were significantly lower for most recent HIV-1 diagnoses carrying ≥3 CTL escapes than older diagnoses carrying fewer CTL escapes (3.2 × 10 −3 vs. 4.5 × 10 −3 ). In the context of the already known slow HIV-1 pol evolution 46 , the slower intra-host evolution found for most recent circulating viruses driving a higher number of CTL escapes may suggest that HIV-1 had improved its adaptation to the host in recent years compared with the past, even if the adaptation process is still occurring.
Besides CTL escapes, further attention should be given to neutralizing epitope escapes, of which the V3 loop is rich. In our study, we found that while in pol entropy values increased also within CTL epitopes (such as PR position 71 and the RT positions 35, 162, 177, 202, 207 and 211), in V3 higher mutation rates occurred almost exclusively within previously described neutralizing epitopes 47 , and at residue 19 already associated with R5 tropism 48 . We are conscious that our analysis is limited to a very short region of env, and that further studies taking into account a longer env fragment encompassing at least all the variable loops and conserved regions of gp120 may be necessary. Similar studies will be of extreme interest in defining the tracing of amino acid modifications in HIV-1 domains involved in vaccine strategies and co-receptor affinity.
Looking at potential epidemiological confounders able to influence HIV-1 B subtype variability and CTL escapes over time, we found that neither risk factor nor patients' origin, nor TCs strictly (and exclusively) impact HIV-1 B subtype variability over time. Doing these sub-studies yielded results mainly consistent with the overall analysis, suggesting that HIV-1 B subtype varied over time, regardless of these confounders.
Nevertheless, we cannot definitely exclude that the relevant epidemiological changes occurring in Italy in recent years, such as the migratory waves from low-middle income areas 49 , or the time-dependent influx of sequences from different risk groups might contribute (even if not predominantly) to HIV-1 B subtype variation in this country. For example, in our dataset individuals of "unknown" origin and "other or unknown" risk factors Finally, as the article is mainly focused on pol, the introduction of antiretroviral therapies merits attention (RT inhibitors were introduced in the late 1980s and PR inhibitors in the mid 1990s). Indeed, two out of the 18 positions characterized by entropy increase, the PR 33 and 71, are involved in drug resistance. The observed entropy increase was mainly due to the mutations L33F/I and A71VTIL, whose prevalence increased in the last two time frames ( Both L33F and A71VTIL are PI-selected accessory mutations, associated with a minimal or null reduced susceptibility to PIs 51,52 . Even if some of these mutations such as L33I and A71VT may also appear as natural polymorphisms or CTL escapes (such as A71V), we cannot exclude that their increased prevalence in the last two time frames may be an indicator of viral adaptation to PI drug exposure.
In conclusion, this study, based on HIV-1 B subtype sequences diagnosed in Italy, provides additional evidence that HIV-1 is still adapting to host, and that this adaptation may include CTL escape mutations. These findings can help in making accurate maps of escape pathways and in defining optimal induction CTLs strategies that may be effective in controlling HIV-1 replication, which may have implications for immunotherapeutic interventions. Moreover, this paper poses the bases for further studies involving viral evolution and viral adaptation focused on rapidly expanding epidemics such as those currently occurring in Africa and Asia, mainly driven by recombinant forms and non-B pure forms.

Materials and Methods
Study population. This study was conducted on 3,328 pol sequences (containing the full-length PR and the first 270 RT codons; nucleotides length: 1,107) and 1,152 V3 sequences (nucleotides length: 105; amino acid length: 35) obtained from HIV-1 B subtype newly diagnosed infected individuals (one sequence per patient) in Italy. All patients were naïve to cART and 210 of them were HIV-1 recently infected. Pol and V3 sequences were performed from plasma samples for routine clinical purposes from 2003 to 2016 and from 2007 to 2016, respectively, immediately after the diagnosis, and then collected in an anonymous database 53,54 . HIV-1 B subtype for pol and V3 sequences was firstly determined by Stanford DB (https://hivdb.stanford.edu/) and geno2pheno algorithm (https://coreceptor.geno2pheno.org/), respectively, and further confirmed by phylogenetic analysis based on pol, as described previously 55 .
Recent infections were determined by: (i) clinical/laboratory signs of primary HIV infection (HIV-1 plasma viral load levels >10,000 copies/mL and negative or indeterminate HIV-1 antibody test); (ii) a documented negative HIV-1 test performed within the 6 months prior to the HIV-1 diagnosis; and (iii) an antibody avidity index ≤0.80 (test performed only in clinically AIDS free individuals) 56 To identify potential patterns of pairwise correlations between specific CTL escape mutations, a binomial correlation coefficient (phi) and its statistical significance for each pair of mutations were calculated. Average linkage hierarchical agglomerative cluster analysis was performed to investigate potential evolutionary pathways among CTL escape mutations 60  CTL escapes were divided into 6 pre-defined categories (0, 1, 2, 3, 4, and ≥5 CTL escapes) for the overall population and into 4 pre-defined categories (0, 1, 2, and ≥3 CTL escapes) for recently infected individuals.
Genetic-distance, dN and dS values, and number of CTL escapes were also evaluated in 1) the subgroup of 1,308 pol and 449 V3 sequences belonging to MSM of Italian origin, 2) the subgroup of 1,119 pol sequences involved in TCs compared with the subgroup of 2,209 pol sequences out of TCs for the purpose of investigating whether risk factors, patients' origin, and sequences in TCs across time might interfere with obtained results. Potential TCs (defined as clusters of ≥3 pol sequences) were identified by HIV-TRACE using a Tamura-Nei model, and a genetic distance cut-off ≤0.015 63 .
Kruskal-Wallis and Chi-squared test for trend were used to estimate significant changes over time periods, while Pearson's Chi-squared test and Mann Whitney test were used to estimate significant changes between two different time periods.
Spearman correlation and Kruskal-Wallis test were also used to estimate significant correlations between viro-immunological parameters and number of CTL escapes. In order to ensure, wherever possible, exclusion of individuals with a late diagnosed chronic infection, these analyses were restricted to a subgroup of 781 individuals including the documented recent infections (N = 210) and individuals characterized by plasma viral load >10,000 copies/ml and CD4+T cells >500 cells/mm 3 (N = 571). Viral load and CD4+T cells cut-offs were chosen in accordance with Pantazis N. et al. Lancet HIV 2014 64 .
Again, only P ≤ 0.05 after Benjamini-Hochberg correction 62 were considered statistically significant. All the analyses were performed using the R open source statistical environment (version 3.3.1) and SPSS (version 23) for Windows (SPSS Inc., Chicago, Illinois).

Evolutionary rate estimation.
To determine if year of diagnosis and number of CTL escapes may affect the intra-patient evolutionary rate of HIV-1, a subgroup of 153 patients with two pol sequences available before therapy initiation (time-window between sequences within 70 weeks) were analyzed by using HyPhy pol evolution package 65 . All sequences from each patient were arranged chronologically and the probability of evolving from the oldest to the newest sequence was estimated using maximum likelihood method. In particular, the pol sequences of individuals diagnosed after 2009 and with ≥3 CTL escapes were compared with pol sequences of individuals diagnosed before 2009 and with ≤2 CTL escapes. Branch lengths between the two sequences were assigned based on the difference between sampling dates. For each individual, we determined the nucleotide substitution rate by using a General Time Reversible (6 substitution rates, empirical frequencies) model, assuming strict molecular clock. Significance was assessed using the likelihood ratio test (LRT) with N-2 degrees of freedom. The evolutionary rate per site per year within host were tested for equality using the likelihood-ratio test (LRT) 66 . Sequence data are available in the Supplementary Dataset 1.