Molecular epidemiology and HIV-1 variant evolution in Poland between 2015 and 2019

The occurrence of HIV-1 subtypes differs worldwide and within Europe, with non-B variants mainly found across different exposure groups. In this study, we investigated the distribution and temporal trends in HIV-1 subtype variability across Poland between 2015 and 2019. Sequences of the pol gene fragment from 2518 individuals were used for the analysis of subtype prevalence. Subtype B was dominant (n = 2163, 85.90%). The proportion of subtype B-infected individuals decreased significantly, from 89.3% in 2015 to 80.3% in 2019. This was related to the increasing number of subtype A infections. In 355 (14.10%) sequences, non-B variants were identified. In 65 (2.58%) samples, recombinant forms (RFs) were noted. Unique recombinant forms (URFs) were found in 30 (1.19%) sequences. Three A/B recombinant clusters were identified of which two were A6/B mosaic viruses not previously described. Non-B clades were significantly more common among females (n = 81, 22.8%, p = 0.001) and heterosexually infected individuals (n = 45, 32.4%, p = 0.0031). The predominance of subtype B is evident, but the variability of HIV-1 in Poland is notable. Almost half of RFs (n = 65, 2.58%) was comprised of URFs (n = 30, 1.19%); thus those forms were common in the analyzed population. Hence, molecular surveillance of identified variants ensures recognition of HIV-1 evolution in Poland.

Multivariate logistic regression analysis of the whole dataset with inclusion of gender, transmission route, viral load at diagnosis, CD4 + T cell counts at baseline, nadir CD4 + T cell counts, age at diagnosis, and HIV infection stage, indicated that infection with HIV-1 non-B clades was independently associated with female gender ( Fig. 8). This was related to the increasing number of subtype A infections (from 5.6% in 2015 to 13.4% in 2019 (OR: 1.26, 95% CI 1.14-1.40, p < 0,0001 ). Interestingly, subtype D frequency has dropped from 1.9 to 0.3% (OR: 0.63, 95% CI 0.43-0.87, p = 0.008) in the analyzed years.
Additionally, the proportion of infections other than the B variants increased in asymptomatic cases from 12.1% in 2015 to 29% in 2018 and in AIDS patients from 6.5% in 2015 to 28.6% in 2018 (OR: 1.33, 95% CI 1.04-1.70, p < 0.05 and OR: 1.93, 95% CI 1.18-3.29, p < 0.01, respectively). In nearly half of the analyzed cases, data on the clinical category at diagnosis was missing. Accordingly, between 2015 and 2019, occurred a simultaneous increase of non-B infections occured in AIDS individuals and asymptomatic patients.
Also, among non-B variant infected individuals, the frequency of cases with lymphocyte CD4 count exceeding 200 cells/μl increased over time (from 12.5% in 2015 to 30.4% in 2018). The proportion of MSM was also significantly increased from 8.6 to 37.1% (OR: 1.53, 95% CI 1.20-1.95, p < 0.001. For the analysis of time trends of the last two parameters patients diagnosed in 2019 were excluded as their clinical data were not available.

Discussion
Our results showed that increasing A-clade incidence in Poland is driven mostly by the heterosexual transmission mode, age at diagnosis, and female gender. The decline in subtype B has been observed in Europe, but the spread of HIV-1 variants is different in each country. This study is the first to characterise the cohort of A6 sublineage on Poland's national scale. Together with evidence of novel A6/B URFs, this suggest considerable heterogeneity of HIV-1 in the polish population. Our results might support drug therapy interventions because infections with the A6 variant may bear resistance to cabotegravir, the long-acting antiviral soon to be implemented in the clinical practice 12 .
In this article, we delineated the distribution of HIV-1 variants in the period between 2015 and 2019 in a large group of diagnosed patients from Poland to reveal the countrywide molecular diversity of the virus. For the first time, based on phylogenetic studies, we report on the presence of the HIV-1 A6 sublineage in Poland. Subtype B remains the most common variant to date, shared by 85.9% of the surveyed population, and remains predominant among MSM. This observation is in line with our previous report on the period from 2008 to 2014, where 86.9% of recorded infections developed from HIV-1 B lineage 8 and other Polish HIV epidemic studies 7,13,14 . However, the proportion of non-B clade increased significantly between 2015 and 2019, primarily due to increased number of infections with sublineage A6. The decline in subtype B has been observed in Europe. It may be driven by the economic migration and refugee crisis and the increasing circulation of non-B subtypes among the native European population [15][16][17]

Dominant transmission rout, n (%)
• Intravenous drug use 4 (3,9) 0    [20][21][22] . In Poland, up to date, a small percentage of immigrants (< 5%) have been observed among the HIV-1 diagnosed individuals, however, the observed data suggest introduction and spread of novel variants in the country 23 . Molecular surveillance over A6 variants is of  www.nature.com/scientificreports/ importance due to possible emergence of resistance and virological failure after therapy containing the HIV-1 integrase strand transfer inhibitor cabotegravir 12,24 . In addition to A6 samples, ten samples of the A1 and one of the A3 variant were identified. The phylogeographic mixture of A sub-lineage variants has also been observed in Germany 22 and Italy 20 . Identification of three A/B recombination clusters is an important novel finding of this study. For the first time we identify two new clusters of A6B recombinant variants. Our former publication described one monophyletic www.nature.com/scientificreports/ group of A1B and a single sequence of A6B 8 . In the current analysis, A/B recombinants accounted for 5.9% of the non-B variants, while in the period 2008-2014, they were identified in 3.1% of sequences 8 . Thus, the frequency of mosaic sequences composed of subtypes A and B almost doubled over the analyzed years. Previously, we described one regional HIV center (Cracow) as the source of all clustered A1B sequences and the city of Bialystok as the origin of a single sample of the A6B mosaic form. In the present data, recombinant virus isolates were obtained from individuals followed up in seven cities, which are located across different provinces. Our analyses provide evidence that the described mosaic forms circulate in Poland and contribute to increasing genetic heterogeneity of non-B subtype strains. Although the sequences in A1B Cluster had similar recombination profile, the diversity of recombinant forms within the A6B Clusters was large. In A6B clusters, the number of recombination in the pol gene region vary from one up to four break events which reflect high recombination rate of HIV-1 variants. Since the subtype B and sub-subtype A6 are the most widespread in Poland, such a recombination was expected and might reflect the presence of a novel, yet unidentified CRF. Cluster sequences may likely implicate transmission networks. The analysis of phylogenetic clusters, together with epidemiological and demographic data, are important to understand the factors underlying the growth of epidemics. Others, have published that MSM are more likely to cluster compared to all other risk groups 25 . Indeed, all sequences in the A/B clusters derived only from male individuals and apparently indicate the presence of MSM transmission group. However, there was a shortage of information relative to transmission mode for those patients. Furthermore, gender is presumably related to the differences in HIV-1 clade distribution within exposure groups 26 . The lack of women in the A/B recombination clusters probably reflect the MSM and/ or IDU transmission mode. Both A6B clusters include A sequence regions with Eastern European ancestry. In the LANL-HIV CRF's database (https:// www. hiv. lanl. gov/ conte nt/ seque nce/ HIV/ CRFs/ CRFs. html), only one example of subtype B and A recombination (CRF03_AB) has been identified so far.
The differences in demographic characteristics across the analyzed samples are due to the increase in the female gender and dominance of the heterosexual transmission route among individuals infected with non-B clades. Our results indicate that the male-to-female ratio in the non-B clade is 3.30 compared to 6.43 in subtype B. In the past five years of study, HIV-1 infections were mostly driven through MSM-and IDU-associated clades and as a consequence, predominance of men and subtype B were observed. Many authors have published similar findings 25,26 . It should be noted that the increase in the proportion of non-B clades among females is connected with migration and travel related introduction of infections, especially in sublineage A6 viruses. The ratio of male-to-female correlates with the number of non-B subtypes because more males have been diagnosed with subtype B. The Silesian and Lower Silesian provinces had a low incidence of non-B subtypes infections result in a high (above five) male-to-female ratio. South of these regions, a similar proportion was noted in the Czech Republic, where the male-to-female ratio elevates the European average 23 . As observed in Podlaskie and West Pomerania lower male-to-female ratio (under three) was correlated with frequent A6 subtype occurrence.
Interestingly and importantly, a significantly higher lymphocyte count at baseline among non-B variant infected individuals is in contrast to our previous records 8 . This is also associated with the increasing frequency of subtype A. Individuals infected with those virus strains clearly undergo care entry at the earlier stages of infection, with less pronounced immunodeficiency. Additionally, female gender possesses better immunovirological parameters, which is higher CD4 + T cells following primary HIV-1 infection at the beginning of infection compared to males 27,28 .
Furthermore, among non-B subtype characteristics, notable time associated trends were observed. Firstly, the proportion of non-B clades among MSM is increasing. This is also reflected in the increasing frequency of males for non-B variants in the analyzed timeline, and may be associated with dense transmission networks and clustering of MSM 29 . Secondly, across HIV age at diagnosis, especially in young adults (under thirty and forty), non-B clade distribution is increased. Age at diagnosis is shifting towards younger individuals in non-B clades. This observation may correspond to the increasing trend in non-B variants frequency associated with MSM mode. Moreover, these findings could be related to less frequent testing of older heterosexual males.
Additionally, at the time of sampling differences in incidence among clinical symptoms in non-B variants were observed. Regardless of CDC clinical stage, approximately half of the data was missing in available clinical reports. Therefore, the representation of non-B clade asymptomatic and AIDS patients could mutually increase in the timeframe of the study. Patient clinical statuses at diagnosis vary due to prevention efforts and time of enrollment in care. The larger frequency of CD4 + cell count above 200 cells/μl at baseline and the diagnosis of patients at a younger age in the non-B clade may be associated with a higher proportion of registered asymptomatic patients. Moreover, a higher level of lymphocytes at HIV-1 diagnosis is noted in women than in men 28 . This point may explain the increase in non-B individual's frequency associated with higher lymphocyte count as females were notably more common among non-B subtypes.
On the other hand, we observed a growing prevalence of AIDS individuals within non-B variants. This result is in line with previous findings that in Poland, 44.8% of newly diagnosed patients are diagnosed late (LP) or in the AIDS stage. Factors associated with LP/AIDS are older age, IDU, and HTX 30 . Diagnosis several years after an infection led to declining levels of CD4 + cells at the time of sampling. Nevertheless, cell count below 200 per μl, when AIDS is diagnosed, remains in a quarter of newly registered patients 31 .
According to the "HIV/AIDS surveillance in Europe 2019" report, the highest age-specific HIV diagnosis values in both genders is among the 25-39 years old group. The results of our research meet these data. In the present study, the under thirty-years age group was the most prevalent and the under forty-years age group followed. Moreover, the total male to female ratio in the current studied population was 5.77, almost identical to the value in central Europe (5.6). Unfortunately, the above report did not include the subtype distribution 23 .
The principal limitation of the study was the method of sampling. Individuals submitted for the report were originated from HIV-1 Regional Centers for primary transmitted drug-resistance or treatment failure analysis and may not fully reflect the prevalence of infections in the overall population. Nevertheless, the data obtained are www.nature.com/scientificreports/ based on a significant number of cases. Samples collected for our study come from 10 out of 16 provinces inhabited by 65% of the population (https:// stat. gov. pl/ en). Furthermore, we did not have access to complete patient's documentation, so some parameters possessed fewer representations but still allowed for statistical modelling. The subtyping analyses performed in this study were based on only a fragment of the pol gene. Unfortunately, this makes detailed analysis of the sequence recombination profile impossible. Full HIV-1 sequencing would most likely allow to identify a higher number of recombinants and confirm the breakpoints in the identified ones. Furthermore, extensive characterization of the whole genome of the identified AB recombinants could provide valuable information about novel circulating variants. Subtype B remains the predominant HIV-1 variant in Poland. However, there was a significant increasing trend in the prevalence of non-B clades, mostly due to a rising number of sublineage A6 HIV-1 infections. Our results showed a high frequency of recombinations between these two most prevalent subtypes and a small number of other URFs. We have identified three separate A/B sequence clusters, and the formation of novel recombinants is evidence of increased HIV-1 heterogeneity. The comparison of subtypes and transmission groups with demographic parameters suggests that HIV-1 disease is highly diverse. Non-B variants are associated with heterosexual transmission, age at diagnosis, and female gender. The study will be further extended as the identification of the novel recombinants is part of molecular surveillance for HIV-1 and may influence drug susceptibility.

Methods
Study group. We used 2518 samples collected from patients who undergo genotypic drug resistance testing in 10 of 27 Regional AIDS Centers for this study (cities: Bialystok, Bydgoszcz, Cracow, Chorzow, Gdansk, Lodz, Poznan, Szczecin, Wroclaw, Zielona Gora). The dataset included sequences obtained from both naive (84.3%) and treatment-experienced patients (15.7%) (one sequence per individual; if multiple sequences were available the earliest one was included in the analysis). The time frame of sampling was from 2015 to 2019. Plasma samples were collected locally and shipped to the genotyping laboratory. The collected data included gender, CD4 + lymphocyte count, HIV viral load at sampling, age at diagnosis (age at first positive HIV confirmatory test), transmission route (self-defined by the patient) and WHO clinical stage at sampling. HIV-1 RNA isolation and sequencing of the reverse transcriptase (RT) and protease (PR) domains were carried out using the ViroSeq HIV-1 Genotyping System v 2.0 (Abbot Molecular, Des PlainesIL, USA). The integrase region was sequenced using the methodology described by Laethem et al. 32 . Amplicons obtained by the nested PCR method were used for Sanger sequencing using the BigDye technology on an ABI 3500 platform (Applied Biosystems, Foster City, CA, USA). Integrase sequence assembly was performed with the Recall online tool 33 . All samples were sequenced at the Clinical Laboratory at the Department of Infectious, Tropical Diseases and Immune Deficiency at Pomeranian Medical University in Szczecin, Poland. PR/RT sequences were available for all samples, with addition of the integrase coding region for 28 (1.1%) samples.
Ethics statement. The protocol of the study was approved by the Bioethical Committee of the Pomeranian Medical University, Szczecin, Poland, approval number (no. KB-0012/26/17 and KB-0012/08/12). All patients gave informed consent to the proceeding of the sample and clinical data processing to conduct this study. All samples in this study were de-identified to maintain participants anonymity. The research was conducted in accordance with the Declaration of Helsinki.
Subtyping and phylogenetic analyses. The 1302 nucleotide fragments of the pol gene were used for subtyping. This region includes the protease and part of the reverse transcriptase sequence (corresponding to HXB2 genome location positions from 2253 to 3554). When sub-lineage breakpoints were detected, we also performed sequencing of the integrase 866 bp-long fragments (covering sites 4230-5096 of the HXB2 genome). All sequences were initially assessed with REGAv3 (http:// www. bioaf rica. net/ typing-v3/ hiv), COMETv2 34 , SCUEAL 35 , and Stanford subtyping tools 36 and confirmed by phylogenetic analyses. For phylogenetic inference, sequences were aligned with Clustal Omega tool 37 . The GTR + I + G model with four gamma categories was selected as optimal for the analysed dataset using Modeltest-NG 0.1.3 software 38 . The calculated nucleotide frequencies using this model were as follows: freqA = 0.4373, freqC = 0.1556, freqG = 0.1840, freqT = 0.2228, gamma shape parameter = 0.6193, and p-inv = 0.2462. Reference sequences from LANL-HIV-1 compendium 2017 version were used for subtyping, and the dataset was further supplemented with sequences with high similarity (> 95%) from BLAST (Basic Local Alignment Search Tool) analysis. The DAMBE program was exploited for duplicates removal 39 . Phylogenetic trees were generated using PHYML v3.0 web-server (http:// www. atgc-montp ellier. fr/ phyml/) with the maximum likelihood (ML) method, Nearest Neighbor Interchange (NNI) type of tree rearrangement and using likelihood ratio test (aLRT) based on a Shimodaira-Hasegawa-like procedure 40 . First, the trees for subtype B and A were inferred, subsequently non-B and non-A variants as well as other recombinants were analyzed. Breakpoints were identified using two methods: the jumping-profile hidden Markov model (jpHMM) 41 and the Simplot v3.5.1 software using bootscanning with 300 bp window size and 20 bp increments, based on the NJ method using the 2-parameter Kimura model 42 . Mosaic viruses were confirmed by phylogenetic trees (PhyML online tool) containing breakpoint fragments with reference sequences to recognize parental subtypes. For alignment, fragments were treated as separate sequences with different bp length. Alignment was made to the reference sequences A-D, F-H, J, K. The length of the reference sequences was 2844 bp, which span continuously from the beginning of the PR gene to the end of the IN gene (2253-5096 bp of HXB2 genome) of the pol region. Thus, the phylogenetic analysis for the breakpoints fragments was independent of each other. Empty spaces in the analysed fragments were indicated as a "period" symbol. Such alignment serves to infer the phylogenetic tree with GTR + I + G model as described above. All trees have been prepared in the

Data availability
Submission of almost all regional cohorts to public databases would allow identification and risk violation of patient confidentiality. We have followed earlier practice 46 , and submitted a random sample of 10% of each regional HIV Centre sequence to GenBank. The sequences can be found under the appropriate IDs: MZ468643-MZ468894. Additionally, sequences for the identified AB recombinant clusters were deposited under following accession numbers: MZ671788-MZ671823.