Introduction

Human immunodeficiency virus type 1 (HIV-1) has caused a pandemic, aided by substantial genetic variation that results from a high replication rate (viral doubling time of 0.65 days)1 and the lack of proofreading activity of reverse transcriptase (RT), which generates a considerable error rate of (4.1 ± 1.7) × 10⁻³ per base per cell2. Four phylogenetically distinct groups (O, P, N, M) have been identified so far3, and the predominant group M (main) has evolved into nine subtypes (A–D, F–H, J, K), of which subtypes A and F have been further subdivided into sub-lineages (A1–A4, A6, and F1, F2)4.

Furthermore, inter-subtype recombination events within a single infected individual result in the emergence of novel variants such as circulating recombinant forms (CRFs) and unique recombinant forms (URFs). A new CRF is defined when a monophyletic clade with the same inter-subtype structure is found among at least three epidemiologically unlinked individuals, whereas URFs are observed less frequently, with no evidence of onward transmission in the population. To date, according to the Los Alamos National Laboratory HIV database (LANL-HIV), 104 CRFs have been reported. The most recent, found in China, is composed of CRF01 and CRF07, indicating secondary CRF formation (CRF104_0107)5.

The most common variant of HIV-1 observed worldwide remains subtype C (46.6% of the infected global population), followed by subtype B (12.1%) and subtype A (10.3%). In western and central Europe, the most ubiquitous are subtypes B (83.3%), C (3.9%), and A (1.9%)6. In eastern Europe, the proportions of subtypes B and A are reversed (A, 52.8% vs. B, 17.4%).

From the first recorded case in 1985 until the end of 2019, 25,544 people were diagnosed with HIV-1 in Poland. Infection prevalence is stable at 0.1% of the population. In recent years, men aged 30–39 years have predominated among infected individuals, with the primary route of transmission being men who have sex with men (MSM)7. Around 1200 new cases are recorded annually in the country, with a slowly increasing trend. In Poland, thus far, subtype B has been the most prevalent (86.9%), followed by subtype A (5.2%)8. There are also non-B monophyletic clusters containing a range of divergent variants and recombinants9.

Genetic variability of HIV-1, reflected in subtypes and recombinant forms, impacts the rate of disease progression, virus tropism, and patterns of drug resistance10. Therefore, surveying subtype diversity across populations provides valuable information about virus spread11. In this study, we aimed to present molecular surveillance data on HIV variant evolution in recent years, including the transmission route and the clinical characteristics of non-B clades. For the first time, we include sub-lineages of variant A in the Polish epidemic nomenclature. Moreover, we performed a detailed analysis of recombination breakpoints to reveal unique recombinant clusters.

Results

Prevalence of HIV-1 subtypes and recombinants

In the study group, subtype B was dominant (n = 2163, 85.90%). In 355 (14.10%) sequences, non-B subtypes (A6, n = 218, 8.66%; D, n = 27, 1.07%; C, n = 26, 1.03%; A1, n = 10, 0.40%; G, n = 6, 0.24%; F1, n = 2, 0.08%; and A3, n = 1, 0.04%) were identified (Fig. 1). As the online subtyping tools did not allow proper identification of subtype A sub-lineages, a tree including the reference and regional sub-subtype A sequences was inferred (Fig. 2). The phylogenetic tree indicated that most of the analyzed A6 variants cluster with local reference sequences from Poland, Ukraine, the Czech Republic, and Russia. In 65 samples (2.58%), recombinant forms were noted, including CRF02_AG (n = 13, 0.52%), CRF01_AE (n = 8, 0.32%), CRF03_AB and CRF12_BF (n = 3, 0.12% each), and CRF60_BC (n = 2, 0.08%). A single case (0.04%) was observed for each of CRF06_cpx, CRF07_BC, CRF42_BC, CRF47_BF, CRF53_01B, and CRF56_cpx. A phylogenetic tree containing all non-B, non-A viruses, with the exception of the URF variants, is shown in Supplementary Figure S1. URFs were found in thirty (1.19%) sequences, of which 22 (0.87%) were A/B recombinants (Fig. 3). Four of the remaining sequences were B/F1 recombinants and the last four were composed of subtypes CRF02_AG/H/G; A1/D/B; F2/CRF02_AG; CRF11_cpx/CRF02_AG (Supplementary Figure S2).
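The percentages above follow directly from the reported counts over the 2518 analyzed sequences. A minimal Python sketch of that arithmetic is given below; the dictionary layout and the helper name `percent` are illustrative choices, and the CRF total of 35 is simply the sum of the individual CRF counts listed in the text.

```python
# Counts as reported in the text; the denominator is the whole study group.
TOTAL = 2518

counts = {
    "B": 2163, "A6": 218, "D": 27, "C": 26, "A1": 10, "G": 6, "F1": 2, "A3": 1,
    "CRF": 13 + 8 + 3 + 3 + 2 + 6,  # CRF02_AG, CRF01_AE, CRF03_AB, CRF12_BF, CRF60_BC, six single cases
    "URF": 30,
}

def percent(n, total=TOTAL):
    """Share of `n` sequences in the study group, in percent (two decimals)."""
    return round(100 * n / total, 2)

shares = {name: percent(n) for name, n in counts.items()}
non_b = TOTAL - counts["B"]                      # all sequences that are not subtype B
recombinants = counts["CRF"] + counts["URF"]     # CRFs plus URFs

print(shares)
print("non-B:", non_b, percent(non_b))
print("recombinant forms:", recombinants, percent(recombinants))
```

The categories add up to the full study group, which is a quick consistency check on the reported breakdown.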

Figure 1

Distribution of HIV-1 variants in Poland between 2015 and 2019 according to phylogenetic and recombination analyses. RFs (recombinant forms) comprise CRFs (circulating recombinant forms) and URFs (unique recombinant forms).

Figure 2

Phylogenetic tree inferred using the maximum likelihood method with HIV references (only subtype A, n = 15; CRF01_AE, n = 10; and CRF02_AG, n = 10) from the HIV sequence compendium 2017, supplemented with subtype A sequences deposited in the HIV-1 sequence database (n = 71) and with 229 A-clade sequences (partial HIV-1 pol gene, 1302 bp) found in the studied population. Branches containing the CRF01_AE and CRF02_AG clades have been collapsed. The tree was rooted with group O (accession no. KY953205); however, this root was removed from the final figure. The figure was made using iTOL (43).

Figure 3

Phylogenetic tree of partial HIV-1 pol sequences (1302 bp) inferred using the ML method with HIV-1 references from the HIV sequence compendium 2017 (all subtype A and B sequences and CRFs with at least one fragment of either the A or B variant), supplemented with subtype A and B recombinant sequences deposited in the HIV-1 sequence database and with 21 unique recombinant forms found in the studied group. White highlights indicate A/B recombinant sequences. The black triangles mark the URF sequences found in our previous publication (from 2008–2014). Branches containing the same HIV-1 clade have been collapsed. The tree was rooted with group O (accession no. KY953205); however, this root was removed from the final figure. The figure was made using iTOL (43).

Distribution of subtypes over the analyzed regions

The incidence of non-B subtypes between 2015 and 2019 varied from 5.88% in the mid-northern region (Kuyavian-Pomeranian) to 25.75% in West Pomerania. In six of the ten investigated regions, non-B clades had significantly higher proportions compared to the entire studied sample (Fig. 4 and Supplementary Table 1). In north-western Poland, the increase in non-B variants (OR: 2.42, 95% CI 1.82–3.23, p < 0.001) was mainly due to the increased representation of subtype A (OR: 1.96, 95% CI 1.34–2.78, p < 0.001) and subtype D (OR: 27.9, 95% CI 11.1–69.6, p < 0.001). A similar situation occurred in three provinces, where the lower share of subtype B was driven by a significantly larger share of subtype A infections compared to the whole analyzed population. First, in Podlaskie, subtype B accounted for 78.3% (OR: 0.58, 95% CI 0.35–0.95, p < 0.05) and subtype A for 15.5% (OR: 1.88, 95% CI 1.07–3.33, p < 0.05). Second, in Lesser Poland, subtype B accounted for 80.9% (OR: 0.63, 95% CI 0.48–0.83, p < 0.001) and subtype A for 13.5% (OR: 1.76, 95% CI 1.28–2.40, p < 0.001). Lastly, in Pomerania, the larger share of subtype A (12.3%; OR: 1.62, 95% CI 1.12–2.35, p < 0.05) was associated with a share of subtype B (82.3%; OR: 0.73, 95% CI 0.53–0.99, p < 0.05) below the national value. In contrast, in south-western Poland, two regions had a notably high prevalence of the B clade compared to the total studied population. In Upper Silesia, subtype A contributed only 3.9% (OR: 0.34, 95% CI 0.22–0.54, p < 0.001), with a high share (93.8%) of subtype B infections (OR: 2.96, 95% CI 2.06–4.25, p < 0.001). Similarly, Lower Silesia had a large proportion of subtype B (90.9%; OR: 1.81, 95% CI 1.31–2.49, p < 0.001) and a small proportion of subtype A (5.3%; OR: 0.50, 95% CI 0.33–0.76, p < 0.001).

Figure 4

Distribution of subtype B, subtype A and all non-B, and non-A variants across analysed provinces.

Characterization of the A/B clusters

Identified monophyletic clusters

Phylogenetic analysis revealed that the identified A/B variants formed three monophyletic clusters. These three clusters contained sequences obtained from males of Polish citizenship and Caucasian ethnicity, but no information was available on epidemiological relationships between the patients. The A6/B Cluster-I comprised three sequences identified in the city of Poznan and one each in the centers of Wroclaw and Zielona Gora. The A6/B Cluster-II contained nine isolates from five distinct provinces (Wroclaw = 3, Lodz = 2, Cracow = 2, Chorzow = 1, and Bialystok = 1). The A1/B Cluster contained six sequences obtained from patients followed up in the city of Chorzow and one from Cracow. Furthermore, one unclustered A6/B URF sequence (no. 258-19WR) was found. A BLAST search analysis indicated two different pol regions with high homology to sequences within the A6/B Cluster-II and four to sequences within the A1/B Cluster. The sequences for PR, RT, and IN (2168 bp in total) were used in the bootscanning analysis to reveal the recombination profiles of all three A/B clusters.

Cluster I

Phylogenetic and bootscanning analyses of fragments from the A6/B Cluster-I showed a complex breakpoint profile (Fig. 5; Supplementary Figures S3 and S4). In two sequences (202-17PN and 305-17WR), one breakpoint was observed in the PR_RT region (subtype B to A6, between positions 2595 and 2774 bp of the HXB2 genome), and the entire IN sequence was identified as the A6 lineage. In one sequence (83-20PN), two breakpoints were observed: the first in the PR_RT region (subtype B to A6, between 2498 and 2521 bp) and the second in the IN sequence (subtype A6 to B, between 4788 and 4800 bp). In the last two sequences, five subregions were identified, corresponding to four break events. The following genome fragments and subtypes were identified: for the 106-20PN isolate, subtype B included the fragments 2253–2533, 3012–3554, and 4786–5096 bp, whereas subtype A6 included the fragments 2534–3011 and 4230–4785 bp; for the 133-19ZG isolate, subtype B included the fragments 2253–2512, 4230–4427, and 4795–5096 bp, whereas subtype A6 included the fragments 2513–3554 and 4428–4794 bp.
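The mosaic structure of the last two isolates can be written down as ordered subtype segments. The sketch below is only an illustration built from the coordinates quoted above; `Segment`, `build_mosaic`, and `count_breaks` are hypothetical helper names, and a subtype change across the unsequenced gap between the PR_RT and IN fragments is counted as one break event, which is an assumption that reproduces the "five subregions, four break events" description.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    start: int     # first HXB2 position of the fragment (inclusive)
    end: int       # last HXB2 position of the fragment (inclusive)
    subtype: str

def build_mosaic(fragments):
    """Flatten {subtype: [(start, end), ...]} into segments ordered by position."""
    segments = [Segment(start, end, subtype)
                for subtype, spans in fragments.items()
                for start, end in spans]
    return sorted(segments)

def length(segment):
    """Fragment length in bp, counting both end positions."""
    return segment.end - segment.start + 1

def count_breaks(segments):
    """Number of subtype changes between consecutive segments."""
    return sum(a.subtype != b.subtype for a, b in zip(segments, segments[1:]))

# Coordinates as quoted in the text (HXB2 numbering).
ISOLATES = {
    "106-20PN": {"B": [(2253, 2533), (3012, 3554), (4786, 5096)],
                 "A6": [(2534, 3011), (4230, 4785)]},
    "133-19ZG": {"B": [(2253, 2512), (4230, 4427), (4795, 5096)],
                 "A6": [(2513, 3554), (4428, 4794)]},
}

for name, fragments in ISOLATES.items():
    mosaic = build_mosaic(fragments)
    pattern = " | ".join(f"{s.subtype}:{s.start}-{s.end}" for s in mosaic)
    print(name, "-", count_breaks(mosaic), "break events:", pattern)
```

Sorting by start position is enough here because the quoted fragments do not overlap.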

Figure 5

Recombination pattern of the five partial HIV-1 pol sequences from the identified A6B Cluster I. The analysed region includes the protease with a fragment of the reverse transcriptase gene (2253–3554 bp) and the integrase gene (4230–5096 bp). Recombination breakpoints (according to HXB2 genome positions) were obtained using Simplot v3.5.1 software and the jumping-profile hidden Markov model (jpHMM). Blue indicates HIV-1 subtype B and red indicates sub-subtype A6. Component subtype regions are labeled from 1 to 5 on the genome map, corresponding to the numbering on the phylogenetic trees in Supplementary Figure S3. The mosaic map was generated using the Recombinant HIV-1 Drawing Tool (https://www.hiv.lanl.gov/content/sequence/DRAW_CRF/recom_mapper.html).

Cluster II

There were no breakpoints within the IN region among the analyzed HIV-1 sequences that formed the A6/B Cluster-II (Fig. 6; Supplementary Figures S5 and S6). Two sequences (166-15BI and 323-16KR) had a single breakpoint in PR_RT (subtype B to A6 over 3288–3303 bp) and were of subtype B for the IN region. In these cases, further breaks must have occurred in the 677 bp fragment of the pol gene that we had not sequenced. In the remaining isolates (249-19WR, 328-16KR, 33-17L, 604-16WR, 30-17L, 200-18CH, and 102-19WR), one break occurred in the PR_RT sequence (subtype B to A6 break from 2598 to 3221 bp). The IN region of five of these variants (328-16KR, 604-16WR, 30-17L, 200-18CH, and 102-19WR) was of the A6 sublineage; in two cases (249-19WR and 33-17L) this fragment was not sequenced.

Figure 6

Recombination pattern of the nine partial HIV-1 pol sequences from the identified A6B Cluster II. The analysed region includes the protease with a fragment of the reverse transcriptase gene (2253–3554 bp) and the integrase gene (4230–5096 bp). In two cases (33-17L, 249-19WR) the IN gene was not sequenced. Recombination breakpoints (according to HXB2 genome positions) were obtained using Simplot v3.5.1 software and the jumping-profile hidden Markov model (jpHMM). Blue indicates HIV-1 subtype B and red indicates sub-subtype A6. Component subtype regions are labeled from 1 to 3 on the genome map, corresponding to the numbering on the phylogenetic trees in Supplementary Figure S5. The mosaic map was generated using the Recombinant HIV-1 Drawing Tool (https://www.hiv.lanl.gov/content/sequence/DRAW_CRF/recom_mapper.html).

Cluster III

The A1B Cluster comprises seven newly described isolates (252-18KR, 359-18CH, 512-15CH, 555-16CH, 93-16CH, 406-15CH, and 114-15CH) that had a very similar pattern of recombination. All IN sequences were subtype B. Within the PR_RT sequences, a short (351 to 415 bp long) insertion of the A1 fragment was noted (Fig. 7; Supplementary Figure S7 and Fig. S8).

Figure 7

Recombination pattern of the seven partial HIV-1 pol sequences from the identified A1B Cluster. The analysed region includes the protease with a fragment of the reverse transcriptase gene (2253–3554 bp) and the integrase gene (4230–5096 bp). Recombination breakpoints (according to HXB2 genome positions) were obtained using Simplot v3.5.1 software and the jumping-profile hidden Markov model (jpHMM). Blue indicates HIV-1 subtype B and red indicates sub-subtype A1. Component subtype regions are labeled from 1 to 4 on the genome map, corresponding to the numbering on the phylogenetic trees in Supplementary Figure S7. The mosaic map was generated using the Recombinant HIV-1 Drawing Tool (https://www.hiv.lanl.gov/content/sequence/DRAW_CRF/recom_mapper.html).

Clinical characteristics of the identified HIV-1 variants

Non-B subtypes were notably more common among females (n = 81, 22.8% for non-B variants versus n = 291, 13.5% for subtype B; p = 0.001). Specifically, compared to subtype B, a higher proportion of women was observed for the A (n = 45, 19.7%; p = 0.01), D (n = 18, 66.7%; p = 0.001), and G (n = 3, 50%; p = 0.036) subtypes. Non-B variants were also more frequent among heterosexually infected (HET) individuals (n = 45, 32.4%) compared to MSM (n = 88, 63.3%; p = 0.0031) and injection drug users (IDU) (n = 6, 4.3%; p < 0.0031), especially for subtype D (n = 4, 66.7%; p = 0.001) and CRF02_AG (n = 2, 100%; p = 0.033) (detailed in Table 1).
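To make the female versus non-B association concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from the counts quoted above. It rests on two assumptions that the text does not state: that gender was recorded for every sequence, so male counts are totals minus females, and that a Wald interval is an acceptable approximation. It is not the authors' statistical procedure, and the crude value is expected to differ from the adjusted estimate of the multivariate model.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts quoted in the text.
female_nonb, total_nonb = 81, 355
female_b, total_b = 291, 2163

# Assumption: no missing gender data, so males = total - females.
male_nonb = total_nonb - female_nonb
male_b = total_b - female_b

or_female, ci_low, ci_high = odds_ratio_wald(female_nonb, male_nonb, female_b, male_b)
print(f"unadjusted OR = {or_female:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An interval that excludes 1 agrees in direction with the reported association between female gender and non-B infection.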

Table 1 Group characteristics across the analysed subtypes and differences compared to the subtype B.

Overall, cases infected with non-B clades presented with significantly higher CD4+ T-lymphocyte counts at care entry (median: 454 cells/μl, IQR: 244–629) compared to the B lineage (median: 373 cells/μl, IQR: 176–572). Similarly, nadir CD4+ lymphocyte counts were higher for non-B variants (median: 444 cells/μl, IQR: 249–586) than for subtype B (median: 358 cells/μl, IQR: 174–541). These differences were driven mainly by subtype A (median CD4+ lymphocyte count: 457 cells/μl, IQR: 291–630, p = 0.006; median nadir CD4+ lymphocyte count: 451 cells/μl, IQR: 282–579, p = 0.015) as well as CRF01_AE (median CD4+ lymphocyte count: 651 cells/μl, IQR: 570–842, p = 0.025; median nadir CD4+ lymphocyte count: 651 cells/μl, IQR: 570–842, p = 0.015). Lastly, individuals infected with subtype D were older (median: 40 years, IQR: 39–58) at the time of diagnosis than subtype B cases (median: 35 years, IQR: 29–42). HIV-1 viral load at diagnosis among patients with rare CRF and URF variants was higher (mean: 5.5 log copies/ml, IQR: 4.5–6.5) than among subtype B infected cases (median: 4.6 log copies/ml, IQR: 3.7–5.5).

Multivariate logistic regression analysis of the whole dataset, including gender, transmission route, viral load at diagnosis, CD4+ T cell counts at baseline, nadir CD4+ T cell counts, age at diagnosis, and HIV infection stage, indicated that infection with HIV-1 non-B clades was independently associated with female gender (OR: 1.69, 95% CI 1.22–2.34, p = 0.002), higher age at diagnosis (OR: 1.03, 95% CI 1.002–1.052, p = 0.032) and HIV-1 viral load at diagnosis (OR: 1.35, 95% CI 1.05–1.74, p = 0.02).

Temporal trends for subtype frequency

Over the observation period, the proportion of subtype B infections decreased significantly, from 89.3% in 2015 to 80.3% in 2019 (OR: 0.85, 95% CI 0.78–0.92, p = 0.0001) (Table 2, Fig. 8). This was related to the increasing number of subtype A infections (from 5.6% in 2015 to 13.4% in 2019; OR: 1.26, 95% CI 1.14–1.40, p < 0.0001). Interestingly, subtype D frequency dropped from 1.9 to 0.3% (OR: 0.63, 95% CI 0.43–0.87, p = 0.008) in the analyzed years.
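If the trend odds ratios are read as the change in odds per calendar year (an assumption; the coding of the year variable is not spelled out here), they can be roughly checked against the first and last yearly percentages. The sketch below derives the per-year odds ratio implied by the 2015 and 2019 endpoints alone. The regression uses all five years, so only approximate agreement should be expected; `implied_annual_or` is a hypothetical helper name.

```python
def odds(p):
    """Convert a proportion (0 < p < 1) to odds."""
    return p / (1 - p)

def implied_annual_or(p_start, p_end, years):
    """Per-year odds ratio implied by two endpoint proportions `years` apart."""
    return (odds(p_end) / odds(p_start)) ** (1 / years)

YEARS = 2019 - 2015  # four yearly steps

subtype_b = implied_annual_or(0.893, 0.803, YEARS)   # 89.3% -> 80.3%
subtype_a = implied_annual_or(0.056, 0.134, YEARS)   # 5.6% -> 13.4%

print(f"subtype B: about {subtype_b:.2f} per year (reported OR 0.85)")
print(f"subtype A: about {subtype_a:.2f} per year (reported OR 1.26)")
```

Both endpoint-based values fall inside the reported 95% confidence intervals, which is consistent with the per-year reading.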

Table 2 Time trends for non-B subtypes across transmission category, gender, clinical category of HIV infection and lymphocyte CD4 count.
Figure 8

Logistic regression estimates for time trends between 2015 and 2019 for subtype B and subtype A, together with pie charts of subtype prevalence each year. Dots indicate the percentage per year and vertical bars the 95% confidence intervals for the percentages, with a logistic regression time trend line and dark-grey shaded 95% confidence intervals for the regression estimate. Only limited numbers of subtypes other than B and A were found each year; these non-B, non-A subtypes were grouped as "other" on the pie charts.

In the non-B variant infected population, the proportion of the male gender increased steadily from 8.7% in 2015 to 17.4% in 2019 (OR: 1.20, 95% CI 1.09–1.32, p < 0.001; Fig. 9a). The frequency of non-B variants increased significantly from 10.3% in 2015 to 22.7% in 2019 among individuals aged < 30 years at diagnosis (OR: 1.24, 95% CI 1.06–1.46, p < 0.01; Fig. 9b). The percentage for the group aged 31–40 years at diagnosis increased from 11.3% in 2015 to 22.3% in 2019 (OR: 1.16, 95% CI 1.02–1.33, p < 0.05; Fig. 9c).

Figure 9

Logistic regression estimates for time trends between 2015–2019 for transmission category: men who have sex with men (MSM) (a), age at diagnosis for ≤ 30 years-old group (b) and 31–40 years-old group (c). Dots indicate the percentage per year and vertical bars 95% confidence intervals for the percentages with a vertically presented logistic regression time trend line with dark-grey shaded 95% confidence intervals for the regression estimate.

Additionally, the proportion of infections with variants other than B increased in asymptomatic cases from 12.1% in 2015 to 29% in 2018 and in AIDS patients from 6.5% in 2015 to 28.6% in 2018 (OR: 1.33, 95% CI 1.04–1.70, p < 0.05 and OR: 1.93, 95% CI 1.18–3.29, p < 0.01, respectively). In nearly half of the analyzed cases, data on the clinical category at diagnosis were missing. Accordingly, between 2015 and 2019, non-B infections increased simultaneously among AIDS patients and asymptomatic patients.

Also, among non-B variant infected individuals, the frequency of cases with a CD4 lymphocyte count exceeding 200 cells/μl increased over time (from 12.5% in 2015 to 30.4% in 2018). The proportion of MSM also increased significantly, from 8.6 to 37.1% (OR: 1.53, 95% CI 1.20–1.95, p < 0.001). For the analysis of time trends of the last two parameters, patients diagnosed in 2019 were excluded as their clinical data were not available.

Discussion

Our results showed that the increasing A-clade incidence in Poland is driven mostly by the heterosexual transmission mode, age at diagnosis, and female gender. A decline in subtype B has been observed in Europe, but the spread of HIV-1 variants differs in each country. This study is the first to characterise the cohort of the A6 sublineage on Poland's national scale. Together with evidence of novel A6/B URFs, this suggests considerable heterogeneity of HIV-1 in the Polish population. Our results might support drug therapy interventions because infections with the A6 variant may bear resistance to cabotegravir, the long-acting antiviral soon to be implemented in clinical practice12.

In this article, we delineated the distribution of HIV-1 variants between 2015 and 2019 in a large group of diagnosed patients from Poland to reveal the countrywide molecular diversity of the virus. For the first time, based on phylogenetic studies, we report the presence of the HIV-1 A6 sublineage in Poland. Subtype B remains the most common variant to date, found in 85.9% of the surveyed population, and remains predominant among MSM. This observation is in line with our previous report on the period from 2008 to 2014, in which 86.9% of recorded infections were of the HIV-1 B lineage8, and with other Polish HIV epidemic studies7,13,14. However, the proportion of non-B clades increased significantly between 2015 and 2019, primarily due to an increased number of infections with sublineage A6. A decline in subtype B has been observed across Europe. It may be driven by economic migration and the refugee crisis, as well as by the increasing circulation of non-B subtypes among the native European population15,16,17. Of note, in the past decade Poland has seen immigration from the Former Soviet Union (FSU) nations, especially Ukraine18. By the end of 2019, 1,351,418 Ukrainians lived in Poland, constituting 64% of all officially registered foreigners (https://stat.gov.pl/en). The increase in subtype A is likely to be related to immigration, as this variant is ubiquitous in Eastern Europe19. Indeed, the highest percentage of the A6 variant has been recorded in the north-eastern province, close to the border with the former Soviet Union (Supplementary Table 1 and Fig. 4). The A6 sublineage was first identified among IDUs in the Ukrainian city of Odessa in late 1994 and was previously included in the A (FSU) subgroup. Immigration-related HIV infection has remained common in Europe20,21,22. In Poland, to date, only a small percentage of immigrants (< 5%) has been observed among HIV-1 diagnosed individuals; however, the observed data suggest the introduction and spread of novel variants in the country23. Molecular surveillance of A6 variants is important because of the possible emergence of resistance and virological failure after therapy containing the HIV-1 integrase strand transfer inhibitor cabotegravir12,24. In addition to the A6 samples, ten samples of the A1 variant and one of the A3 variant were identified. A phylogeographic mixture of A sub-lineage variants has also been observed in Germany22 and Italy20.

The identification of three A/B recombination clusters is an important novel finding of this study. For the first time, we identify two new clusters of A6B recombinant variants. Our former publication described one monophyletic group of A1B and a single sequence of A6B8. In the current analysis, A/B recombinants accounted for 5.9% of the non-B variants, while in the period 2008–2014 they were identified in 3.1% of sequences8. Thus, the frequency of mosaic sequences composed of subtypes A and B almost doubled over the analyzed years. Previously, we described one regional HIV center (Cracow) as the source of all clustered A1B sequences and the city of Bialystok as the origin of a single sample of the A6B mosaic form. In the present data, recombinant virus isolates were obtained from individuals followed up in seven cities located across different provinces. Our analyses provide evidence that the described mosaic forms circulate in Poland and contribute to the increasing genetic heterogeneity of non-B subtype strains. Although the sequences in the A1B Cluster had a similar recombination profile, the diversity of recombinant forms within the A6B clusters was large. In the A6B clusters, the number of recombination breakpoints in the pol gene region varies from one to four, which reflects the high recombination rate of HIV-1 variants. Since subtype B and sub-subtype A6 are the most widespread in Poland, such recombination was expected and might reflect the presence of a novel, as yet unidentified CRF.

Clustered sequences likely indicate transmission networks. The analysis of phylogenetic clusters, together with epidemiological and demographic data, is important for understanding the factors underlying the growth of epidemics. Others have reported that MSM are more likely to cluster than all other risk groups25. Indeed, all sequences in the A/B clusters were derived only from male individuals and appear to indicate the presence of an MSM transmission group. However, information on the transmission mode was lacking for those patients. Furthermore, gender is presumably related to the differences in HIV-1 clade distribution within exposure groups26. The lack of women in the A/B recombination clusters probably reflects the MSM and/or IDU transmission mode. Both A6B clusters include A sequence regions with Eastern European ancestry. In the LANL-HIV CRF database (https://www.hiv.lanl.gov/content/sequence/HIV/CRFs/CRFs.html), only one example of subtype B and A recombination (CRF03_AB) has been identified so far.

The differences in demographic characteristics across the analyzed samples are due to the higher share of females and the dominance of the heterosexual transmission route among individuals infected with non-B clades. Our results indicate that the male-to-female ratio in the non-B clade is 3.30 compared to 6.43 in subtype B. In the five years of the study, HIV-1 infections were mostly driven by MSM- and IDU-associated clades and, as a consequence, a predominance of men and of subtype B was observed. Many authors have published similar findings25,26. It should be noted that the increase in the proportion of non-B clades among females is connected with migration- and travel-related introduction of infections, especially of sublineage A6 viruses. The male-to-female ratio correlates with the number of non-B subtypes because more males have been diagnosed with subtype B. The Silesian and Lower Silesian provinces had a low incidence of non-B subtype infections, which resulted in a high (above five) male-to-female ratio. South of these regions, a similar proportion was noted in the Czech Republic, where the male-to-female ratio exceeds the European average23. In Podlaskie and West Pomerania, a lower male-to-female ratio (under three) was correlated with frequent occurrence of the A6 sub-subtype.

Importantly, the significantly higher lymphocyte count at baseline among non-B variant infected individuals is in contrast to our previous records8. This is also associated with the increasing frequency of subtype A. Individuals infected with those virus strains evidently enter care at earlier stages of infection, with less pronounced immunodeficiency. Additionally, females have better immunovirological parameters than males, namely higher CD4+ T cell counts following primary HIV-1 infection27,28.

Furthermore, notable time-associated trends were observed among the non-B subtype characteristics. Firstly, the proportion of non-B clades among MSM is increasing. This is also reflected in the increasing frequency of males among non-B variants over the analyzed period, and may be associated with dense transmission networks and clustering of MSM29. Secondly, the frequency of non-B clades is increasing across age-at-diagnosis groups, especially among young adults (under thirty and under forty). Age at diagnosis is shifting towards younger individuals in non-B clades. This observation may correspond to the increasing trend in the frequency of non-B variants associated with the MSM mode. Moreover, these findings could be related to less frequent testing of older heterosexual males.

Additionally, differences in the distribution of clinical stages at the time of sampling were observed among non-B variants. Regardless of the CDC clinical stage, approximately half of the data were missing from the available clinical reports. Therefore, the representation of both asymptomatic and AIDS patients in the non-B clade could increase in parallel over the timeframe of the study. Patient clinical status at diagnosis varies with prevention efforts and the time of enrollment in care. The higher frequency of CD4+ cell counts above 200 cells/μl at baseline and the diagnosis of patients at a younger age in the non-B clade may be associated with a higher proportion of registered asymptomatic patients. Moreover, a higher level of lymphocytes at HIV-1 diagnosis is noted in women than in men28. This may explain the increasing frequency of non-B infected individuals with higher lymphocyte counts, as females were notably more common among non-B subtypes.

On the other hand, we observed a growing prevalence of AIDS among individuals with non-B variants. This result is in line with previous findings that in Poland, 44.8% of newly diagnosed patients are diagnosed late (LP) or in the AIDS stage. Factors associated with LP/AIDS are older age, IDU, and HTX30. Diagnosis several years after infection leads to lower levels of CD4+ cells at the time of sampling. Nevertheless, a cell count below 200 per μl, at which AIDS is diagnosed, is still found in a quarter of newly registered patients31.

According to the "HIV/AIDS surveillance in Europe 2019" report, the highest age-specific HIV diagnosis rates in both genders are in the 25–39-year-old group. Our results are consistent with these data. In the present study, the under-thirty age group was the most prevalent, followed by the under-forty age group. Moreover, the total male-to-female ratio in the current studied population was 5.77, almost identical to the value in central Europe (5.6). Unfortunately, the above report did not include the subtype distribution23.
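The overall male-to-female ratio can be reproduced from the counts reported in the Results section, as sketched below. The sketch assumes that gender was available for all 2518 sequences, so that males are totals minus females; the variable and function names are illustrative only.

```python
def male_to_female_ratio(total, females):
    """Male-to-female ratio, assuming every individual is recorded as male or female."""
    return (total - females) / females

# Counts quoted in the Results section.
SUBTYPE_B_TOTAL, SUBTYPE_B_FEMALES = 2163, 291
STUDY_TOTAL = 2518
STUDY_FEMALES = 291 + 81   # subtype B females plus non-B females

ratio_b = male_to_female_ratio(SUBTYPE_B_TOTAL, SUBTYPE_B_FEMALES)
ratio_all = male_to_female_ratio(STUDY_TOTAL, STUDY_FEMALES)

print(f"subtype B: {ratio_b:.2f}")      # text reports 6.43
print(f"whole group: {ratio_all:.2f}")  # text reports 5.77
```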

The principal limitation of the study was the method of sampling. The samples included in this report originated from HIV-1 Regional Centers, where they were collected for analysis of primary transmitted drug resistance or treatment failure, and may not fully reflect the prevalence of infections in the overall population. Nevertheless, the data obtained are based on a large number of cases. Samples collected for our study come from 10 out of 16 provinces, inhabited by 65% of the population (https://stat.gov.pl/en). Furthermore, we did not have access to complete patient documentation, so some parameters were available for fewer individuals but still allowed for statistical modelling. The subtyping analyses performed in this study were based on only a fragment of the pol gene. Unfortunately, this makes detailed analysis of the sequence recombination profile impossible. Full-length HIV-1 sequencing would most likely identify a higher number of recombinants and confirm the breakpoints in those already identified. Furthermore, extensive characterization of the whole genome of the identified AB recombinants could provide valuable information about novel circulating variants.

Subtype B remains the predominant HIV-1 variant in Poland. However, there was a significant increasing trend in the prevalence of non-B clades, mostly due to a rising number of sublineage A6 HIV-1 infections. Our results showed a high frequency of recombinations between these two most prevalent subtypes and a small number of other URFs. We have identified three separate A/B sequence clusters, and the formation of novel recombinants is evidence of increased HIV-1 heterogeneity. The comparison of subtypes and transmission groups with demographic parameters suggests that HIV-1 disease is highly diverse. Non-B variants are associated with heterosexual transmission, age at diagnosis, and female gender. The study will be further extended as the identification of the novel recombinants is part of molecular surveillance for HIV-1 and may influence drug susceptibility.

Methods

Study group

For this study, we used 2518 samples collected from patients who underwent genotypic drug resistance testing in 10 of 27 Regional AIDS Centers (cities: Bialystok, Bydgoszcz, Cracow, Chorzow, Gdansk, Lodz, Poznan, Szczecin, Wroclaw, Zielona Gora). The dataset included sequences obtained from both treatment-naive (84.3%) and treatment-experienced patients (15.7%) (one sequence per individual; if multiple sequences were available, the earliest one was included in the analysis). The time frame of sampling was from 2015 to 2019. Plasma samples were collected locally and shipped to the genotyping laboratory. The collected data included gender, CD4+ lymphocyte count, HIV viral load at sampling, age at diagnosis (age at first positive HIV confirmatory test), transmission route (self-defined by the patient) and WHO clinical stage at sampling. HIV-1 RNA isolation and sequencing of the reverse transcriptase (RT) and protease (PR) domains were carried out using the ViroSeq HIV-1 Genotyping System v 2.0 (Abbott Molecular, Des Plaines, IL, USA). The integrase region was sequenced using the methodology described by Laethem et al.32. Amplicons obtained by the nested PCR method were used for Sanger sequencing using the BigDye technology on an ABI 3500 platform (Applied Biosystems, Foster City, CA, USA). Integrase sequence assembly was performed with the Recall online tool33. All samples were sequenced at the Clinical Laboratory of the Department of Infectious, Tropical Diseases and Immune Deficiency at Pomeranian Medical University in Szczecin, Poland. PR/RT sequences were available for all samples, with the addition of the integrase coding region for 28 (1.1%) samples.

Ethics statement

The study protocol was approved by the Bioethical Committee of the Pomeranian Medical University, Szczecin, Poland (approval numbers KB-0012/26/17 and KB-0012/08/12). All patients gave informed consent to the processing of their samples and clinical data for this study. All samples were de-identified to maintain participants' anonymity. The research was conducted in accordance with the Declaration of Helsinki.

Subtyping and phylogenetic analyses

A 1302-nucleotide fragment of the pol gene was used for subtyping. This region includes the protease and part of the reverse transcriptase sequence (HXB2 genome positions 2253–3554). When sub-lineage breakpoints were detected, we also sequenced an 866 bp-long integrase fragment (covering sites 4230–5096 of the HXB2 genome). All sequences were initially assessed with REGAv3 (http://www.bioafrica.net/typing-v3/hiv), COMETv234, SCUEAL35, and Stanford subtyping tools36, and the assignments were confirmed by phylogenetic analyses. For phylogenetic inference, sequences were aligned with the Clustal Omega tool37. The GTR + I + G model with four gamma categories was selected as optimal for the analysed dataset using Modeltest-NG 0.1.3 software38. The nucleotide frequencies estimated under this model were freqA = 0.4373, freqC = 0.1556, freqG = 0.1840, and freqT = 0.2228, with a gamma shape parameter of 0.6193 and a proportion of invariable sites (p-inv) of 0.2462. Reference sequences from the LANL-HIV-1 compendium (2017 version) were used for subtyping, and the dataset was further supplemented with highly similar sequences (> 95%) identified by BLAST (Basic Local Alignment Search Tool) analysis. Duplicate sequences were removed with the DAMBE program39. Phylogenetic trees were generated on the PHYML v3.0 web server (http://www.atgc-montpellier.fr/phyml/) with the maximum likelihood (ML) method and Nearest Neighbor Interchange (NNI) tree rearrangement, with branch support assessed by the approximate likelihood ratio test (aLRT) based on a Shimodaira-Hasegawa-like procedure40. First, the trees for subtypes B and A were inferred; subsequently, non-B and non-A variants as well as other recombinants were analyzed. Breakpoints were identified using two methods: the jumping-profile hidden Markov model (jpHMM)41 and bootscanning in the Simplot v3.5.1 software with a 300 bp window and 20 bp increments, based on the neighbor-joining (NJ) method with the Kimura 2-parameter model42.
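
The window settings and the distance model used for bootscanning can be illustrated with a short Python sketch. This is a simplified illustration and not the Simplot algorithm: real bootscanning builds bootstrapped NJ trees in every window, whereas the sketch below only reports the closest reference per window by Kimura 2-parameter distance. All function names are hypothetical.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns with a gap or ambiguous base in either sequence are skipped.
    Raises ValueError if no column is comparable or the pair is saturated.
    """
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n  # transition and transversion proportions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def sliding_windows(length, window=300, step=20):
    """Half-open (start, end) windows that fit entirely in the alignment."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def nearest_reference_per_window(query, references, window=300, step=20):
    """For each window, name the reference with the smallest K2P distance."""
    result = []
    for start, end in sliding_windows(len(query), window, step):
        best = min(
            references,
            key=lambda name: k2p_distance(query[start:end], references[name][start:end]),
        )
        result.append((start, end, best))
    return result
```

With these settings, a 1302-nucleotide alignment yields 51 windows, the last one covering alignment columns 1000 to 1299. A change of the closest reference between neighbouring windows only hints at a candidate breakpoint region; it does not replace the bootstrap support that bootscanning provides.
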
Mosaic viruses were confirmed by phylogenetic trees (PhyML online tool) built from the breakpoint-delimited fragments together with reference sequences, in order to identify the parental subtypes. For alignment, the fragments were treated as separate sequences of different lengths and aligned to reference sequences of subtypes A–D, F–H, J, and K. The reference sequences were 2844 bp long and spanned the pol region continuously from the beginning of the PR gene to the end of the IN gene (positions 2253–5096 of the HXB2 genome). Thus, the phylogenetic analyses of the individual breakpoint fragments were independent of each other. Empty positions in the analysed fragments were marked with a period symbol ("."). These alignments were used to infer phylogenetic trees with the GTR + I + G model as described above. All trees were prepared in the Interactive Tree of Life (iTOL)43. Trees were rooted with group O; however, this root was removed from the final figures.
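
The placement of a breakpoint fragment on the 2844-column reference frame can be sketched as follows. This is a minimal illustration in Python, assuming that the fragment contains no insertions or deletions relative to HXB2 and that its HXB2 start position is known; the function name and constants are hypothetical.

```python
REF_START = 2253  # first HXB2 position of the reference frame (start of PR)
REF_END = 5096    # last HXB2 position of the reference frame (end of IN)
REF_LENGTH = REF_END - REF_START + 1  # 2844 columns


def pad_fragment(fragment, hxb2_start):
    """Place a fragment on the reference frame, filling empty columns with '.'.

    Assumes the fragment has no indels relative to HXB2 (a simplification).
    """
    offset = hxb2_start - REF_START
    if offset < 0 or offset + len(fragment) > REF_LENGTH:
        raise ValueError("fragment falls outside HXB2 positions 2253-5096")
    trailing = REF_LENGTH - offset - len(fragment)
    return "." * offset + fragment + "." * trailing
```

Each padded fragment has the same length as the reference sequences, so every fragment can be analysed as a separate row of the alignment.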

Statistical analyses

Statistical comparisons of nominal variables were performed with Fisher's exact test or the χ² test, as appropriate. The Mann–Whitney U-test was used to analyze continuous variables (lymphocyte counts, HIV viral load, age at diagnosis). Interquartile ranges (IQR) and confidence intervals (CI) are reported as appropriate. Statistical calculations were made with commercial software (TIBCO Software Inc. 2019. Statistica Software: Release 13.6. Palo Alto, CA: TIBCO Software Inc). Time trends and logistic regression were analysed in R (version 4.0.1)44 with the MASS package45 (code available on request).