Introduction

Human immunodeficiency virus type 1 (HIV-1) surface glycoprotein (gp120), a product of the viral env gene, plays a key role in cell entry, viral tropism, pathogenesis, vulnerability to the host immune response and susceptibility to entry inhibitors such as maraviroc1,2,3. The third variable loop (V3) of gp120 contributes to HIV-1 entry into target cells by binding to the cell CD4 receptor followed by attachment to either the CCR5 or CXCR4 coreceptor molecules1,4. Viruses that exclusively use the CCR5 or CXCR4 coreceptor during cell entry are termed R5 or X4, respectively, whereas those that can use both coreceptors are described as R5X4 viruses. Usually, R5 viruses are associated with HIV-1 transmission and predominate at early phases of infection, while X4 or R5X4 (non-R5) viruses emerge as infection develops1, and are linked to accelerated disease progression and decline of CD4+ T-cell counts5,6,7. Although several mechanisms of selection against transmission of X4 viruses have been proposed8,9, it has been reported that CXCR4-using HIV-1 variants can also be transmitted to a new host, e.g. to individuals lacking a functional CCR5 receptor10. The proportion of infections with CXCR4-using strains among patients at early infection stages, suggesting transmission of X4 variants, ranged from 3% to 26% depending on the population examined and the method used to assess viral coreceptor tropism5,11,12,13,14,15,16,17,18,19,20.

To date, studies on HIV-1 coreceptor tropism in Poland have been restricted to the northern geographic region and were performed among patients with long-term HIV-1 infection and experience of antiretroviral therapy21, and among treatment-naïve persons with a new HIV-1 diagnosis22. In the second group the overall prevalence of non-R5 strains was 15.5% or 27.8% depending on the interpretation criteria used in the genotypic tropism test, and an increase in the frequency of non-R5 strains during the study period was observed, suggesting either an increase in late HIV-1 diagnoses or putative transmission of CXCR4-using viruses in the local population22. Nevertheless, these studies did not allow the rate of transmission of non-R5 strains to be evaluated, although this information would be useful for rational planning of treatment programs, especially in the context of the strong push to identify early HIV-1 infections, and the potential role of CCR5 antagonists in pre-exposure prophylaxis23.

The analysis of HIV-1 env gene sequences enables not only genotypic predictions of viral coreceptor tropism, but also allows for phylogenetic investigations aimed at detection of transmission clusters, which can facilitate the understanding of the epidemiological links. Thus, it is possible to study the contribution of patients with specified characteristics to HIV-1 transmission, and e.g. confirm the forward transmission of non-R5 strains or viral spread from patients at different infection stages24,25,26.

To obtain a more complete picture of the HIV-1 epidemics in Poland, analyses of env gene viral sequences from patients with new HIV-1 diagnosis established in the years 2008–2014 and known proportion of recent HIV-1 infections, were performed. Investigations included viral and host characteristics associated with coreceptor tropism and identified transmission clusters, in order to trace the spread of non-R5 HIV-1 strains and assess the contribution of recent infections to the Polish HIV-1 epidemic.

Results

Patients and virus characteristics

The characteristics of the 292 patients included in the analysis are presented in Table 1. All study participants were of Polish origin and were naïve for antiretroviral therapy. Among them there were 46 (15.8%) patients with recent HIV-1 infection (RHI), and 246 (84.2%) individuals with long-term HIV-1 infection (LTHI). Sexual contacts between men were the most frequently reported transmission route (199/292; 68.2%), with the proportion not significantly different in groups with dissimilar duration of infection (RHI: 35/46, 76.1% vs LTHI: 164/246, 66.7%; p = 0.232). Similarly, there were no statistical differences between patients with RHI and LTHI with respect to other transmission routes, sex, age, and CCR5 Δ32 genotype status. Although patients recruited in Chorzów predominated in the study (130/292; 44.5%), patients recruited in Wrocław constituted the most numerous group among the individuals with RHI (21/46; 45.7%; p = 0.007). A statistically significant difference between individuals with recent and long-term HIV-1 infection was observed for median baseline CD4+ T-lymphocyte count (RHI: 564 vs LTHI: 396 cells/µl; p = 0.002), and proportion of patients with higher CD4+ T-lymphocyte count at diagnosis (>500 cells/μl – RHI: 26/43, 60.5% vs LTHI: 79/229, 34.5%; p = 0.002). A baseline HIV-1 viral load, usually high at early infection stages, unexpectedly was significantly higher among patients with LTHI (RHI: 4.19 vs LTHI: 4.51 log copies/ml; p = 0.045) (Table 1). We note however, that the average delay between the diagnosis and the sample collection in our study was 32 days, which means that in many cases of recent infection we may have captured an already stabilised viral load, rather than the initial peak.
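The group comparisons above are reported as p-values without the test being named in this section. As a minimal sketch, assuming a two-sided Fisher's exact test on the 2×2 table of counts (an assumption, not a method stated in the text), the MSM comparison between RHI and LTHI could be recomputed from the published counts with the standard library alone. The function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumption: the test behind the p-values in the text is not named in
    this section; Fisher's exact test is only one plausible choice.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k first-column counts in row 1
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)

# MSM vs other routes: RHI 35/46, LTHI 164/246 (counts from the text)
p_msm = fisher_exact_two_sided(35, 11, 164, 82)
print(f"p = {p_msm:.3f}")
```

The text reports p = 0.232 for this comparison; a different test, such as a chi-square test, would give a somewhat different value.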

Table 1 Characteristics of the study participants diagnosed in the years 2008–2014 with HIV-1 env sequences available.

According to the subtype analysis of 292 HIV-1 env sequences, the majority of patients were infected with subtype B (279; 95.6%) (Table 2). Among the 13 patients with non-B subtype infection, there were 8 individuals harbouring sub-subtype A6 (2.7%), 3 persons with CRF50_A1D infection, and the remaining 2 persons were infected with CRF02_AG and sub-subtype F1, respectively. Strains of sub-subtype A6 were detected in all four diagnostic centres, among patients who acquired HIV-1 infection through sex between men (n = 5), heterosexual contact (n = 1) or drug injection (n = 2), while CRF50_A1D strains were found exclusively in Kraków, among patients with a homosexual (n = 2) or heterosexual (n = 1) route of transmission. CRF02_AG and F1 were identified in men who have sex with men (MSM) recruited in Kraków and Wrocław, respectively. There were no statistically significant differences regarding subtype frequency between patients with recent and long-term HIV-1 infection (p = 0.176). However, non-B subtypes were significantly less frequent among patients diagnosed in Chorzów than in other cities (Chorzów: 2/130, 1.5% vs other diagnostic centres: 11/162, 6.8%; p = 0.043), and more frequent among patients diagnosed in Kraków (Kraków: 5/41, 12.2% vs other diagnostic centres: 8/251, 3.2%; p = 0.023).

Table 2 Virus characteristics based on gp120 sequences obtained from the study participants in the years 2008–2014.

The median frequency of ambiguous nucleotides in the analysed env sequences was significantly lower among patients with RHI than those with LTHI (RHI: 0% (0–0) vs LTHI: 0.14% (0–0.41); p < 0.001) (Table 2).
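The frequency of ambiguous nucleotides used above is, in essence, the share of sequence positions called with an IUPAC mixture code rather than a single base. The sketch below illustrates that calculation. How the study handled gaps, or whether N was counted, is not stated here, so those choices are assumptions, and the sequences are made-up toy strings.

```python
# IUPAC codes that stand for more than one nucleotide
AMBIGUOUS = set("RYSWKMBDHVN")

def ambiguity_fraction(seq: str) -> float:
    """Fraction of ambiguous IUPAC codes among the non-gap positions.

    Assumptions: gap characters ('-' and '.') are ignored and N counts
    as ambiguous; the study's exact counting rules are not given here.
    """
    bases = [ch for ch in seq.upper() if ch not in "-."]
    if not bases:
        return 0.0
    return sum(ch in AMBIGUOUS for ch in bases) / len(bases)

# Made-up toy sequences, not data from the study
clean = "ACGTACGTAC"
mixed = "ACGTRCGTAY"
print(ambiguity_fraction(clean), ambiguity_fraction(mixed))
```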

Coreceptor usage

The frequency of non-R5 strains among all patients included in the study was 23.3% (68/292) when the geno2pheno 10% FPR cut-off was applied for the subtype B sequences and PhenoSeq results were used for non-B subtypes (Table 2). The same frequency of non-R5 variants was observed for all sequences analysed with geno2pheno 10% FPR only. Applying geno2pheno FPR cut-off values of 15% and 20% resulted in the frequency of non-R5 strains as high as 34.9% (102/292) and 45.5% (133/292), respectively. With the more restrictive 5.75% FPR cut-off the frequency of non-R5 strains was decreased to 9.6% (28/292). There were no significant differences between persons with RHI and LTHI regarding the frequency of the non-R5 strains, and values of geno2pheno FPRs were comparable between patients with recent and long-term HIV-1 infection (p = 0.578) (Table 2).
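The dependence of the non-R5 frequency on the chosen cut-off can be made concrete with a short sketch. It assumes that a sequence is called non-R5 when its geno2pheno FPR is at or below the cut-off. The boundary handling is an assumption, and `classify_tropism` is a hypothetical helper, not part of geno2pheno. The counts are those reported in the paragraph above.

```python
def classify_tropism(fpr_percent: float, cutoff_percent: float = 10.0) -> str:
    """Label a V3 sequence from its geno2pheno false positive rate (FPR).

    Assumption: an FPR at or below the cut-off is called non-R5; the exact
    boundary handling (< vs <=) is not stated in the text.
    """
    return "non-R5" if fpr_percent <= cutoff_percent else "R5"

# Counts of non-R5 calls reported in the text for each cut-off (n = 292)
reported = {5.75: 28, 10.0: 68, 15.0: 102, 20.0: 133}
frequencies = {cut: round(100 * n / 292, 1) for cut, n in reported.items()}
print(frequencies)

# The same sequence can change label when the cut-off changes
print(classify_tropism(8.0, cutoff_percent=10.0), classify_tropism(8.0, cutoff_percent=5.75))
```

Raising the cut-off can only move sequences from R5 to non-R5, which is why the reported frequency rises monotonically from the 5.75% to the 20% threshold.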

Comparison of all patients harbouring non-R5 and R5 strains predicted by geno2pheno 10% FPR for the subtype B sequences and PhenoSeq for non-B subtypes revealed no significant differences with regard to sex, age, city of diagnosis, CCR5 Δ32 genotype, median viral load, and HIV-1 subtype (Table 3). Similarly, no significant differences were seen between non-R5 and R5 strains with respect to the number of ambiguous nucleotides and deduced ambiguous amino acids in V3 coding region. When the same comparisons were performed exclusively among persons with RHI, again the lack of significant differences between non-R5 and R5 strains was observed. A trend toward a higher proportion of MSM among patients with non-R5 strains in comparison with those harbouring R5 strains, was observed when all samples were tested (p = 0.054). This trend reached the level of statistical significance when analysis was restricted to the samples obtained from patients with recent HIV-1 infection only (p = 0.044) (Table 3).

Table 3 Comparison of patients harbouring non-R5 and R5 strains predicted by V3 analysis with geno2pheno 10% FPR (for subtype B) and PhenoSeq (for non-B subtypes) algorithms.

Transmission clusters

Among 292 HIV-1 sequences there were 27 transmission clusters containing 57 sequences (19.5%) identified (Table 4, Fig. 1). These clusters included from 2 to 3 sequences, with the majority of clusters containing only 2 sequences (24/27, 88.9%). The differences between patients with clustered and non-clustered HIV-1 sequences regarding the proportion of individuals of different sex, age, city of HIV-1 diagnosis, CCR5 Δ32 genotype, or infected with different HIV-1 subtypes or strains of different tropism were not significant, although a trend toward a higher prevalence of men within the group of patients with clustered sequences was noticed (p = 0.058) (Table 4). Patients with heterosexual transmission route were exclusively infected with viruses of non-clustered env sequences (p < 0.001), while patients for whom the route of transmission was unknown were more frequently represented among individuals with clustered HIV-1 sequences than among those with non-clustered sequences (p = 0.015). Similarly, higher proportion of MSM was observed within the group of clustered HIV-1 sequences, but the difference was not statistically significant (clustered: 44/57, 77.2% vs non-clustered: 155/235, 66.0%; p = 0.115). While the majority of clustered HIV-1 sequences were obtained from patients with LTHI (42/57, 73.7%), the proportion of sequences from persons with RHI was higher among clustered sequences than among non-clustered sequences (clustered: 15/57, 26.3% vs non-clustered: 31/235, 13.2%; p = 0.024).

Table 4 Comparison of patients with clustered and non-clustered HIV-1 env sequences and characteristics of clusters identified among 292 HIV-1 sequences.
Figure 1

MCMC phylogenetic tree of env sequences obtained from 292 patients diagnosed in Poland in the years 2008–2014. Transmission clusters identified with the maximum likelihood aLRT value of >90%, maximum intracluster pairwise genetic distance <3%, and posterior probability of 1 in Bayesian inference are highlighted. Clusters highlighted in green contain sequences obtained from patients with long-term and recent HIV-1 infection. Clusters highlighted in grey and light blue contain sequences obtained from patients with long-term HIV-1 infection only and recent HIV-1 infection only, respectively. Squares indicate the presence of non-R5 strains. Self-reported transmission routes for patients with clustered HIV-1 sequences are specified with MSM (for sex between men), MSM/HET (for sex between men or women and men), IDU (for injecting drug use), O/Unk (for other/unknown). Majority of sequences represented subtype B, thus only non-B subtypes are indicated.
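The three thresholds quoted in the caption can be read as a simple conjunction applied to each candidate clade. The sketch below illustrates that rule under stated assumptions: support values are on a 0 to 1 scale, genetic distance is approximated by the uncorrected p-distance (the distance model actually used is not given in this section), and the sequences are made-up toy strings. Both function names are hypothetical.

```python
from itertools import combinations

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def is_transmission_cluster(seqs, alrt: float, posterior: float) -> bool:
    """Apply the three caption thresholds to one candidate clade.

    Assumptions: aLRT and posterior are given on a 0-1 scale, and the
    uncorrected p-distance stands in for the study's distance measure.
    """
    max_dist = max(p_distance(a, b) for a, b in combinations(seqs, 2))
    return alrt > 0.90 and max_dist < 0.03 and posterior >= 1.0

# Made-up toy alignment of 100 sites, not data from the study
base = "ACGT" * 25
close = "T" + base[1:]         # 1 difference  -> 1% distance
distant = "TTTTT" + base[5:]   # 4 differences -> 4% distance

print(is_transmission_cluster([base, close], alrt=0.95, posterior=1.0))
print(is_transmission_cluster([base, distant], alrt=0.95, posterior=1.0))
```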

The levels of baseline median viral load were comparable between patients infected with HIV-1 strains of clustered and non-clustered sequences. The median CD4+ T-cell count at diagnosis was significantly higher among patients from whom clustered HIV-1 sequences were obtained than among persons with non-clustered viral sequences (clustered: 502 vs non-clustered: 399; p = 0.041), and accordingly, the proportion of patients with CD4+ T-cell counts of <200/µl was significantly lower among those harbouring HIV-1 with clustered sequences, while the proportion of patients with >500 CD4+ T-cell counts among them was significantly higher (Table 4).

To give a more detailed description of the 57 patients harbouring HIV-1 with clustered sequences and putatively engaged in transmission events, the numbers of clusters gathering HIV-1 sequences from patients with the same characteristics (sex, age group, city of diagnosis, transmission route, non-R5 strain infection, HIV-1 infection status), as well as the numbers of sequences in such clusters, were calculated (Table 4). Since the majority of clustered HIV-1 sequences were obtained from men, the majority of clusters also contained exclusively sequences obtained from men (25 clusters with 53 sequences), and only a single cluster contained one sequence from a woman and another from a man. Nine clusters included 18 HIV-1 sequences from persons under 30 years old only, 6 clusters contained 13 HIV-1 sequences from persons ≥30 years old only, and 11 clusters gathered 24 sequences from patients aged <30 and ≥30 together. One cluster contained 2 sequences from persons for whom the information on sex and age was unavailable. The majority of clusters included sequences obtained from patients diagnosed with HIV-1 in the same location (23 clusters with 49 sequences). In the 4 remaining clusters, sequences obtained from patients recruited in different centres were grouped; within each of these 4 clusters there was 1 sequence from a patient diagnosed in Chorzów combined with a sequence from a patient diagnosed either in Wrocław (3 clusters) or in Kraków (1 cluster) (Table 4, Fig. 2a).

Figure 2

HIV-1 transmission clusters inferred from the analysis of env sequences obtained from 292 study participants diagnosed in the years 2008–2014. Clusters were identified with the maximum likelihood aLRT value of >90%, maximum intracluster pairwise genetic distance <3%, and posterior probability of 1 in Bayesian inference. Viral tropism is indicated with shapes: squares and circles represent sequences of non-R5 and R5 strains, respectively. For all patients with clustering viral sequences shapes are colored according to: city of HIV-1 diagnosis (a), self-reported transmission route (MSM - sex between men, MSM/HET - sex between men or women and men, IDU - injecting drug use, O/Unk - other/unknown) (b), and duration of HIV-1 infection (c).

The majority of clusters included sequences from persons who declared the same route of HIV-1 transmission, namely 17 clusters contained 36 sequences obtained exclusively from MSM, and 1 cluster gathered 2 sequences from persons with unknown transmission route (Table 4, Figs 1 and 2b). Among the remaining 9 mixed clusters, all but one contained at least one sequence from an MSM or MSM who also reported sex with women. Three mixed clusters included sequences from injecting drug users (IDUs), but none of these 3 sequences came from a person with RHI.

Most of the clustered HIV-1 sequences obtained from patients with LTHI (27/42) were grouped within 13 clusters containing only sequences from patients with LTHI (Table 4, Figs 1 and 2c). A further 13 clusters contained sequences from both persons with LTHI (15 sequences) and patients with RHI (13 sequences). There was only one cluster with 2 sequences from patients with RHI only.
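Each breakdown of the clusters given so far (by sex, age group, diagnostic centre and infection stage) should account for the same 27 clusters and 57 sequences. The short check below tallies the counts quoted in the text. The only figure not stated explicitly is the 8 sequences in the 4 inter-centre clusters, which is inferred here as 57 − 49.

```python
# (number of clusters, number of sequences) per stratum, taken from the text
partitions = {
    "sex": {"men only": (25, 53), "woman and man": (1, 2), "unknown": (1, 2)},
    "age": {"<30 only": (9, 18), ">=30 only": (6, 13),
            "mixed ages": (11, 24), "unknown": (1, 2)},
    # 8 sequences in inter-centre clusters is inferred as 57 - 49
    "centre": {"same centre": (23, 49), "different centres": (4, 8)},
    "stage": {"LTHI only": (13, 27), "LTHI and RHI": (13, 15 + 13),
              "RHI only": (1, 2)},
}

def totals(strata):
    """Sum clusters and sequences over all strata of one breakdown."""
    clusters = sum(c for c, _ in strata.values())
    sequences = sum(s for _, s in strata.values())
    return clusters, sequences

for name, strata in partitions.items():
    print(name, totals(strata))
```

Under these counts every breakdown sums to 27 clusters and 57 sequences, and the infection-stage breakdown gives 42 LTHI and 15 RHI sequences, in agreement with the totals reported earlier.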

To infer possible transmission events from patients with recent HIV-1 infection, we examined the sample collection dates for persons with viral sequences gathered in a single transmission cluster. These dates were checked for all 14 clusters containing 15 sequences obtained from patients with RHI. Onward transmission during the recency period was not plausible for either of the 2 patients with RHI whose viral sequences were grouped in a single cluster, since one patient was recruited to the study over 4 years earlier than the other, and therefore could have been engaged in transmission only during long-standing HIV-1 infection. For 10 out of 13 mixed clusters, i.e. those containing sequences from persons with RHI and LTHI, the date of sample collection for a patient with RHI was less than 6 months earlier than the sampling date for any patient with LTHI involved in the same cluster; thus, we were not able to demonstrate unambiguously that these patients with recent HIV-1 infection were engaged in forward transmission.

As many as 23 clusters contained sequences of HIV-1 strains of the same tropism, among them there were 17 clusters with 36 sequences of R5 strains, and 6 clusters with 12 sequences of non-R5 variants. The remaining 9 clustered sequences were included within 4 mixed clusters, with sequences of R5 and non-R5 strains gathered simultaneously in each cluster (Table 4, Figs 1 and 2). Non-R5 strains were identified among 11 out of 46 patients with RHI, which may suggest acquisition of non-R5 variants for these patients. Among these 11 patients there were 5 with viral sequences grouped within 5 separate transmission clusters, and 6 with non-clustered sequences. More detailed characteristic of all patients with putative acquired non-R5 strains infection is presented in Table 5. Briefly, patients with possible acquired non-R5 infection were MSM with HIV-1 subtype B, most of them (7/11) were diagnosed in Chorzów. CCR5 genotypes wt/Δ32 and Δ32/Δ32 were identified in 3 of 11 patients (27.3%) with putative acquired non-R5 strains infection, and were not significantly more frequent than among patients with RHI harbouring R5 strains (5/35, 14.3%; p = 0.374). Among 5 clustered sequences obtained from patients with recent HIV-1 infection and non-R5 strains, there were 4 that grouped with non-R5 strains, further supporting potential acquisition of non-R5 strains for 4 out of 46 patients with RHI (8.7%) (Table 5).

Table 5 Possible acquired non-R5 strains infections among patients with clustered and non-clustered HIV-1 sequences.

Time trends

The proportion of patients diagnosed during recent HIV-1 infection increased significantly from 9.9% (7/71) in 2008 to 23.6% (13/55) in 2013 and 25% (1/4) in 2014, with an average annual difference of +3.15% per year (OR: 1.26, 95% CI: 1.07–1.48, p = 0.005) (Table 6). Consistently, mean CD4+ T-cell counts increased significantly from 460 ± 249 in 2008 to 550 ± 354 in 2013 (slope = +22.1/year, p = 0.008). A significantly increasing trend in CD4+ T-cell counts was not seen when the analysis was restricted to patients with RHI (p = 0.286), but was present in the group of patients with LTHI (slope = +18.0/year, p = 0.047). Analysis of the viral load revealed no significant time trends, either for all patients or for patients with RHI or LTHI separately.
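In a logistic regression on calendar year, the per-year odds ratio is the exponential of the fitted slope, so a reported OR and its 95% CI imply an approximate p-value. The sketch below shows that back-calculation for the RHI trend. It assumes a Wald-type interval, symmetric on the log scale. That is an assumption about how the interval was built, and because the published values are rounded the result is only approximate.

```python
from math import erfc, log, sqrt

def wald_p_from_or(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided p-value implied by an odds ratio and its 95% CI.

    Assumption: the interval is a Wald interval, i.e. symmetric on the log
    scale with a half-width of 1.96 standard errors.
    """
    beta = log(odds_ratio)                            # per-year log-odds slope
    se = (log(ci_high) - log(ci_low)) / (2 * 1.96)    # standard error of beta
    z = abs(beta) / se
    return erfc(z / sqrt(2))                          # two-sided normal tail

# RHI proportion per year: OR 1.26, 95% CI 1.07-1.48 (values from the text)
p_rhi = wald_p_from_or(1.26, 1.07, 1.48)
print(f"{p_rhi:.4f}")
```

The value obtained this way is close to the reported p = 0.005, which suggests the OR, interval and p-value for this trend are mutually consistent.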

Table 6 The logistic and linear regression analyses of time trends in selected parameters for patients with recent and long-term HIV-1 infection – years 2008–2014.

In line with the decreasing proportion of women among HIV-1 infections (from 11.3%, 8/71 in 2008 to 0% in 2013 and 2014; average annual difference = −2.38%/year, OR: 0.68, 95% CI: 0.52–0.91, p = 0.003), the proportion of MSM increased significantly from 53.5% (38/71) in 2008 to 85.5% (47/55) in 2013 and 75% (3/4) in 2014 (average annual difference = +6.80%/year, OR: 1.43, 95% CI: 1.23–1.66, p < 0.001), and the percentage of patients who reported heterosexual intercourse as a transmission route decreased significantly in this time period from 22.5% (16/71) to 3.6% (2/55) and 0% in 2014 (average annual difference = −5.10%/year, OR: 0.61, 95% CI: 0.49–0.77, p < 0.001), while the proportion of IDUs was stable (p = 0.340). Similar temporal trends for the mode of HIV-1 transmission were visible within the group of patients with LTHI, while the proportions of persons with the three transmission routes did not change significantly among patients with RHI (Table 6).

The proportion of non-R5 strains was stable over time with a slight decrease over the studied period (average annual difference = −1.52%/year, p = 0.234), and geno2pheno FPR values were also stable over time (p = 0.404). Similarly, there was a stable proportion of clustered HIV-1 sequences (p = 0.634), and the percent of infections with non-B subtypes over time (p = 0.445). The age at diagnosis was also stable throughout the examined time range (p = 0.108). The same stable trends were observed in the separate analyses restricted to the patients with RHI and LTHI, with the exception of a tendency to the less frequent clustered transmission through years among patients with RHI (p = 0.074) (Table 6).

Discussion

In the studied group of 292 patients with HIV-1 infection diagnosed in the years 2008–2014 in four centers for HIV Diagnostics and Therapy for AIDS located in southern and central Poland (Chorzów, Kraków, Łódź, and Wrocław), recent HIV-1 infections accounted for 15.8% of all newly diagnosed infections. This is a much smaller proportion of recent infections than observed in Poland in 2006 with the BED assay (44%)27. Although the latter may have been overestimated due to the substantial false recent rate of the BED test, it also coincides with a significant upsurge of infections among MSM, which occurred in the mid-2000s28. Thus, the smaller, but gradually increasing, percentage of recent infections observed in the current study likely represents a real feature of the epidemic. In the current study recent HIV-1 infection status was detected with the quantitative limiting antigen avidity enzyme immunoassay (LAg-Avidity EIA), which has a false recent rate as low as 1.3% (0.3–3.2)29. The accuracy of RHI assessment by the LAg-Avidity EIA seems to be supported by the significantly lower frequency of ambiguous nucleotides in env sequences obtained from individuals with RHI than in sequences from patients with LTHI, as the proportion of ambiguous nucleotides is known to increase with the stage of infection30.

The overall frequency of non-B subtypes in the examined group was 4.5% (13/292), and was comparable for patients with recent and long-term HIV-1 infection. Nevertheless, this value was considerably lower than the frequency of non-B clades determined in other Polish study covering similar time period (112/946; 11.8%; p < 0.001)31. The generally low frequency of non-B HIV-1 variants observed in our study may be explained by low frequency of non-B subtypes among samples obtained from patients diagnosed in Upper Silesia region (Chorzów, 2/130; 1.5%), which constituted the most prevalent group in our research, and were not included in the previous study. This finding supports the notion that the prevalence of non-B HIV-1 subtypes may differ by geographic region. Additionally, HIV-1 infections with sub-subtype A6 and CRF50_A1D which were previously reported to spread in former Soviet Union area and Western European countries, respectively32,33, were detected in Poland for the first time. Similarly to the previous study31, the proportion of non-B variants in our study was stable throughout the study period.

Admittedly, patients diagnosed with HIV-1 during long-term infection predominated in our study. However, we found both a significant increase in the proportion of recent infections and an increase in CD4+ T-cell count among the new HIV diagnoses over the study period, from 2008 to 2014. These data suggest that during the study period the average time from infection to HIV diagnosis decreased, although this decrease was modest and should presumably be measured in weeks or months rather than in years. Indeed, such a decline in the estimated mean interval of time-to-HIV-diagnosis from 4.0 in 2001 to 3.2 years in 2010 was previously reported for MSM in the United Kingdom34, and a gradual decrease of time from infection to diagnosis was observed between 2012 and 2016 across the European Union35. It is also in line with the previously described increasing testing rates36, and higher rates of HIV-1 testing associated with same-sex sexual exposure37.

The overall distribution of the transmission routes reported by patients with RHI and LTHI was comparable. However, there was a significant annual decrease in the proportion of women and individuals infected through heterosexual contact, coupled with an increase in proportion of MSM. These temporal trends for the HIV-1 transmission route were observed within the whole studied group and among patients with LTHI and RHI separately, only in the latter group not reaching the level of significance. Hence, these data may confirm increase in the frequency of MSM testing for HIV-1 infection, and indicate that the epidemic among MSM in Poland was on the rise, while the increasing testing rates were still not sufficient to achieve an HIV diagnosis within the first 6 months after infection in the majority of cases.

In the analysis of phylogenetic clusters, sequences from 15 (32.6%) out of 46 patients with RHI were included in 14 clusters. Similarly to other reports38,39, sequences from individuals with RHI were notably more frequent within the clusters than outside the clusters (26.3% vs 13.2%). This higher frequency may be explained by the fact that the samples obtained in early infection experience minimal genetic divergence and are more likely to cluster with the putative donor sequence24. Besides, the majority of clustering sequences obtained from patients with RHI (13/15) were grouped with the sequences derived from patients with LTHI. Considering that in most of these mixed clusters the date of sample collection for a patient with RHI was less than 6 months earlier than the sampling date for a patient with LTHI or the sample collection dates were the same for both patients, it may suggest that the patients with long-standing infection were presumably responsible for the virus propagation, with most of them likely transmitting the virus before being HIV-1-diagnosed, and thus being not aware of their HIV-1 status. Although such an assumption could be confirmed by the medical interview data only in the case of one pair of patients with RHI and LTHI, it is in line with the results of the country-wide survey performed among MSM in the Netherlands revealing that over 70% of transmission events in the years 1996–2010 originated from undiagnosed individuals40. On the other hand, the lack of clustering for 67.4% (31/46) of sequences from patients with RHI may indicate that a considerable proportion of HIV-1 infections could be transmitted by undiagnosed (or not linked to care) individuals. However, due to the relatively low sample size in our study, also individuals that were not sampled should be considered as a potential source of HIV-1 infection for patients with RHI.

The frequency of clustered HIV-1 sequences identified in our study (57/292, 19.5%) was within the range of the frequencies observed in other European and North American HIV-1-positive groups38,41,42,43,44,45. Similarly to other European and North American cohorts38,41,42,43,44,45,46 the proportion of MSM (together with men who reported sex with either men or women as a transmission route) was significantly higher among patients with clustered sequences compared to non-clustered (clustered: 48/57, 84.2% vs non-clustered: 164/235, 69.8%; p = 0.031), showing that, like in other regions, the epidemic in southern and central Poland has been driven mainly by MSM. However, according to our data, unlike in most other studies38,41,43,44,45,46, patients with clustered HIV sequences were not significantly younger than patients with viral sequences located outside the clusters.

According to our data, the proportion of all sequences clustering with other sequences was stable over time or tended to decrease during the study period among samples obtained from patients with RHI. This is opposite to what was observed in a prior study in Poland, where significantly increasing trends for clustered transmissions were reported46. However, we only collected samples from four of 16 regions in Poland. In our sample 4/27 clusters (14.8%) contained sequences from different regions. Such interregional clusters with highly related sequences coming from different regions were also found by other researchers in Poland46 and Germany44, and in comparison to our results the proportion of these clusters was shown to be higher (31/109, 28.4% and 32/184, 17.4%, respectively). It is possible that due to increasing mixing, other patients with clustering sequences were diagnosed in other regions of Poland, and therefore were not captured, explaining the lower than expected level of interregional clusters in our study. In line with other studies44,46, sequences from MSM were the most common in the interregional clusters (7/8, 87.5%), confirming the role of MSM in bridging regional HIV epidemics.

The overall prevalence of non-R5 strains in our study was 23% as detected with geno2pheno 10% FPR for subtype B samples and PhenoSeq for non-B subtypes or geno2pheno 10% FPR for all samples, and was slightly lower than the non-R5 prevalence determined in the former study (28%) with the geno2pheno 10% FPR, among newly diagnosed patients from northern Poland22. This small discrepancy could have resulted from the different approaches to tropism prediction in the two studies, namely single vs triplicate sequencing of the V3 coding region in the current and the previous study, respectively. Triplicate genotypic V3 testing is able to detect more non-R5 variants in comparison with analysis of single sequences. However, tropism predictions using V3 genotyping based on single or triplicate testing using geno2pheno with an FPR of 10% were shown to be comparable, and the high concordance of tropism prediction among samples with single and triplicate amplification supports the use of single amplification in diagnostic practice47,48,49,50,51. Alternatively, the lower frequency of non-R5 viruses in our study may be explained by local, within-country differences in the occurrence of CXCR4-using strains. This may be supported by the observed stable prevalence of non-R5 strains during the study period among all patients with new HIV-1 diagnoses and among patients with recent and long-term HIV-1 infection separately, whereas in the previous study performed in northern Poland an increasing trend in the frequency of non-R5 strains was reported among patients with new diagnoses22.
It is also noteworthy that the median value of geno2pheno FPR in our study was relatively low (23.5%; (10.5–49.8%)), since it was demonstrated that low baseline FPR determined by the geno2pheno tool can predict tropism switch from CCR5 to CXCR4, and patients with R5 viruses predicted at diagnosis with a geno2pheno FPR of less than 50% (or <40.6%) were more prone to switch coreceptor over time than patients with FPR values of >50% (or >40.6)17,52.

The presence of non-R5 strains among persons with RHI may suggest possible transmission of non-R5 strains. In the current study, the frequency of non-R5 strains was comparable between groups of patients with RHI (24%) and LTHI (23%). Such a relatively high frequency of CXCR4-using strains among patients with RHI was not commonly observed in other groups of patients with similar characteristics (recent or acute HIV-1 infection, subtype B predominance, mainly sexual mode of transmission), and similar method of tropism prediction (geno2pheno with 10% FPR). For instance, according to genotypic tropism testing performed on proviral DNA in Italy only 3% of patients with RHI were infected with X4 strains11, and in other studies the frequency of CXCR4-using strains among patients at early infection stages was under 20%5,12,15,16,17,53. The relatively high frequency of non-R5 strains determined in the current study for patients with RHI deserves attention because of the restricted number of patients eligible to therapy with CCR5 antagonist, maraviroc, as well as in the light of the established correlation between infection with the CXCR4-using strains and faster disease progression15,54, or first-line treatment failure19.

Although in the current and other studies5,15,53,55,56 there were no significant differences between patients with RHI harbouring non-R5 and R5 viruses with regard to the baseline CD4+ T-cell count and viral load, longitudinal observations indicated that the presence of CXCR4-utilizing strains at the beginning of infection was associated with faster disease progression characterised by accelerated CD4+ T-cell count decline below 350 cells/μl5,53. Moreover, no significant differences in sex, age, CCR5 Δ32 genotype, or HIV-1 subtype between patients with possible transmission of non-R5 and R5 strains were found in our research, confirming the results obtained in other studies15,53,57.

In our study, non-R5 strains among patients with RHI were observed exclusively in MSM, while patients with all other routes of HIV-1 transmission harboured R5 viruses. In some surveys addressing the issue of putative transmission of X4 strains to new hosts, the route of HIV-1 transmission was similar among patients with R5 and X4 variants15,53,55,58, whereas in others, X4 viruses were more frequently detected in IDUs than in patients infected by sexual contact5,56. In our study the number of RHI attributed to transmission routes other than sex between men was very small, thus our ability to study the differences by transmission route was limited. Nonetheless, our observations are consistent with prior research suggesting that the proposed barriers protecting against infection with X4 strains via blood and mucosal epithelium, and possibly selecting for R5 variants during transmission8, are similarly imperfect, and that the spread of non-R5 and R5 strains may occur as a random process57.

Since detection of non-R5 strains among patients with RHI may reflect either the initial transmission of such strains or rapid evolution of X4 viruses from acquired R5 strains, we analysed transmission clusters identified by a phylogenetic approach to further address the possible transmission of non-R5 strains to new hosts among MSM. Among 11 patients with RHI and non-R5 strains, 5 had their HIV-1 sequences included within separate phylogenetic clusters indicating related transmission. In only one of these 5 clusters did the HIV-1 sequences present different tropism (non-R5 and R5). These clustered sequences were derived from 2 patients with RHI; however, the patient with R5 strain infection entered the study over 4 years earlier than the patient harbouring the non-R5 variant. Thus, either he was not the immediate donor to the second patient, or he was not under successful treatment during the 4 years after diagnosis. If the latter is true, it is possible that by the time of HIV-1 transmission, non-R5 variants had emerged in the putative donor patient, initially infected with R5 virus, and were transmitted to a new host. Unfortunately, we cannot confirm this assumption, since we have neither an additional, later sample from the patient with R5 strain infection nor additional clinical information. In the remaining 4 clusters the second patient involved in the same cluster (putative donor) was a person with LTHI, also harbouring non-R5 strains, which may indicate the acquisition of non-R5 viruses; the prevalence of acquired non-R5 HIV-1 infection confirmed by the phylogenetic analysis can therefore be estimated at 8.7% (4/46). Studies in which the possibility of transmission of non-R5 strains was evaluated using phylogenetic cluster analysis are scarce and give inconsistent results.
Frange et al.58 failed to confirm the transmission of non-R5 strains in the group of patients with primary HIV-1 infection, as only 1 out of 27 non-R5 sequences was present within a transmission cluster, and grouped with the sequence of an R5 strain obtained from a patient who was infected 34 months after infection of the presumed donor harbouring non-R5 virus. In turn, our data are congruent with the results presented by Chalmet et al.57. In their study transmission of non-R5 strains was suggested by the presence of non-R5 sequences within common phylogenetic clusters, and was confirmed by the identification of non-R5 strains in both partners of well-recognised transmission pairs shortly after infection, supporting the hypothesis of random spread of strains presenting non-R5 or R5 tropism.

The major limitation of our study is a relatively small sample size; hence we obtained small cluster sizes (mostly pairs), and some important transmission links could have been missed. Although patients included in the analysis constituted an estimated 16% of all new diagnoses detected in the studied provinces during the study period, and such a sample is expected to be representative of the examined area, the interpretation of the transmission clusters in such a case should be performed with extreme caution. In particular, it should be considered that individuals who were not sampled can be either intermediate hosts between patients with viral sequences gathered within the same cluster or the direct source of HIV-1 infection for them.

Another limitation of our study is linked to the years of sample collection, i.e. 2008–2014. While it is intrinsic to similar types of studies that sampling considerably precedes the time of publication, making the results significant for the earlier time periods38,39,45,59, we have tried to extend the validity of our study by conducting time trend analyses.

Moreover, unlike in most other studies where pol HIV-1 sequences were used, we have used env gene sequences to detect phylogenetic transmission clusters, which complicates direct comparisons between studies. However, it was recently found that env sequences, due to their higher variability, may be more suitable than the pol sequences for analysis of recent infections59.

Finally, tropism predictions in our study were performed solely with the genotypic methods. Although genotypic tropism testing is currently accepted as an alternative to the phenotypic testing47,60, and >80% concordance between phenotypic assays and the geno2pheno method used for subtype B tropism assessment was reported51,61,62,63,64,65, it was shown that detection of CXCR4-using viruses by some genotypic assays may be inaccurate, especially for non-B subtypes61,66,67. Thus, to predict the tropism in non-B HIV-1 samples we used PhenoSeq, a tool with improved sensitivity and specificity for establishing the tropism of non-B subtypes68. Nevertheless, the discrepant results of the genotypic and phenotypic tropism assays were reported for patients with acute or recent infections5,13,14,16, indicating that our tropism predictions should be interpreted with caution.

In conclusion, we found a relatively high prevalence of non-R5 strains among Polish patients with new HIV-1 diagnoses (23%) and recent infections (24%), but we did not confirm the increasing trend in the frequency of non-R5 viruses among newly diagnosed individuals reported in a former Polish study. Transmission of non-R5 viruses was confirmed by cluster analysis for 8.7% (4/46) of patients with RHI. Non-R5 strain distribution and recent HIV-1 infection frequency, as well as the prevalence of non-B subtypes in Poland, may differ by geographic region. Although viral sequences obtained from individuals with RHI were notably more frequent within the transmission clusters than outside them, the participation of these individuals in the forward HIV-1 transmission was not confirmed.

Methods

Study group and sample collection

The study was performed among 298 consecutive Polish patients who were recruited under the following inclusion criteria: (1) being newly diagnosed with HIV-1 infection, and having no clinical AIDS (indicator disease) at first testing; (2) presenting at 1 of 4 centers for HIV Diagnostics and Therapy for AIDS in Poland, located in Chorzów, Kraków, Łódź, and Wrocław, during the enrolment period between March 2008 and February 2014; (3) having their blood sample collected at first presentation; and (4) providing written informed consent to participate in the study. During the recruitment procedure, data regarding sex, age, date and city of HIV-1 diagnosis, as well as self-reported transmission route were collected. Viral load and CD4+ T-lymphocyte count at HIV-1 diagnosis were retrieved from patients’ medical records and were the first results obtained after HIV-1 diagnosis. Anticoagulated venous blood and plasma samples were collected at first clinical presentation and were stored at –80 °C until further laboratory procedures were performed.

All plasma samples were tested with the quantitative limiting antigen avidity enzyme immunoassay (Sedia™ HIV-1 LAg-Avidity EIA, Sedia BioSciences Corporation, Portland, Oregon, USA) to differentiate recent from long-term HIV-1 infections. The assay was performed according to the manufacturer’s protocol69. As recommended, samples with normalized median optical density values (ODn) ≤ 1.5 in triplicate confirmatory testing were considered to be obtained from persons with recent HIV-1 infection, which corresponds to a duration of HIV-1 infection no longer than 130 days (95% CI: 118–142) since seroconversion70.
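The decision rule above can be illustrated with a short sketch. This is not the assay software; the function name and the example ODn values are hypothetical, and only the 1.5 cut-off is taken from the text.

```python
# Minimal sketch of the LAg-Avidity decision rule described above.
# Assumption: `odn` is the normalized median optical density (ODn) from
# confirmatory testing; names and example values are illustrative only.
ODN_CUTOFF = 1.5  # threshold stated in the text


def classify_infection(odn: float) -> str:
    """Return 'RHI' (recent) if ODn <= 1.5, otherwise 'LTHI' (long-term)."""
    return "RHI" if odn <= ODN_CUTOFF else "LTHI"


if __name__ == "__main__":
    for odn in (0.4, 1.5, 3.2):  # hypothetical ODn values
        print(odn, classify_infection(odn))
```

Note that the boundary value (ODn = 1.5) falls into the recent-infection group, following the "≤" in the rule.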

The QIAamp DNA Blood Mini Kit (QIAGEN GmbH, Hilden, Germany) was used to extract genomic and proviral DNA from blood samples. The presence of the 32 base-pair deletion in patients’ CCR5 gene (Δ32) was determined with the polymerase chain reaction (PCR) as described previously10,71.

Amplification and sequencing of HIV-1 env gene

Amplification of env gene fragments coding for the gp120 region was performed with nested-PCR in 298 samples of genomic and proviral DNA obtained from 47 patients with recent HIV-1 infection (RHI) and 251 persons with long-term infection (LTHI). In the first step of the nested-PCR, a fragment spanning nucleotides 6201–9089 was amplified using E00-F and E01-R as outer primers72, with the following amplification conditions: an initial denaturation at 94 °C/7 minutes, followed by 40 cycles of 94 °C/40 seconds, 51 °C/40 seconds, and 72 °C/3 minutes, with the final extension at 72 °C/7 minutes, in a final volume of 50 μl. In the next step, inner primers ED5-F and E125-R72,73 were used to obtain a 782 base pair fragment of gp120 spanning nucleotides 6557–7338. Amplification conditions for the inner primer pair were as follows: 94 °C/5 minutes, followed by 35 cycles of 94 °C/35 seconds, 55 °C/35 seconds, and 72 °C/90 seconds, with the final extension at 72 °C/5 minutes, in a final volume of 50 μl.

Purified nested-PCR products were subjected to population-based sequencing with the ABI Prism Big Dye Terminator v3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA, USA) with the primers used in the inner step of nested-PCR. For each sample, the sequences of both strands were determined separately using a 96-capillary 3730xl DNA Analyzer (Applied Biosystems, USA). The obtained sequences were manually checked and trimmed to remove primers, resulting in fragments spanning nucleotides from 6583 to 7314 corresponding to codons 121–363 of gp120, covering the complete coding sequence for the V1, V2, C2 and V3 regions. Nucleotides were considered ambiguous when the next highest peak in the electropherogram exceeded 25% of the main peak. All nucleotide positions in the manuscript are presented according to the numbering positions of HIV-1 HXB2 (GenBank accession number: K03455). Amplification and sequencing were successful for 292 specimens, including 46 originating from patients with RHI.
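As an illustration of the 25% secondary-peak rule, the sketch below calls one electropherogram position from four peak heights. It is an assumption-laden example, not the base-calling procedure used in the study: the function name, the input format and the restriction to two-base IUPAC codes are all hypothetical.

```python
# Sketch of the 25% secondary-peak rule for a single electropherogram position.
# Assumptions: `peaks` maps each of the four bases to its peak height, and only
# the two highest peaks are considered; helper names are hypothetical.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
SECONDARY_PEAK_FRACTION = 0.25  # threshold stated in the text


def call_base(peaks: dict) -> str:
    """Return a single base, or a two-base IUPAC code if the position is ambiguous."""
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (main_base, main_height), (next_base, next_height) = ranked[0], ranked[1]
    # Ambiguous only when the second peak EXCEEDS 25% of the main peak.
    if main_height > 0 and next_height > SECONDARY_PEAK_FRACTION * main_height:
        return IUPAC_TWO_BASE[frozenset((main_base, next_base))]
    return main_base
```

For example, hypothetical peak heights of A = 1000 and G = 300 would yield the ambiguity code R, whereas G = 250 (exactly 25%) would still be called A.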

Analysis of HIV-1 env sequences and coreceptor usage prediction

All sequences were analysed for the presence of hypermutations using HYPERMUT software v2.074. HIV-1 subtypes were initially determined with REGA HIV-1 Subtyping Tool 3.0 (http://dbpartners.stanford.edu:8080/RegaSubtyping/stanford-hiv/typingtool), and afterward with NCBI Genotyping Tool using HIV-1 reference set from 2009 (https://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi). The NCBI Genotyping Tool analyses were performed especially to check sequences determined as non-B or not assigned to any subtype by REGA Tool. To confirm subtype identification by REGA and NCBI genotyping tools, phylogenetic analysis with PhyML 3.0 online software was conducted75. Reference HIV-1 sequences of known genotype (subtypes and circulating recombinant forms (CRFs)), that were used in phylogenetic analysis, were retrieved from Los Alamos National Laboratory HIV Sequence Database’s compendium from the year 2016 (https://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html). Phylogenetic tree for all examined and reference sequences was created in the PhyML 3.0 with the maximum likelihood (ML) method under the general time reversible nucleotide substitution model + gamma distribution of rates + proportion of invariable sites (GTR + G + I) with the gamma shape parameter and proportion of invariable sites estimated from data (Supplementary Fig. S1). Tree topology was improved with both NNI (Nearest Neighbor Interchange) and SPR (Subtree Pruning and Regrafting) algorithms. Subsequently, all examined sequences were evaluated for recombination with the jumping profile hidden Markov model (jpHMM, http://jphmm.gobics.de/submission_hiv)76.

To determine HIV-1 coreceptor usage, all single proviral DNA env sequences coding for the V3 loop were analysed with the online geno2pheno algorithm (http://coreceptor.geno2pheno.org/index.php)77, which is one of the most widely accepted and used methods for genotypic prediction of viral tropism. In accordance with the current recommendations47 and results of prospective studies11,12,78, a false positive rate (FPR) of 10% was used to interpret the clonal geno2pheno results; thus viruses were classified as non-R5 when V3 sequences displayed an FPR value ≤ 10% (i.e. a 10% probability of falsely classifying an R5 virus as X4). Since the sensitivity of the geno2pheno method to detect X4 variants was shown to be lower for non-B subtypes than for subtype B67, sequences representing subtypes other than B were tested with the PhenoSeq program, which is a newer tool developed to reliably predict the tropism of HIV-1 variants such as A, B, C, D, CRF01_AE and CRF02_AG (http://tools.burnet.edu.au/phenoseq/)68.

The geno2pheno algorithm with an FPR of 5.75% was also used to improve the specificity, and higher FPR values (15% and 20%) were applied to increase the sensitivity for detection of non-R5 variants.
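The effect of the different cut-offs can be sketched as follows. The code does not call geno2pheno; it only applies the interpretation rule to an FPR value that is assumed to have been obtained already. The function names and example values are hypothetical, and the 5.75 cut-off is assumed to be a percentage like the others.

```python
# Sketch of FPR-based tropism calling at several cut-offs.
# Assumption: `fpr` is the false positive rate (in %) reported by geno2pheno
# for one V3 sequence; example values are hypothetical.
FPR_CUTOFFS = (5.75, 10.0, 15.0, 20.0)  # cut-offs mentioned in the text


def classify_tropism(fpr: float, cutoff: float = 10.0) -> str:
    """Classify a sequence as 'non-R5' when FPR <= cutoff, otherwise 'R5'."""
    if not 0.0 <= fpr <= 100.0:
        raise ValueError("FPR must be a percentage between 0 and 100")
    return "non-R5" if fpr <= cutoff else "R5"


def classify_at_all_cutoffs(fpr: float) -> dict:
    """Apply every cut-off to the same FPR value."""
    return {cutoff: classify_tropism(fpr, cutoff) for cutoff in FPR_CUTOFFS}


if __name__ == "__main__":
    print(classify_at_all_cutoffs(12.3))  # hypothetical FPR value
```

A sequence with an FPR between two cut-offs changes its classification depending on the cut-off chosen, which is why lower cut-offs favour specificity and higher cut-offs favour sensitivity for non-R5 detection.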

The number of ambiguous nucleotides in the obtained V3 coding fragments ranged from 0 to 7, thus all 292 V3 sequences could be included in the genotypic tropism predictions, since it is not recommended to consider the V3 sequences with ≥8 ambiguous nucleotides in a coreceptor usage testing47.
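The exclusion rule can be expressed as a simple filter. The sketch below is illustrative: the function names are hypothetical, and treating every non-ACGT character other than a gap as ambiguous is an assumption.

```python
# Sketch: count ambiguous positions in a V3 nucleotide sequence and apply the
# exclusion rule (sequences with >= 8 ambiguous nucleotides are not analysed).
# Assumption: any character other than A, C, G, T or a gap is an ambiguity code.
UNAMBIGUOUS = set("ACGT")
MAX_AMBIGUOUS = 7  # 8 or more ambiguous nucleotides leads to exclusion


def count_ambiguous(seq: str) -> int:
    """Count positions that are neither A, C, G, T nor an alignment gap."""
    return sum(1 for nt in seq.upper() if nt not in UNAMBIGUOUS and nt != "-")


def eligible_for_tropism_prediction(seq: str) -> bool:
    """Return True when the sequence has at most 7 ambiguous nucleotides."""
    return count_ambiguous(seq) <= MAX_AMBIGUOUS
```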

Phylogenetic analyses and identification of clustered HIV-1 env sequences

Prior to phylogenetic analyses, the ClustalX algorithm was used to align 292 env sequences together with the group M consensus sequence79. The alignment was manually edited using the BioEdit program80, and columns with gaps were removed. GTR + G + I was selected as the best fitting nucleotide substitution model according to the Akaike and Bayesian Information Criteria (AIC, BIC) implemented in the jModeltest 2.1.9 software81. Under this model the nucleotide frequencies were: A = 0.4243, C = 0.1988, G = 0.1723, T = 0.2046; the substitution rates were: AC = 1.2416, AG = 3.9991, AT = 0.7674, CG = 0.5556, CT = 3.7929, GT = 1.0000; proportion of invariable sites = 0.165; and gamma shape parameter = 0.776.
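To make the reported parameters concrete, the sketch below assembles the GTR instantaneous rate matrix from the nucleotide frequencies and relative substitution rates listed above, using the standard parameterisation Q[i][j] = r_ij · π_j for i ≠ j. It is an illustration only: the gamma shape and the proportion of invariable sites are not used, and the normalisation chosen here (mean rate of 1 at equilibrium) is a common convention that may differ from the internal scaling of jModelTest or PhyML.

```python
# Sketch: build the GTR rate matrix Q from the parameters reported above.
# Off-diagonal entries are r_ij * pi_j; each diagonal entry makes its row sum
# to zero; the matrix is then scaled to an expected rate of 1 at equilibrium.
# Rate heterogeneity (gamma, invariable sites) is deliberately left out.
BASES = "ACGT"
PI = {"A": 0.4243, "C": 0.1988, "G": 0.1723, "T": 0.2046}
RATES = {"AC": 1.2416, "AG": 3.9991, "AT": 0.7674,
         "CG": 0.5556, "CT": 3.7929, "GT": 1.0000}


def exchangeability(a: str, b: str) -> float:
    """Symmetric relative rate between two different bases."""
    return RATES["".join(sorted((a, b)))]


def gtr_rate_matrix() -> list:
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = exchangeability(a, b) * PI[b]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    # Scale so that -sum_i pi_i * q_ii equals 1
    scale = -sum(PI[a] * q[i][i] for i, a in enumerate(BASES))
    return [[value / scale for value in row] for row in q]


if __name__ == "__main__":
    for base, row in zip(BASES, gtr_rate_matrix()):
        print(base, [round(value, 4) for value in row])
```

The high AG and CT rates relative to the other pairs reflect the excess of transitions over transversions that the fitted model captures.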

Initially, transmission clusters were investigated with the PhyML 3.0 software75 using the maximum likelihood (ML) method, under the selected GTR + G + I model, with the best of NNI and SPR algorithms for tree topology improvement. To evaluate branch supports of the phylogeny, approximate likelihood ratio test (aLRT) values with the nonparametric Shimodaira–Hasegawa–like (SH–like) algorithm were computed in PhyML 3.0. The resulting ML phylogenetic tree and aligned sequence data were analysed with the Cluster Picker 1.2.5 software to identify monophyletic clusters based on specified thresholds for maximum pairwise genetic distances and aLRT support values82.

Additionally, a more robust Bayesian Metropolis coupled Markov chain Monte Carlo (MCMCMC) method was used to further verify the clusters. Two independent replicates of 50 million generations were run in MrBayes v.3.2.683 with sample frequency of 2500, under GTR + G + I model. A burn-in of 25% was used to summarize parameters and trees in the Bayesian approach. To control whether a sample from the posterior probability distribution was adequate, a plot of the generation vs the log probability was checked. The total effective sample size (ESS) values were above 200 for all parameters, average standard deviation of split frequencies decreased below 0.01 after 11.9 million generations, and potential scale reduction factors (PSRFs) were reasonably close to 1.0 indicating appropriate convergence. Finally, transmission clusters were identified when the following criteria were simultaneously met: i) aLRT value in ML method >0.9, ii) maximum within cluster pairwise genetic distance calculated in the Cluster Picker <3%, and iii) posterior probability in a Bayesian inference = 1. Such clusters are assumed to represent related transmission events between patients whose viral sequences are gathered within a single cluster. Phylogenetic trees were visualized in FigTree v.1.4.3.
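The three simultaneous criteria can be summarised in a short sketch. It assumes that the support values and the maximum pairwise distance have already been extracted from the ML tree, the Cluster Picker output and the Bayesian consensus tree; the class and field names are hypothetical and do not correspond to the output format of any of these programs.

```python
# Sketch of the three criteria that must all hold for a transmission cluster.
# Assumptions: values are pre-extracted from the respective analyses; the
# genetic distance is expressed as a fraction (3% = 0.03).
from dataclasses import dataclass


@dataclass
class CandidateCluster:
    alrt_support: float           # SH-like aLRT support in the ML tree (0-1)
    max_pairwise_distance: float  # maximum within-cluster genetic distance
    posterior_probability: float  # support for the same clade in the Bayesian tree


def is_transmission_cluster(cluster: CandidateCluster) -> bool:
    """Return True only when all three thresholds from the text are met."""
    return (
        cluster.alrt_support > 0.9
        and cluster.max_pairwise_distance < 0.03
        # A posterior probability cannot exceed 1, so >= 1.0 means "equal to 1".
        and cluster.posterior_probability >= 1.0
    )
```

Requiring all three conditions at once is conservative: a clade that is well supported in the ML tree is still rejected if it is too divergent or not fully supported in the Bayesian analysis.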

Sequence data

All of the HIV-1 nucleotide sequences obtained in the study have been deposited in GenBank with the KT778123-KT778247 and MH627052-MH627218 accession numbers.
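As a simple consistency check, the number of accession numbers covered by the two ranges can be counted and compared with the 292 successfully sequenced specimens. The helper below is a hypothetical sketch that assumes each range is inclusive and that both ends share the same letter prefix.

```python
# Sketch: count accession numbers in inclusive ranges such as "KT778123-KT778247".
# Assumption: both ends consist of the same alphabetic prefix followed by digits.
import re

ACCESSION_PATTERN = re.compile(r"([A-Z]+)(\d+)")


def accession_range_size(accession_range: str) -> int:
    """Return how many accession numbers an inclusive range contains."""
    start, end = accession_range.split("-")
    first = ACCESSION_PATTERN.fullmatch(start)
    last = ACCESSION_PATTERN.fullmatch(end)
    if not first or not last or first.group(1) != last.group(1):
        raise ValueError(f"unsupported accession range: {accession_range}")
    return int(last.group(2)) - int(first.group(2)) + 1


ranges = ("KT778123-KT778247", "MH627052-MH627218")
total = sum(accession_range_size(r) for r in ranges)
print(total)  # 125 + 167 = 292
```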

Statistical analysis

Comparisons of categorical variables between specified groups were performed with the Pearson’s Chi-square test. Two-tailed Fisher’s exact test was applied for binary variables and for variables with several possible categories to compare a specified category vs all other categories. The nonparametric Mann-Whitney U test was used for analysis of continuous variables. To examine the time trends in selected parameters, logistic and linear regression analyses with the year of HIV-1 diagnosis as a single predictor were performed for binary and continuous outcome variables, respectively. For the continuous outcome variables we present the slope of the linear effect of the diagnosis year. For the binary variables odds ratios per each additional year are provided. For greater comparability, for the binary variables we also calculated average prevalence differences per year (average annual difference), by fitting linear regression model to annual average prevalences. All statistical analyses were performed using STATISTICA v13.1 enriched with a Medical Set (Statsoft, Warsaw, Poland), with the significance level defined by P < 0.05.
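The average annual difference described above is the slope of an ordinary least-squares line fitted to yearly prevalences. The sketch below shows that calculation in plain Python rather than in STATISTICA; the yearly prevalence values are hypothetical and are not data from the study.

```python
# Sketch: average annual difference in prevalence as the slope of an ordinary
# least-squares line fitted to yearly prevalences.
# The yearly values below are hypothetical, not results from the study.

def ols_slope(x: list, y: list) -> float:
    """Slope of the least-squares regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


years = [2008, 2009, 2010, 2011, 2012, 2013, 2014]
prevalence_pct = [22.0, 25.0, 21.0, 24.0, 23.0, 22.5, 23.5]  # hypothetical
slope = ols_slope(years, prevalence_pct)
print(f"average annual difference: {slope:+.2f} percentage points per year")
```

Expressing the trend in percentage points per year complements the odds ratio from logistic regression, which is harder to compare across outcomes with different baseline prevalences.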

Ethical approval

The research was approved by the National Institute of Public Health – National Institute of Hygiene Bioethics Committee, Poland (no. 3/2007), and all procedures were performed in accordance with relevant guidelines and regulations. All collected samples and data were anonymous and coded. Written informed consent was obtained from study participants.