Recent increased identification and transmission of HIV-1 unique recombinant forms in Sweden

A temporal increase in non-B subtypes has earlier been described in Sweden by us and we hypothesized that this increased viral heterogeneity may become a hotspot for the development of more complex and unique recombinant forms (URFs) if the epidemics converge. In the present study, we performed subtyping using four automated tools and phylogenetic analysis by RAxML of pol gene sequences (n = 5246) and HIV-1 near full-length genome (HIV-NFLG) sequences (n = 104). A CD4+ T-cell decline trajectory algorithm was used to estimate time of HIV infection. Transmission clusters were identified using the family-joining method. The analysis of HIV-NFLG and pol gene described 10.6% (11/104) and 2.6% (137/5246) of the strains as URFs, respectively. An increasing trend of URFs was observed in recent years by both approaches (p = 0·0082; p < 0·0001). Transmission cluster analysis using the pol gene of all URFs identified 14 clusters with two to eight sequences. Larger transmission clusters of URFs (BF1 and 01B) were observed among MSM who mostly were sero-diagnosed in recent time. Understanding the increased appearance and transmission of URFs in recent years could have importance for public health interventions and the use of HIV-NFLG would provide better statistical support for such assessments.

A higher proportion of URFs was identified among patients reported to be infected in Sweden (n = 8; 18%) than outside the country (n = 3; 5%) (p = 0·0530) (Fig. 1C). Based on the patients who had an estimated time of infection (n = 88), there was a significant increase of URFs in recent years (p = 0·0082) (Fig. 1D).
Clinical characteristics of patients with URFs identified by pol sequencing. Subsequently, we analyzing the pol genes from the complete Swedish InfCare HIV database (n = 5246), 137 URFs (2.6%) were identified. The majority of the patients infected with URFs were sampled from heterosexuals (54%, 74/137) followed by MSM (27%; 37/137), unknown/other (12%; 17/137), mother to child transmission (4%; 5/137), through blood transfusion and PWID (3%; 4/137). Almost half of the patients (46%; 63/137) diagnosed with a URF had been Transmission clusters identified by HIV-NFLG. Subsequently we inferred evolutionary relationships using family-joining. Transmission clusters were constructed for trees based on the HIV-NFLG, at thresholds of 0.08 substitution/site ( Supplementary Fig. S2). The C_NFLG tree identified seven clusters, consisting of HIV-1B (n = 3), HIV-1C (n = 3) and URF_A1D (n = 1), respectively (Fig. 3). All three HIV-1B transmission clusters (cluster 1: two MSM; cluster 2: one MSM and one heterosexual; cluster 3: three PWID) and one HIV-1C transmission cluster (two heterosexuals) had occurred within Sweden. For URF_A1D all three individuals had been infected heterosexually in the country. In one HIV-1C cluster with three transmission events, there were one Swedish MSM and two individuals originating from a Sub-Sahara African country, who were reported to be heterosexual and MSM, respectively. In the two remaining HIV-1C clusters, two heterosexuals had been infected in a central-African country and two heterosexuals in Sweden, respectively.
Transmission clusters were also constructed for trees based on the pol sequences obtained through NFLG at thresholds of 0.02 substitution/site. The C_NFLG and C_pol trees had six clusters in common. The Jaccard index of the two sets of clusters was thus 0.67 (6/9). In one HIV-1C cluster, C_NFLG included three patients' sequences (Pt#074, 080 and 082) of which two were there described in the pol (Pt#074 and 080) gene analysis. One additional cluster was found in C_pol only (Pt#005 and 030) which was absent in C_NFLG. This data indicates false identification of clusters in the pol analysis due to low statistical power.
Transmission clusters identified by pol analysis. Transmission clusters were also constructed for trees based on the 137 URFs obtained through analysis of routine pol sequences at thresholds of 0.02 substitution/site ( Supplementary Fig. S2). A total of 14 clusters consisting of two to eight sequences were observed. All six of the two sequence clusters (one each A1C, A1B, 02A1, BC, and two different 02A1) were observed among 12 heterosexuals and all but one of these small clusters had appeared outside Sweden. Among MSM (n = 37), five clusters with three to eight patients were observed (two clusters of BC; one each of 01D, BF1, and 02B), which comprised 27 of the 37 (73%) men.
Eight larger URFs transmission clusters were identified, consisting of three to eight individuals ( Table 2). The largest of these clusters (BF1: n = 8 individuals; 01B: n = 7 individuals) were observed among MSM. Three clusters of one each URFs DC (n = 4 patients), A1D (n = 3 patients) and BF1 (n = 3 patients) were observed among the individuals who had been infected through a heterosexual (n = 8) or other/unknown (n = 2) route.
For 16 out of the 37 patients who belonged to six out of the eight large clusters, the time of HIV-1 transmission was estimated to be at least five years earlier (median: 7 years, range: 5-14) than the time of diagnosis (Table 2). Also, all of them had a CD4 + T-cell count below 350/μl at diagnosis, fulfilling the criteria of being a late presenter 17 . In three clusters including heterosexually infected individuals, the three out of ten persons estimated to have been infected for the longest time were reported to have been infected outside Sweden, the remaining seven were infected in Sweden. For MSM, a more mixed pattern was seen with nine out of the 27 (33%) subjects reported to have been infected outside Sweden.

Discussion
In the present study, the appearance and spread of transmission clusters of unique recombinant forms (URFs) in Sweden over time was investigated using near full-length HIV-1 genomes (HIV-NFLG) and pol gene sequences. By both approaches, we identified an increased incidence of URFs among individuals diagnosed in the country and transmission of such strains within the country, in recent years. Several cases of cross-transmissions between MSM and persons who have a self-identity as heterosexuals were also identified. Our data indicates that intermixing of strains may occur within the country with a potential for the development of more complex recombinant forms as well as further spreading of such URFs.
Based on the pol gene, multiple subtypes have been reported in Sweden from the beginning of the HIV epidemic 2 . Several subtypes have been introduced among PWID 6 and heterosexuals 1 , but in MSM HIV-1B is still reported to be predominant 1 . However, analysing only one gene fragment underestimates the presence of true recombinant forms [8][9][10][11] , which was confirmed in our study. Thus, using HIV-NFLG, a more accurate description of URFs in the Swedish HIV-1 epidemic can be obtained. It is therefore likely that the incidence of URFs in Sweden was underestimated in our analysis of the whole database and that more than 4% of newly diagnosed patients are infected with URFs after 2010. A similar situation may be present in other European countries. Based on pol gene analysis, increases of HIV-1A1 and HIV-1C have thus been reported among MSM in United Kingdom 18 and Greece 19 . It is possible that an even higher proportion of recombinant forms had been identified in these studies if NFLG instead had been used.
The subtype distribution among MSM in Sweden between 1983 and 2012 was still dominated by HIV-1B, based on our earlier analysis of the pol gene 1 . In the present study, a significant trend was observed with a higher proportion of URFs among patients who were diagnosed the recent years, both when analyzing the HIV-NFLG or only the pol gene, supporting the notion of an increased viral heterogeneity in the Swedish HIV-1 epidemic. Interestingly, almost all (97%) of the URFs detected among MSM had an HIV-1B gene fragment, including the two major clusters, URF_01B and URF_BF1. As the epidemic among the MSM has been dominated by HIV-1B in the past, it is likely that this is a consequence of a more recent introduction of non-B subtypes among the MSM population. Actually, all larger URFs clusters (four to eight individuals) were observed among MSM, while small clusters of two sequences were mainly restricted to heterosexuals. It shall be noted that transmissions of URFs_01B in China 20 and URF_BF1 in Brazil 21 have been reported among MSM, although these URFs are different than those observed in our cohort.
We used HIV-NFLG to determine the clustering pattern using the newly developed FJ-method 22 and compared with the use of the pol gene only. Our data indicated that pol gene analysis may overestimate clustering statistics, as earlier described. The HIV-NFLG clustering analysis observed two clusters (one each HIV-1B and HIV-1C) where there were cross-transmission events between persons reported to be heterosexually infected and MSM. All of these heterosexuals were black-African men who are more likely to self-identify themselves as heterosexuals compared with other ethnicities 23 . This is in line with a recent study which reported multiple occasions of shared transmission clusters between MSM and heterosexuals in the Nordic countries 24 . Also, a large study from United Kingdom, based on the pol gene, reported crossover transmission of HIV-1C from heterosexuals to MSM, which has led to an expansion of this subtype in United Kingdom 25 . Our phylogenetic analysis of the pol gene obtained from the large national database indicated that major transmission clusters were restricted to the MSM population with no crossover transmissions identified, indicating an added value of NFLG for understanding the HIV-1 epidemic.
In the Swedish setting, deficiencies in the health care system with missed HIV testing opportunities contribute to a high proportion of late presenters in whom the time of transmission often is unknown 26 . We used a CD4 + T-cell decline trajectory model to deduce the predicted year of actual infection and observed increasing trends of URFs in patients infected recent years. Also, URFs were diagnosed more frequently among patients reported to be infected in Sweden. The CD4 + T-cell trajectory model was originally designed to be used mainly on a population level. In this study, it was used on an individual level and this may have introduced bias in our estimation. However it should be noticed that when using the CD4 + T-cell trajectory model at an individual level, it has been shown to give valid and robust estimates, when compared with data obtained from physicians and phylogenetic analysis 27 . Our analysis showed that a substantial proportion of the patients with URFs had been infected for  Table 2. Characteristics of eight transmission clusters with more than two patients, based on the pol gene using the FJ-method. Transmission clusters were constructed at 0.02-subs/site thresholds. *Reported by the treating physician; **cells/μl at HIV-1 diagnosis; ***PHI: primary HIV infection; MSM: men who have sex with men; Hetero: heterosexually infected; OTH: other mode of transmission or unknown; Late presenters are marked in bold. more than five years and had an advanced immunodeficiency at diagnosis. Thus, the failure of diagnosing these patients at an earlier stage of infection has contributed to the spread of URFs in Sweden.
In conclusion, our study provides molecular evidence of a higher detection rate of URFs by HIV-NFLG compared to analysis of pol gene fragment in an epidemic where diverse subtypes are circulating. Transmission of the URFs seems to have increased in recent years among the MSM infected in Sweden, partly as a result of amalgamating with migrants. As molecular surveillance with NFLG provides greater statistical support for clustering, HIV-NFLG sequencing of newly diagnosed cases within a country is likely to promptly detect changes in the viral genetic composition of the epidemic. This could contribute to a better understanding of HIV-1 transmission networks and potential of improved public health interventions in countries like Sweden as well as other countries where multiple subtypes are present.

Material and Methods
Clinical Specimens. Two categories of sequences were analysed: i) HIV-NFLG: attempts were done on archived plasma of 148 participants included in the Swedish InfCare HIV cohort, drawn over a time of 22 years (1993 to 2016) ( Table 1)  CD4 + T-cell decline trajectory model for estimation time of infection. In addition to self-reported time of infection, we also used a CD4 + T-cell decline trajectory algorithm to estimate time of HIV-1 transmission, after having identified and adjusted for factors associated with the slope of decline among identified groups of HIV-1 seroconverters (age and region of birth), as described by us recently 27 . The time of estimated HIV seroconversion was presented in three estimates; the earliest probable time of seroconversion, the median probable time, and the latest probable time. We did not apply the CD4 trajectory model to serologically verified PHI. Actual date of serology is presented as time (year) of infection.
HIV-1 near full-length genome sequencing (HIV-NFLG). Viral RNA was extracted using the QIAamp Viral RNA Extraction Kit, Qiagen, Germany, as per manufactures instructions. The NFLG amplified the 9 kb HIV-1 genome in two fragments followed by sequencing by two approaches: Sanger sequencing using 17 sequencing primers 13 or next generation sequencing (NGS) in Illumina HiSeq. 2500, followed by consensus sequence generation using in-house bioinformatics pipeline, as recently described by us 31 . The NGS was validated against an external quality control (EQC) panel. Clustering of the consensus sequences generated by NGS and Sanger sequencing from a given sample was identified by maximum likelihood phylogenetic analysis with 100% bootstrap support and both type of sequences could thus be used simultaneously in the molecular epidemiology studies 31 .

HIV-1 subtyping and identification of recombination.
Reference HIV-NFLG sequences were downloaded from the Los Alamos (LANL) database. All HIV-NFLG sequences generated were submitted to the BLAST tool available in the LANL database. A unique set of 175 reference sequences were used for phylogenetic analysis as well as cluster analysis. HIV-1 subtyping were carried out using REGA v3 32 , Recombination Identification Program (RIP) v3 33 and COMET-HIV 34 followed by maximum likelihood phylogenetic tree using RAxML 35 . Precise inter-subtype recombination analysis was performed by bootscanning analysis and similarity plot analysis implemented in SimPlot ver3·5·1 with 500 bp window size and 20 bp step size 36 , Recombination Detection Program (RDP) ver4 37 and jumping profile Hidden Markov Model (jpHMM) 38 . After getting the consensus breakpoint, fragment specific phylogenetic analysis was performed using ML-phylogenetic tree in RAxML.
Evolutionary relationships inferred using family-joining. We used RAxML to estimate maximum likelihood distances under a GTR + Gamma model and constructed a phylogenetic tree using family-joining, as described recently 22 . The sequences were grouped into transmission clusters based on tree-based distances. Two sequences were considered to be in the same cluster if the corresponding tree-based distance was less than a pre-selected threshold. Transmission clusters were constructed for the pol tree (C_pol) and for the NFLG tree (C_NFLG), at distance thresholds of 0·02 subs/site, and 0·08 subs/site, respectively. The similarity of these two sets of clusters was calculated by the Jaccard index: Number of clusters in common/Number of distinct clusters present either in C_NFLG or in C_pol.
Ethical considerations and data availability. The study was approved by regional ethics committee of Stockholm (2002/367; 2005/1167; 2007/1533; 2014/928-31/2) and all methods were performed in accordance with approved institutional guidelines. The patient identity was anonymised and delinked prior to analysis. The authors confirm that there are some restrictions on the data underlying the conclusions in the manuscript. The sequences that were analysed are representative of the entire country thereby, in principle, allow for the reconstruction of the transmission network 1 . Data are however available from the authors upon reasonable request and with permission of the steering committee of InfCare HIV. All the HIV-NFLG sequences generated in this study are available from GeneBank through accession numbers KP411823-KP411826, KP411828, KP411830-KP411845 and MF373124-MF373206.