Analysis of HIV-1 diversity, primary drug resistance and transmission networks in Croatia

Molecular epidemiology of HIV-1 infection in treatment-naive HIV-1 infected persons from Croatia was investigated. We included 403 persons, representing 92.4% of all HIV-positive individuals entering clinical care in Croatia in 2014–2017. Overall prevalence of transmitted drug resistance (TDR) was estimated at 16.4%. Resistance to nucleoside reverse transcriptase inhibitors (NRTIs), non-nucleoside RTI (NNRTIs) and protease inhibitors (PIs) was found in 11.4%, 6.7% and 2.5% of persons, respectively. Triple-class resistance was determined in 2.2% of individuals. In addition, a single case (1.0%) of resistance to integrase strand-transfer inhibitors (InSTIs) was found. Deep sequencing was performed on 48 randomly selected samples and detected additional TDR mutations in 6 cases. Phylogenetic inference showed that 347/403 sequences (86.1%) were part of transmission clusters and identified forward transmission of resistance in Croatia, even that of triple-class resistance. The largest TDR cluster of 53 persons with T215S was estimated to originate in the year 1992. Our data show a continuing need for pre-treatment HIV resistance testing in Croatia. Even though a low prevalence of resistance to InSTI was observed, surveillance of TDR to InSTI should be continued.


Comparison of SDRM detected with Sanger sequencing (SS) and deep sequencing (DS).
We compared patterns of SDRMs detected by Sanger sequencing (variant frequency threshold of approximately <15%) and deep sequencing (variant frequency threshold >5%). A total of 48 persons were included in this part of the study, but sequencing of one sample failed. SS and DS did not detect any SDRM in 31 samples (65.9%). We identified SDRMs in 12 samples by SS (25.5%) and in 16 samples by DS (34%) (Supplementary Table S2). As we defined complete mutation concordance when both SS and DS identified the same pattern of SDRMs, in 50% (8/16) of samples results completely matched. Partial concordance was identified in 2 samples, both of them carried triple-class drug resistance with the same pattern that was identified by SS. DS analysis detected all of these mutations and additionally identified several other RTI mutations that were present at frequencies below the SS threshold. Complete divergence was observed in 6 samples. In 4/6 samples complete divergence was determined in samples in which DS detected T215S mutation at frequencies below the SS threshold (around 10%). In the other two samples divergence was assigned due to the different patterns of SDRMs detected by SS and DS (Supplementary Table S2).
The character of Croatian HIV-1 transmission networks is mainly local and regional. Phylogenetic trees were constructed separately for 368 subtype B sequences, 17 subtype A1 sequences and 7 subtype C sequences. For sequences subtyped as intersubtype recombinants (n = 3) and CRFs (n = 8) phylogenetic analyses were not   Table 2). The subtype A1 phylogenetic tree revealed that 41.1% (n = 7) of sequences belonged to one local TC (containing 4 Croatian sequences) and one mixed TC with non-Croatian sequences (containing 3 Croatian sequences), with approximate likelihood ratio test value (aLRT) >0.90. In addition, 5 mixed TCs each containing only one Croatian sequence were identified, indicating a separate introduction of these strains in the country ( Supplementary Fig. S1). Inference analysis of subtype C determined presence of two local transmission pairs with aLRT >0.90 and two mixed TCs, each containing only one Croatian sequence ( Supplementary Fig. S2). Phylogenetic analysis of subtype B sequences included >90% of the analysed cohort and consisted of 368 local sequences from the new dataset (2014-2017), 107 local sequences from the previous dataset (2006-2008) and 663 background sequences (Fig. 1). Sequences from the previous time period (2006)(2007)(2008) were included in order to identify "active" TCs, consisting of sequences from the previous (2006)(2007)(2008) and new (2014-2017) datasets, "historical" TCs, consisting of sequences from the previous dataset and "newly formed" TCs, consisting of sequences from the new dataset. Overall, the analysis revealed that 89.4% (329/368) of sequences from the new dataset were a part of 23 local TCs and 22 mixed TCs with non-Croatian sequences. We identified 9 (20%) TCs with 3 persons, 2 (4.4%) TCs with 4 persons, 23 (51.1%) TCs with 5 to 15 persons and 11 (24.4%) TCs that include ≥16 persons. The size of the TCs ranged from 3 to 80 persons. We selected 15 TCs with ≥5 Croatian sequences and presented their characteristics in Table 3. Most of the identified TCs were active (43/45), since analysis revealed only two historical TCs. Out of 43 active TCs, there were 15 clusters expanding throughout the observed time period and  Table 4. The most common SDRM determined in the previous dataset (2006)(2007)(2008), T215S, remained the most common SDRM in the newly studied period; specifically, 53 sequences from the previous and the new datasets formed one TC. Within this TC, a sub-cluster with SDRMs T215S + L210W expanded during the time frame of the new dataset (2014-2017). We identified one local TC with a triple-class resistance pattern. This TC was formed of eight Croatian sequences and one UK sequence. The Croatian sequences contained the SDRM pattern PI: V32I, I47V + NRTI: T215E/D + NNRTI: L100I, K103N. The sequence originating from the UK had a similar mutation pattern with additional SDRMs (PI: M46L, V82A and NRTI: T215Y). Also, one Croatian patient with a complex pattern of resistance (SDRM to PI: I84V + NRTI: M184MIV, T215S, L210W + NNRTI: K101E, Y181C, G190A, P225PH) was observed in a transmission pair with a Serbian sequence with a similar SDRM pattern with one additional SDRM (PI: M46I), indicating cross border transmission of multi-class drug resistance. For some of the sequences with SDRMs (n = 9) we did not find forward transmission. Some of these sequences carrying SDRMs: T69D, G190E, K219Q and K219R + M41L belonged to large TCs (Table 3), others were not part of TCs, such as SDRMs K103N, G140A, G190A and M41L + T215D, while the sequence with SDRMs T215D + L210W formed a transmission pair with a Croatian sequence from the previous dataset (2006-2008) that carried the same resistance pattern.
We inspected the geographical origin of non-Croatian sequences found in TCs, as well as the sampling dates, and observed that most of the TCs with non-Croatian sequences are very diverse ( Fig. 2     www.nature.com/scientificreports www.nature.com/scientificreports/ Fig. S3). Non-Croatian sequences originated mainly from Germany, Slovenia, Serbia and Czech Republic. In addition, some Croatian sequences clustered with sequences from the United States (1 TC, n = 17 sequences), Turkey and the United Kingdom (1 TC, n = 12 sequences), Spain and Tunisia (1 TC, n = 11 sequences) and Mexico (1 TC, n = 11 sequences).
Phylogenetic inference was repeated with 43 SDRM sites present in sequences and showed that all transmission TCs remained stable with the exception of one local TC, comprised of 3 sequences from the new dataset and 3 sequences from the previous dataset. When SDRM sites were present in the phylogenetic inference, this TC merged with 5 sequences originating from Poland and formed a mixed TC.
First croatian HIV-1 transmission clusters originate from early 90 s. Bayesian skyline analysis was performed in order to determine when the HIV-1 epidemic started in Croatia by estimating the time to the most recent common ancestor (tMRCA) of the HIV-1 sequences belonging to the Croatian TCs. This analysis was carried out only for subtype B sequences from the new dataset (n = 356) and their corresponding control sequences (n = 571). The phylogenetic tree, constructed using Bayesian skyline analysis (Fig. 2), was in good agreement with the phylogenetic tree obtained by Maximum Likelihood analysis (Fig. 1). Importantly, none of the TCs identified initially, using Maximum likelihood phylogenetic analysis, broke down during the Bayesian skyline analysis. Estimated tMRCA values for the 15 TCs with ≥5 Croatian sequences are presented in Table 3. Our

Discussion
In this study we characterized the Croatian HIV-1 epidemic by combining clinical, socio-demographic, molecular and virological data from treatment-naive HIV-1 persons. We also described HIV transmission networks responsible for the forward spread of HIV-1 infection and drug-resistant variants in the country. This study included >90% of HIV-1 persons entering HIV clinical care in the time period 2014-2017, therefore it is to the best of our knowledge the most extensive HIV molecular epidemiology study performed in Croatia, so far. Our results confirmed previous findings that in recent years HIV-1 infection in Croatia primarily affects the young MSM. Individuals were more likely to present for clinical care at the chronic stage of HIV-1 infection with a median CD4 count of 325 cells/μL. According to the international definition of late presentation for care 33 , which is defined by all persons with CD4 count <350 cells/μL, our cohort consisted of >50% late presenters and 18% were diagnosed with clinical AIDS. Even though this proportion is alarming, when compared to the study results from the period 1999-2004 34 , with baseline median CD4 count of 276 cells/μL and 34% persons presenting with clinical AIDS, our data show that the situation has improved.
The Croatian HlV-1 epidemic is distinguished by a high prevalence of TDR, as first reported in our previous nation-wide study (22%) 14 and confirmed in the present study (16.4%). Even though TDR prevalence appears to have decreased moderately in the last decade, the pattern of SDRMs has changed significantly and become more challenging for tailoring a successful first-line treatment. In the previous study, high TDR was mainly due to one large TC of MSM with SDRM T215S, while NNRTI resistance-associated mutations were only present at a low frequency (2.4%) and PI resistance-associated mutations were not detected at all. In contrast, the present study shows that the revertant T215S remains the most frequent SDRM, determined in 30 (7.4%) persons. With this, TDR to NRTIs was the most prevalent, determined in 46 (11.4%) persons; however, we also observed the emergence of SDRMs to NNRTIs among 27 (6.7%) persons. In addition, PI resistance was determined in 10 (2.5%) persons, including 9 (2.2%) with triple-class drug resistance, mediated by the mutation pattern PI: V32I, I47V + NRTI: T215D/E + NNRTI: L100I, K103N. Phylogenetic inference indicated a transmission link with one UK sequence with a similar mutation pattern, PI: V32I, M46L, I47A, V82A + NRTI: T215Y + NNRTI: L100I, K103N. Moreover, a patient with treatment failure due to this resistance pattern has not yet been encountered in clinical care in Croatia, indicating that this SDRM pattern was introduced in Croatia from abroad and has subsequently spread in the country.
Prevalence of TDR in European countries has been monitored via the SPREAD program, organised under the supervision of the European Society for Translational Antiviral Research (ESAR) 15 . The latest SPREAD study analysed data for the period 2008-2010 and showed an overall TDR prevalence of 8.3%. The prevalence of SDRMs to NRTIs was 4.5%, while SDRMs to NNRTIs and PIs were present at lower frequencies, 2.9% and 2%, respectively. Comparing our data (Croatian old and new dataset) to the SPREAD data, we observed that overall TDR prevalence in Croatia was and remains high above the European average. TDR prevalence studies in the region show that only a few countries report high TDR prevalence similar to that of Croatia. However, one has to take into account that variation in the study design, national coverage and interpretation algorithms may affect cross-country comparison. Andreis et al. reported TDR of 12.3% in a cohort of 750 HIV-infected persons diagnosed in northern Italy in the time period 2013-2016, with E138A mutation being the most frequent 22 . However, Andreis et al. used the interpretation algorithm for genotypic drug resistance incorporated in Viroseq HIV Genotyping System, which also includes polymorphic mutations that do not indicate TDR. This highlights the importance of using a universally adopted mutation list for TDR surveillance, such as the WHO list 35 . In respect to that, E138A was also found in our cohort, but was not reported in this study. Temereanca 36 . The study showed that before the drugs were introduced there were no circulating variants that could jeopardize the effect of InSTIs. InSTIs are drugs with very high potency that ensure fast and sustainable drop of viral load but also provide tolerability and safety for long-term use. Because of their positive characteristics they are recommended as a first-line regimen by ART guidelines 37 . Since the use of these drugs became more widespread recently, the emergence of InSTI-resistant variants can be expected 38 . However, surveillance of InSTI resistance is currently recommended only for patients with evidence of TDR to other drug classes 38 . Due to the high prevalence of TDR to RTI and PI drug classes in our cohort, we tested resistance to InSTIs in selected patients. InSTIs were first introduced to Croatia in 2015 and there was no clear benefit of baseline TDR testing to InSTI in persons who entered clinical care in the period 2014-2016. In addition, first cases of resistance in InSTI-treated persons were reported in 2016 39 , therefore InSTI resistance was tested only among treatment-naive persons who entered clinical care during 2017. We found only one major InSTI mutation (G140A), three accessory resistance mutations (T97A, Q146QH, D232N) and a very high prevalence of polymorphism S230N (49/100, 49%), which is not associated with reduced InSTI susceptibility. We confirmed high susceptibility to InSTIs in this Croatian cohort, as expected and reported by other countries 22,[40][41][42] .
Subtype B remains the most common HIV-1 subtype in Croatia. The previous analysis 14 determined infection with the subtype B in 89% of newly diagnosed HIV-1 persons and the present study showed a comparable prevalence rate of 91.3%. Among the non-B subtypes only subtype A1, subtype C and CRF02_AG variants were observed in >1% of the cohort. In respect to that, we focused more on transmission networks of subtype B. In the 2006-2008 analysis, phylogenetic inference identified eight distinct TCs and the largest TC consisted of MSM carrying T215S mutation 14 . Since no background sequences were included, only local TCs were identified and HIV introduction events remained unknown. The present study, which included sequences from the old (2006)(2007)(2008) and new (2014-2017) datasets as well as control sequences, identified 45 TCs. The majority of TCs were active and only two clusters were comprised of only the sequences from the old dataset. Even though a considerable number of active TCs (65%) were newly formed, the clusters identified in the previous study expanded significantly and represented the largest TCs determined in this analysis. Most of the old dataset sequences that were determined to be without an obvious transmission link in the previous analysis, were observed to belong to mixed TCs, consisting of mainly regional (Slovenian, German, Czech, Poland and Serbian) and Croatian sequences in the present analysis. Most of these mixed TCs were intermixed with sequences from regional countries, however, it seems that a number of persons could have acquired the infection in other countries, such as Spain, Turkey, Mexico and/ or the United States. Interestingly, we observed no connection between Croatian and Hungarian sequences and very few between Croatian and Italian sequences. Similar observations of cross-border HIV transmission have been described by other studies, together with reporting that risk factors, such as the use of alcohol/drugs prior to sex, visiting sex parties and travelling abroad to engage in sex, seem to be increasing in Europe 16,43,44 .
Overall, the phylogenetic analysis has shown that a significant number of persons (86%) were part of TCs, among which 70% were found in local TCs. When combining results of phylogenetic inference with socio-demographic data reported by persons included in this study, our results indicate that HIV-1 epidemic in Croatia is mainly local and driven by young MSM. Similar findings were observed in Slovenia, where Lunar et al. determined a high percentage of clustering among Slovenian sequences (81%) 30 . Our findings are comparable with epidemiological studies from other Western and Central European countries, where HIV-1 infection is concentrated among MSM infected with subtype B, however the proportion of clustered sequences was <60% [25][26][27][28]45,46 . Since analysis of TCs is dependent on sampling methods, methodological approaches and transmission cluster definition 47 , high extent of clustering in our study can be explained by high sampling density. This study included >90% of HIV-1 persons who entered HIV clinical care during consecutive 4-year period, while the majority of above mentioned studies achieved a coverage of 30-60%, sampled during an extended time frame (usually during a 10-20 year period).
A considerable number of persons carrying SDRMs (61/66, 92.4%) were observed in TCs, thus confirming the substantial forward spread of SDRMs in the country. Most of the sequences with SDRMs (83.6%) formed local TCs, of which the T215S cluster of 53 persons remained the largest SDRM cluster in Croatia. Phylodynamic analysis suggested that this TC could be one of the earliest clusters in the country, with its estimated tMRCA in the year 1992. T215 revertants are one of the most common NRTI mutations found in untreated persons 15,48 . These variants develop from strains containing T215Y/F mutations, primarily selected under the stavudine (d4T) and zidovudine (AZT) treatment and tend to persist for many years 49 . In Croatia, regimens containing AZT and d4T were in use from the early 90 s on, but the use of AZT has decreased to <5% in 2017 and d4T is not in use since 2012. However, the high prevalence of the revertant T215S variants in the Croatian HIV-1 cohort is likely a hallmark of once highly prevalent therapy in Croatia.
Along with T215S, we report on K101E, T215S + L210W and triple-class drug resistance pattern, as major contributors to the spread of resistant strains in Croatia. All strains carrying these mutations were found in local TCs. Interestingly, each cluster had its specific characteristics such as age of the persons participating in the cluster, prevalence of AIDS at initial diagnosis and estimated time of the origin of the cluster. However, one common characteristic was observed in all TCs -all persons but one were MSM. The TC with K101E, the triple-class resistance TC and the sub-cluster T215S + L210W were all identified as newly formed clusters. On the other hand, the largest TC with T215S is an expanding cluster that has contributed in the spread of HIV-1 infection in Croatia over the last two decades. Its longevity and impact on the local epidemic shows the importance of monitoring TDR on a national level. Particularly alarming is the finding of a local TC with triple-class resistance pattern. We www.nature.com/scientificreports www.nature.com/scientificreports/ estimated that the origin of this TC dates in 2008, with the sequence most closely related originating from the UK, sampled in 2006. Since it is a newly formed TC comprised of 8 Croatian sequences that were sampled throughout the period 2014-2017, we can expect to see more patients with this complex resistance pattern in Croatia in the future. For some of the observed SDRMs further transmission was not detected. These were possibly individually introduced into the country or originated from Croatian persons with treatment failure and simply had not been transmitted further, due to lower fitness of these variants. For mutations M46I, T69D, G190E, K219Q and K219R reduced fitness compared to wild-type virus has been observed, as these tend to revert back to wild type 50 . All persons with these SDRMs participated in mixed TCs, but none of the other related sequences, sampled in Croatia or abroad, shared these SDRMs.
In addition to Sanger sequencing, 48 randomly selected samples were tested for SDRMs using deep sequencing. The aim was to observe the agreement between the two sequencing platforms and estimate the potential use of DS for routine diagnostics of clinical HIV resistance. The proportion of resistant variants present in initial infection can fall below the standard level of detection (<15%) of SS, especially when persons are diagnosed in the late chronic stage of infection 51,52 . In the present analysis, DS was able to detect low-abundance viral variants with frequencies <10%, which were not detected by SS, however, most of the results were in complete agreement (n = 39/47, 82.9%). Partial concordance was observed in two samples where DS was able to detect all the mutations identified by SS and some additional SDRMs. Complete discordance was observed in six samples. In four samples discordance was due to the SDRM T215S being detected by DS at low frequencies and falling below threshold in SS. In the other two samples, highlighted by the disagreement between the platforms, the mutation patterns diverged completely. Similar results were reported by other studies, which showed that DS is able to identify SDRMs that are commonly not identified by SS and that minor disagreement between the two platforms is possible 53 . In this study, clinical significance of minority SDRMs was not evaluated, since we wanted to emphasize the importance of including new technologies in clinical diagnostics and combining them with already established ones, especially in complex clinical cases that do not respond to first-line treatment options.
In conclusion, this is the first comprehensive study on molecular epidemiology of HIV-1 infection in Croatia. We have shown that the epidemic is mainly local and driven by the young MSM population. We determined a high overall prevalence of SDRM of 16.4% and forward spread of resistant viral strains, even that of triple-class resistance. Most of the identified mutations were clinically relevant, indicating a need for baseline HIV resistance testing before treatment initiation in Croatia. Even though a low prevalence of resistance to InSTI was observed, TDR to InSTI needs to be carefully surveyed on a national level. In addition, we demonstrated that deep sequencing is a reliable tool for accurate identification of low-level viral quasispecies. These data can help develop prevention strategies and public health interventions. We suggest that in settings similar to that of Croatia, where high prevalence of SDRM is present and where most of the cohort is connected through transmission networks, new approaches, such as identification of transmission clusters "in real-time" and immediate intervention, could pose a method to prevent further expansion of HIV-1 epidemics. For 403 persons the entire HIV-1 protease region (codons 1-99) and part of the reverse transcriptase region (codons 1-240) were amplified with one-step reverse transcriptase polymerase chain reaction (RT-PCR) by using SuperScript III One-Step RT-PCR System with Platinum Taq (Invitrogen, Carlsbad, CA) and the region-specific primer set 54 . Nested-PCR assay was carried out for samples that were negative with first round PCR by using HotStarTaq DNA Polymerase (Qiagen) and the inner primer set 54 . Obtained amplicons of 1017 bp were sequenced with BigDye Terminator v3.1 Cycle Sequencing Kit (Thermo Fisher Scientific, Waltham, MA) with a set of five primers to obtain bidirectional sequences 53 . Sequences were aligned and compared with the reference strain HIV-1 HXB2 (GenBank number K03455) by using Vector NTI software (Thermo Fisher Scientific). Primary resistance to antiretroviral drugs was defined as the presence of ≥1 mutation of the WHO SDRM list 35 . Clinically relevant resistance to NRTIs, NNRTIs or PIs was evaluated with Stanford University HIV Drug Resistance Database, Genotypic Resistance Interpretation Algorithm version 8.8 31 and IAS Drug Resistance Mutation list 32 .

Methods
Analysis of resistance to InSTIs was performed for persons who entered clinical care at UHID during 2017. A total of 110 patients entered clinical care during 2017, of which 100 patients met the inclusion criteria as reported above and were included in this part of the study. The entire HIV-1 integrase region (codons 1-288) was amplified by using SuperScript IV One-Step RT-PCR System with Platinum Taq (Invitrogen) and the specific primer set (Supplementary Table S3). Amplicons of 864 bp were sequenced with BigDye Terminator V3.1 Cycle Sequencing Kit (Thermo Fisher Scientific) and a set of four primers to obtain bidirectional sequences (Supplementary  Table S3). Sequences were aligned and compared with the reference strain HIV-1 HXB2 (GenBank number K03455) by using Vector NTI software (Thermo Fisher Scientific). Primary resistance to InSTIs was predicted with Stanford University HIV Drug Resistance Database, Genotypic Resistance Interpretation Algorithm version 8.8 31 .
HIV-1 subtypes were determined by several algorithms: Rega HIV-1 Subtyping Tool, version 3.0., jumping profile Hidden Markov Model (jpHMM), COntext-based Modelling for Expeditious Typing (COMET) and finally confirmed with phylogenetic analysis 55-57 . Deep sequencing analysis. To characterize HIV-1 minority drug resistance variants present at frequencies below the detection limit of Sanger sequencing, 48 persons were randomly selected for deep sequencing analysis. Part of the HIV pol gene that spans the whole HIV-1 protease region and part of the reverse transcriptase region (K03455 number for the gene specific position 2189-3753) and the region that spans the whole integrase gene (K03455 number for the gene specific position 4180-5200) were sequenced with MiniSeq (Illumina, San Diego, CA). HIV-1 RNA was extracted as reported above and reverse transcribed with SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen) and UNINEF primer 58  Phylogenetic and phylodynamic analysis of Sanger sequences. To observe transmission networks, separate phylogenetic analyses were performed for most prevalent subtypes determined in this study. Namely, 368 sequences subtyped B, 17 sequences subtyped A and 7 sequences subtyped C were included in this part of the study. Ten most similar control sequences were included per each local sequence by searching the BLAST database 60 . After removing duplicates, 667 subtype B, 104 subtype A and 62 subtype C control sequences were included. In addition, three reference sequences of subtype A were included to root the subtype B and C phylogenetic trees and four reference sequences of subtype B to root subtype A phylogenetic tree. All sequences were aligned using CLUSTALW sequence alignment tool available in Bioedit, manually edited and trimmed to the size of 992 bp 61 . Phylogenetic inference was performed with PhyML 3.0 program with an integrated model selection following the Akaike Information Criterion (AIC) 62 . TC was defined as a group of three or more persons/ sequences on the same branch of the phylogenetic tree with aLRT >0.90 63 . Local TCs were defined as clusters consisting of >75% of Croatian sequences, while mixed TCs were defined as clusters consisting of <75% of Croatian sequences. TCs within the phylogenetic trees were manually examined to ensure sequences from clusters grouped together with high branch support with using Figtree v1.4.3 64 . To rule out clustering bias due to the presence of drug resistance mutations (DRM), additional phylogenetic trees were constructed for each subtype: with the exclusion of DRM sites. Namely, 43 codons related to the major drug resistance mutations, according to the IAS list 32 , were removed: PR: 23,24,30,32,46,47,48,50,53,54,73,76,82,83,84,85,88,90;RT: 41,65,67,69,70,74,75,77,100,101,103,106,115,116,151,179,181,184,188,190,210,215,219,225,230; resulting in the final sequence size of 863 bp.
To estimate the age of the most recent common ancestor (tMRCA, years) a Bayesian Markov Chain Monte Carlo (MCMC) approach was used and TCs with posterior probability 1 were further analysed. Since phylogenetic inference showed high rate of clustering among local sequences, we conducted separate analysis for 10 TCs with sample size varying from 9 to 61 sequences. Analysis of the whole dataset and selected TCs was done in order to compare their tMRCA values and to estimate population growth rate of each TC. We included only sequences without indication of recombination according to RDP3 analysis 65 . Out of a total of 363 subtype B sequences, four sequences were determined as potential recombinants and for eight sequences the exact sampling dates were unknown and were therefore removed from the analysis. The final dataset consisted of 354 local sequences and 571 time-stamped control sequences. Sampling date for all local sequences was defined as the date when the patient's blood was drawn, while for the most background sequences the exact date was unknown and was defined as a midpoint of the sampling year, as reported in the Los Alamos HIV sequence database. The best fitting nucleotide substitution model was determined to be HKY + I + G in MEGA6 programme 66 . Phylodynamic analysis was performed with the BEAST2 package, v.2.1.3 67 , using the coalescent Bayesian skyline population model with a relaxed log-normal clock model 68 and the HKY substitution model with 4 discrete gamma rates and invariant sites 69 . The MCMC was run in duplicate MCMC chains of 500,000,000 iterations each, with different initialization seed values and a sampling rate of 0.001. The first 10% of states sampled from each MCMC chain were discarded as burn-in and the duplicate chains were concatenated using LogCombiner 67 . The concatenated BEAST2 trace log-file was viewed in Tracer v.1.7.1 70 and checked for effective sample size (ESS) values of > 200. The concatenated tree log-file was downsampled to 50,000 trees using LogCombiner and Maximum Clade Credibility Tree was prepared using TreeAnnotator 67 and further visualised using FigTree v1.4.3 64 .