Introduction

The human immunodeficiency virus type 1 (HIV-1) subtype C accounts for approximately 48% of all people living with HIV, representing the most prevalent HIV-1 subtype in the world1. The high global prevalence of the C subtype results from its predominance in regions with the highest rates of HIV-1 infection and with large populations, such as Southern and Eastern Africa, India and Southern Brazil1,2. The origin of HIV-1 subtype C was recently traced to the Katanga region of the Democratic Republic of Congo (DRC) in the late 1930s3 from where it spread independently to Eastern and Southern Africa, leading to a phylogeographic subdivision between the HIV-1 subtype C strains circulating in those two African regions4,5.

The expansion of HIV-1 subtype C inside Eastern Africa gave rise to the C East African clade (CEA), whose most probable epicenter of dissemination was in Burundi around the early 1960s. During the 1970s, this country acted as ignition point of several local CEA sub-epidemics in other Eastern African countries5 and also in Southern Brazil6 where the CEA sub-epidemic was fueled from a single founder event7. The CEA clade currently predominates among subtype C strains from Eastern African countries and Brazil, and accounts for almost 100% of subtype C sequences from Burundi and Brazil, 97% from Uganda, 64% from Kenya, 61% from Ethiopia and 49% from Tanzania5,8. The evolutionary analyses of the CEA sub-epidemics performed so far mostly address questions about the place and timing of outbreaks onset, focusing on the reconstruction of the geographic dissemination pathways of this viral clade2,5,6,8,9,10,11,12. Studies on the population dynamics of CEA sub-epidemics, while key to understand their historical epidemic growth trends, epidemic potential and ecological processes shaping their evolution, have been much less frequent13,14.

Key epidemiological and population parameters, most notably the effective number of infections (Ne), the epidemic growth rate (r) and the basic (R0) and effective (Re) reproductive number, can be estimated from viral sequence data by using Bayesian phylodynamic approaches based on coalescent15 and birth-death16 models. These models have very different mathematical grounds as well as particular strengths and limitations. The coalescent is appropriate only if the number of sampled infected individuals is small compared with the size of the total infected population16, despite certain robustness to violation of this requirement has been demonstrated17. The birth-death model, meanwhile, explicitly models the sampling process and can thus be used for sparse or densely sampled viral populations16, although estimates may be biased if the model of the sampling process is misspecified. The coalescent allows the inference of the R0 (key epidemiological parameter indicator of increment [R0 > 1], decline [R0 < 1] or stabilization [R0 = 1] in the number of new cases18) only through modeling the population dynamics under a deterministic assumption, which represents a limitation for populations undergoing complex dynamics19, and require an independent estimate of the average duration of infectiousness. The birth–death model has the advantage of accounting for stochasticity of the demographic process and provides an estimate for Re changes over time using only sequence data16,20. A potential disadvantage of the birth-death model is that credibility intervals grow wider the further we go into the past, which is not the case for the coalescent-based models21; although simulation studies showed that the coalescent might not capture the true r because of the narrow credibility intervals around the median estimate attributed to its assumption of deterministic changes in the population size20.

The present work aims to shed light on the past population dynamics of the major HIV-1 CEA sub-epidemics established in Burundi, Rwanda, Ethiopia, Tanzania and Brazil by analyzing viral pol gene sequences sampled between 1990 and 2014 with Bayesian phylodynamic methods based on coalescent and birth-death models.

Materials and Methods

Sequence dataset compilation

A reference dataset of HIV-1 subtype C pol sequences belonging to the east, southern and central African lineages was selected from a previous study5 and combined with: 1) more recent east African subtype C pol sequences with known sampling dates available in Los Alamos HIV Database (http://www.hiv.lanl.gov) by August 2017, and 2) subtype C pol sequences with known sampling dates isolated from heterosexual populations living in the two southernmost Brazilian states (Rio Grande do Sul and Santa Catarina) previously described11. The option “One sequence/patient” was selected from Los Alamos HIV database to exclude multiple sequences from the same subject. The subtype assignment of all sequences was confirmed using the REGA HIV-1 subtyping tool v.3.0. Given the two genetically distinct subtype C clades (C and C’) co-circulating in Ethiopia22, linked to subtype C viruses of eastern and southern African origin respectively, putative intrasubtype C/C’ recombinant sequences (n = 99) were identified by Bootscanning using Simplot v3.5.123 as described previously5 and removed from further analyses. This resulted in a final dataset of 1,147 HIV-1 subtype C pol sequences (Table S1) covering the complete protease (PR) and the first part of the reverse transcriptase (RT) regions (nucleotides 2,253 to 3,272 relative to HXB2 genome).

Identification of dominant country-specific HIV-1 CEA subclades

To identify major country-specific clades within the CEA radiation, HIV-1 subtype C pol sequences from eastern Africa and southern Brazil were first aligned with reference subtype C sequences belonging to the eastern, southern and central African clades using the CLUSTAL X program24 and subjected to maximum likelihood (ML) phylogenetic analysis. ML trees were inferred with the PhyML program25, using an online web server26, under the general time-reversible model of nucleotide substitution plus invariant sites and four discrete gamma rate categories (GTR+I+Γ4) selected with jModeltest program27 and the subtree pruning and regrafting (SPR) branch-swapping algorithm of heuristic tree search. The reliability of the phylogenies was estimated with the approximate likelihood-ratio test based on a Shimodaira–Hasegawa-like procedure (SH-aLRT)25. Basal HIV-1 CEA sequences from Burundi and Rwanda and major (n ≥ 50 sequences) country-specific (>90% of sequences from a single country) monophyletic groups with high support (SH-aLRT ≥0.85) nested within the CEA clade radiation were selected for demographic analyses. Reference sequences of HIV-1 subtypes A1 and D from the Los Alamos HIV Database were used as outgroups. Final trees were visualized in FigTree v1.4.2.

Estimation of phylodynamic parameters

Epidemiological and evolutionary parameters of the defined CEA subclades were estimated via Bayesian Markov Chain Monte Carlo (MCMC) phylogenetic inference using coalescent and birth-death tree priors as implemented in BEAST v1.828 and BEAST v.2.429 software packages, respectively. Changes in Ne using the coalescent tree prior were first assessed using the non-parametric Bayesian skyline (BSKL)30 and Bayesian Skygrid (BSKG)31 models and estimates of the r were subsequently obtained using the parametric model that provided the best fit to the demographic signal contained in each dataset. Comparison between demographic models (logistic, exponential, or expansion) was performed using the log marginal likelihood estimation (MLE) based on path sampling (PS) and stepping-stone sampling (SS) methods32. The cumulative number of lineages through time (LTT) was calculated from the combined posterior distribution of sampled coalescent tree topologies by using TRACER v1.6 program33. A special case of the birth-death tree prior, namely the birth-death skyline (BDSKY) was applied to model viral transmissions through time21. The sampling rate (δ) was set to zero for the period prior to the oldest sample and estimated from the data afterwards. The Re was estimated in a piece-wise manner over three different equidistant intervals using a lognormal prior distribution (Re: mean = 0, standard deviation = 1). Bayesian analyses for each transmission clade employed the GTR+I+Γ4 model of nucleotide substitution selected using the jModelTest program27 and a relaxed uncorrelated lognormal molecular clock model34. Because linear regression analysis of root-to-tip distances as function of sampling time obtained by TempEst v1.535 revealed low temporal signal in the datasets, an informative normal prior distribution on the time to the most recent common ancestor (tMRCA) was applied based on previous estimates5,6. MCMC chains were run for sufficiently long to ensure stationarity (constant mean and variance of trace plots) and good mixing (Effective Sample Size >200) for all parameter estimates, as diagnosed by TRACER v1.6 program33.

Results

Identification of major subclades within the HIV-1 CEA clade radiation

To obtain a more updated picture of the HIV-1 CEA clade radiation, subtype C pol sequences from Eastern Africa and Southern Brazil deposited in Los Alamos HIV sequence database between 2013 and 2016 were combined with CEA pol sequences from those regions previously characterized5,6. The reconstructed ML phylogeny showed that most (79%) subtype C sequences from Eastern Africa and all sequences from Southern Brazil sampled at most recent time (2013–2016) branched within the highly supported (SH-aLRT = 0.96) HIV-1 CEA clade (Fig. S1). As expected, sequences from Burundi and Rwanda were highly intermixed among each other and occupied the most basal positions of the CEA clade radiation; while sequences from other Eastern African countries and Brazil were nested within Burundian and Rwandan CEA sequences.

Most sequences from Kenya and Uganda appeared as sporadic (non-clustered) lineages or clustered in monophyletic subclades of small sizes (n < 50) (Fig. S1). All Brazilian sequences and most sequences from Ethiopia (67%) and Tanzania (66%), by contrast, branched within four country-specific CEA subclades of large size (n > 50) (Fig. S1) that were more clearly visualized after pruning of non-clustered CEA sequences and CEA sequences within monophyletic subclades of small sizes (Fig. 1). The four identified CEA subclades (CEA/BR, CEA/ET-1, CEA/ET-2 and CEA/TZ) together with sequences from Burundi and Rwanda (CEA/BI-RW) comprise 76% (n = 616) of all the CEA sequences analyzed here; thus confirming the epidemiological relevance of the selected subclades.

Figure 1
figure 1

ML phylogenetic tree of HIV-1 CEA pol PR/RT sequences (~1,000 nt) from eastern Africa and southern Brazil. Branches are colored according to the geographic origin of sequences as indicated in the legend (upper right). Gray shaded boxes indicate the positions of major CEA lineages. Asterisks point to key nodes with high support (SH-aLRT >0.85). The tree was rooted using HIV-1 subtypes A1 and D reference sequences and the branch lengths are drawn to scale with the bar at the center indicating nucleotide substitutions per site.

Bayesian population dynamics inference in a coalescent framework

Bayesian MCMC coalescent-based analyses under the BSKL model suggest that all CEA subclades presents roughly comparable demographic histories, with an initial phase of fast exponential growth followed by a stabilization of the Ne at some time between the late 1970s and the late 1980s that persisted until the most recent sampling date of each of them (Figs 2a,e, 3a,e and 4a). The observed stabilization in the Ne of African CEA sub-clades occurs before the respective stabilization of the HIV incidence in corresponding countries estimated around the mid-1990s according to the UNAIDS (Figs 2d,h and 3d,h). The UNAIDS data also supports a significant reduction of the HIV incidence in Burundi/Rwanda, Ethiopia and Tanzania between the mid-1990s and the mid-2000s that was not captured by the BSKL inference. The stabilization of the CEA/BR Ne around the early 1990s is consistent with the stabilization of the new HIV cases in Rio Grande do Sul and Santa Catarina at around the same time (considering a lag time of eight years between HIV infection and new AIDS cases reported by the Brazilian AIDS cases databank for those Brazilian states); but fails to capture a recent increase in the number of new HIV cases from 2010 onwards (Fig. 4d).

Figure 2
figure 2

Epidemiological and population dynamics of the CEA sub-epidemics in Ethiopia. Median estimates of the effective number of infections (Ne) using the Bayesian skyline or skygrid models (black lines) together with their 95% highest probability density (HPD) intervals (gray areas), co-plotted together with the median Ne estimates using the logistic coalescent-based parametric model (blue lines) and its 95% HPD intervals (blue areas). The green dashed lines indicate the time of the last coalescent event reported by the lineages-through-time (LTT) (a,b,e,f). Temporal fluctuation of the effective reproductive number (Re) of the CEA/ET-1 and CEA/ET-2 sub-epidemics estimated using the Bayesian birth-death approach (c and g). For an easier visualization, the median coalescent-based R0 estimate (blue dashed lines) inferred for each sub-epidemic and its 95% HPD intervals (blue area) were added. The gray dashed lines indicate Re = 1 (c and g). Plots representing the number of new HIV cases in Ethiopia as obtained from UNAIDS website http://aidsinfo.unaids.org/ (d and h). The yellow, pink and gray intervals denote the time spanned for the birth-death-based Re-initial, Re-middle and Re-final estimates of each CEA sub-epidemic.

Figure 3
figure 3

Epidemiological and population dynamics of the CEA sub-epidemics in Tanzania and Burundi/Rwanda. Median estimates of the effective number of infections (Ne) using the Bayesian skyline or skygrid models (black lines) together with their 95% highest probability density (HPD) intervals (gray areas), co-plotted together with the median Ne estimates using the logistic coalescent-based parametric model (blue lines) and its 95% HPD intervals (blue areas). The green dashed lines indicate the time of the last coalescent event reported by the lineages-through-time (LTT) (a,b,e,f). Temporal fluctuation of the effective reproductive number (Re) of the CEA/TZ and CEA/BI-RW sub-epidemics estimated using the Bayesian birth-death approach (c and g). For an easier visualization, the median coalescent-based R0 estimate (blue dashed lines) inferred for each sub-epidemic and its 95% HPD intervals (blue area) were added. The gray dashed lines indicate Re = 1 (c and g). Plots representing the number of new HIV cases in Tanzania and Burundi/Rwanda as obtained from UNAIDS website http://aidsinfo.unaids.org/ (d and h). The yellow, pink and gray intervals denote the time spanned for the birth-death-based Re-initial, Re-middle and Re-final estimates of each CEA sub-epidemic.

Figure 4
figure 4

Epidemiological and population dynamics of the CEA sub-epidemic in southern Brazil. Median estimates of the effective number of infections (Ne) using Bayesian skyline or skygrid models (black lines) together with their 95% highest probability density (HPD) intervals (gray areas), co-plotted together with the median Ne estimates using the logistic coalescent-based parametric model (blue lines) and its 95% HPD intervals (blue areas). The green dashed lines indicate the time of the last coalescent event reported by the lineages-through-time (LTT) (a and b). Temporal fluctuation of the effective reproductive number (Re) of the CEA/BR sub-epidemic estimated using the Bayesian birth-death approach (c). For an easier visualization, the median coalescent-based R0 estimate (blue dashed line) inferred for the CEA/BR subclade and its 95% HPD intervals (blue area) were added. The gray dashed line indicate Re = 1 (c). Plot representing the number of new HIV cases in the Southern Brazilian states of Rio Grande do Sul (RS) and Santa Catarina (SC). AIDS cases reported by the Brazilian AIDS cases databank (SINAN = SIM = SISCEL: http://www.portalsinan.saude.gov.br/dados-epidemiologicos-sinan), minus eight years, was used as an approximation for new HIV infections (solid black line). From 2007 onward, Brazilian Ministry of Health started to report HIV new infections (http://www.aids.gov.br/pt-br/pub/2016/boletim-epidemiologico-de-aids-2016), reprented here as a dashed black line (d). The yellow, pink and gray intervals denote the time spanned for the birth-death-based Re-initial, Re-middle and Re-final estimates of the CEA/BR sub-epidemic.

The BSKG demographic reconstructions point to a longer period of exponential growth for all CEA subclades that extends up to between the early and the mid-1990s, in agreement with incidence data. For the Ethiopian CEA sub-epidemics, the BSKG points to subsequent decline of the median Ne until mid-2000s and a final plateau until 2010, in agreement with epidemiological data (Fig. 2b,d,f,h). The median estimated Ne for the CEA/BI-RW, CEA/TZ and CEA/BR sub-epidemics reach a plateau that persisted until the most recent sampling time which differs markedly from the HIV incidence pattern (Figs 3b,d,f,h and 4b,d). The large 95% highest probability density (HPD) interval of the Ne estimates inferred by the BSKG model, however, may accommodate different demographic patterns in the latter stages making it difficult to draw solid conclusions on the consistency (or the lack thereof) between these estimates and the HIV incidence temporal pattern.

The logistic growth model was the best-fit parametric model of population growth for all the CEA subclades (Table S2). The median Ne trajectories obtained by the logistic growth model during the initial exponential phase closely matched the corresponding trajectories obtained with the non-parametric approaches, particularly those obtained with the BSKG (blue line Figs 2a,b,e,f, 3a,b,e,f and 4a,f). The mean R0 values derived from the logistic growth model using the formula R0 = rD + 115 (where D is average duration of infectiousness herein considered of eight years) was very similar for all CEA subclades, ranging between 5.0 and 5.9 (Table 1). The median Ne trajectories of the CEA/BI-RW, CEA/TZ and CEA-BR at the plateau phase obtained with parametric and non-parametric approaches were very similar. As the logistic growth model is unable to capture non-constant trends on recent time period estimates, obviously do not reproduced the declining phase pointed by the BSKG for the CEA/ET-1 and CEA/ET-2 sub-clades, however, the parametric and non-parametric models converge to similar median Ne values at the late plateau phase.

Table 1 Evolutionary and demographic parameters estimated for HIV-1 CEA subclades.

Bayesian population dynamics inference under birth-death model

The BDSKY model was applied allowing the Re to change in a piece-wise manner over three different time intervals that allow us to estimate the epidemic potential before identification of HIV (first time interval, ~1960–1970 to 1980–1984) and observe the potential impact of prevention (second time interval, ~1981–1985 to 1995–1999) or therapy (third time interval, ~1996–2000 to 2012–2014) measures on epidemic dynamics (Figs 2c,g, 3c,g and 4c). The BDSKY model support initial exponential growth dynamics fully consistent with those estimated using coalescent models. Although the mean R0 values (5.0–5.9) were slightly higher than the estimated mean Re-initial (3.8–4.9), the uncertainty on the R0 estimates was always contained within the broader 95% HPD intervals of Re-initial (Table 1). The BDSKY model supports a progressive reduction of the mean Re in the second (mean Re-initial >mean Re-middle) and third (mean Re-middle >mean Re-final) time intervals for all African subclades. Although the mean Re estimates for the Ethiopian clades remained above one during all time intervals, which clearly did not match with the declining and subsequent stabilization of the HIV incidence in Ethiopia from the mid-1990s onwards (Fig. 2d,h), those Re estimates should be interpreted with caution because the extremely wide 95% HPD intervals (Table 1). For the CEA/BI-RW and CEA/TZ subclades, the Re-middle was above one, while the Re-final was below one, in agreement with the increasing HIV incidence in Burundi/Rwanda and Tanzania up to the mid-1990s and the subsequent declining from the mid-1990s onwards (Fig. 3d,h). For the CEA/BR subclade, the BDSKY model supports a reduction of the mean Re in the second interval (Re-initial ~1) and a new increase in the third one (Re-final >1). This is consistent with the HIV incidence trends in Rio Grande do Sul and Santa Catarina Brazilian states, that supports an epidemic stabilization during the 1990s and a new epidemic increase during the 2000s (Fig. 4c,d).

Discussion

In this study, we characterized key features of the epidemic dynamics of major HIV-1 subtype CEA lineages circulating in East Africa and Southern Brazil through the use of different phylodynamic frameworks based in coalescent and birth-death process. The different coalescent models capture very similar epidemic dynamics over the earlier decades of the CEA lineages dissemination; but point to quite different epidemic dynamics from the mid-1980s onwards.

Both phylodynamic approaches suggest an initial stage of fast exponential growth of all the CEA sub-epidemics during the period of cryptic transmission of HIV in human populations. These initial phases of exponential growth herein inferred correlate with retrospective serological-based studies and simulations that indicated that an explosive epidemic was already sweeping the Eastern African region36,37,38,39,40,41 by 1981, when the AIDS was first recognized. The exponential growth phase inferred for the Brazilian CEA sub-epidemic during the 1970s and 1980s is also fully consistent with the sharp increase in the number of new HIV cases detected in the southernmost Brazilian states during the 1980s42. The mean R0 (5.0–5.9) and Re-initial (3.8–4.9) values here estimated for all CEA sub-clades were roughly comparable. The 95% HPD intervals of R0 were always smaller and contained within the broader 95% HPD intervals of Re-initial, consistent with previous empirical and simulated data20,21. The coalescent-based logistic growth model is expected to provide narrow HPD intervals because it considers a deterministic population trajectory, while the BDSKY model incorporates stochasticity in population size. Additionally, the HPD interval of Re estimates from the BDSKY model grows wider the further we go towards the past, that is not the case for the coalescent-based logistic growth model estimates.

Importantly, the mean R0 and Re-initial values inferred for the African CEA sub-epidemics were fully consistent with those estimated through analyses of HIV prevalence rate and life expectancy in Eastern African countries43. This suggests that both phylodynamic frameworks were able to recover the true early growth rates of HIV-1 CEA sub-epidemics.

Factors like gender inequality44,45,46, civil and ethnic conflicts47,48,49,50, conflict-induced displacement, and increasing urbanization51,52 have shaped the early HIV epidemic dynamics across all Eastern African countries, consistent with the similar epidemic growth rates of African CEA sub-epidemics. Notably, the epidemic growth rate inferred for the Brazilian CEA sub-epidemic was very similar to those obtained for the African CEA sub-epidemics despite significantly distinctive history of human conflicts in those regions. The exponential growth phase of the CEA/BR sub-epidemic matches with a period in which public health system was unaware about the severity of the epidemic53 and the CEA/BR subclade was efficiently disseminated in Southern Brazil through heterosexual (HET) networks2,13,54, similar to that observed in Eastern Africa. This suggests that the absence of prevention efforts and the predominant viral transmission through HET route may have been the common driving forces of the early dynamics of the CEA sub-epidemics in Eastern African and Southern Brazil.

The BSKL model supports that African CEA sub-epidemics grew exponentially until between the late 1970s and late 1980s, after which there occurs a plateau in the Ne until the most recent sampling time of each of them. The stabilization of the Ne trajectories occurs around 10 years before the last coalescent event (Figs 2a,b,e,f and 3a,b,e,f), thus supporting that the inferred plateau of the Ne is not due to a paucity of coalescent events after the early 1990s55. More important, such stabilization occurred before implementation of prevention campaigns during the 1990s56,57,58,59,60 and introduction of universal access to antiretroviral (ARV) therapy during the 2000s61,62 in Eastern Africa. The overall Ne trajectories inferred by the BSKL after the mid-1980s, however, differ markedly from the data of the United Nations Joint Program on HIV/AIDS (UNAIDS)63 according to which the HIV incidence in Burundi/Rwanda, Ethiopia and Tanzania reached a peak around the mid-1990s (rather than during the 1980s), and was followed by a sharp decline (rather than a plateau) until the mid-2000s, before stabilize.

The overall epidemic dynamics inferred by the BSKG model from the mid-1980s to the mid-1990s are more consistent with the HIV incidence data than those inferred by the BSKL, although some divergences were also detected at later times. The BSKG model points that CEA African sub-epidemics grew exponentially until the early/mid-1990s and further supports a declining Ne of CEA/ET subclades between the mid-1990s and the mid-2000s, consistent with epidemiological data. This model, however, failed to capture a similar decline of Ne for the CEA/BI-RW and CEA/TZ sub-epidemics. These results indicate that the BSKG model can correctly predict epidemic decline in some situations, as demonstrated here for the CEA/ET subclades and previously for the CRF02_AG epidemic in Cameroon31,64; but not in others.

Although the BDSKY model also supports a progressive reduction of epidemic growth over time, an Re >1 was estimated at the second time interval (that roughly covers the period between early/mid-1980s and mid/late 1990s) for all African CEA subclades, consistent with a continuous increase of HIV incidence up to the mid-1990s. The BDSKY model capture an Re <1 for the CEA/BI-RW and CEA/TZ sub-epidemics at the most recent time interval (after the mid-1990s); but supported an Re ≥1 for the CEA/ET sub-epidemics in the same time interval. These results confirms that the BDSKY model can correctly uncover a signature of a recent declining epidemic not reflected in the coalescent plots, as previously seen in the HIV-1 subtype B epidemic in the UK21; but also reveals that it may fail to capture such trend in some other datasets.

It is interesting to note that the BSKG failed to capture a decline of Ne for the CEA/BI-RW and CEA/TZ sub-epidemics since the middle-late 1990s onwards, while the BDSKY model failed to capture a Re ≤1 for the CEA/ET sub-epidemics at the same time interval, suggesting that the performance of different phylodynamics approaches could be affected by different factors. The BSKG model requires strongly informative data to prevent erroneous estimates of Ne stabilization as pointed by a recent study65. The lower proportion of HIV-1 subtype C sequences sampled at recent times (since 2008 onwards) in the CEA/BI-RW (6%) and CEA/TZ (30%) datasets compared with the CEA/ET datasets (≥70%) may have reduce the ability of this coalescent model to capture changes in Ne for the CEA/BI-RW and CEA/TZ sub-epidemics at most recent times. The BDSKY model could be more robust to the paucity of coalescent events at most recent time; but its performance could be limited by the number of time intervals (changes in Re) specified. Increasing the number of Re-changes may allow a better fit of the Re trajectories to the epidemiological data for the CEA/ET sub-epidemics. This strategy, however, resulted in a lack of parameter convergence and huge 95% HPD intervals, indicating that accurate Re estimations at more time intervals would require a larger number of CEA/ET sequences than those used in the present study.

The BSKL, BSKG and BDSKY models support quite consistent epidemic dynamics for the CEA/BR sub-epidemic until the late 1990s. According to the coalescent models, the Ne of the CEA/BR subclade growth exponentially until the early (BSKL) or mid-1990s (BSKG) and then reached a plateau. In agreement, the BDSKY model supports an expanding epidemic (Re >1) in the first time interval (~ mid-1970s to late 1980s) and a transient epidemic stabilization (Re ~ 1) in the second time interval (~ late 1980s to early 2000s). The stabilization of the CEA/BR incidence since the early/mid-1990s is in line with the reported trend toward stability of the new HIV cases in Rio Grande do Sul and Santa Catarina states since the mid-1990s42, probably due to the implementation of prevention efforts that acted as the driven-force of people’s behavioral changes53,66. While coalescent models support a roughly constant Ne for the CEA/BR sub-epidemic until the most recent sampling time, the BDSKY model uncovers a new epidemic increase (Re >1) at the last time interval. This matches with an upward trend of new HIV diagnoses in Rio Grande do Sul and Santa Catarina states since 200767. Such epidemiological changes are probably too recent to be fully captured by coalescent models.

A recent study using BSKG to analyze the population dynamics of the CEA/BR sub-epidemic from pol and env sequences from HET and men having sex with men (MSM) individuals reported a continuous increase in the Ne until mid to late 2000’s that was associated with the recent expansion of subtype C throughout the MSM group13. Interestingly, universal access to free fully suppressive ARV therapy is available in Brazil since the late 1990s61,62 and an association between ARV treatment availability and increases in sexual risk behavior (and consequent rise in HIV incidence) have been previously reported among MSM from developed countries68,69,70,71. Our BDSKY analyses of sequences from HET individuals support that the recent expansion of the CEA/BR sub-epidemic is probably not restricted to a specific group, but also occurred among HET individuals. Increases in sexual risk behavior among HET individuals fully agrees with the sustained increase of HIV67 and other sexually transmitted disease observed in Southern Brazil since 201072.

A drawback to consider about the highlighted agreements and disagreements between the available epidemiological data and our phylodynamic modeling is that while the former characterizes the HIV epidemic of each of the countries/regions as a whole, the CEA clade herein analyzed is not the only prevalent HIV lineage in all of them5. Then, it is possible that trends in the number of new HIV cases belonging to the CEA sub-epidemics do not fully correspond with those of the overall HIV epidemic. Besides, a more homogeneous and dense sampling of each CEA sub-epidemic over time as well as the use of sequence data from multiple genetic loci31 and the incorporation of covariates into the demographic inference framework64 may improve the performance of phylodynamics methods to recover true population trajectories.

Overall, this study supports that major HIV-1 CEA lineages circulating in Eastern Africa and Southern Brazil seem to have had an exponential spread with very similar growth rates until the early/mid-1990s. The overall agreement of the R0 and Re-initial values here estimated from genetic sequences with those previously obtained from classical epidemiological data strengthen the utility of coalescent and the birth-death phylodynamic approaches to infer relevant epidemiological information of HIV epidemics at the earlier stages. Our data supports that introduction of universal access to ARV therapy during the late 1990s and early 2000s coincides with a declining epidemic in Eastern Africa, but with an upward trend of new HIV diagnoses in Southern Brazil. Our results also underscore the importance of the joint use of both coalescent and birth-death phylodynamic approaches for the analyses of HIV population dynamics given its apparent differential sensitivity for recovering changes in population dynamics at most recent times in different datasets.