Introduction

The persistence of chromosomally integrated HIV DNA in CD4+ T cells is the primary barrier preventing people living with HIV (PWH) from achieving viral remission after stopping antiretroviral therapy (ART)1,2. HIV persistence has been associated with physiological mechanisms of CD4 cells3,4 (e.g., homeostatic5,6,7 and antigen-driven proliferation8, cellular differentiation/maturation9, and death). To help elucidate persistence mechanisms, it is critical to compare HIV DNA and CD4 cell dynamics as directly as possible.

To that end it is important to consider that integrated HIV DNA can be found in multiple CD4 cell subsets6,9,10,11 (categorized by surface markers12,13,14) which have different physiological functions (phenotypes) and maturational levels. For instance, at certain time points, higher proportions of HIV DNA have been found in more mature memory and effector CD4 cells, suggesting they are preferentially infected and/or expand HIV DNA through cellular proliferation6,15,16,17. On the other hand, longitudinally across individuals, HIV DNA appears to accumulate over time in less mature subsets that turn over less frequently18. However, no study to date has measured both CD4 cell turnover and HIV kinetics across subsets in the same individuals.

Mathematical modeling has repeatedly proven useful for understanding the kinetics and kinetic heterogeneity of HIV levels within a person during suppressive ART19,20,21,22. In addition, modeling studies have sometimes inferred cellular rates using HIV as a molecular tag23,24. Our methodology builds upon a rigorous body of work using dynamical systems and population mixed effects modeling to quantitatively describe viral dynamics and, more recently, multiple simultaneous data types25,26,27.

Previously, most CD4 cell subsets have been shown to turn over several times per year in individuals without HIV28,29. These rates have been compared to HIV DNA decay rates (generally >4 year half-lives30,31,32), with the implication that HIV DNA in the reservoir must be replenished consistently while CD4 cells are born and die. A further complication is that genetically intact proviruses generally decay faster than defective ones22,33,34, suggesting extrinsic factors like immune selection35,36 may also influence viral persistence. Overall, the precise balance of processes that support reservoir maintenance remains incompletely characterized.

Here we measured cellular turnover in each of five resting CD4 cell subsets and changes in integrated HIV DNA levels within these subsets over 3 years in the same participants. We directly compared HIV DNA kinetics and cellular turnover rates within each subset and identified how these rates contribute to overall slow HIV DNA clearance. By selecting the most parsimonious mechanistic model for these combined data, we inferred the degree to which cellular proliferation and differentiation contribute to maintenance of integrated HIV DNA levels during suppressive ART. Finally, we simulated temporary modulations of proliferation and differentiation to highlight how minor changes in these processes might result in meaningful changes to HIV kinetics.

Results

Study cohort

The HOPE cohort consists of 37 PWH on suppressive ART (clinical and demographic information in Supplementary Table 1), 24 of whom underwent a 45-day deuterium labeling study to measure CD4+ T cell turnover rates and were reported previously17 in a cross-sectional study. Here, we report a prospective 3-year longitudinal analysis of levels of integrated HIV DNA in distinct maturational CD4 cell subsets from all 37 HOPE participants and integrate these data with measured CD4 cell subset turnover rates. Follow-up began 1–10 years after achieving viral suppression. Levels of integrated HIV DNA per million CD4+ T cells tended to be stable over time within individuals but differed between individuals by several orders of magnitude (Supplementary Fig. 1).

Quantifying HIV DNA in CD4+ T cell subsets

From these longitudinal samples, resting (HLA-DR-) CD4+ T cells were isolated and sorted by flow cytometry into six CD4 cell subsets (sort schematic in Supplementary Fig. 2): naïve (TN), stem-cell memory (TSCM), central memory (TCM), transitional memory (TTM), effector memory (TEM) cells, and a putative terminally differentiated (TTD) population. As we observed contamination with TN in TTD, the present analysis was focused on the first five sorted populations, each of which was sorted with high purity17.

CD4+ T cell subset frequency was calculated as the ratio of subset cells per resting CD4 cells (Fig. 1A). TN and TCM were most common, each with a median across participants and time of ~25% of all resting CD4 cells. The infection frequency was then calculated as the number of integrated HIV DNA copies per million resting cells within each subset (Fig. 1B). Typically, ~1 in 1000 resting TTM and TEM harbored integrated HIV DNA, whereas the other subsets less commonly harbored HIV DNA16. Finally, by multiplying the subset frequency by the infection frequency, we derived the subset HIV DNA level which reflects the relative contribution of each subset to the measured HIV DNA, i.e., the number of integrated HIV DNA copies in a given subset per million total CD4 cells (Fig. 1C). Although not the highest in infection frequency, given its high subset frequency, TCM contributed the highest median HIV DNA levels, with ~100 infected TCM for every million CD4 cells. Median HIV DNA levels were generally lower but not significantly different in other memory phenotypes (TTM and TEM). Considerable variability was noted within each subset and for each data type.
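The multiplication described above can be written out as a short sketch. The input values are hypothetical round numbers chosen only to be consistent with the approximate medians quoted in the text (a subset frequency of ~25% and a resulting level of ~100 copies per million CD4 cells); they are not participant data, and the function name is illustrative.

```python
def subset_hiv_dna(subset_frequency, infection_frequency_per_million):
    """Integrated HIV DNA copies in one subset per million CD4 cells.

    subset_frequency: fraction of resting CD4 cells in the subset (0 to 1).
    infection_frequency_per_million: integrated HIV DNA copies per million
        cells of that subset.
    """
    return subset_frequency * infection_frequency_per_million


# Hypothetical example: a subset making up 25% of resting CD4 cells with
# 400 integrated copies per million subset cells.
tcm_level = subset_hiv_dna(0.25, 400.0)
print(tcm_level)  # 100.0
```

This makes explicit why a subset with a moderate infection frequency can still contribute the most HIV DNA when it is abundant.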

Fig. 1: Definitions and representation of study data.

From 37 PWH in the HOPE cohort, samples were taken at 1–3 time points over a 3-year period. Resting CD4+ T cells were sorted into five phenotypic subsets including naïve (TN), stem-cell memory (TSCM), central memory (TCM), transitional memory (TTM), and effector memory cells (TEM). Three measurements were observed or calculated (panel headings): (A) subset frequency: the proportion of cells in each subset relative to total resting CD4 cells (“other” represents resting cells not among the five sorted subsets), (B) subset infection frequency: integrated HIV DNA in each subset per million subset cells, and (C) subset HIV DNA: the number of HIV DNA copies in a given subset per million CD4 cells. Colored dots indicate values from all participant time points and black diamonds represent means across all dots.

HIV-infected cells decay faster than non-infected cells in TTM and TEM (but not other) subsets

To determine if HIV DNA cleared differently in each subset, we used a statistical framework (log-linear mixed effects model) to assess changes in subset infection frequency over the 3-year study period (Fig. 2A). Although the decay rates were heterogeneous (and even positive, i.e., growing, in certain individuals), the average integrated DNA levels within TN, TSCM, and TCM did not significantly change over time (t test p > 0.05 against null hypothesis of no change), while those within TTM and TEM decayed slowly but significantly over time (t test p < 1e-8) (Fig. 2B). Accordingly, TN and TCM rates were significantly different from TTM and TEM rates (pairwise t tests p < 0.005 according to Bonferroni correction for multiple comparisons). Estimated median half-lives were 81 and 59 months for TTM and TEM, respectively. A declining subset infection frequency implies that HIV-infected cells decay faster than non-HIV-infected cells in that subset, suggesting an active process whereby HIV-infected cells are selectively removed.
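As a rough illustration of how a decay rate and half-life follow from a log-linear fit, the sketch below fits a straight line to the logarithm of a single synthetic trajectory. This is a deliberate simplification: it uses ordinary least squares on one noise-free series rather than the mixed effects framework described above, and the data are generated from the reported TEM half-life rather than taken from participants.

```python
import numpy as np


def loglinear_decay_rate(times_years, infection_freq):
    """Least-squares slope of log(infection frequency) versus time (per year)."""
    slope, _intercept = np.polyfit(times_years, np.log(infection_freq), 1)
    return slope


# Synthetic single-participant trajectory built with a 59-month half-life.
true_rate = -np.log(2) / (59 / 12)          # per year, negative means decay
t = np.array([0.0, 1.5, 3.0])               # sampling times in years
y = 1000.0 * np.exp(true_rate * t)          # copies per million subset cells

rate = loglinear_decay_rate(t, y)
half_life_months = 12 * np.log(2) / -rate   # recovers ~59 months
print(round(half_life_months, 1))
```

With real data, a positive slope would correspond to a growing infection frequency, in which case a half-life is undefined.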

Fig. 2: The kinetics of subset HIV frequency vary by subset and are generally slower than cellular turnover.

A Longitudinal kinetics of HIV subset infection frequency in each cell subset: thin lines and dots are individual trajectories and thick solid lines represent the estimated average slopes from a log-linear mixed effects model. B Box plots of participants’ decay rates; note that some are positive, meaning that HIV frequency increased. P values indicate one-sided t tests against the null hypothesis of no clearance. For scale, the decay rate equivalent to the QVOA reservoir benchmark 44 month half-life30 is denoted with the dashed gray line. C Cellular turnover rates derived from deuterated water labeling in 24 of these 37 individuals. P values indicate paired two-sided t tests with unequal variance. Magnitudes of cellular turnover rates (in non-TN subsets) are much higher than HIV decay rates; note the difference in y-axis scales in (C) versus (B). D The % of cellular turnover that is accompanied by HIV turnover (Methods). Values close to 100% indicate that HIV is typically repopulated when cells turn over. In (B–D), box plots indicate median (center line), interquartile range (box), 1.5x interquartile range (whiskers), and outliers (gray diamonds). Each dot (N = 24) represents an individual. E Cartoon example for TEM: in a year, there is frequent cellular turnover, which is infrequently (~5% of events) accompanied by elimination of HIV-infected cells, resulting in the observed slight decay of HIV DNA.

Measuring cellular turnover via deuterium labeling

We used deuterated water labeling37 performed on 24 of the 37 HOPE participants to estimate cellular turnover rates in each subset17. In these experiments, the cellular turnover rate is derived from modeling the proportion of cells that take up a deuterium label during a 45-day labeling period (model schematic in Supplementary Fig. 3). More specifically, the fraction of cells that divided during exposure to deuterated water is calculated37,38,39. Although what is initially measured from deuterium incorporation into genomic DNA is S-phase cell division, or proliferation40, we instead use the term turnover rate here because this rate represents the combination of all mechanisms that impact levels of deuterium in a subset, including migration/trafficking and/or differentiation. For instance, labels in a given subset can rise due to maturation of a labeled progenitor cell or fall due to further maturation41. Cellular turnover rates ranged across subsets from slowest (TN median 0.2/year) to most rapid (TEM median 2.6/year) (Fig. 2C). Turnover rates were generally more rapid in more differentiated subsets, with the greatest differences between TN and TSCM and between TCM and TTM (pairwise t-test p-values in Fig. 2C). A turnover rate of 1 per year corresponds to a half-life of 8.3 months, so these CD4 subsets have median half-lives of 35, 5.3, 4.3, 3.4, and 3.1 months, respectively. Considerable variability was noted within each subset.
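The rate-to-half-life conversion used here is the standard exponential relation, half-life = ln(2)/rate. A minimal sketch follows; the function name is illustrative. Note that applying it to the rounded median rates quoted above will not reproduce the quoted median half-lives exactly, since those were presumably computed from unrounded per-participant estimates.

```python
import math


def half_life_months(rate_per_year):
    """Half-life in months for a first-order rate given per year."""
    return 12 * math.log(2) / rate_per_year


print(round(half_life_months(1.0), 1))  # 8.3
```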

CD4+ T cell turnover is often but not always accompanied by HIV DNA turnover in certain subsets

In all subsets except TN, the cellular turnover rate was roughly an order of magnitude faster than the rate of decay of HIV-infected cells (compare Fig. 2B, C). This suggests that cellular turnover of HIV-infected cells does not usually result in removal of HIV DNA. We therefore estimated the percentage of cellular turnover events that are accompanied by HIV turnover rather than HIV clearance (Methods). For the five subsets respectively, we calculated medians of 112, 94, 99, 96, and 94% (Fig. 2D). In TN, this number is greater than 100%, suggesting some increases in HIV DNA in this subset; however, there was very high variability across participants, making the median less reliable. Additionally, the much lower cellular turnover rates yield a lower signal-to-noise ratio in the deuterium labeling measurements, potentially reducing precision. In the TCM subset, we estimate that cellular turnover almost always results in HIV turnover, so HIV DNA does not necessarily decline. Finally, in TSCM, TTM, and TEM, 94–96% of cellular turnover can be associated with HIV turnover. That is, roughly 5% of cellular turnover events are accompanied by clearance of HIV DNA in these subsets (see example for TEM in Fig. 2E). Together, these results indicate that most, but not all, events that increase cell numbers (cellular proliferation and other mechanisms contributing to turnover) are accompanied by concomitant increases in HIV DNA. Any slight imbalance towards cell number increases without HIV increases could drive decay of HIV DNA in certain CD4 subsets.
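The exact calculation is given in the Methods, which are not reproduced in this section. As an assumption for illustration only, one simple way to obtain a percentage of this kind is 100 × (1 − HIV decay rate / cellular turnover rate), with the decay rate taken as positive when HIV DNA declines. The sketch below applies that assumed formula to the rounded TEM medians quoted in the text (59-month half-life, 2.6/year turnover).

```python
import math


def percent_hiv_turnover(hiv_decay_rate, cell_turnover_rate):
    """Assumed formula: share of cellular turnover accompanied by HIV turnover.

    hiv_decay_rate: per year, positive for decay, negative for growth.
    cell_turnover_rate: per year, positive.
    """
    return 100.0 * (1.0 - hiv_decay_rate / cell_turnover_rate)


tem_decay = math.log(2) / (59 / 12)          # ~0.14 per year
tem_percent = percent_hiv_turnover(tem_decay, 2.6)
print(round(tem_percent))                     # roughly 95
```

Under this assumed formula, a subset whose HIV DNA is growing (negative decay rate) yields a value above 100%, which is consistent with the interpretation given for TN.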

Mechanistic modeling of subset HIV DNA suggests differentiation rapidly passages HIV through CD4+ T cell subset maturation pathways

CD4+ T cell subsets are connected to one another by known steps of lineage maturation14. Previously, in this cohort, we found HIV DNA integrated into identical human chromosomal sites among TCM and TTM and TTM and TEM subsets, a strong sign that differentiation of HIV-infected cells can occur17. Moreover, HIV DNA frequencies and levels were found to correlate between certain subsets (Supplementary Fig. 4). Yet, the relative degree to which differentiation into a given CD4 cell subset versus proliferation within that subset contributes to HIV DNA persistence remains unclear. Therefore, we next sought to model HIV DNA levels with a mechanistic model that included specific rules of cellular proliferation, death, and differentiation.

We developed a variety of models inclusive of different mechanistic processes and degrees of complexity (Table 1, see Methods for equations and text describing assumptions). The list of models encodes scenarios in which HIV DNA levels are governed by one or more mechanisms including slow decay, proliferation, and cell differentiation between subsets. A schematic and table of definitions illustrates the rates we consider (Fig. 3A, B). We then tested these models for fit against levels of subset HIV DNA (e.g., Fig. 1C). Importantly, this is a different data type than in Fig. 2 and provides a common denominator of million CD4+ T cells for each subset. In our model, the levels of HIV DNA are linked across subsets, allowing proliferation and differentiation rates to be directly compared.

Table 1 Results of information theoretic mathematical model selection on integrated HIV DNA per million CD4+ T cells
Fig. 3: Modeling subset HIV DNA dynamics via physiological mechanisms of T cells including proliferation, differentiation, and death.

Model schematic (A) and definitions (B) of model rates for a single subset. The net effect rate \(\Theta\) describes the total kinetic rate summing all modeled mechanisms governing HIV DNA, so it can be positive or negative for each subset. The turnover rate represents the positive contribution to cellular turnover, estimated via the labeling study. Our mathematical model estimates the repopulation (θ) and differentiation (φ) rates in and out of each subset. Therefore, we can calculate the proliferation (α) and death (δ) rates for each subset from turnover and differentiation. C The most parsimonious model of all combined subset HIV DNA levels included infected cell proliferation (dots flashing), death (dots falling and fading), and differentiation between certain subsets (dots moving). This image is a screenshot of the Supplementary Movie 1 which visualizes the system over time. The differentiation pattern that was most parsimonious included a general flow from least to most mature subsets, but also some “skip” patterns, i.e., TN-to-TCM and TCM-to-TEM. With no further measured subset past TEM, death and differentiation out could not be distinguished for TEM so we combined the two phenomena (see *).

Models were ranked by their accuracy (fit to data) but also penalized for complexity using information criteria. The selected model (Fig. 3C, Supplementary Movie 1) ranked best by both Akaike and Bayesian information criteria42 (AIC and BIC, Table 1). In this best model, each subset level of HIV DNA \(H_{s}\) has a repopulation rate \(\theta_{s}\) that encapsulates the balance of cell proliferation and death. Cellular differentiation passages HIV DNA from subset \(i\) to subset \(j\) with rate \(\phi_{i:j}\). Because we did not include the terminally differentiated subset (TTD) due to TN experimental contamination, we could not estimate TEM clearance and differentiation rates simultaneously. Therefore, we explicitly note a combination of the two phenomena (see * in Fig. 3C). We also constrained parameter estimation to ensure rates for each subset were no larger than observed cellular turnover rates for that subset (Supplementary Fig. 5A, B). When this constraint on parameter space was relaxed, some models performed slightly better, but our initial best model remained second only to a model with the same structure but including biologically unrealistic rates (Table 1). Therefore, for the remainder of the analysis, we proceeded with this more conservative model.
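To make the structure of such a model concrete, the sketch below simulates a linear system in which each subset has a net repopulation rate and differentiation moves HIV DNA between subsets, including the two “skip” routes. The governing equations are in the Methods and are not reproduced here, so the form dH/dt = A·H used below, every numerical rate, and the initial levels are assumptions chosen for illustration, not fitted values.

```python
import numpy as np

SUBSETS = ["TN", "TSCM", "TCM", "TTM", "TEM"]


def build_rate_matrix(theta, phi):
    """Assemble A so that dH/dt = A @ H.

    theta[s]: net repopulation rate of subset s (proliferation minus death).
    phi[(i, j)]: differentiation rate from subset i to subset j.
    """
    idx = {s: k for k, s in enumerate(SUBSETS)}
    A = np.zeros((len(SUBSETS), len(SUBSETS)))
    for s, rate in theta.items():
        A[idx[s], idx[s]] += rate
    for (i, j), rate in phi.items():
        A[idx[i], idx[i]] -= rate   # HIV DNA leaving subset i
        A[idx[j], idx[i]] += rate   # the same HIV DNA entering subset j
    return A


def simulate(A, h0, years, dt=0.001):
    """Integrate dH/dt = A @ H with a fixed-step fourth-order Runge-Kutta."""
    h = np.array(h0, dtype=float)
    for _ in range(int(round(years / dt))):
        k1 = A @ h
        k2 = A @ (h + 0.5 * dt * k1)
        k3 = A @ (h + 0.5 * dt * k2)
        k4 = A @ (h + dt * k3)
        h = h + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return h


# Hypothetical rates (per year) and initial levels (copies per million CD4).
theta = {"TN": -0.1, "TSCM": -0.3, "TCM": 0.5, "TTM": 0.1, "TEM": -1.0}
phi = {
    ("TN", "TSCM"): 0.05, ("TSCM", "TCM"): 0.1,
    ("TCM", "TTM"): 0.4, ("TTM", "TEM"): 0.3,
    ("TN", "TCM"): 0.05, ("TCM", "TEM"): 0.2,   # "skip" differentiation
}
h0 = [10.0, 5.0, 100.0, 60.0, 50.0]

A = build_rate_matrix(theta, phi)
print(dict(zip(SUBSETS, np.round(simulate(A, h0, years=3.0), 1))))
```

One design point worth noting: because differentiation only moves HIV DNA between subsets, the differentiation terms alone conserve the total, and any net change in summed HIV DNA comes from the repopulation rates.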

Qualitative features of model selection provide several mechanistic results. First, all models lacking differentiation had significantly poorer fit compared to the optimal model (ΔAIC > 2, Table 1). A model that attempted to explain HIV levels through differentiation without cell proliferation was substantially worse than the optimal model (ΔAIC = 85). The selected model includes passaging of HIV DNA along CD4 maturation pathways (i.e., linearly from least to most differentiated subsets) but additionally was improved by the addition of “skip” differentiation from TN to TCM, and from TCM to TEM. A simpler model with purely linear differentiation TN > TSCM > TCM > TTM > TEM was ranked 3rd but did not provide as strong a fit to data (Table 1). Together, these findings suggest differentiation is necessary but not sufficient to precisely describe HIV DNA dynamics in CD4 cell subsets over time.
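For readers unfamiliar with the criteria, the textbook definitions are AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below applies them to two hypothetical models; the log-likelihoods, parameter counts, and observation count are invented for illustration and do not correspond to Table 1. Population mixed effects software may use variants of these formulas.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical comparison: a richer model must improve the log-likelihood
# by more than one unit per extra parameter to lower the AIC.
simple = aic(log_likelihood=-100.0, n_params=5)    # 210.0
richer = aic(log_likelihood=-96.0, n_params=7)     # 206.0
print(simple - richer)                              # 4.0
```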

To potentially broaden the applicability of this model, we provide a table of initial conditions, mean and standard deviation of population rates, and estimated variability of HIV DNA data (Supplementary Table 2).

Sensitivity analysis on model selection

To assess whether the sparse 3-year sampling could have resulted in observations favoring a model with skip differentiation, we simulated the best-fit version of the model with linear differentiation, added appropriate noise, and sampled time points per the 3-year study scheme (Supplementary Fig. 6). We then fit both the linear- and skip-differentiation models to these simulated data. As expected, the linear differentiation model fit these data better than the skip-differentiation model (ΔLL = 1.5, ΔAIC = 10 compared to skip-differentiation model). This sensitivity analysis illustrates how model selection can be self-consistent, such that data generated with a given model contain enough information to recover the same model via model selection. In addition, it supported that the skip differentiation model was not innately favored based on noise or the sampling scheme.

Estimating HIV DNA decay half-lives in the model inclusive of cellular differentiation

With some exceptions, model fits were excellent across highly variable subset trajectories (see 18 of 37 fits for participants with three time points, Fig. 4A). The overall population trends for each subset show that, notwithstanding some degree of heterogeneity, the average integrated HIV DNA level decays per million CD4+ T cells in 4/5 subsets with a half-life of: 4.3 years in TN, 2.6 years in TSCM, 3.2 years in TCM and 3.7 years in TEM (Fig. 4B). At the same time, HIV DNA levels in TTM appeared to increase (which implies no half-life). When HIV DNA levels in all subsets were summed, the net half-life across all subsets was calculated to be 5.4 years. Although these data are not inclusive of all CD4 cell subsets capable of harboring HIV genomes, and individuals have different timeframes of ART (i.e., see trajectories in Supplementary Fig. 1), these half-life estimates are within ranges of previously-estimated HIV DNA decay22,32,43.

Fig. 4: Modeling including proliferation and differentiation recapitulates individual subset HIV DNA kinetics.

A Model fits (solid lines) of subset HIV DNA levels (dots/dashed lines) for all participants having 3 longitudinal measurements (N = 18). B Population model (solid lines) estimates of subset HIV DNA (copies per million CD4 T cells) to all longitudinal participant data (dots with thin lines).

Quantifying the contribution of cell proliferation, death, and differentiation to integrated HIV DNA persistence

To compare and contrast the mechanisms underlying HIV persistence in the best model, we next directly applied the cellular turnover data to estimate the absolute number of integrated HIV DNA copies (per million CD4+ T cells) that enter and leave each subset pool during a typical year due to proliferation, differentiation in and out, and death (Methods, Fig. 3A, B).

In a typical year in the average individual, we calculated (Methods) that 1–10 HIV DNA copies per million CD4 T cells are generated by proliferation of TN and TSCM while 100–1000 copies are generated by proliferation in TCM, TTM, and TEM (Fig. 5B). Meanwhile, similar numbers of HIV DNA copies are removed by death (Fig. 5D). These numbers imply that HIV DNA persists in a rapid and dynamic near-equilibrium state (Supplementary Movie 1). At the same time, few HIV DNA copies per million CD4+ T cells enter TN and TSCM (Fig. 5A), and 1–10 copies exit those subsets (Fig. 5C) due to differentiation. On average, ten copies enter, and 100 copies leave TCM due to differentiation (Fig. 5C). The unequal differentiation in and out then requires a slight imbalance favoring proliferation over death (Fig. 5B vs. Fig. 5D) to maintain TCM near equilibrium. TTM differentiation was almost balanced (mean ~100 copies in, ~70 copies out in Fig. 5A vs. Fig. 5C). We could not distinguish TEM outward differentiation from death using these data since terminally differentiated cells were not studied in this analysis. Considerable variability was noted across participants within each subset.

Fig. 5: Absolute and relative contribution to HIV reservoirs by cell proliferation, death, and differentiation.

A–D Absolute contributions to HIV subset DNA by differentiation in, proliferation, differentiation out, and death of each subset. E Relative contribution of each mechanism to each subset. Positive (persistence) and negative (clearance) contributions are treated separately for % calculations. Differentiation out and death of TEM are grouped together because the lack of terminally differentiated cells in this analysis precluded identification of both rates. In A–E, estimates for each individual (N = 24) are shown as colored dots and black diamonds indicate means across individuals. F The absolute contribution of each mechanism averaged across all individuals.

Next, we compared mechanisms relative to one another by calculating the percentage of creation (differentiation in and proliferation) and removal (differentiation out and death) events from each mechanism and for each cell subset (Fig. 5E). Proliferation was the dominant mechanism contributing to the persistence of integrated HIV DNA in TN, TCM, and TTM. However, differentiation inward may play an important role in maintaining HIV genomes in TSCM and TEM. Differentiation outward was an important mechanism particularly for TN and TCM, in which removal was projected to occur more through differentiation than death. TEM are known to proliferate frequently and had the highest cellular turnover rates. However, the absolute contribution of proliferation estimated here was lower than differentiation in. If HIV DNA dynamics mirror cellular dynamics measured with deuterated water experiments, this suggests that cellular turnover of HIV-infected TEM may particularly be influenced by differentiation.

In summary, the model portrays typical HIV DNA levels as a rapidly proliferating, dying, and differentiating population that, in aggregate, maintains a nearly equilibrated system such that integrated HIV DNA only decays slowly and only in more mature CD4 subsets (Supplementary Movie 1). Importantly, proliferation remains the predominant mechanism in the generation of integrated HIV DNA. TN and TSCM contain less HIV DNA; therefore, the absolute HIV DNA creation and removal in those subsets is orders of magnitude smaller than that found in memory subsets. Proliferation is of particular impact in the context of TCM and TTM: when coupled with differentiation outward (to one or more subsets), these subsets contribute meaningfully to HIV DNA persistence in the rapidly dying/differentiating TEM pool (Fig. 5F).

Modeling cell-associated HIV RNA

We also fit models with no differentiation, linear differentiation, and our favored model with skip differentiation to cell-associated HIV RNA (caRNA) levels measured in the same participants (Supplementary Fig. 7). For these data, the model without differentiation was optimal via AIC. In line with observations for HIV DNA, caRNA levels per million CD4 T cells appeared to increase slightly in TCM and decrease in TEM in these participants. But unlike for DNA, RNA increased in TTM. Together, these data suggest that RNA levels are less tightly connected across subsets, potentially because RNA is generated by DNA and additional variability in this process reduces correlations.

In silico knockout demonstrates the theoretical capacity of reservoir reduction through reduced cell proliferation and/or enhanced cell differentiation

Mechanistic modeling provides the valuable ability to project the dynamics of HIV DNA persistence in the context of perturbed CD4+ T cell subset proliferation and/or differentiation. Thus, we used the model to simulate three therapeutic scenarios over a period of three years: ART alone (Fig. 6A), ART with anti-proliferative therapy that reduces cellular proliferation for all subsets by a factor of 2 (Fig. 6B), and ART with enhanced differentiation therapy that increases differentiation for all subsets by a factor of 2 (Fig. 6C). We calculated changes in HIV DNA per million CD4+ T cells over time. For ART alone (as observed in the raw experimental data), we projected a relatively minimal median change and wide variability inclusive of increases and decreases in all subsets. For ART and anti-proliferative therapy, median HIV DNA across subsets dropped by 300 copies (or ~90%) with most simulations resulting in overall decrease. For ART and enhanced differentiation therapy, median HIV DNA across subsets dropped by 200–300 copies (or ~80–90%) with slightly more simulations inclusive of no change or increase versus anti-proliferative therapy.
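The intuition for why a twofold change in proliferation can have a large effect is visible even in a one-subset caricature, dH/dt = (α − δ)·H, in which proliferation α and death δ nearly cancel. The sketch below is not the multi-subset model used for Fig. 6: it ignores differentiation entirely, and the values of α and δ are hypothetical, chosen only so that the baseline decay is slow.

```python
import math


def fraction_remaining(alpha, delta, years, proliferation_scale=1.0):
    """H(t)/H(0) for dH/dt = (proliferation_scale * alpha - delta) * H."""
    return math.exp((proliferation_scale * alpha - delta) * years)


alpha, delta = 1.6, 1.7   # hypothetical per-year rates, net decay 0.1/year

art_alone = fraction_remaining(alpha, delta, years=3.0)
anti_prolif = fraction_remaining(alpha, delta, years=3.0,
                                 proliferation_scale=0.5)

print(f"ART alone: {100 * (1 - art_alone):.0f}% reduction")
print(f"ART + anti-proliferative: {100 * (1 - anti_prolif):.0f}% reduction")
```

Because the baseline net rate is the small difference of two large rates, halving proliferation shifts the net rate by much more than a factor of 2, which is the mechanism behind the large projected drop in this toy setting.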

Fig. 6: Simulations of modulated HIV persistence mechanisms.

Projections of subset HIV DNA levels in all resting CD4+ T cell subsets during three theoretical therapeutic interventions: A ART alone, (B) ART and anti-proliferative therapy: 2-fold reduction in cell proliferation in all subsets, and (C) ART and enhanced differentiation therapy: 2-fold increase in cell differentiation in and out of all subsets. Box plots indicate median (center line), interquartile range (box), 1.5x interquartile range (whiskers), and outliers (open circles). Each line (N = 24) represents a simulation using parameters from each individual.

Discussion

Here, we addressed the mechanistic basis for HIV persistence during ART in different phenotypic subsets of CD4+ T cells. We measured both longitudinal levels of integrated HIV DNA and cellular turnover rates in five resting CD4 cell subsets in ART-suppressed people living with HIV (PWH). In agreement with previous studies in adults and children6,15,17,44, HIV DNA in these individuals was most commonly found in central, transitional, and effector memory subsets (TCM, TTM, and TEM). Although total levels of naïve CD4 T cells (TN) were as high, if not higher than those with a memory phenotype, TN were much less frequently found to harbor integrated HIV DNA, consistent with observations that memory subsets are easier to infect45,46 and/or that HIV DNA accumulates more quickly within them7.

We documented that HIV decays more rapidly in differentiated CD4 cell subsets (TTM and TEM) vs. less mature subsets (TCM and TN). This explains why HIV DNA appears to accumulate in less-differentiated subsets, as observed in a prior cross-sectional study17. It is possible that proliferation and/or differentiation in these subsets promotes HIV expression and immune recognition47, leading to preferential removal of latently infected cells48. However, TCM also commonly proliferated, so more experiments are needed to refine mechanisms in each subset.

Deuterium labeling data from these PWH demonstrated that turnover rates of predominantly uninfected memory CD4 cells were approximately tenfold faster than HIV DNA decay rates. Therefore, we concluded that HIV-infected cells must frequently die and repopulate by cellular proliferation (and/or differentiation). Additionally, TN turned over substantially less frequently, such that in this subset cellular longevity of latently infected cells is a potential mechanism of reservoir persistence. Most importantly, HIV-infected cells must be slightly balanced towards death during cellular turnover to allow for the HIV DNA decay we observed in the most differentiated subsets.

Cellular differentiation naturally occurs in the context of homeostasis of the total CD4+ T cell population13,14,49. However, the contribution of CD4 cell differentiation to HIV persistence has mostly been discerned indirectly9 and the magnitude of differentiation, especially as compared to cellular proliferation, has not been quantified. We also observed strong associations between HIV levels in different cell subsets over time. Therefore, we tested mathematical models of HIV DNA levels that directly linked subsets and found that models inclusive of differentiation allowed for the best agreement with the data, strengthening the evidence that integrated HIV DNA is passaged from one subset to another through physiologic pathways of CD4 T cellular differentiation.

Our optimal model included “skips” in which HIV DNA was passaged from TN to TCM and TCM to TEM without going through intermediate TSCM and TTM subsets. There may be a mechanistic explanation for why apparent “skipping” is a better fit than linear differentiation. Indeed, it is hard to reconcile the speed of antigenic response with a model in which TCM must become TTM (with ~3-month half-lives) before becoming TEM. As different viral infections are controlled by phenotypically different CD8+ T cell subsets50, it may be that certain CD4 cell subsets respond to different antigens.

On an absolute scale, cellular proliferation was confirmed to be the dominant mechanism of reservoir persistence51, accounting (using the best model) for 10–10,000s of new HIV DNA copies per million CD4 cells in a given year. The upper end of this range is comparable to estimates of total reservoir sizes such that, in 1 year, individual cells carrying integrated HIV DNA may be completely refreshed while HIV DNA levels remain nearly constant. In addition to proliferation, the persistence of HIV DNA was also found to be driven by cell differentiation. An implication is that differentiation does not necessarily re-activate HIV expression and result in immune recognition. The overall picture from the model is one in which all subsets proliferate (and TCM in particular proliferate and differentiate rapidly), creating HIV DNA and passaging it onto more mature progeny.

Though we did not study HIV provirus clonality, the best model helps to mechanistically explain past observations of clonal HIV proviruses detected in different CD4 cell subsets9,10,11,52. Other recent data showed predominantly unique HIV sequences isolated from TN, whereas those retrieved from TEM were mainly clonal53,54. Previously, we reported on HIV clonality in some HOPE study participants17. Oligoclonality was generally higher in more mature subsets and these subsets also had the highest degree of sharing of the same clonotypes. The present modeling provides some mechanistic insight for these observations: we estimate that much of the integrated HIV DNA found in TEM and TTM was likely generated upon cell proliferation and/or passaged onward from the highly proliferative (and therefore highly clonal) progenitor TCM subset. Thus, these subsets are both highly likely (in absolute terms) to be clonal and to share clones in common. However, while TN is still predominantly sustained by proliferation, their lower proliferation rates mean TN is relatively less clonal than other subsets.

Armed with the mechanistic model55, we simulated in silico therapeutics and found that continually reducing cellular proliferation (anti-proliferative therapy7) or enhancing differentiation (akin to “rinse and replace”56) during suppressive ART could substantially reduce HIV DNA levels relative to the use of ART alone. These approaches reduce HIV DNA in different ways. It is assumed that the natural (slow) HIV DNA clearance rate in each subset arises from a balance of cell proliferation and death. Anti-proliferative therapy shifts this balance in each subset individually, and HIV DNA clearance is projected to be faster in subsets with higher natural death (turnover) rates. Alternatively, enhancing differentiation does not increase clearance in each subset but rather pushes HIV DNA into the most differentiated compartments, in which HIV DNA clears more rapidly.

In all our simulations, sustained therapy for several years was required to meaningfully reduce HIV DNA, which presents both clinical and experimental challenges for validation. Nevertheless, a human study on IL-15 superagonist N-803, an anti-cancer drug that might promote differentiation57, achieved a small but significant reduction in inducible HIV proviruses58. Individuals taking Dasatinib, a different anti-cancer agent that restricts antigen-driven and homeostatic proliferation of CD4+ T cells in PWH59, also appeared to have lower HIV DNA levels than those taking ART alone, but whether this effect is driven by anti-proliferation requires more research60.

Predictions about anti-proliferative or pro-differentiation therapy from the in silico models should be interpreted carefully. For instance, larger HIV clones found during ART were observed to be less likely to reactivate when ART was stopped (with a continuous relationship between probability and size61), perhaps because they are either genetically defective62,63,64,65 or integrated within epigenetically silenced locations (graveyards)35,36,66. This could suggest that more proliferative clonotypes, which in turn might be more affected by anti-proliferative therapy, may be less relevant for predicting viral rebound. Because we did not have viral rebound data, we did not explore models that included viral reactivation67,68,69 (also precluding simulation of latency reversal agents70).

There are also experimental caveats to this work. CD4 cell subset categorization is inherently imperfect because identifying cells by cell surface markers requires defining thresholds and dichotomizing what is likely a continuum of cell maturation states71. In particular, TN may be heterogeneous to the point of resembling other phenotypes72. We could not distinguish the loss of HIV genomes in TEM through cell death or migration or differentiation outward because we did not successfully sort high-purity terminally differentiated cells.

On the modeling side, our best-scoring model admitted rates that were not necessarily biologically plausible. By constraining these rates, we derived a reasonable model that still fits the data accurately. Going forward, it would be ideal to collect more temporally resolved data to refine these rates. Other simplifications include that we did not model HIV DNA influx into TN cells, although small numbers of recent thymic emigrants and/or bone marrow progenitors can be infected73,74. Cellular trafficking to other anatomic compartments and the role of resident memory CD4+ T cells75 were not explicitly modeled; on the other hand, the composite movement of cells in and out of tissues is likely balanced over the multi-year timescale of our study. Finally, on a conceptual level, cell differentiation and proliferation are fundamentally single cell/lineage properties, whereas we interpreted estimated rates as frequencies of cellular processes averaged over cell populations, which inherently minimizes within-host stochastic effects.

A strength of this study is the direct comparison of CD4 cell turnover and HIV DNA decay in the same participants and subsets. Still, most CD4 cells during ART are not HIV-infected, so it is unclear whether measured turnover rates precisely represent those of HIV-infected cells. HIV-infected cells that persist may be particularly biased toward cell survival and/or proliferation76,77, or more likely to express signatures indicating resistance to immune-mediated killing35,36,66. Our modeling does not reach this level of genetic precision, but our observations of proportional DNA decay in more differentiated CD4 cell subsets (TTM and TEM) indicate that survival mechanisms are likely insufficient to overcome clearance mechanisms in these subsets.

Finally, it would be desirable to estimate mechanistic contributions specifically to the persistence of intact proviruses, which are much rarer but known to clear more rapidly than defective HIV proviruses in the first years of ART33,34,62,65,78. Sampling depth remains a challenge in many HIV reservoir studies, and filtering HIV DNA both by subset and by intactness has yielded very low proviral counts16. We hope this limitation can be overcome in the future.

In summary, by examining HIV DNA levels and cellular turnover in CD4+ T cell subsets, we found that HIV DNA decays faster in differentiated CD4 cell subsets and quantified how both cellular proliferation and differentiation contribute to HIV persistence. Our simulations suggest that the same mechanisms that HIV exploits for its persistence might also be leveraged for its elimination.

Methods

Inclusion and ethics

All participants were over 18 years old and provided written informed consent for inclusion before they participated in the study. The study (NCT00187512) is an observational, prospective study of HIV-1 infected volunteers designed to provide a specimen bank of samples with carefully characterized clinical data. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the University of California San Francisco Committee on Human Research.

Study participant characteristics

Thirty-seven persons living with HIV (PWH) on ART were recruited between 2015 and 2019 from the clinic-based SCOPE and OPTIONS cohorts at Zuckerberg San Francisco General Hospital. Study participants returned yearly for 1–3 follow-up time points. The SCOPE cohort enrolls PWH with chronic HIV, whereas the OPTIONS cohort enrolls PWH within 12 months (before 2003) or within 6 months (after 2003) of HIV antibody seroconversion. Viral suppression by ART was a requirement for study entry. Duration of viral suppression was estimated based on clinic records (typically assessed every 3–6 months). HIV acquisition timing for each participant was estimated as previously described79.

Isolation of CD4+ T cell subpopulations

All participants underwent leukapheresis as outpatients. PBMC were isolated and viably cryopreserved. Frozen PBMC were thawed and CD4+ T cells were enriched with the EasySep™ Human CD4+ T Cell Negative Selection Enrichment Kit (Stemcell). Cells were stained with Live/Dead Fixable Aqua (Life Technologies) and the following monoclonal antibody cocktail: anti-CD3-FITC, anti-CD4-AlexaFluor700, anti-CCR7-PE-Cyanine7, anti-CD27-APC, anti-HLA-DR-APC H7, anti-CD57-Brilliant Violet 421, and anti-CD95-PE (Becton Dickinson) as well as anti-CD45RA-ECD (Beckman Coulter). HLA-DR− CD4+ T cell subpopulations were sorted on a FACS ARIA II flow cytometer (BD Biosciences) at >97% purity. Dry pellets were snap-frozen at −80 °C. Flow cytometry data were analyzed on FACSDiva v8.0.1 (BD Biosciences) and FlowJo v8.7 (Tree Star). The sorting schema is provided in Supplementary Fig. 2.

Integrated HIV DNA quantification

Total DNA was extracted using the Allprep DNA/RNA/miRNA Universal Kit (Qiagen). Integrated HIV DNA copies were quantified with a two-step PCR reaction80 using isolated genomic DNA for PCR amplification instead of whole cell lysates. Integrated HIV DNA was pre-amplified with two Alu primers and a primer specific for the HIV LTR region, in addition to primers specific for the CD3 gene to determine cell counts. Nested qPCR was then used to amplify HIV and CD3 sequences from the first round of amplification. Specimens were assayed with up to 500 ng cellular DNA in triplicate and copy number was determined by extrapolation against a 5-point standard curve (3–30,000 copies), using extracted DNA from ACH-2 cells.

Cell-associated HIV RNA quantification

Total RNA was extracted using the Allprep DNA/RNA/miRNA Universal Kit (Qiagen) with on-column DNase treatment (Qiagen RNase-Free DNase Set). HIV RNA levels were quantified with a qPCR TaqMan assay using LTR-specific primers F522-43 (5′ GCC TCA ATA AAG CTT GCC TTG A 3′; HXB2 522-543) and R626-43 (5′ GGG CGC CAC TGC TAG AGA 3′; 626-643) coupled with a FAM-BQ probe (5′ CCA GAG TCA CAC AAC AGA CGG GCA CA 3′) on a StepOne Plus Real-time PCR System (Applied Biosystems, Inc.)81. Up to 500 ng of total RNA per sample was characterized in triplicate, and copy numbers were determined by extrapolation against a 7-point standard curve (1–10,000 copies). The input cell number in each PCR well was estimated using an independent qPCR measurement of the cellular housekeeping human RPLP0 gene.

Estimating the slope of subset infection fraction

To estimate the slope of HIV subset infection frequency (per million cells of each resting subset), we assumed that the longitudinal kinetics of each subset infection frequency \({f}_{X}\) followed an independent exponential model:

$$\dot{{f}_{X}}={\Delta }_{X}{f}_{X}$$
(1)

Here, each subset (denoted by \(X\)) has a rate of change per year (the log-linear slope) \({\Delta }_{X}\). Using MONOLIX25, we estimated the five values of \({\Delta }_{X}\). Importantly, we did not constrain this rate to be negative, so that increases (rather than clearance) were possible. For subsets with negative values of this rate, the half-life in years was then estimated as \({hl}=-\mathrm{ln}\left(2\right)/{\Delta }_{X}\).
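As an illustration of the half-life conversion, the following minimal Python sketch applies the formula to a set of slopes. The function name and the example slopes are hypothetical and are not values estimated in this study.

```python
import math

def half_life_years(delta_x):
    """Half-life in years for a per-year log-linear slope delta_x.

    Returns None when the slope is zero or positive, because a half-life
    is only defined for a decaying subset.
    """
    if delta_x >= 0:
        return None
    return -math.log(2) / delta_x

# Hypothetical slopes (per year), chosen only for illustration
example_slopes = {"TN": 0.02, "TCM": -0.05, "TEM": -0.20}
half_lives = {name: half_life_years(rate) for name, rate in example_slopes.items()}
```

Subsets with a non-negative slope are returned as `None` rather than as a negative or infinite half-life, mirroring the choice not to assume decay.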

Calculating the percentage of cellular turnover events that result in HIV repopulation

In Fig. 2D, we used each subset infection frequency decay rate \({\Delta }_{X}\) and its matching cellular turnover rate \({\mathscr{T}}_{X}\) to calculate the percentage of cellular turnover events resulting in HIV repopulation. Assuming the net decay can be accounted for as a balance of turnover and repopulation, \({\Delta }_{X}={r}_{X}-{\mathscr{T}}_{X}\), the repopulation percentage is the repopulation rate expressed as a fraction of the turnover rate, \({r}_{X}/{\mathscr{T}}_{X}\), or:

$$\frac{{r}_{X}}{{\mathscr{T}}_{X}}=\frac{{\Delta }_{X}+{\mathscr{T}}_{X}}{{\mathscr{T}}_{X}}$$
(2)
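A small worked example may help make this balance concrete. The sketch below takes the repopulation rate as the net slope plus the turnover rate and expresses it as a percentage of turnover; the function name and the numerical inputs are hypothetical.

```python
def repopulation_percentage(delta_x, turnover_x):
    """Percentage of turnover events that repopulate HIV DNA.

    Assumes the balance delta_x = r_x - turnover_x, so that
    r_x = delta_x + turnover_x, and reports r_x relative to turnover_x.
    """
    if turnover_x <= 0:
        raise ValueError("turnover rate must be positive")
    r_x = delta_x + turnover_x
    return 100.0 * r_x / turnover_x

# Hypothetical inputs: net slope of -0.1 per year, turnover of 2.0 per year
example_percentage = repopulation_percentage(-0.1, 2.0)
```

With these illustrative inputs, a slow net decay combined with fast turnover implies that nearly all turnover events are matched by repopulation.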

Normalized correlations between subset levels

Further evidence for connections between subsets emerged from a correlation analysis (Supplementary Fig. 4). For both subset frequencies and subset HIV DNA data, values were normalized to each individual’s longitudinal average value (i.e., \({\widetilde{f}}_{X}(t)={f}_{X}(t)/{\left\langle {f}_{X}\right\rangle }_{t}\)). This procedure prevents spurious correlations (Simpson’s paradox) related to large or small absolute reservoir sizes. Then, pairwise Spearman correlations were computed using the SciPy Stats package.
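The normalization and correlation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the participant labels and HIV DNA values are hypothetical, pooling normalized values across participants is one possible arrangement of the data, and only `scipy.stats.spearmanr` corresponds to the package named above.

```python
import numpy as np
from scipy.stats import spearmanr

def normalize_by_individual_mean(values_by_participant):
    """Divide each participant's longitudinal values by that participant's mean."""
    return {
        pid: np.asarray(values, dtype=float) / np.mean(values)
        for pid, values in values_by_participant.items()
    }

# Hypothetical HIV DNA levels (copies per million cells) at three yearly visits
tcm = {"P1": [1000.0, 900.0, 800.0], "P2": [50.0, 45.0, 40.0]}
tem = {"P1": [2000.0, 1700.0, 1500.0], "P2": [80.0, 70.0, 60.0]}

tcm_norm = normalize_by_individual_mean(tcm)
tem_norm = normalize_by_individual_mean(tem)

# Pool normalized values across participants in a consistent order
x = np.concatenate([tcm_norm[pid] for pid in sorted(tcm_norm)])
y = np.concatenate([tem_norm[pid] for pid in sorted(tem_norm)])

rho, p_value = spearmanr(x, y)
```

Because each participant's values are rescaled to a mean of one, a participant with a large reservoir no longer dominates the pooled ranks.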

Mechanistic mathematical models for subset HIV DNA

Our general model of the connected system of HIV DNA in each subset is governed by a system of differential equations that splits the kinetics of HIV DNA into the processes of proliferation, death, and differentiation between subsets. Others have used similar equations82. Each model can be written in vector form as:

$$\dot{{H}_{s}}=F({H}_{s}|{\theta }_{s},\, {\phi }_{k:s},\, {\phi }_{s:k})$$
(3)

Here, HIV DNA in each subset is collected in the vector \({H}_{s}=\{{H}_{N},\, {H}_{S},\, {H}_{C},\, {H}_{T},\, {H}_{E}\}\), and the clearance and differentiation rates are written as the vectors \({\theta }_{s}\) and \({\phi }_{i:j}\), respectively. Differentiation can in general occur from any compartment into any other, so the differentiation vector does not necessarily have the same size in each model. The models tested are numbered as follows:

Model 1 assumes each subset is independent and decays or grows independently (similar to the model used for subset infection frequency in Eq.(1)):

$${\dot{H}}_{s}={\theta }_{s}{H}_{s}$$
(4)

Based on past observations of a net decrease in HIV DNA over years of ART, Model 2 tested the hypothesis that all subset HIV DNA decays independently by using the same structure as Model 1 but forcing \({\theta }_{s} \, < \, 0\).

Model 3 assumes a linear differentiation model whereby each subset has a decay term and differentiation terms in from and out to its most proximal subsets. There are, therefore, four differentiation terms: \({{{{{\boldsymbol{\phi }}}}}}=\{{\phi }_{N:S},\, {\phi }_{S:C},\, {\phi }_{C:T},\, {\phi }_{T:E}\}\). Writing \(i\) for the subset immediately preceding \(s\) and \(j\) for the subset immediately following it, inflow is proportional to HIV DNA in the source subset \(i\):

$${\dot{H}}_{s}={\theta }_{s}{H}_{s}+{\phi }_{i:s}{H}_{i}-{\phi }_{s:j}{H}_{s}$$
(5)

Model 4 assumed a more complex differentiation pattern derived from the significant correlations between subsets observed in Supplementary Fig. 4. In this model, there are 6 differentiation terms, the same four linear differentiation rates as in Model 3, and two additional skip terms: \({\phi }_{N:C}\) and \({\phi }_{C:E}\).

For models including differentiation, we generally assumed that the differentiation rate of HIV DNA into naïve cells from some unknown/unobserved compartment was zero: \({\phi }_{?:N}=0\). This assumption is based on TREC content observations suggesting thymic emigrants are not carrying HIV DNA frequently, if at all17. We also assumed differentiation out from TEM was zero:\({\phi }_{E:?}=0\). There may be other terminally differentiated cells that TEM can transition into, but these were not observed in the study. Therefore, the clearance rate of TEM effectively covers death and differentiation out and is denoted \({\psi }_{E}\) rather than \({\theta }_{E}\) to make this explicit in Fig. 3.
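To make the structure of Model 4 concrete, the sketch below assembles the linear system with the four linear and two skip differentiation terms and integrates it numerically. It is an illustration only: all rate values and initial levels are hypothetical, inflow is taken as proportional to HIV DNA in the source subset, and `scipy.integrate.solve_ivp` is used here as a generic solver rather than as the fitting software used in this study.

```python
import numpy as np
from scipy.integrate import solve_ivp

SUBSETS = ["N", "S", "C", "T", "E"]  # TN, TSCM, TCM, TTM, TEM

def build_rate_matrix(theta, phi):
    """Assemble A so that dH/dt = A @ H for a given differentiation network."""
    index = {s: k for k, s in enumerate(SUBSETS)}
    A = np.diag([theta[s] for s in SUBSETS]).astype(float)
    for (src, dst), rate in phi.items():
        A[index[src], index[src]] -= rate  # loss from the source subset
        A[index[dst], index[src]] += rate  # gain in the destination subset
    return A

# Hypothetical per-year rates; the entry for "E" plays the role of psi_E
theta = {"N": 0.05, "S": 0.0, "C": 0.3, "T": -0.2, "E": -0.4}
phi = {
    ("N", "S"): 0.02, ("S", "C"): 0.05, ("C", "T"): 0.2, ("T", "E"): 0.1,
    ("N", "C"): 0.03, ("C", "E"): 0.15,  # the two skip terms
}

A = build_rate_matrix(theta, phi)
H0 = np.array([100.0, 50.0, 1000.0, 800.0, 600.0])  # hypothetical initial levels
solution = solve_ivp(lambda t, H: A @ H, (0.0, 3.0), H0,
                     t_eval=np.linspace(0.0, 3.0, 4))
```

No term flows into TN or out of TEM, matching the two boundary assumptions stated above. Dropping the two skip entries from `phi` recovers the linear structure of Model 3.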

As another approach, Model 5 assumes each subset is independent and follows logistic growth with rate \({r}_{s}\) and carrying capacity \({K}_{s}\). This tests the hypothesis that decay was not occurring and that HIV DNA levels in each subset were at a rough equilibrium:

$${\dot{H}}_{s}={r}_{s}{H}_{s}(1-{H}_{s}/{K}_{s})$$
(6)
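For reference, the logistic equation has a standard closed-form solution, which the sketch below evaluates. The function name and parameter values are hypothetical and serve only to show the approach to the carrying capacity.

```python
import math

def logistic_level(t, h0, r, k):
    """Closed-form solution of dH/dt = r*H*(1 - H/K) with H(0) = h0 > 0."""
    return k / (1.0 + (k / h0 - 1.0) * math.exp(-r * t))

# Hypothetical values: start at 100 copies, carrying capacity of 1000 copies
levels = [logistic_level(t, h0=100.0, r=0.5, k=1000.0) for t in (0.0, 5.0, 100.0)]
```

Under this model a subset that starts at its carrying capacity stays there, which is how the equation encodes a rough equilibrium.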

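Yet another approach (Model 6) more explicitly tested the hypothesis that proliferation and differentiation were linked. We assumed that some fraction \(\zeta \in [0,1]\) of repopulation events is associated with differentiation into the next subset, so that subset \(s\) retains the fraction \(1-{\zeta }_{s:s+1}\) of its own repopulation and receives the fraction \({\zeta }_{s-1:s}\) of repopulation from the preceding subset:

$${\dot{H}}_{s}={\theta }_{s}{H}_{s}(1-{\zeta }_{s:s+1})+{\theta }_{s-1}{H}_{s-1}{\zeta }_{s-1:s}$$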
(7)

As a final note, multiphasic decay is well documented for HIV DNA clearance after initiation of ART32,83. However, these phases have generally equilibrated within a year or two of ART initiation, so they were not relevant to our data.

Model fitting and selection with population non-linear mixed effects modeling (pNLME)

Model fitting and selection were performed using MONOLIX25 software, which employs a population nonlinear mixed-effects (pNLME) approach. We assumed assay variability (noise) was lognormal. Repopulation parameters were generally assumed to be normally distributed (allowing for negative values), and differentiation rates were generally assumed to be lognormally distributed. Population parameters were found to be uncorrelated, but across individuals, certain parameters were strongly correlated (Supplementary Fig. 5C). This finding suggests that those with higher rates in one subset tend to also have higher rates in others. Individual best-fit parameters for each participant using the optimal model are collected in Supplementary Data 1.

Imputing turnover rate to define mechanistic components

The underlying assumption of the repopulation rate is that it is a balance of proliferation and death, \({\theta }_{s}={\alpha }_{s}-{\delta }_{s}\). To estimate these component rates, we use the cellular turnover data (Fig. 3B). We begin with a general equation for the \(i\)-th HIV DNA subset level from the best model and use a quasistatic assumption \(\dot{{H}_{i}}=0\) to indicate that cellular turnover maintains a balance between flow into and out of each subset. This balance has an absolute magnitude \({\mathscr{T}}_{i}\), so that some subsets have more inward and outward flow than others even though their net change is zero. We therefore set inward and outward mechanisms (from Eq. (5)) equal and split repopulation into proliferation and death, leaving:

$${\varSigma }_{j}{\phi }_{j:i}^{{in}}{H}_{j}+{\alpha }_{i}{H}_{i}=({\varSigma }_{k}{\phi }_{i:k}^{{out}}+{\delta }_{i}){H}_{i}.$$
(8)

The turnover rate \({{{{{{\mathscr{T}}}}}}}_{i}\) (per year) can then be factored out of the right-hand side as \({{{{{{\mathscr{T}}}}}}}_{i}={\varSigma }_{k}{\phi }_{i:k}^{{out}}+{\delta }_{i}\), such that the death rate can be defined as the turnover rate minus the differentiation rate out:

$${\delta }_{i}={{{{{{\mathscr{T}}}}}}}_{i}-{\varSigma }_{k}{\phi }_{i:k}^{{out}}$$
(9)

Similarly, solving \({\varSigma }_{j}{\phi }_{j:i}^{{in}}{T}_{j}+{\alpha }_{i}{T}_{i}={{{{{{\mathscr{T}}}}}}}_{i}{T}_{i}\) leads to:

$${\alpha }_{i}={{{{{{\mathscr{T}}}}}}}_{i}-{\varSigma }_{j}{\phi }_{j:i}^{{in}}{T}_{j}/{T}_{i}$$
(10)

We approximate this expression using the values of \({T}_{i}(0)\).
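The two imputation formulas in Eqs. (9) and (10) can be sketched in a few lines. The function name, the dictionary layout, and all numerical values below are hypothetical; `sizes` stands in for the \({T}_{i}(0)\) values.

```python
def impute_death_and_proliferation(subset, turnover, phi, sizes):
    """Return (death, proliferation) rates for one subset.

    death         = turnover - sum of differentiation rates out of the subset
    proliferation = turnover - sum of differentiation rates into the subset,
                    each weighted by the source-to-destination size ratio
    """
    out_rate = sum(rate for (src, dst), rate in phi.items() if src == subset)
    in_flux = sum(rate * sizes[src] / sizes[subset]
                  for (src, dst), rate in phi.items() if dst == subset)
    death = turnover[subset] - out_rate
    proliferation = turnover[subset] - in_flux
    return death, proliferation

# Hypothetical per-year rates and initial subset sizes
phi = {("C", "T"): 0.2, ("T", "E"): 0.1}
turnover = {"T": 2.0}
sizes = {"C": 1000.0, "T": 800.0, "E": 600.0}

death_T, proliferation_T = impute_death_and_proliferation("T", turnover, phi, sizes)
```

In this illustrative case, differentiation out of the subset lowers the imputed death rate below the turnover rate, and differentiation into it lowers the imputed proliferation rate.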

Tracking equations to distinguish mechanistic contributions

After imputing the turnover rates to define the mechanistic components, the HIV DNA created or removed at any time \(t\) (instantaneously in the interval \(\Delta t\)) by each mechanism was computed. This computation is performed separately, after the differential equations have been solved. Thus, the proliferation and death terms follow

$${H}_{s}^{\rm{pro}}\left(t\right)={\alpha }_{s}{H}_{s}\left(t\right)\Delta t,\, {H}_{s}^{\rm{death}}(t)={\delta }_{s}{H}_{s}(t)\Delta t,$$
(11)

While the differentiation terms follow:

$${H}_{s}^{\rm{diff-in}}\left(t\right)=\mathop{\sum }\limits_{k}{\phi }_{k:s}{H}_{k}(t)\Delta t,\, {H}_{s}^{\rm{diff-out}}(t)=\mathop{\sum }\limits_{k}{\phi }_{s:k}{H}_{s}(t)\Delta t.$$
(12)
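The tracking terms in Eqs. (11) and (12) amount to a post-processing step on the solved HIV DNA levels. The sketch below shows one way to compute them at a single time point; the function name, data layout, and all values are hypothetical.

```python
def mechanism_contributions(H, alpha, delta, phi, dt):
    """HIV DNA attributed to each mechanism in the interval dt, per subset.

    H, alpha and delta map subset name -> value at the time of interest;
    phi maps (source, destination) -> differentiation rate.
    """
    contributions = {}
    for s in H:
        contributions[s] = {
            "pro": alpha[s] * H[s] * dt,
            "death": delta[s] * H[s] * dt,
            # inflow scales with HIV DNA in the source subset
            "diff_in": sum(rate * H[src] * dt
                           for (src, dst), rate in phi.items() if dst == s),
            # outflow scales with HIV DNA in the subset itself
            "diff_out": sum(rate * H[s] * dt
                            for (src, dst), rate in phi.items() if src == s),
        }
    return contributions

# Hypothetical levels and per-year rates for two subsets
H = {"C": 1000.0, "E": 600.0}
alpha = {"C": 2.0, "E": 1.0}
delta = {"C": 1.5, "E": 1.6}
phi = {("C", "E"): 0.15}

tracked = mechanism_contributions(H, alpha, delta, phi, dt=0.01)
```

Summing each tracked term over the time grid of the solved model gives the cumulative contribution of that mechanism, and the outflow from one subset equals the matching inflow to its destination.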

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.