At present, a cure for HIV-1 infection is considered elusive owing to infected cells that harbour genome-intact, chromosomally integrated viral DNA and persist long-term despite suppressive antiretroviral treatment (ART)1,2. After initiation of ART, the frequency of these cells, here termed HIV-1 reservoir cells, declines over time3; however, this process is slow, and the mechanisms underlying this decline are not well understood. ART itself is unlikely to directly contribute to the longitudinal decrease of viral reservoir cells because antiviral drugs have no cytotoxic activity against virally infected cells; they merely protect HIV-uninfected cells against viral infection and prevent the seeding of new viral reservoir cells. Instead, it is likely that immune mechanisms have an important role in the longitudinal reduction of HIV-1 reservoir cells during ART, specifically for the decline of the small subset of cells encoding for functional, genome-intact proviral sequences, which seem to decrease faster compared to defective proviruses4,5,6,7. Footprints of antiviral immune effects against the HIV-1 reservoir cells may be more visible when qualitative features of HIV-1-infected cells, including proviral chromosomal integration sites and transcriptional activities, are analysed8,9,10,11,12, and when viral reservoir profiles are evaluated over extended periods of suppressive ART. However, previous studies suggest that viral reservoir cells can frequently resist host selection forces, and long-term persistence of highly transcriptionally active proviruses that seem to effectively avoid host immune activity has been reported in multiple studies10,13,14. Mechanisms that allow HIV-1-infected cells to withstand host immune effects and to survive lifelong are not well defined at present.

Single-cell proteogenomic profiling

Interrogation of the surface phenotype of HIV-1 reservoir cells may be highly informative for understanding the susceptibility or resistance of HIV-1 reservoir cells to host immune responses; however, such investigations have been precluded in the past by technical limitations that did not allow profiling the surface phenotype of these cells directly ex vivo, and instead relied on a phenotypic analysis of a small subset of reservoir cells that produce viral antigen in response to in vitro stimulation15,16 or on a biocomputational inference of the reservoir cell profile17. As an alternative approach, we developed an experimental strategy, here termed ‘phenotypic and proviral sequencing’ (PheP-seq), which was designed to evaluate the phenotype of patient-derived HIV-1-infected cells, using a single-cell next-generation sequencing assay18 (Extended Data Fig. 1a) that permits one to jointly profile phenotypic markers and selected parts of chromosomal DNA from single cells19. This approach complements recent studies using cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) that focus on a combined analysis of phenotypic markers and the cellular transcriptome in HIV-1 reservoir cells20,21. For our experiments, memory CD4+ (mCD4+) T cells isolated from the peripheral blood (PB) of study participants infected with HIV-1 (n = 5) were labelled with a cocktail of oligonucleotide-tagged antibodies directed to T cell surface markers (n = 53), without any type of prior in vitro activation or manipulation. Afterwards, cells were subjected to a single-cell analytic platform technique designed to conduct a multiplex polymerase chain reaction (PCR) for amplifying small fragments of genomic HIV-1 DNA, coupled with amplification of corresponding antibody-bound oligonucleotide tags. Amplification products were then pooled, sequenced and biocomputationally deconvoluted to isolate sequencing reads originating from individual cells. For a deep interrogation of the proviral sequence in each infected cell, a total of 18 non-overlapping custom-designed primer pairs were synthesized that spanned strategically important and phylogenetically conserved regions in the HIV-1 proviral genome and allowed for simultaneous amplifications of small HIV-1 DNA fragments approximately 200–300 base pairs (bp) in length from single virally infected cells (Fig. 1a,b, Extended Data Fig. 1b and Supplementary Table 1); specifically, we included primers from the intact proviral DNA assay (IPDA)22 that permit identifying proviruses with high probability to be genome-intact. For enhanced analytic depth, we also included primers spanning the virus–host chromosomal integration site junctions of large, sequence-identical proviral clones previously characterized in our study participants; amplification of such previously defined chromosomal regions allows one to unequivocally identify clonal HIV-1-infected cells with known proviral sequences and to relate phenotypic information of infected cells to proviral chromosomal locations.

Fig. 1: Combined assessment of cellular phenotype and proviral sequence in HIV-1-infected cells (PheP-seq) from PB.
figure 1

a, The genomic location of 18 different HIV-1 amplification products incorporated in the PheP-seq assay; these 18 amplicons cover approximately 4,000 bp of the HIV-1 genome; gaps between amplification products are indicated. Note that additional amplicons corresponding to predefined virus–host junctions of large proviral clones were also included. Amplicons 2 and 16 were previously described for the IPDA assay. LTR, long terminal repeat. Ψ, retroviral psi packaging element. b, A virogram summarizing individual HIV-1 DNA amplification products in single HIV-1-infected cells from study participant 2 (P2). Each row represents data from one infected cell; the numbers of sequences meeting criteria for categories 1, 2, 3 and 4 are shown. For longer amplification products (amplicons 5, 10 and 12), centrally located nucleotides could not be efficiently amplified using the Illumina NextSeq 2 × 150-bp sequencing read length used in this study. c, Venn diagrams summarizing the numbers of category 1–4 HIV-1-infected cells, shown separately for each study participant (P1–P5) and for all study participants combined. The numbers of sequences meeting criteria for different proviral categories are shown individually in each category. d, Circular maximum-likelihood phylogenetic trees of selected proviral sequences from each study participant, determined by PheP-seq. Clonal sequences are indicated; chromosomal integration sites are listed when available. Graphics in a were created using

Global analysis of HIV-1 reservoir cells

Using this assay, we analysed 530,143 individual mCD4+ T cells in PB from five study participants (four who had remained on suppressive ART for approximately 10 years, and one maintaining undetectable levels of HIV-1 plasma viraemia in the absence of ART (Extended Data Fig. 2a,b). To analyse the proviral landscapes in infected cells, we introduced the following classification system (Fig. 1b,c, Extended Data Figs. 1b and 2c and Supplementary Table 2). Category 1 included cells (n = 2,859) considered HIV-1-infected and harbouring any type of HIV-1 proviruses; for added rigour, we required these cells to have a total of at least 20 proviral reads from at least two HIV-1 amplicons. Category 2 contained cells (n = 193) harbouring proviruses enriched for genome-intact HIV-1 sequences23,24 and were defined by having a total of at least 20 sequencing reads from the two IPDA amplicons. In case an IPDA amplicon was missing (owing to, for example, suboptimal PCR performance), a total of at least 15 proviral amplicons had to be present to qualify for inclusion in category 2. Sequences showing statistically significant signs of hypermutation mediated by APOBEC3G or APOBEC3F were excluded from category 2. Category 3 included cells (n = 125) belonging to large clones of virally infected cells that were defined by amplicons spanning proviral integration sites (Extended Data Fig. 1c) or, in a limited number (n = 32) of cells, by completely identical proviral sequences from all available amplicons (Fig. 1d). This category was further subdivided into cells containing genome-intact clonal proviruses (category 3.1, n = 56 cells) and defective clonal proviruses (category 3.2, n = 69 cells), and exhibited expected clonal phylogenetic clusters (Fig. 1d and Extended Data Fig. 2c). Category 4 (total of n = 398) cells harboured proviral genomes with statistically significant sequence hypermutations. For comparative purposes, cells without detectable proviral sequencing reads (category 0) were considered as HIV-uninfected cells.

For an initial global analysis of the phenotype of HIV-1-infected cells from PB, we visualized in silico-gated CD3+CD4+ cells (after exclusion of contaminating CD45RA+CCR7+ naive T cells) from the different categories on uniform manifold approximation and projection (UMAP)25 plots, classifying the mCD4+ T cell pool in five distinct, computationally defined phenotypic clusters (Fig. 2a,b). This analysis demonstrated a diverse distribution of category 1 HIV-1-infected cells across the entire spectrum of mCD4+ T cells, with only minor under- or over-representation in specific cell clusters relative to HIV-uninfected cells; specifically, there was no evidence of enhanced clustering of virally infected category 1 cells in activated, HLA-DR+ or CD38+mCD4+ T cells (Extended Data Fig. 3a–c). A similar observation was made for category 4 cells (harbouring hypermutated proviruses), which closely imitated the phenotypic distribution pattern of category 0 and 1 cells (Fig. 2a,b). The phenotypic profiles of category 2 and 3 cells were markedly biased in comparison to those of category 0, 1 and 4 cells; cells from both categories 2 and 3 were disproportionally enriched within clusters exhibiting features of a more mature, effector memory phenotype, characterized by high levels of expression of CD45RO and low levels of expression of CCR7 and CD62L (Fig. 2a,b and Extended Data Fig. 3a–c). Together, these results indicate that cells encoding intact proviruses and/or being part of large proviral clones exhibit distinct phenotypic properties; by contrast, virally infected cells in categories 1 and 4, mostly consisting of lymphocytes harbouring defective proviruses, remained phenotypically largely indistinguishable from HIV-uninfected cells in this global phenotypic analysis.

Fig. 2: Phenotypic profile of patient-derived HIV-1-infected cells circulating in PB.
figure 2

a, Global visualization of phenotypic properties of HIV-1-infected CD4+ T cells from five study participants using two-dimensional UMAP plots; five computationally defined spherical clusters reflecting phenotypically distinct mCD4+ T cell subsets are shown. One plot is shown for each cell category. b, A heatmap summarizing the normalized phenotypic profile of cells in each spherical cluster, based on 53 surface markers included in this study. c, Volcano plots showing enrichment ratios of marker-positive cells and corresponding FDR-adjusted (adj.) P values for all 53 surface markers included in this study; selected markers are labelled individually. Marker sensitivities, calculated as the proportions of marker-positive cells in the indicated categories of cells, are indicated by dot sizes. Comparisons between indicated categories of cells are depicted; bootstrapped data from all five participants are shown. Significance was tested using a two-sided chi-squared test; FDR-adjusted P values are shown.

For a formal statistical evaluation of the phenotypic profile of HIV-1-infected cells from PB, we first established an average read count for unspecific isotype control antibody binding, normalized to the total read count in each cell; this background was then used as a cutoff to calculate the proportion of cells expressing a given surface marker in each category of virally infected cells, relative to the HIV-uninfected cells (Supplementary Table 3). To account for differences in the total number of virally infected cells available from each participant, which could disproportionately bias results by participants with higher numbers of HIV-1-positive cells, we used a bootstrapping analysis approach so that data from each participant had an equal impact on final statistical outcomes. Several surface markers with significant enrichment in category 1 HIV-1-infected cells were noted, relative to HIV-uninfected cells (Fig. 2c). Although differences for many of these markers reached very definitive levels of statistical significance owing to relatively high numbers of cells in category 1, their biological significance was questionable, given that only very small proportions of category 1 cells were positive for such markers, and/or there were rather limited fold enrichments relative to uninfected cells. This was true for expression of, for example, CD276, GITR, CD57, interleukin-10 receptor (IL-10R), HLA-DR, LIGHT and OX40. Overall, inter-individually consistent surface marker expression differences between category 1 cells and HIV-uninfected cells were modest, suggesting that category 1 cells frequently show only slight phenotypic variations relative to their uninfected counterparts (Figs. 2c and 3a and Extended Data Fig. 4). Notably, weaker phenotypic differences were also observed for category 4 cells relative to HIV-uninfected cells (Fig. 3a and Extended Data Figs. 4 and 5a).

Fig. 3: Differentially expressed surface markers on patient-derived HIV-1-infected CD4+ T cells circulating in PB.
figure 3

a, Density plots indicating the surface expression of selected phenotypic markers on indicated categories of cells in a cross-sectional analysis from all five participants after approximately 10 years of ART. b, Density plots reflecting surface expression of selected phenotypic markers in longitudinal samples collected at approximately year 1 (‘early’) and year 10 (‘late’) of ART in study participants P1 and P2. c, ‘Simplified Presentation of Incredibly Complex Evaluations’ (SPICE) diagrams reflecting the proportions of HIV-1-infected cells expressing ensemble phenotypic markers in a cross-sectional analysis of samples collected from five study participants after 10 years of ART. The pie charts indicate the relative proportions of cells expressing 0, 1, 2, 3 or 4 of the indicated markers; individual markers are shown as overlaying arches. One separate diagram is shown for each category of cells; the data in the left panel reflect phenotypic markers associated with resistance to immune-mediated killing, and the data in the right panel show immune checkpoint markers. Significance was tested using an FDR-adjusted permutation test.

Reservoir cells harbouring intact HIV-1

For PB cells harbouring proviruses with a high probability to be genome-intact (category 2 cells), we noted more pronounced statistical enrichments for a list of markers that are associated with functional inhibition of T cells (Figs. 2c and 3a and Extended Data Fig. 4). Such markers included the immune checkpoint receptors PD1, TIGIT, BTLA, 2B4 and KLRG1; only minor, nonsignificant changes were observed for CTLA4, TIM3 and LAG3. By increasing the threshold for cellular activation, these markers may limit proviral gene expression and reduce subsequent visibility and vulnerability of reservoir cells to host immune recognition mechanisms; pharmacological blockade of one checkpoint marker (PD1) was indeed shown to increase proviral gene expression in recent animal26 and human clinical studies27,28. Notably, category 2 cells also showed elevated expression of several markers that act as ligands for inhibitory receptors expressed on immune effector cells and increase resistance of target cells to CD8+ T or natural killer (NK) cell-mediated immune activity by conferring negative immunoregulatory impulses (Figs. 2c and 3a); overexpression of such markers might defend HIV-1-infected cells against cellular immune responses and may translate into a longitudinal selection advantage. In particular, we observed upregulation of the herpes virus entry mediator (HVEM), which negatively regulates T and NK cells through binding to BLAT29 and CD160 (refs. 30,31) and has a known role for protecting virally infected cells against host immune effects, supported by the discovery of the HVEM paralogue UL144 encoded by human CMV32 (Fig. 3a). In addition, we noted upregulation of the poliovirus receptor (PVR), which can inhibit cytotoxic effects of T and NK cells by interactions with its high-affinity ligands TIGIT33 and KIR2DL5 (ref. 34), respectively. Enrichment within category 2 cells was also observed for PDL1 (the physiological ligand for PD1) and for HLA-E, a non-classical MHC class I molecule that acts as the high-affinity ligand for the inhibitory receptor NKG2A expressed on NK and T cells35; however, these markers were expressed on more limited numbers of cells and statistical scores were weaker (false discovery rate (FDR)-adjusted P = 0.02 for HLA-E and P = 0.04 for PDL1). Category 2 cells harbouring intact HIV-1 DNA were further enriched for a higher level of surface expression of CD49d, CD45RO and CD95, all of which can be associated with T cell activation and differentiation towards an effector memory profile. Moreover, they exhibited elevated expression of the receptor for transforming growth factor β (TGFβ), which can promote proviral latency36, of the receptor for IL-21, and of CD127, the receptor for the homeostatic cytokine IL-7. In a subsequent analysis, we noted little phenotypic variation between cells harbouring clonal genome-intact proviruses and intact proviruses detected only once, with the notable exception of KLRG1, which was more strongly expressed on the former (Extended Data Fig. 4a). Moreover, we compared the phenotypic profiles of category 2 cells with those of cells encoding for defective proviruses (category 1 cells after excluding category 2 cells; Figs. 2c and 3a); these analyses involved lower numbers of cells and reached lower levels of statistical significance, but also identified elevated surface expression of markers associated with resistance to immune-mediated killing (PVR, HLA-E and HVEM) as key distinguishing features for cells enriched for harbouring genome-intact HIV-1 DNA. Markers associated with functional inhibition of T cells, in particular PD1, were also upregulated in category 2 cells relative to cells harbouring defective proviruses (Figs. 2c and 3a).

Phenotype of clonal reservoir cells

In a dedicated analysis of cells harbouring clonally expanded proviruses (category 3), we noted strong enrichment for expression of KLRG1, which, in addition to its role as an inhibitory checkpoint marker binding to E-cadherin37, has been associated with terminal differentiation and cellular senescence38; enrichment of KLRG1 in category 3 cells probably reflects a history of strong clonal proliferative turnover in this specific cell population (Figs. 2c and 3). PDL1 emerged as a negative immunoregulatory marker with a more notable enrichment on category 3 cells, suggesting a role of this marker for protecting clonal HIV-1-infected target cells against T cell immune attacks. Among markers associated with functional inhibition of T cells, TIGIT exhibited the most notable upregulation in category 3 cells; expression of PD1 and BTLA was also increased on category 3 cells. For a more detailed analysis of individual clones of HIV-1 reservoir cells, we investigated the phenotype of clonal HIV-1-infected cells sharing a common chromosomal integration site. In a global analysis, cells belonging to the same clone tended to cluster near one another on a UMAP plot, suggesting a rather homogenous phenotypic behaviour of HIV-1-infected cells derived from the same clone (Fig. 4a); however, some clones showed a higher level of phenotypic variability. Moreover, we noted that the upregulation of specific immunoregulatory markers on category 3 cells, including PDL1, HVEM and PVR, was not selectively driven by specific cell clones but occurred relatively consistently across most analysed clonal reservoir cell populations (Fig. 4b); nevertheless, some inter-clonal phenotypic diversity was noted and deserves additional investigation in future studies. Together, these findings suggest that clonal HIV-1-infected cells, which may be under more profound immune selection pressure due to a higher propensity for viral reactivation during cellular proliferation, may exhibit a specific surface phenotype that is likely to increase a cell’s ability to resist host immune selection forces.

Fig. 4: Phenotypic profile of individual HIV-1-infected CD4+ T cell clones circulating in PB.
figure 4

a, Two-dimensional UMAP diagrams indicating the global phenotypic profile of selected HIV-1-infected CD4+ T cell clones relative to HIV-uninfected mCD4 T cells. b, Density plots showing the surface expression of selected phenotypic markers on individual HIV-1-infected CD4+ T cell clones, identified by proviral chromosomal integration sites. The data were collected in a cross-sectional analysis from study participants after approximately 10 years of ART. Results from all clonal HIV-1-infected cells (category 3) and HIV-uninfected cells (category 0) are shown as references.

Combinatorial and longitudinal analysis

We subsequently considered a combinatorial analysis of surface markers that may be jointly upregulated on HIV-1-infected cells from the PB, applying an algorithm previously developed for analysing multivariate flow cytometry datasets39. These additional studies demonstrated that after prolonged periods of viral suppression, proportions of cells expressing none of the markers associated with protection from killing by T or NK cells (HVEM, PVR, HLA-E and PDL1) were significantly higher in category 0 and category 1 cells, whereas the fractions of cells simultaneously expressing two or more of these markers were significantly increased among category 2 and 3 cells (Fig. 3c). A similar observation was made for immune checkpoint molecules (PD1, KLRG1, BTLA and TIGIT); by contrast, a combinatorial analysis failed to demonstrate a pronounced enrichment of category 2 or 3 cells with combinations of markers associated with immune activation (CD38, CD49d, CD95, CD69 and HLA-DR; Extended Data Fig. 5b). These findings were corroborated by experiments evaluating the frequencies of intact proviruses in sorted mCD4+ T cells expressing the described marker combinations, although some variations in phenotypic patterns were noted among different study participants (Extended Data Figs. 6 and 7). Collectively, our results indicate that category 2 and 3 reservoir cells are characterized by ensemble phenotypic signatures of immune checkpoint molecules that probably reduce a cell’s propensity to reactivate proviral gene transcription, and of marker combinations that can protect infected target cells against killing by T and NK cells. Such a phenotypic profile seems evolutionarily advantageous by minimizing exposure to and killing by host immune effector cells and is consistent with functional studies suggesting increased resistance of HIV-1 reservoir cells to T cell-mediated killing40,41. Functional experiments conducted with cells from our study participants also suggested enhanced resistance of reservoir cells encoding for intact HIV-1 DNA to HIV-1-specific CD8+ T cell-mediated killing (Extended Data Fig. 8).

To longitudinally evaluate the stability of the reported phenotypic profile of HIV-1 reservoir cells, we carried out PheP-seq assays on PB samples collected early (1–2 years) after ART initiation in study participants 1 and 2, from whom such samples were available for analysis (Extended Data Fig. 2). In total, n = 257,574 individual mCD4+ T cells were analysed from these early time points, of which n = 4,156 and n = 44 were classified as category 1 and 2 cells, respectively. Notably, category 2 reservoir cells already exhibited an increased level of expression of immune checkpoint markers and ligands for inhibitory receptors of T or NK cells at early time points after ART initiation, although such changes tended to be less pronounced relative to later time points of ART for some markers, specifically for PVR and HVEM (Fig. 3b and Extended Data Fig. 5c). Moreover, there was a trend for elevated expression for the receptors of IL-21 and TGFβ after more extended durations of ART. Together, these observations suggest that selection of reservoir cells starts early after ART initiation, possibly implying that only a small subset of category 2 cells with specific phenotypic properties are predestined to enter the long-lasting reservoir cell pool. As after longer durations of ART, category 1 reservoir cells from the early time point exhibited only limited phenotypic variations relative to uninfected cells from the same time point. Few cells analysed at early stages of ART meet our criteria for category 3 reservoir cells, precluding a specific comparison of such cells between early and late ART.

HIV-1 reservoir cells in lymph nodes

As most HIV-1 reservoir cells are located in lymphoid tissues42, we subsequently conducted a phenotypic analysis of virally infected cells from inguinal lymph nodes (LNs) collected by surgical excision from three people living with HIV-1 who remained on continuous suppressive ART for approximately 10–15 years (Extended Data Fig. 9a,b). In total, we analysed the phenotype of n = 396,628 single mCD4+ T cells from these samples, of which n = 3,888, n = 111 and n = 39 met criteria for category 1, 2 and 4 cells, respectively; the small number of category 4 cells may be related to primer mismatches to hypermutated proviral sequences. Category 3 cells were not individually analysed in LN samples owing to the limited number of cells meeting our category 3 criteria (Extended Data Fig. 9c). A global analysis of the phenotypic landscape of HIV-1 reservoir cells from LNs (Fig. 5a,b) indicated that viral reservoir cells exhibited focal enrichments in two different cell clusters, one characterized by high-level surface expression of CXCR5, indicative of T follicular helper (TFH) cell polarization43, and a second one characterized by elevated expression of CD127 and CD69, suggestive of a CD4tissue resident memory T (TRM) cell profile in LNs44,45. Whereas TFH cells accounted for most cells in category 1, consistent with prior results46, category 2 cells were more equally balanced between the TFH cells and TRM cells. Subsequently, we conducted a detailed evaluation of differentially expressed surface markers between the different categories of HIV-1-infected cells, again using a bootstrapping analysis approach ensuring equal contribution of each study participant to statistical outcomes (Fig. 5c, Extended Data Fig. 10a,b and Supplementary Table 3). These evaluations demonstrated upregulation of some immune checkpoint markers on category 2 cells in LNs; however, statistical scores and enrichment ratios were weaker relative to category 2 reservoir cells from PB. Similarly, fewer markers conferring resistance to killing by T or NK cells were upregulated on category 2 cells in LNs relative to category 2 cells from PB, although both PVR and HLA-E met statistical criteria for upregulation on category 2 cells in LNs. These differential phenotypic profiles between category 2 cells from LNs versus blood may possibly be related to the predominant presence of non-cytolytic CD8+ T cells in LNs47,48, probably resulting in a more limited selection advantage for viral reservoir cells with phenotypic signs of increased resistance to immune-mediated killing in this tissue compartment. Markers that strongly distinguished category 2 reservoir cells from uninfected cells in LNs included CD44, the IL-21 receptor (CD360), the IL-7 receptor (CD127) and, to a lesser extent, CD28 (Fig. 5c,d); these markers were also upregulated on category 2 reservoir cells in LNs relative to LN cells harbouring defective proviruses (Fig. 5c) and to category 1 and 2 reservoir cells from PB (Extended Data Fig. 10B). All of these markers are functionally involved in mediating cell survival signals; in particular, CD44 can increase survival and apoptosis resistance of CD4+ T cells through engagement of the phosphoinositide 3-kinase–AKT signalling pathway49. Moreover, T cell survival signals related to activation of the AKT signalling pathway also occur downstream of the IL-21 receptor50,51. Likewise, CD28 has a prominent role for supporting CD4+ T cell survival; this is probably related to a CD28-mediated upregulation of cell-intrinsic anti-apoptosis markers from the BCL-2 family52. Phenotypic signatures of increased cell survival were also detected on category 2 cells in LNs when a combinatorial analysis was applied (Fig. 5e). Notably, we also observed a trend for elevated expression of several members of the TNF receptor superfamily on category 2 cells in LNs, specifically of CD30, GITR and OX40, which have also been implicated in mediating cell survival signals53 (Extended Data Fig. 10a). Phenotypic differences between category 2 and category 1 or 4 reservoir cells were not as definitive in LNs compared to PB cells, a finding that requires more attention in future studies. Together, our data demonstrate that category 2 reservoir cells from LNs predominantly show phenotypic features associated with apoptosis resistance and survival; upregulation of these markers on category 2 cells in LNs may promote their long-term persistence and can be viewed as a sign of immune adaptation of reservoir cells to the LN immune microenvironment.

Fig. 5: Phenotypic analysis of HIV-1 reservoir cells from inguinal LNs.
figure 5

a, Global phenotypic analysis of HIV-1 reservoir cells from LNs by two-dimensional UMAP plots. Six computationally defined spherical clusters are indicated; one plot is shown each for category 1, category 2 and category 4 cells. b, A heatmap summarizing the phenotypic profile of cells in each spherical cluster, based on 53 surface markers included in this study. c, Volcano plots showing enrichment ratios of marker-positive cells (upper panel) or fold changes in marker expression intensity (lower panel) and corresponding FDR-adjusted P values for all 53 surface markers included in this study; selected markers are labelled individually. Marker sensitivities, calculated as the proportions of marker-positive cells in the indicated categories of cells, are indicated by dot sizes. Comparisons between indicated categories of cells are depicted; bootstrapped data from all three donors of LN biopsies are shown. Significance was tested using a chi-squared test or two-sided t-test, FDR-adjusted P values are shown. d, Density plots indicating the expression of selected phenotypic markers on indicated categories of cells from all three LN donors. e, SPICE diagrams reflecting the proportions of HIV-1-infected LN cells expressing the ensemble phenotypic cell survival markers CD127, CD28 and IL-21R. The pie charts indicate the relative proportions of cells expressing none, one, two or three of the indicated markers; individual markers are shown as overlaying arches. One separate diagram is shown for each indicated population of cells. Significance was tested using an FDR-adjusted permutation test.


Our experiments, involving single-cell proteogenomic profiling data from a total of 1,184,345 individual patient-derived mCD4+ T cells, indicate that following years of continuous ART, HIV-1-infected cells show several distinct phenotypic features, specifically when harbouring intact proviruses; however, they do not support the presence of a single unifying phenotypic marker that can effectively distinguish virally infected cells from uninfected cells. Given that a higher expression level of certain markers was more definitively observed on category 2 (in LNs and PB) and on category 3 cells (in PB), we propose that these phenotypic changes are not due to an intrinsic association with HIV-1 infection, or a higher susceptibility of cells expressing these markers to HIV-1 infection. Instead, they probably represent a consequence of immune selection mechanisms that promote preferential persistence of reservoir cells with a phenotypic profile offering optimal adaptation to their specific immunological microenvironment. We emphasize that the enrichment for the described phenotypic markers, specifically when analysed as isolated parameters, must be considered as moderate, although there were stronger enrichments for specific marker combinations, suggesting that phenotypic marker ensembles allow for a more effective and, possibly, a more customized adaptation to host immune activity. Together, our findings strongly support the hypothesis that viral reservoir cells are subject to active host immune selection pressure, and that footprints of such immune selection are particularly visible in cells harbouring intact HIV-1 proviruses. Given their likely dependence on host immune selection activity, we suggest that phenotypic alterations of viral reservoir cells described here are unlikely to act as universal biomarkers of viral reservoir cells in all ART-treated people living with HIV-1; instead, the presence or absence of such phenotypic selection footprints may vary among different people and might be critically influenced by the intensity of host immune activities; evaluations of the longitudinal coevolution of host immune responses and phenotypic profiles of viral reservoir cells may represent an attractive future research perspective for understanding viral reservoir dynamics. Although we do not claim that the phenotypic markers identified here may represent direct targets for immunotherapeutic interventions in HIV-1 cure research, we suggest that clinical strategies designed to intensify and accelerate immune selection of HIV-1 reservoir cells may be of benefit for reducing HIV-1 long-term persistence and inducing a drug-free remission of HIV-1 infection.


Study participants

Individuals infected with HIV-1 were recruited as study participants at the Massachusetts General Hospital in Boston, MA, and at the NIH Clinical Center in Bethesda, MD. PB mononuclear cell (PBMC) samples were obtained according to protocols approved by the respective Institutional Review Boards. All study participants gave written informed consent for blood collection or for LN biopsies. Clinical characteristics of study participants are summarized in Extended Data Figs. 2 and 9. Study participants were preselected on the basis of high frequencies of HIV-1-infected CD4+ T cells analysed in previous studies.

LN biopsies

Inguinal LNs were excised surgically with informed consent of study participants, according to protocols approved by the Massachusetts General Hospital Institutional Review Board. LN tissue was dissected and mechanically disaggregated through a 70-µm nylon cell strainer in RPMI medium supplemented with 10% fetal bovine serum.

Isolation of memory CD4+ T cells

PBMCs and LN mononuclear cells (LNMCs) were isolated using Ficoll–Paque density centrifugation. PBMCs and LNMCs were viably frozen in 90–95% fetal bovine serum and 5–10% dimethylsulfoxide. For analysis, cells were thawed and subjected to negative immunomagnetic isolation of memory CD4+ T cells, using a commercial product (Stemcell EasySep Human Memory CD4+ T cell Enrichment Kit, no. 18000) per the manufacturer’s protocol.

Surface labelling with monoclonal antibodies

Monoclonal antibodies tagged with distinct oligonucleotides were custom manufactured and supplied as lyophilized single-reaction vials by a commercial vendor (BioLegend). Antibodies to the following surface markers were used: PDL1 (clone 29E.2A3), CD276 (clone DCN.70), HVEM (clone 122), CD155 (clone SKII.4), CD154 (also known as CD40L) (clone 24-31), CCR4 (clone L291H4), PD1 (A17188B), TIGIT (clone A15153G), CD44 (clone BJ18), CXCR3 (clone G025H7), CCR5 (clone HEK/1/85a), CCR6 (clone G034E3), CXCR5 (clone J252D4), CCR7 (clone G043H7), KLRB1 (also known as CD161) (clone HP-3G10), CTLA4 (clone BNI3), LAG3 (clone 11C3C65), KLRG1 (clone 14C2A07), CD95 (clone, DX2), OX40 (also known as CD134) (clone Ber-ACT35), CD57 (clone HNK-1), TIM3 (clone F38-2E2), BTLA (also known as CD272); (clone MIH26), CD244 (also known as 2B4) (clone 2-69), IL-2R (clone TU27), CD137 (also known as 4-1BB) (clone 4B4-1), GITR (also known as CD357) (clone 108-17), CD28 (clone CD28.2), CD127 (clone A019D5), IL-6R (clone UV4), HLA-E (clone 3D12), MICA or MICB (clone 6D4), IL-15R (also known as CD215) (clone JM7A4), IL-21R (clone 2G1-K12), TNFR2 (clone 3G7A02), CD160 (clone BY55), LIGHT (also known as CD258) (clone T5-39), IL-10R (also known as CD210) (clone 3F9), TGFβR (clone W17055E), IL-12R (clone S16020B), CD6 (clone BL-CD6), CD49d (clone 9F10), CD25 (clone BC96), CD30 (clone BY88), CD69 (clone FN50), CD45RA (clone HI100), CD38 (clone HIT2), HLA-DR (clone L243), CD4 (clone RPA-T4), CD2 (clone TS1/8), CD3 (clone UCHT1), CD62L (clone DREG-56), CD45RO (clone UCHL1). Two mouse IgG control antibodies (BioLegend, no. 400299, no. 400383), conjugated with distinct TotalSeq-D oligonucleotides were included as the isotype controls. The lyophilized antibody cocktails containing the above-mentioned antibodies were reconstituted with 60-μl of cell staining buffer (BioLegend, no. 420201). Cells were blocked with Human TruStain FcX (BioLegend, no. 422301) for 15 min on ice and then incubated with 50 μl of reconstituted antibody cocktail for 30 min on ice. After three washes with pre-chilled cell staining buffer, cells were filtered with a 40-μm cell Flowmi strainer (Fisher Scientific, no. 14-100-150), counted with an automated cell counter, and then loaded into a microfluidic cartridge for single-cell multiplex PCR assays.

Single-cell multiplex PCR

Single-cell amplification of defined genomic DNA segments was carried out using the Tapestri platform (MissionBio) according to the manufacturer’s protocol18. Viable single cells were encapsulated into droplets with a lysis buffer (containing protease and a mild detergent) and incubated for 1 h at 50 °C, followed by 10 min at 80 °C for heat inactivation of enzymes. Droplets containing single-cell barcoding beads (tagged with oligonucleotides carrying the cellular barcodes and custom-designed primers) were fused with encapsulated cell lysates. A panel of primers designed to amplify n = 18 different genomic regions in HIV-1 and n = 27 specific HIV–host DNA junctions of previously defined large clonal HIV-1-infected T cell populations in our study participants were used (Fig. 1b and Supplementary Table 1) in addition to two primer sets amplifying control genomic regions in the RPP30 gene on chromosome 10. The droplets were placed under ultraviolet light to cleave PCR primers containing unique cell barcodes from beads. To amplify the selected genomic DNA segments and the antibody oligonucleotide tags, droplets were subjected to PCR for 24 cycles with temperature gradients recommended by the manufacturer.

Sequencing library construction

Amplification products were pooled, mixed with AMPure XP beads (Beckman Coulter, no. A63882) at a ratio of 0.7 and placed in a magnetic field for separating the DNA and the protein tag libraries. The DNA library bound to AMPure beads was washed with 80% ethanol, and the supernatant containing the protein tag library was aspirated and incubated with a biotinylated oligonucleotide complementary to the 5′ end of the antibody tags, followed by magnetic isolation using streptavidin beads. For library amplification, PCRs were carried out with Illumina index primers P5 and P7 on purified DNA and protein libraries, respectively, according to the manufacturer’s protocol; 12 cycles were carried out for the DNA library, and 18 cycles were run for the protein tag library.

Next-generation sequencing

The DNA and protein tag libraries were run on a High Sensitivity D1000 ScreenTape instrument (Agilent Technologies, 5067-5584) with the Agilent 4200 TapeStation System to evaluate DNA quality. Libraries were quantified by a fluorometer (Qubit 4.0, Invitrogen) and sequenced on Illumina next-generation sequencing platforms with a 20% spike-in of PhiX control DNA (Illumina, no. FC-110-3002). DNA and protein tag libraries were sequenced separately on a NextSeq 500 instrument (Illumina), using the NextSeq 500/550 High Output flow cell v2.5 (Illumina, no. 20022408) and the NextSeq 500/550 High Output Kit v2.5 (300 cycles; Illumina, no. 20024908) in 2 × 150-bp paired-end runs.

Bioinformatic analysis

The Tapestri pipeline (MissionBio, v2.0.1) with minor modifications was used to process the sequencing data. Briefly, for DNA library data, cutadapt (v2.5)54 was used to trim 5′ and 3′ adaptor sequences, and extract 18-bp cell barcode sequences from read 1. Cell barcodes that aligned to a unique barcode on a whitelist within a Hamming distance of 2 were used for downstream analysis. Using bwa (v0.7.12)55, sequences were aligned to custom reference genomes built from the human genome (GRCh38) and patient-specific autologous HIV-1 sequences identified in prior studies. Single-cell alignments were filtered according to criteria implemented in the Tapestri pipeline, and indexed using samtools (v1.9)56. Candidate HIV-1-infected cells were determined by the CellFinder algorithm built in the Tapestri pipeline. Bcftools (v1.9)57 was used to call variants and generate consensus sequences. To reduce spurious alignments, viral sequencing reads were considered valid only if they covered at least 80% of the length of the reference sequence for each given amplicon; host sequencing reads had to cover at least 50% of the length of respective amplicon. For antibody library data, cell barcodes were similarly extracted using cutadapt. For reads with valid cell barcodes, 15-bp antibody barcodes were extracted from read 2. Antibody barcode sequences within a Hamming distance of 1 from known antibody barcodes were accepted. Candidate cells appearing in both libraries were processed for downstream analysis. Read counts for each antibody were normalized to the total read count in each cell using centred log-ratio transformation58. Cutoffs for a given phenotypic marker to be considered positive were defined by marker-specific read counts higher than 1 mean absolute deviation of the normalized median read count corresponding to unspecific IgG control antibodies. To generate a more homogenous cell population for analysis, centred log-ratio values of all candidate cells that were CD3CD4 (non-CD4+ T cells) and CCR7+CD45RA+ (contaminating naive CD4+ T cells) were excluded.

Dimension reduction and clustering

UMAP embeddings in two dimensions of the centred log-ratio values was carried out through the Monocle 3 (ref. 59) and uwot60 packages with the number of principal components set to 15 for all cells, and to 10 for HIV-1-infected cells alone; numbers of neighbours to use during k-nearest neighbours graph construction were set to 9 and 5, respectively. All other settings were kept to the default values. The cells were clustered using the Leiden community detection algorithm through Monocle 3, with the k-near neighbours set to 500.

Phylogenetic analysis of HIV-1

HIV-1 sequencing reads corresponding to each of the 18 HIV-1 amplification products and of additional amplification products corresponding to predefined virus–host junctions were aligned to the reference HIV-1 genome HXB2, to autologous intact HIV-1 sequences and to the human reference genome GRCh38. The presence or absence of hypermutations associated with APOBEC3G or APOBEC3F was determined using the Los Alamos National Laboratory HIV Sequence Database Hypermut 2.0 program61. Sequence alignments were carried out using MUSCLE62. Phylogenetic distances between sequences were examined using maximum-likelihood trees in MEGA ( and MAFFT (, and visualized using highlighter plots ( Proviruses were classified in 4 different categories according to the following criteria—category 1 (any provirus): at least 20 total valid viral sequencing reads with at least 4 reads in at least 2 different HIV-1 amplicons each; category 2 (enriched for intact proviruses): a total of at least 20 viral sequencing reads with at least 4 sequencing reads from amplicons 2 and 16 (corresponding to IPDA amplicons) each or at least 4 reads from at least 15 amplification products each or at least 4 reads from amplification products spanning known virus–host junctions of intact proviruses (all viral sequencing reads in category 2 had to be without evidence of statistically significant hypermutation and could not correspond to virus–host junctions from known defective proviruses); category 3 (clonal proviruses): proviruses with identical proviral integration sites (based on at least 4 viral sequencing reads from amplification products of known virus–host junctions) or completely identical proviral sequences; category 4 (hypermutated proviruses): proviral sequences from category 1 that exhibited statistically significant sequence hypermutations (FDR-adjusted P < 0.05). Cells with HIV-1 sequencing reads that did not meet any of the above-mentioned criteria were excluded from the analysis. Category 0 cells were defined by complete absence of sequencing reads corresponding to HIV-1.

Cell sorting and flow cytometry

Memory CD4+ T cells isolated from PBMCs by negative immunomagnetic enrichment were incubated with defined fluorophore-labelled surface antibodies: CD3–PerCP–Cy5.5 (BioLegend, clone UCHT1, catalogue no. 300430), CD4–BUV395 (BD, clone RPA-T4, catalogue no. 564724), PD1–FITC (BioLegend, clone A17188B, catalogue no. 621612), TIGIT–BV421 (BioLegend, clone A15153G, catalogue no. 372710), PVR–PE (eBioscience, clone 2H7CD155, catalogue no. 12-1550-41), HVEM–APC (BioLegend, clone 122, catalogue no. 318808), BTLA–FITC (BioLegend, clone MIH26, catalogue no. 344524), KLRG1–APC (BioLegend, clone 14C2A07, catalogue no. 368606), HLA-E–BV421 (BioLegend, clone 3D12, catalogue no. 342612), PDL1–PE (BioLegend, clone MIH2, catalogue no. 393608). After 25 min of incubation, cells were washed, and indicated cell populations were sorted in a specifically designated biosafety cabinet (Baker Hood), using a FACSAria cell sorter (BD Biosciences) at 70 pounds per square inch. Cell sorting was carried out by the Ragon Institute Imaging Core Facility at Massachusetts General Hospital. Data were analysed using FlowJo software (Tree Star). Sorted cells were subjected to proviral sequencing analysis using the full-length individual proviral sequencing assay (FLIP-seq).

Full-length HIV-1 sequencing assay

A previously described protocol was used11,23. In brief, genomic DNA was extracted from sorted cell populations using a QIAGEN DNeasy Blood & Tissue kit. DNA diluted to single-genome levels based on Poisson distribution statistics was subjected to single-genome amplification using Invitrogen Platinum Taq and nested primers spanning near-full-length HIV-1. PCR products were visualized by agarose gel electrophoresis; amplification products were subjected to single-genome sequencing on the Illumina platform. Resulting short reads were de novo assembled and aligned to HXB2 to identify large deleterious deletions, out-of-frame indels, premature/lethal stop codons, internal inversions or packaging signal deletions, using an automated in-house pipeline ( The presence or absence of hypermutations associated with APOBEC3G or APOBEC3F was determined using the Los Alamos HIV Sequence Database Hypermut 2.0 program. Viral sequences that lacked all lethal defects listed above were classified as genome-intact. Sequence alignments were carried out using MUSCLE. Phylogenetic distances between sequences were examined using Clustal X-generated neighbour-joining algorithms. Proviral species that were completely sequence-identical were considered as clonal.

Ex-vivo HIV-1 reservoir cell killing assay

Memory CD4+ T cells isolated from PBMCs by negative immunomagnetic enrichment were incubated in R10 medium with or without epitopic peptides for 1 h and washed thoroughly. Afterwards, cells were co-incubated with previously isolated HIV-1-specific CD8+ T cell clones at selected effector-to-target (E/T) ratios for 16 h. After washing, cells were stained with blue viability dye (Invitrogen, catalogue no. L34962), CD3–FITC (BioLegend, clone UCHT1, catalogue no. 300406) and CD4–BV711 (BioLegend, clone RPA-T4, catalogue no. 300558) antibodies, followed by sorting of viable CD3+CD4+ events. Sorted cells were subjected to assessments of intact and total HIV-1 proviruses using the IPDA assay.


The IPDA uses digital droplet PCR to quantify proviruses lacking overt fatal defects, especially large deletions and hypermutations, and was carried out as previously described22.

Statistical analysis

The data are presented as pie charts, Venn diagrams, volcano plots, UMAP plots and heatmaps. Enrichment ratios between two cell populations for each marker were calculated as the ratio of the proportion of marker-positive cells in the first population divided by the proportion of marker-positive cells in the second population. Sensitivity values were calculated by dividing the number of marker-positive cells in each category of cells by the total number of cells in the respective category. A bootstrapped dataset was constructed by resampling equal numbers of cells from each cell category from each participant, while keeping the total number of cells from each cell category equal between the bootstrapped and raw datasets. Differences were tested for statistical significance using Fisher’s exact test, chi-squared test, t-test or Mann–Whitney U-test, as appropriate. P values of <0.05 were considered significant; FDR correction was carried out using the Benjamini–Hochberg method63 or the Bonferroni method. Analyses were carried out using Prism (GraphPad Software, Inc.), SPICE39, R (R Foundation for Statistical Computing64) and Python (Python Software Foundation). Figures were generated using Adobe Illustrator.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.