Introduction

Chronic human immunodeficiency virus (HIV) infection increases the risk of atherosclerosis (AS) associated with cardiovascular disease (CVD), which is a leading cause of morbidity in persons living with HIV (PLWH)1,2,3,4. PLWH have an increased prevalence of risk factors for AS5,6,7,8,9, some of which are driven by the off-target effects of antiretroviral drugs9,10. However, even after controlling for these risk factors, the risk of CVD remains significantly higher in PLWH4,11,12, suggesting a role of long-term HIV infection.

HIV infection causes metabolic changes leading to a pro-atherogenic inflammatory environment in the vasculature13,14,15,16. HIV infection and long-term antiretroviral therapy also mediate signaling dynamics, including inflammasome activation, cell migration and apoptosis, in PBMCs and the vasculature17,18. Biomarker studies highlight these atherogenic processes, especially in the context of activated monocyte/macrophages and T cells14,15,16. CD8+ T cells contribute to atherogenesis by secretion of cytotoxic granules and the formation of the necrotic core of atherosclerotic plaques19. Monocyte/macrophages migrate into the intima and form apoptotic atherosclerotic plaques20. CD4+ T cells and B cells have also been previously implicated in CVD21,22,23,24,25,26,27,28,29,30,31. The interplay between signaling pathways, immune cell activation and inflammation in HIV infection requires further investigation. Single-cell RNA sequencing (scRNA-seq), which profiles many individual cells simultaneously, is well suited to this task. To investigate the mechanistic link between HIV infection and atherosclerosis, we sequenced PBMCs from PLWH with and without atherosclerosis, matched for the length of infection, cART, and other AS risk factors (Supplementary Note 1, Supplementary Fig. 1).

The inference of mechanisms from high-dimensional scRNA-seq data is not trivial32. Typically, scRNA-seq analysis uses clustering to define cell subpopulations, followed by differential expression (DE) and gene set overrepresentation analysis (ORA) to estimate pathway modulation. This approach discounts pathway topology and cannot connect molecular state to cellular state. ORA ignores synergistic interactions among genes by treating genes as independent and equal33. Our previously published algorithm, BONITA, overcomes these caveats by using discrete-state network modeling to perform pathway analysis of bulk transcriptomic data, and it has been rigorously tested and compared to other pathway analysis methods34. Discrete-state network modeling uses Boolean rules to explicitly define signal integration. These rules can be used to simulate dynamic trajectories and to perform in silico perturbation experiments. We have extensively used discrete-state network modeling to investigate virus infections and have experimentally validated the predictions35,36,37. Here, we substantially extend our method and present single-cell Boolean Omics Network Invariant-Time Analysis (scBONITA) to (a) infer Boolean rules describing signal integration for pathway topologies using scRNA-seq data and (b) use these rules to identify dysregulated pathways and to prioritize genes/proteins for further investigation. scBONITA returns precise modes of dysregulation, captured by node-specific scores that quantify the contribution of each node to the overall dysregulation of a pathway. scBONITA’s capability to perform in silico simulation and perturbation of molecular pathways makes it a powerful hypothesis-generating tool.

Here, we describe the immune cell subpopulations and subpopulation-specific gene expression programs in a novel scRNA-seq dataset obtained from PLWH with and without atherosclerosis. We use scBONITA to identify dysregulated pathways in individual cell subpopulations, focusing on CD8+ T cells and monocytes. scBONITA highlights dysregulated cell migration and lipid metabolism pathways in several subpopulations and influential genes, such as PI3K and PLC, which have high impacts on signal flow in these networks. Furthermore, we used a publicly available dataset of PBMCs from persons before and after HIV infection38 to show that cell migration pathways are also dysregulated in the early stages of HIV infection, suggesting modulation of these pathways both by HIV infection and in subsequent AS. We present a novel method (‘attractor analysis’) that uses the models learned by scBONITA to identify pathway-specific signaling states for CD8+ T cells and monocytes. Additional in silico experiments show that scBONITA can learn a context-specific rule set from the vast possible state space for a Boolean network and that scBONITA’s importance score provides novel information about signaling flow. This work hence provides insights into the mechanisms of HIV-associated atherosclerosis at the single-cell level by using a novel network modeling algorithm.

Results

PBMC subpopulations in AS+ and AS− PLWH

To investigate dysregulated immune signaling in People Living with HIV (PLWH), we recruited a cohort of eight PLWH, four with atherosclerosis (AS+) and four without atherosclerosis (AS−). Participants were matched for known atherosclerosis risk factors (see the “Methods” section, Supplementary Note 1, and Supplementary Fig. 1). We transcriptionally profiled ~1200 peripheral blood mononuclear cells (PBMCs) per subject. This data was processed using the Cell Ranger and Seurat pipelines39 to identify 16 subpopulations of immune cells (see the “Methods” section, Fig. 1a) annotated using CIBERSORT40 (Supplementary Fig. 2) and cell-lineage-specific markers (Supplementary Table 1). This scRNA-seq dataset is referred to in the text as the HIV/AS dataset.

Fig. 1: Characterization of PBMC subpopulations in people living with HIV (PLWH) with (AS+) or without (AS−) atherosclerosis.

a t-SNE projection of 16 transcriptionally distinct cell subpopulations, shown in distinct colors. b Subpopulation-level differences in the percentage of sequenced cells corresponding to each cell type in panel ‘a’ between AS+ and AS− PLWH are identified using a t-test. The mean of each group is represented by a red asterisk. Panels c–f show the expression of genes that are differentially expressed (DE) between cells derived from AS+ and AS− subjects. DE genes were identified using the Wilcoxon test (Bonferroni-adjusted p-value < 0.1, absolute log2 fold change > 0.3). DE genes in c CD8 T cells/NK resting cells, d monocytes, e naïve B cells referred to as “B cells naïve-2” in panels a, b, and f T cells referred to as “T cells CD8/CD4/CD4 naïve” in panels a, b.

A population of CD8 T cells/NK resting cells was less abundant in AS− PLWH and a population of CD14+ CD16+ monocytes was more abundant in AS− PLWH (t-test, p < 0.05) (Fig. 1b). Cell subpopulation markers were identified using MAST41 as described in the Methods. The CD14+ CD16+ monocytes and CD8+ T cells/NK resting cell markers were enriched for migration-related pathways (Supplementary Table 1). Indeed, these cells are known to migrate into the intima during the formation of atherosclerotic lesions in the vascular wall42,43,44,45,46,47,48.

In addition, we tested the prevalence of the cell subpopulations identified in the scRNA-seq data in an independent cohort using a bulk RNA-seq dataset49. The bulk RNA-seq dataset was deconvoluted using CIBERSORT40,50 to quantify the abundance of the cell subpopulations found in the scRNA-seq dataset (Supplementary Fig. 3). Most clusters were not significantly different between AS+ and AS− groups, as observed in the scRNA-seq data. The subpopulation ‘T cells CD8 NK cells resting’ was more abundant in AS− PLWH (t-test, p < 0.1), while there was no significant difference in the abundance of monocytes. As these differences in the abundance of cell subpopulations were not robustly recapitulated in the independent cohort, here we focus on differentially regulated molecular mechanisms between AS+ and AS− PLWH.

Atherosclerosis-associated gene expression across PBMC subpopulations

Differentially expressed (DE) genes between AS+ and AS− PLWH (Fig. 1c–f and Supplementary Table 2) include upregulation of MHC Class I and Class II genes in multiple cell subpopulations. ITGB2, which is upregulated in CD8+ T cells from AS+ PLWH, is involved in leukocyte transendothelial migration (Supplementary Table 2). ACTB (β-actin) was upregulated in naïve B cells and CD8+ T cells from AS+ PLWH (Supplementary Table 2). CXCR4 was upregulated in cells from AS+ PLWH in naïve B cells, CD8+ T cells, and a population of resting NK cells. Both CXCR4 and ACTB modulate dynamic actin cytoskeleton remodeling in transendothelial migration. Gene set ORA of DE genes revealed enrichment of cell migration and mobility functions in cells from AS+ PLWH (Supplementary Table 2).

In AS− PLWH, S100A8/S100A9 are upregulated in three cell subpopulations (Fig. 1c–f, Supplementary Table 2). S100B is upregulated in CD8+ T cell populations from AS− subjects (Fig. 1c, f, Supplementary Table 2). Several ribosomal genes were DE in 12 subpopulations; among these subpopulations, they were upregulated only in naïve B cells from AS− PLWH and naïve T cells from AS+ PLWH (Supplementary Table 2).
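
The DE thresholds stated in the Fig. 1 legend (Bonferroni-adjusted p-value < 0.1, absolute log2 fold change > 0.3) can be illustrated with a minimal Python sketch. The gene names, raw p-values and fold changes below are hypothetical placeholders, not values from this study, and the Wilcoxon test itself is assumed to have been run upstream.

```python
# Minimal sketch of the DE filter described in the Fig. 1 legend.
# All inputs are hypothetical; raw p-values are assumed to come from a
# Wilcoxon test performed elsewhere.

def bonferroni(p_value, n_tests):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

def filter_de_genes(results, p_cutoff=0.1, lfc_cutoff=0.3):
    """Return genes passing both the adjusted p-value and fold-change cutoffs.

    `results` maps gene name -> (raw p-value, log2 fold change).
    """
    n_tests = len(results)
    return sorted(
        gene
        for gene, (p_raw, log2fc) in results.items()
        if bonferroni(p_raw, n_tests) < p_cutoff and abs(log2fc) > lfc_cutoff
    )

# Hypothetical test results for three genes.
example = {
    "GENE_A": (0.01, 0.5),    # passes both cutoffs
    "GENE_B": (0.05, -0.8),   # fails after Bonferroni adjustment
    "GENE_C": (0.001, 0.2),   # fails the fold-change cutoff
}
de_genes = filter_de_genes(example)
```

Only the first hypothetical gene survives both cutoffs, which shows how a conservative multiple-testing correction can shrink the DE list that ORA then relies on.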

While DE and enrichment analysis indicated some mechanisms of HIV-associated atherosclerosis, an integrated model of how these genes jointly regulate signaling cascades did not emerge. Hence, we developed the network-based pathway analysis algorithm single-cell Boolean Omics Network Invariant-Time Analysis (scBONITA) to investigate signal integration and flow.

scBONITA learns discrete-state models of signaling pathways

scBONITA is a discrete-state modeling approach to developing executable models of immune signaling pathways34 (Fig. 2). The scBONITA algorithm requires two inputs: (a) a scRNA-seq dataset and (b) a prior knowledge network (PKN) (Fig. 2A). In these PKNs, genes and their regulatory interactions are represented by nodes and directed edges, respectively. scBONITA leverages the principle that the observed states of single cells correspond to the states of dynamic biological networks to identify regulatory rules (Fig. 2B). A genetic algorithm is used to identify a minimum-error rule set that is optimized by a node-wise local search. Because multiple rule sets can explain the training data equally well, this procedure returns a set of discrete-state models for each pathway, termed the equivalent rule set (ERS). A pathway is described as ‘optimized’ if scBONITA’s rule determination step (scBONITA-RD) successfully reduced the state space of the possible rules for at least one node in the pathway.
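
The idea of scoring candidate Boolean rules against binarized single-cell states, and of an equivalent rule set, can be illustrated with a toy Python sketch. This is not the scBONITA implementation: it assumes a single target node with two activating regulators and enumerates four candidate rules exhaustively instead of using a genetic algorithm with local search. All node names and cell states are hypothetical.

```python
# Toy illustration of rule inference for one node (not scBONITA itself).
# Hypothetical binarized expression: one dict per cell (1 = ON, 0 = OFF).
cells = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 0},
    {"A": 0, "B": 0, "C": 0},
]

# Candidate rules for target C, whose upstream regulators in the
# hypothetical PKN are A and B.
candidate_rules = {
    "A AND B": lambda a, b: a and b,
    "A OR B": lambda a, b: a or b,
    "A": lambda a, b: a,
    "B": lambda a, b: b,
}

def rule_error(rule, observed_cells):
    """Fraction of cells whose observed C state disagrees with the rule."""
    mismatches = sum(
        int(rule(cell["A"], cell["B"])) != cell["C"] for cell in observed_cells
    )
    return mismatches / len(observed_cells)

def equivalent_rule_set(observed_cells):
    """Return all candidate rules that achieve the minimum error."""
    errors = {
        name: rule_error(rule, observed_cells)
        for name, rule in candidate_rules.items()
    }
    best = min(errors.values())
    return sorted(name for name, err in errors.items() if err == best)

# With all four cell states observed, a single rule explains the data.
ers_full = equivalent_rule_set(cells)
# With only two informative states, every candidate fits equally well.
ers_sparse = equivalent_rule_set([cells[0], cells[3]])
```

In this toy case the equivalent rule set shrinks from four rules to one as more distinct cell states are observed, which mirrors the sense in which a pathway is called "optimized" when the space of possible rules is reduced.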

Fig. 2: scBONITA pipeline to infer Boolean rules and perform pathway analysis using single-cell expression measurements.

A Input: a binarized single-cell RNA-seq dataset as a text file, and a prior knowledge network (PKN) describing the activating or inhibitory relationships between genes. B Rule determination: inference of logic rules that describe the regulatory relationships between nodes in the PKN by a global search followed by node-level rule refinement. C Pathway analysis: scBONITA calculates a gene importance score calculated by simulating network perturbations with inferred rules and combines these scores with fold-changes from scRNA-seq to identify differentially regulated pathways in a specified contrast. D Steady-state analysis: scBONITA simulates networks using learned rules to identify steady states which correspond to observed cellular states.

scBONITA models can be simulated to generate dynamic trajectories and in silico node perturbations. These models are used to perform node knock-out and knock-in simulations. The difference between network states after these simulations is weighted by the size of the ERS to calculate a node importance score. Thus, the node importance score measures the influence of a node over the network and the weight incorporates the uncertainty in rule determination. scBONITA combines these importance scores and comparison-specific fold changes to calculate a pathway modulation score (see the “Methods” section, Fig. 2C).
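
A minimal Python sketch of a perturbation-based importance score follows. It assumes a hypothetical three-node network with synchronous updates and scores each node by how many other nodes end in a different state after knock-in versus knock-out. The weighting by ERS size and the exact scoring used by scBONITA are described in the Methods and are not reproduced here.

```python
from itertools import product

# Hypothetical three-node network: A is an input, A activates B,
# and C requires both A AND B. Not a network from this study.
rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}
nodes = list(rules)

def simulate(state, fixed, steps=10):
    """Synchronously update all nodes, clamping the nodes listed in `fixed`."""
    state = {**state, **fixed}
    for _ in range(steps):
        state = {n: int(rules[n](state)) for n in nodes}
        state.update(fixed)
    return state

def importance(node):
    """Mean number of other nodes that differ between knock-in and knock-out.

    The mean is taken over every possible initial state of the network.
    """
    initial_states = [
        dict(zip(nodes, bits)) for bits in product((0, 1), repeat=len(nodes))
    ]
    total = 0
    for init in initial_states:
        knocked_out = simulate(init, fixed={node: 0})
        knocked_in = simulate(init, fixed={node: 1})
        total += sum(knocked_out[n] != knocked_in[n] for n in nodes if n != node)
    return total / len(initial_states)

scores = {n: importance(n) for n in nodes}
```

In this toy network the upstream node A receives the highest score because clamping it changes both downstream nodes, whereas the terminal node C has no influence on the others.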

The simulation trajectories of these discrete-state models fall into steady states known as attractors, which have been hypothesized to correspond to signaling behavior characteristic of specific cell types51,52,53,54,55,56. Cells are assigned to the attractor closest to their expression (Fig. 2D) to characterize signaling states for a network. In conclusion, scBONITA allows in-depth comprehension of signaling pathways by incorporating network topology.
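
Attractor identification and cell assignment can be sketched in Python as follows. The sketch assumes a hypothetical three-node network, searches only for fixed-point steady states by exhaustive enumeration, and uses Hamming distance as the similarity measure. These are illustrative choices and may differ from the procedure described in the Methods.

```python
from itertools import product

# Hypothetical three-node network (A is an input, A -> B, C = A AND B).
rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}
nodes = list(rules)

def step(state):
    """Apply one synchronous update to a state given as a tuple of 0/1."""
    as_dict = dict(zip(nodes, state))
    return tuple(int(rules[n](as_dict)) for n in nodes)

def find_steady_states():
    """Enumerate all states that map onto themselves (fixed points only)."""
    return [s for s in product((0, 1), repeat=len(nodes)) if step(s) == s]

def hamming(x, y):
    """Number of positions at which two binary states differ."""
    return sum(a != b for a, b in zip(x, y))

def assign_to_attractor(cell_state, attractors):
    """Assign a binarized cell to its nearest attractor (ties: first listed)."""
    return min(attractors, key=lambda att: hamming(cell_state, att))

attractors = find_steady_states()
# Hypothetical binarized cells, ordered as (A, B, C).
assignment_1 = assign_to_attractor((1, 0, 1), attractors)
assignment_2 = assign_to_attractor((0, 1, 0), attractors)
```

Exhaustive enumeration is feasible only for very small networks. For pathway-sized networks, steady states would instead be reached by simulating from sampled initial states.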

Dysregulated pathways in T cell populations linked to HIV-associated atherosclerosis

The scBONITA pathway analysis algorithm identifies dysregulated pathways in all subpopulations derived from AS+ and AS− PLWH, providing insights into mechanisms of HIV-associated atherosclerosis (Supplementary Table 3). In CD8+ T cells, these pathways included pro-inflammatory, anti-viral, cell migration and apoptosis pathways (Fig. 3a, Supplementary Table 3). All of these pathways were downregulated in AS+ PLWH except the Th17 cell differentiation pathway, which includes genes involved in generic T cell differentiation. All these pathways were identified by scBONITA, but not by Enrichr (Supplementary Table 2).

Fig. 3: scBONITA identifies dysregulated pathways in T cells derived from AS+ and AS− PLWH.

Pathways (y-axis) dysregulated in the AS+ vs AS− contrast in PLWH in clusters of a CD8+ T cells and b CD4+ T cells and naïve T cells. Clusters are differentiated by point shape, as shown in the legend. Pathways that have Bonferroni-corrected p-value < 0.01 (x-axis) and a reduced ERS (see the “Methods” section for details) are shown. Pathways labeled with “***” were also dysregulated between cytotoxic T cells (a) and T cells (b) derived from HIV− subjects and subjects after 1 year of HIV infection38. c Network representation of the AGE-RAGE signaling pathway (Bonferroni-corrected p-value < 0.01) in a cluster of CD8+ T cells referred to as CD8 T cells −1 in Fig. 1a. Small black intermediate nodes indicate that the downstream nodes are controlled by an AND function of the upstream nodes. The size of nodes corresponding to genes is proportional to their importance score calculated by scBONITA. Nodes are colored according to the magnitude of their fold change between the HIV+AS+ and HIV+AS− groups. Violet edges indicate inhibition edges and black edges indicate activation edges.

scBONITA identified multiple optimized pathways in CD4+ T cells (subpopulations represented in Fig. 1a) as being dysregulated (padj < 0.01) in AS+ PLWH (Fig. 3b, Supplementary Table 3). CD4+ T cells may exert either an atherogenic or atheroprotective phenotype, depending on the subset and interactions with antigen-presenting cells in the adventitia or plaques57. Our results suggest that CD4+ and CD8+ T cells from AS+ PLWH play a role in cell adhesion, apoptosis and migration processes involved in atherosclerosis, and these atherogenic processes are mediated by the upregulated PI3K-AKT, mTOR, and cytoskeletal signaling pathways.

The AGE-RAGE signaling pathway was further investigated in CD8+ T cells because of its role in dysregulated lipid metabolism19,21,23,58,59,60,61,62 and to demonstrate the additional information obtained from scBONITA in comparison to ORA. This pathway had the highest pathway modulation score (0.8) (Supplementary Table 3). Most genes in this pathway were upregulated in AS+ PLWH (Fig. 3c). scBONITA optimized rules for DIAPH1 (uncertainty score = 0.5, see the “Methods” section) (Fig. 3c), which has a strong influence over the signal flow through the network due to its high centrality. The combination of scBONITA’s node importance score and the fold change between subject groups was used to identify genes whose activity influences signal flow in AS+ PLWH. These genes include the class 1 PI3K genes, the PI3K regulator PIK3R1, and PLC genes, all of which are highly expressed in AS+ PLWH (Fig. 3c). PI3K activates lipid metabolism, macrophage autophagy, phenotypic transition, and the expression of adhesion molecules (reviewed in ref. 63). In this manner, scBONITA identified dysregulated pathways and genes associated with atherosclerosis-linked migration of T cells derived from PLWH.

Dysregulated pathways in monocytes linked to HIV-associated atherosclerosis

In monocytes, scBONITA identified several dysregulated pathways (Fig. 4a, Supplementary Table 3), which are known to be involved in the pro-inflammatory behavior of pro-atherogenic monocytes43,64,65,66,67,68,69,70,71,72,73. Of these pathways, only cAMP signaling and endocrine resistance are overall upregulated in cells from AS+ PLWH, suggesting that monocytes from AS− PLWH are pro-atherogenic.

Fig. 4: scBONITA identifies dysregulated pathways in monocytes derived from AS+ and AS− PLWH.

a Pathways (y-axis) dysregulated in the AS+ vs. AS− contrast in monocytes derived from PLWH. Only pathways that have Bonferroni-corrected p-value < 0.01 (x-axis) and reduced ERS (see Methods for details) are shown. Pathways labeled with “***” were also significantly dysregulated in monocytes after one year of HIV infection38. b Network representation of the leukocyte transendothelial migration pathway. Small black intermediate nodes indicate that the downstream nodes are controlled by an AND function of the upstream nodes. The size of nodes corresponding to genes is proportional to their importance score as calculated by scBONITA. Nodes are colored according to the magnitude of their fold change between the HIV+AS+ and HIV+AS− groups. Violet edges indicate inhibition edges and black edges indicate activation edges.

The leukocyte transendothelial migration pathway was further investigated as monocyte migration outside the vascular compartment plays a crucial role in the inflammatory cascade that leads to an atherosclerotic phenotype64,66,74. The pathway modulation score of 0.45 was the third highest amongst tested pathways (Supplementary Table 3) (Fig. 4b). scBONITA optimized regulatory rules for the influential RHOA gene (uncertainty score = 0.13). High importance scores were assigned to the NCF genes, PLCG genes, MSN, ROCK, CYBA, and CYBB. NCF genes are involved in superoxide production and are positive regulators of PI3K signaling75,76. The upstream regulator of PLCG1, MSN, is involved in cytoskeletal remodeling during leukocyte migration77. ROCK2 shows a small change across AS groups, which may be driven by feedback regulation, but has a stronger role in regulating signal flow, as shown by its high importance score. The high importance of the G protein GNAI3, downstream of CXCR4, indicates a role in the CXCR4-mediated activation of this pathway. GNAI3 is also present in two other significantly dysregulated networks in monocytes (Cushing syndrome and cAMP signaling) but does not have a high importance score in those networks, indicating that it is critical in regulating signal coming from CXCR4 only in the leukocyte transendothelial migration network. The downstream effectors of GNAI3 and ROCK2 have higher fold changes, possibly due to feedback regulation. These effectors include ACTG1 and EZR, involved in cytoskeletal remodeling77,78,79, and ITGA4, ITGB1, and ITGB2, involved in cell adhesion. scBONITA thus identifies several genes and pathways associated with dysregulated transendothelial migration in the context of HIV-associated atherosclerosis.

Pathways dysregulated by HIV infection and in HIV-associated atherosclerosis

To identify the biological mechanisms modulated by HIV infection that may also contribute to the elevated risk of AS in PLWH, we used a publicly available scRNA-seq dataset of PBMCs from persons before and during acute HIV infection38. Kazer et al.38 identified gene expression programs in activated T cells, monocytes, and NK cells during HIV infection. We used this dataset with scBONITA to infer Boolean rules for KEGG networks and perform pathway analysis. The results were compared to the pathways dysregulated in HIV-associated atherosclerosis in the above-described HIV/AS dataset.

scBONITA identified 10 optimized pathways dysregulated after 1 year of HIV infection in cytotoxic T cells38 (Fig. 3a, b, Supplementary Fig. 6, Supplementary Tables 5 and 6). Of these pathways, five signaling pathways were dysregulated in AS+ PLWH. Similarly, 5 out of 19 optimized pathways were dysregulated in monocytes upon HIV infection and in AS+ individuals (Fig. 4a, Supplementary Fig. 6, Supplementary Table 6). In total, 81 and 41 genes from these overlapping pathways were upregulated both upon HIV infection and in AS+ PLWH in the cytotoxic T cell populations and monocyte populations, respectively (Supplementary Fig. 7; statistical significance was not tested as scBONITA does not depend on DE genes to perform pathway analysis). The genes upregulated after HIV infection and in AS+ PLWH in the CD8+ T cell subpopulation were involved in viral response pathways (Supplementary Fig. 7). Similarly, the genes upregulated after HIV infection and in AS+ PLWH in the monocyte subpopulation are involved in cell migration-related pathways (Supplementary Fig. 7). This suggests that the modulation of cell migration and inflammation processes upon HIV infection progresses over time, increasing the risk of AS in PLWH.

Pathway-specific cellular signaling states associated with atherosclerosis

We used the network models learned by scBONITA to identify attractors for the pathways dysregulated between AS+ and AS− PLWH in CD8+ T cells discussed above (Cluster CD8 T cells −1 in Fig. 1a) and evaluated cellular states across subjects and disease groups. The simplest rules, which have the smallest number of “AND” terms, were chosen to simulate the network and identify attractors. The insulin resistance pathway, which is downstream of the AGE-RAGE and PI3K-AKT signaling pathways, was particularly interesting in CD8+ T cells (Cluster CD8 T cells −1 in Fig. 1a). Network simulation identified 72 signaling states, of which 3 dominant signaling states mapped to 16.5%, 7.5% and 27.4% of cells, respectively (Fig. 5a, b). There is an association between attractors and subjects (chi-square test, p-value < 0.05) but no association between attractors and atherosclerosis status (chi-square test, p-value > 0.05). We also performed attractor analysis for other cell clusters. Among these, the chemokine signaling attractors assigned to CD4+ T cells were subject-specific (chi-square test, p < 0.01). Similarly, we observed subject-specific attractors of the PI3K–AKT signaling pathway for the T cells CD8/CD4/CD4 naive cluster (chi-square test, p < 0.01).
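
The chi-square test of association between attractor assignment and subject (or disease group) operates on a contingency table of cell counts. The Python sketch below computes the test statistic and degrees of freedom for a hypothetical table; the counts are illustrative only and are not taken from this study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a 2D table.

    `table` is a list of rows; here rows are attractors and columns are
    subjects, and each entry is the number of cells in that combination.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: two attractors (rows) by two subjects (columns).
associated = [[10, 20], [20, 10]]    # attractor usage differs by subject
independent = [[10, 10], [20, 20]]   # identical proportions in both subjects

stat_assoc, dof_assoc = chi_square_statistic(associated)
stat_indep, dof_indep = chi_square_statistic(independent)
```

The p-value follows from comparing the statistic to the chi-square distribution with the computed degrees of freedom; in practice a library routine such as `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts together.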

Fig. 5: CD8+ T cell states with respect to the insulin resistance pathway identified by attractor analysis with scBONITA.

a UMAP representation of a cluster of CD8+ T cells (CD8+ T cells – 1 in Fig. 1a) colored by the attractor to which they are assigned, based on their similarity. The three dominant states (PI3KR+ PI3K+, TNFR1+TNF−, and TNFR1-TNF+ attractors) are represented by green, blue, and orange. All other attractors are collectively labeled in gray. b Percentages of CD8+ T cells derived from each subject, mapping to the three dominant and all other attractors. c Gene activity (ON- red, OFF- light blue) in the three dominant attractors. Only genes that are different between these states are shown. d Attractor gene values ranging from 0 (blue) to 1 (red) averaged for each individual subject. The top bar indicates AS+ (gray) and AS− (black) subjects.

The CD8+ T cell states for the insulin resistance pathway were characterized by differences in several key genes (Fig. 5c), including PI3K genes and the PI3K regulators that were identified as being influential in the AGE-RAGE signaling pathway (Fig. 3c). In addition, the two less abundant attractors differed in the activity of the key TNFR and TNF genes. Hence, these attractors are referred to as the PI3KR+PI3K+, TNFR1+TNF− and TNFR1−TNF+ attractors. The activity of PI3K and AKT genes was higher in the most common signaling state (PI3KR+PI3K+ attractor). However, the activity of the downstream targets of AKT, such as the CREB genes, NFKB1, and FOXO1, was lower in the PI3KR+PI3K+ attractor. TNF, which is produced at a low level by some subsets of T cells, also mediates a range of pro-inflammatory processes in vascular endothelial cells80,81. TNF-TNFR1 signaling mediates an apoptotic process mediated by TRADD and FADD82. TNFR1 activates PI3K signaling in regulatory T cells83. Thus, our attractor analysis reveals different CD8+ T cell states. This analysis also indicates differences in TNF production and response based on cell states and suggests the existence of distinct modes of operation across subjects (Fig. 5d).

Attractor analysis of the leukocyte transendothelial migration pathway in monocytes revealed 9 attractors that mapped to cells in the dataset. The two dominant signaling modes differed in the activity of the PECAM1 and F11R genes, respectively (Fig. 6a–c). The attractors are referred to as the PECAM+ and F11R+ attractors and mapped to 51.04% and 30.73% of cells, respectively. F11R is required for platelet adhesion to vascular endothelial cells84, which occurs prior to infiltration of monocytes into the endothelium and eventual plaque formation85,86. PECAM1 has widespread effects on vascular biology and atherosclerosis in particular87,88,89. These subject-specific attractors (chi-square test, p-value < 0.05, Fig. 6b) were not associated with atherosclerosis status. Similarly, the attractor activity of individual genes was not associated with atherosclerosis status (two-sided t-test, p-value > 0.05). Thus, scBONITA identifies two cellular states of monocytes driven by molecular signaling. These states are associated with monocyte migration into the vasculature and reflect inter-subject variability.

Fig. 6: Monocyte states with respect to the leukocyte transendothelial migration pathway identified by attractor analysis with scBONITA.

a UMAP representation of the cluster of monocytes colored by the attractor to which they are assigned, based on their similarity. The two dominant modes (F11R+ and PECAM+ attractors) are represented by blue and orange. All other attractors are collectively labeled in gray. b Percentages of monocytes derived from each subject, mapping to the two dominant attractors and all other attractors for the leukocyte transendothelial migration pathway. c Attractor gene values for the leukocyte transendothelial migration pathway trained on monocytes, ranging from 0 (blue) to 1 (red), averaged for each individual subject. The top bar indicates AS+ (gray) and AS− (black) subjects. The genes that differ between the two dominant attractors F11R+ and PECAM+ are highlighted by a violet box.

Evaluation of scBONITA performance in silico

The BONITA algorithm has already been rigorously validated in our previous study34. Specifically, a comparison with other pathway analysis tools was performed. Here we evaluate the scRNA-seq specific components of the algorithm. To show that scBONITA rule determination is robust to training set size, we varied the size of the training data. The number of cells in the largest cluster (Naïve B cells −1) was varied by random selection from 1% of cells from that cluster to 200% by adding cells from neighboring clusters. The reduced size of the ERS for nodes with in-degree 3 (i.e., the most complex case considered by scBONITA) is a metric for certainty in rule inference by scBONITA (Fig. 7b, Supplementary Fig. 5). While there was a significant decline in performance when the data were downsampled to 1% of the original cluster, there was no significant further improvement once 50% of the cells were used, or when the training dataset was augmented to 200% of its original size by using cells from adjacent clusters (Fig. 7b, see the “Methods” section). This indicates that scBONITA is robust to heterogeneity in the training data set.
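
The downsampling and augmentation scheme described above can be sketched in Python as follows. The function name, the use of integer cell identifiers and the fixed random seed are assumptions made for illustration; how neighboring clusters were chosen is described in the Methods and is not modeled here.

```python
import random

def resample_training_set(cluster_cells, neighbor_cells, fraction, seed=0):
    """Build a training set that is `fraction` times the size of the cluster.

    Fractions up to 1.0 randomly downsample the cluster. Larger fractions
    keep the whole cluster and add randomly chosen cells from neighboring
    clusters (hypothetical helper, not part of scBONITA).
    """
    rng = random.Random(seed)
    target = max(1, round(fraction * len(cluster_cells)))
    if target <= len(cluster_cells):
        return rng.sample(cluster_cells, target)
    extra = target - len(cluster_cells)
    return list(cluster_cells) + rng.sample(neighbor_cells, extra)

# Hypothetical cell identifiers for the largest cluster and its neighbors.
cluster = list(range(100))
neighbors = list(range(100, 250))

one_percent = resample_training_set(cluster, neighbors, 0.01)
half = resample_training_set(cluster, neighbors, 0.5)
doubled = resample_training_set(cluster, neighbors, 2.0)
```

Rule inference would then be repeated on each resampled set and the resulting ERS sizes compared across fractions.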

Fig. 7: Performance of scBONITA rule determination.

a The number of pathways identified as significantly dysregulated (Bonferroni-adjusted p value < 0.05) one year after HIV infection38, between AS+ and AS− PLWH, and the intersections between these sets. Subpopulations from the two datasets were matched as shown in Supplementary Table 5. b Effects of the number of cells on the ERS size evaluated by downsampling and augmentation using the largest cluster (“B cells naïve – 1”) from the HIV/AS dataset. c Relation between importance scores in 130 KEGG networks evaluated using CD8+ T cells from two comparisons: AS+ and AS− PLWH, and persons before and one year after HIV infection38. d Spearman correlation coefficients (p < 0.01 for all comparisons) between scBONITA’s node importance score (labeled as ‘scBONITA score’) and 6 measures of node centrality (along x and y-axis). Correlation coefficients are depicted by colors ranging from blue (−1) to red (+1).

The node importance scores calculated by scBONITA for KEGG networks trained on the HIV/AS dataset are not correlated to six centrality measures (Fig. 7d). We found that the node importance scores between comparable cell subpopulations in the Kazer et al.38 and HIV/AS datasets were correlated, as shown by a representative comparison between cytotoxic T cells from the Kazer et al. dataset and the subpopulations of CD8+ T cells from the HIV/AS dataset (Fig. 7c, 0.71 < Pearson correlation coefficient < 0.91, p < 0.01). Similarly, the node importance scores for the populations of monocytes were highly correlated (Pearson correlation coefficient = 0.78, Supplementary Table 4, Supplementary Fig. 4). The correlations were lower for other pairs of subpopulations (Supplementary Table 4, Supplementary Fig. 4). This indicates that scBONITA learns some characteristic features of network topologies, but node importance scores are still assigned in a context-dependent manner.
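
The comparison of node importance scores across datasets relies on the Pearson correlation coefficient, which can be computed as in the short Python sketch below. The score vectors are hypothetical and only illustrate the calculation for nodes shared between two trained networks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical importance scores for the same four nodes in two datasets.
scores_dataset_1 = [0.10, 0.40, 0.35, 0.80]
scores_dataset_2 = [0.20, 0.50, 0.30, 0.90]

r = pearson(scores_dataset_1, scores_dataset_2)
```

A coefficient near 1 would indicate that the same nodes are ranked as influential in both contexts, while lower values point to context-dependent importance.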

Discussion

Among people living with HIV, widespread use of cART has significantly reduced overall mortality. However, the earlier and increased incidence of cardiovascular diseases remains the major cause of mortality in an aging HIV+ population for multiple intersecting reasons4,9,10,11,12. We and others have attempted to identify the immune signaling mechanisms that lead to this increased incidence of atherosclerosis49. However, to the best of our knowledge, no study has systematically investigated cell-type specific signaling dysregulations at the single-cell level. To this end, we sequenced, for the first time, PBMCs from four AS+ and four AS− PLWH. The cohort was closely matched for known AS risk factors. These risk factors will be harder to match exactly in a larger cohort. Although a small sample size in some contexts can limit the power of the study, previous scRNA-seq studies have provided insight into both changes in PBMC composition and signaling with cohort sizes similar to this study in the context of HIV, because of the depth of information on molecular mechanisms that can be obtained from scRNA-seq experiments38,90,91,92,93. To increase statistical power, we evaluated the proportion of the subpopulations identified from scRNA-seq in a bulk RNA-seq dataset from an independent cohort of PLWH with and without AS49 (Supplementary Fig. 3). Enrollment of more subjects in the future will help further elucidate the signaling dysregulations leading to HIV-associated atherosclerosis. We found that, in accordance with previous studies19,21,58,59,60,61,94,95, a population of CD8+ T cells was increased in PBMCs from AS+ PLWH. However, contrary to our expectations20,64,65,66,71,73,96,97,98, a population of monocytes was decreased in PBMCs from AS+ PLWH, suggesting their migration into the intima. Here we focus on cell-type-specific molecular changes associated with atherosclerosis.

We observed that differentially expressed genes between AS+ and AS− were involved in cell migration; however, relevant pathways were not identified using over-representation analysis (ORA). This demonstrated the inefficacy of conventional ORA methods in identifying coherent molecular processes. For example, CXCR4, a DE gene, leads to the activation of phosphatidylinositol-3-OH kinases (PI3K), which in turn activates the serine-threonine kinase AKT via PIP3 (ref. 99). PI3K/AKT signaling leads to processes involved in plaque formation, such as cell migration, intracellular lipid accumulation, and smooth muscle cell proliferation63. None of these processes were enriched in the DE genes. This may be because of the technical limitations of scRNA-seq leading to nonspecific distortions of expression and identification of fewer differences across conditions100,101. Even DE analysis methods that are sensitive to the known characteristic distributions of scRNA-seq data are prone to false discoveries102. These methods failed to provide insights into how disparate genes involved in different pathways regulate cellular states. To mechanistically characterize signaling dysregulations in HIV-associated atherosclerosis, we developed the scBONITA algorithms for regulatory rule inference, network simulation, pathway analysis, and attractor/steady-state analysis. As a gene is evaluated in the context of its complete signaling pathway, our approach minimizes the impact of the caveats in scRNA-seq technology described in refs. 100,101.

scBONITA learns condition-specific logic models using scRNA-seq data in conjunction with published prior knowledge networks (PKNs). This study builds on our published BONITA method34, which inferred logic rules from bulk RNA-seq data. scBONITA exploits the bimodal nature of scRNA-seq data41,103 and the cell-level resolution of expression to successfully learn regulatory rules and identify attractors for PKNs. These rules can be used to perturb and simulate pathways in silico. Unlike other algorithms that infer logic rules and reconstruct gene-regulatory networks on a small subset of genes from scRNA-seq data104,105,106, scBONITA does not pre-select genes. In addition, scBONITA is not dependent on time-series data, a significant advantage since time-series data are rarely available in human studies. In lieu of using time-series data, scBONITA hypothesizes that scRNA-seq data represent samples in the state space of a dynamic Boolean network. In addition, the use of known network topologies reduces the uncertainty of inferred rules. Indeed, we show that scBONITA can successfully restrict the state space of possible rules. scBONITA combines expression information with the importance score to create a unique metric of pathway dysregulation. scBONITA node scores can be directly translated into empirical perturbation studies.

In CD8+ T cells, we identified AGE-RAGE signaling, which elicits activation of multiple intracellular signaling pathways such as cell proliferation and apoptosis pathways (Fig. 3a)107,108,109,110,111,112,113,114,115. The PI3K family of genes, which had the highest importance score and were upregulated in AS+ PLWH, promote intracellular lipid deposition leading to the formation of foam cells and atherosclerotic plaques and can reduce the expression of lipid transporters and reduce the efflux of intracellular cholesterol depending on upstream signals63. PLC, which also has a high importance score, promotes leukocyte adhesion, VEC apoptosis, and plaque development induced by oxidized low-density lipids77,116.

In monocytes, dysregulation of lipid-metabolism pathways such as cAMP signaling and leukocyte transendothelial migration suggests the infiltration of monocytes into the intima during the formation of atherosclerotic lesions and progression of atherosclerosis42,43,44,45,46,47. scBONITA identified genes critical to the atherosclerotic process in the leukocyte transendothelial migration pathway. ROCK activation by atherogenic stimuli such as oxidized LDLs leads to pathophysiological changes including endothelial dysfunction and vascular remodeling117,118. ROCK inhibitors such as statins attenuate atherosclerosis by inhibiting chemotaxis of macrophages and their transformation into foam cells119. The dysregulation of glucose metabolism pathways in AS+ PLWH can induce the expression of adhesion molecules by VECs and increase monocyte transendothelial migration63,120. Thus, here we find migratory and lipid-metabolism pathways and key genes driving atherosclerosis in PLWH.

PI3K-AKT signaling is dysregulated in all cell subpopulations (Figs. 4 and 5, Supplementary Table 3). The activation and effector mechanisms of this cascade vary by cell type, as shown by the different dysregulation of linked signaling pathways in different cell types. For example, in two populations of naïve B cells derived from AS+ PLWH, the apelin signaling pathway was upregulated in one and the adipocytokine signaling pathway was downregulated in the other. The cardioprotective effect of apelin is modulated by (amongst other routes) the PI3K-AKT signaling and MAPK signaling pathways, which are also dysregulated in these naïve B cells. Apelin is also shown to be upregulated in human atherosclerotic coronary arteries and colocalized with markers for macrophages121,122.

We find that HIV infection dysregulates several pathways that are further impacted in PLWH with AS (Figs. 3a, 4a, Supplementary Fig. 6, and Supplementary Table 6). These pathways are suggestive of changes occurring in cell migration of cytotoxic T cells and monocytes123,124,125,126. The overlap also suggests that cAMP and PI3K-AKT signaling are affected during the course of HIV infection.

To map cells to distinct signaling modes of the pathways described above, we developed scBONITA’s attractor analysis functionalities. Attractors are regions in the state space of a dynamic system towards which simulation trajectories are “pulled” and are characteristics of a specific network with a specific set of regulatory rules. These attractors may correspond to observable cell states, or hallmarks of specific phenotypes such as cell type differentiation, disease state, or drug treatment51,52,53,54,55,56. These studies show that even simple dynamic models capture rich and nuanced cell behaviors. scRNA-seq allows the study of these dynamic landscapes and their attractors at an unprecedented resolution56,127,128. Attractor analysis with scBONITA allows users to characterize cells based on the dynamic properties of signaling networks, which dictate phenotype. scBONITA identifies these attractors and their master regulators that control the changes between these cell states, providing complex insights into cellular processes.

The importance of cell migration and lipid signaling in the development of HIV-associated atherosclerosis was underscored by attractor analysis in CD8+ T cells and monocytes. In most cases, only one dominant signaling state existed in the cell subpopulation. However, the two dominant signaling modes of the insulin resistance pathway in CD8+ T cells differed in the activity of PI3K and AKT genes and their downstream effectors, such as CREB and FOXO1. This suggests the existence of two distinct modes for this signaling pathway corresponding to a proliferative cell state (activation of PI3K and AKT) and a senescent cell state (transcription of CREB- and FOXO1-controlled genes)129,130,131. The insulin signaling pathway exerts immunomodulatory effects on T cells132. Similarly, we identified two dominant signaling modes (PECAM1+ and F11R+) for the leukocyte transendothelial migration pathway in monocytes, suggesting variation in the cell states with respect to this pathway in PLWH. Dysregulation in insulin signaling, which is a risk factor for CVD, promotes PECAM1-mediated migration of monocytes into the endothelium133,134. Loss of PECAM1 contributes to atherosclerosis89. PECAM1+ cells may contribute to the suppression of inflammatory processes driving atherosclerosis. Thus, our analysis connects molecular processes to cellular states, unlike conventional DE and ORA analyses.

Although the BONITA algorithm has been rigorously validated in our prior publication135, here we wanted to evaluate the scRNA-seq-specific parts of the algorithm. We demonstrated that scBONITA can identify characteristic structural properties of networks and use these in conjunction with expression information to identify dysregulated pathways in a specified condition (Fig. 7). Moreover, scBONITA is minimally affected by the heterogeneity of training data and can narrow down the vast state space for a Boolean network (Fig. 7a). We expect that scBONITA inference will improve when pure cell populations are sequenced. While scBONITA is not strictly dependent on the clustering method used to classify scRNA-seq data into subpopulations, we used pre-classified subpopulations to reduce variability and to improve the specificity of scBONITA-PA. Additionally, scBONITA-RD requires a longer runtime (<12 h in our tests) and more powerful computational capabilities than a typical processing and clustering analysis pipeline run on datasets of typical size. These resources are usually available to academic users on computing clusters. In conclusion, we present a novel dynamic network modeling method that yields mechanistic insights into the cellular and immunological processes involved in HIV-associated atherosclerosis.

Materials and methods

Participant cohort summary, sample collection, and storage

Eight men living with HIV and ≥50 years of age on stable combined antiretroviral therapy (cART) for at least 1 year and with viral load ≤50 copies/mL were recruited. All methods were carried out in accordance with University of Rochester guidelines and regulations, and all experimental and study protocols were approved by the University of Rochester Institutional Review Board (#RSRB00063845). Informed consent was obtained from all subjects. Individuals were classified as having atherosclerosis (AS+) if they had plaques on the carotid arteries on ultrasound imaging. Four of the eight subjects were assigned as AS+ and had plaques in both right and left carotid arteries. AS− subjects were aged between 47 and 57 and AS+ subjects were aged between 51 and 66. AS+ subjects had mean serum cholesterol of 161.5 mg/dl (σ = 40.9) and mean serum high-density lipoprotein (HDL) of 54.7 mg/dl (σ = 16.3). AS− subjects had mean serum cholesterol of 167.7 mg/dl (σ = 57.2) and mean serum HDL of 51 mg/dl (σ = 7.7). AS− subjects and AS+ subjects had mean CD4+ T cell counts of 518.5 cells/µl (σ = 347.8 cells/µl) and 838.7 cells/µl (σ = 514.5 cells/µl), respectively. De-identified subject information is available in Supplementary Note 1 and Supplementary Fig. 1. Subjects were matched for lipid profiles, hypertension status, smoking status, CD4+ T cell counts, and age. In addition, all subjects were treated with cART for at least one year. Thirty milliliters of blood per study participant was collected in ACD vacutainers and was processed within 2–3 h of collection. Peripheral blood mononuclear cells (PBMCs) were isolated using Ficoll density gradient centrifugation. Five million PBMCs were preserved using RNAlater (Thermo Fisher) and were used for scRNA-seq.

Single-cell RNA sequencing and data processing

Frozen vials containing cells in RNAlater were thawed quickly in a 37 °C water bath. The cell suspension was transferred to a 15 ml conical tube, and 10 ml PBS/2% FBS was slowly added. Samples were centrifuged at 500 × g for 6 min. Washes were repeated twice more, for a total of three washes. Dead cells were removed using the MACS Miltenyi Biotec Dead Cell removal kit (PN130-090-101) according to the manufacturer’s recommendations. Cells were counted and cellular suspensions were loaded on a Chromium Single-Cell Instrument (10x Genomics, Pleasanton, CA, USA) to generate single-cell Gel Bead-in-Emulsions (GEMs). scRNA-seq libraries were prepared using the Chromium Single-Cell 3’ Library & Gel Bead Kit (10x Genomics). The beads were dissolved, and cells were lysed per the manufacturer’s recommendations. GEM reverse transcription (GEM-RT) was performed to produce barcoded, full-length cDNA from poly-adenylated mRNA. After incubation, GEMs were broken, the pooled post-GEM-RT reaction mixtures were recovered, and cDNA was purified with silane magnetic beads (DynaBeads MyOne Silane Beads, PN37002D, Thermo Fisher Scientific). The entire purified post-GEM-RT product was amplified by PCR. This amplification reaction generated sufficient material to construct a 3’ cDNA library. Enzymatic fragmentation and size selection were used to optimize the cDNA amplicon size, and indexed sequencing libraries were constructed by End Repair, A-tailing, Adaptor Ligation, and PCR. Final libraries contain the P5 and P7 priming sites used in Illumina bridge amplification. Sequence data were generated using Illumina’s NovaSeq 6000. Approximately 2000 cells were sequenced from each subject. Cell Ranger (version 2.1.1; 10x Genomics) was used for demultiplexing and alignment with default parameters. Reads were aligned to the human reference genome GRCh38 (Ensembl 93). The Seurat R package (version 2.3.4)39 was used to further process the gene counts obtained from the Cell Ranger pipeline. Cells that expressed <200 genes, >2500 genes, or >5% mitochondrial genes were filtered out. Genes expressed in <3 cells were filtered out. Gene counts were normalized per cell and log2-transformed. These preliminary filtering and selection procedures yielded a set of 9368 sequenced cells, approximately equally distributed between subjects (and hence conditions), and 14,017 genes. Note that sample collection, processing, and sequencing were performed in one batch, leading to extremely high-quality data in which no subject-specific patterns were observed.
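The filtering thresholds above can be sketched in plain Python as follows. This is only an illustration of the stated cutoffs, not the Seurat code used in the study. The function name `filter_counts`, the cells × genes list-of-lists layout, the `MT-` prefix used to recognize mitochondrial genes, the interpretation of the 5% cutoff as a fraction of a cell's total counts, and the choice to count gene detection over all cells are assumptions made for the example.

```python
def filter_counts(counts, gene_names, min_genes=200, max_genes=2500,
                  max_mito_frac=0.05, min_cells=3):
    """Apply the cell- and gene-level cutoffs described in the text.

    counts: list of per-cell lists of raw counts (cells x genes).
    gene_names: gene symbols, in the same order as the columns of counts.
    Returns (kept_cell_indices, kept_gene_indices).
    """
    # Assumed naming convention for mitochondrial genes
    mito = [g.upper().startswith("MT-") for g in gene_names]
    kept_cells = []
    for i, row in enumerate(counts):
        n_detected = sum(1 for x in row if x > 0)
        total = sum(row)
        mito_counts = sum(x for x, m in zip(row, mito) if m)
        mito_frac = mito_counts / total if total else 0.0
        # Keep cells within the gene-count window and below the mitochondrial cutoff
        if min_genes <= n_detected <= max_genes and mito_frac <= max_mito_frac:
            kept_cells.append(i)
    # Keep genes detected in at least min_cells cells (counted over all input cells)
    kept_genes = [j for j in range(len(gene_names))
                  if sum(1 for row in counts if row[j] > 0) >= min_cells]
    return kept_cells, kept_genes
```

With the default arguments the function applies the thresholds reported here (200 and 2500 detected genes, 5% mitochondrial content, 3 cells per gene).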

Classification into subpopulations using modularity-optimized Louvain community detection, and cluster labeling

Cells were classified into subpopulations using modularity-optimized community detection, implemented in the Seurat R package39. 664 highly variable genes were used to identify 10 principal components that explained the majority of variance in the data. These principal components were used to cluster the data. Clustering yielded 16 subpopulations. Cluster markers were identified using MAST41. As suggested in ref. 136, CIBERSORT40 was used to “deconvolute” the average gene expression of each cluster into the constituent canonical cell types. A reference expression set of 22 immune cell types and 547 genes was used40. Over-representation analysis was performed using the implementation of the hypergeometric test in the R package clusterProfiler (version 3.12.0) with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets downloaded from MSigDB137,138,139. Gene sets were identified as significantly over-represented if the Bonferroni-adjusted p-value was <0.05.
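For readers unfamiliar with the test, the over-representation p-value and its Bonferroni adjustment can be illustrated with a short standard-library Python sketch. It is not the clusterProfiler implementation; the function names and argument names are hypothetical, and the universe is simply assumed to be the set of all genes considered.

```python
from math import comb

def hypergeom_pvalue(n_universe, n_set, n_de, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes in the background
    n_set:      number of background genes in the gene set
    n_de:       number of differentially expressed genes drawn
    n_overlap:  number of DE genes that fall in the gene set
    """
    upper = min(n_set, n_de)
    total = comb(n_universe, n_de)
    tail = sum(comb(n_set, k) * comb(n_universe - n_set, n_de - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

A gene set would then be called over-represented when its adjusted value is below 0.05, as in the criterion stated above.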

scBONITA algorithm for the development of discrete-state models of pathways

Network topologies: scBONITA infers Boolean regulatory rules (logic gates) for directed networks wherein nodes represent genes and edges represent the regulatory relationships between those genes. These networks contain edge annotations denoting activation/inhibition relationships between nodes, which are exploited by scBONITA to restrict the search space for rule inference to sign-compatible canalyzing functions. Such network topologies of biological pathways are commonly obtained from pathway databases such as KEGG and WikiPathways139,140,141. scBONITA offers an interface to the KEGG and WikiPathways databases that allows automated download and processing of user-specified networks. Users can also provide custom networks in GraphML format.

Boolean rule determination from scRNA-seq data: scBONITA assumes that cross-sectional measurements of cells by scRNA-seq represent a steady state of an underlying dynamic biological process. scBONITA’s rule determination (scBONITA-RD) algorithm, which has been extended from our previous BONITA algorithm, exploits this property to infer Boolean rules for an input biological network, using a combination of a genetic algorithm (GA) and a node-wise local search135.

The global search uses a genetic algorithm (GA) to infer a single candidate rule set that describes the input data with respect to the network topology with minimum error34,142. The function to be minimized is described in Eq. (1):

$$\mathop {\sum}\limits_{c = 1}^{{\mathrm {cells}}} {{\rm{min}}\left( {\mathop {\sum}\limits_{n = 1}^{{\mathrm {nodes}}} {\left| {E_{c,n} - A_{c,n,a}} \right|\forall \,a\,{\mathrm {in}}\,T_c} } \right)}$$
(1)

where c iterates from 1 to the number of cells in the training dataset (cell cluster), n iterates from 1 to the number of nodes in the network, Ec,n is the binarized expression of node n in cell c, Ac,n,a is the value of node n in attractor state a, and Tc is the attractor reachable from the state of cell c. Note that Tc may contain multiple repeating states (a limit cycle) or only one steady state, i.e., it may be a singleton attractor. Tc is obtained by simulating the network with the candidate rule set for 100 time steps, which was sufficient for the simulation to reach an attractor for all tested networks.
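The error in Eq. (1) can be illustrated with a minimal Python sketch. It is not the scBONITA implementation: rules are represented here as hypothetical Python callables that map the previous network state (a tuple of 0/1 values) to a node's next value, updates are synchronous, and the attractor is detected at the first repeated state rather than by running a fixed 100 steps.

```python
def step(state, rules):
    """Synchronous update: every rule reads the previous state."""
    return tuple(rule(state) for rule in rules)

def attractor_from(state, rules, max_steps=100):
    """Simulate from `state`; return the list of states forming the attractor reached."""
    seen = {}          # state -> time step at which it was first visited
    trajectory = []
    for t in range(max_steps + 1):
        if state in seen:
            # Everything from the first visit onward is the cycle (or fixed point)
            return trajectory[seen[state]:]
        seen[state] = t
        trajectory.append(state)
        state = step(state, rules)
    raise RuntimeError("no attractor reached within max_steps")

def rule_set_error(cells, rules):
    """Eq. (1): per cell, the smallest node-summed mismatch to any attractor state."""
    total = 0
    for cell in cells:
        attractor = attractor_from(cell, rules)
        total += min(sum(abs(e - a) for e, a in zip(cell, att_state))
                     for att_state in attractor)
    return total
```

For instance, with three nodes and the rules `[lambda s: s[0], lambda s: s[0], lambda s: s[0] & s[1]]`, a cell observed in state `(1, 0, 0)` reaches the fixed point `(1, 1, 1)` and therefore contributes an error of 2.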

The minimum-error rule set identified using the above-described GA strategy is further refined by a node-level local search that sequentially optimizes the rule for each node while keeping the rules for all other nodes in the network constant. An optimal set of rules for a node n is obtained by minimizing the function in Eq. (2):

$$\mathop {\sum}\limits_{c = 1}^{{\mathrm {cells}}} {\min \left( {\left| {E_{c,n} - A_{c,n,a}} \right|\forall \,a\,{\mathrm {in}}\,T_c} \right)}$$
(2)

where variables and constants are the same as described above.

Several rules may satisfy the termination criteria. The local search returns a set of equivalent rules that all satisfactorily explain the observed state in the experimental data. This set of rules is referred to as the equivalent rule set (ERS) in the text.
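The idea of an ERS can be illustrated with the sketch below, which scores every candidate rule for one node with the node-level error of Eq. (2) and keeps all rules that tie for the minimum. This is a simplified illustration under stated assumptions: candidates are all truth tables over the node's regulators (scBONITA itself restricts the search to sign-compatible functions), rules are hypothetical Python callables, and all function names are invented for the example.

```python
from itertools import product

def attractor_from(state, rules, max_steps=100):
    """Synchronous simulation until a state repeats; returns the attractor states."""
    seen, trajectory = {}, []
    for t in range(max_steps + 1):
        if state in seen:
            return trajectory[seen[state]:]
        seen[state] = t
        trajectory.append(state)
        state = tuple(rule(state) for rule in rules)
    raise RuntimeError("no attractor reached within max_steps")

def node_error(cells, rules, n):
    """Eq. (2): per cell, the smallest mismatch at node n over the attractor states."""
    return sum(min(abs(cell[n] - s[n]) for s in attractor_from(cell, rules))
               for cell in cells)

def candidate_rules(regulators):
    """Yield (truth_table, callable) for every Boolean function of the regulators."""
    inputs = list(product((0, 1), repeat=len(regulators)))
    for outputs in product((0, 1), repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        yield table, (lambda s, t=table: t[tuple(s[r] for r in regulators)])

def equivalent_rule_set(cells, rules, n, regulators):
    """Return the truth tables for node n that tie for the minimum Eq. (2) error."""
    scored = []
    for table, rule in candidate_rules(regulators):
        trial = list(rules)
        trial[n] = rule          # all other nodes keep their current rules
        scored.append((node_error(cells, trial, n), table))
    best = min(err for err, _ in scored)
    return [table for err, table in scored if err == best]
```

In a two-node example where node 1 is regulated by node 0, training on the single cell `(1, 1)` leaves two equally good rules, whereas adding the cell `(0, 0)` narrows the set to one. This is the sense in which more informative data shrink the ERS.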

Pathway analysis (PA) with scBONITA: scBONITA performs pathway analysis in a two-step process. In the first step, importance scores for each node in the biological network under consideration are calculated. In the second step, a pathway modulation metric incorporating both experiment-specific fold changes and the node importance scores calculated in step 1 is calculated.

scBONITA quantifies the influence In of node n over the state of the network by quantifying the overall effect of its perturbation on that network (Eq. (3)). This is achieved by simulating knock-in and knock-out of that node.

$$I_n = \mathop {\sum}\limits_{c = 1}^{{\mathrm {cells}}} {\left| {{\mathrm {KI}}_{c,n} - {\mathrm {KO}}_{c,n}} \right| \ast {\mathrm {Uncertainty}}\,{\mathrm {Factor}}}$$
(3)

where KIc,n and KOc,n are the discrete expression vectors of network node n in the attractors reached after a simulation starting from cell c where the node under consideration n is knocked in and knocked out, respectively. The uncertainty factor is defined as follows (Eq. (4)):

$${{{\mathrm{Uncertainty}}}}\,{{{\mathrm{factor}}}} = \frac{{\left| {Maximum\,ERS_i} \right| - \left| {Observed\,ERS_i} \right| + 1}}{{\left| {Maximum\,ERS_i} \right|}}$$
(4)

where ERSi is the ERS for node i, |Maximum ERSi| is the maximum possible size of the ERS for a node i and |Observed ERSi| is the size of the ERS for a node i upon optimization by scBONITA.

The uncertainty factor weighs In relative to the maximum state space for that node, to capture the uncertainty in the rule determination for that node. The importance scores of the nodes in a network are scaled to [0, 1] by dividing by the maximum calculated importance score for the network under consideration.
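Equations (3) and (4) and the final scaling step can be sketched as follows. The sketch takes the knock-in and knock-out attractor states as precomputed inputs and makes two simplifying assumptions that are not stated in the text: each attractor is a single steady state, and the absolute difference is summed over all nodes of the network. All function names are hypothetical.

```python
def uncertainty_factor(max_ers_size, observed_ers_size):
    """Eq. (4): equals 1 when a single rule remains, 1/max when no rule was excluded."""
    return (max_ers_size - observed_ers_size + 1) / max_ers_size

def importance_score(ki_states, ko_states, max_ers_size, observed_ers_size):
    """Eq. (3) for one node.

    ki_states / ko_states: one attractor state (tuple of 0/1) per cell, obtained
    with the node knocked in / knocked out, respectively.
    """
    u = uncertainty_factor(max_ers_size, observed_ers_size)
    mismatch = sum(sum(abs(a - b) for a, b in zip(ki, ko))
                   for ki, ko in zip(ki_states, ko_states))
    return mismatch * u

def scale_scores(scores):
    """Divide every score by the largest score in the network, giving values in [0, 1]."""
    top = max(scores.values())
    return {node: (s / top if top else 0.0) for node, s in scores.items()}
```

A node whose perturbation changes many downstream nodes and whose rule is well determined (small observed ERS) therefore receives a score close to 1 after scaling.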

A pathway modulation metric (Mp) (Eq. (5)) is calculated by weighting the node importance score by both the difference in average gene expression between the groups (relative abundance, RA) and the standard deviation of expression of that gene (σ) across cells. A p-value is calculated by bootstrapping, where a contrast-specific distribution of weighted importance scores is generated using randomly resampled RA values. Pathways are described in the text as being overall upregulated in a given contrast if the sum of fold changes of all genes in the pathway is positive. Conversely, pathways are described as being downregulated if the sum of fold changes of all genes in the pathway is negative.

$$M_{\mathrm {p}} = \mathop {\sum}\limits_{n = 1}^{{\mathrm {nodes}}} {{\mathrm {RA}}_n \ast \sigma _{{{n}}} \ast I_n}$$
(5)
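Equation (5) and a resampling-based p-value can be sketched as below. The text does not specify the details of the bootstrap, so the scheme shown here is an assumption: RA values are resampled with replacement from a pool of RA values for the contrast (`ra_pool`, a hypothetical argument), the comparison is made on absolute values, and a pseudocount of one is added to avoid a p-value of zero.

```python
import random

def modulation(ra, sigma, importance):
    """Eq. (5): sum over nodes of RA_n * sigma_n * I_n."""
    return sum(r * s * i for r, s, i in zip(ra, sigma, importance))

def bootstrap_pvalue(ra, sigma, importance, ra_pool, n_boot=1000, seed=0):
    """Fraction of resampled pathways at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(modulation(ra, sigma, importance))
    hits = 0
    for _ in range(n_boot):
        # Replace each node's RA with a value drawn at random from the pool
        resampled = [rng.choice(ra_pool) for _ in ra]
        if abs(modulation(resampled, sigma, importance)) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)
```

Because the importance scores and standard deviations are held fixed, the resampling asks whether the observed expression differences fall on the influential nodes of the pathway more than would be expected by chance.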

Steady-state analysis with scBONITA

scBONITA assumes that the observed cellular states are defined by states of multiple dynamic cellular processes or signaling pathways. While observed cells are samples along a dynamic trajectory of signaling cascades, analyzing attractors upon randomly sampling the rules from the ERS allows us to investigate the most common signaling states of a network under consideration. We sample 10 network-specific rule sets from the ERS inferred by scBONITA-RD to identify a set of reachable attractors. This is achieved by simulating the network synchronously, as performed in other studies104,143,144,145, starting from an observed state (i.e., a cell expression vector) until a steady state (or an attractor cycle) is reached. By starting simulations from the observed expression levels of all cells (i.e., all observed states), we can ensure that these simulations cover a large fraction of the available state space for a given network. In this way, all reachable attractor states, corresponding to observable signaling states, can be identified. The similarity between cells and attractors is quantified using the Hamming distance. Cells are assigned to the attractor that is closest to their expression data.
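The final assignment step can be illustrated with a short Python sketch. It assumes binarized expression vectors and attractor states given as equal-length tuples of 0/1 values, and it breaks ties by taking the first closest attractor; the tie-breaking rule and the function names are assumptions made for the example.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_cells(cells, attractors):
    """Map each cell index to the index of its closest attractor state."""
    assignment = {}
    for i, cell in enumerate(cells):
        distances = [hamming(cell, att) for att in attractors]
        assignment[i] = distances.index(min(distances))  # first minimum wins ties
    return assignment

def attractor_frequencies(assignment):
    """Count how many cells map to each attractor, to find the dominant states."""
    return Counter(assignment.values())
```

The attractors with the highest counts correspond to what the text calls the dominant signaling modes of a pathway within a cell subpopulation.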

Application of scBONITA on a publicly available data set

An scRNA-seq dataset obtained from four persons living with HIV (PLWH) before and during infection was selected to demonstrate the utility of the scBONITA pipeline on other datasets and to compare signaling dysregulations upon atherosclerosis in PLWH to signaling dysregulations upon HIV infection38. Log2-transformed TPM data and metadata processed and curated by the study authors were collected from the Single-Cell Portal database (https://singlecell.broadinstitute.org/single_cell/study/SCP256). The complete scBONITA pipeline was used to compare samples collected before infection to samples collected 1 year after infection. We retained the cluster labels assigned by the authors of the original study. A set of 210 KEGG networks was used with the scBONITA pipeline.

In silico evaluation of scBONITA

To show that scBONITA-RD is robust to the training set size, we selected a cluster of B cells from the HIV/AS dataset. This subset of the dataset was manipulated to either downsample or augment the size of the training dataset (number of cells) presented to scBONITA-RD. The training dataset was downsampled to 1% and 50% of the original number of cells for cluster 0 (“B cells naïve – 1”). To augment the dataset and thereby introduce heterogeneity, the dataset was increased to 200% of its original size by adding cells from a neighboring cluster of B cells. A set of 210 KEGG networks was used to evaluate the sizes of the ERS obtained by scBONITA-RD using these manipulated training datasets. The size of the ERS is used as a proxy for scBONITA’s ability to successfully cut down the state space of the possible rules for each node using cross-sectional scRNA-seq data.

Implementation and availability

scBONITA is implemented in Python 3 and C. Source code, Bash scripts, documentation, and tutorials are available at https://github.com/Thakar-Lab/scBONITA.

Ethics declarations

All methods were carried out in accordance with University of Rochester guidelines and regulations, and all experimental and study protocols were approved by the University of Rochester Institutional Review Board (#RSRB00063845). The project does not qualify as human subjects research (45 CFR 46.102) in that the activities do not involve human subjects as defined in federal regulations because this project utilizes anonymous information.