Mapping disease regulatory circuits at cell-type resolution from single-cell multiomics data

Chen, Xi; Wang, Yuan; Cappuccio, Antonio; Cheng, Wan-Sze; Zamojski, Frederique Ruf; Nair, Venugopalan D.; Miller, Clare M.; Rubenstein, Aliza B.; Nudelman, German; Tadych, Alicja; Theesfeld, Chandra L.; Vornholt, Alexandria; George, Mary-Catherine; Ruffin, Felicia; Dagher, Michael; Chawla, Daniel G.; Soares-Schanoski, Alessandra; Spurbeck, Rachel R.; Ndhlovu, Lishomwa C.; Sebra, Robert; Kleinstein, Steven H.; Letizia, Andrew G.; Ramos, Irene; Fowler, Vance G.; Woods, Christopher W.; Zaslavsky, Elena; Troyanskaya, Olga G.; Sealfon, Stuart C.

doi:10.1038/s43588-023-00476-5

Download PDF

Article
Open access
Published: 25 July 2023

Mapping disease regulatory circuits at cell-type resolution from single-cell multiomics data

Xi Chen^1,2,
Yuan Wang ORCID: orcid.org/0000-0001-7750-3587³,
Antonio Cappuccio⁴,
Wan-Sze Cheng⁴,
Frederique Ruf Zamojski ORCID: orcid.org/0000-0002-2451-8598⁴,
Venugopalan D. Nair⁴,
Clare M. Miller⁴,
Aliza B. Rubenstein⁴,
German Nudelman⁴,
Alicja Tadych²,
Chandra L. Theesfeld ORCID: orcid.org/0000-0002-8379-6600²,
Alexandria Vornholt⁴,
Mary-Catherine George⁴,
Felicia Ruffin⁵,
Michael Dagher⁵,
Daniel G. Chawla⁶,
Alessandra Soares-Schanoski⁴,
Rachel R. Spurbeck⁷,
Lishomwa C. Ndhlovu⁸,
Robert Sebra⁹,
Steven H. Kleinstein ORCID: orcid.org/0000-0003-4957-1544^6,10,
Andrew G. Letizia ORCID: orcid.org/0000-0002-7361-5917¹¹,
Irene Ramos ORCID: orcid.org/0000-0002-0223-0120^4,12,
Vance G. Fowler Jr⁵,
Christopher W. Woods⁵,
Elena Zaslavsky ORCID: orcid.org/0000-0002-4828-7771⁴^na1,
Olga G. Troyanskaya^1,2,3^na1 &
…
Stuart C. Sealfon ORCID: orcid.org/0000-0001-5791-1217⁴^na1

Nature Computational Science volume 3, pages 644–657 (2023)Cite this article

12k Accesses
1 Citations
21 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 31 August 2023

This article has been updated

A preprint version of the article is available at medRxiv.

Abstract

Resolving chromatin-remodeling-linked gene expression changes at cell-type resolution is important for understanding disease states. Here we describe MAGICAL (Multiome Accessibility Gene Integration Calling and Looping), a hierarchical Bayesian approach that leverages paired single-cell RNA sequencing and single-cell transposase-accessible chromatin sequencing from different conditions to map disease-associated transcription factors, chromatin sites, and genes as regulatory circuits. By simultaneously modeling signal variation across cells and conditions in both omics data types, MAGICAL achieved high accuracy on circuit inference. We applied MAGICAL to study Staphylococcus aureus sepsis from peripheral blood mononuclear single-cell data that we generated from subjects with bloodstream infection and uninfected controls. MAGICAL identified sepsis-associated regulatory circuits predominantly in CD14 monocytes, known to be activated by bacterial sepsis. We addressed the challenging problem of distinguishing host regulatory circuit responses to methicillin-resistant and methicillin-susceptible S. aureus infections. Although differential expression analysis failed to show predictive value, MAGICAL identified epigenetic circuit biomarkers that distinguished methicillin-resistant from methicillin-susceptible S. aureus infections.

Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data

Article Open access 12 April 2024

A single-cell atlas enables mapping of homeostatic cellular shifts in the adult human breast

Article Open access 28 March 2024

Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis

Article Open access 25 March 2024

Main

Gene expression can be modulated through the interplay of proximal and distal regulatory domains brought together in 3D space¹. Chromatin regulatory domains, transcription factors (TFs), and downstream target genes form regulatory circuits². Within circuits, the binding of TFs to chromatin regions and the 3D looping between these regions and gene promoters represent the mechanisms governing how TFs transform regulatory signals into changes in RNA transcription^3,4. In disease, these circuits could be dysregulated in a cell-type-specific manner and may not be observed from bulk samples⁵. Therefore, identifying the impact of disease on regulatory circuits requires a framework for mapping regulatory domains with chromatin accessibility changes to altered gene expression in the context of cell-type resolution⁶. Single-cell RNA sequencing (scRNA-seq) and single-cell transposase-accessible chromatin sequencing (scATAC-seq) characterizing disease states have improved the identification of differential chromatin sites and/or differentially expressed genes within individual cell types^5,7,8.

Yet, advances in single-cell assay technology have outpaced the development of methods to maximize the value of multiomics datasets for studying disease-associated regulation, especially for the regulatory interactions that are not directly measured by the omics data. Recent computational approaches^9,10,11,12 to support the multiomics data analysis demonstrate the promise of this area but still lack the capacity to resolve regulation changes within individual cell types, which precludes elucidating regulatory circuits affected by the disease or showing different responses in varying disease states.

To address these, we developed MAGICAL, a method that models coordinated chromatin accessibility and gene expression variation to identify circuits (both the units and their interactions) that differ between conditions. MAGICAL analyzes scRNA-seq and scATAC-seq data using a hierarchical Bayesian framework. To accurately detect differences in regulatory circuit activity between conditions, MAGICAL introduces hidden variables for explicitly modeling the transcriptomic and epigenetic signal variations between conditions and optimization against the noise in both scRNA-seq and scATAC-seq datasets. Because regulatory circuits are cell-type specific¹³, MAGICAL reconstructs them at cell-type resolution. Systematic benchmarking against multiple public datasets supported the accuracy of MAGICAL-identified regulatory circuits.

S. aureus, a bacterium often resistant to common antibiotics, is a major cause of severe infection and mortality^14,15. Using single-cell multiomics data generated from peripheral blood mononuclear cell (PBMC) samples of S. aureus-infected subjects and healthy controls, MAGICAL identified host response regulatory circuits that are modulated during S. aureus bloodstream infection, and circuits that discriminate the responses to methicillin-resistant S. aureus (MRSA) and methicillin-susceptible S. aureus (MSSA). Genes in the host circuits accurately predicted S. aureus infection in multiple validation datasets. Moreover, in contrast to conventional differential analysis that failed to identify specific genes for robust antibiotic-sensitivity prediction, MAGICAL-identified circuit genes can differentiate MRSA from MSSA. Therefore, MAGICAL can be used for multiomics-based gene signature development, providing a bioinformatic solution that can improve disease diagnosis.

Results

MAGICAL framework

MAGICAL identifies disease-associated regulatory circuits by comparing single-cell multiomics data (scRNA-seq and scATAC-seq) from disease and control samples (Fig. 1a). The framework incorporates TF motifs and chromatin topologically associated domain (TAD) boundaries as prior information to infer regulatory circuits comprising chromatin regulatory sites, modulatory TFs, and downstream target genes for each cell type. In brief, to build candidate disease-modulated circuits, differentially accessible sites (DAS) within each cell type are first associated with TFs by motif sequence matching and then linked to differentially expressed genes (DEG) in that cell type by genomic localization within the same TAD. Next, MAGICAL uses a Bayesian framework to iteratively model chromatin accessibility and gene expression variation across cells and samples in each cell type and to estimate the confidence of TF–peak and peak–gene linkages for each candidate circuit (Fig. 1b).

**Fig. 1: Overview of MAGICAL for mapping disease-associated regulatory circuits from scRNA-seq and scATAC-seq data.**

To accurately identify varying circuits between different conditions, MAGICAL explicitly models signal and noise in chromatin accessibility and gene expression data (see Methods section ‘MAGICAL’). A TF–peak binding variable and a hidden TF activity variable are jointly estimated to fit to the chromatin accessibility variation across cells from the conditions being compared. These two variables are then used together with a peak–gene looping variable to fit the gene expression variation. Using Gibbs sampling, MAGICAL iteratively estimates variable values and optimizes the states of circuit TF–peak–gene linkages. Finally, high-confidence circuits fitting the signal variation in both data types are selected.

TF activity represents the regulatory capacity (protein level) of a particular TF protein^16,17, which is distinct from TF expression. For each TF, we assume its hidden TF activities following an identical distribution across cells in the same cell type and the same sample, regardless of whether the cells are from the scATAC-seq assay, the scRNA-seq assay, or both. MAGICAL iteratively learns the activity distribution for each TF and estimates the specific activities of all TFs in each cell (Supplementary Fig. 1). This procedure eliminates the requirement of cell-level pairing of RNA-seq and ATAC-seq data. Thus, MAGICAL is a general tool that can analyze single-cell true multiome or sample-paired multiomics datasets.

We validated MAGICAL in multiple ways, demonstrating that it infers regulatory circuits accurately (Fig. 1c). MAGICAL-inferred linkages between chromatin sites and genes were validated using experimental 3D chromatin interactions. The resulting circuit genes, sites, and their regulatory TFs were evaluated in multiple independent studies. And finally, as one example of utility, we showed that the circuit genes can be used as features to classify disease states, providing a bioinformatics solution to challenging diagnostic problems.

Comparative analysis of performance

MAGICAL is a scalable framework. It can infer regulatory circuits of TFs, chromatin sites, and genes with differential activities between contrast conditions or infer regulatory circuits with active chromatin sites and genes in a single condition. Because existing integrative methods^11,12,18 can only be applied to single-condition data, to provide a comparative assessment of the performance of MAGICAL, we restricted MAGICAL to the single-condition data analysis possible with existing methods.

For peak–gene looping inference, we compared MAGICAL to the TRIPOD¹¹ and FigR¹⁸ methods, using the same benchmark single-cell multiome datasets as used by the authors reporting these methods. In the comparison of MAGICAL with TRIPOD using a 10x multiome single-cell dataset, MAGICAL-inferred peak–gene loops showed significantly higher enrichment of experimentally observed chromatin interactions in blood cells in the 4DGenome database¹⁹ (P < 0.0001, two-sided Fisher’s exact test, Supplementary Fig. 2a), the same validation data used by TRIPOD developers. MAGICAL also significantly outperformed FigR on the application to a GM12878 SHARE-seq dataset¹⁰. In that case, the peak–gene loops in MAGICAL-selected circuits had significantly higher enrichment of H3K27ac-centric chromatin interactions²⁰ than did FigR (P < 0.0001, two-sided Fisher’s exact test, Supplementary Fig. 2b).

Because the MAGICAL framework, unlike TRIPOD and FigR, used chromatin TAD boundaries as prior information, we evaluated whether the improvement in performance resulted solely from this additional information. To investigate this, we eliminated the use of TAD boundaries and modified MAGICAL for this test by assigning candidate linkages between peaks and genes within 500 kb (a naive distance prior). As shown in Supplementary Fig. 2a,b, even without the prior TAD information, MAGICAL still outperformed the competing methods (P < 0.001, two-sided Fisher’s exact test). Overall, these results suggest that in addition to the benefit of priors, explicit modeling of signal and noise in both chromatin accessibility and gene expression data increased the accuracy of peak–gene looping identification.

MAGICAL analysis of COVID-19 single-cell multiomics data

To demonstrate the accuracy of the primary application of MAGICAL on contrast-condition data to infer disease-modulated circuits, we applied MAGICAL to sample-paired PBMC scRNA-seq and scATAC-seq data from individuals infected with SARS-CoV-2 and healthy controls⁵. Because immune responses in patients with COVID-19 differ according to disease severity^21,22, MAGICAL inferred the regulatory circuits for mild and severe clinical groups separately. The chromatin sites and genes in the identified circuits were validated using newly generated and publicly available independent COVID-19 single-cell datasets (Fig. 2a). We primarily focused on three cell types that have been found to show widespread gene expression and chromatin accessibility changes in response to SARS-CoV-2 infection^23,24, including CD8 effector memory T (TEM) cells, CD14 monocytes (Mono), and natural killer (NK) cells. In total, MAGICAL identified 1,489 high-confidence circuits (1,404 sites and 391 genes) in these cell types for mild and severe clinical groups (Supplementary Table 1; Methods section ‘MAGICAL analysis of COVID-19 single-cell multiomics data’).

**Fig. 2: Validation of COVID-19-associated circuit chromatin sites and genes.**

To confirm the circuit chromatin sites selected by MAGICAL for mild COVID-19, we generated an independent PBMC scATAC-seq dataset from six people infected with SARS-CoV-2 with mild symptoms and three uninfected (polymerase chain reaction (PCR)-negative) controls (Fig. 2b; Supplementary Table 2). Approximately 25,000 quality cells were selected after quality-control (QC) analysis. These cells were integrated, clustered, and annotated using ArchR²⁵ (Supplementary Fig. 3; Supplementary Table 3). Peaks were called from each cell type using MACS2²⁶. In total, 284,909 peaks were identified (Supplementary Table 4). For the three selected cell types, differential analysis between COVID-19 and control groups returned 3,061 sites for CD8 TEM, 1,301 sites for CD14 Mono, and 1,778 sites for NK (Supplementary Table 5; Methods section ‘COVID-19 PBMC scATACseq data analysis’). This produced three validation peak sets for mild COVID-19 infection. For severe COVID-19, an existing study focused on T cells identified specific chromatin activity changes with severe COVID-19 in CD8 T cells²⁷. We used their reported chromatin sites to validate the circuit chromatin sites identified in CD8 T cells. In all four validation sets, the precision (proportion of sites that are differential in the validation data) of the MAGICAL-selected chromatin sites was significantly higher than the original DAS (P < 0.001, two-sided Fisher’s exact test, Fig. 2c,d).

When multiple potential chromatin regulatory loci are identified in the vicinity of a specific gene, it is commonly assumed that the locus closest to the transcriptional starting site (TSS) is likely to be the most important regulatory site. Challenging this assumption, however, are experimental studies that show genes may not be regulated by the nearest region^28,29. Supporting the importance of more distal regulatory loci, MAGICAL-selected chromatin sites significantly outperformed the nearest DAS to the TSS of DEG or all DAS within the same TAD with DEG, and the improvement is substantial (precision is approximately 50% better with MAGICAL, P < 0.05, two-sided Fisher’s exact test, Fig. 2c,d).

To validate the circuit genes modulated by mild or severe COVID-19, we used genes reported by external COVID-19 single-cell studies^21,30,31. In total, we collected six validation gene sets (three cell types for mild COVID-19 and three cell types for severe COVID-19). The precision of MAGICAL-selected circuit genes is significantly higher than that of original DEG in all validations (precision is approximately 30% better with MAGICAL, P < 0.05, two-sided Fisher’s exact test, Fig. 2e,f). These results confirmed the increased accuracy of disease association for both chromatin sites and genes in MAGICAL-identified regulatory circuits.

MAGICAL analysis of S. aureus single-cell multiomics data

We applied MAGICAL to the clinically important challenge of distinguishing MRSA and MSSA infections^32,33,34. We profiled sample-paired scRNA-seq and scATAC-seq data using human PBMCs from adults whose blood cultures were positive for S. aureus (10 MRSA and 11 MSSA), and from 23 uninfected control subjects (Fig. 3a; Supplementary Table 6). To integrate scRNA-seq data from all samples, we implemented a Seurat-based³⁵ batch correction and cell type annotation pipeline (Methods section ‘S. aureus scRNA-seq data analysis’). In total, 276,200 quality cells were selected and labeled (Fig. 3b; Supplementary Fig. 4; Supplementary Table 7). For scATAC-seq data, we integrated the fragment files from quality samples using ArchR²⁵ and selected and annotated 70,174 quality cells (Fig. 3c; Supplementary Fig. 5; Supplementary Table 8). In total, 388,860 peaks were identified (Supplementary Fig. 5b; Supplementary Table 9; Methods section ‘S. aureus scATAC-seq data analysis’). In total, 13 major cell types that surpassed the 200-cell threshold in both scRNA-seq and scATAC-seq data were selected for subsequent analysis (Supplementary Fig. 6). Differential analysis for three contrasts (MRSA versus control, MSSA versus control, and MRSA versus MSSA) in each cell type returned a total of 1,477 DEG and 23,434 DAS (Supplementary Fig. 7; Supplementary Tables 10 and 11).

**Fig. 3: MAGICAL accurately identified distal regulatory chromatin sites and epi-driven genes associated with S. *aureus* infection.**

MAGICAL identified 1,513 high-confidence regulatory circuits (1,179 sites and 371 genes) within cell types for three contrasts (MRSA versus control, MSSA versus control, and MRSA versus MSSA) (Supplementary Table 12; Methods: MAGICAL analysis of S. aureus single-cell multiomics data). It has been reported that activation of CD14 Mono plays a principal role in response to S. aureus infection^36,37. In MAGICAL analysis, CD14 Mono showed the highest number of regulatory circuits (Fig. 3d). Comparing circuits between cell types we found that these disease-associated circuits are cell-type-specific (Fig. 3e). For example, circuits rarely overlapped between very distinct cell types like monocytes and T cells. Between relevant cell types like CD14 Mono and CD16 Mono, or between subtypes of T cells, most circuits are still specific for one cell subtype. These circuits were further validated using cell type-specific chromatin interactions reported in a reference promoter capture (pc) Hi-C dataset¹³. In all the cell types for which the cell-type-specific pcHi-C data was available (B cells, CD4 T cells, CD8 T cells, CD14 Mono), the circuit peak–gene interactions showed significant enrichment of pcHi-C interactions in matched cell types (Fig. 3f; P < 0.01, one-sided hypergeometric test). For comparison, we also performed the peak–gene interaction enrichment analysis between different cell types, finding significantly lower enrichment levels. These results indicate the cell-type specificity of MAGICAL-identified circuits.

In CD14 Mono, MAGICAL identified AP-1 complex proteins as the most important regulators, especially at chromatin sites with increased activity in cells exposed to infections (Fig. 3g). This finding is consistent with the importance of this protein complex in gene regulation in response to a variety of infections^5,38,39. Supporting the accuracy of the identified TFs, we compared circuit chromatin sites with ChIP-seq peaks from the Cistrome database⁴⁰. The most similar TF ChIP-seq profiles were from AP-1 complex proteins JUN and FOS in blood or bone marrow samples (Supplementary Fig. 8). Moreover, functional enrichment analysis⁴¹ of the circuit genes showed that cytokine signaling, a known pathway mediated by AP-1 complex and associated with the inflammatory responses in macrophages^42,43, was the most enriched (adjusted P = 2.4 × 10^–11, one-sided hypergeometric test).

MAGICAL modeled regulatory effects of both proximal and distal sites on genes. We examined the chromatin site location relative to the target gene TSS, for the identified circuits in CD14 Mono. Compared to all ATAC peaks called around the circuit genes, a substantially increased proportion of circuit chromatin sites were located 15Kb to 25Kb away from the TSS (Fig. 3h). This pattern is consistent with the 24Kb median enhancer distance found by CRISPR-based perturbation in a blood cell line⁴⁴. In addition, nearly 50% of circuit chromatin sites were overlapping with enhancer-like regions in the ENCODE database⁴⁵, further emphasizing that MAGICAL circuits are enriched in distal regulatory loci. We also found that these circuit chromatin sites were significantly enriched in inflammatory-associated genomic loci reported in the genome-wide association studies (GWAS) catalog database⁴⁶, suggesting active host epigenetic responses to infectious diseases (Supplementary Fig. 9; P < 0.005 when compared to control diseases, two-sided Wilcoxon rank sum test). Notably, one distal chromatin site (hg38 chr6: 32,484,007–32,484,507) looping to gene HLA-DRB1 is within the most significant GWAS region (hg38 chr6: 32,431,410–32,576,834) previously reported to associate with S. aureus infection⁴⁷.

We finally compared circuit genes to existing epi-genes whose transcriptions were significantly driven by epigenetic perturbations in CD14 Mono⁴⁸. MAGICAL-identified circuit genes were significantly enriched with epi-genes (Fig. 3i; adjusted P < 0.005, one-sided hypergeometric test) while the remaining DEG not selected by MAGICAL, or DEG mappable with DAS either within the same topological domains or closest to each other showed no evidence of being epigenetically driven. These results suggest that MAGICAL accurately identified regulatory circuits activated in response to S. aureus infection.

S. aureus infection prediction

Early diagnosis of S. aureus infection and the strain’s antibiotic sensitivity is critical to choosing appropriate treatment for this life-threatening condition. We first evaluated whether the MAGIC-identified circuit genes that are common to MRSA and MSSA infections could provide a robust signature for predicting the diagnosis of S. aureus infection in general. Within each cell type, we selected circuit genes common to both the MRSA versus control and MSSA versus control analyses, resulting in 152 genes (Fig. 4a; Supplementary Table 12). To evaluate the prediction accuracy of these molecular features on S. aureus infection, we collected external, public expression data of S. aureus infected subjects. In total, we found one adult whole-blood⁴⁹ and two pediatric PBMC bulk microarray datasets^50,51 that comprised a total of 126 subjects infected with S. aureus and 68 uninfected controls. The use of pediatric validation data has the advantage of providing a much more rigorous test of the robustness of MAGICAL-identified circuit genes for classifying disease samples in this very different cohort.

**Fig. 4: MAGICAL-identified circuit genes robustly predict S. *aureus* infection and bacteria antibody sensitivity.**

To allow validation using public bulk transcriptome datasets, we refined the 152 circuit genes set by selecting those with robust performance in our dataset at pseudobulk level. We calculated an area under the receiver operating characteristic curve (AUROC) for each circuit gene by classifying S. aureus infected and control subjects using pseudobulk gene expression (aggregated from the discovery scRNA-seq data). In total, 117 circuit genes with AUROCs greater than 0.7 were selected (Supplementary Table 13; Supplementary Fig. 10a). Functional gene enrichment analysis showed that interleukin (IL)-17 signaling was significantly enriched (adjusted P = 2.4 × 10^–4, one-sided hypergeometric test), including genes from the AP-1, Hsp90, and S100 families. IL-17 has been found to be essential for the host defense against cutaneous S. aureus infection in mouse models⁵². We trained a support vector machine (SVM) model using the selected circuit genes as features and the discovery pseudobulk gene expression data as input. We then applied the trained SVM model to each of the three validation datasets. The model achieved high prediction performance on all datasets, with AUROCs from 0.93 to 0.98 (Fig. 4a).

This generalizability of circuit genes for predicting infection in different cohorts suggested that MAGICAL identifies regulatory processes that are fundamental to the host response to S. aureus sepsis. We further evaluated this by comparing the 117 circuit genes to 366 filtered DEG (with per gene AUROC > 0.7 in the discovery pseudobulk gene expression data). We examined the differential expression π value⁵³ (a statistic score that combines both fold change (FC) and P-values) of genes in the validation datasets and found significantly higher π values for the circuit genes (Supplementary Fig. 10b; P = 9.0 × 10^–3, one-sided Wilcoxon rank sum test).

S. aureus antibiotic sensitivity prediction

We then addressed the more challenging problem of predicting strain antibiotic sensitivity among S. aureus infected subjects. When we tested the predictive models trained with DEG for the contrast of MRSA and MSSA on three pediatric PBMC microarray datasets^50,51,54 (comprising a total of 66 MRSA and 45 MSSA samples), we did not find predictive value (median of prediction areas under the curve (AUCs) close to 0.5; Supplementary Fig. 10d–f). And in all tests, the statistical difference of DEG-based prediction scores between MRSA and MSSA samples in the validation datasets was never significant. These results suggest that using host scRNA-seq data alone fails to identify robust molecular features for predicting the antibiotic sensitivity of the infected strain. Our observation echoes previous studies showing that in some challenging cases, differential expression analysis using RNA-seq data had limited power to identify robust features for disease-control sample classification⁵⁵.

With MAGICAL we identified 53 circuit genes from the comparative multiomics data analysis between MRSA and MSSA (Supplementary Table 14). A model trained using 32 circuit genes that were robustly differential in the discovery pseudobulk data (per gene discovery AUROC > 0.7, Supplementary Fig. 10c) best distinguished antibiotic-resistant and antibiotic-sensitive samples in all three validation datasets, with AUROCs from 0.67 to 0.75 (Fig. 4b). And the statistical difference between prediction scores of MRSA and MSSA samples was significant (P = 9.2 × 10^–3, two-sided Wilcoxon rank sum test). The success of the circuit-gene-based model demonstrated that MAGICAL captured generalizable regulatory differences in the host immune response to these closely related bacterial infections.

Discussion

MAGICAL addressed the previously unmet need of identifying differential regulatory circuits based on single-cell multiomics data from contrast conditions. Critically, it identifies regulatory circuits involving distal chromatin sites. The previously difficult-to-predict distal regulatory sites are increasingly recognized as key for understanding gene regulatory mechanisms. As MAGICAL uses DAS and DEG called from a pre-selected cell type, for less distinct cell types or conditions, it is harder for MAGICAL to infer circuits at cell-type resolution as there will be few candidate peaks and genes. Also, MAGICAL analyzes each cell type separately, and cell-type specificity is not directly modeled for disease circuit identification. Incorporating an approach to directly identify cell-type-specific circuits regulated in disease conditions would be valuable. In future work, we hope to extend the MAGICAL framework to improve circuit identification when cell types are poorly defined and to model cell-type specificity.

Methods

Human participants

The COVID-19 study protocol was approved by the Naval Medical Research Center institutional review board (protocol number NMRC.2020.0006) in compliance with all applicable federal regulations governing the protection of human subjects. The S. aureus sepsis study protocol was reviewed and approved by the Duke Medical School institutional review board (protocol number Pro00102421). Subjects provided written informed consent prior to participation.

Statistics and reproducibility

No statistical methods were used to pre-determine sample sizes. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

S. aureus patient and control samples selection

Patients with culture-confirmed S. aureus bloodstream infection transferred to Duke University Medical Center were eligible if pathogen speciation and antibiotic susceptibilities were confirmed by the Duke Clinical Microbiology Laboratory. DNA and RNA samples, PBMCs, clinical data, and the bacterial isolate from the subject were cataloged using an institutional review board-approved notification of decedent research. We excluded samples with prior enrollment of the patient in this investigation (to ensure statistical independence of observations) or they were polymicrobial (that is, more than one organism in blood or urine culture). In total, 21 adult patients were selected (10 MRSA and 11 MSSA). None of them received any antibiotics in the 24 h before the bloodstream infection. Control samples were obtained from uninfected healthy adults matching the sample number and age range of the patient group. In total, 23 samples were collected from two cohorts: 14 controls (provided by Weill Cornell Medicine, New York, NY), and 9 controls (provided by the Battelle Memorial Institute, Columbus, OH). Meta information of the selected subjects are provided in Supplementary Table 6.

PBMC thawing

Frozen PBMC vials were thawed in a water bath at 37 °C for 1–2 minutes and placed on ice. Roswell Park Memorial Institute (RPMI) medium with 20% fetal bovine serum (FBS) (500 μl) was added dropwise to the thawed vial, the content was aspirated and added dropwise to 9 ml of RPMI/20% FBS. The tube was gently inverted to mix, before being centrifuged at 300g for 5 min. After removal of the supernatant, the pellet was resuspended in 1–5 ml of RPMI/10% FBS depending on the size of the pellet. Cell count and viability were assessed with Trypan Blue on a Countess II cell counter (Invitrogen).

S. aureus scRNA-seq data generation

ScRNA-seq was performed (10x Genomics, Pleasanton, CA), following the Single Cell 3′ Reagents Kits V3.1 User Guidelines. Cells were filtered, counted on a Countess instrument, and resuspended at a concentration of 1,000 cells μl^–1. The number of cells loaded on the chip was determined based on the 10x Genomics protocol. The 10x chip (Chromium Single Cell 3′ Chip kit G PN-200177) was loaded to target 5,000–10,000 final cells. Reverse transcription was performed in the emulsion and complementary DNA was amplified following the Chromium protocol. Quality control and quantification of the amplified cDNA were assessed on a Bioanalyzer (High-Sensitivity DNA Bioanalyzer kit) and the library was constructed. Each library was tagged with a different index for multiplexing (Chromium i7 Multiplex Single Index Plate T Set A, PN-2000240) and quality controlled using a Bioanalyzer prior to sequencing.

S. aureus scRNA-seq data analysis

Reads of scRNA-seq experiments were aligned to human reference genome (hg38) using 10x Genomics Cell Ranger software (version 1.2). The filtered feature-by-barcode count matrices were then processed using Seurat³⁵. Quality cells were selected as those with more than 400 features (transcripts), fewer than 5,000 features, and less than 10% of mitochondrial content (Supplementary Fig. 4; Supplementary Table 7). Cell cycle phase scores were calculated using the canonical markers for G2M and S phases embedded in the Seurat package. Finally, the effects of mitochondrial reads and cell cycle heterogeneity were regressed out using SCTransform.

To integrate cells from heterogeneous disease samples, we first built a reference by integrating and annotating cells from the uninfected control samples using a Seurat-based pipeline. For batch correction, we identified the intrinsic batch variants and used Seurat to integrate cells together with the inferred batch labels. All control samples were integrated into one harmonized query matrix. Each cell was assigned a cell-type label by referring to a reference PBMC single cell dataset. The cell-type label of each cell cluster was determined by most cell labels in each. Canonical markers were used to refine the cell-type label assignment. This integrated control object was used as reference to map the infected samples.

To avoid artificially removing the biological variance between each infected sample during batch correction, we computationally predicted and manually refined cell types for each sample. All infection samples were projected onto the UMAP (Uniform Manifold Approximation and Projection) of the control object for visualization purpose. In total, 276,200 high-quality cells and 19 cell types with at least 200 cells in each were selected for the subsequent analysis. Within each cell type, DEG between contrast conditions were first called using the Findmarkers function of the Seurat V4 package³⁵ with default parameters. DEG with Wilcoxon test false discovery rate (FDR) < 0.05, |log₂(FC)| >0.1 and actively expressed in at least 10% of cells (pct > 0.1) from either condition were selected. To correct potential bias caused by the different sequencing depth between samples, we ran DEseq2⁵⁶ on the aggregated pseudobulk gene expression data. Refined DEG passing pseudobulk differential statistics P < 0.05 and |log₂(FC)| >0.3 were selected as the final DEG (Supplementary Table 10).

Nuclei isolation for scATACseq

Thawed PBMCs were washed using phosphate-buffered saline (PBS) with 0.04% bovine serum albumin (BSA). Cells were counted and 100,000–1,000,000 cells were added to a 2 ml microcentrifuge tube. Cells were centrifuged at 300g for 5 min at 4 °C. The supernatant carefully completely removed, and 0.1X lysis buffer (1x: 10 mM Tris-HCl pH 7.5, 10 mM NaCl, 3 mM MgCl₂, nuclease-free H₂O, 0.1% v/v NP-40, 0.1% v/v Tween-20, 0.01% v/v digitonin) was added. After 3 min incubation on ice, 1 ml of chilled wash buffer was added. The nuclei were pelleted at 500g for 5 min at 4 °C and resuspended in a chilled diluted nuclei buffer (10x Genomics) for scATAC-seq. Nuclei were counted and the concentration was adjusted to run the assay.

S. aureus scATAC-seq data generation

ScATAC-seq was performed immediately after nuclei isolation and following the Chromium Single Cell ATAC Reagent Kits V1.1 User Guide (10x Genomics, Pleasanton, CA). Transposition was performed in 10 μl at 37 °C for 60 min on at least 1,000 nuclei, before loading of the Chromium Chip H (PN-2000180). Barcoding was performed in the Gel Bead-in-Emulsion (GEMs) (12 cycles) following the Chromium protocol. After post-GEM cleanup, libraries were prepared following the protocol and were indexed for multiplexing (Chromium i7 Sample Index N, Set A kit PN-3000427). Each library was assessed on a Bioanalyzer (High-Sensitivity DNA Bioanalyzer kit).

S. aureus scATAC-seq data analysis

Reads of scATAC-seq experiments were aligned to human reference genome (hg38) using 10x Genomics Cell Ranger software (version 1.2). The resulting fragment files were processed using ArchR²⁵. Quality cells were selected as those with TSS enrichment >12, the number of fragments >3,000 and <30,000, and nucleosome ratio <2 (Supplementary Fig. 5a; Supplementary Table 8). The likelihood of doublet cells was computationally assessed using the addDoubletScores function and cells were filtered using the filterDoublets function with default settings. Cells passing quality and doublet filters from each sample were combined into a linear dimensionality reduction using the addIterativeLSI function with the input of the tile matrix (read counts in binned 500 bp across the whole genome) with iterations = 2 and varFeatures = 20,000. This dimensionality reduction was then corrected for batch effect using the Harmony method⁵⁷, via the addHarmony function. The cells were then clustered based on the batch-corrected dimensions using the addClusters function. We annotated scATAC-seq cells using the addGeneIntegrationMatrix function, referring to a labeled multimodal PBMC single cell dataset. Doublet clusters containing a mixture of many cell types were manually identified and removed. In total, 70,174 high-quality cells and 13 cell types with at least 200 cells in each were selected.

Peaks were called for each cell type using the addReproduciblePeakSet function with the MACS2 peak caller²⁶ (Supplementary Fig. 5b). In total, 388,859 peaks were identified (Supplementary Table 9). Within each cell type, differentially accessible chromatin sites (DAS) between contrast conditions (MRSA versus control, MSSA versus control or MRSA versus MSSA) were called from the single cell chromatin accessibility count data using the getMarkerFeatures function²⁵, with parameter settings as testMethod = wilcoxon, bias = log10(nFrags), normBy = ReadsInPeaks, and maxCells = 15,000. Peaks with single cell differential statistics FDR < 0.05, |log₂(FC)| >0.1, and actively accessible in at least 10% of cells (pct > 0.1) from either condition were selected as DAS. Owing to the high false positive rate in single-cell-based differential analysis⁵⁸, we further refined the DAS by fitting a linear model to the aggregated and normalized pseudobulk chromatin accessibility data and tested DAS individually about their covariance with sample conditions⁵⁶. Refined DAS passing pseudobulk differential statistics P < 0.05 and |log₂(FC)| >0.3 between the contrast conditions were selected as the final DAS (Supplementary Table 11).

MAGICAL

To build candidate regulatory circuits, TFs were mapped to the candidate chromatin sites by searching for human TF motifs from the chromVARmotifs library⁵⁹ using the addMotifAnnotations function (ArchR). The TF binding sites were then linked with the candidate genes by requiring them in the same TAD within boundaries. Then, a candidate circuit is constructed with a chromatin site and a gene in the same domain, with at least one TF binding at the site.

For each cell type (that is, the i-th cell type), MAGICAL inferred the confidence of TF–peak binding and peak–gene looping in each candidate circuit using a hierarchical Bayesian framework with two models: a model of TF–peak binding confidence (B) and hidden TF activity (T) to fit chromatin accessibility (A) for M TFs and P chromatin sites in K_A,S,i cells with scATAC-seq measures from S samples; a second model of peak–gene interaction (L) and the refined (noise removed) regulatory region activity (BT) to fit gene expression (R) of G genes in K_R,S,i cells with scRNA-seq measures from the same S samples.

$${{{A}}}_{P\times {K}_{A,S},i}={{{B}}}_{P\times M,i}{{{T}}}_{M\times {K}_{A,S},{i}}+{{{N}}}_{P\times {K}_{A,S},{i}},$$

(1)

$${{{R}}}_{G\times {K}_{R,S},{i}}={{{L}}}_{G\times P,i}{{{B}}}_{P\times M,i}{{{T}}}_{M\times {K}_{R,S},{i}}+{{{N}}}_{G\times {K}_{R,S},{i}},$$

(2)

${{{A}}}_{P\times {K}_{A,S},i}$ was a P by K_A,S,i matrix with each element ${a}_{p,{k}_{A,s},i}$ representing the ATAC read count of p-th chromatin site (ATAC peak) in the k_A,s-th cell in the s-th sample.

${{{R}}}_{G\times {K}_{R,S},i}$ was a G by K_R,S,i matrix with each element ${r}_{g,{k}_{R,s},i}$ representing the RNA read count of g-th gene in the k_R,s-th cell of the s-th sample.

${{{N}}}_{P\times {K}_{A,S},{i}}$ and ${{{N}}}_{G\times {K}_{R,S},{i}}$ represented data noise corresponding to ${{{A}}}_{P\times {K}_{A,S},{i}}$ and ${{{R}}}_{G\times {K}_{R,S},{i}}$.

${{{B}}}_{P\times M,i}$ was a P by M matrix with each element ${b}_{p,m,i}$ representing the binding confidence of the m-th TF on the p-th candidate chromatin site.

${{{L}}}_{G\times P,i}$ was a G by P matrix with each element ${l}_{p,g,i}$ representing the interaction between the p-th chromatin site and the g-th gene.

${{{T}}}_{M\times {K}_{A,S},{i}}$ was an M by K_A,S ,i matrix with each element $\,{t}_{m,{k}_{A,s},i}$ representing the hidden TF activity of the m-th TF in the ${k}_{A,s}$-th ATAC cell of s-th sample.

${{{T}}}_{M\times {K}_{R,S},{i}}$ was an M by K_R,S ,i matrix with each element $\,{t}_{m,{k}_{R,s},i}$ representing the hidden TF activity of the m-th TF in the ${k}_{R,s}$-th RNA cell of s-th sample.

${{{T}}}_{M\times {K}_{A,S},{i}}$ and ${{{T}}}_{M\times {K}_{R,S},{i}}$ were both extended from the same ${{{T}}}_{M\times S,{i}}$ (with elements ${t}_{m,s,i}$) by assuming that in the i-th cell type and the s-th sample, the m-th TF’s regulatory activities in all ATAC cells and all RNA cells followed an identical distribution of a single variable ${t}_{m,s,i}$. Therefore, ${K}_{A,S,i}$ and ${K}_{R,S,i}$ can be different numbers and MAGICAL will only estimate the matrix ${{{T}}}_{M\times S,{i}}$.

To select high-confidence regulatory circuits, MAGICAL estimated the confidence (probability) of TF–peak binding ${{{B}}}_{P\times M,i}$ and peak–gene interaction ${{{L}}}_{G\times P,i}$ together with the hidden variable ${{{T}}}_{M\times S,{i}}$ in a Bayesian framework.

$$P\left({{B}},{{T}},{{L}}{\rm{|}}{{A}},{{R}}\right)\propto P({{R}}{\rm{|}}{{L}},{{B}},{{T}}\,)P({{A}}{\rm{|}}{{B}},{{T}}\,)P({{L}})P({{B}})P\left(T\,\right).$$

(3)

Based on the regulatory relationship among chromatin sites, upstream TFs, and downstream genes (as illustrated in Fig. 1), the posterior probability of each variable can be approximated as:

$$P\left({{T}}{{|}}{{A}}{{,}}{{B}}\right)\propto P({{A}}{{|}}{{B}}{{,}}{{T}}\,)P\left({{T}}\,\right),$$

(4)

$$P\left({{B}}{{|}}{{A}}{{,}}{{T}}\,\right)\propto P({{A}}{{|}}{{B}}{{,}}{{T}}\,)P\left({{B}}\right),$$

(5)

$$P\left({{L}}{\rm{|}}{{R}},{{B}},{{T}}\,\right)\propto P\left({{R}}{\rm{|}}{{L}},{{B}},{{T}}\,\right)P({{L}}).$$

(6)

Although the prior states of ${b}_{p,m,i}$ and ${l}_{p,g,i}$ were obtained from the prior information of TF motif–peak mapping and topological-domain-based peak–gene pairing, their values were unknown. We assumed zero-mean Gaussian priors for B, L and the hidden variable T by assuming that positive regulation and negative regulation would have the same priors, which is likely to be true given the fact that there were usually similar numbers of upregulated and downregulated peaks and genes after the differential analysis. We set a high variance (non-informative) in each prior distribution to allow the algorithm to learn the distributions from the input data.

$${b}_{p,m,i} \sim\mathrm{normal}(\,{\mu }_{B},{\sigma }_{B}^{2}),$$

(7)

$${t}_{m,s,i} \sim\mathrm{normal}(\,{\mu }_{T},{\sigma }_{T}^{2}),$$

(8)

$${l}_{p,g,i} \sim\mathrm{normal}(\,{\mu }_{L},{\sigma }_{L}^{2}).$$

(9)

where $(\,{\mu }_{B},{\sigma }_{B}^{2})$, $(\,{\mu }_{T},{\sigma }_{T}^{2})$, and $(\,{\mu }_{L},{\sigma }_{L}^{2})$ are hyperparameters representing the prior mean and variance of TF–peak binding, TF activity, and peak–gene looping variables.

The likelihood functions $P({{A|B}}{{,}}{{T}}\,)$ and $P\left({{R|L}}{{,}}{{B}}{{,}}{{T}}\,\right)$ represent the fitting performance of the estimated variables to the input data. These two conditional probabilities are equal to the probabilities of the fitting residues ${{{N}}}_{P\times {K}_{A,S},{i}}$ and ${{{N}}}_{G\times {K}_{R,S},{i}}$, for which we assumed zero-mean Gaussian distributions.

$${{A}}|{{B}}{{,}}{{T}} \sim\mathrm{normal}({\mu }_{{N}_{A}},{\sigma }_{{N}_{A}}^{2}),\,{\sigma }_{{N}_{A}}^{2} \sim\mathrm{{inverse}\,{gamma}}({\alpha }_{{N}_{A}},\,{\beta }_{{N}_{A}}),$$

(10)

$${{R}}|{{L}}{{,}}{{B}}{{,}}{{T}} \sim\mathrm{normal}({\mu }_{{N}_{R}},{\sigma }_{{N}_{R}}^{2}),{\sigma }_{{N}_{R}}^{2} \sim\mathrm{{inverse}\,{gamma}}({\alpha }_{{N}_{R}},\,{\beta }_{{N}_{R}}),$$

(11)

where $({\mu }_{{N}_{A}},{\sigma }_{{N}_{A}}^{2})$ and $({\mu }_{{N}_{R}},{\sigma }_{{N}_{R}}^{2})$ are hyperparameters representing the prior mean and variance of data noise in the ATAC and RNA measures. Here, the variance of the signal noise is modeled using inverse gamma distributions, with hyperparameters $({\alpha }_{{N}_{A}},\,{\beta }_{{N}_{A}})$ and $({\alpha }_{{N}_{R}},\,{\beta }_{{N}_{R}})$ to control the variance of fitting residues (very low probabilities on large variances).

Then, the posterior probability of each variable defined in equations (4)–(6) was still a Gaussian distribution with poster mean $\hat{\mu }$ and variance $\hat{\sigma }$ as shown below:

$${\hat{b}}_{p,m,i} \sim\mathrm{normal}(\,{\hat{\mu }}_{B,m,i},{{{\hat{\sigma }}_{B,m,i}}}^{2}),$$

(12)

$${\hat{t}}_{m,s,i} \sim\mathrm{normal}(\,{\hat{\mu }}_{T,m,s,i},{{\hat{\sigma }}_{T,m,s,i}}^{2}),$$

(13)

$${\hat{l}}_{p,g,i} \sim\mathrm{normal}(\,{\hat{\mu }}_{L,i},{{\hat{\sigma }}_{L,i}}^{2}).$$

(14)

Gibbs sampling was used to iteratively learn the posterior distribution mean and variance of each set of variables and draw samples of their values accordingly.

For the TF–peak binding events, the posterior mean ${\hat{\mu }}_{B,{m},i}$ and variance ${{\hat{\sigma }}_{B,m,i}}^{2}$ were estimated specifically for m-th TF since the number of binding sites and the positive or negative regulatory effects between TFs could be very different.

$$\begin{array}{l}{\hat{\mu }}_{B,m,i}={\frac{{\sum }_{s}{\sum }_{k}{t}_{m,s,i}\left({a}_{p,{ks},i}-{\sum} _{{m\prime}}{b}_{p,{m\prime},i}{t}_{{m\prime},s,i}\right){\sigma }_{B}^{2}+{\mu }_{B,t}{\sigma }_{{N}_{A}}^{2}}{{\sum }_{s}{K}_{A,s}{t}_{m,s,i}^{2}{\sigma }_{B}^{2}+{\sigma }_{{N}_{A}}^{2}}}\,{\mathrm{and}}\\{{\hat{\sigma}}_{B,m,i}}^{2}={\frac{{\sigma }_{{N}_{A}}^{2}{\sigma }_{B}^{2}}{{\sum }_{s}{K}_{A,s}{t}_{m,s,i}^{2}{\sigma }_{B}^{2}+{\sigma }_{{N}_{A}}^{2}}}.\end{array}$$

(15)

For TF activities, the posterior mean ${\hat{\mu }}_{T,m,s,i}$ and variance ${{\hat{\sigma }}_{T,m,s,i}}^{2}$ were estimated specifically for the m-th TF and s-th sample using chromatin accessibility data as follows:

$$\begin{array}{l}{\hat{\mu }}_{T,m,s,i}={\frac{{\sum }_{p}{\sum }_{k}{b}_{p,m}\left({a}_{p,k,s,i}-{\sum} _{m{\prime} }{b}_{p,m{\prime} }{t}_{m{\prime} ,s}\right){\sigma }_{T}^{2}+{\mu }_{T}{\sigma }_{{N}_{A}}^{2}}{{\sum }_{p}{K}_{A,s}{{b}_{p,m,i}^{2}{{\sigma }_{T}^{2}}+{\sigma }_{{N}_{A}}^{2}}}}\,{\mathrm{and}}\\{{\hat{\sigma }}_{T,m,s,i}}^{2}=\frac{{\sigma }_{{N}_{A}}^{2}{\sigma }_{T}^{2}}{{{\sum }_{p}{K}_{A,s}{{b}_{p,m,i}^{2}}{\sigma }_{T}^{2}}+{{\sigma}_{{N}_{A}}^{2}}}.\end{array}$$

(16)

Then, based on the estimated distribution parameters of ${\hat{\mu }}_{T,m,s,i}$ and ${{\hat{\sigma }}_{T,m,s,i}}^2$ of ${\hat{t}}_{m,s,i}$, for the ${k}_{R,s}$-th RNA cell in the same s-th sample we draw a TF regulatory activity sample as ${\hat{t}}_{m,{k}_{R},s,i}$. For p-th peak, we were able to reconstruct its chromatin activity in the RNA cell as ${\hat{a}}_{p,{k}_{R},s,i}={\sum }_{m}{\hat{b}}_{p,m,i}{\hat{t}}_{m,{k}_{R},s,i}$, and for g-th gene, we further estimated the interaction confidence ${\hat{l}}_{p,g,i}$ between p-th peak and g-th gene. The peak–gene interaction distribution parameters ${\hat{\mu }}_{L,i}$ and ${{\hat{\sigma }}_{L,i}}^{2}$ were estimated as follows:

$$\begin{array}{l}{\hat{\mu }}_{L,i}=\frac{{\sum }_{s}{\sum }_{k}{\hat{a}}_{p,{k}_{R},s,i}\left({r}_{g,k,s,i}-{\sum }_{{p\prime}}{l}_{g,{p\prime}}{\hat{a}}_{{p\prime},{k}_{R},s,i}\right){\sigma }_{L}^{2}+{\mu }_{L}{\sigma }_{{N}_{R}}^{2}}{{\sum }_{s}{\sum }_{{k}_{R,s}}{({\hat{a}}_{p,{k}_{R},s,i})}^{2}{\sigma }_{L}^{2}+{\sigma }_{{N}_{R}}^{2}}\,{\rm{and}}\\{{\hat{\sigma }}_{L}}^{2}=\frac{{\sigma }_{{N}_{R}}^{2}{\sigma }_{L}^{2}}{{\sum }_{s}{\sum }_{{k}_{R,s}}{({{\hat{a}}_{p,{k}_{R},s,i})}}^{2}{\sigma }_{L}^{2}+{\sigma }_{{N}_{R}}^{2}}.\end{array}$$

(17)

In n-th round of Gibbs estimation, after learning all distributions, we estimated the confidence of each linkage by linearly mapping the sampled values of ${\hat{b}}_{p,m,i}$ and ${\hat{l}}_{p,g,i}$ in the range of (–∞,∞) to probabilities in (0,1) as follows:

$$\begin{array}{l}P\left(\mathrm{state}\left({b}_{p,m,i}{\rm{|}}n\right)=1\right)\\=\frac{{{\exp }}\left\{\left({\hat{b}}_{p,m,i}-{\hat{\mu }}_{B,m,i}\right)/2{{\hat{\sigma }}_{B,m,i}}^{2}\right\}}{{{\exp }}\left\{\left({\hat{b}}_{p,m,i}-{\hat{\mu }}_{B,m,i}\right)/2{{\hat{\sigma }}_{B,m,i}}^{2}\right\}+{{\exp }}\left\{\left(0-{\hat{\mu }}_{B,m,i}\right)/2{{\hat{\sigma }}_{B,m,i}}^{2}\right\}}.\end{array}$$

(18)

$$\begin{array}{l}P\left(\mathrm{state}\left({l}_{p,g,i}{\rm{|}}n\right)=1\right)\\=\frac{{{\exp }}\left\{\left({\hat{l}}_{p,g,i}-{\hat{\mu }}_{L,i}\right)/2{{\hat{\sigma }}_{L,i}}^{2}\right\}}{{{\exp }}\left\{\left({\hat{l}}_{p,g,i}-{\hat{\mu }}_{L,i}\right)/2{{\hat{\sigma }}_{L,i}}^{2}\right\}+{{\exp }}\left\{\left(0-{\hat{\mu }}_{L,i}\right)/2{{\hat{\sigma }}_{L,i}}^{2}\right\}}.\end{array}$$

(19)

Binary state samples were then drawn based on the confidence of each linkage and were then used to initiate the next round of estimations. After running a long sampling process (in total, N rounds) and accumulating enough samples on the binary states of TF–peak bindings and peak–gene interactions, we calculated the sampling frequency of each linkage as a posterior probability.

$$\left\{\begin{array}{c}P\left(\mathrm{state}\left({b}_{p,m,i}\right)=1\right)=\frac{{\sum }_{n}\mathrm{state}\left({b}_{p,m,i}{\rm{|}}n\right)}{N}\\ P\left(\mathrm{state}\left({l}_{p,g,i}\right)=1\right)=\frac{{\sum }_{n}\mathrm{state}\left({l}_{p,g,i}{\rm{|}}n\right)}{N}\end{array}\right.$$

(20)

MAGICAL analysis of S. aureus single-cell multiomics data

For each cell type, given DAS and DEG of contrast conditions (MRSA versus control, MSSA versus control or MRSA versus MSSA), MAGICAL was first initialized by mapping prior TF motifs from the chromVARmotifs library to DAS using addMotifAnnotations (ArchR). Because there is no PBMC cell type Hi-C data publicly available, we are using TAD boundaries from a lymphoblastoid cell line, GM12878, which was originally generated by EBV transformation of PBMCs⁶⁰. The TAD boundary structure is closely conserved between the lymphoblastoid cell lines and primary PBMC⁶¹ and between cell types^62,63. We called TAD boundaries from a GM12878 cell line Hi-C profile⁶³ using TopDom⁶⁴. About 6,000 topological domains were identified. For each contrast, we built candidate circuits by pairing DAS with TF binding sites with DEG in the same domain. MAGICAL was run 10,000 times to ensure that the sampling process converged to stable states. This process was repeated for all cell types and the top 10% high confidence circuit predictions were selected from each cell type for validation analysis.

MAGICAL analysis of COVID-19 single-cell multiomics data

As a proof of concept for contrast condition, single-cell multiomics data analysis, MAGICAL was applied to a public PBMC COVID-19 single-cell multiomics dataset⁵ with samples collected from patients with different severity and heathy controls. For each of the three selected cell subtypes (CD8 TEM, CD14 Mono, and NK), from the original publication we downloaded DEG for two contrasts: mild versus control and severe versus control. For each of the selected cell types, DAS were called respectively for mild versus control and severe versus control using ArchR functions and thresholds as introduced in the paper. MAGICAL was initialized by mapping TF motifs from the chromVARmotifs library to DAS using addMotifAnnotations (ArchR). As explained above, we used TAD boundary information of ~6,000 domains identified in the GM12878 cell line⁶³ as prior to pair DAS with TF binding sites and DEG. Then, the initial candidate regulatory circuits were constructed. Respectively for mild and severe COVID-19, MAGICAL was run 10,000 times to ensure that the sampling process converged to stable states. This process was repeated for the three selected cell types. The chromatin sites and genes in the top 10% predicted high confidence circuits in each cell type were selected as disease associated.

COVID-19 PBMC samples of validation scATAC-seq data

To validate chromatin sites associated with mild COVID-19, PBMC samples were obtained from the COVID-19 Health Action Response for Marines (CHARM) cohort study, which has been previously described⁶⁵. The cohort is composed of Marine recruits who arrived at Marine Corps Recruit Depot—Parris Island for basic training between May and November 2020, after undergoing two quarantine periods (first a home quarantine, and next a supervised quarantine starting at enrollment in the CHARM study) to reduce the possibility of SARS-CoV-2 infection at arrival. Participants were regularly screened for SARS-CoV-2 infection during basic training by PCR, serum samples were obtained using serum separator tubes at all visits, and a follow-up symptom questionnaire was administered. At selected visits, blood was collected in BD Vacutainer CPT Tube with Sodium Heparin and PBMC were isolated following the manufacturer’s recommendations. We used PBMC samples from six participants (five males and one female) who had a positive COVID-19 PCR test and had mild symptoms (sampled 3–11 days after the first PCR positive test), and from three control participants (three males) who had a PCR negative test at the time of sample collection and were seronegative for SARS-CoV-2 immunoglobulin G. New scATAC-seq data were generated following the same protocol as described in "S. aureus scATAC-seq data generation" (Supplementary Table 2).

COVID-19 PBMC scATACseq data analysis

Reads of scATAC-seq experiments were aligned to human reference genome (hg38) using 10x Genomics Cell Ranger software (version 1.2). The resulting fragment files were processed using ArchR²⁵. Quality cells were selected as those with TSS enrichment >12, a number of fragments >3,000 and <30,000, and a nucleosome ratio <2. The likelihood of doublet cells was computationally assessed using the addDoubletScores function and cells were filtered using the filterDoublets function with default settings. A total of 15,836 high-quality cells in the infection group and 9,125 cells in the control group were selected after QC analysis (Supplementary Fig. 3; Supplementary Table 3). These cells were combined into a linear dimensionality reduction using the addIterativeLSI function with the input of the tile matrix (read counts in binned 500 bp across the whole genome) with iterations = 2 and varFeatures = 20,000. The cells were then clustered using the addClusters function. We annotated scATAC-seq cells using the addGeneIntegrationMatrix function, referring to a labeled multimodal PBMC single cell dataset. Doublet clusters containing a mixture of many cell types were manually identified and removed.

Peaks were called for each cell type using the addReproduciblePeakSet function with peak caller MACS2²⁶ (Supplementary Fig. 3). In total, 284,525 peaks were identified (Supplementary Table 4). For each of the three selected cell types (CD8 TEM, CD14 Mono, and NK), chromatin sites with single cell differential statistics FDR < 0.05 and |log₂(FC)| >0.1 between COVID-19 and control conditions and actively accessible in at least 10% of cells (pct > 0.1) from either condition were selected. Refined peaks passing pseudobulk differential statistics P < 0.05 and |log2(FC)| >0.3 between the contrast conditions were finally selected as the validation peak set (Supplementary Table 5).

COVID-19 circuit peaks and genes accuracy evaluation

The number of infection-associated peaks/genes reported by each COVID-19 study would be different owing to the difference in the number of recruited patients and collected cells. To overcome the issue caused by the imbalanced number between discovery and validation datasets or between differential peaks/genes and circuit sites/genes, in each comparison, the larger peak/gene set was randomly downsampled to match the smaller number of peaks/genes in the other set. The precision (site reproduction rate) is calculated to assess the accuracy of each peak/gene set.

MAGICAL analysis of 10x PBMC single-cell true multiome data

For benchmarking, MAGICAL was applied to a 10x PBMC single-cell multiome dataset including 108,377 ATAC peaks, 36,601 genes, and 11,909 cells from 14 cell types. MAGICAL used the same candidate peaks and genes as selected by TRIPOD¹¹ for fair performance comparison. Two different priors were used to pair candidate peaks and genes: (1) the peaks and genes were within the same TAD from the GM12878 cell line; (2) the centers of peaks and the TSS of genes were within 500 kbp. MAGICAL inferred regulatory circuits with each prior and used the top 10% of predictions for accuracy assessment. High-confidence peak–gene interactions predicted by TRIPOD on the same data were directly downloaded from the supplementary tables of their publication¹¹. Two baseline approaches of peak–gene pairing were included: pairing all peaks with a gene if they are in the same TAD or pairing only the nearest peak to a gene based on their genomic distance. To fairly assess the accuracy of MAGICAL weighted peak–gene interactions and the results (paired or non-paired) from TRIPOD or baseline approaches, we selected the top 10% of predictions by MAGICAL as the final peak–gene pairing. We overlapped these pairs with the curated 3D genome interactions in blood context from the 4DGenome database¹⁹ and calculated the precision for each approach.

MAGICAL analysis of GM12878 cell line SHARE-seq data

For benchmarking, MAGICAL was also applied to a GM12878 cell line SHARE-seq dataset¹⁰. For fair comparison, MAGICAL used the same candidate peaks and genes as selected by FigR¹⁸. MAGICAL was initialized with two different priors to pair candidate peaks and genes: (1) the peaks and genes were within the same prior TAD from the GM12878 cell line; (2) the centers of peaks and the TSS of genes were within 500k bps. MAGICAL inferred regulatory circuits under each setting and used the top 10% predictions for accuracy assessment. High-confidence peak–gene interactions predicted by FigR were directly downloaded from the supplementary tables of the original publication¹⁰. Similarly, the top 10% predictions by MAGICAL and interactions paired by the two baseline approaches mentioned above were selected. We overlapped peak–gene interactions predicted by each approach with GM12878 H3K27ac HiChIP chromatin interactions²⁰ for precision evaluation.

Validating predicted peak–gene interactions

To assess the precision of the predicted circuit peak–gene interactions, we assumed a correctly inferred peak–gene pair should be also connected by a chromatin interaction reported by Hi-C or similar experiments. To check this, each peak was extended to 2 kb long and then checked for overlap with one end of a physical chromatin interaction. For genes, we checked whether the gene promoter (–2 kb to 500 b of TSS) overlapped the other end of the interaction. Precision was calculated as the proportion of overlapped chromatin interactions among the predicted peak–gene pairs. The significance of enrichment of overlapped chromatin interactions was assessed using hypergeometric P value, with all candidate peak–gene pairs as background.

GWAS enrichment analysis

To assess the enrichment of GWAS loci of inflammatory diseases in circuit chromatin sites in each cell type, significant GWAS loci were downloaded from GWAS catalog⁴⁶ for inflammatory diseases and control diseases. GREGOR⁶⁶ was used to assess the enrichment of GWAS loci at which either the index single nucleotide polymorphism (SNP) or at least one of its LD proxies overlaps with a circuit chromatin site, using pre-calculated LD data from 1,000 G EUR samples. The enrichment P value of each disease GWAS was converted to a z-score. With each cell type, enrichment scores for traits with fewer than five overlapped GWAS SNPs with circuit sites were hold out. Also, as all reference data used by GREGOR is hg19-based, genome coordinates of testing regions were mapped from hg38 to hg19.

Predicting S. aureus infection state

To refine circuit genes for predicting infection diagnosis in microarray gene expression data, the capability of each circuit gene on distinguishing infection and control samples, or MRSA and MSSA samples, was assessed using sample level pseudobulk gene expression data, aggregated from the discovery scRNA-seq datasets. The total number of reads of each sample was normalized to 1 × 10⁷. The normalized RNA read counts across all samples were then log and z-score transformed. For each circuit gene, a discovery AUROC was calculated by comparing the scRNA-seq gene-expression-based sample ranking against the contrasted sample groups. Circuit genes were prioritized based on AUROCs. An SVM model was trained using the top-ranked circuit genes as features and their normalized pseudobulk expression data as input. The model was then tested on independent microarray datasets. The microarray gene expression data was also log and z-score transformed to ensure a similar distribution to the training data. For comparison, top DEG prioritized by discovery AUROC or by other approaches like the minimum redundancy maximum relevance algorithm or LASSO regression were also tested on the same microarray datasets.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The 10x PBMC single cell multiome dataset can be downloaded from https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets/1.0.0/pbmc_granulocyte_sorted_10k. Users will need to provide their contact information to access the download webpage where the filtered feature barcode matrix (HDF5 format) can be downloaded. The reference multimodal PBMC single cell dataset (H5 Seurat data file) can be downloaded from https://atlas.fredhutch.org/nygc/multimodal-pbmc/. The GWAS catalog database can be accessed at https://www.ebi.ac.uk/gwas/docs/file-downloads. SNPs associated with each disease used in this paper can be extracted from the downloadable file “All associations v1.0”. Home sapiens chromatin interactions data can be downloaded from https://4dgenome.research.chop.edu/Download.html. Home sapiens TF ChIP-seq profiles can be downloaded at http://cistrome.org/db/. Users can also provide their customized peaks in BED format to the server http://dbtoolkit.cistrome.org/ and identify TFs that have a significant binding overlap. Home sapiens candidate enhancers annotated by ENCODE can be downloaded at https://screen.encodeproject.org/. The chromVARmotifs library is available at https://github.com/GreenleafLab/chromVARmotifs. The source single cell data collected in this study is publicly accessible at the GEO repository (https://www.ncbi.nlm.nih.gov/geo/, accession no. GSE220190) and the Zenodo repository⁶⁷. Source data for Figs. 2–4 is available with this manuscript.

Code availability

The source code of MAGICAL is available on GitHub at https://github.com/xichensf/magical and the Zenodo repository⁶⁸.

Change history

31 August 2023
A Correction to this paper has been published: https://doi.org/10.1038/s43588-023-00523-1

References

Schoenfelder, S. & Fraser, P. Long-range enhancer-promoter contacts in gene expression control. Nat. Rev. Genet. 20, 437–455 (2019).
Article Google Scholar
Kim, H. D., Shay, T., O’Shea, E. K. & Regev, A. Transcriptional regulatory circuits: predicting numbers from alphabets. Science 325, 429–432 (2009).
Article Google Scholar
modENCODE Consortium et al.Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787–1797 (2010).
Article Google Scholar
Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat. Methods 13, 366–370 (2016).
Article Google Scholar
Wilk, A. J. et al. Multi-omic profiling reveals widespread dysregulation of innate immunity and hematopoiesis in COVID-19. J. Exp. Med. 218, e20210582 (2021).
Article Google Scholar
Krijger, P. H. & de Laat, W. Regulation of disease-associated gene expression in the 3D genome. Nat. Rev. Mol. Cell Biol. 17, 771–782 (2016).
Article Google Scholar
Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).
Article Google Scholar
Kreitmaier, P., Katsoula, G. & Zeggini, E. Insights from multi-omics integration in complex disease primary tissues. Trends Genet. 39, 46–58 (2022).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Article Google Scholar
Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116 (2020).
Article Google Scholar
Jiang, Y. et al. Nonparametric single-cell multiomic characterization of trio relationships between transcription factors, target genes, and cis-regulatory regions. Cell Syst. 13, 737–751 (2022).
Article Google Scholar
Cao, Z. J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).
Article Google Scholar
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
Article Google Scholar
Arnold, S. R. et al. Changing patterns of acute hematogenous osteomyelitis and septic arthritis: emergence of community-associated methicillin-resistant Staphylococcus aureus. J. Pediatr. Orthop. 26, 703–708 (2006).
Article Google Scholar
Saavedra-Lozano, J. et al. Changing trends in acute osteomyelitis in children: impact of methicillin-resistant Staphylococcus aureus infections. J. Pediatr. Orthop. 28, 569–575 (2008).
Article Google Scholar
Liao, J. C. et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci. USA 100, 15522–15527 (2003).
Article Google Scholar
Tran, L. M., Brynildsen, M. P., Kao, K. C., Suen, J. K. & Liao, J. C. gNCA: a framework for determining transcription factor activity based on transcriptome: identifiability and numerical implementation. Metab. Eng. 7, 128–141 (2005).
Article Google Scholar
Kartha, V. K. et al. Functional inference of gene regulation using single-cell multi-omics. Cell Genom. 2, 100166 (2022).
Article Google Scholar
Teng, L., He, B., Wang, J. & Tan, K. 4DGenome: a comprehensive database of chromatin interactions. Bioinformatics 31, 2560–2564 (2015).
Article Google Scholar
Mumbach, M. R. et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat. Genet. 49, 1602–1612 (2017).
Article Google Scholar
Arunachalam, P. S. et al. Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans. Science 369, 1210–1220 (2020).
Article Google Scholar
Lucas, C. et al. Longitudinal analyses reveal immunological misfiring in severe COVID-19. Nature 584, 463–469 (2020).
Article Google Scholar
Mathew, D. et al. Deep immune profiling of COVID-19 patients reveals distinct immunotypes with therapeutic implications. Science 369, eabc8511 (2020).
Article Google Scholar
Schulte-Schrepping, J. et al. Severe COVID-19 is marked by a dysregulated myeloid cell compartment. Cell 182, 1419–1440 (2020).
Article Google Scholar
Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).
Article Google Scholar
Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nat. Protoc. 7, 1728–1740 (2012).
Article Google Scholar
Li, S. et al. Epigenetic landscapes of single-cell chromatin accessibility and transcriptomic immune profiles of T cells in COVID-19 patients. Front Immunol. 12, 625881 (2021).
Article Google Scholar
Jung, I. et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat. Genet. 51, 1442–1449 (2019).
Article Google Scholar
Chen, X. et al. Tissue-specific enhancer functional networks for associating distal regulatory regions to disease. Cell Syst. 12, 353–362 (2021).
Article Google Scholar
Yao, C. et al. Cell-type-specific immune dysregulation in severely ill COVID-19 patients. Cell Rep. 34, 108590 (2021).
Article Google Scholar
Unterman, A. et al. Single-cell multi-omics reveals dyssynchrony of the innate and adaptive immune system in progressive COVID-19. Nat. Commun. 13, 440 (2022).
Article Google Scholar
Magill, S. S. et al. Changes in prevalence of health care-associated infections in U.S. hospitals. N. Engl. J. Med. 379, 1732–1744 (2018).
Article Google Scholar
Tong, S. Y., Davis, J. S., Eichenberger, E., Holland, T. L. & Fowler, V. G. Jr. Staphylococcus aureus infections: epidemiology, pathophysiology, clinical manifestations, and management. Clin. Microbiol Rev. 28, 603–661 (2015).
Article Google Scholar
Marquez-Ortiz, R. A. et al. USA300-related methicillin-resistant Staphylococcus aureus clone is the predominant cause of community and hospital MRSA infections in Colombian children. Int J. Infect. Dis. 25, 88–93 (2014).
Article Google Scholar
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).
Article Google Scholar
Skjeflo, E. W., Christiansen, D., Espevik, T., Nielsen, E. W. & Mollnes, T. E. Combined inhibition of complement and CD14 efficiently attenuated the inflammatory response induced by Staphylococcus aureus in a human whole blood model. J. Immunol. 192, 2857–2864 (2014).
Article Google Scholar
Kusunoki, T., Hailman, E., Juan, T. S., Lichenstein, H. S. & Wright, S. D. Molecules from Staphylococcus aureus that bind CD14 and stimulate innate immune responses. J. Exp. Med. 182, 1673–1682 (1995).
Article Google Scholar
Ludwig, S. et al. Influenza virus-induced AP-1-dependent gene expression requires activation of the JNK signaling pathway. J. Biol. Chem. 276, 10990–10998 (2001).
Article Google Scholar
Gjertsson, I., Hultgren, O. H., Collins, L. V., Pettersson, S. & Tarkowski, A. Impact of transcription factors AP-1 and NF-κB on the outcome of experimental Staphylococcus aureus arthritis and sepsis. Microbes Infect. 3, 527–534 (2001).
Article Google Scholar
Liu, T. et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 12, R83 (2011).
Article Google Scholar
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Article Google Scholar
Kyriakis, J. M. Activation of the AP-1 transcription factor by inflammatory cytokines of the TNF family. Gene Expr. 7, 217–231 (1999).
Google Scholar
Hannemann, N. et al. The AP-1 transcription factor c-Jun promotes arthritis by regulating cyclooxygenase-2 and arginase-1 expression in macrophages. J. Immunol. 198, 3605–3614 (2017).
Article Google Scholar
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).
Article Google Scholar
Consortium, E. P. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Article Google Scholar
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Article Google Scholar
DeLorenze, G. N. et al. Polymorphisms in HLA class II genes are associated with susceptibility to Staphylococcus aureus infection in a white population. J. Infect. Dis. 213, 816–823 (2016).
Article Google Scholar
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414 (2016).
Article Google Scholar
Ahn, S. H. et al. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans. PLoS One 8, e48979 (2013).
Article Google Scholar
Ramilo, O. et al. Gene expression patterns in blood leukocytes discriminate patients with acute infections. Blood 109, 2066–2077 (2007).
Article Google Scholar
Ardura, M. I. et al. Enhanced monocyte response and decreased central memory T cells in children with invasive Staphylococcus aureus infections. PLoS One 4, e5446 (2009).
Article Google Scholar
Cho, J. S. et al. IL-17 is essential for host defense against cutaneous Staphylococcus aureus infection in mice. J. Clin. Invest. 120, 1762–1773 (2010).
Article Google Scholar
Xiao, Y. et al. A novel significance score for gene selection and ranking. Bioinformatics 30, 801–807 (2014).
Article Google Scholar
Chaussabel, D. et al. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity 29, 150–164 (2008).
Article Google Scholar
Wenric, S. & Shemirani, R. Using supervised learning methods for gene selection in RNA-Seq case-control studies. Front. Genet. 9, 297 (2018).
Article Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Article Google Scholar
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Article Google Scholar
Squair, J. W. et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 12, 5692 (2021).
Article Google Scholar
Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
Article Google Scholar
Anderson, M. A. & Gusella, J. F. Use of cyclosporin A in establishing Epstein-Barr virus-transformed human lymphoblastoid cell lines. Vitro 20, 856–858 (1984).
Article Google Scholar
Tan, L., Xing, D., Chang, C. H., Li, H. & Xie, X. S. Three-dimensional genome structures of single diploid human cells. Science 361, 924–928 (2018).
Article Google Scholar
McArthur, E. & Capra, J. A. Topologically associating domain boundaries that are stable across diverse cell types are evolutionarily constrained and enriched for heritability. Am. J. Hum. Genet 108, 269–283 (2021).
Article Google Scholar
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Article Google Scholar
Shin, H. et al. TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Res. 44, e70 (2016).
Article Google Scholar
Letizia, A. G. et al. SARS-CoV-2 seropositivity and subsequent infection risk in healthy young adults: a prospective cohort study. Lancet Respir. Med. 9, 712–720 (2021).
Article Google Scholar
Schmidt, E. M. et al. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics 31, 2601–2606 (2015).
Article Google Scholar
Chen, X. Source data for paper “Mapping disease regulatory circuits at cell-type resolution from single-cell multiomics data”. Zenodo https://doi.org/10.5281/zenodo.7992711 (2023).
Chen, X. MAGICAL (v1.1). Zenodo https://doi.org/10.5281/zenodo.7951577 (2023).

Download references

Acknowledgements

We thank the Single-Cell and Spatial Technologies team at the Center for Advanced Genomics Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai for providing the experimental, computational, data resources, and staff expertise. S.C.S. was supported by the Defense Advanced Research Projects Agency under contract N6600119C4022. A.G.L. is supported by Defense Health Agency grant 9700130 through the Naval Medical Research Center. O.G.T. is supported by the National Institutes of Health under grant R01GM071966 and Simons Foundation under grant 395506. L.C.N. is supported by the National Institute of Mental Health, the National Institute of Neurological Disorders and Stroke, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Heart, Lung, and Blood Institute, and the National Institute of Allergy and Infectious Diseases under grant UM1AI164559 and by the National Institute on Drug Abuse under grants U01 DA53625 and U01DA058527.

Author information

These authors jointly supervised this work: Elena Zaslavsky, Olga G. Troyanskaya, Stuart C. Sealfon.

Authors and Affiliations

Center for Computational Biology, Flatiron Institute, New York, NY, USA
Xi Chen & Olga G. Troyanskaya
Lewis-Sigler Institute of Integrative Genomics, Princeton University, Princeton, NJ, USA
Xi Chen, Alicja Tadych, Chandra L. Theesfeld & Olga G. Troyanskaya
Department of Computer Science, Princeton University, Princeton, NJ, USA
Yuan Wang & Olga G. Troyanskaya
Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Antonio Cappuccio, Wan-Sze Cheng, Frederique Ruf Zamojski, Venugopalan D. Nair, Clare M. Miller, Aliza B. Rubenstein, German Nudelman, Alexandria Vornholt, Mary-Catherine George, Alessandra Soares-Schanoski, Irene Ramos, Elena Zaslavsky & Stuart C. Sealfon
Division of Infectious Diseases, Department of Medicine, Duke University School of Medicine, Durham, NC, USA
Felicia Ruffin, Michael Dagher, Vance G. Fowler Jr & Christopher W. Woods
Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
Daniel G. Chawla & Steven H. Kleinstein
Battelle Memorial Institute, Columbus, OH, USA
Rachel R. Spurbeck
Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
Lishomwa C. Ndhlovu
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Robert Sebra
Department of Pathology and Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
Steven H. Kleinstein
Naval Medical Research Center, Silver Spring, MD, USA
Andrew G. Letizia
Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Irene Ramos

Authors

Xi Chen
View author publications
You can also search for this author in PubMed Google Scholar
Yuan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Antonio Cappuccio
View author publications
You can also search for this author in PubMed Google Scholar
Wan-Sze Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Frederique Ruf Zamojski
View author publications
You can also search for this author in PubMed Google Scholar
Venugopalan D. Nair
View author publications
You can also search for this author in PubMed Google Scholar
Clare M. Miller
View author publications
You can also search for this author in PubMed Google Scholar
Aliza B. Rubenstein
View author publications
You can also search for this author in PubMed Google Scholar
German Nudelman
View author publications
You can also search for this author in PubMed Google Scholar
Alicja Tadych
View author publications
You can also search for this author in PubMed Google Scholar
Chandra L. Theesfeld
View author publications
You can also search for this author in PubMed Google Scholar
Alexandria Vornholt
View author publications
You can also search for this author in PubMed Google Scholar
Mary-Catherine George
View author publications
You can also search for this author in PubMed Google Scholar
Felicia Ruffin
View author publications
You can also search for this author in PubMed Google Scholar
Michael Dagher
View author publications
You can also search for this author in PubMed Google Scholar
Daniel G. Chawla
View author publications
You can also search for this author in PubMed Google Scholar
Alessandra Soares-Schanoski
View author publications
You can also search for this author in PubMed Google Scholar
Rachel R. Spurbeck
View author publications
You can also search for this author in PubMed Google Scholar
Lishomwa C. Ndhlovu
View author publications
You can also search for this author in PubMed Google Scholar
Robert Sebra
View author publications
You can also search for this author in PubMed Google Scholar
Steven H. Kleinstein
View author publications
You can also search for this author in PubMed Google Scholar
Andrew G. Letizia
View author publications
You can also search for this author in PubMed Google Scholar
Irene Ramos
View author publications
You can also search for this author in PubMed Google Scholar
Vance G. Fowler Jr
View author publications
You can also search for this author in PubMed Google Scholar
Christopher W. Woods
View author publications
You can also search for this author in PubMed Google Scholar
Elena Zaslavsky
View author publications
You can also search for this author in PubMed Google Scholar
Olga G. Troyanskaya
View author publications
You can also search for this author in PubMed Google Scholar
Stuart C. Sealfon
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

S.C.S., O.G.T, and E.Z conceived the study and supervised the research. X.C. designed and implemented the computational framework, conducted benchmarks and case studies with Y.W., wrote the code, and set up the web access with the help of A.T.; A.C. was involved in the S. aureus study. W.-S.C. managed and processed single-cell sequencing data with help from A.B.R., G.N., and A.V.; S.H.K. and D.G.C. conducted the public microarray data search. F.R.Z., V.D.N., M.-C.G., and R.S. generated the PBMC single-cell multiomics data for the S. aureus infected and control subjects. The S. aureus patient blood samples were provided by C.W.W., V.G.F., F.R., and M.D. The control samples were provided by R.R.S. and L.C.N.; I.R. and C.M.M. provided immunological interpretations of the results. I.R., A.G.L., and A.S.-S. provided the validation PBMC scATAC-seq data of patients with COVID-19 and uninfected controls. S.C.S., O.G.T., X.C., E.Z., A.C., and C.L.T. wrote the first draft of the manuscript. All authors proofread the submitted version.

Corresponding authors

Correspondence to Elena Zaslavsky, Olga G. Troyanskaya or Stuart C. Sealfon.

Ethics declarations

Competing interests

A.G.L. is a military service member. This work was prepared as part of his official duties. Title 17, US Code §105 provides that copyright protection under this title is not available for any work of the US Government. Title 17, US code §101 defines a US Government work as a work prepared by a military service member or employee of the US Government as part of that person’s official duties. The views expressed in the article are those of the authors and do not necessarily express the official policy and position of the US Navy, the Department of Defense, the US Government, or the institutions affiliated with the authors. V.G.F. reports personal fees from Novartis, Debiopharm, Genentech, Achaogen, Affinium, Medicines Co., MedImmune, Bayer, Basilea, Affinergy, Janssen, Contrafect, Regeneron, Destiny, Amphliphi Biosciences, Integrated Biotherapeutics; C3J, Armata, Valanbio; Akagera, Aridis, Roche, grants from NIH, MedImmune, Allergan, Pfizer, Advanced Liquid Logics, Theravance, Novartis, Merck; Medical Biosurfaces; Locus; Affinergy; Contrafect; Karius; Genentech, Regeneron, Deep Blue, Basilea, Janssen; Royalties from UpToDate, stock options from Valanbio and ArcBio, Honoraria from Infectious Diseases of America for his service as Associate Editor of Clinical Infectious Diseases, and a patent sepsis diagnostics pending. L.C.N. has received consulting fees from work as a scientific advisor for AbbVie, ViiV Healthcare, and Cytodyn and also serves on the Board of Directors of CytoDyn and has financial interests in Ledidi AS, all for work outside of the submitted work. S.C.S. is a founder of GNOMX Corp and serves as chief scientific officer. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–10.

Reporting Summary

Supplementary Table 1

COVID-19 host regulatory circuits identified by MAGICAL. COVID-19-associated circuit genes, chromatin sites and regulatory TFs in each cell type.

Supplementary Table 2

PBMC samples from COVID-19 patients and controls. Subject sex, age and disease condition.

Supplementary Table 3

COVID-19 PBMC scATAC-seq quality cell QC information. QC thresholds and the number of quality cells in each scATAC-seq profile.

Supplementary Table 4

COVID-19 PBMC scATAC-seq peaks. ATAC peaks genomic coordinates and annotation.

Supplementary Table 5

COVID-19 CD8 T, Mono, and NK DAS. DAS genomic coordinates and the differential statistics in each cell type.

Supplementary Table 6

S. aureus-infected and control PBMC samples. Subject sex, age and disease condition

Supplementary Table 7

S. aureus PBMC scRNA-seq quality cell QC information. QC thresholds and the number of quality cells in each scRNA-seq profile.

Supplementary Table 8

S. aureus PBMC scATAC-seq quality cell QC information. QC thresholds and the number of quality cells in each scATAC-seq profile.

Supplementary Table 9

S. aureus PBMC scATAC-seq peaks. ATAC peaks genomic coordinates and annotation.

Supplementary Table 10

S. aureus DEG per cell type for three contrasts. DEG symbols and the differential statistics in each cell type.

Supplementary Table 11

S. aureus DAS per cell type for three contrasts DAS genomic coordinates and the differential statistics in each cell type.

Supplementary Table 12

S. aureus host regulatory circuits identified by MAGICAL. S. aureus-associated circuit genes, chromatin sites and regulatory TFs in each cell type.

Supplementary Table 13

Circuit genes for S. aureus infection prediction. Cell types, gene symbols, and the discovery AUC for the selected feature genes.

Supplementary Table 14

Circuit genes for S. aureus antibiotic sensitivity prediction. Cell types, gene symbols, and the discovery AUC for the selected feature genes.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Chen, X., Wang, Y., Cappuccio, A. et al. Mapping disease regulatory circuits at cell-type resolution from single-cell multiomics data. Nat Comput Sci 3, 644–657 (2023). https://doi.org/10.1038/s43588-023-00476-5

Download citation

Received: 16 November 2022
Accepted: 06 June 2023
Published: 25 July 2023
Issue Date: July 2023
DOI: https://doi.org/10.1038/s43588-023-00476-5

This article is cited by

Uncovering the genetic circuits that drive diseases
- Pawel F. Przytycki
Nature Computational Science (2023)

Subjects

Abstract

Similar content being viewed by others

Main

Results

MAGICAL framework

Comparative analysis of performance

MAGICAL analysis of COVID-19 single-cell multiomics data

MAGICAL analysis of S. aureus single-cell multiomics data

S. aureus infection prediction

S. aureus antibiotic sensitivity prediction

Discussion

Methods

Human participants

Statistics and reproducibility

S. aureus patient and control samples selection

PBMC thawing

S. aureus scRNA-seq data generation

S. aureus scRNA-seq data analysis

Nuclei isolation for scATACseq

S. aureus scATAC-seq data generation

S. aureus scATAC-seq data analysis

MAGICAL

MAGICAL analysis of S. aureus single-cell multiomics data

MAGICAL analysis of COVID-19 single-cell multiomics data

COVID-19 PBMC samples of validation scATAC-seq data

COVID-19 PBMC scATACseq data analysis

COVID-19 circuit peaks and genes accuracy evaluation

MAGICAL analysis of 10x PBMC single-cell true multiome data

MAGICAL analysis of GM12878 cell line SHARE-seq data

Validating predicted peak–gene interactions

GWAS enrichment analysis

Predicting S. aureus infection state

Reporting summary

Data availability

Code availability

Change history

31 August 2023

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Supplementary information

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links