Abstract
Single-cell multiomics (scMultiomics) allows the quantification of multiple modalities simultaneously to capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Existing tools cannot effectively infer the active biological networks in diverse cell types and the response of these networks to external stimuli. Here we present DeepMAPS for biological network inference from scMultiomics. It models scMultiomics in a heterogeneous graph and learns relations among cells and genes within both local and global contexts in a robust manner using a multi-head graph transformer. Benchmarking results indicate DeepMAPS performs better than existing tools in cell clustering and biological network construction. It also showcases competitive capability in deriving cell-type-specific biological networks in lung tumor leukocyte CITE-seq data and matched diffuse small lymphocytic lymphoma scRNA-seq and scATAC-seq data. In addition, we deploy a DeepMAPS webserver equipped with multiple functionalities and visualizations to improve the usability and reproducibility of scMultiomics data analysis.
Introduction
Singlecell sequencing, such as singlecell RNA sequencing (scRNAseq) and singlecell ATAC sequencing (scATACseq), reshapes the investigation of cellular heterogeneity and yields insights in neuroscience, cancer biology, immunooncology, and therapeutic responsiveness^{1,2}. However, an individual singlecell modality only reflects a snapshot of genetic features and partially depicts the peculiarity of cells, leading to characterization biases in complex biological systems^{2,3}. Singlecell multiomics (scMultiomics) allows the quantification of multiple modalities simultaneously to fully capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Such analyses advance various biological studies when paired with robust computational analysis methods^{4}.
The existing tools for integrative analyses of scMultiomics data, e.g., Seurat^{5}, MOFA+^{6}, Harmony^{7}, and totalVI^{8}, reliably predict cell types and states, remove batch effects, and reveal relationships or alignments among multiple modalities. However, most existing methods do not explicitly consider the topological information sharing among cells and modalities. Hence, they cannot effectively infer the active biological networks of diverse cell types simultaneously with cell clustering and have limited power to elucidate the response of these complex networks to external stimuli in specific cell types.
Recently, graph neural networks (GNN) have shown strength in learning lowdimensional representations of individual cells by propagating neighbor cell features and constructing cellcell relations in a global cell graph^{9,10}. For example, our inhouse tool scGNN, a GNN model, has demonstrated superior cell clustering and gene imputation performance based on largescale scRNAseq data^{11}. Furthermore, a heterogeneous graph with different types of nodes and edges has been widely used to model a multirelational knowledge graph^{12}. It provides a natural representation framework for integrating scMultiomics data and learning the underlying celltypespecific biological networks. Moreover, a recent development in the attention mechanism for modeling and integrating heterogeneous relationships has made deep learning models explainable and enabled the inference of celltypespecific biological networks^{12,13}.
In this work, we developed DeepMAPS (Deep learningbased Multiomics Analysis Platform for Singlecell data), a heterogeneous graph transformer framework for celltypespecific biological network inference from scMultiomics data. This framework uses an advanced GNN model, i.e., heterogeneous graph transformer (HGT), which has the following advantages: (i) It formulates an allinone heterogeneous graph that includes cells and genes as nodes, and the relations among them as edges. (ii) The model captures both neighbor and global topological features among cells and genes to construct cellcell relations and genegene relations simultaneously^{9,14,15,16}. (iii) The attention mechanism in this HGT model enables the estimation of the importance of genes to specific cells, which can be used to discriminate gene contributions and enhances biological interpretability. (iv) This model is hypothesisfree and does not rely on the constraints of gene coexpressions, thus potentially inferring gene regulatory relations that other tools usually cannot find. It is noteworthy that DeepMAPS is implemented into a codefree, interactive, and nonprogrammatic interface, along with a Docker, to lessen the programming burden for scMultiomics data.
Results
Overview of DeepMAPS
Overall, DeepMAPS is an endtoend and hypothesesfree framework to infer celltypespecific biological networks from scMultiomics data. There are five major steps in the DeepMAPS framework (Fig. 1 and Methods). (i) Data are preprocessed by removing lowquality cells and lowlyexpressed genes, and then different normalization methods are applied according to the specific data types. An integrated cellgene matrix is generated to represent the combined activity of each gene in each cell. Different data integration methods are applied for different scMultiomics data types^{5,6,7,8}. (ii) A heterogeneous graph is built from the integrated matrix, including cells and genes as nodes and the existence of genes in cells as edges. (iii) An HGT model is built to jointly learn the lowdimensional embedding for cells and genes and generate an attention score to indicate the importance of a gene to a cell. (iv) Cell clustering and functional gene modules are predicted based on HGTlearned embeddings and attention scores. (v) Diverse biological networks, e.g., gene regulatory networks (GRN) and gene association networks, are inferred in each cell type.
To learn joint representations of cells and genes, we first generate a cellgene matrix integrating the information of the input scMultiomics data. A heterogeneous graph with cell nodes and gene nodes is then constructed, wherein an unweighted cellgene edge represents the existence of gene activity of a gene in a cell, and the initial embedding of each node is learned from the genecell integrated matrix via twolayer GNN graph autoencoders (Methods). Such a heterogeneous graph offers an opportunity to clearly represent and organically integrate scMultiomics data so that biologically meaningful features can be learned synergistically. The entire heterogeneous graph is then sent to a graph autoencoder to learn the relations between the cells and genes and update the embedding of each node. Here, DeepMAPS adopts a heterogeneous multihead attention mechanism to model the overall topological information (global relationships) and neighbor message passing (local relationships) on the heterogeneous graph. The heterogeneous graph representation learning provides a way to enable the embedding of cells and genes simultaneously using the transformer in DeepMAPS. The initial graph determines the path of message passing and how the attention scores can be calculated in DeepMAPS.
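The graph construction described above can be illustrated with a minimal sketch. It is not the DeepMAPS implementation: the function names, the toy matrix, and the rule that any activity greater than zero counts as "existence" of a gene in a cell are assumptions made for illustration only.

```python
def build_cell_gene_edges(matrix, genes, cells):
    """Return unweighted (gene, cell) edges for every non-zero entry.

    matrix[i][j] is the integrated activity of genes[i] in cells[j]
    (genes as rows, cells as columns). The "> 0" rule is an assumption.
    """
    edges = []
    for i, gene in enumerate(genes):
        for j, cell in enumerate(cells):
            if matrix[i][j] > 0:
                edges.append((gene, cell))
    return edges


def one_hop_neighbors(edges):
    """Map every node to its 1-hop neighbours in the bipartite graph."""
    neighbors = {}
    for gene, cell in edges:
        neighbors.setdefault(gene, set()).add(cell)
        neighbors.setdefault(cell, set()).add(gene)
    return neighbors


# Toy integrated matrix: 3 genes x 2 cells (values are made up).
genes = ["gene_a", "gene_b", "gene_c"]
cells = ["cell_1", "cell_2"]
matrix = [[2.1, 0.0],
          [0.0, 1.4],
          [0.3, 0.0]]

edges = build_cell_gene_edges(matrix, genes, cells)
neighbors = one_hop_neighbors(edges)
```

The neighbour map is what determines the message-passing paths: a cell only exchanges information with genes it shares an edge with, and vice versa.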
In each HGT layer, each node (either a cell or a gene) is considered a target, and its 1-hop neighbors are considered sources. DeepMAPS evaluates the importance of the neighbor nodes and the amount of information that can be passed to the target based on the synergy of node embeddings (i.e., attention scores). As a result, cells and genes with highly positively correlated embeddings are more likely to pass messages between each other, thus maximizing the similarity and disagreement of the embeddings. To make the unsupervised training process feasible on a large heterogeneous graph, DeepMAPS is trained on 50 subgraphs sampled from the heterogeneous graph, covering a minimum of 30% of cells and genes, to learn the parameters shared between different nodes; these parameters are later used for testing on the whole graph. As an important training outcome, an attention score is given to represent the importance of a gene to a cell. A high attention score for a gene to a cell implies that the gene is relatively important for defining cell identity and characterizing cell heterogeneity. This discrimination allows for the construction of reliable gene association networks in each cell cluster as the final output of DeepMAPS. We then build a Steiner Forest Problem (SFP) model^{17} to identify genes with higher attention scores and similar embedding features to a cell cluster. The gene-gene and gene-cell relations in the optimized solution of the SFP model mirror the embedding similarity of genes and the attention importance of genes to each cell cluster. A gene association network can be established by genes with the highest importance in characterizing the identity of that cell cluster based on their attention scores and embedding similarities, and these genes are considered to be cell-type-active.
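To make the idea of an attention score concrete, the sketch below computes a single-head, scaled dot-product attention of one target node over its 1-hop sources. This is a deliberate simplification and an assumption on our part: the HGT model uses multiple heads and node-type- and edge-type-specific projections, none of which are modeled here, and the embeddings are made-up numbers.

```python
import math


def attention_scores(target, sources):
    """Softmax-normalised scaled dot-product attention of a target over sources.

    target  -- embedding of the target node (list of floats)
    sources -- dict mapping source-node name to its embedding
    """
    d = len(target)
    logits = {
        name: sum(t * s for t, s in zip(target, vec)) / math.sqrt(d)
        for name, vec in sources.items()
    }
    # Subtract the maximum logit for numerical stability before exponentiating.
    top = max(logits.values())
    exps = {name: math.exp(v - top) for name, v in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical 2-dimensional embeddings for one cell and two neighbouring genes.
cell_embedding = [1.0, 0.0]
gene_embeddings = {"gene_a": [2.0, 0.0], "gene_b": [-2.0, 0.0]}
scores = attention_scores(cell_embedding, gene_embeddings)
```

In this toy case `gene_a`, whose embedding points in the same direction as the cell's, receives most of the attention mass, which mirrors the statement that positively correlated embeddings exchange more information.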
DeepMAPS achieves superior performances in cell clustering and biological network inference from scMultiomics data
We benchmarked the cell clustering performance of DeepMAPS on ten scMultiomics datasets, including three multiple scRNAseq datasets (Rbench1, 2, and 3), three CITEseq datasets (Cbench1, 2, and 3), and four matched scRNAseq and scATACseq (scRNAATACseq) datasets measured from the same cell (Abench1, 2, 3, and 4) (Supplementary Data 1). Specifically, the six Rbench and Cbench datasets have benchmark annotations provided in their original manuscripts, while the four Abench datasets do not. These datasets cover the number of cells ranging from 3,009 to 32,029; an average read depth (considering scRNAseq data only) ranging from 2,885 to 11,127; and a zeroexpression rate (considering scRNAseq data only) from 82 to 96% (Supplementary Data 1).
We compared DeepMAPS with five benchmarking tools (Seurat v3 and v4^{5,18}, MOFA+^{6}, TotalVI^{8}, Harmony^{7}, and GLUE^{19}; Methods) in terms of the Average Silhouette Width (ASW), Calinski-Harabasz (CH) index, Davies-Bouldin Index (DBI), and Adjusted Rand Index (ARI) to evaluate cell clustering performance. For each dataset, we trained DeepMAPS on 36 parameter combinations, including the number of heads, learning rate, and the number of training epochs. To ensure fairness, each benchmarking tool was also tuned with different parameter combinations (Methods). DeepMAPS achieved the best performance among all benchmark tools in all test datasets in terms of ARI (for Rbenches and Cbenches) and ASW (for Abenches) (Fig. 2a, Supplementary Figs. 1–3, and Source Data 1–3). We also noticed that Seurat was the second-best performing tool, with small variances for different parameter selections in all benchmark datasets. We selected the default parameters per data type based on the performance of parameter combinations in the grid-search benchmarking. The parameter combination with the highest median ARI/ASW score averaged over all benchmark datasets was taken as the default for the corresponding data type.
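For readers unfamiliar with ARI, the following sketch computes it from two label vectors using the standard pair-counting formula. It is a plain-Python illustration with made-up labels, not the evaluation code used in the benchmark.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between a reference and a predicted partition."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # A single item can only be clustered one way.
    # Pairs of cells placed together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each partition separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / comb(n, 2)
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # Degenerate case, e.g. both partitions are one cluster.
    return (together_both - expected) / (max_index - expected)


# Cluster names do not matter, only the grouping of cells.
ari_same = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

ARI is 1 for identical groupings, close to 0 for random agreement, and can be negative when the two partitions agree less than expected by chance, as in the second toy call.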
Additional benchmarking experiments were carried out to justify the selection of different integration methods in DeepMAPS. Specifically, for the analysis of scRNA-ATAC-seq data, we designed an integration method using gene velocity to balance the weight between gene expression and chromatin accessibility in characterizing cell activities and states (Methods). This integration process can ensure harmonizing datasets (especially for multiple scRNA-seq data) and generate an integrated matrix (with genes as rows and cells as columns) as the input for HGT. Our results showed that, for benchmark data 1 and 2 (Abench1 and −2), the velocity-based approach showed significantly (p-value < 0.05) higher ASW scores than the weighted nearest neighbor (WNN) approach in Seurat v4.0 on all grid-search parameter combinations (Supplementary Fig. 4 and Source Data 4). We reason that, with the inclusion of velocity information, the modality weights of gene expression and chromatin accessibility that contribute to recognizing cell types are better balanced (Supplementary Fig. 5, which compares the modality weight of scATAC-seq in different cell clusters with and without the velocity-weighted balance method). In addition, we compared different clustering methods (i.e., Leiden, Louvain, and SLM) in DeepMAPS and assessed the impact of clustering resolutions (i.e., 0.4, 0.8, 1.2, and 1.6) on cell clustering results. We found no significant differences among these clustering methods, and Louvain showed slightly better performance than the other two (Supplementary Fig. 6 and Source Data 5). Lastly, DeepMAPS achieved higher scores than other tools when selecting the same clustering resolution. We also found that, in most cases, higher resolutions lowered cell clustering prediction scores; therefore, we selected a resolution of 0.4 as the default parameter in DeepMAPS (Supplementary Fig. 7 and Source Data 6–8).
We further independently tested our default parameter selection on five independent datasets, named Rtest, Ctest, Atest1, −2, and −3, by comparing our results with the same benchmarking tools using their default parameters. For the three test datasets with benchmarking cell labels, DeepMAPS performs the best in terms of ARI score, while for the two scRNAATACseq datasets without cell labels, the benchmarking tools in the comparison achieve similar performance (Fig. 2b and Source Data 9). In order to evaluate the robustness of DeepMAPS, a leaveoneout test was performed on the three independent test datasets with benchmark labels (Fig. 2c and Source Data 10). We first removed all cells in a cell cluster based on benchmark labels and then applied DeepMAPS and other tools on the remaining cells. For each dataset, the leaveoneout results of DeepMAPS were better than the other tools with higher ARI scores, indicating that the message passing and attention mechanism used in DeepMAPS maintains cellcell relations in a robust manner.
The cell clustering UMAP on the three independent datasets with benchmarking labels showcased that the latent representations obtained in DeepMAPS can better preserve the heterogeneity of scRNAseq data (Fig. 2d–f). For the Rtest dataset, all tools showed the ability to separate mesenchymal, leukocyte, and endothelial cells, but failed to separate urothelium basal cells and bladder cells. However, cells on the DeepMAPS UMAP are more compact, and the bladder cells (red dots) are grouped better than MOFA + and Seurat (Fig. 2d). For the Ctest dataset, cells in the same cluster are more ordered and compact (e.g., the B cell cluster and NK cell cluster), while cells from different clusters are more apart from each other on DeepMAPS UMAP (e.g., CD8 cell clusters and CD4 cell clusters). (Fig. 2e). For the Atest1 dataset, DeepMAPS was the only tool that accurately separated each cell type. In contrast, Seurat and MOFA + mistakenly divided the PDX1 or PDX2 population into two clusters and included more mismatches (Fig. 2f).
DeepMAPS can infer statistically significant and biologically meaningful gene association networks from scMultiomics data
We evaluated the two kinds of biological networks that DeepMAPS can infer, gene association network and GRN, in terms of centrality scores and functional enrichment. For the Rtest dataset (Fig. 3a) and Ctest dataset (Fig. 3b), we used two centrality scores, closeness centrality (CC) and eigenvector centrality (EC), that have been used in previous singlecell gene association network evaluations^{20}, to compare the identified gene association networks from all the tools in this comparison. CC reflects the average connectivity of a node to all others in a network, and EC reflects the importance of a node based on its connected nodes. Both CC and EC can interpret node’s influence in identifying genes that may play more critical roles in the network. A gene association network with higher node centrality indicates that the detected genes are more likely to be involved in critical and functional biological systems. We also constructed gene coexpression networks as a background using all genes in a dataset by calculating Pearson’s correlation coefficients of gene expression in a cell cluster. pvalue = 0.05 was set as the edge cutoff. We compared celltypeactive gene association networks generated in DeepMAPS with those generated in IRIS3^{15} and the background coexpression networks. The average CC and EC of networks constructed by DeepMAPS in Rtest and Ctest datasets showed significantly higher scores than IRIS3 and the background coexpression networks (Source Data 11). We reason that the gene association network generated in DeepMAPS is not only coexpressed but also of great attention impact on cells; thus, genes in the network tend to be more important to the cell type.
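The two centrality scores can be sketched on a small unweighted network as follows. The graph, the node names, and the choice of a shifted power iteration for EC are illustrative assumptions, not the evaluation pipeline used here; closeness is computed over reachable nodes only, which is the usual definition when the network is connected.

```python
import math
from collections import deque


def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path lengths), via BFS per node."""
    scores = {}
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I); the shift avoids oscillation on bipartite graphs."""
    x = {node: 1.0 for node in adj}
    for _ in range(iterations):
        nxt = {node: x[node] + sum(x[nbr] for nbr in adj[node]) for node in adj}
        norm = math.sqrt(sum(v * v for v in nxt.values()))
        x = {node: v / norm for node, v in nxt.items()}
    return x


# Toy gene association network: a triangle A-B-C with a pendant gene D on C.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
cc = closeness_centrality(network)
ec = eigenvector_centrality(network)
```

Gene C, which touches every other gene, scores highest on both measures, while the pendant gene D scores lowest; averaging such node scores over a network gives the kind of summary compared across tools.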
To evaluate whether DeepMAPS can identify biologically meaningful GRNs in specific cell types, we performed enrichment tests on basic gene regulatory modules (i.e., regulons^{14}), with three public functional databases, Reactome^{21}, DoRothEA^{22}, and TRRUST v2^{23}. To avoid any bias in comparison, we compared celltypespecific GRNs inferred from DeepMAPS with (i) IRIS3 and SCENIC^{14} on the scRNAseq matrix, (ii) IRIS3 and SCENIC on a genecell matrix recording the gene activity scores (GAS) calculated in DeepMAPS based on the velocitybased integration method, (iii) MAESTRO^{24} on scATACseq matrix, and (iv) MAESTRO on original scRNAseq and scATACseq matrix. The six datasets collected from human tissue were used (i.e., Atest1, Abench2, Abench3, Abench4, Atest1, Atest2). We first showcased that the GRNs identified in DeepMAPS included more unique transcription factor (TF) regulations than the other tools, except for the enrichment to the DoRothEA database (Fig. 3c and Source Data 12). We considered that a highly celltypespecific regulon (CTSR) might represent only one significant enriched functionality; alternatively, a generic regulon might improperly contain genes involved in several pathways. Therefore, we compared the number of CTSRs enriched to one function/pathway across different tools. DeepMAPS outperformed (pvalue<0.05) other tools on most of the six scRNAATACseq datasets in terms of the number of regulons that enrich only one function/pathway and the enrichment F1 scores (Fig. 3d, e and Source Data 12). For the F1 score of the enrichment test to the TRRUST v2 database, DeepMAPS (median F1 score is 0.026) was slightly lower than IRIS3 using the GAS matrix (median F1 score is 0.031). We also noticed that all tools did not achieve good enrichments in the TRRUST v2 database mainly due to the small number of genes (on average, 10 genes are regulated by one TF; 795 TFs in total). 
SCENIC also showed competitive scaled precision scores (scaled mean: 0.47 for Reactome, 0.66 for DoRothEA, and 0.61 for TRRUST v2), while achieving lower scaled recall scores, making the F1 scores smaller than DeepMAPS for most datasets. IRIS3 and SCENIC performed on the GAS matrix showed better enrichment results than using scRNAseq data only, indicating that integrating information from scRNAATACseq data is more useful for GRN inference than using scRNAseq data alone.
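As a reminder of how precision, recall, and F1 relate in a set-overlap setting, the sketch below scores a predicted regulon against a reference target set. Treating enrichment as a simple overlap between two gene sets is an assumption for illustration; the exact scoring and scaling used in the benchmark are described in Methods, and the gene identifiers are placeholders.

```python
def overlap_f1(predicted, reference):
    """Precision, recall, and F1 of a predicted gene set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder gene identifiers, not real regulon members.
predicted_regulon = {"g1", "g2", "g3", "g4"}
reference_targets = {"g3", "g4", "g5"}
precision, recall, f1 = overlap_f1(predicted_regulon, reference_targets)
```

Because F1 is the harmonic mean of the two quantities, a tool with high precision but low recall, as reported for SCENIC above, ends up with a lower F1 than a tool that balances both.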
DeepMAPS accurately identifies cell types and infers cellcell communication in PBMC and lung tumor immune CITEseq data
We present a case study that applies DeepMAPS to a published mixed peripheral blood mononuclear cells (PBMC) and lung tumor leukocytes CITE-seq dataset (10× Genomics online resource, Supplementary Data 1) to demonstrate its capacity for modeling scMultiomics data and characterizing cell identities. The dataset includes RNAs and proteins measured on 3485 cells. DeepMAPS identified 13 cell clusters, including four CD4^{+} T cell groups (naive, central memory (CM), tissue-resident memory (TRM), and regulatory (Treg)), two CD8^{+} T cell groups (CM and TRM), a natural killer cell group, a memory B cell group, a plasma cell group, two monocyte groups, one tumor-associated macrophage (TAM) group, and a dendritic cell (DC) group. We annotated each cluster by visualizing the expression levels of curated marker genes and proteins (Fig. 4a, b and Supplementary Data 2). Compared to cell types identified using only proteins or RNA, we isolated or accurately annotated cell populations that could not be characterized using the individual modality analysis. For example, the DC cluster was only successfully identified using the integrated protein and RNA. By combining signals captured from both RNA and proteins, DeepMAPS successfully identified biologically reasonable and meaningful cell types in the CITE-seq data.
We then compared the modality correlation between the two cell types. We used the top differentially expressed genes and proteins between memory B cells and plasma cells, and performed hierarchical clustering of the correlation matrix. The result clearly stratified these features into two anticorrelated modules: one associated with memory B cells and the other with plasma cells (Fig. 4c). Furthermore, we found that the features in the two modules significantly correlated with the axis of maturation captured by our HGT embeddings (Supplementary Fig. 8 and Supplementary Data 3). For example, one HGT embedding (the 51^{st}) showed distinctive differences between plasma cells and memory B cells (Fig. 4d, e). Similar findings were also observed when comparing EM CD8^{+} T cells with TRM CD8^{+} T cells (Fig. 4f). Nevertheless, it is possible to identify a representative HGT embedding (56^{th}) that maintains embedding signals for a defined separation of the two groups (Fig. 4g, h). These results point to any two cell clusters consisting of coordinated activation and repression of multiple genes and proteins, leading to a gradual transition in cell state that can be captured by a specific dimension of the DeepMAPS latent HGT space. On the other hand, we generated the geneassociated networks with genes showing high attention scores for EM CD8^{+} T cells, TRM CD8^{+} T cells, memory B cells, and plasma cells and observed diverse patterns (Supplementary Fig. 9).
Based on the cell types and raw data of gene and protein expressions, we inferred cell–cell communications and constructed communication networks among different cell types within multiple signaling pathways using CellChat^{25} (Fig. 4i). For example, we observed a CD6ALCAM signaling pathway existing between DC (source) and TRM CD4^{+} T cells (target) in the lung cancer tumor microenvironment (TME). Previous studies have shown that ALCAM on antigenpresenting DCs interacts with CD6 on the T cell surface and contributes to T cell activation and proliferation^{26,27,28}. As another example, we identified the involvement of the NECTINTIGIT signaling pathway during the interaction between the TAM (source) and TRM CD8^{+} T cells (target), which is supported by a previous report that NECTIN (CD155) expressed on TAM could be immunosuppressive when interacting with surface receptors, TIGIT, on CD8^{+} T cells in the lung cancer TME^{29,30}.
DeepMAPS identifies specific GRNs in diffuse small lymphocytic lymphoma scRNAseq and scATACseq data
To further extend the power of DeepMAPS to GRN inference, we used a single-cell Multiome ATAC + Gene expression dataset available on the 10× Genomics website (10× Genomics online resource). The raw data is derived from 14,566 cells of flash-frozen intra-abdominal lymph node tumor from a patient diagnosed with diffuse small lymphocytic lymphoma (DSLL) of the lymph node. We integrated gene expression and chromatin accessibility by balancing the weight of each modality of a gene in a cell based on RNA velocity (Fig. 5a and Methods). To build TF-gene linkages, we considered gene expression, gene accessibility, TF-motif binding affinity, peak-to-gene distance, and TF-coding gene expression. Genes found to be regulated by the same TF in a cell cluster are grouped as a regulon. We considered regulons with higher centrality scores to have more significant influences on the characterization of the cell cluster. Regulons regulated by the same TF across different cell clusters are compared for differential regulon activities. Those with significantly higher regulon activity scores (RAS) are considered the cell-type-specific regulons in the cell cluster.
DeepMAPS identified 11 cell clusters in the DSLL data. All clusters were manually annotated based on curated gene markers (Fig. 5b and Supplementary Data 4). Two DSLLlike cell clusters (DSLL state1 and state2) were observed. The RNA velocitybased pseudotime analysis performed on the three B cell clusters (normal B cell and two DSLL states) assumed that the two DSLL states were derived from normal B cells, and state1 was derived earlier than state2, although the two states seemed to be partially mixed (Fig. 5c). We further selected the top 20 TFs with the highest regulon centrality scores in each of the three cell clusters (Fig. 5d and Source Data 13). Interestingly, these TFs showed distinctions between the normal and the two DSLL states and inferred variant regulatory patterns within the two DSLL states. For regulons shared by all three B cell clusters, EGR1, MEF2B, and FOS were transcriptionally active in both normal B and DSLL cells and responsible for regulating B cell development, proliferation, and germinal center formation^{31,32,33,34}. E2F6, ELF3, and KLF16 were identified as shared only in the two DSLL states, with reported roles in tumorigenesis^{35,36,37,38,39,40}. Further, JUN, MAFK, and MAFG, which encode the compartments of the activating protein1 (AP1),^{34,41,42} were found to be active in DSLL state1 while NFKB1, coding for a subunit of the NFκB protein complex^{43,44}, was found to be active in DSLL state2.
We constructed a GRN consisting of the four cell-type-specific regulons (JUN, KLF16, GATA1, and FOS) (Fig. 5e and Supplementary Fig. 10) in DSLL state1, with RAS significantly higher than in normal B cells and DSLL state2 (Fig. 5f). KLF16 reportedly promotes the proliferation of both prostate^{39} and gastric cancer cells^{40}. FOS and JUN are transcription factors in the AP1 family, regulating the oncogenesis of multiple types of lymphomas^{34,41,42,45}, and GATA1 is essential for hematopoiesis, the dysregulation of which is implicated in multiple hematologic disorders and malignancies^{46,47}. Distinct regulatory patterns were also observed when we zoomed in on a single regulon (Fig. 5g and Supplementary Figs. 11–12). As the most active regulon in DSLL state1, JUN was found to regulate five unique downstream genes and 12 genes shared with DSLL state2. Downstream genes, including CDK6^{33,34}, IGF2R^{48}, and RUNX1^{49}, are critical for cell proliferation, survival, and development functions in DSLL.
Moreover, we further built connections between upstream cellcell communication signaling pathways and downstream regulatory mechanisms in DSLL cells. We identified a cellcell communication between macrophage and the two DSLL states via the B cell activation factor (BAFF) signaling pathway, based on the integrated GAS matrix using CellChat^{25}, which includes BAFF as the ligand on macrophage cells and TACI (transmembrane activator and calciummodulator and cyclophilin ligand interactor) as the receptor on DSLL cells (Fig. 5h). BAFF signaling is critical to the survival and maturation of normal B cells^{50,51}, while aberrations contribute to the resistance of malignant B cells to apoptosis^{52,53}. We observed that the expression of the TACI coding gene, TNFRSF13B, was explicitly higher in the two DSLL states, while the corresponding chromatin accessibility maintained high peaks in state1 (Fig. 5i). Upon engagement with its ligand, TACI has been reported to transduce the signal and eventually activate the AP1^{54,55} and NFκB^{56,57} transcriptional complexes for downstream signaling in B cells. JUN (a subunit of AP1) was identified as the most specific and key regulator in state1 responsible for cell proliferation and regulating downstream oncogenes, such as CDK6, that has been reported to promote the proliferation of cancer cells in multiple types of DSLLs as well as other hematological malignancies^{58,59,60}. It is clear that BAFF signaling first appears in DSLL state1 and triggers the activation of the JUN regulatory mechanism, leading to a high regulon activity of JUN. The JUN regulon accelerates the proliferation and oncogenesis explicitly in DSLL, leading to a more terminal differential stage of DSLL (state2). As a result, state1 includes cells undergoing rapid cell proliferation and differentiation, transitioning from normal B cells to matured DSLL. 
In short, DeepMAPS can construct GRNs and identify celltypespecific regulatory patterns to offer a better understanding of cell states and developmental orders in diseased subpopulations.
DeepMAPS provides a multifunctional and userfriendly web portal for analyzing scMultiomics data
Due to the complexity of singlecell sequencing data, more webservers and dockers have been developed in the past three years^{61,62,63,64,65,66,67,68,69,70,71,72,73} (Supplementary Data 5). However, most of these tools only provide minimal functions such as cell clustering and differential gene analysis. They do not support the joint analysis of scMultiomics data and especially lack sufficient support for biological network inference. On the other hand, we recorded the running time of DeepMAPS and benchmark tools on different datasets with cell numbers ranging from 1000 to 160,000 (Supplementary Data 6). The deep learning models (DeepMAPS and TotalVI) have longer running time than Seurat and MOFA + . To these ends, we provided a codefree, interactive, and nonprogrammatic interface to lessen the programming burden for scMultiomics data (Fig. 6a). The webserver supports the analysis of multiple RNAseq data, CITEseq data, and scRNAATACseq data using DeepMAPS (Fig. 6b). Some other methods, e.g., Seurat, are also incorporated as an alternative approach for the users’ convenience. Three major steps—data preprocessing, cell clustering and annotation, and network construction—are included in the server. In addition, the DeepMAPS server supports realtime computing and interactive graph representations. Users may register for an account to have their own workspace to store and share analytical results. Other than the advances mentioned, the DeepMAPS webserver highlights an additional function for the elucidation of complex networks in response to external stimuli in specific cell types. The user can upload a metadata file with phenotype information (e.g., cells with treatment and without treatment), select, and relabel the corresponding cells (e.g., CD8+ T cells with treatment and CD8+ T cells without treatment). In this way, DeepMAPS will predict the treatmentrelated networks in CD8+ T cells. Examples are given in the online tutorial at https://bmblx.bmi.osumc.edu/tutorial.
Discussion
DeepMAPS is a deeplearning framework that implements heterogeneous graph representation learning and a graph transformer in studying biological networks from scMultiomics data. By building a heterogeneous graph containing both cells and genes, DeepMAPS identifies their joint embedding simultaneously and enables the inference of celltypespecific biological networks along with cell types in an intact framework. Furthermore, the application of a heterogeneous graph transformer models the cellgene relation in an interpretable uniform multirelation. In such a way, the training and learning process in a graph can be largely shortened to consider cell impacts from a further distance.
By jointly analyzing gene expression and protein abundance, DeepMAPS accurately identified and annotated 13 cell types in a mixed CITEseq data of PBMC and lung tumor leukocytes based on curated markers that cannot be fully elucidated using a single modality. We have also proved that the embedding features identified in DeepMAPS capture statistically significant signals and amplify them when the original signals are noisy. Additionally, we identified biologically meaningful cellcell communication pathways between DC and TRM CD4^{+} T cells based on the gene association network inferred in the two clusters. For scRNAATACseq, we employed an RNA velocitybased method to dynamically integrate gene expressions and chromatin accessibility that enhanced the prediction of cell clusters. Using this method, we identified distinct gene regulatory patterns among normal B cells and two DSLL development states. We further elucidated the deep biological connections between cellcell communications and the downstream GRNs, which helped characterize and define DSLL states. The identified TFs and genes can be potential markers for further validation and immunotherapeutical targets in DSLL treatment.
Although DeepMAPS offers advantages and improved performance for analyzing scMultiomics data, there is still room to improve it further. First, the computational efficiency for very large datasets (e.g., more than 1 million cells) might be a practical issue considering the complexity of the heterogeneous graph representation (which may contain billions of edges). Moreover, DeepMAPS is recommended to be run on GPUs, which leads to a potential reproducibility problem. Different GPU models can differ in their floating-point arithmetic, which may influence the precision of the loss functions during training. As a result, DeepMAPS may generate slightly different cell clustering and network results on different GPU models. Lastly, the current version of DeepMAPS is based on a bipartite heterogeneous graph with genes and cells. Separate preprocessing and integration steps are required to map the different modalities to genes before they are integrated into a cell-gene matrix. To fully achieve an end-to-end framework for scMultiomics analysis, the bipartite graph can be extended to a multipartite graph, in which different modalities are included as disjoint node types (e.g., genes, proteins, or peak regions). Such a multipartite heterogeneous graph can also include knowledge-based biological information, such as known molecular regulations, and more than two modalities in one graph. However, including more node types will increase the computational burden geometrically, which will require dedicated work on model and parameter optimization in the future.
In summary, we evaluated DeepMAPS as a pioneer study for the integrative analysis of scMultiomics data and celltypespecific biological network inference. It will likely provide different visions of deep learning deployment in singlecell biology. With the development and maintenance of the DeepMAPS webserver, our longterm goal is to create a deep learningbased ecocommunity for archiving, analyzing, visualizing, and disseminating AIready scMultiomics data.
Methods
Data description
We included ten public datasets (R-bench-1 to 3, C-bench-1 to 3, and A-bench-1 to 4) for grid-test benchmarking of DeepMAPS against existing tools, and five additional datasets (R-test-1, C-test-1, and A-test-1 to 3) for independent testing with optimized parameters. The human PBMC and lung tumor leukocyte CITEseq data and the 10× lymph node scRNAseq & scATACseq data were used for the two case studies, respectively. All data are publicly available (Supplementary Data 1 and Data Availability).
Data preprocessing and integration
The analysis of DeepMAPS takes the raw counts matrices of multiple scRNAseq (multiple gene expression matrices), CITEseq (gene and surface protein expressions matrices), and scRNAATACseq (gene expression and chromatin accessibility matrices) data as input. For each data matrix, we define modality representations as rows (genes, proteins, or peak regions) and cells as columns across the paper unless exceptions are mentioned. In each data matrix, a row or a column is removed if it contains less than 0.1% nonzero values. The data quality control is carried out by Seurat v3, including but not limited to total read counts, mitochondrial gene ratio, and blacklist ratio. Additional data preprocessing and integration methods are showcased below.
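The sparsity filter described above can be sketched as follows. This is a minimal illustration, assuming a dense NumPy array with features as rows and cells as columns; the function name `filter_sparse` and the choice to evaluate rows and columns in a single pass on the input matrix are assumptions, not the DeepMAPS implementation.

```python
import numpy as np

def filter_sparse(X, min_frac=0.001):
    """Drop rows and columns whose fraction of non-zero entries is below min_frac.

    min_frac=0.001 corresponds to the 0.1% threshold stated in the text.
    Rows and columns are both evaluated on the input matrix (an assumption).
    """
    nonzero = X > 0
    keep_rows = nonzero.mean(axis=1) >= min_frac
    keep_cols = nonzero.mean(axis=0) >= min_frac
    return X[keep_rows][:, keep_cols]

# Hypothetical toy matrix: 3 features (rows) x 3 cells (columns)
X = np.array([[0, 0, 0],
              [1, 0, 2],
              [0, 0, 3]])
filtered = filter_sparse(X)
```

In this toy case the all-zero first row and the all-zero second column are removed; on real data the threshold only removes features or cells that are almost entirely empty.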
Multiple scRNAseq data
Gene expression matrices are lognormalized, and the top 2000 highly variable genes are selected from each matrix using Seurat v3^{18}. If a matrix contains fewer than 2000 genes, all of them are selected for integration. We then apply the widely used canonical correlation analysis (CCA) in Seurat to align these matrices and harmonize the scRNAseq data, leading to a matrix \(X=\{x_{ij}\mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) for \(I\) genes in \(J\) cells.
CITEseq data
Gene and surface protein expression matrices are lognormalized. The top \(I_{1}\) highly variable genes (\(I_{1}=2000\)) and \(I_{2}\) proteins (\(I_{2}\) is the total number of proteins in the matrix) are selected. The two matrices are then concatenated vertically, leading to a matrix \(X=\{x_{ij}\mid i=1,2,\ldots,(I_{1}+I_{2});\ j=1,2,\ldots,J\}\) for \(I_{1}\) genes and \(I_{2}\) proteins in \(J\) cells (this can be treated as \(I_{1}+I_{2}=I\) genes in \(J\) cells). A centered logratio (CLR) transformation is performed on \(X\) as follows:
where \(\mathcal{Z}_{j}\) represents the set of indices of nonzero genes in cell \(j\), and \(|\cdot|\) denotes the number of elements in a set.
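As an illustration of a per-cell CLR transformation, the sketch below assumes a Seurat-style form, \(\log(1+x_{ij}/g_{j})\), where \(g_{j}\) is the exponential of the mean of \(\log(1+x_{ij})\) over the nonzero set \(\mathcal{Z}_{j}\). The exact formula used by DeepMAPS is not reproduced in the text above, so both this form and the function name `clr_per_cell` are assumptions.

```python
import numpy as np

def clr_per_cell(X):
    """Assumed CLR form: log1p(x_ij / exp(mean of log1p(x_ij) over nonzero i in cell j))."""
    X = np.asarray(X, dtype=float)
    out = np.zeros_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        nz = col > 0                      # Z_j: nonzero features in cell j
        if not nz.any():
            continue                      # an all-zero cell stays all-zero
        log_gm = np.log1p(col[nz]).sum() / nz.sum()   # sum divided by |Z_j|
        out[:, j] = np.log1p(col / np.exp(log_gm))
    return out

# Hypothetical toy matrix: 2 features (rows) x 2 cells (columns)
X = np.array([[1.0, 0.0],
              [3.0, 2.0]])
clr = clr_per_cell(X)
```

Zero entries remain zero under this form, and the ordering of values within a cell is preserved.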
scRNAATACseq data
The gene expression matrix \(X^{R}=\{x_{ij}^{R}\mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) with \(I\) genes and \(J\) cells is lognormalized. Then a left-truncated mixture Gaussian (LTMG) model is used to provide a qualitative representation of each gene over all cells by modeling how underlying regulatory signals control gene expression in a cell population^{74}. Specifically, if gene \(i\) can be represented by \(G_{i}\) Gaussian distributions over all \(J\) cells, there are potentially \(G_{i}\) regulatory signals regulating this gene. A matrix \(X^{R'}=\{x_{ij}^{R'}\}\) with the same dimensions as \(X^{R}\) can be generated, in which the gene expressions are labeled by discrete values \(x_{ij}^{R'}\in\{1,2,\ldots,G_{i}\}\).
The chromatin accessibility matrix is represented as \(X^{A}=\{x_{kj}^{A}\mid k=1,2,\ldots,K;\ j=1,2,\ldots,J\}\) for \(K\) peak regions in \(J\) cells. We annotate the peak regions in \(X^{A}\) to their corresponding genes based on the method described in MAESTRO^{24}. Specifically, a regulatory potential weight \(w_{ik}\) for peak \(k\) to gene \(i\) is calculated conditional on the distance of peak \(k\) to gene \(i\) in the genome:
where \(d_{ik}\) is the distance between the center of peak \(k\) and the transcription start site of gene \(i\), and \(d_{0}\) is the half-decay distance (set to 10 kb). The regulatory potential weight \(w_{ik}\) of peak \(k\) to gene \(i\) is normally calculated as \(2^{-\frac{d_{ik}}{d_{0}}}\). For peaks with \(d_{ik}>150\,\mathrm{kb}\), \(w_{ik}\) is less than 0.0005, and we therefore set it to 0 for convenience. In MAESTRO, for peaks located in an exon region, \(d_{ik}\) is 0, so \(w_{ik}\) equals \(1\) according to the formula, and \(w_{ik}\) is then normalized by the total exon length of gene \(i\). The reason is that, in bulk ATACseq data, many highly expressed genes are observed to also have ATACseq peaks in their exon regions, mainly due to the temporal binding of Pol II and other transcriptional machinery. Based on that observation, MAESTRO adds the signals from the exon regions to better fit the model to gene expression. However, because reads are more likely to fall in longer exons than in shorter ones, MAESTRO normalizes the total reads on exons by the total exon length of each gene to account for background reads. Eventually, a regulatory potential score of peak \(k\) to gene \(i\) in cell \(j\) can be calculated as \(r_{ikj}=w_{ik}\times x_{kj}^{A}\). The scATACseq matrix \(X^{A}\) can then be transformed into a gene regulatory potential matrix by summing up the regulatory potential scores of the peaks that regulate the same gene:
$$x_{ij}^{A'}=\sum_{k=1}^{K}r_{ikj}$$(3)
giving rise to the regulatory potential matrix \(X^{A'}=\{x_{ij}^{A'}\mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) for the same \(I\) genes and \(J\) cells as \(X^{R}\).
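The distance-decay weighting and the summation over peaks can be sketched as a matrix product. This is a simplified illustration that assumes distances are given in kb and omits the exon-length normalization described above; the function name `regulatory_weights` and the toy values are hypothetical.

```python
import numpy as np

def regulatory_weights(dist_kb, d0=10.0, max_dist=150.0):
    """w_ik = 2^(-d_ik / d0), set to 0 for peaks farther than max_dist (in kb).

    Simplification: the exon-specific normalization is not modeled here.
    """
    dist_kb = np.asarray(dist_kb, dtype=float)
    w = np.power(2.0, -dist_kb / d0)
    w[dist_kb > max_dist] = 0.0
    return w

# Hypothetical toy data: distances for 2 genes x 3 peaks,
# and accessibility for 3 peaks x 2 cells
dist = np.array([[0.0, 10.0, 200.0],
                 [20.0, 160.0, 10.0]])
X_A = np.array([[1.0, 0.0],
                [2.0, 4.0],
                [0.0, 8.0]])

W = regulatory_weights(dist)      # genes x peaks
X_A_prime = W @ X_A               # x^{A'}_{ij} = sum_k w_ik * x^A_kj
```

Writing the sum as `W @ X_A` makes it explicit that each gene's regulatory potential in a cell is a weighted sum of the accessibility of nearby peaks.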
We assume that the activity of a gene in a cell is determined by gene expression and gene regulatory activity, with different contributions. Unlike Seurat v4 (weighted nearest neighbor)^{5}, in which the contribution weights are determined directly from the expression and chromatin accessibility values, we hypothesize that the relative contribution of the expression and chromatin accessibility of a gene to a cell is dynamic rather than static and cannot be accurately determined from a snapshot of the cell. RNA velocity is determined by the abundance of unspliced and spliced mRNA in a cell. The amount of unspliced mRNA is determined by gene regulation and the gene transcription rate, and the amount of spliced mRNA is determined by the difference between unspliced mRNA and degraded mRNA. We reasoned that genes with positive RNA velocities in a cell have a higher potential to drive gene transcription. Thus, their regulatory activity related to chromatin accessibility has a greater influence than gene expression in defining the overall transcriptional activity in the cell at the current snapshot. For genes with negative velocities, transcription tends to decelerate; hence, chromatin accessibility has less influence on transcriptional activity than gene expression.
A velocity matrix \(X^{V}=\{x_{ij}^{V}\mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) is generated using scVelo with the default parameters^{75}. Considering that some genes may fail to obtain valid velocity or regulatory potential values, we simultaneously remove the genes that have all-zero rows in \(X^{A'}\) or \(X^{V}\) from the four matrices \(X^{R}\), \(X^{R'}\), \(X^{A'}\), and \(X^{V}\). Without loss of generality, we still use \(I\) and \(J\) to represent the size of these new matrices. Furthermore, considering the potential bias when interpreting the velocity of a gene in a cell, we use the LTMG representations \(x_{ij}^{R'}\in\{1,2,\ldots,G_{i}\}\) to discretize \(x_{ij}^{V}\). For gene \(i\), let \(\mathcal{J}_{g}\) be the set of cells in which gene \(i\) has the same LTMG signal \(g\in\{1,2,\ldots,G_{i}\}\). For the cells in \(\mathcal{J}_{g}\), we use the mean velocity of gene \(i\) in these cells to replace the original velocities. To calculate a velocity weight \(\beta\) of gene \(i\) in cell \(j\), we first extract \(X_{i}^{V}=(x_{i1}^{V},x_{i2}^{V},\ldots,x_{iJ}^{V})\) for the velocity of gene \(i\) in all cells and \(X_{j}^{V}=(x_{1j}^{V},x_{2j}^{V},\ldots,x_{Ij}^{V})\) for the velocity of all genes in cell \(j\). Then, for \(X_{i}^{V}\), let \(\mathcal{X}_{i}^{V+}=\{x_{ij}^{V}\mid x_{ij}^{V}>0;\ j=1,2,\ldots,J\}\) for all cells with positive velocities of gene \(i\) and \(\mathcal{X}_{i}^{V-}=\{x_{ij}^{V}\mid x_{ij}^{V}<0;\ j=1,2,\ldots,J\}\) for all cells with negative velocities of gene \(i\). Similarly, for \(X_{j}^{V}\), let \(\mathcal{X}_{j}^{V+}=\{x_{ij}^{V}\mid x_{ij}^{V}>0;\ i=1,2,\ldots,I\}\) for all genes with positive velocities in cell \(j\) and \(\mathcal{X}_{j}^{V-}=\{x_{ij}^{V}\mid x_{ij}^{V}<0;\ i=1,2,\ldots,I\}\) for all genes with negative velocities in cell \(j\). For \(x_{ij}^{V}>0\), rank \(\mathcal{X}_{i}^{V+}\) and \(\mathcal{X}_{j}^{V+}\) from high to low based on velocity values, with the ranking starting from 1, and calculate the velocity weight as:
where \(a\) is the rank of \(x_{ij}^{V}\) in \(\mathcal{X}_{i}^{V+}\) and \(b\) is the rank of \(x_{ij}^{V}\) in \(\mathcal{X}_{j}^{V+}\).
Similarly, for \(x_{ij}^{V}<0\), rank \(\mathcal{X}_{i}^{V-}\) and \(\mathcal{X}_{j}^{V-}\) from high to low based on the absolute value of the velocities, with the ranking starting from 0, and calculate the velocity weight as:
where \(a\) is the rank of \(x_{ij}^{V}\) in \(\mathcal{X}_{i}^{V-}\) and \(b\) is the rank of \(x_{ij}^{V}\) in \(\mathcal{X}_{j}^{V-}\).
We now generate a gene activity matrix \({X}^{G}=\{{x}_{{ij}}^{G}\}\), integrating gene expression and chromatin accessibility based on the velocity weight. \({x}_{{ij}}^{G}\) is the gene activity score (GAS) of gene \(i\) in cell \(j\):
Construction of genecell heterogeneous graph
To simplify notation, we now redefine any integrated matrix generated in the previous section as \(X=\{x_{ij}\mid i=1,2,\ldots,I;\ j=1,2,\ldots,J\}\) with \(I\) genes and \(J\) cells. \(x_{ij}\) represents either the normalized expression (for multiple scRNAseq and CITEseq) or the GAS (for scRNAATACseq) of gene \(i\) in cell \(j\). We use two autoencoders to generate the initial embeddings for cells and genes, respectively. The cell autoencoder reduces the gene dimensions for each cell from \(I\) dimensions to 512 dimensions and eventually to 256 dimensions; the gene autoencoder reduces the cell dimensions for each gene from \(J\) dimensions to 512 and then 256 dimensions. As a result, each cell and each gene has an initial embedding of the same size (256 dimensions). The number of lower dimensions is optimized as a hyperparameter that differs for each dataset. The output layer is a reconstructed matrix \(\hat{X}\) with the same dimensions as \(X\). The loss function of the cell autoencoder is the mean squared error (MSE) between \(X\) and \(\hat{X}\):
$$\mathrm{MSE}\left(X,\hat{X}\right)=\frac{1}{I\times J}\sum_{i=1}^{I}\sum_{j=1}^{J}{\left(x_{ij}-\hat{x}_{ij}\right)}^{2}$$(7)
The gene autoencoder learns low-dimensional features of genes from all cells. It has an encoder, a latent space, and a decoder similar to those of the cell autoencoder, but its input \(X^{T}\) is the transpose of \(X\). The loss function of the gene autoencoder is
$$\mathrm{MSE}\left(X^{T},\hat{X^{T}}\right)=\frac{1}{I\times J}\sum_{j=1}^{J}\sum_{i=1}^{I}{\left(x_{ji}^{T}-\hat{x}_{ji}^{T}\right)}^{2}$$(8)
where \(\hat{X^{T}}\) is the reconstructed matrix in the output layer with the same dimensions as \(X^{T}\).
Definition 1 (Heterogeneous graph): A heterogeneous graph is a graph with multiple types of nodes and/or multiple types of edges. We denote a heterogeneous graph as \(G=(V,\, E,\, A,\, R)\), where \(V\) represents nodes, \(E\) represents edges, \(A\) represents the node type union, and \(R\) represents the edge type union.
Definition 2 (Node type and edge type mapping function): We define \(\tau \left(v\right):V\to A\) and \(\phi \left(e\right):E\to R\) as the mapping function for node types and edge types, respectively.
Definition 3 (Node meta relation): For a node pair \(v_{i}\) and \(v_{j}\) linked by an edge \(e_{i,j}\), the meta relation between \(v_{i}\) and \(v_{j}\) is denoted as \(\langle \tau (v_{i}),\phi (e_{i,j}),\tau (v_{j})\rangle\).
Given the integrated matrix \(X\), we construct a bipartite gene-cell heterogeneous graph \(G\) with two node types (cell and gene) and one edge type (gene-cell edge). \(\mathbf{V}=\mathbf{V}^{C}\cup \mathbf{V}^{G}\), where \(\mathbf{V}^{G}=\{v_{i}^{G}\mid i=1,2,\ldots,I\}\) denotes all genes, and \(\mathbf{V}^{C}=\{v_{j}^{C}\mid j=1,2,\ldots,J\}\) denotes all cells. \(E=\{e_{i,j}\}\) represents the edges between \(v_{i}^{G}\) and \(v_{j}^{C}\). For \(x_{ij}>0\), the weight of the corresponding edge is \(\omega (e_{i,j})=1\); otherwise, \(\omega (e_{i,j})=0\).
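The graph construction rule above (an edge of weight 1 wherever \(x_{ij}>0\)) can be sketched as an edge list. This is a minimal illustration; the function name `build_bipartite_edges` and the toy matrix are hypothetical, and a real implementation would more likely use a sparse matrix or a graph library.

```python
import numpy as np

def build_bipartite_edges(X):
    """Return (gene_index, cell_index) pairs for every entry with x_ij > 0.

    Each returned pair is a gene-cell edge with weight 1; absent pairs have weight 0.
    """
    X = np.asarray(X)
    gene_idx, cell_idx = np.nonzero(X > 0)
    return list(zip(gene_idx.tolist(), cell_idx.tolist()))

# Hypothetical integrated matrix: 3 genes (rows) x 2 cells (columns)
X = np.array([[0.0, 1.2],
              [0.5, 0.0],
              [2.0, 3.1]])
edges = build_bipartite_edges(X)
```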
Joint embedding via a heterogeneous graph transformer
We propose an unsupervised HGT framework^{12,13} to learn graph embeddings of all the nodes and mine relationships between genes and cells. The input of HGT is the integrated matrix \(X\), and the outputs are the embeddings of cells and genes and attention scores representing the importance of genes to cells.
Definition 4 (Target node and source node): A node in \({{{{{\bf{V}}}}}}\) is considered as a target node, represented as \({v}_{t}\), when performing HGT to aggregate information and update embeddings of this node. A node is considered as a source node, represented as \({v}_{s},{v}_{s}\,\ne\, {v}_{t}\), if there is an edge between \({v}_{s}\) and \({v}_{t}\) in \(E\), denoted as \({e}_{s,t}\) for convenience.
Definition 5 (Neighborhood graph of target node): A neighborhood graph of a target node \(v_{t}\) is induced from \(G\) and denoted as \(G'=(V',E',A',R')\), where \(\mathbf{V}'=\{v_{t}\}\cup \mathscr{N}(v_{t})\), \(\mathscr{N}(v_{t})\) is the complete set of neighbors of \(v_{t}\), \(\mathbf{E}'=\{e_{i,j}\in E\mid v_{i},v_{j}\in V'\}\), \(A'\) marks the target and source node types, and \(R'\) represents the target-source edge. \(e_{s,t}\in E'\) represents the edge between \(v_{s}\) and \(v_{t}\). As only one edge type is included in \(G\), the node meta relation of \(v_{s}\) and \(v_{t}\) is denoted as \(\langle \tau (v_{s}),\phi (e_{s,t}),\tau (v_{t})\rangle\).

1. Multihead attention mechanism and linear mapping of vectors.
Let \(\mathcal{H}^{l}\) denote the embedding of the \(l^{th}\) HGT layer (\(l=1,2,\ldots,L\)). The embeddings of \(v_{t}\) and \(v_{s}\) on the \(l^{th}\) layer are denoted as \(\mathcal{H}^{l}[v_{t}]\) and \(\mathcal{H}^{l}[v_{s}]\). A multihead mechanism is applied to equally divide both \(\mathcal{H}^{l}[v_{t}]\) and \(\mathcal{H}^{l}[v_{s}]\) into \(H\) heads. Multihead attention allows the model to jointly attend to information from different embedding subspaces, and each head can run through an attention mechanism in parallel to reduce computational time.
For the \(h^{th}\) head in the \(l^{th}\) HGT layer, \(\mathcal{H}^{l}[v_{t}]\) is updated from \(\mathcal{H}^{l-1}[v_{t}]\) and \(\mathcal{H}^{l-1}[v_{s}]\). \(\mathcal{H}^{0}[v_{t}]\) and \(\mathcal{H}^{0}[v_{s}]\) are the initial embeddings of \(v_{t}\) and \(v_{s}\), respectively. Three linear projection functions are applied to map node embeddings into the \(h^{th}\) vector. Specifically, the \(\mathrm{Q\_linear}_{\tau (v_{t})}^{h}\) function maps \(v_{t}\) into the \(h^{th}\) query vector \(Q^{h}(v_{t})\), with dimension \(\mathbb{R}^{d}\to \mathbb{R}^{\frac{d}{H}}\), where \(d\) is the dimension of \(\mathcal{H}^{l-1}[v_{t}]\) and \(\frac{d}{H}\) is the vector dimension per head. Similarly, the \(\mathrm{K\_linear}_{\tau (v_{s})}^{h}\) and \(\mathrm{V\_linear}_{\tau (v_{s})}^{h}\) functions map the source node \(v_{s}\) into the \(h^{th}\) key vector \(K^{h}(v_{s})\) and the \(h^{th}\) value vector \(V^{h}(v_{s})\).
$$Q^{h}\left(v_{t}\right)=\mathrm{Q\_linear}_{\tau \left(v_{t}\right)}^{h}\left(\mathcal{H}^{\left(l-1\right)}\left[v_{t}\right]\right)$$(9)
$$K^{h}\left(v_{s}\right)=\mathrm{K\_linear}_{\tau \left(v_{s}\right)}^{h}\left(\mathcal{H}^{\left(l-1\right)}\left[v_{s}\right]\right)$$(10)
$$V^{h}\left(v_{s}\right)=\mathrm{V\_linear}_{\tau \left(v_{s}\right)}^{h}\left(\mathcal{H}^{\left(l-1\right)}\left[v_{s}\right]\right)$$(11)
Each type of node has a unique linear projection to maximally model the distribution differences.

2. Heterogeneous mutual attention
To calculate the mutual attention between \(v_{t}\) and \(v_{s}\), we introduce the Attention operator, which estimates the importance of each \(v_{s}\) to \(v_{t}\):
$$\mathrm{Attention}\left(v_{s},e_{s,t},v_{t}\right)=\mathop{\mathrm{Softmax}}\limits_{\forall v_{s}\in \mathscr{N}\left(v_{t}\right)}\left(\mathop{\parallel }\limits_{H}\left(\mathrm{ATT\_head}^{h}\left(v_{s},e_{s,t},v_{t}\right)\right)\right)$$(12)
The attention function can be described as mapping a query vector and a set of key-value pairs to an output for each node pair \(e=(v_{s},v_{t})\). The overall attention of \(v_{t}\) and \(v_{s}\) is the concatenation of the attention weights in all heads, followed by a softmax function. \(\mathop{\parallel }\limits_{H}(\cdot )\) is the concatenation function. The \(\mathrm{ATT\_head}^{h}\left(v_{s},e_{s,t},v_{t}\right)\) term is the \(h^{th}\) head attention weight between \(v_{t}\) and \(v_{s}\), which can be calculated by:
$$\mathrm{ATT\_head}^{h}\left(v_{s},e_{s,t},v_{t}\right)=\left(K^{h}\left(v_{s}\right)W_{\phi \left(e_{s,t}\right)}^{ATT}{Q^{h}\left(v_{t}\right)}^{T}\right)\cdot \frac{\mu \left\langle \tau \left(v_{s}\right),\phi \left(e_{s,t}\right),\tau \left(v_{t}\right)\right\rangle }{\sqrt{d}}$$(13)
This measures the similarity between the queries and keys, where \(W_{\phi \left(e_{s,t}\right)}^{ATT}\in \mathbb{R}^{\frac{d}{H}\times \frac{d}{H}}\) is a transformation matrix that captures meta-relation features, \({(\cdot )}^{T}\) denotes the transpose, and \(\mu\) is a prior tensor denoting the significance of each node meta relation \(\left\langle \tau \left(v_{s}\right),\phi \left(e_{s,t}\right),\tau \left(v_{t}\right)\right\rangle\), serving as an adaptive scaling of the attention. The concatenation of attention heads results in the attention coefficients between \(v_{s}\) and \(v_{t}\), followed by the Softmax function in Eq. 12.

3. Heterogeneous message passing
A Message operator is used to extract the message of \(v_{s}\) that can be passed to \(v_{t}\). The multihead \(\mathrm{Message}\) is defined by:
$$\mathrm{Message}\left(v_{s},e_{s,t},v_{t}\right)=\mathop{\parallel }\limits_{H}\left(\mathrm{MSG\_head}^{h}\left(v_{s},e_{s,t},v_{t}\right)\right)$$(14)
The \(h^{th}\) head message \(\mathrm{MSG\_head}^{h}\left(v_{s},e_{s,t},v_{t}\right)\) for each edge \(\left(v_{s},v_{t}\right)\) is defined as:
$$\mathrm{MSG\_head}^{h}\left(v_{s},e_{s,t},v_{t}\right)=V^{h}\left(v_{s}\right)W_{\phi \left(e_{s,t}\right)}^{MSG}$$(15)
where each source node \(v_{s}\) in head \(h\) is mapped into a message vector by a linear projection \(V^{h}\left(v_{s}\right):\mathbb{R}^{d}\to \mathbb{R}^{\frac{d}{H}}\). \(W_{\phi \left(e_{s,t}\right)}^{MSG}\in \mathbb{R}^{\frac{d}{H}\times \frac{d}{H}}\) is also a transformation matrix, similar to \(W_{\phi \left(e_{s,t}\right)}^{ATT}\).

4. Target specific aggregation
To update the embedding of \(v_{t}\), the final step in the \(l^{th}\) HGT layer is to Aggregate the neighbor information obtained in this layer, \(\widetilde{\mathcal{H}^{l}}\left[v_{t}\right]\), into the target node embedding \(\mathcal{H}^{l-1}\left[v_{t}\right]\).
$$\widetilde{\mathcal{H}^{l}}\left[v_{t}\right]=\mathop{\mathrm{Aggregate}}\limits_{\forall v_{s}\in \mathscr{N}\left(v_{t}\right)}\left(\mathrm{Attention}\left(v_{s},e_{s,t},v_{t}\right)\cdot \mathrm{Message}\left(v_{s},e_{s,t},v_{t}\right)\right)$$(16)
$$\mathcal{H}^{l}\left[v_{t}\right]=\theta \left(\mathrm{ReLU}\left(\widetilde{\mathcal{H}^{l}}\left[v_{t}\right]\right)\right)+\left(1-\theta \right)\mathcal{H}^{l-1}\left[v_{t}\right]$$(17)
where \(\theta\) is a trainable parameter and \(\mathrm{ReLU}\) is the activation function. The final embedding of \(v_{t}\) is obtained by stacking information via all \(L\) HGT layers, and \(L\) is set to 2 in DeepMAPS.
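Steps 1 to 4 can be illustrated for a single target node with a small NumPy sketch. It is not the DeepMAPS or HGT implementation: the function name `hgt_update`, the random toy inputs, the per-head softmax over neighbors, and the gated residual written as a convex combination \(\theta \cdot \mathrm{ReLU}(\cdot )+(1-\theta )\cdot \mathcal{H}^{l-1}\) are all assumptions made for illustration, and \(\mu\) and \(\theta\) are treated as scalars.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hgt_update(h_t, h_s, Wq, Wk, Wv, W_att, W_msg, mu=1.0, theta=0.5, n_heads=2):
    """One HGT-style update of a single target node from its source neighbors.

    h_t: (d,) target embedding; h_s: (S, d) source embeddings.
    Wq, Wk, Wv: (d, d) projections, split into n_heads chunks of size d / n_heads.
    W_att, W_msg: (n_heads, d/n_heads, d/n_heads) per-head relation matrices.
    """
    d = h_t.shape[0]
    dh = d // n_heads
    q = (h_t @ Wq).reshape(n_heads, dh)        # query vectors, (H, dh)
    k = (h_s @ Wk).reshape(-1, n_heads, dh)    # key vectors, (S, H, dh)
    v = (h_s @ Wv).reshape(-1, n_heads, dh)    # value vectors, (S, H, dh)
    # Attention head: (K W_att Q^T) * mu / sqrt(d), one score per neighbor and head
    scores = np.einsum("shi,hij,hj->sh", k, W_att, q) * mu / np.sqrt(d)
    att = softmax(scores, axis=0)              # normalize over neighbors (assumption: per head)
    msg = np.einsum("shi,hij->shj", v, W_msg)  # message head: V W_msg
    # Aggregate attention-weighted messages and concatenate the heads
    agg = (att[:, :, None] * msg).sum(axis=0).reshape(d)
    # Gated residual update (assumed convex combination)
    h_new = theta * np.maximum(agg, 0.0) + (1.0 - theta) * h_t
    return h_new, att

# Hypothetical toy inputs: embedding size 8, 2 heads, 3 source neighbors
rng = np.random.default_rng(0)
d, H, S = 8, 2, 3
h_t = rng.normal(size=d)
h_s = rng.normal(size=(S, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W_att = rng.normal(size=(H, d // H, d // H))
W_msg = rng.normal(size=(H, d // H, d // H))
h_new, att = hgt_update(h_t, h_s, Wq, Wk, Wv, W_att, W_msg, n_heads=H)
```

With `theta=0.0` the update returns the previous embedding unchanged, which is a quick sanity check on the residual term.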

5. Determination of gene to cell attention
We extract the final attention score \(\mathcal{a}_{i,j}\) of gene \(i\) to cell \(j\) from the last HGT layer after the HGT process is complete:
$${{{{{{\mathcal{a}}}}}}}_{i,j}=\sqrt{\mathop{\sum}\limits_{h}{{{ATT}\_{head}}^{h}\left(i,\, j\right)}^{2}}$$(18)
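Eq. 18 combines the per-head attention weights into a single score by taking their Euclidean norm. A minimal sketch, with a hypothetical function name and toy values:

```python
import numpy as np

def final_attention(att_heads):
    """Combine per-head attention weights as the square root of their summed squares."""
    att_heads = np.asarray(att_heads, dtype=float)
    return np.sqrt((att_heads ** 2).sum(axis=-1))

# Hypothetical attention weights of one gene-cell pair across two heads
a_ij = final_attention([0.3, 0.4])
```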
HGT training on subgraphs
To improve the efficiency and capability of the HGT model on a giant heterogeneous graph (tens of thousands of nodes and millions of edges), we deploy a modified HGSampling method for subgraph selection and HGT training on multiple minibatches^{12}. For the graph \(G\) with \(I\) genes and \(J\) cells, the union of subgraphs should cover \(a\)% (set to 30%) of the gene and cell nodes to ensure the training power. As such, the sampler constructs a number of small subgraphs (50 in DeepMAPS) from the given heterogeneous graph \(G\) and feeds the subgraphs into the HGT model in different batches using multiple GPUs. Each subgraph should include \(a\%\times I/50\) genes and \(a\%\times J/50\) cells. Taking a cell \(j\) as a target node \(v_{t}\) and its neighbors \(v_{s}\in \mathscr{N}\left(v_{t}\right)\), corresponding to genes \(i\), as source nodes, we calculate the sampling probability of the edge \(e_{s,t}\) as:
where \(x(v_{s},v_{t})=x_{ij}\) refers to the expression or GAS value of gene \(i\) in cell \(j\) in the integrated matrix \(X\). Thus, for each target node \(v_{t}\), we randomly select \(\frac{a\%\times I/50}{a\%\times J/50}\) neighbor genes based on the sampling probability \(\mathrm{Prob}\left(e_{s,t}\right)\). HGT parameters, such as \(W_{\phi \left(e_{s,t}\right)}^{ATT}\), \(W_{\phi \left(e_{s,t}\right)}^{MSG}\), and \(\theta\), are trained and inherited sequentially from subgraph 1 to subgraph 50 in one epoch. The subgraph training is performed in an unsupervised way with a graph autoencoder (GAE): the HGT is the encoder layer, and the inner product of embeddings is the decoder layer. We calculate the loss function of the GAE as the Kullback-Leibler (KL) divergence between the reconstructed matrix \(\hat{X}\) and the integrated matrix \(X\):
The subgraph training stops when the loss converges or after 100 epochs, whichever happens first.
Determination of active genes module in cell clusters
Predict cell clusters
We apply Louvain clustering (Seurat v3) to the cell embeddings \(\mathcal{H}^{L}\left[v_{c}\right]\) generated from the final HGT layer to predict cell clusters. The resolution of Louvain clustering is determined by a grid-search test over multiple HGT hyperparameter combinations, and we set a clustering resolution of 0.4 as the default.
Identify cell clusteractive gene association network
We used an SFP model^{17} to select genes that highly contribute to cell cluster characterization and to construct cell cluster-active gene association networks. Define a new heterogeneous graph \(\widetilde{G}=\left(V,\widetilde{E}\right)\) with \(V=V^{G}\cup V^{C}\) and \(\widetilde{E}=\widetilde{E}^{1}\cup \widetilde{E}^{2}\), where \(\widetilde{E}^{1}\) represents the gene-gene relations and \(\widetilde{E}^{2}\) represents the gene-cell relations. The weight \(\omega (\widetilde{e}_{i_{1},i_{2}}^{1})\) of the edge between \(v_{i_{1}}^{G}\in V^{G}\) and \(v_{i_{2}}^{G}\in V^{G}\) is the Pearson's correlation of the HGT embeddings of \(v_{i_{1}}^{G}\) and \(v_{i_{2}}^{G}\). The weight \(\omega \left(\widetilde{e}_{i,j}^{2}\right)\) of the edge between \(v_{i}^{G}\in V^{G}\) and \(v_{j}^{C}\in V^{C}\) is the final attention score \(\mathcal{a}_{i,j}\). Only edges with \(\omega \left(\widetilde{e}_{i_{1},i_{2}}^{1}\right)>0.5\) and \(\omega \left(\widetilde{e}_{i,j}^{2}\right)>\mu \left(\mathcal{a}_{i,j}\right)+sd(\mathcal{a}_{i,j})\), where \(\mu ()\) represents the mean and \(sd()\) represents the standard deviation of \(\mathcal{a}_{i,j}\), are kept within a cell cluster. The weights of the remaining edges are then max-min normalized in reverse, so that the edge with the largest weight is rescaled to 0 and the edge with the smallest weight is rescaled to 1.
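The edge filtering and reversed max-min normalization can be sketched as below. Several details are assumptions made for illustration: the function name, the use of the population standard deviation, and the choice to normalize gene-gene and gene-cell edges separately are not specified in the text.

```python
import numpy as np

def filter_and_rescale(gene_gene_w, gene_cell_w, cor_cutoff=0.5):
    """Keep strong edges, then rescale so the largest weight maps to 0 and the smallest to 1."""
    gg = np.asarray(gene_gene_w, dtype=float)
    gc = np.asarray(gene_cell_w, dtype=float)
    gg_kept = gg[gg > cor_cutoff]               # gene-gene: correlation > 0.5
    gc_kept = gc[gc > gc.mean() + gc.std()]     # gene-cell: attention > mean + sd

    def reverse_minmax(w):
        if w.size == 0 or w.max() == w.min():
            return np.zeros_like(w)
        return (w.max() - w) / (w.max() - w.min())

    return reverse_minmax(gg_kept), reverse_minmax(gc_kept)

# Hypothetical edge weights
gg_scaled, gc_scaled = filter_and_rescale(
    [0.2, 0.6, 0.8, 1.0],
    [0, 0, 0, 0, 0, 0, 0, 0, 3, 4],
)
```

Reversing the scale turns strong relations into low-cost edges, which matches an optimization that minimizes the summed edge weight.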
Let \(Z\) be the number of clusters predicted via Louvain clustering, and \({V}^{C\left[z\right]}=\{{v}^{C[z]}\}\) be the node set corresponding with the cell set in cluster label of \(z={{{{\mathrm{1,2}}}}},\ldots,Z\). We then formulate this problem using a combinatorial optimization model defined below
s.t.
where \(\mathcal{L}(v_{j_{1}}^{C},v_{j_{2}}^{C})\) is a binary indicator function representing whether two cell nodes, \(v_{j_{1}}^{C}\) and \(v_{j_{2}}^{C}\), can be connected (1) or not (0) in \(\widetilde{G}\) via a path \(\widetilde{E}_{j_{1},j_{2}}^{\mathcal{L}}=\{\widetilde{e}_{i_{1},j_{1}}^{2},\widetilde{e}_{i_{1},i_{2}}^{1},\widetilde{e}_{i_{2},i_{3}}^{1},\ldots,\widetilde{e}_{i_{t-1},i_{t}}^{1},\widetilde{e}_{i_{t},j_{2}}^{2}\}\). Denote \(\widetilde{E}^{\mathcal{L}}=\{\widetilde{E}_{j_{1},j_{2}}^{\mathcal{L}}\}\) as the complete collection of paths \(\widetilde{E}_{j_{1},j_{2}}^{\mathcal{L}}\) connecting \(v_{j_{1}}^{C}\) and \(v_{j_{2}}^{C}\). The combinatorial optimization model aims to identify the path connecting \(v_{j_{1}}^{C}\) and \(v_{j_{2}}^{C}\) with the minimum summed edge weight. We consider the gene networks remaining in the SFP result of cluster \(z\) as the cluster-active gene association networks.
Construct GRNs from scRNAATACseq data
For genes in a cell clusteractive gene association network resulting from SFP, a set of TFs \(q={{{{\mathrm{1,2}}}}},\ldots,Q\) can then be assigned to genes. The TFpeak relations are retrieved by finding alignments between the TF binding sites with peak regions in the scATACseq data, and the peakgene relations are established previously when calculating the potential regulation scores \({r}_{{ikj}}\) (Eq. 3). We design a regulatory intensity (RI) score \({{{{{{\mathcal{s}}}}}}}_{i,j,q}\) to quantify the intensity of TF \(q\) in regulating gene \(i\) in the cell \(j\):
where \(b_{qk}^{A}\) is the binding affinity score of TF \(q\) to peak \(k\). The binding affinity score is calculated in three steps: (a) We retrieved the genome browser track file from JASPAR, which stores all known binding sites of each TF. In JASPAR, a p-value score is calculated as \(-\log_{10}(p)\times 100\), where 0 corresponds to a p-value of 1 and 1,000 corresponds to a p-value <10^{−10}. We removed TF binding sites with p-value scores smaller than 500. (b) A TF binding site is kept if it overlaps with any peak region in the scATACseq profile; otherwise, it is removed. (c) The corresponding p-value score is divided by 100. We define a gene set regulated by the same TF as a regulon.
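The three-step binding affinity calculation can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, reading the JASPAR track file and computing peak overlaps are not shown, and the cap at 1,000 is inferred from the score range described above.

```python
import math

def jaspar_score(p_value):
    """p-value score: -log10(p) * 100, capped at 1000 (cap inferred from the stated range)."""
    if p_value <= 0:
        return 1000.0
    return min(1000.0, -math.log10(p_value) * 100.0)

def binding_affinity(p_value, overlaps_peak, min_score=500.0):
    """Return the binding affinity score, or None if the binding site is filtered out.

    (a) drop sites with score < min_score; (b) drop sites that overlap no peak;
    (c) divide the score by 100.
    """
    score = jaspar_score(p_value)
    if score < min_score or not overlaps_peak:
        return None
    return score / 100.0

# Hypothetical binding sites
kept = binding_affinity(1e-6, overlaps_peak=True)
weak = binding_affinity(1e-3, overlaps_peak=True)
no_peak = binding_affinity(1e-6, overlaps_peak=False)
```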
We calculate a regulon activity score (RAS) \({{{{{\mathcal{r}}}}}}\left(q,z\right)\) of a regulon with genes regulated by TF \(q\) in cell cluster \(z\) as:
where \({I}_{q}\) denotes genes regulated by TF \(q\) in cell cluster \(z\). We used the Wilcoxon ranksum test to identify differentially active regulons in a cluster based on RAS. If the BHadjusted pvalue is less than 0.05 between different cell clusters and the log fold change larger than 0.10, we consider the regulon to be differentially active in this cluster, and it is defined as a celltypespecific regulon (CTSR).
A GRN in a cell cluster is constructed by merging the regulons in that cluster. The eigenvector centrality (\({\mathcal{c}}_{v}\)) of a TF node \(v\) in the GRN is defined as:
\[{\mathcal{c}}_{v}={\alpha }_{\max }\left(v\right)\]
where \({\alpha }_{\max }\) is the eigenvector corresponding to the largest eigenvalue of the weighted adjacency matrix of a GRN, and \({\alpha }_{\max }(v)\) is its entry for node \(v\). TFs with the highest \({\mathcal{c}}_{v}\) ranks were regarded as master TFs (top 10 by default).
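As an illustration of how master TFs could be ranked, the sketch below computes the leading eigenvector of a small weighted GRN by power iteration. It assumes a symmetric (undirected) adjacency matrix and a connected network; the node names and weights are invented, and iterating with A + I instead of A is a choice made here so that the iteration also converges on bipartite TF-gene graphs (both matrices share the same eigenvectors).

```python
import math

def eigenvector_centrality(nodes, edges, max_iter=1000, tol=1e-12):
    """Leading eigenvector of a symmetric weighted adjacency matrix."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        adj[index[u]][index[v]] = w
        adj[index[v]][index[u]] = w
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        # One step of power iteration on (A + I).
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(value * value for value in y))
        y = [value / norm for value in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return dict(zip(nodes, x))

def master_tfs(centrality, tfs, top=10):
    """Rank TF nodes by centrality and keep the top entries."""
    return sorted(tfs, key=lambda tf: centrality[tf], reverse=True)[:top]

# Hypothetical GRN: TF1 regulates three genes, TF2 regulates one of them.
nodes = ["TF1", "TF2", "g1", "g2", "g3"]
edges = [("TF1", "g1", 1.0), ("TF1", "g2", 1.0),
         ("TF1", "g3", 1.0), ("TF2", "g3", 0.5)]
centrality = eigenvector_centrality(nodes, edges)
print(master_tfs(centrality, ["TF1", "TF2"], top=1))
```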
Benchmarking quantification and statistics
Grid-search parameter test for cell clustering on benchmark data
To determine the default parameters of HGT on different data types, we performed a grid-search test on HGT parameters, including the pair of number of embeddings and number of heads (91/13, 104/13, 112/16, and 128/16), learning rate (0.0001, 0.001, and 0.01), and training epochs (50, 75, and 100). Altogether, 36 parameter combinations were tested. For each of the three data types, the HGT parameter training was performed on three benchmark datasets, and the default parameter combination was selected based on the highest median score (ARI for multiple scRNA-seq data and CITE-seq data with benchmark labels and ASW for scRNA-ATAC-seq data without benchmark labels) of the three datasets.
To assess the performance of DeepMAPS alongside other proposed scMultiomics benchmark tools, we compared DeepMAPS with Seurat (v3.2.3 and v4.0, https://github.com/satijalab/seurat), MOFA+ (v1.0.0, https://github.com/bioFAM/MOFA2), Harmony (v0.1, https://github.com/immunogenomics/harmony), totalVI (v0.10.0, https://github.com/YosefLab/scvitools), and GLUE (v0.3.2, https://github.com/gaolab/GLUE). Because the tools differ in the data types they can integrate, DeepMAPS was compared with Seurat v3.2.3 and Harmony on multiple scRNA-seq data, with Seurat v4.0.0, MOFA+, and totalVI on CITE-seq data, and with Seurat v4.0.0, MOFA+, and GLUE on scRNA-ATAC-seq data. For each benchmarking tool, grid-search tests were also applied to a combination of parameters, such as the number of dimensions for cell clustering and the clustering resolution.
The default HGT parameter combination selected for each data type was then applied to additional datasets (one multiple scRNA-seq, one CITE-seq, and three scRNA-ATAC-seq datasets) for independent tests. All benchmarking tools used their default parameters in these tests.
To showcase the rationale for selecting the integrative methods and cell clustering methods in DeepMAPS, we evaluated the cell clustering performance after replacing these methods with alternatives. Specifically, for data integration, we replaced the CCA method with Harmony integration (multiple scRNA-seq), replaced the CLR method with the Seurat weighted nearest neighbor method (CITE-seq), and replaced the velocity-weighted method with the Seurat weighted nearest neighbor method and with a variant that does not use velocity (scRNA-ATAC-seq). For cell clustering, we replaced Louvain clustering with Leiden and the smart local moving (SLM) method. We also compared the influence of clustering resolution (0.4, 0.8, 1.2, and 1.6) on the cell clustering results in DeepMAPS. Each comparison was performed on all 36 parameter combinations used in the grid-search test. For DeepMAPS without velocity, we simply added the gene expression matrix from the scRNA-seq data to the gene potential activity matrix derived from the scATAC-seq data, setting the balance weight introduced by velocity for gene j in cell i to 1.
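The 36 combinations follow from 4 embedding/head pairs × 3 learning rates × 3 epoch settings, and each embedding size is an integer multiple of its head count. The sketch below enumerates the grid and picks the combination with the highest median score across datasets. The parameter keys and `toy_score` are hypothetical stand-ins: in practice the score would be the ARI or ASW obtained after training HGT on a benchmark dataset.

```python
from itertools import product
from statistics import median

EMBED_HEADS = [(91, 13), (104, 13), (112, 16), (128, 16)]
LEARNING_RATES = [0.0001, 0.001, 0.01]
EPOCHS = [50, 75, 100]

def parameter_grid():
    """All combinations of the three parameter groups."""
    return [
        {"n_embed": n_embed, "n_heads": n_heads, "lr": lr, "epochs": epochs}
        for (n_embed, n_heads), lr, epochs
        in product(EMBED_HEADS, LEARNING_RATES, EPOCHS)
    ]

def select_default(grid, datasets, score_fn):
    """Return the combination with the highest median score over datasets."""
    return max(grid, key=lambda p: median(score_fn(p, d) for d in datasets))

def toy_score(params, dataset_offset):
    # Hypothetical scoring surface used only to make the example runnable.
    return (dataset_offset + params["n_embed"] / 1000
            - 100 * abs(params["lr"] - 0.001)
            - abs(params["epochs"] - 75) / 100)

grid = parameter_grid()
datasets = [0.50, 0.55, 0.60]      # stand-ins for three benchmark datasets
best = select_default(grid, datasets, toy_score)
print(len(grid), best)
```

Using the median rather than the mean keeps a single unusually easy or hard dataset from deciding the default.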
Adjusted Rand index (ARI)
ARI measures the similarity between two clusterings by considering all pairs of cells and counting the pairs that are grouped consistently in the predicted and the benchmark clustering, adjusted for chance agreement. A contingency table is built to summarize the overlaps between the two cell label lists with \(n\) elements (cells). Each entry denotes the number of cells in common between a pair of clusters from the two label lists. The ARI can be calculated as:
\[\mathrm{ARI}=\frac{\mathrm{RI}-E\left[\mathrm{RI}\right]}{\max \left(\mathrm{RI}\right)-E\left[\mathrm{RI}\right]},\quad \mathrm{RI}=\frac{{J}_{a}+{J}_{b}}{{C}_{n}^{2}}\]
where \(E\left[.\right]\) is the expectation; \({J}_{a}\) is the number of cell pairs placed in the same cluster in both the predicted and the benchmark labels; \({J}_{b}\) is the number of cell pairs placed in different clusters in both; and \({C}_{n}^{2}\) is the number of ways of selecting two cells from the total of \(n\) cells.
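For illustration, the ARI can be computed directly from the contingency table using the standard pair-counting form, which is algebraically equivalent to adjusting the Rand index by its expectation. The function below is a minimal sketch with invented labels, not the code used for the benchmarks.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from pair counts of the contingency table."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    cells = Counter(zip(labels_true, labels_pred))       # contingency table
    same_both = sum(comb(c, 2) for c in cells.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / comb(n, 2)        # chance agreement
    maximum = (same_true + same_pred) / 2
    if maximum == expected:                              # degenerate case
        return 1.0
    return (same_both - expected) / (maximum - expected)

benchmark = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]      # one benchmark cluster is split in two
ari = adjusted_rand_index(benchmark, predicted)
print(round(ari, 4))
```

The score does not depend on how clusters are numbered, so relabeling a perfect clustering still gives 1.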
Average Silhouette Width (ASW)
Unlike ARI, which requires known ground-truth labels, the silhouette score evaluates the consistency within clusters using the data alone. The silhouette width indicates how similar a cell is to its own cluster (cohesion) compared to other clusters (separation). It ranges from −1 to +1, where a high value indicates that the cell is well matched to its cluster. The silhouette score \(sil(j)\) can be calculated by:
\[sil\left(j\right)=\frac{n\left(j\right)-m\left(j\right)}{\max \left\{m\left(j\right),n\left(j\right)\right\}}\]
where \(m(j)\) is the average distance between a cell \(j\) and all other cells in the same cluster, and \(n\left(j\right)\) is the average distance of \(j\) to all cells in the nearest cluster to which \(j\) does not belong. We calculated the mean silhouette score of all cells as the ASW to represent the silhouette score of the dataset.
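A small worked example may help. The sketch below computes \(m(j)\), \(n(j)\), and the silhouette score for every cell using Euclidean distance, then averages them into the ASW. The choice of Euclidean distance, the score of 0 for singleton clusters, and the one-dimensional toy data are assumptions of this example.

```python
from math import dist

def silhouette_scores(points, labels):
    """Per-cell silhouette scores with Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1 or len(clusters) == 1:
            scores.append(0.0)        # convention assumed for this sketch
            continue
        # m(j): mean distance to the other cells of the same cluster
        # (the distance of a cell to itself is 0, so it adds nothing).
        m = sum(dist(point, other) for other in own) / (len(own) - 1)
        # n(j): mean distance to the cells of the nearest other cluster.
        n = min(sum(dist(point, other) for other in members) / len(members)
                for name, members in clusters.items() if name != label)
        denom = max(m, n)
        scores.append((n - m) / denom if denom > 0 else 0.0)
    return scores

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = ["A", "A", "B", "B"]
scores = silhouette_scores(points, labels)
asw = sum(scores) / len(scores)
print(round(asw, 4))
```

For the cell at 0, \(m = 1\) and \(n = 10.5\), giving \(9.5 / 10.5\); the two well-separated clusters therefore yield an ASW close to 0.9.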
Calinski-Harabasz index
The CH index is the ratio of the between-cluster dispersion to the within-cluster dispersion, each summed over all clusters. A higher CH index indicates better performance. For a set of data E of size \({n}_{E}\) with \(k\) clusters, the CH index is defined as:
\[\mathrm{CH}=\frac{t\left({B}_{k}\right)}{t\left({W}_{k}\right)}\times \frac{{n}_{E}-k}{k-1}\]
\[{W}_{k}=\sum_{q=1}^{k}\sum_{x\in {C}_{q}}\left(x-{c}_{q}\right){\left(x-{c}_{q}\right)}^{T},\quad {B}_{k}=\sum_{q=1}^{k}{n}_{q}\left({c}_{q}-{c}_{E}\right){\left({c}_{q}-{c}_{E}\right)}^{T}\]
where \(t\left({B}_{k}\right)\) is the trace of the between-group dispersion matrix, and \(t\left({W}_{k}\right)\) is the trace of the within-cluster dispersion matrix. \({C}_{q}\) is the set of points in cluster \(q\), \({c}_{q}\) is the center of cluster \(q\), \({c}_{E}\) is the center of E, and \({n}_{q}\) is the number of points in cluster \(q\). \(T\) denotes the matrix transpose.
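The traces can be computed without forming the dispersion matrices, because the trace of an outer product \((x-c)(x-c)^{T}\) equals the squared Euclidean distance between \(x\) and \(c\). The sketch below uses this shortcut and assumes the usual scaling of the trace ratio by \(({n}_{E}-k)/(k-1)\); the toy data are invented.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def calinski_harabasz(points, labels):
    """CH index from the traces of the dispersion matrices."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n_e, k = len(points), len(clusters)
    c_e = centroid(points)
    trace_w = 0.0      # within-cluster dispersion
    trace_b = 0.0      # between-cluster dispersion
    for members in clusters.values():
        c_q = centroid(members)
        trace_w += sum(squared_distance(x, c_q) for x in members)
        trace_b += len(members) * squared_distance(c_q, c_e)
    return (trace_b / trace_w) * (n_e - k) / (k - 1)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = ["A", "A", "B", "B"]
ch = calinski_harabasz(points, labels)
print(ch)
```

Here the within-cluster trace is 1 and the between-cluster trace is 100, so the index is 100 × (4 − 2)/(2 − 1) = 200.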
Davies-Bouldin index
The DB index signifies the average ‘similarity’ between clusters, where the similarity is a measure that compares the distance between clusters with the size of the clusters themselves. A lower DB index relates to a model with better separation between the clusters. For data with \(k\) clusters and cluster indices \(i,j\in \{1,\ldots,k\}\), the DB index is defined as:
\[\mathrm{DB}=\frac{1}{k}\sum_{i=1}^{k}\mathop{\max }\limits_{j\ne i}\frac{{s}_{i}+{s}_{j}}{{d}_{{ij}}}\]
where \({s}_{i}\) and \({s}_{j}\) are the average distances between each point in cluster \(i\) or \(j\) and the centroid of that cluster, and \({d}_{{ij}}\) is the distance between the centroids of clusters \(i\) and \(j\).
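As a sketch of this definition, the code below takes, for every cluster, the worst-case ratio of summed cluster spreads to centroid distance and averages these ratios over clusters. Euclidean distance, at least two clusters, and the toy data are assumptions of the example.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def davies_bouldin(points, labels):
    """DB index: mean over clusters of the worst (s_i + s_j) / d_ij."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centers = {name: centroid(members) for name, members in clusters.items()}
    # s_i: average distance of the points of a cluster to its centroid.
    spread = {name: sum(dist(x, centers[name]) for x in members) / len(members)
              for name, members in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((spread[i] + spread[j]) / dist(centers[i], centers[j])
                     for j in clusters if j != i)
    return total / len(clusters)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = ["A", "A", "B", "B"]
db = davies_bouldin(points, labels)
print(db)
```

Both clusters have a spread of 0.5 and their centroids are 10 apart, so the index is (0.5 + 0.5)/10 = 0.1.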
Gene association network evaluations
We evaluated the gene association networks identified by DeepMAPS by comparing them with those from IRIS3^{15} and from a standard gene co-expression network inference using all genes. We calculated the closeness centrality and eigenvector centrality of the network generated by each tool. The formulations are given below.
Closeness centrality (CC)
The closeness centrality (CC)^{76} of a vertex \(u\) is defined as the inverse of the summed length of the shortest paths to all the other vertices \(v\) in the undirected weighted graph. The formulation is defined as:
\[\mathrm{CC}\left(u\right)=\frac{1}{\sum_{v\ne u}{d}_{w}\left(u,\,v\right)}\]
where \({d}_{w}\left(u,\,v\right)\) is the length of the shortest weighted path between \(u\) and \(v\). If there is no path between vertices \(u\) and \(v\), the total number of vertices is used in the formula instead of the path length. A higher CC indicates that a node is more central in the network, reflecting a more important role of this gene in the network. The CC is calculated using the igraph R package with the function igraph::betweenness. We take the average CC of all nodes in a network to represent the network CC.
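The definition can be illustrated in Python with Dijkstra's algorithm, including the substitution of the vertex count for unreachable pairs. This is a sketch of the formula only and is not meant to reproduce the igraph R output; the adjacency-dictionary format and the toy graph are assumptions.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra distances from source in a weighted undirected graph."""
    distances = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > distances.get(u, float("inf")):
            continue                         # stale heap entry
        for v, weight in graph[u].items():
            candidate = d + weight
            if candidate < distances.get(v, float("inf")):
                distances[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return distances

def closeness(graph):
    """Inverse of the summed shortest-path lengths to all other vertices."""
    n = len(graph)
    result = {}
    for u in graph:
        distances = shortest_paths(graph, u)
        # Unreachable vertices contribute the total number of vertices.
        total = sum(distances.get(v, n) for v in graph if v != u)
        result[u] = 1.0 / total if total else 0.0
    return result

# Toy path graph a - b - c with edge weights 1 and 2.
graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 2.0}, "c": {"b": 2.0}}
cc = closeness(graph)
network_cc = sum(cc.values()) / len(cc)
print(cc, round(network_cc, 4))
```

The middle vertex b has the smallest summed distance (1 + 2) and therefore the highest closeness.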
Eigenvector centrality (EC)
Eigenvector centrality (EC)^{77} scores correspond to the values of the first eigenvector of the graph adjacency matrix. The EC score of \(u\) is defined as:
\[{x}_{u}=\lambda \sum_{v\in G}{a}_{{uv}}{x}_{v}\]
where \(\lambda\) is the inverse of the eigenvalue associated with the eigenvector \(x=({x}_{1},{x}_{2},\ldots,{x}_{n})\), and \({a}_{{uv}}\) is the entry for \(u\) and \(v\) in the weighted adjacency matrix of the undirected graph G. A node with a high eigenvector centrality score is connected to many nodes that themselves have high scores. The EC is calculated using the igraph R package with the function igraph::evcent. We take the average EC of all nodes in a network to represent the network EC.
Evaluations on GRN
For scRNA-ATAC-seq data, we compared cell-type-specific GRNs inferred from DeepMAPS with (i) IRIS3 and SCENIC on the scRNA-seq matrix, (ii) IRIS3 and SCENIC on the GAS matrix, (iii) MAESTRO on the scATAC-seq matrix, and (iv) MAESTRO on the original scRNA-seq and scATAC-seq matrices. For each dataset comparison, we set the cell clusters used in the benchmarking tool to be the same as those generated in DeepMAPS to ensure fairness. GRNs generated from each tool were compared with three public functional databases, including Reactome^{21}, DoRothEA^{22}, and TRRUST v2^{23}. Only human sample datasets were used for comparison, as these databases are all human-related. We performed hypergeometric tests for the GRNs resulting from each tool against each database and compared the precision, recall, and F1 score of enriched GRNs and functional terminologies.
Cell cluster leave-out test
For a benchmark dataset with real cell type labels, we removed all cells of one cell type and ran DeepMAPS. We then traversed all cell types (one at a time) to evaluate the robustness of ARI. For data without benchmark labels, we instead removed cells according to the cell clusters predicted by DeepMAPS and by the other benchmark tools.
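To make the enrichment comparison concrete, the sketch below computes a one-sided hypergeometric p-value for the overlap between a regulon and a database gene set, together with set-based precision, recall, and F1. The exact definitions used in the benchmark are not spelled out in the text, so the set-based scores, the gene names, and the background size are assumptions made for illustration.

```python
from math import comb

def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) when drawing without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(observed, upper + 1))
    return tail / total

def precision_recall_f1(predicted, reference):
    """Set-based precision, recall, and F1 (assumed definitions)."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical regulon and database gene set within 20 background genes.
regulon = {"g1", "g2", "g3", "g4"}
pathway = {"g1", "g2", "g3", "g5", "g6"}
p_value = hypergeom_pvalue(population=20, successes=len(pathway),
                           draws=len(regulon),
                           observed=len(regulon & pathway))
precision, recall, f1 = precision_recall_f1(regulon, pathway)
print(round(p_value, 4), precision, recall, round(f1, 4))
```

With three of four regulon genes in the pathway, the tail probability is 155/4845, or about 0.032.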
DeepMAPS server construction
DeepMAPS is hosted on an HPE XL675d RHEL system with 2 × 128-core AMD EPYC 7H12 CPU, 64GB RAM, and 2 × NVIDIA A100 40GB GPU. The backend server is written in TypeScript using the NestJs framework. Auth0 is used as an independent module to provide user authentication and authorization services. Redis houses a queue of all pending analysis jobs. There are two types of jobs in DeepMAPS: stateful jobs are handled by the Plumber R package to provide real-time interactive analysis, whereas stateless jobs, such as CPU-bound bioinformatics pipelines and GPU training tasks that could take a very long time, are constructed using Nextflow. All running jobs are orchestrated using Nomad, which allows each job to be assigned proper cores and storage and keeps jobs scalable based on the server load. The job results are deposited into a MySQL database. The frontend is built with NUXT, with Vuetify as the UI library and Apache ECharts and Cytoscape.js for data visualization. The frontend and backend servers communicate through a REST API.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data used in this study are from the public domain. The raw data were downloaded from the GEO database with the following accession numbers: human pancreatic islets scRNA-seq data, GSE84133; and healthy bone marrow mononuclear cell CITE-seq data, GSE194122. The following datasets were obtained from figshare: the human pancreas scRNA-seq data [https://figshare.com/articles/dataset/Benchmarking_atlaslevel_data_integration_in_singlecell_genomics__integration_task_datasets_Immune_and_pancreas_/12420968/8], the mouse bladder scRNA-seq data from the Tabula Muris [https://doi.org/10.6084/m9.figshare.5968960.v1], and the human lung adenocarcinoma PBMC CITE-seq data [https://doi.org/10.6084/m9.figshare.c.5018987.v1]. The following paired scRNA-seq and scATAC-seq datasets were obtained from the 10X Genomics website: 3k healthy PBMC data [https://www.10xgenomics.com/resources/datasets/pbmcfromahealthydonornocellsorting3k1standard200], 10k healthy PBMC data [https://www.10xgenomics.com/resources/datasets/pbmcfromahealthydonorgranulocytesremovedthroughcellsorting10k1standard200], frozen human healthy brain data [https://www.10xgenomics.com/resources/datasets/frozenhumanhealthybraintissue3k1standard200], 10k human PBMC data [https://www.10xgenomics.com/resources/datasets/10khumanpbmcsmultiomev10chromiumx1standard200], healthy PBMC data [https://www.10xgenomics.com/resources/datasets/pbmcfromahealthydonorgranulocytesremovedthroughcellsorting3k1standard200], fresh embryonic data [https://www.10xgenomics.com/resources/datasets/freshembryonice18mousebrain5k1standard200], and lymph node data [https://www.10xgenomics.com/resources/datasets/freshfrozenlymphnodewithbcelllymphoma14ksortednuclei1standard200]. The scRNA-seq and scATAC-seq cancer cell line data were downloaded from the CNGB Nucleotide Sequence Archive under accession code CNP0000213. All datasets are publicly available without restrictions. Details of the datasets can be found in Supplementary Data 1.
Source data are provided with this paper.
Code availability
The Python source code of the DeepMAPS Docker is freely available at https://github.com/OSUBMBL/deepmaps, and the DeepMAPS webserver is freely available at https://bmblx.bmi.osumc.edu/. The source code is also available on Zenodo^{78}.
References
Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272 (2019).
Ma, A., McDermaid, A., Xu, J., Chang, Y. & Ma, Q. Integrative methods and practical challenges for single-cell multi-omics. Trends Biotechnol. 38, 1007–1022 (2020).
Argelaguet, R., Cuomo, A. S. E., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39, 1202–1215 (2021).
Teichmann, S. & Efremova, M. Method of the year 2019: single-cell multimodal omics. Nat. Methods 17, 1 (2020).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e3529 (2021).
Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21, 111 (2020).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).
Li, Y. et al. Elucidation of biological networks across complex diseases using single-cell omics. Trends Genet. 36, 951–966 (2020).
Ma, Q. & Xu, D. Deep learning shapes single-cell data analysis. Nat. Rev. Mol. Cell Biol. 23, 303–304 (2022).
Wang, J. et al. scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nat. Commun. 12, 1882 (2021).
Hu, Z., Dong, Y., Wang, K. & Sun, Y. In Proceedings of The Web Conference 2020 2704–2710 (Association for Computing Machinery, Taipei, Taiwan; 2020).
Wang, X. et al. In The World Wide Web Conference 2022–2032 (Association for Computing Machinery, San Francisco, CA, USA; 2019).
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
Ma, A. et al. IRIS3: integrated cell-type-specific regulon inference server from single-cell RNA-Seq. Nucleic Acids Res. 48, W275–W286 (2020).
Han, P., Gopalakrishnan, C., Yu, H. & Wang, E. Gene regulatory network rewiring in the immune cells associated with cancer. Genes (Basel) 8, 308 (2017).
Gassner, E. The Steiner Forest Problem revisited. J. Discret. Algorithms 8, 154–163 (2010).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Cao, Z.-J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).
Iacono, G., Massoni-Badosa, R. & Heyn, H. Single-cell transcriptomics unveils gene regulatory network plasticity. Genome Biol. 20, 110 (2019).
Joshi-Tope, G. et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33, D428–D432 (2005).
Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019).
Han, H. et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 46, D380–D386 (2018).
Wang, C. et al. Integrative analyses of single-cell transcriptome and regulome using MAESTRO. Genome Biol. 21, 198 (2020).
Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Ampudia, J. et al. CD6-ALCAM signaling regulates multiple effector/memory T cell functions. J. Immunol. 204, 150.113–150.113 (2020).
Skonier, J. E. et al. Mutational analysis of the CD6 ligand binding domain. Protein Eng. Des. Selection 10, 943–947 (1997).
Gimferrer, I. et al. Relevance of CD6-mediated interactions in T cell activation and proliferation. J. Immunol. (Baltim., Md.: 1950) 173, 2262–2270 (2004).
Johnston, R. J., Lee, P. S., Strop, P. & Smyth, M. J. Cancer immunotherapy and the nectin family. Annu. Rev. Cancer Biol. 5, 203–219 (2021).
Li, X.-Y. et al. CD155 loss enhances tumor suppression via combined host and tumor-intrinsic mechanisms. J. Clin. Invest. 128, 2613–2625 (2018).
Gururajan, M. et al. Early growth response genes regulate B cell development, proliferation, and immune response. J. Immunol. (Baltim., Md.: 1950) 181, 4590–4602 (2008).
Oh, Y.-K., Jang, E., Paik, D.-J. & Youn, J. Early growth response-1 plays a non-redundant role in the differentiation of B cells into plasma cells. Immune Netw. 15, 161–166 (2015).
Brescia, P. et al. MEF2B instructs germinal center development and acts as an oncogene in B cell lymphomagenesis. Cancer Cell 34, 453–465.e459 (2018).
Trøen, G. et al. Constitutive expression of the AP-1 transcription factors c-jun, junD, junB, and c-fos and the marginal zone B-cell transcription factor Notch2 in splenic marginal zone lymphoma. J. Mol. Diagn. 6, 297–307 (2004).
Sánchez-Beato, M. et al. Abnormal PcG protein expression in Hodgkin’s lymphoma. Relation with E2F6 and NFkappaB transcription factors. J. Pathol. 204, 528–537 (2004).
Saha, A., Robertson, E. S. & Goodrum, F. Mechanisms of B-cell oncogenesis induced by Epstein-Barr virus. J. Virol. 93, e00238–00219 (2019).
Yachida, S. et al. Genomic sequencing identifies ELF3 as a driver of ampullary carcinoma. Cancer Cell 29, 229–240 (2016).
Wang, H. et al. Overexpression of ELF3 facilitates cell growth and metastasis through PI3K/Akt and ERK signaling pathways in non-small cell lung cancer. Int. J. Biochem. Cell Biol. 94, 98–106 (2018).
Zhang, J. et al. KLF16 affects the MYC signature and tumor growth in prostate cancer. Onco Targets Ther. 13, 1303–1310 (2020).
Ma, P. et al. KLF16 promotes proliferation in gastric cancer cells via regulating p21 and CDK4. Am. J. Transl. Res. 9, 3027–3036 (2017).
Mathas, S. et al. Aberrantly expressed c-Jun and JunB are a hallmark of Hodgkin lymphoma cells, stimulate proliferation and synergize with NF-κB. EMBO J. 21, 4104–4113 (2002).
Eferl, R. & Wagner, E. F. AP-1: a double-edged sword in tumorigenesis. Nat. Rev. Cancer 3, 859–868 (2003).
Nagel, D., Vincendeau, M., Eitelhuber, A. C. & Krappmann, D. Mechanisms and consequences of constitutive NF-κB activation in B-cell lymphoid malignancies. Oncogene 33, 5655–5665 (2014).
Jost, P. J. & Ruland, J. R. Aberrant NF-κB signaling in lymphoma: mechanisms, consequences, and therapeutic implications. Blood 109, 2700–2707 (2006).
Garces de Los Fayos Alonso, I. et al. The role of activator protein-1 (AP-1) family members in CD30-positive lymphomas. Cancers (Basel) 10, 93 (2018).
Crispino, J. D. & Horwitz, M. S. GATA factor mutations in hematologic disease. Blood 129, 2103–2110 (2017).
Shimizu, R., Engel, J. D. & Yamamoto, M. GATA1-related leukaemias. Nat. Rev. Cancer 8, 279–287 (2008).
Mosquera Orgueira, A. et al. Detection of new drivers of frequent B-cell lymphoid neoplasms using an integrated analysis of whole genomes. PLoS ONE 16, e0248886 (2021).
Blyth, K. et al. Runx1 promotes B-cell survival and lymphoma development. Blood Cells Mol. Dis. 43, 12–19 (2009).
Mackay, F., Schneider, P., Rennert, P. & Browning, J. BAFF AND APRIL: a tutorial on B cell survival. Annu Rev. Immunol. 21, 231–264 (2003).
Smulski, C. R. & Eibel, H. BAFF and BAFF-receptor in B cell selection and survival. Front. Immunol. 9, 2285 (2018).
Yang, S., Li, J. Y. & Xu, W. Role of BAFF/BAFF-R axis in B-cell non-Hodgkin lymphoma. Crit. Rev. Oncol. Hematol. 91, 113–122 (2014).
He, B. et al. Lymphoma B cells evade apoptosis through the TNF family members BAFF/BLyS and APRIL. J. Immunol. (Baltim., Md.: 1950) 172, 3268–3279 (2004).
Xia, X. Z. et al. TACI is a TRAF-interacting receptor for TALL-1, a tumor necrosis factor family member involved in B cell regulation. J. Exp. Med 192, 137–143 (2000).
Laâbi, Y., Egle, A. & Strasser, A. TNF cytokine family: more BAFFling complexities. Curr. Biol. 11, R1013–R1016 (2001).
Mackay, F. & Schneider, P. TACI, an enigmatic BAFF/APRIL receptor, with new unappreciated biochemical and biological properties. Cytokine Growth Factor Rev. 19, 263–276 (2008).
Rihacek, M. et al. B-cell activating factor as a cancer biomarker and its implications in cancer-related Cachexia. Biomed. Res. Int. 2015, 792187–792187 (2015).
Su, H., Chang, J., Xu, M., Sun, R. & Wang, J. CDK6 overexpression resulted from microRNA‑320d downregulation promotes cell proliferation in diffuse large B‑cell lymphoma. Oncol. Rep. 42, 321–327 (2019).
Lee, C., Huang, X., Di Liberto, M., Martin, P. & Chen-Kiang, S. Targeting CDK4/6 in mantle cell lymphoma. Ann. Lymphoma 4, 1 (2020).
Otto, T. & Sicinski, P. Cell cycle proteins as promising targets in cancer therapy. Nat. Rev. Cancer 17, 93–115 (2017).
Li, K. et al. cellxgene VIP unleashes full power of interactive visualization, plotting and analysis of scRNA-seq data in the scale of millions of cells. bioRxiv, 2020.2008.2028.270652 (2020).
Pereira, W. et al. Asc-Seurat – Analytical single-cell Seurat-based web application. bioRxiv, 2021.2003.2019.436196 (2021).
Gardeux, V., David, F. P. A., Shajkofci, A., Schwalie, P. C. & Deplancke, B. ASAP: a web-based platform for the analysis and interactive visualization of single-cell RNA-seq data. Bioinformatics 33, 3123–3125 (2017).
Li, B. et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat. Methods 17, 793–798 (2020).
Hillje, R., Pelicci, P. G. & Luzi, L. Cerebro: interactive visualization of scRNA-seq data. Bioinformatics 36, 2311–2313 (2019).
Prompsy, P. et al. Interactive analysis of single-cell epigenomic landscapes with ChromSCape. Nat. Commun. 11, 5702 (2020).
Bolisetty, M. T., Stitzel, M. L. & Robson, P. CellView: Interactive exploration of high dimensional single cell RNA-seq data. bioRxiv, 123810 (2017).
Mohanraj, S. et al. CReSCENT: CanceR Single Cell ExpressioN Toolkit. Nucleic Acids Res. 48, W372–W379 (2020).
Patel, M. V. iSCellR: a user-friendly tool for analyzing and visualizing single-cell RNA sequencing data. Bioinformatics 34, 4305–4306 (2018).
Yousif, A., Drou, N., Rowe, J., Khalfan, M. & Gunsalus, K. C. NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization. BMC Bioinforma. 21, 267 (2020).
Zhu, Q. et al. PIVOT: platform for interactive analysis and visualization of transcriptomics data. BMC Bioinforma. 19, 6 (2018).
Innes, B. & Bader, G. scClustViz - Single-cell RNA-seq cluster assessment and visualization. F1000Res. 7, ISCB Comm J1522 (2018).
Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).
Wan, C. et al. LTMG: a novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data. Nucleic Acids Res. 47, e111 (2019).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Li, G. S., Li, M., Wang, J. X., Li, Y. H. & Pan, Y. United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins. IEEE/ACM Trans. Comput. Biol. Bioinform. 17, 1451–1458 (2020).
Parisutham, N. & Rethnasamy, N. Eigenvector centrality based algorithm for finding a maximal common connected vertex induced molecular substructure of two chemical graphs. J. Mol. Struct. 1244, 130980 (2021).
Ma, A. et al. Single-cell biological network inference using a heterogeneous graph transformer. Zenodo https://doi.org/10.5281/zenodo.7559037 (2023).
Acknowledgements
This work was supported by awards R01GM131399 (Q.M.), R35GM126985 (D.X.), and U54AG075931 (Q.M.) from the National Institutes of Health. The work was also supported by the award NSF1945971 (Q.M.) from the National Science Foundation. This work was supported by the Pelotonia Institute of Immuno-Oncology (PIIO). The content is solely the responsibility of the authors and does not necessarily represent the official views of the PIIO. In addition, we thank Dr. Fei He from the Northeast Normal University for his valued suggestions in framework construction and data testing.
Author information
Contributions
Q.M., B.L., and D.X. conceived the basic idea and designed the framework. X.W. wrote the backbone code of DeepMAPS. C.W. and H.C. built the backend and frontend servers. S.G. designed the interactive figures on the server. Y.Liu carried out RNA velocity calculations. Y.Li designed the SFP model for gene module predictions. A.M., X.W., and J.L. carried out benchmark experiments. X.W., Y.C., and B.L. performed robustness tests. A.M., J.L., X.W., G.X., Z.L., and T.X. carried out the case study. J.W., D.W., Y.J., J.L., and L.S. performed tool optimizations. A.M. and Q.M. led the figure design and manuscript writing. All authors participated in the interpretation and writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Saugato Rahman Dhruba and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ma, A., Wang, X., Li, J. et al. Single-cell biological network inference using a heterogeneous graph transformer. Nat Commun 14, 964 (2023). https://doi.org/10.1038/s41467023365590