Date: 2023-09-07

Cross-modality matching via iterative smoothed embedding

The input to MaxFuse consists of data from two modalities in the form of two pairs of matrices (Fig. 1a). For convenience, we call the two modalities Y and Z. First, we have a pair of cell-by-feature matrices that contain all measured features in each modality. In addition, we represent the initial knowledge about the linkage between the two modalities as another pair of cell-by-feature matrices whose columns have one-to-one correspondences. To distinguish between these two pairs of matrices, we call the former all-feature matrices and the latter linked-feature matrices. For example, when one modality is protein abundance over a small antibody panel and the other is RNA expression over the whole transcriptome, the two all-feature matrices have drastically different numbers of columns, one being the number of proteins in the panel and the other being the number of genes in the transcriptome; the linked-feature matrices, on the other hand, have an equal number of columns, where each column in the protein matrix is one protein and its corresponding column in the RNA linked-feature matrix is its coding gene. When the number of cells is large, we recommend aggregating cells with similar features into meta-cells, as described in the Methods, before applying MaxFuse. In that case, each row in the above matrices would represent a meta-cell. The procedure below does not depend on whether single cells or meta-cells are used, and thus we will refer to each row as a ‘cell’.

Fig. 1: Overview of MaxFuse pipeline. a, The input consists of two pairs of matrices. The first pair consists of all features from each modality, and the second pair consists of only the linked features. MaxFuse uses all features within each modality to create a nearest-neighbor graph (that is, all-feature NN-graph) for cells in that modality. Fuzzy smoothing induced by the all-feature NN-graph is applied to the linked features in each modality. Cross-modal cell matching based on the smoothed linked features initializes the iterations in b. b, In each iteration, MaxFuse starts with a list of matched cell pairs. A cross-modal cell pair is called a pivot. MaxFuse learns canonical correlation analysis (CCA) loadings over all features from both modalities based on these pivots. These CCA loadings allow the computation of CCA scores for each cell (including cells not in any pivot), which are used to obtain a joint embedding of all cells across both modalities. For each modality, the embedding coordinates then undergo fuzzy smoothing based on the modality-specific all-feature NN-graphs (obtained in a). Next, the smoothed embedding coordinates are supplied to a linear assignment algorithm that produces an updated list of matched pairs to start the next iteration. c, After iterations end, MaxFuse screens the final list of pivots to remove low-quality matches. The retained pairs are called refined pivots. Within each modality, any cell that is not part of a refined pivot is connected to its nearest neighbor that belongs to a refined pivot and is matched to the cell from the other modality in this pivot. This propagation step results in a full matching. MaxFuse further learns the final CCA loadings over all features from both modalities based on the refined pivots. The resulting CCA scores give the final joint embedding coordinates.

During stage 1 of the MaxFuse pipeline, cell–cell similarities are identified within each modality and initial cross-modal matching of cells is performed. This stage consists of three major steps (Fig. 1a). In step 1, for each modality, we use all features to compute a fuzzy nearest-neighbor graph connecting all cells measured in that modality. This graph, by utilizing the information in all features, provides the best possible summary of the cell–cell similarity for the given modality. In particular, cells that are close in this graph should have comparable values for their linked features. Thus, in step 2, MaxFuse boosts the signal-to-noise ratio in the linked features within each modality by shrinking their values, for each cell, towards the cell’s graph-neighborhood average. We call this step ‘fuzzy smoothing’. In step 3, MaxFuse computes distances between all cross-modal cell pairs based on the smoothed, linked features and applies linear assignment32 on the cross-modal pairwise distances to obtain an initial matching of cells. The initial matching serves as the starting point for stage 2.
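The three steps of stage 1 can be illustrated with a minimal sketch. This is not the MaxFuse implementation: it assumes an unweighted k-nearest-neighbor graph in place of the fuzzy graph, Euclidean distances, and a single shrinkage weight, and the names `knn_indices`, `fuzzy_smooth`, `initial_matching`, `k` and `weight` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def knn_indices(features, k):
    """Step 1 (simplified): indices of the k nearest neighbors of each cell,
    computed from all features and excluding the cell itself."""
    dist = cdist(features, features)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def fuzzy_smooth(values, neighbors, weight):
    """Step 2: shrink each row toward the average of its graph neighbors.
    weight=1 keeps the original values; weight=0 uses only the neighborhood mean."""
    neighborhood_mean = values[neighbors].mean(axis=1)
    return weight * values + (1.0 - weight) * neighborhood_mean


def initial_matching(all_y, all_z, linked_y, linked_z, k=3, weight=0.3):
    """Step 3: linear assignment on distances between smoothed linked features."""
    smooth_y = fuzzy_smooth(linked_y, knn_indices(all_y, k), weight)
    smooth_z = fuzzy_smooth(linked_z, knn_indices(all_z, k), weight)
    cost = cdist(smooth_y, smooth_z)  # cross-modal pairwise distances
    rows, cols = linear_sum_assignment(cost)  # handles unequal cell counts
    return rows, cols
```

In this sketch, cell `rows[i]` of modality Y is matched to cell `cols[i]` of modality Z; when the two modalities have different numbers of cells, only the smaller side is fully matched.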

Stage 2 of MaxFuse improves cross-modal cell matching quality by iterating the sequence of joint embedding, fuzzy smoothing and linear assignment steps (Fig. 1b). Starting with the initial matches obtained in stage 1, in each iteration, MaxFuse first learns a linear joint embedding of cells across modalities by computing a canonical correlation based on all features of the cross-modal matched cell pairs. Then, coordinates of this joint embedding are treated as new linked features of each modality and fuzzy smoothing is applied on them based on the all-feature nearest-neighbor graphs computed in stage 1. Finally, MaxFuse updates the cell-matching across modalities by applying linear assignment on the pairwise distances of these fuzzy-smoothed joint embedding coordinates. The resulting matching is used to start the next iteration. Matching quality improves with each iteration until available information in all features, and not just the linked features, has been used.
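One iteration of this loop can be sketched as follows. The sketch is illustrative only and makes several assumptions: CCA is computed by whitening and singular value decomposition with a small ridge term `reg`, the neighbor index arrays `nbr_y` and `nbr_z` are precomputed from the all-feature graphs of stage 1, and all function and parameter names are hypothetical rather than taken from the MaxFuse software.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def fit_cca(x, y, n_components, reg=1e-6):
    """Classical CCA on paired rows of x and y; returns means, loadings and
    canonical correlations. `reg` is a small ridge added for stability."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)

    def inv_sqrt(c):
        w, v = np.linalg.eigh(c)
        return v @ np.diag(w ** -0.5) @ v.T

    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    wx = kx @ u[:, :n_components]
    wy = ky @ vt.T[:, :n_components]
    return mx, my, wx, wy, s[:n_components]


def refine_matching(all_y, all_z, rows, cols, nbr_y, nbr_z,
                    n_iter=2, n_components=2, weight=0.3):
    """Iterate joint embedding, smoothing and linear assignment."""
    for _ in range(n_iter):
        # Learn CCA loadings from the currently matched pairs (pivots).
        mx, my, wx, wy, _ = fit_cca(all_y[rows], all_z[cols], n_components)
        # Score every cell, including cells outside the pivots.
        emb_y = (all_y - mx) @ wx
        emb_z = (all_z - my) @ wy
        # Smooth the embedding over the stage 1 neighbor graphs.
        emb_y = weight * emb_y + (1 - weight) * emb_y[nbr_y].mean(axis=1)
        emb_z = weight * emb_z + (1 - weight) * emb_z[nbr_z].mean(axis=1)
        # Update the matching on the smoothed joint embedding.
        rows, cols = linear_sum_assignment(cdist(emb_y, emb_z))
    return rows, cols
```

The number of iterations, embedding dimension and smoothing weight correspond to tuning parameters; the default values shown here are arbitrary placeholders.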

In stage 3, MaxFuse processes the last cross-modal cell matching from stage 2 and produces final outputs. First, MaxFuse screens the matched pairs from the last iteration, retaining high-quality matches as pivots. The pivots are used in two complementary ways: (1) they are used one last time to compute a final joint embedding of all cells in both modalities; (2) for any unmatched cell in either modality, its closest neighbor within the same modality that belongs to a pivot is identified and, as long as its distance to this neighbor is below a threshold, the match in the pivot is propagated to the cell. Thus, the final output of MaxFuse has two components: (1) a list of matched pairs across modalities, and (2) a joint embedding of all cells in both modalities. See the Methods for more MaxFuse algorithm details.
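The propagation step can be sketched as a nearest-pivot lookup within one modality. This is a simplified illustration under stated assumptions: distances are Euclidean in some within-modality embedding, the threshold `max_dist` is supplied by the user, and the function name `propagate_matches` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def propagate_matches(embedding, pivot_idx, pivot_partner, max_dist):
    """For each non-pivot cell, find its nearest pivot cell in the same
    modality and inherit that pivot's cross-modal partner, provided the
    distance to the pivot is below `max_dist`.

    embedding     : (n_cells, n_dims) coordinates within one modality
    pivot_idx     : indices of cells in this modality that belong to a pivot
    pivot_partner : index of the matched cell in the other modality, per pivot
    """
    pivot_idx = np.asarray(pivot_idx)
    pivot_partner = np.asarray(pivot_partner)
    is_pivot = np.zeros(len(embedding), dtype=bool)
    is_pivot[pivot_idx] = True
    others = np.flatnonzero(~is_pivot)
    dist = cdist(embedding[others], embedding[pivot_idx])
    nearest = dist.argmin(axis=1)
    close_enough = dist[np.arange(len(others)), nearest] < max_dist
    return {int(cell): int(pivot_partner[j])
            for cell, j, ok in zip(others, nearest, close_enough) if ok}
```

Cells whose nearest pivot lies beyond the threshold remain unmatched in this sketch, which mirrors the filtering on the full matching described above.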

Integration of transcriptome and targeted protein data

We benchmarked MaxFuse on a cellular indexing of transcriptomes and epitopes sequencing (CITE-seq) dataset33 that included measurements of 228 protein markers and the whole transcriptome on peripheral blood mononuclear cells (PBMCs). For comparison, we also applied four state-of-the-art integration methods, Seurat (V3) (ref. 24), Liger22, Harmony20 and BindSC34, to this same dataset. Protein names were converted to RNA names manually to link the features between datasets. In each repetition of our experiment, we randomly subsampled 10,000 cells, applied all methods and assessed them using the benchmarking criteria described below. We performed five such repetitions and averaged the criteria across repetitions. For all integration methods, we masked the known cell–cell matching between the protein and RNA modalities, and then used the known matching for assessment.

Methods were assessed using six different criteria that measure both cell-type-level label transfer accuracy as well as cell-level matching accuracy. Two criteria were used to judge cell-type-level label transfer accuracy. Cells were annotated at two levels of granularity (from ref. 33): level 1, which differentiates between eight major cell types; and level 2, a finer classification which differentiates between 31 cell types. The proportions of matched pairs that shared the same label at both annotation levels were reported, with higher proportions indicating higher matching quality. Two criteria assessed the quality of cross-modal joint embedding of cells. A high-quality joint embedding should preserve biological signal, as reflected by the separation of known cell types, while mixing the two modalities as uniformly as possible. Usually, there is a trade-off between these two goals. To aggregate quality assessments of biological signal preservation and modality mixing, we calculated F1 scores based on average silhouette width (slt_f1) and on adjusted Rand index (ari_f1), as proposed in ref. 35. For both criteria, higher F1 indicates a better embedding. The fifth criterion, Fraction Of Samples Closer Than True Match (FOSCTTM)19,36,37, was used to quantify the quality of joint embedding at single-cell resolution. For each cell, we computed the fraction of cells in the other modality that are closer than its true match in the joint embedding space. FOSCTTM is the average of this fraction over all cells in both modalities. The lower the value of this score, the closer the true matches are in the joint embedding, and, hence, the better the joint embedding. The last criterion is Fraction Of Samples whose true matches are among their K-Nearest Neighbors (FOSKNN) in the joint embedding space. For any given k ≥ 1, the higher this proportion, the better the joint embedding. For precise definitions of these criteria, see the Methods.
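The two single-cell-resolution criteria can be illustrated with a short sketch. It assumes that row i of each embedding corresponds to the same cell, that distances are Euclidean, that fractions are normalized by the total number of cells in the other modality, and that both directions are averaged; the precise definitions are those in the Methods, and the function names here are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def _rank_of_true_match(emb_y, emb_z):
    """Number of cross-modal cells strictly closer than the true match,
    for each Y cell (first array) and each Z cell (second array)."""
    dist = cdist(emb_y, emb_z)
    true_dist = np.diag(dist)  # row i of emb_y pairs with row i of emb_z
    closer_for_y = (dist < true_dist[:, None]).sum(axis=1)
    closer_for_z = (dist < true_dist[None, :]).sum(axis=0)
    return closer_for_y, closer_for_z


def foscttm(emb_y, emb_z):
    """Average fraction of samples closer than the true match (lower is better)."""
    closer_y, closer_z = _rank_of_true_match(emb_y, emb_z)
    n = len(emb_y)
    return float((closer_y.mean() + closer_z.mean()) / (2 * n))


def fosknn(emb_y, emb_z, k):
    """Fraction of cells whose true match is among their k nearest
    cross-modal neighbors (higher is better)."""
    closer_y, closer_z = _rank_of_true_match(emb_y, emb_z)
    return float(((closer_y < k).mean() + (closer_z < k).mean()) / 2)
```

For a perfectly aligned embedding, `foscttm` is 0 and `fosknn` is 1 for every k ≥ 1.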

Based on all these criteria, MaxFuse was superior by a sizable margin (Fig. 2a). Importantly, MaxFuse resulted in accurate cell matching across weakly linked modalities (for example, level 1 accuracy 93.9%, better by over 7% in absolute scale than the second best method (Extended Data Fig. 1)). The Uniform Manifold Approximation and Projection (UMAP) plots calculated based on the postintegration embedding from respective methods (Fig. 2b and Extended Data Fig. 1), colored by modality and by level 2 cell-type annotation, showed that MaxFuse achieved both better mixing of the two modalities (left panel) and better preservation of biological signals (right panel). For example, a clearly resolved trajectory of B cell subtypes (B naive, intermediate and memory cells) was apparent after MaxFuse integration but not after integration by other methods.

Fig. 2: Benchmarking of MaxFuse and other integration methods on ground-truth CITE-seq PBMC data. a, Matching and integration performance of MaxFuse and other methods on CITE-seq PBMC dataset with the full antibody panel (228 antibodies). The barplot and the line plot show mean value with the error bar or shadow area covering 95% CI on both sides, from n = 5 randomly subsampled cell batches. b, UMAP visualization of MaxFuse and Seurat (V3) integration results of CITE-seq PBMC dataset with the full panel, colored by modality (left) or cell type (right). c, Matching and integration performance of MaxFuse and other methods on CITE-seq PBMC dataset with reduced antibody panels (full 228 antibodies or the most informative 100, 50 or 30 antibodies). For each method, the line indicates mean value with the shadow area covering 95% CI on both sides, from n = 5 randomly subsampled cell batches. d, UMAP visualization of MaxFuse and Seurat (V3) integration results of CITE-seq PBMC dataset with the 30 most informative of the original 228 antibodies, colored by modality (left) or cell type (right). 95% CI, 95% confidence interval; cDC, classical dendritic cells; CTL, cytotoxic T lymphocytes; gDT, gamma delta T cells; KNN, k-nearest neighbors; MAIT, mucosal-associated invariant T cells; NK, natural killer cells; pDC, plasmacytoid dendritic cells; TM, T memory cells; Treg, T regulatory cells.

It is common to have an antibody panel that is of substantially smaller size than 228, especially for spatial proteomic datasets. To benchmark the performance of MaxFuse against existing methods with smaller antibody panels, we ordered the proteins according to their importance for differentiating cell types (see the Methods for details). We repeated the matching and integration process using only the top 100, 50 and 30 most important proteins. With each panel size, we ran the experiment over five independent repetitions with 10,000 randomly subsampled cells, and averaged the cell-type annotation matching accuracy (level 1 and level 2), FOSCTTM and FOSKNN scores across repetitions (Fig. 2c). Regardless of panel size, MaxFuse consistently outperformed other methods. Additionally, MaxFuse successfully mitigated the effect of reduced panel size on integration quality: even when the antibody panel size was reduced to 30, MaxFuse had approximately 90% accuracy for level 1 annotation, whereas accuracy of the other methods ranged from around 15% to 75% (Extended Data Fig. 2). With a reduced panel of 30 antibodies, the integrated UMAP embedding38 produced by other methods blurred the distinction between cell types, whereas MaxFuse embedding still accurately captured the subtle structure of highly granular cell subtypes, such as the B cell subpopulations (Fig. 2d and Extended Data Fig. 2).

In addition, we evaluated the impact of tuning parameter choice on MaxFuse integration results using ground-truth CITE-seq PBMC data. The investigated tuning parameters include matrix singular value decomposition components used for different modalities, smoothing weights used during initialization and refinement, number of refinement iterations, dimension for final canonical correlation analysis (CCA) embedding, filtering percentages on pivot and on full matching, meta-cell size and nearest-neighbor graph neighborhood size. Benchmarking on both the full panel of 228 antibodies and a reduced panel of the 50 most informative antibodies revealed that MaxFuse performance was robust with respect to the investigated tuning parameters (Extended Data Figs. 3 and 4 and Supplementary Figs. 1 and 2). Furthermore, we assessed the performance of MaxFuse when certain cell subpopulations were absent from one modality. Benchmark tests considering three different missing cell subpopulations in protein modality showed that MaxFuse was robust with respect to mismatch of cell populations between the two modalities (Supplementary Table 5).

Benchmarking on multiple ground-truth multiome modalities

We further benchmarked MaxFuse on four additional single-cell multiome datasets. The first was a CITE-seq dataset of human bone marrow mononuclear cells that provides cell-matched measurements of the full transcriptome along with an antibody panel of size 25 (ref. 33). The second was an Ab-seq dataset, also of bone marrow mononuclear cells, with an antibody panel of size 97 and the whole transcriptome39. The third was an ATAC with select antigen profiling sequencing (ASAP-seq) PBMC dataset40 with 227 antibodies and the whole epigenome measured in ATAC fragments. The fourth was a transcription, epitopes, and accessibility sequencing (TEA-seq) PBMC dataset41 where we focused on the simultaneous measurements of 46 antibodies and the whole epigenome measured in ATAC fragments. Together, these datasets represent a diverse collection of measurement technologies over different modality pairs. We benchmarked the performance of MaxFuse against Seurat (V3), Liger, Harmony and BindSC on these datasets. For datasets with simultaneous RNA and protein features, we linked each protein to its coding gene. For datasets with simultaneous ATAC and protein measurements, we linked each protein to the gene activity score42 computed from the ATAC fragments mapping near its coding gene. The known cell–cell correspondences across modalities were masked in the integration stage for all methods, but used afterwards for evaluation.

We compared the performances of MaxFuse and the other four methods on these datasets based on cell-type annotation matching accuracy, FOSCTTM, FOSKNN (k set as 1/200 dataset size), Silhouette F1 score and Adjusted Rand Index (ARI) F1 score. Overall, MaxFuse outperformed other methods, often by a sizable margin (Fig. 3a and Supplementary Figs. 3–6). UMAPs of MaxFuse cross-modal joint embeddings for each dataset are shown in Fig. 3b. Across the integration scenarios, MaxFuse mixed different modalities well in joint embeddings while retaining separation between cell types. Compared with UMAPs of joint embeddings produced by other methods, MaxFuse consistently achieved substantial improvements (Fig. 3b and Supplementary Figs. 3–6).

Fig. 3: Benchmarking of MaxFuse versus other integration methods across multiple ground-truth data types. a, Four different multiome datasets, generated by different technologies, were benchmarked. Cell-type matching accuracy, FOSCTTM, FOSKNN (with k = 0.5% total cell counts of each dataset), and ARI and Silhouette F1 were evaluated across all five methods. b, UMAP visualization of MaxFuse integration results for the four ground-truth multiome datasets, colored by modality (top panel) and cell type (bottom panel). BM, bone marrow; DC, dendritic cells; EMP, erythro-myeloid progenitors; mem, memory; prog, progenitor; trans, transitional.

We also considered integration of scRNA-seq and scATAC-seq data. This is a representative example of integrating strongly linked modalities for which multiple methods have demonstrated feasibility18,19,22. It has been shown in ref. 43 that, in terms of cell population structure, the information shared across RNA and ATAC is much higher than the information shared between RNA and protein for commonly used targeted protein panels. Thus, RNA and ATAC data have stronger linkage and should be easier to integrate. We benchmarked MaxFuse against state-of-the-art methods (Maestro44, scJoint45 and scGLUE19) that are specific for RNA–ATAC integration on four public multiome datasets that simultaneously measured the chromatin accessibility and transcriptome expression for each cell: cells from human PBMCs46, cells from embryonic mouse brain at day 18 postconception46, cells from developing human cerebral cortex47 and cells from human retina48 (Extended Data Fig. 5a). The integration quality criteria described in the previous subsection were used to assess all methods. MaxFuse achieved best or close-to-best integration performance among the tested methods, and was comparable to scGLUE (Extended Data Fig. 5c–f). However, MaxFuse is computationally much faster than scGLUE. For example, for the integration of a dataset of 20,000 cells, MaxFuse completed within 5 min on a MacBook Pro laptop with M1 Max CPU, while scGLUE took hours to complete the job on the same platform. Even with CUDA GPU acceleration, scGLUE still used around 30 min to finish on a computing platform with dual Intel i9-10980XE CPUs and dual NVIDIA Quadro RTX 8000 GPUs (Extended Data Fig. 5b).

MaxFuse enables information-rich spatial pattern discovery

MaxFuse is motivated by scenarios where the signal-to-noise ratio in the cross-modal linked features is low. Weak linkages are especially common in spatial-omic data types due to technical limitations. For example, high-resolution spatial proteomic methods such as CODEX, MIBI-TOF, IMC and CosMx SMI can profile, at subcellular resolution, a panel of 30–100 proteins10,11,12,13. Integration of such spatial proteomics datasets with single-cell transcriptomic and epigenomic datasets of the same tissue is often of interest, but is particularly challenging due to the small number of markers in the spatial dataset and the weak linkage between modalities which is caused by both biological and technical differences. To test MaxFuse on this type of cross-modal integration, we evaluated its performance on integrating a CODEX multiplex imaging dataset obtained using 46 markers49 with scRNA-seq data50 of human tonsils from two separate studies (Fig. 4a). MaxFuse produced an embedding that integrated the two modalities while preserving the cell population structure (Fig. 4b).

Fig. 4: MaxFuse enables information-rich spatial pattern discovery. a, Schematic of integration of CODEX data from Kennedy-Darling et al.49 (upper panel), with scRNA-seq data from King et al.50 (lower panel) obtained from human. b, UMAP visualization of MaxFuse integration of tonsil CODEX and scRNA-seq data, colored by modality (upper panel) and cell type (lower panel). c, Metrics (cell-type matching accuracy, Silhouette F1 score and ARI F1 score) evaluating performance for MaxFuse and other methods. Five batches of CODEX and scRNA-seq cells (10,000 scRNA-seq cells and 30,000 CODEX cells in each batch) were randomly sampled and used for benchmarking for all methods. The barplot of cell-type matching accuracy shows mean value with 95% CI for each method, with raw values from five random samples plotted as dots. d, Illustration of cell layers extending inwards/outwards from the GC boundary. Each layer consisted of 30 pixels (~11 μm). A total of ten layers extending in each direction were examined. e, Average messenger RNA counts (linked by MaxFuse) across cells in each layer plotted versus the position of the layer in reference to the GC boundary (inward on the left of boundary, outward on the right). Expected expression profiles relative to the GC boundary are shown to the right of each group of three transcripts. Each line indicates mean value with the shadow area covering 95% CI for the mean at each position. Except for CD3 and CD4, none of the other seven reported transcripts had its corresponding protein measured in the CODEX panel. f, Benchmarking of MaxFuse and other methods for cell-type annotation on human tonsil CODEX data49,50. Automated annotations were compared with human-expert annotations of human tonsil CODEX data. Left, MaxFuse cell-type annotation of CODEX cells by label transfer of matched human tonsil scRNA-seq cells. 
Middle, CELESTA57 cell-type annotation by using CODEX protein expression levels and previous knowledge on marker expression and cell population information. Right, Astir58 cell-type annotation by using CODEX protein expression levels and previous knowledge on marker expression and cell population information. Acc, accuracy; DC, dendritic cells.

Based on the benchmarking metrics described above, MaxFuse was the only method tested that successfully integrated spatial proteomic and scRNA-seq data. Seurat (V3), Liger, BindSC and Harmony failed to produce an embedding that integrates the two modalities while preserving the cell population structure (Fig. 4b and Extended Data Fig. 6). Evaluation results based on cell-type matching accuracy are consistent with evaluation results based on the joint embedding. At the level of the six major cell types present in the tissue, MaxFuse achieved high label transfer accuracy (93.3%), while the other methods failed to preserve cell-type distinctions (40–60%; Fig. 4c and Extended Data Fig. 6).

To assess whether MaxFuse preserves subtle spatial variations within a cell type that are captured by CODEX, we manually delineated the boundaries of each individual germinal center (GC) from the CODEX tonsil images based on CD19, CD21 and Ki67 protein expression patterns. We then extended outward or inward from these boundaries, with each step covering roughly one layer of cells (one step = 30 pixels erosion/dilation) (Fig. 4d). For each layer of cells, we calculated the average counts of specific genes, based on the scRNA-seq cells matched to CODEX cells in that layer. We then asked if known position-specific gene expression patterns relative to the GC boundary are recovered in the integrated scRNA-seq data. Indeed, MaxFuse was able to reconstruct the spatial pattern of the GC from dissociated transcriptomic data (Fig. 4d,e): for GC-specific transcripts BCL6, AICDA and FOXP1 (refs. 51,52,53) which relate to GC functionality, we observed high expression within the boundary and a sharp drop in expression after passing the boundary layer; for transcripts related to B cell memory, CCR6, BANK1 and FCER2 (refs. 53,54,55), which should be enriched in B cells exiting from the GC, we indeed saw a gradual increase outside of the GC and then a quick decrease as the layer fully expanded into the T cell region; and finally for T cell-related transcripts, for example CD4, GATA3 and CD3 (ref. 56), we indeed saw a rapid increase outside of the GC boundary but no expression within. In comparison, the integration produced by other methods did not accurately reconstruct the GC spatial pattern (Supplementary Fig. 7). Except for CD3 and CD4, none of the other seven transcripts had its corresponding protein measured in the CODEX panel. We also followed with experimental validation via RNAscope, where we observed consistent spatial patterns of AICDA and CCR6 in human tonsil, as predicted by MaxFuse integration (Extended Data Fig. 7).

Furthermore, MaxFuse can be utilized for automated cell-type annotation of CODEX cells, given that the scRNA-seq data to be matched are annotated. We evaluated the automated annotations on all CODEX cells produced by MaxFuse, comparing them with those generated by two cutting-edge CODEX cell-type annotation methods, CELESTA57 and Astir58. This comparison was benchmarked against annotations made by human experts. MaxFuse achieved an annotation accuracy of nearly 90%, substantially improving upon these two methods for direct annotation of CODEX data, which had accuracy within the 70–75% range (Fig. 4f).

Tri-modal atlas-level integration with MaxFuse

In the consortium-level effort to generate a comprehensive atlas across different regions of the human intestine, colon and small bowel tissues from healthy human donors were collected and systematically profiled by CODEX, snRNA-seq and snATAC-seq31. We applied MaxFuse to the integration of these three datasets obtained from analyses of colon (Fig. 5a), with the goal of constructing high-resolution spatial maps of full transcriptome RNA expression and transcription factor binding accessibility. We first conducted pairwise alignment of cells between protein (CODEX) and RNA (snRNA-seq), and cells between RNA (snRNA-seq) and ATAC (snATAC-seq), as previously described. The two sets of bimodal cell-pairing pivots were then ‘chained’ together, with the pivot cells in the RNA modality serving as the intermediary. This ‘chaining’ created a set of pivots linking all three modalities: protein, RNA and ATAC. Subsequently, we used these pivots to calculate a tri-omic embedding via generalized CCA (gCCA)21,59. This allowed calculation of a joint embedding of the three modalities (Fig. 5b). The MaxFuse integration preserved distinctions between major cell types, and modalities were mixed within each cell type. See Supplementary Fig. 8 for a comparison between using RNA and using ATAC as the baseline (intermediary) modality. Additionally, the design of batching in MaxFuse allowed the integration of atlas-level datasets with limited time and space resources (Extended Data Fig. 8).
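The ‘chaining’ of two bimodal pivot sets can be sketched as a join on the shared RNA cell. This is only an illustration: it assumes pivots are stored as index pairs and that each RNA cell appears in at most one RNA–ATAC pivot, and the function name `chain_pivots` is hypothetical.

```python
def chain_pivots(protein_rna, rna_atac):
    """Join two lists of bimodal pivots on the RNA cell.

    protein_rna : list of (protein_cell, rna_cell) pivots
    rna_atac    : list of (rna_cell, atac_cell) pivots
    Returns a list of (protein_cell, rna_cell, atac_cell) triples for RNA
    cells that occur in both pivot sets.
    """
    atac_by_rna = {}
    for rna_cell, atac_cell in rna_atac:
        # Keep the first ATAC partner if an RNA cell appears more than once.
        atac_by_rna.setdefault(rna_cell, atac_cell)
    return [(protein_cell, rna_cell, atac_by_rna[rna_cell])
            for protein_cell, rna_cell in protein_rna
            if rna_cell in atac_by_rna]
```

For example, `chain_pivots([(0, 5), (1, 6), (2, 7)], [(5, 100), (7, 101), (8, 102)])` returns `[(0, 5, 100), (2, 7, 101)]`; RNA cells 6 and 8 are dropped because they take part in only one of the two pivot sets.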

Fig. 5: MaxFuse enables tri-modal integration with HUBMAP data. a, Overview of CODEX, snRNA-seq and snATAC-seq single-cell human intestine data from the HUBMAP consortium (left). Representative cell-type locations based on CODEX data (right). Colon and small bowel data were integrated separately by MaxFuse, and this figure shows part of the colon data (CODEX data from one donor; snRNA-seq and snATAC-seq data from four donors). b, UMAP visualization of the tri-modal integration embedding produced by MaxFuse, colored by modality: protein, RNA and ATAC (left panel) and colored by cell type (right panel). c, Upper row, UMAP visualization of CODEX cells based on the integration embedding, overlaid with CD163 protein expression (from CODEX cells themselves, left panel), CD163 mRNA expression (from matched snRNA-seq cells, middle panel) and CD163 gene activity score (from matched snATAC-seq cells, right panel). Lower row, spatial locations of CODEX cells based on x–y positions of centroids, overlaid with the same expression features as in the corresponding panels of the upper row. d, Spatial locations of CODEX cells based on x–y positions of centroids, overlaid with the transcription factor motif enrichment scores (Z-scores, calculated by chromVAR60), based on their matched snATAC-seq cells. TF, transcription factor.

Effectively, the MaxFuse integration produced a joint profile of protein abundance, RNA expression and chromatin accessibility at single-cell spatial resolution on the same tissue section. To confirm the validity of this tri-modal integration, we inspected whether CODEX’s protein abundance aligned spatially with the expression and chromatin activity of the protein-coding gene, the spatial measurements of the latter two modalities imputed based on the MaxFuse integration. In one example, the protein expression, RNA expression and gene activity of CD163 were, as expected for this macrophage marker, uniquely enriched in the macrophage cell cluster (Fig. 5c, top row). Furthermore, protein, RNA and ATAC activities of this gene all localized to the same spatial positions on the tissue section (Fig. 5c, bottom row). See Extended Data Fig. 9 for additional examples.

With the integration of the snATAC-seq and CODEX data, we were able to map the spatial enrichment of transcription factor binding site accessibility. For each transcription factor, we first computed a motif enrichment score for each cell in the snATAC-seq data using chromVAR60, and then the scores were transferred to the CODEX spatial positions based on the MaxFuse integration. Figure 5d shows such spatial profiles for three transcription factors. Binding motifs of IRF4, a key regulator in immune cell differentiation61, had increased accessibility in the immune-enriched compartments of the mucosa and submucosa layers31. Binding motifs of KLF4, known to be required for the terminal differentiation of goblet cells62, had heightened accessibility in the colonic crypts of the mucosa layer where goblet cells mature. Finally, binding motifs of SRF, a master regulator of smooth muscle gene expression63, had heightened accessibility in neighborhoods that are enriched for smooth muscle cells. In addition, we performed the same analysis on the HUBMAP data collected on small bowel and MaxFuse showed consistent results (Extended Data Fig. 10).

Additional benchmarking of MaxFuse

We further compared the integration quality within MaxFuse results, across different smoothing schemes (Supplementary Fig. 9), and between pivot and nonpivot cells (Supplementary Fig. 10 and Supplementary Table 1). We validated the improved gene imputation accuracy by MaxFuse-enabled matching in a ground-truth multiome dataset, using targeted proteomic features to predict transcript expression at single-cell level (Supplementary Fig. 11). One important potential application of MaxFuse is imputing unmeasured features (for example, transcripts) in spatial proteomic datasets. We benchmarked the effect on integration quality of sequentially reduced antibody panel sizes (Supplementary Fig. 12) and the area-level gene imputation correlation by artificially dropping protein features in CODEX data (Supplementary Fig. 13).