Single-cell spatial multi-omics and deep learning dissect enhancer-driven gene regulatory networks in liver zonation

In the mammalian liver, hepatocytes exhibit diverse metabolic and functional profiles based on their location within the liver lobule. However, it is unclear whether this spatial variation, called zonation, is governed by a well-defined gene regulatory code. Here, using a combination of single-cell multiomics, spatial omics, massively parallel reporter assays and deep learning, we mapped enhancer-gene regulatory networks across mouse liver cell types. We found that zonation affects gene expression and chromatin accessibility in hepatocytes, among other cell types. These states are driven by the repressors TCF7L1 and TBX3, alongside other core hepatocyte transcription factors, such as HNF4A, CEBPA, FOXA1 and ONECUT1. To examine the architecture of the enhancers driving these cell states, we trained a hierarchical deep learning model called DeepLiver. Our study provides a multimodal understanding of the regulatory code underlying hepatocyte identity and their zonation state that can be used to engineer enhancers with specific activity levels and zonation patterns.


Batch effects in hepatocytes
We identified an interesting batch effect in hepatocytes, related to differences in physiological state between the mice (Supplementary Fig. 1), including circadian rhythm, nutritional status, and hormone levels.For instance, Gene Ontology analysis with GREAT links two complementary topics (Topic 17 and Topic 75) to positive and negative regulation of the circadian rhythm (adjusted p-value = 10 -19 and 10 -7 , respectively); while four topics show enrichment for different GO terms related to hormone response and metabolism (Supplementary Fig. 2).In addition, motif enrichment analysis revealed the presence of binding sites of the circadian rhythm transcription factor CLOCK in topic 75, with a Normalized Enrichment Score (NES) of 4.25.Importantly, topic 17 is specifically enriched in hepatocytes from starved mice, while topic 75 is enriched in hepatocytes from mice fed ad libidum.These analyses suggest that regions specifically accessible in the two groups of samples are controlling the circadian rhythm genes.Using publicly available scRNA-seq data on the mouse liver during different phases of the circadian rythm 1 , we confirmed that the animals that were fed ad libitum mapped to early zeitgeber timepoints, while samples for which food was removed the night before mapped to late zeitgeber timepoints (Supplementary Fig. 3, Supplementary Fig. 4, see Methods).
As starved samples were profiled by single cell multiomics and unstarved samples were profiled by snRNA-seq, we performed two additional single cell multiomics runs in a mouse fed ad libidum.Analysis with the new data again revealed two topics specific to starved (topic 72) and unstarved (topic 68) mice, with enrichment for positive and negative regulation of the circadian rhythm (adjusted p-value = 10 -4 and 10 -9 , respectively) based on GO analysis with GREAT; and motif enrichment analysis revealed the presence of CLOCK binding sites in the unstarved mice topic (topic 68).
The new multiomics samples fed ad libidum also mapped to early zeitgeber points (Supplementary Fig. 5).Altogether, the circadian rhythm regulatory topics and the differential expression of circadian rhythm genes observed are driven by nutritional status, rather than experimental set-up, which suggests that the observed effect is of biological nature.
Next, we examined signaling pathways that have been reported to be affected by liver zonation, including RAS and WNT signaling, hypoxia and pituitary hormone responses [2][3][4][5][6] .Our analyses revealed that all these pathways change along the portocentral axis (adjusted p-value < 0.05).Particularly, hypoxia and WNT target genes are upregulated pericentrally and repressed periportally; RAS target genes are upregulated periportally and repressed pericentrally, and pituitary hormone targets are repressed pericentrally.However, hypoxia, pituitary hormones and RAS signaling target gene expression is highly variable between individuals and circadian rhythm phases (adjusted p-value < 0.05); while WNT signaling targets are consistently expressed across all samples (adjusted p-value = 0.17, Supplementary Fig. 3, Supplementary Fig. 4).

Impact of sex dimorphism
To assess whether the core hepatocyte eGRN was affected by sex dimorphism, we analyzed additional snRNA-seq data from wild-type male and female livers 7 .After quality control (see Methods), 4,860 liver cells from 3 different male mice and 5,342 cells from 3 different female mice were retained (Supplementary Fig. 6a-b).
Importantly, neither the expression of the core hepatocyte TFs nor their targets (measured as the AUC enrichment of the SCENIC+ eRegulons) were affected by gender (Supplementary Fig. 6c-d); in other words, the expression of Tbx3, Tcf7l1, Hnf4a, Hnf1a, Cebpa, Foxa1, Onecut1 and their targets is comparable between male and female mice.
We performed differential gene expression based on gender for all the cell types in the liver and identified 56 and 65 genes upregulated in male and female hepatocytes, respectively (LogFC > 0.75, adjusted p-value < 0.05, Supplementary Fig. 6e).Only 1 TF was upregulated in male hepatocytes, Bcl6, while 4 TFs were upregulated in female hepatocytes, namely Rfx4, Cux2, Esr1 and Esrrg.Except for Rfx4, these TFs have been previously reported to regulate sex differences in the mouse liver [8][9][10] .Gene Ontology analysis of these differentially expressed genes points to metabolic differences between male and female hepatocytes, with genes related to lipid metabolism upregulated in females (adjusted p-value: 10 -7 , Supplementary Fig. 6f-g).Importantly, none of these genes have been found as TCF7L1 or TBX3 targets.
Altogether, this data suggests that, while there are differentially regulated genes in male and females, the core hepatocyte eGRN is not affected by sexual dimorphism.
Nevertheless, additional single-cell multiomics data in female livers and/or disease models may be needed in the future to study sex-biased gene regulation in wild-type mice and upon disease at the enhancer-GRNs resolution.

Comparison of in vitro models to test hepatocyte specific enhancer regions
In vitro models can be used to perform MPRA experiments and more sensitive enhancer assays that are not feasible in vivo (i.e.luciferase assays).We compared three different hepatocyte cell lines, namely HepG2 (human hepatocellular carcinoma), Hepa1-6 (murine hepatoma), and AML12 (hepatocytes from a mouse transgenic for human TGF).To assess their relevance as model system, we re-used RNA-seq and performed ATAC-seq on the two mouse hepatocyte cell lines (see Methods).At the transcriptome level, HepG2 expresses several hepatocyte master regulators from hepatocytes, such as Hnf4a, Cebpa, Hnf1a, Tcf7l2, Foxa1, and Onecut1, among others.On the other hand, AML12 shows reduced expression of Cebpa, Foxa1, Hnf1a, and Hnf4a compared to HepG2; and Hepa1-6, of Cebpa and Foxa1 (Supplementary Fig. 7a).We also compared the accessibility of the 12,000 enhancers library on the different cell lines.This library included shared regions (accessible in hepatocytes and other cell types, of which 56% are promoters) and hepatocyte-specific regions (generally accessible and zonated).We observed that shared regions were largely accessible across the 3 cell lines, while HepG2 showed more accessibility in hepatocyte specific regions compared to the other cell lines (Supplementary Fig. 8a).To further assess differences at the enhancer activity level, we performed an MPRA experiment in AML12.As expected from the chromatin accessibility profiles, only positive controls and shared regions were significantly active compared to the negative control (Supplementary Fig. 7b-d).Altogether, HepG2 is more similar to in vivo hepatocytes in terms of TF repertoire and functionality of the enhancer library than AML12 and Hepa1-6.
Due to a mutation in beta-catenin, WNT signaling is active in HepG2 11 .It has also been reported in literature that HepG2 exhibits a more pericentral identity 12 .HepG2 expresses Tbx3 and other pericentral markers (including Tbx3), while periportal genes are not expressed or very lowly expressed, as observed in pericentral mouse hepatocytes (Supplementary Fig. 7a).Nevertheless, despite WNT activation, HepG2 does not fully recapitulate the pericentral hepatocyte identity.For instance, out of the 3,939 pericentrally (and pericentral-intermediate) zonated regions found in the mouse liver and conserved in the human genome, only 1,258 are accessible in HepG2 (Supplementary Fig. 8b).

oo--
/scope.aertslab.org/#/Bravo_et_al_Liver.SCope is a fast, user-friendly visualization tool for large-scale single cell data sets18 .The following files are available at SCope: -ATAC o Mouse ▪ Liver-cell_gene-all-2: Loom file containing data processed with pycisTopic of the 2 snATAC-seq and 2 multiome (snATAC-seq layer) samples.The data set contains 21,393 cells and gene activities as values.In the metadata, 'sample_id' indicates the sample of origin, and 'Refined_cell_type' indicates the cell type annotation.Under coordinates, the uncorrected and batch corrected UMAPs are labelled as 'probability_UMAP' and 'harmony_probability_UMAP', respectively.▪ Cell_region-all_v2: Loom file containing data processed with pycisTopic of the 2 snATAC-seq and 2 multiome (snATAC-seq layer) samples.The data set contains 21,393 cells, region accessibility probabilities as values, and topics as regulons.In the metadata, 'sample_id' indicates the sample of origin, and 'Refined_cell_type' indicates the cell type annotation.Under coordinates, the uncorrected and batch corrected UMAPs are labelled as 'probability_UMAP' and 'harmony_probability_UMAP', respectively.o Human ▪ pycisTopic_gene_activity_wsignatures_v2: Loom file containing human liver data from Zhang et al. (2021) 19 processed with pycisTopic.The data set contains 6,366 cells, gene activities as values, and the SCENIC+ gene-based eRegulons inferred in the mouse liver (and converted to human symbols) as regulons.In the metadata, 'cell_type_refined' indicates the cell type annotation.▪ pycisTopic_region_accessibility-1: Loom file containing human liver data from Zhang et al. (2021) processed with pycisTopic.The data set contains 6,366 cells, region accessibility probabilities as values, and topics as regulons.In the metadata, 'cell_type_refined' indicates the cell type annotation.-RNA o RNA+Multiome_integrated_HQ-10-1: Loom file containing data processed with VSN-pipelines of the 4 snRNA-seq and 2 multiome (snRNA-seq layer) samples.The data set contains 29,798 cells and gene expression as values.In the metadata, 'sample_id' indicates the sample of origin, and 'Refined_cell_type' indicates the cell type annotation.Under coordinates, the uncorrected and batch corrected UMAPs are labelled as 'Uncorrected UMAP' and 'HVG UMAP', respectively.Sex_dimorphism: Loom file containing data from male and female livers from Goldfarb et al. (2022) processed with VSN-pipelines.The data set contains 10,202 cells, gene expression as values, and the SCENIC+ gene-based eRegulons inferred in the mouse liver as regulons.In the metadata, 'sample_id' indicates the sample of origin and 'cluster_ann' indicate the cell type annotation.-Spatial o Reference_Free_ScoMAP_ATAC_MO: Loom file containing mouse liver spatial data (Resolve), in which snATAC-seq data has been projected using SCoMAP.The data set contains 14,296 cells, mapped region accessibility probabilities as values a mapped topics as regulons.In the metadata, 'Cell type' indicates the cell type annotation.o Reference_Free_ScoMAP_RNA_MO: Loom file containing mouse liver spatial data (Resolve), in which snRNA-seq data has been projected using SCoMAP.The data set contains 14,296 cells and mapped gene expression as values.In the metadata, 'Cell type' indicates the cell type annotation.Resolve_Liver: Loom file containing mouse liver spatial data (Resolve), with the measured gene expression values (for 100 genes).The data set contains 14,296 cells and gene expression as values.In the metadata, 'sample_id' indicates the sample of origin and 'cell_type' indicates the cell type annotation.o SCENIC+_gene_based_ScoMAP_RNA_MO: Loom file containing SCoMAP liver lobule template, in which snRNA-seq data has been projected.The data set contains 4,498 cells, gene expression as values, and SCENIC+ gene-based eRegulons inferred in the mouse liver as regulons.In the metadata, 'Cell type' indicates the cell type annotation.o ScoMAP_ATAC_MO: Loom file containing SCoMAP liver lobule template, in which snATAC-seq data has been projected.The data set contains 4,498 cells, region accessibility probabilities as values, and topics as regulons.In the metadata, 'Cell type' indicates the cell type annotation.-SCENIC+ o SCENIC+_curated_gene_based-3: Loom file containing data processed with VSN-pipelines of the 4 snRNA-seq and 2 multiome (snRNA-seq layer) samples.The data set contains 29,798 cells, gene expression as values, and the SCENIC+ gene-based eRegulons inferred in the mouse liver as regulons.In the metadata, 'sample_id' indicates the sample of origin, and 'Refined_cell_type' indicates the cell type annotation.Under coordinates, the uncorrected and batch corrected UMAPs are labelled as 'Uncorrected UMAP' and 'HVG UMAP', respectively.o SCENIC+_curated_region_based: Loom file containing data processed with pycisTopic of the 2 snATAC-seq and 2 multiome (snATAC-seq layer) samples.The data set contains 21,393 cells, region accessibility probabilities as values, and the SCENIC+ region-based eRegulons inferred in the mouse liver as regulons.In the metadata, 'sample_id' indicates the sample of origin, and 'Refined_cell_type' indicates the cell type annotation.Under coordinates, the uncorrected and batch corrected UMAPs are labelled as 'probability_UMAP' and 'harmony_probability_UMAP', respectively./genome.ucsc.edu/s/cbravo/Bravo_et_al_Liver.The following files are available at the UCSC genome browser session: Custom tracks: Bed files containing mouse liver accessible regions identified from snATAC-seq and multiome data (486,888 regions, 'Consensus peaks'), shared hepatocyte regions (14,005 regions, 'Shared hepatocyte regions'), periportal shared regions (1,324 regions, 'Periportal shared'), intermediate shared regions (3,511 regions, 'Intermediate shared'), pericentral shared regions (1,495 regions, 'Pericentral shared') and general shared regions (7,675 regions, 'General shared') and mouse liver regions included in the CHEQ-seq library (10,845 regions).Bravo et al.Liver hub o Gene Expression: Bar charts showing gene expression levels per cell type.o ATAC-profile Celltype-ATAC: Pseudobulk bigwig files based on the ATAC-seq based cell type annotation.o ATAC-profile Celltype-RNA: Pseudobulk bigwig files based on the RNAseq based cell type annotation.o ATAC-profile Celltype-ATACpersample: Pseudobulk bigwig files based on the ATAC-seq based cell type annotation and sample of origin.o ATAC-profile Non-parenchymal-zonation: Pseudobulk bigwig files based on zonation subtypes for LSEC and HSC o SCENIC+ eRegulons: Bigbed file containing regions in each SCENIC+ regulon o SCENIC+ eRegulons (selected): Bigbed file containing regions in selected hepatocyte SCENIC+ regulons o SCENIC+ R2G: Region-to-gene links inferred by SCENIC+ o SCENIC+ R2G (eGRN): Region-to-gene links kept in the eRegulons inferred by SCENIC+ o Mouse Liver ChIP-seq: Bigwig files of mouse liver TF ChIP-seq profiles (TBX3, HNF4A, ONECUT1, CEBPA and FOXA1) o FACS ATAC-seq: Bigwig profiles of the FAC-sorted hepatocyte fractions