Evaluating the role of the nuclear microenvironment in gene function by population-based modeling

Yildirim, Asli; Hua, Nan; Boninsegna, Lorenzo; Zhan, Yuxiang; Polles, Guido; Gong, Ke; Hao, Shengli; Li, Wenyuan; Zhou, Xianghong Jasmine; Alber, Frank

doi:10.1038/s41594-023-01036-1

Open access
Published: 14 August 2023

Nature Structural & Molecular Biology volume 30, pages 1193–1206 (2023)

Abstract

The nuclear folding of chromosomes relative to nuclear bodies is an integral part of gene function. Here, we demonstrate that population-based modeling—from ensemble Hi-C data—provides a detailed description of the nuclear microenvironment of genes and its role in gene function. We define the microenvironment by the subnuclear positions of genomic regions with respect to nuclear bodies, local chromatin compaction, and preferences in chromatin compartmentalization. These structural descriptors are determined in single-cell models, thereby revealing the structural variability between cells. We demonstrate that the microenvironment of a genomic region is linked to its functional potential in gene transcription, replication, and chromatin compartmentalization. Some chromatin regions feature a strong preference for a single microenvironment, due to association with specific nuclear bodies in most cells. Other chromatin shows high structural variability, which is a strong indicator of functional heterogeneity. Moreover, we identify specialized nuclear microenvironments, which distinguish chromatin in different functional states and reveal a key role of nuclear speckles in chromosome organization. We demonstrate that our method produces highly predictive three-dimensional genome structures, which accurately reproduce data from a variety of orthogonal experiments, thus considerably expanding the range of Hi-C data analysis.

Main

The spatial organization of eukaryotic genomes is linked to the regulation of gene transcription, DNA replication, and cell differentiation and, upon malfunction, to cancer and other diseases^1,2. Recent advances have driven the rapid development of live-cell and super-resolution microscopy^{3,4,5,6,7,8,9,10}, as well as of mapping technologies based on high-throughput sequencing^{11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}, for probing chromosome interactions and three-dimensional (3D) organization^27,28,29,30. However, mapping the 3D nuclear locations of all genes simultaneously in single cells remains a major challenge. Several experimental technologies probe the mean distances (tyramide signal amplification sequencing (TSA-seq)¹³) or association frequencies (nucleolus-associated domain sequencing (NAD-seq)³¹; DNA adenine methyltransferase identification (DamID)¹⁶) of genes with nuclear speckles, the nuclear lamina (defining lamina-associated domains (LADs)), and nucleoli. However, these methods cannot collect all of this information simultaneously within the same cell, and the considerable cell-to-cell variability of chromosome structures adds further complexity. Several multiplexed fluorescence in situ hybridization (FISH) and super-resolution microscopy techniques have recently provided such information^5,6,7. For instance, DNA- and RNA-multiplexed error-robust FISH (MERFISH) imaging has detected, within the same cells, the nuclear locations of 1,137 genes, together with the positions of nuclear speckles and nucleoli, as well as the number of mRNA transcripts⁶. However, the probed genomic regions are still sparse, covering only ~1% of the genome.

Here, we introduce an approach that models a population of single-cell 3D genome structures in order to describe the nuclear microenvironment of every genomic region in each single-cell model, defined by its nuclear location relative to nuclear landmarks and nuclear compartments. Our aim is to evaluate the roles of the nuclear microenvironment and its cell-to-cell variability in chromatin function and to identify characteristic nuclear microenvironments that distinguish chromatin in different functional states.

We achieve this goal with a population-based genome structure modeling approach, which uses Hi-C data to generate a population of diploid genome structures that is statistically consistent with those data^32,33,34. We demonstrate that our method produces, from Hi-C data alone, highly predictive genome structures. These structures predict with high correlation the cytological distances of genomic regions to nuclear speckles and the lamina from SON TSA-seq¹³ and lamin-B1 TSA-seq¹³ experiments, contact probabilities with the nuclear lamina from lamin-B1 protein A-DamID (pA-DamID)³⁵ experiments, mean radial positions from genomic loci positioning by sequencing (GPSeq)³⁶ experiments, and distance distributions and single-cell chromosome tracing data from 3D FISH¹⁸ and DNA-MERFISH⁶ experiments, respectively. We define the nuclear microenvironment of a genomic region by an array of structural descriptors, including its nuclear radial position; its association frequencies with, and mean distances to, nuclear speckles, the lamina, and nucleoli; the local compaction of the chromatin fiber; and local compartmentalization in the form of the trans A/B ratio, defined as the ratio of its inter-chromosomal interactions with chromatin in the A (active) compartment to those with chromatin in the B (inactive) compartment⁶ (Fig. 1a,b). These structural descriptors are determined in single-cell models, thereby revealing the cell-to-cell variability of the nuclear microenvironment of a genomic region across the population of models.
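To make the trans A/B ratio concrete, the following minimal sketch computes it for every bead of one single-cell model by counting inter-chromosomal neighbors in the A and B compartments. The function name `trans_ab_ratio`, the distance cutoff `contact_dist` used to define a neighbor, and the treatment of homologous copies as separate chromosomes are hypothetical choices for illustration, not the procedure described in the Methods.

```python
import numpy as np

def trans_ab_ratio(coords, chrom, is_a, contact_dist):
    """Trans A/B ratio of every bead in ONE single-cell model (hypothetical helper).

    coords: (N, 3) bead coordinates; chrom: (N,) chromosome ids (homologous
    copies get different ids); is_a: (N,) True for A-compartment beads.
    ASSUMPTION: two beads interact when they are closer than contact_dist.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    trans = (d < contact_dist) & (chrom[:, None] != chrom[None, :])
    n_a = (trans & is_a[None, :]).sum(axis=1)    # inter-chromosomal neighbors in A
    n_b = (trans & ~is_a[None, :]).sum(axis=1)   # inter-chromosomal neighbors in B
    with np.errstate(divide="ignore", invalid="ignore"):
        # NaN where a bead has no inter-chromosomal neighbor in B
        return np.where(n_b > 0, n_a / n_b, np.nan)
```

Applying the function to each model separately gives a per-cell distribution for every locus, from which both the population mean and the cell-to-cell variability of the descriptor follow.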

**Fig. 1: Microenvironment and structural features of genomic regions.**

Our genome structure analysis provides several key findings. First, genomic regions with a strong preference for the same specific microenvironment across cells, and thus with low structural cell-to-cell variability, are also the most homogeneous in their functional properties. In most cells, this chromatin is associated with either nuclear speckles or constitutive LADs, and it acts as a structural anchor point for genome organization. Second, our analysis shows that the subnuclear microenvironment of a genomic region reflects its transcriptional potential upon activation. Genes with high expression heterogeneity³⁷ often show increased structural variability in the nucleus, indicating a contribution of extrinsic noise to gene expression heterogeneity³⁸. Third, our observations confirm that Hi-C subcompartments³⁹ define physically distinct chromatin environments, some of which (such as A1) are linked to associations with nuclear speckles.

Although other computational approaches have modeled entire chromosomes, or even diploid genomes, from Hi-C data^{18,34,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56}, none so far has documented the predictive accuracy in reproducing multimodal experimental data, as presented here. Our findings demonstrate that our approach, from Hi-C data alone, produces predictive models that provide a detailed description of the subnuclear locations, folding, and compartmentalization of chromatin in diploid genomes. Therefore, our approach considerably expands the scope of Hi-C data analysis and is widely applicable to any cell type for which Hi-C data are available.

Results

Assessment of 3D genome structures

Here, we study 3D structures of diploid lymphoblastoid genomes (GM12878) from in situ Hi-C data³⁹ at 200-kilobase (kb) resolution. Our method generates a population of 10,000 genome structures, in which all accumulated chromatin contacts are statistically consistent with contact probabilities from Hi-C experiments^32,33,34. Structure optimization is achieved by iteratively solving a maximum likelihood estimation problem^32,33,34,48 (Methods). The resulting genome structure population accurately reproduces experimental Hi-C contact probabilities (Pearson’s r = 0.98 genome-wide, and 0.99 and 0.83 for cis and trans contacts, respectively; P = ~0; average chromosome stratum-adjusted correlation coefficient (SCC)⁵⁷ = 0.87; Extended Data Fig. 1a–c and Supplementary Information).
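The comparison between models and Hi-C data rests on a population-level contact probability. As a minimal sketch of that evaluation step only (not of the maximum likelihood optimization), the code below assumes that two beads are in contact when their distance falls below a hypothetical cutoff `contact_dist`, counts the fraction of models in which each pair is in contact, and correlates the resulting matrix with another one over the off-diagonal entries. Function names are illustrative.

```python
import numpy as np

def population_contact_probability(structures, contact_dist):
    """Fraction of models in which each bead pair is in contact.

    structures: (M, N, 3) coordinates of M single-cell models with N beads.
    contact_dist is a hypothetical cutoff, not the paper's contact definition.
    """
    diff = structures[:, :, None, :] - structures[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)           # (M, N, N) pairwise distances
    return (dist < contact_dist).mean(axis=0)      # (N, N) contact probabilities

def pearson_offdiag(a, b):
    """Pearson's r between two matrices over their upper triangle (diagonal excluded)."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]
```

For 10,000 genome-wide models the (M, N, N, 3) intermediate would be far too large, so a practical version would accumulate contacts model by model or in chunks.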

Our method is robust to missing data: models generated from sparse Hi-C data (50% of entries randomly removed) accurately predict the missing Hi-C contact frequencies (Pearson’s r = 0.93 (cis) and 0.69 (trans) for the missing data, P = ~0; Extended Data Fig. 1d,e and Methods). Moreover, our models predict, with significant correlation to the experimental values, a host of orthogonal data from lamin-B1 pA-DamID³⁵, lamin-B1 TSA-seq¹³, SON TSA-seq¹³, and GPSeq³⁶ experiments (Pearson’s r = 0.80, 0.78, 0.87, and 0.80, respectively; Table 1 and Methods), which we discuss in greater detail throughout this paper. Our models also confirm that chromatin replicated in the earliest G1b phase prefers interior radial positions (P = 2.39 × 10⁻⁷⁷, Mann–Whitney–Wilcoxon test, two-sided) and predict a gradual increase in average radial positions for chromatin replicated at later times⁵⁸ (Extended Data Fig. 1f). Our results also agree with those of 3D FISH experiments¹⁸, namely co-location frequencies of four inter-chromosomal pairs of loci (Pearson’s r = 0.99, P = 0.014; Extended Data Fig. 1g), as well as distance distributions between three loci on chromosome 6 and relative differences in the radial positions of these loci (Extended Data Fig. 1h). We also assessed our single-cell chromosome structures with data from multiplexed DNA-MERFISH⁶ imaging and single-cell Dip-C²⁵ experiments, and found good agreement between the single-cell chromosome conformations in our models and those from DNA-MERFISH (Extended Data Fig. 2a–c) and Dip-C experiments (Extended Data Fig. 2d and Methods). All results were reproduced using technical replicates (Methods and Supplementary Information).
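The missing-data test can be sketched as follows: hide a random half of the upper-triangle Hi-C entries, model the genome from the remaining entries, and correlate predicted with experimental values only at the hidden positions. The modeling step is omitted here; `split_contacts` and `held_out_pearson` are hypothetical helpers, and the random seed is arbitrary.

```python
import numpy as np

def split_contacts(hic, frac_removed=0.5, seed=0):
    """Hide a random fraction of upper-triangle Hi-C entries (set symmetrically to NaN).

    Returns the sparsified matrix and the (rows, cols) indices of hidden entries.
    """
    rng = np.random.default_rng(seed)
    rows_all, cols_all = np.triu_indices_from(hic, k=1)
    n_hide = int(round(frac_removed * rows_all.size))
    pick = rng.choice(rows_all.size, size=n_hide, replace=False)
    rows, cols = rows_all[pick], cols_all[pick]
    sparse = hic.astype(float)                     # astype returns a copy
    sparse[rows, cols] = np.nan
    sparse[cols, rows] = np.nan
    return sparse, (rows, cols)

def held_out_pearson(predicted, experimental, held_out):
    """Pearson's r between predicted and experimental values at hidden entries only."""
    rows, cols = held_out
    return np.corrcoef(predicted[rows, cols], experimental[rows, cols])[0, 1]
```

Scoring only the hidden entries is what makes this a test of prediction rather than of fit, since the models never saw those values.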

Table 1 Genome-wide Pearson and Spearman correlations between experimental and predicted omics and imaging data

We now characterize the nuclear microenvironment of genomic regions by calculating a variety of structural descriptors for each genomic region in each single-cell model (Fig. 1a,b). Our aim is to identify characteristic nuclear microenvironments distinguishing chromatin of different functional states and to evaluate the roles of the nuclear topography and its cell-to-cell variability in regulating transcription and replication.

Average nuclear position and its cell-to-cell heterogeneity

The nuclear positions of genes are functionally relevant: FISH experiments revealed that some genes shift, on average, towards the nuclear center upon transcriptional activation^59,60. Owing to the stochastic nature of genome structures, the radial nuclear position of a locus can vary between individual cells (Fig. 2a). However, the average radial position over all models in the population reveals distinct preferences, which vary between genomic loci (Fig. 2b, upper panel). The minima in the average radial profiles of chromosomes overlap with regions of the lowest lamin-B1 DamID signal, which have the lowest probabilities of interacting with the nuclear envelope⁶¹ (Extended Data Fig. 3a,b). Our predictions also reproduce the average radial locations, inferred from DNA digestion timing, detected in GPSeq experiments³⁶ (Pearson’s r = 0.80, P = ~0; Extended Data Fig. 3c,d).
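Radial positions are simple to compute once the nucleus is parameterized. The sketch below assumes, for simplicity, a spherical nucleus centered at the origin (the nuclear geometry used in the paper may differ) and returns normalized radial positions r/R for every locus in every model; averaging over models gives the population-averaged radial profile. A hypothetical interior location frequency with an arbitrary threshold is included because the ILF discussed later derives from the same quantity; its exact definition is in the Methods.

```python
import numpy as np

def radial_positions(structures, nuclear_radius):
    """Normalized radial position r/R of every bead in every model.

    structures: (M, N, 3) coordinates with the nuclear center at the origin.
    ASSUMPTION: spherical nucleus of radius nuclear_radius.
    """
    return np.linalg.norm(structures, axis=-1) / nuclear_radius   # (M, N)

def interior_location_frequency(rad, threshold=0.5):
    """Fraction of models in which a locus lies at r/R < threshold.
    The threshold is a placeholder, not the paper's definition."""
    return (rad < threshold).mean(axis=0)
```

The average radial profile is then `radial_positions(structures, R).mean(axis=0)`, one value per locus.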

**Fig. 2: Radial chromatin positions and their cell-to-cell variability.**

Notably, sequence positions that coincide with large transitions in the average radial profile often overlap with borders between the five primary Hi-C subcompartments identified by Rao et al.³⁹, namely two transcriptionally active (A1, A2) and three inactive (B1, B2, B3) subcompartments (Fig. 2b, top, and Methods). Chromatin in different subcompartments displays distinct distributions of average radial positions (Fig. 2c) and radial shell occupancy (Extended Data Fig. 3e), confirming previous observations^25,36. For example, both A1 and B1 chromatin preferentially occupy the innermost radial shells of the nucleus, whereas B3 chromatin (mostly associated with LADs) preferentially localizes at the periphery, and A2 chromatin shows a wide range of average locations without a marked radial preference (Fig. 2c and Extended Data Fig. 3e).

Structural variability correlates with functional properties

We also calculated the cell-to-cell variability of radial positions (δ_RAD) to quantify stochastic variations in gene locations between cells (Methods). δ_RAD differs distinctly between genomic loci (Fig. 2b, bottom). Sections of chromatin with high structural variability (δ_RAD > 0) alternate with regions of low variability (δ_RAD < 0), and the transitions between them are sharp, occurring over relatively short sequence distances. These transitions align well with borders between subcompartments, most prominently between the A2 and B3 subcompartments (Fig. 2b, bottom). Continuous sections with similar δ_RAD values are often part of the same subcompartment.
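The sign convention used here (δ_RAD > 0 for high variability, δ_RAD < 0 for low) suggests a log ratio relative to a genome-wide reference. The sketch below adopts one plausible definition as an explicit assumption, the log2 of each locus's standard deviation over the mean standard deviation of all loci, and then assigns the HV/LV labels used in the following paragraphs. Both `delta_rad` and `variability_class` are hypothetical names; the paper's exact formula is given in its Methods.

```python
import numpy as np

def delta_rad(rad):
    """Cell-to-cell variability of radial positions, one value per locus.

    rad: (M, N) normalized radial positions of N loci in M models.
    ASSUMED definition (hypothetical): log2 of each locus's standard deviation
    over the mean standard deviation of all loci, so positive values mark
    above-average variability.
    """
    sd = rad.std(axis=0)
    return np.log2(sd / sd.mean())

def variability_class(compartment, delta):
    """Label loci such as 'A-HV' or 'B-LV' from compartment and sign of delta.
    A delta of exactly 0 is labeled LV here, an arbitrary choice."""
    return [f"{c}-{'HV' if x > 0 else 'LV'}" for c, x in zip(compartment, delta)]
```

Because the reference is the genome-wide mean, the labels are relative: roughly speaking, a locus is HV only in comparison with the rest of the genome, not in absolute terms.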

We noticed that the structural variability of a genomic region is a strong indicator of its functional properties, for both the active A and inactive B compartments. Chromatin in the A compartment with low structural variability (δ_RAD < 0) (A-LV) (Fig. 2d) is enriched for high SON TSA-seq¹³ signals and low signals from lamin-B1 pA-DamID³⁵ experiments; thus, A-LV regions have relatively short mean distances to nuclear speckles and are excluded from the nuclear periphery (Fig. 2e). Moreover, A-LV chromatin is highly enriched for constitutive inter-LADs⁶¹ (that is, regions never observed as LADs in any cell type) and is mostly replicated in the earliest G1b phase⁵⁸. A-LV chromatin also shows significantly higher transcriptional activity than chromatin in the A compartment with high structural variability (δ_RAD > 0) (A-HV) (P = 1.35 × 10⁻⁴⁰, Mann–Whitney–Wilcoxon test, two-sided; Fig. 2f). Overall, active genes with the highest number of transcripts in single-cell RNA-seq (scRNA-seq) experiments³⁷ have a significantly lower δ_RAD than genes with the lowest number of transcripts (P = 3.45 × 10⁻¹⁸, Mann–Whitney–Wilcoxon test, two-sided; Fig. 2g).

By contrast, A-HV chromatin lacks SON TSA-seq signal enrichment, and thus has larger mean distances to nuclear speckles, and it is enriched for facultative inter-LADs (Fig. 2e). Notably, A-HV regions with the largest structural variability often show a bimodal distribution of their single-cell radial positions, indicating two favored nuclear locations: one in the nuclear interior and one at the periphery (Fig. 2h). We hypothesize that genes in these regions may exist in two functional states: active in the transcriptionally favorable interior and silenced at the periphery. Indeed, compared with A-LV chromatin, A-HV chromatin is more enriched for the repressive trimethylated H3 K9 (H3K9me3) mark and depleted of the activating acetylated H3 K9 (H3K9ac) mark (Fig. 2i), which could point to higher functional heterogeneity in single cells. Moreover, structural variability distinguishes A1 from A2 subcompartment chromatin (Fig. 2j, left, and Extended Data Fig. 3f): 93% of all A-HV regions are A2 chromatin, whereas A1 chromatin is strongly enriched in A-LV regions (Fig. 2j, left, and Fig. 2k).

Similar to the active compartment, B compartment chromatin also shows substantial differences in functional properties between highly variable (B-HV; δ_RAD > 0) and lowly variable (B-LV; δ_RAD < 0) genomic regions (Fig. 2d, right, and Fig. 2e). Accordingly, the B1, B2, and B3 subcompartments are well distinguished by their structural variability and average radial positions (Fig. 2j, right), and B2 and B3 are enriched in B-HV and B-LV regions, respectively (Fig. 2k).

Subcompartments separate into spatial partitions

Chromosome folding permits functionally related chromatin, separated in sequence, to assemble into spatial compartments (Fig. 3a). The single-cell interaction networks (CINs) of chromatin in the same subcompartment show a heterogeneous network organization with clusters of highly connected and physically separated subgraphs (that is, local partitions) reminiscent of microphase fragmentation⁶² (Fig. 3b and Methods). These spatial partitions can be visualized in single genome structures by the occupied volume of the contained chromatin (Fig. 3b,c).
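A single-cell interaction network for one subcompartment can be approximated by linking beads of that subcompartment that lie within a contact distance; the connected components of that graph then serve as a crude stand-in for spatial partitions. The sketch below makes that assumption explicitly. The paper's Methods may use a different network construction or clustering procedure, and `spatial_partitions` and `contact_dist` are hypothetical names.

```python
import numpy as np

def spatial_partitions(coords, labels, target, contact_dist):
    """Connected components of the network of `target`-labeled beads in one model.

    Beads of the target subcompartment closer than contact_dist are linked
    (a hypothetical network definition). Returns index arrays, largest first.
    """
    idx = np.flatnonzero(np.asarray(labels) == target)
    pts = coords[idx]
    adj = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1) < contact_dist
    seen = np.zeros(len(idx), dtype=bool)
    parts = []
    for start in range(len(idx)):
        if seen[start]:
            continue
        stack, comp = [start], []       # depth-first search from an unseen bead
        seen[start] = True
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in np.flatnonzero(adj[i] & ~seen):
                seen[j] = True
                stack.append(j)
        parts.append(idx[np.sort(comp)])
    return sorted(parts, key=len, reverse=True)
```

Partition sizes, counts, and the share of inter-chromosomal links within each component are the kinds of statistics that can then be compared between subcompartments.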

**Fig. 3: Spatial partitions of subcompartments.**

Network structures differ between individual subcompartments. A1 chromatin is fragmented into the smallest number of partitions, with the largest sizes (Fig. 3d,e and Extended Data Table 1) and the highest fraction of inter-chromosomal interactions (Fig. 3f), whereas A2 chromatin is fragmented into a substantially larger number of smaller partitions, dominated by intra-chromosomal interactions (Extended Data Table 1 and Fig. 3d–f). Within the B compartment, B3 chromatin has the largest partitions, dominated by intra-chromosomal interactions (Fig. 3e,f).

The larger partition sizes of A1 and B3 chromatin lead to more homogeneous compartmentalization, with each having a higher neighborhood enrichment score with its own kind (see the high fold enrichment along the diagonal in Fig. 3g and Methods). The smaller partition sizes of A2 and B1 chromatin lead to relatively high neighborhood enrichment with other chromatin types (see the off-diagonal enrichment in Fig. 3g). A2 partitions are often associated with B3 chromatin, whereas B1 partitions are associated with A1 chromatin³⁶ (Fig. 3g,h).
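One way to express a neighborhood enrichment score is as a fold enrichment: the fraction of a given type among the spatial neighbors of another type, divided by that type's genome-wide fraction. The sketch below implements that hypothetical score for one model; the neighbor cutoff and the normalization are assumptions, and the paper's exact definition is in its Methods.

```python
import numpy as np

def neighborhood_enrichment(coords, labels, contact_dist):
    """Fold enrichment E[x, y] of label y among spatial neighbors of label x.

    E[x, y] = (fraction of y among neighbors of x) / (genome-wide fraction of y).
    Neighbors are beads closer than contact_dist (hypothetical cutoff).
    """
    labels = np.asarray(labels)
    types = sorted(set(labels.tolist()))
    onehot = np.stack([labels == t for t in types], axis=1).astype(float)  # (N, T)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    adj = (d < contact_dist).astype(float)
    np.fill_diagonal(adj, 0.0)                    # a bead is not its own neighbor
    counts = onehot.T @ adj @ onehot              # (T, T) neighbor pair counts
    observed = counts / counts.sum(axis=1, keepdims=True)
    expected = onehot.mean(axis=0)
    return types, observed / expected[None, :]
```

Values above 1 on the diagonal indicate self-association, the pattern described for A1 and B3; large off-diagonal values indicate mixing with other chromatin types.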

When we mapped nascent RNA expression from GRO-seq experiments⁶³ onto our genome structures, we found that transcriptional activity increases towards the centers of A1 partitions (Fig. 4a). A2 partitions show similar trends, although with substantially lower signals (Fig. 4a). We also observe that highly expressed genes preferentially reside in larger partitions, and that expression levels at the centers of large A1 and A2 partitions are notably higher than those of smaller ones (Fig. 4b). These observations indicate that spatial partitions of active chromatin are regional territories of the highest transcriptional activity.

**Fig. 4: SON TSA-seq predictions using 3D structures.**

Predicting locations of nuclear speckles

Mapping TSA-seq data¹³ onto our genome structures revealed the strongest TSA-seq signals, and thus the smallest mean speckle distances, for chromatin located towards the central regions of A1 partitions (Fig. 4c). This suggests that the center locations of A1 partitions could represent positions of nuclear speckles in individual cell models. To test this hypothesis, we simulated the experimental TSA-seq process by using A1 partition centers as approximate speckle locations (Fig. 4d and Methods). The simulated, population-averaged SON TSA-seq data from our models show highly significant correlation with the experimental SON TSA-seq values¹³ (Pearson’s r = 0.87, P = ~0), capturing both peak sizes and signal distributions well (Fig. 4e,f). For instance, the TSA-seq profile of chromosome 2 is reproduced with high correlation (Pearson’s r = 0.90, P = ~0) across the entire chromosome, despite the chromosome containing few A1 regions (6.4%) (Fig. 4e). Chromatin grouped by predicted TSA-seq signals shows characteristic enrichment of histone modifications, identical to that observed in the experiment¹³ (Extended Data Fig. 4a). Moreover, predicted speckle locations confirm the proposed correlation between the mean speckle distance of chromatin and its experimental TSA-seq signal (Extended Data Fig. 4b).
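The simulation of the TSA-seq process can be sketched by treating partition centroids as speckle positions and letting the deposited label decay with distance from each speckle. The exponential kernel, the `decay_length` parameter, and the normalization by the genome-wide mean are assumptions chosen for illustration; the functional form and parameters actually used are described in the Methods.

```python
import numpy as np

def partition_centers(coords, partitions):
    """Centroid of each partition; used here as an approximate speckle position."""
    return np.array([coords[p].mean(axis=0) for p in partitions])

def simulated_tsa(structures, speckles, decay_length):
    """Population-averaged TSA-seq-like signal from predicted speckle positions.

    structures: (M, N, 3) coordinates; speckles: list of M arrays of shape (K_m, 3).
    ASSUMPTION: the label on a bead decays as exp(-d / decay_length) with its
    distance d to each speckle; the result is divided by its genome-wide mean
    because TSA-seq reports relative enrichment.
    """
    signal = np.zeros(structures.shape[1])
    for coords, centers in zip(structures, speckles):
        d = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=-1)  # (N, K_m)
        signal += np.exp(-d / decay_length).sum(axis=1)
    signal /= len(structures)
    return signal / signal.mean()
```

TSA-seq tracks are typically reported on a log2 scale, so `np.log2` of this output would be the natural quantity to correlate with experimental profiles.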

We then found that speckle locations can be predicted accurately even without relying on A1 subcompartment annotations, which are available for only a limited number of cell lines. Spatial partitions of chromatin with average radial positions in the lowest 10% (labeled as internal (INT) in Fig. 4c) predict speckle locations within 500 nm of those derived from A1 partitions in 99% of structures (78% of chromatin with the 10% lowest average radial positions is part of A1). Accordingly, the SON TSA-seq data can also be predicted from INT centers with almost identical accuracy (Pearson’s r = 0.86, P = ~0) (Extended Data Fig. 4d and Extended Data Table 2). Further investigation showed that only INT or A1 chromatin partition centers accurately predict nuclear speckle locations in our models (Extended Data Fig. 4c,d and Extended Data Table 2).
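Selecting INT chromatin needs only the population-averaged radial profile. The sketch below returns the indices of loci whose average radial position lies in the lowest 10%; these beads would then replace A1 beads when forming partitions and centroids. The helper name is hypothetical, and treating ties at the cutoff as included is an arbitrary choice.

```python
import numpy as np

def interior_loci(mean_rad, fraction=0.10):
    """Indices of loci whose population-averaged radial position is in the lowest `fraction`."""
    cutoff = np.quantile(mean_rad, fraction)
    return np.flatnonzero(mean_rad <= cutoff)
```

Because this selection uses only model-derived radial positions, it can be applied to cell types that lack subcompartment annotations.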

Predicting speckle-associated structural features

With predicted speckle locations as reference points, we can now calculate speckle-associated features (SpD, SAF, δ_SpD, S-TSA in Fig. 1b) for each genomic region (Methods). The predicted speckle association frequencies (SAFs) of genomic regions agree with a recent DNA-MERFISH microscopy study⁶ with high correlation (Pearson’s r = 0.77, P = 1.2 × 10⁻²⁰²; Fig. 4g and Methods). Predicted trans A/B ratios also show a high correlation with those from DNA-MERFISH⁶ (Pearson’s r = 0.70, P = 7.6 × 10⁻¹⁰⁹; Fig. 4h). We also found a moderate but highly significant correlation for the cell-to-cell variability of speckle distances (δ_SpD) between our models and the experiment (Extended Data Fig. 4e; Pearson’s r = 0.352, P = 7 × 10⁻³⁰). Interestingly, we find a strong anticorrelation between the inter-chromosomal contact probability (ICP) of a genomic region and its mean speckle distance (SpD) (Pearson’s r = −0.95, P = ~0; Supplementary Fig. 1). Thus, the surroundings of speckles are strongly enriched in interchromosomal interactions, in particular for A compartment chromatin. This observation is confirmed by a strong correlation between a gene’s SAF and trans A/B ratio⁶ (Pearson’s r = 0.98, P = ~0; Fig. 4i).
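Given predicted speckle positions in each model, the mean speckle distance and the speckle association frequency follow from the distance of each locus to its nearest speckle. The sketch below computes both under an assumed association cutoff `assoc_dist`; the name `speckle_features` and the cutoff are hypothetical, and the thresholds used in the paper are defined in its Methods.

```python
import numpy as np

def speckle_features(structures, speckles, assoc_dist):
    """Per-locus speckle features from single-cell models.

    structures: (M, N, 3) coordinates; speckles: list of M arrays (K_m, 3) of
    predicted speckle centers. Returns (SpD, SAF): the mean distance to the
    nearest speckle and the fraction of models in which that distance is
    below assoc_dist (a hypothetical association cutoff).
    """
    nearest = np.stack([
        np.linalg.norm(c[:, None, :] - s[None, :, :], axis=-1).min(axis=1)
        for c, s in zip(structures, speckles)
    ])                                                    # (M, N)
    return nearest.mean(axis=0), (nearest < assoc_dist).mean(axis=0)
```

The same (M, N) array of nearest-speckle distances also carries the per-locus spread across models from which a variability measure such as δ_SpD can be derived.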

Defining lamina- and nucleoli-associated features

Our models also accurately predict structural features describing chromatin positioning relative to the nuclear lamina (LAF, L-TSA in Fig. 1b). For instance, our models predict experimental lamin-B1 TSA-seq data with high correlation¹³, thus revealing accurate mean distances of genomic regions to the nuclear envelope (Pearson’s r = 0.78, P = ~0; Extended Data Fig. 4f and Table 1). Our models also predict lamin-B1 pA-DamID³⁵ data with high correlation (Pearson’s r = 0.80, P = ~0; Extended Data Fig. 4g and Table 1), and thus predict well the contact frequencies of genomic regions with the nuclear periphery. Finally, our models also reproduce experimental lamina association frequencies (LAFs)⁶ (Pearson’s r = 0.64, P = 3.6 × 10⁻¹¹⁹; Extended Data Fig. 4h), even though these DNA-MERFISH data were obtained in IMR-90 cells, whose nuclei differ in shape from GM12878 nuclei. Predicted LAF values are inversely correlated with a gene’s trans A/B ratio, confirming previous observations from DNA-MERFISH imaging⁶ (Extended Data Fig. 4i).
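A lamina association frequency can be sketched as the fraction of models in which a locus lies close to the nuclear envelope. The code below assumes a spherical nucleus centered at the origin, so the distance to the envelope is R minus the radial distance, and uses a hypothetical cutoff `assoc_dist`; the nuclear geometry and threshold in the paper may differ.

```python
import numpy as np

def lamina_association_frequency(structures, nuclear_radius, assoc_dist):
    """Fraction of models in which a bead lies within assoc_dist of the nuclear envelope.

    ASSUMPTION: spherical nucleus of radius nuclear_radius centered at the origin,
    so the distance to the envelope is nuclear_radius - r.
    """
    r = np.linalg.norm(structures, axis=-1)                  # (M, N) radial distances
    return ((nuclear_radius - r) < assoc_dist).mean(axis=0)  # (N,)
```

A nucleolus association frequency could be computed analogously by replacing the envelope distance with the distance to predicted nucleolar positions.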

Moreover, our models also predict nucleolus-associated structural features (NuD, δ_NuD, NAF (nucleolus association frequency), and N-TSA in Fig. 1; Extended Data Fig. 4j and Methods).

Finally, we also calculate structural features of the chromatin fiber (Extended Data Fig. 5a and Methods), including local chromatin compaction (radius of gyration, RG), whose profiles are consistent with the locations of topologically associating domain (TAD) borders (Extended Data Fig. 5b–d).
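Local chromatin compaction can be illustrated with a radius of gyration computed over a sliding window of consecutive beads along one chromosome. In this sketch the window half-width is a hypothetical parameter and windows are simply truncated at chromosome ends; the window definition used in the paper is given in its Methods.

```python
import numpy as np

def local_radius_of_gyration(coords, half_window):
    """Radius of gyration of the beads within +/- half_window of each bead.

    coords: (N, 3) coordinates of consecutive beads on one chromosome.
    Windows are truncated at the chromosome ends (an arbitrary choice).
    """
    n = len(coords)
    rg = np.empty(n)
    for i in range(n):
        seg = coords[max(0, i - half_window): min(n, i + half_window + 1)]
        rg[i] = np.sqrt(((seg - seg.mean(axis=0)) ** 2).sum(axis=1).mean())
    return rg
```

Averaging these values over models gives a compaction profile in which lower values indicate more compact local folding.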

The role of the nuclear microenvironment in gene function

Overall, we calculate a total of 17 structural features from our single-cell genome structure models (Fig. 1 and Supplementary Figs. 2–22). Collectively, these features define the nuclear microenvironment of each genomic region, which allows us to assess the role of the nuclear microenvironment in explaining functional differences between chromatin, in particular for gene transcription, DNA replication, and chromatin compartmentalization.

Gene transcription

First, we compare the stochastic variability of gene–speckle distances (δ_SpD) in single-cell models with the heterogeneity of single-cell gene expression from scRNA-seq experiments³⁷. Cumulatively ranked single-cell distances of a genomic region to its nearest predicted speckle (Fig. 5a, top) show striking similarities to the cumulatively ranked numbers of transcripts of the corresponding genes in a cell population from scRNA-seq³⁷ (Fig. 5b, top, and Methods). Accordingly, the gene transcription frequency (TRF), defined as the fraction of cells in which a transcript of the gene is detected by scRNA-seq³⁷ (Fig. 5a,b, bottom), shows a highly significant correlation with the SAF predicted from the models (Fig. 5c, left panel; Spearman’s r = 0.51, P = ~0). Thus, genes with transcripts in a large fraction of cells are also located close to speckles in a large fraction of models. We also validated these findings with transcription frequencies measured by RNA-MERFISH microscopy for 1,137 genes⁶. Here as well, we observe the same highly significant correlation between TRF and SAF (Spearman’s r = 0.51, P = 1.6 × 10⁻⁶⁴) (Fig. 5c, right panel). Interestingly, a gene’s interior location frequency (ILF; Methods) correlates substantially less with the TRF than the SAF does, for both scRNA-seq³⁷ and RNA-MERFISH⁶ data (Spearman’s r = 0.42, P = ~0 (scRNA-seq) and r = 0.45, P = 4.1 × 10⁻⁵⁰ (RNA-MERFISH)) (Fig. 5c). These observations indicate a possible role for single-cell variations of a gene’s nuclear microenvironment in its expression heterogeneity.

**Fig. 5: Relationship between 3D chromatin structure and transcriptional activity.**

Moreover, we found that genes in the top 10% of genes with the highest numbers of transcripts (T10) are distinguished in their nuclear microenvironment from genes in the bottom 10% (B10). T10 genes show strong enrichment for several structural features (Fig. 5d, for example SAF and trans A/B), while being depleted in δ_RAD, δ_SpD, and δ_NuD. Thus, T10 genes show a strong preference for the same specific microenvironment in different cells, while B10 genes do not—their microenvironment is highly variable between cells without clear association preferences to nuclear bodies.

The distributions of feature values for the most discriminative features (SpD, ILF, SAF, RAD, and trans A/B) differ markedly between T10 and B10 genes (Fig. 5e). However, SAF and the highly correlated trans A/B ratio outperform all other features, including the radial gene position (RAD), in distinguishing T10 from B10 genes, as shown by the receiver operating characteristic (ROC) curves (Fig. 5f; area under the curve (AUC) for SAF = 0.85 and for RAD = 0.65). This finding could indicate that the general preference of highly expressed genes for interior radial positions is an indirect consequence of their favored associations with nuclear speckles, which themselves show stochastic preferences for the nuclear interior^13,64.
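The AUC values quoted above have a simple probabilistic reading: the probability that a randomly chosen T10 gene scores higher on a feature than a randomly chosen B10 gene, with ties counted as one half. This is the Mann–Whitney U statistic divided by the number of T10–B10 pairs. The sketch below computes it by direct pairwise comparison; `roc_auc` is a hypothetical name.

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), ties counting one half."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()
```

For features where lower values indicate the positive class, such as RAD (more interior positions for T10 genes), the scores would be negated before computing the AUC.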

Moreover, genes controlled by superenhancers (SEN) show overall higher fold enrichments in structural features than genes controlled by regular enhancers (EN) (Methods). Thus, SEN genes show stronger and more consistent preferences in their nuclear microenvironment across cells, particularly for higher SAF, interior positions, trans A/B, and ICP, and for depleted LAF values (Fig. 5g).

The organizing role of nuclear speckles and lamina

Our approach allows a detailed analysis of chromatin–speckle interactions. Chromatin divided into ten groups on the basis of its experimental SON TSA-seq signal¹³ shows distinct structural enrichment patterns, which gradually change with increasing SON TSA-seq values (Fig. 6a). Chromatin in deciles d4–d7 (intermediate mean speckle distances) is highly variable in its nuclear positions (δ_RAD) and shows no preferred association with the nuclear bodies studied here (Fig. 6a,b). By contrast, chromatin in the first (d1, d2) and last (d9, d10) deciles shows the highest fold enrichments and thus the most stable microenvironment, with strong structural homogeneity between cells in the population; these regions show the lowest δ_RAD and have the largest and smallest SpD, respectively (Fig. 6b). The low-decile chromatin (d1, d2) is mostly B-LV chromatin located at the nuclear periphery (Fig. 2d, right panel), with correspondingly high lamin-B1 TSA-seq signal enrichment (Fig. 2e, left panel). Thus, these genomic regions provide stable structural anchor points at the nuclear periphery. Speckle-associated chromatin with the highest SON TSA-seq signals (d8–d10 in Fig. 6b) and SAF values also shows relatively low cell-to-cell variability in radial positions (δ_RAD) (Fig. 6b; mostly A-LV in Fig. 2d, left panel). Because speckles are mostly excluded from the nuclear periphery^13,64, these regions act as stable anchor points in the nuclear interior. Therefore, both the lamina compartment and nuclear speckles act as anchor points that scaffold the spatial organization of the genome. These observations provide a structural interpretation of the steep transitions between low and high signal peaks in SON TSA-seq profiles, previously reported as TSA-seq trajectories¹³ (Fig. 6c and Extended Data Fig. 6a).

These transitions correspond to the sequence stretches between two consecutive anchor points, each with relatively low δ_RAD, and coincide with steep transitions in average speckle distances and radial positions (Fig. 6c and Extended Data Fig. 6a–c). In a fraction of models, these chromosome regions fold from anchor points at the nuclear periphery towards anchor points in the nuclear interior, where the SON TSA-seq peak region is often associated with a nuclear speckle and forms the apex of a chromosomal loop, which then traces back to the nuclear periphery (Fig. 6d and Extended Data Fig. 6c). We found that δ_RAD of chromatin in long trajectories (median length of 19.1 Mb between two consecutive anchor points) is significantly larger than that of chromatin in short trajectories (median length 4.8 Mb) (Mann–Whitney test, two-sided, P = 1.48 × 10⁻¹⁸; Extended Data Fig. 6d). Therefore, the sequence locations of consecutive anchor points can modulate the structural properties of the chromatin between them over an extended genomic range, and disruption of an anchor point would likely affect the structural properties of genomic regions over an extended sequence distance.
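Grouping chromatin into deciles of SON TSA-seq signal, as in the analysis above, amounts to quantile binning. A minimal sketch, with a hypothetical function name, assigns each locus to decile d1 (lowest signal) through d10 (highest signal); loci with values exactly on a decile edge go to the upper bin.

```python
import numpy as np

def decile_groups(signal):
    """Assign each locus to deciles 1..10 of a signal (1 = lowest values)."""
    edges = np.quantile(signal, np.linspace(0, 1, 11)[1:-1])   # 9 interior cut points
    return np.searchsorted(edges, signal, side="right") + 1
```

Per-decile fold enrichments of any structural feature can then be computed by grouping loci on these labels.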

**Fig. 6: Structural features of microenvironments.**

We also observe that SON TSA-seq signals, which decrease with mean speckle distance, positively correlate with both the ICP (Methods) (Pearson’s r = 0.76, P = ~0; Fig. 6e, top) and the trans A/B ratio (Fig. 6e, bottom). These observations imply that the surroundings of nuclear speckles act as major hubs for inter-chromosomal interactions of transcriptionally active genomic regions, confirming similar findings reported earlier^13,23.
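The inter-chromosomal contact probability can be sketched as the share of a locus's contacts that involve other chromosomes. The code below assumes that definition and a symmetric contact matrix with self-contacts ignored; `inter_chromosomal_contact_probability` is a hypothetical name, and the paper's precise definition is in its Methods.

```python
import numpy as np

def inter_chromosomal_contact_probability(contacts, chrom):
    """Per-locus fraction of contacts that are inter-chromosomal (assumed ICP definition).

    contacts: (N, N) symmetric contact counts or probabilities; chrom: (N,) ids.
    Loci without any contact get NaN.
    """
    c = contacts.astype(float)
    np.fill_diagonal(c, 0.0)                       # ignore self-contacts
    trans = chrom[:, None] != chrom[None, :]
    total = c.sum(axis=1)
    return np.divide((c * trans).sum(axis=1), total,
                     out=np.full(len(c), np.nan), where=total > 0)
```

Computed from model-derived contacts, this quantity can be set against SON TSA-seq signals locus by locus.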

Finally, our models reveal distinct structural differences for genomic regions with high and intermediate SON TSA-seq signal peaks (that is, previously labeled type I and type II transcription ‘hot zones’¹³) (Fig. 6f). The vast majority of type II peaks show significantly higher speckle distance and radial variability (δ_SpD, δ_RAD) (Fig. 6f,g) than type I peaks and thus do not reside stably at intermediate speckle distances. Instead, they show a wider, in many cases bimodal, speckle distance distribution in comparison to type I peaks (Fig. 6h).

The role of microenvironment in replication timing

Variations in the replication timing⁵⁸ of chromatin are mirrored by distinct differences in its nuclear microenvironment (Fig. 6i). For example, chromatin that replicates at early time points (G1b, S1) is most enriched for high SAF and trans A/B ratio, as well as for low structural variability (Fig. 6i,j), whereas late-replicating chromatin (S4 and G2 phases) is depleted of interior locations and SAF and strongly enriched for lamina-associated features (Fig. 6i). Overall, SAF, SpD, and trans A/B ratio are more discriminative (that is, these features have higher fold changes) than features related to radial positions (RAD, ILF) in distinguishing early-replicating (G1b) from late-replicating (G2) chromatin (Fig. 6k).

Chromatin compartmentalization

Chromatin in different subcompartments is well separated in terms of its enrichment patterns for structural features, and different subcompartments thus represent distinct physical microenvironments (Fig. 6l, Extended Data Fig. 7a, and Methods); note that speckle features are predicted without A1 subcompartment annotations. While A1 chromatin shows strong preferences in its nuclear microenvironment, particularly for speckle-associated features, A2 chromatin lacks clear location preferences, with high cell-to-cell variability in radial locations, overall weak enrichment patterns, and wide distributions of feature values (Fig. 6l and Extended Data Fig. 7a). Similarly, the three inactive B subcompartments are well distinguished by their characteristic enrichment patterns. Indeed, these differences are so pronounced that Hi-C subcompartments can be predicted from structural features alone, without explicit consideration of chromatin interactions. Unsupervised K-means clustering of the structural feature vectors of compartment A chromatin predicts A1 and A2 subcompartment annotations with 94% accuracy, and chromatin in the inactive subcompartments is predicted with 84% accuracy (Fig. 6m and Methods). These accuracies are comparable to those of supervised methods using Hi-C contact frequencies⁶⁵. Our approach therefore provides an alternative way of detecting subcompartment annotations while also providing an underlying structural interpretation.
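
A minimal sketch of the clustering step, assuming one row of structural features per 200-kb region. The z-scoring, the feature choice, and scoring accuracy by optimally matching clusters to annotations with SciPy's `linear_sum_assignment` are our assumptions for illustration, not necessarily the procedure used in Methods; the toy data are synthetic.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans


def cluster_accuracy(true_labels, cluster_ids):
    """Fraction of regions whose cluster agrees with its annotation after the
    best one-to-one mapping of cluster ids to annotation labels."""
    true_labels = np.asarray(true_labels)
    cluster_ids = np.asarray(cluster_ids)
    labels = np.unique(true_labels)
    clusters = np.unique(cluster_ids)
    # Contingency table: rows = clusters, columns = annotation labels.
    table = np.array([[np.sum((cluster_ids == c) & (true_labels == lab))
                       for lab in labels] for c in clusters])
    # Hungarian matching that maximizes the number of agreeing regions.
    rows, cols = linear_sum_assignment(-table)
    return table[rows, cols].sum() / len(true_labels)


def predict_subcompartments(features, n_subcompartments, seed=0):
    """K-means on z-scored structural feature vectors (one row per region)."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    km = KMeans(n_clusters=n_subcompartments, n_init=10, random_state=seed)
    return km.fit_predict(z)


# Hypothetical toy data: three features per region (e.g. SAF, trans A/B
# ratio, delta_RAD) for two well-separated groups of 200 regions each.
rng = np.random.default_rng(0)
a1 = rng.normal([0.6, 0.8, -0.5], 0.1, size=(200, 3))
a2 = rng.normal([0.2, 0.4, 0.5], 0.1, size=(200, 3))
features = np.vstack([a1, a2])
truth = np.array(["A1"] * 200 + ["A2"] * 200)
pred = predict_subcompartments(features, 2)
print(f"accuracy = {cluster_accuracy(truth, pred):.2f}")
```

Because K-means cluster ids are arbitrary, some label matching is needed before an accuracy can be reported; the Hungarian assignment is one common choice.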

Moreover, we confirmed our findings with other chromatin compartment annotations, such as SCI states⁶⁶ and SPIN states⁶⁷, which showed distinct structural enrichment patterns for each chromatin state (Extended Data Fig. 7b–d).

Discussion

We introduce an approach to determine a population of single-cell 3D genome structures from ensemble Hi-C data. Our method predicts a host of structural features in single-cell models to provide information about the nuclear microenvironment of genomic regions in single cells, which is not available from ensemble Hi-C data itself. Therefore, our method expands the scope of Hi-C data analysis and is widely applicable to other cell types and tissues for which Hi-C data is available.

The models and derived structural features are a powerful resource to unravel relationships between genome structure and function. We found that cell-to-cell heterogeneity of structures varies by genomic locus and is a strong indicator of functional properties. Structurally stable chromatin in the A compartment is predominantly associated with nuclear speckles and shows relatively high speckle association frequencies, a high trans A/B ratio, and the overall lowest average radial positions. These regions contain highly transcribed genes, are enriched for superenhancers and SON TSA-seq signals, and are replicated at the earliest time points. Moreover, these genomic regions compartmentalize in relatively large spatial partitions, formed by a high fraction of inter-chromosomal interactions. Chromatin in the A1 subcompartment is enriched in this category.

By contrast, active chromatin with high structural variability is characterized by a lack of preferences in nuclear locations. In a fraction of cells, these regions can be located in a silencing environment at the nuclear periphery; in others, they can be located towards the transcriptionally favorable interior. These genes show relatively low transcript frequencies, low inter-chromosomal contact probabilities with low trans A/B ratios, and intermediate replication timing (phases S2, S3). In TSA-seq experiments, most of these regions were identified as type II peaks, with intermediate TSA-seq values. We also noticed that these regions compartmentalize into relatively small spatial partitions, dominated by intra-chromosomal interactions. Chromatin in the A2 subcompartment is enriched in this category. It is possible that the high structural variability of these regions is linked to functional heterogeneity between cells. For instance, although they are transcriptionally active, these regions have higher levels of the silencing H3K9me3 mark and lower levels of the activating H3K9ac mark than active regions with low structural variability. Moreover, gene transcripts for these regions are found in a smaller fraction of cells and show lower transcriptional activity.

Interestingly, structural heterogeneity is also an indicator that can distinguish nucleolus- and lamina-associated chromatin in the B compartment. Genomic regions with low structural variability are predominantly associated with the lamina compartment, contain constitutive LADs, and are enriched in the B3 subcompartment. Genomic regions with high structural variability are associated with nucleoli and pericentromeric heterochromatin and are enriched in the B2 subcompartment.

Our results suggest that nuclear speckles, together with the lamina compartment, are a major organizing factor in genome structure. Chromatin with low structural variability between cells is predominantly associated with either nuclear speckles or constitutive LADs. LADs are mostly located at the nuclear periphery, while speckles are mostly excluded from the periphery^13,64. Therefore, LADs and nuclear speckles provide structural anchor points at the periphery and in the nuclear interior, respectively. We hypothesize that A-LV and B-LV regions associated with these anchors act similarly to recently reported fixed points in the nuclear organization of mouse embryonic stem cells⁷.

Moreover, the observed anticorrelation between inter-chromosomal contact probabilities and mean speckle distances suggests that speckles are hubs that facilitate inter-chromosomal interactions for active chromatin, confirming similar observations from SPRITE experiments²³. The high fraction of inter-chromosomal interactions for speckle-associated chromatin could explain the preferential locations of speckles toward the nuclear interior. The probability of inter-chromosomal interactions increases towards the nuclear interior (Fig. 7a). If speckles associate with multiple chromosomes, their locations are more likely at the nuclear interior. Over time, dynamic interactions with multiple chromosomes may restrain their locations towards the interior (Fig. 7b). These cooperative effects could bias the global speckle distributions towards the nuclear interior.

**Fig. 7: Inter-chromosomal interactions and speckle locations.**

Chromatin with the highest and lowest transcriptional activity is distinguished by its nuclear microenvironment. The SAF shows the highest correlation with gene transcription frequency^7,68,69,70. Therefore, the interior preferences of highly active genes could be a consequence of their preferential locations close to nuclear speckles, which in turn have a stochastic preference towards the nuclear interior, confirming previous observations from TSA-seq experiments¹³. Chromatin replicated at the earliest time points is also distinguished in its structural features from late-replicating chromatin. Moreover, our observations confirm that Hi-C subcompartments define physically distinct chromatin environments, some of which (such as A1) are linked to associations with nuclear bodies.

In summary, our method defines the nuclear microenvironment of a genomic region by calculating a large number of structural features from 3D genome structures. The nuclear microenvironment of a gene can be linked to its functional potential in transcription and replication and is thus relevant for a better understanding of genome structure–function relationships. Because these features can be calculated from Hi-C data, the approach is applicable to many different cell types.

Methods

Population-based 3D structural modeling

General description

Our goal is to generate a population of 10,000 diploid genome structures such that the accumulated chromatin contacts across the entire population are statistically consistent with the contact probability matrix $\mathbf{A} = (a_{IJ})_{N \times N}$ derived from Hi-C experiments^18,34, where I and J denote two chromatin regions in the genome. To achieve this goal, we use population-based modeling, our previously described probabilistic framework for de-multiplexing ensemble Hi-C data into a large population of individual diploid genome structures that are statistically consistent with all contact frequencies in the ensemble Hi-C data^33,34,32.

The structure optimization is formulated as a maximum likelihood estimation problem, solved by an iterative optimization algorithm with a series of optimization strategies for efficient and scalable model estimation^33,34,48. Briefly, given a contact probability matrix $\mathbf{A} = (a_{IJ})_{N \times N}$, we aim to reconstruct all 3D structures $\mathbf{X} = \{\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_M\}$ in the population of M models, each containing 2 × N genomic regions for the diploid genome (at 200-kb resolution), with $\vec{x}_{im} \in \mathbb{R}^{3}$, i = 1, …, 2N, as the coordinates of all diploid genomic regions in model m (we use lowercase letters i and i′ to indicate a given copy of genomic region I). We introduce a latent indicator variable $\mathbf{W} = (w_{ijm})_{2N \times 2N \times M}$ to complement missing information (that is, missing phasing and ambiguity owing to genome diploidy). $\mathbf{W}$ is a binary-valued third-order tensor specifying the contacts of homologous genomic regions in each individual structure of the population, such that $\sum_{m=1}^{M} \mathbf{W}^{m} / M = \mathbf{A}$, with $\mathbf{W}^{m} = (w_{ij}^{m})_{2N \times 2N}$ and $w_{ij}^{m} = w_{ijm}$. We jointly approximate the structure population ($\mathbf{X}$) and the contact tensor ($\mathbf{W}$) by maximizing the log-likelihood:

$${\rm{log }}P\left({\bf{X|A}}{\boldsymbol{,}}{\bf{W}}\right)={\rm{log }}P\left({\bf{A}}{\boldsymbol{,}}{\bf{W|X}}\right)$$

$$\mathrm{subject}\,\mathrm{to}\,\left\{\begin{array}{c}\mathrm{nuclear}\,\mathrm{volume}\,\mathrm{confinement}\\ \mathrm{excluded}\,\mathrm{volume}\\ \mathrm{chain}\,\mathrm{connectivity}\,\mathrm{restraint}\end{array}\right.$$

where

i. Nuclear volume constraint: all chromatin spheres are confined to the nuclear volume with radius R_nuc; $\|\vec{x}_{im}\|_{2} \le R_{\mathrm{nuc}}$, where $\|\vec{x}_{im}\|_{2}$ is the distance of region i from the nuclear center in structure m.
ii. Excluded volume constraint: this constraint prevents overlap between two regions represented by spheres with excluded volume radius R_ex; $d_{ijm} = \|\vec{x}_{im} - \vec{x}_{jm}\|_{2} \ge 2R_{\mathrm{ex}}$.
iii. Polymer chain constraint: distances between two consecutive 200-kb spheres within the same chromosome are constrained to their contact distance to ensure chromosomal chain integrity; $\|\vec{x}_{(i+1)m} - \vec{x}_{im}\|_{2} \le 2R_{\mathrm{soft}}$, where $R_{\mathrm{soft}} = 2R_{\mathrm{ex}}$.
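
The three restraints can be checked for a single structure with a short NumPy sketch. This only counts violations; it is not the optimizer. The values R_nuc = 5 μm and R_ex = 118 nm come from ‘Genome representation’, while the `chain_breaks` argument (marking boundaries between chromosomes) is a hypothetical convenience of this sketch.

```python
import numpy as np

R_NUC = 5000.0     # nuclear radius in nm (5 um)
R_EX = 118.0       # excluded-volume radius in nm
R_SOFT = 2 * R_EX  # as defined in constraint iii


def constraint_violations(x, chain_breaks=()):
    """Count violations of restraints i-iii in one structure.

    x: (n, 3) sphere centers in nm (nuclear center at the origin), ordered
       along the chromosome chains.
    chain_breaks: indices k such that spheres k and k + 1 belong to
       different chromosomes (no chain restraint between them).
    """
    x = np.asarray(x, dtype=float)
    # i. nuclear volume: distance from the nuclear center <= R_nuc
    n_nuc = int(np.sum(np.linalg.norm(x, axis=1) > R_NUC))
    # ii. excluded volume: every pair of centers at least 2 * R_ex apart
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)
    n_ex = int(np.sum(d[iu] < 2 * R_EX))
    # iii. chain connectivity: consecutive spheres within 2 * R_soft
    step = np.linalg.norm(np.diff(x, axis=0), axis=1)
    linked = np.ones(len(step), dtype=bool)
    linked[np.asarray(chain_breaks, dtype=int)] = False
    n_chain = int(np.sum(step[linked] > 2 * R_SOFT))
    return {"nuclear_volume": n_nuc, "excluded_volume": n_ex, "chain": n_chain}


# Hypothetical 4-sphere chain with 300-nm spacing: no violations expected.
chain = np.array([[0, 0, 0], [300, 0, 0], [600, 0, 0], [900, 0, 0]], dtype=float)
print(constraint_violations(chain))
```

The all-pairs distance matrix is only practical for toy sizes; a full 30,332-sphere genome would call for a spatial index instead.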

Our modeling pipeline uses a stepwise iterative process in which the optimization hardness is gradually increased by adding contacts with decreasing contact probabilities from the input matrix. Each iteration involves two steps, each optimizing a local approximation of the likelihood function: (1) an assignment step (A-step), which, given the estimated structures X^k at step k, estimates W^k; and (2) a modeling step (M-step), which, given the estimated W^k, generates the model population X^{k+1} at step k + 1 that maximizes the likelihood of observing W^k. Structures in the M-step are calculated using a combination of optimization approaches, including simulated annealing molecular dynamics simulations.
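
A heavily simplified sketch of an A-step, assuming that a contact with probability a_IJ is assigned to the round(a_IJ × M) structures in which regions I and J are currently closest (taking the minimum over the four homologous copy combinations). The function name, the copy layout (copies of region I at rows I and I + N), and the threshold schedule are hypothetical; the actual IGM implementation, including iterative refinement, is more elaborate and the M-step is not shown.

```python
import numpy as np


def assignment_step(coords, A, threshold):
    """Simplified A-step sketch (hypothetical helper, not the IGM code).

    coords: (M, 2N, 3) array; the two copies of region I are rows I and I + N.
    A: (N, N) contact probability matrix.
    threshold: only pairs with A[I, J] >= threshold are assigned this round.
    Returns {(I, J): sorted indices of structures that receive the contact}.
    """
    M, two_n, _ = coords.shape
    n = two_n // 2
    assigned = {}
    for I in range(n):
        for J in range(I + 1, n):
            if A[I, J] < threshold:
                continue
            ci = coords[:, [I, I + n]]                    # (M, 2, 3)
            cj = coords[:, [J, J + n]]                    # (M, 2, 3)
            d = np.linalg.norm(ci[:, :, None] - cj[:, None, :], axis=-1)
            dmin = d.reshape(M, -1).min(axis=1)           # closest copy pair
            k = int(round(A[I, J] * M))
            # Assign the contact to the k structures where I and J are closest.
            assigned[(I, J)] = np.sort(np.argsort(dmin, kind="stable")[:k])
    return assigned


# Hypothetical toy population: M = 4 structures, N = 2 regions. Rows 0 and 1
# are copy a of regions 0 and 1; rows 2 and 3 (copy b) are placed far away.
M = 4
coords = np.zeros((M, 4, 3))
coords[:, 1, 0] = [400.0, 300.0, 100.0, 200.0]   # region 1, copy a
coords[:, 2, 0] = 10_000.0                        # region 0, copy b
coords[:, 3, 0] = -10_000.0                       # region 1, copy b
A = np.array([[1.0, 0.5], [0.5, 1.0]])

for threshold in (1.0, 0.5, 0.2):   # hypothetical decreasing schedule
    assigned = assignment_step(coords, A, threshold)
    # an M-step (not shown) would now optimize coords under `assigned`
print(assigned)
```

Lowering the threshold round by round mirrors the idea of gradually adding lower-probability contacts described above.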

Moreover, during each optimization cycle, we also use iterative refinement steps, a methodological innovation for effective reassignment of restraints during the optimization process, which allows genome structure generation at higher resolution and with improved accuracy compared with our previous approach^33,34 (see the iterative refinement method in Supplementary Information).

After 11 iterations, our method converged and the genome-wide contact probabilities from the structure population agreed with those from the Hi-C experiment.

Genome representation

The nucleus is modeled as a sphere with a 5-μm radius (R_nuc)³⁴. Chromosomes are represented by a chromatin chain model at 200-kb resolution. Each 200-kb chromatin region in the diploid genome is modeled as a sphere defined by an excluded volume radius (R_ex = 118 nm). R_ex is estimated from the sequence length, the nuclear volume and the genome occupancy (40%), as described in ref. ³⁴. The full diploid genome is represented with a total of 30,332 spheres.
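
As a quick arithmetic check, assuming all 30,332 spheres have the same size (every region spans 200 kb) and that the 40% occupancy is a volume fraction, equating the total sphere volume with 40% of the nuclear volume reproduces the stated radius. This is a simplification of the estimate in ref. ³⁴.

```python
R_NUC_NM = 5000.0     # nuclear radius (5 um)
N_SPHERES = 30_332    # 200-kb spheres in the diploid genome
OCCUPANCY = 0.40      # fraction of nuclear volume occupied by chromatin


def excluded_volume_radius(r_nuc, n_spheres, occupancy):
    """Radius of n equal spheres filling `occupancy` of the nuclear volume.

    n * (4/3) pi r_ex^3 = occupancy * (4/3) pi r_nuc^3
    =>  r_ex = r_nuc * (occupancy / n) ** (1/3)
    """
    return r_nuc * (occupancy / n_spheres) ** (1.0 / 3.0)


r_ex = excluded_volume_radius(R_NUC_NM, N_SPHERES, OCCUPANCY)
print(f"R_ex = {r_ex:.1f} nm")   # about 118 nm
```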

Random starting configurations

Optimizations are initiated from random chromosome configurations. The chromatin regions of each chromosome are randomly placed within a bounding sphere whose size is proportional to the chromosome's territory size, and each bounding sphere is randomly placed within the nucleus.
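
One way to generate such a start is sketched below. The rule that a bounding sphere's volume is proportional to the chromosome's share of all spheres, and the placement of its center so that the whole sphere lies inside the nucleus, are our assumptions; `random_start` is a hypothetical helper.

```python
import numpy as np


def random_points_in_sphere(rng, radius, size):
    """Uniform samples inside a ball: random direction times radius * U**(1/3)."""
    v = rng.normal(size=(size, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    r = radius * rng.random(size) ** (1.0 / 3.0)
    return v * r[:, None]


def random_start(chrom_sizes, r_nuc=5000.0, seed=0):
    """Hypothetical random starting configuration (coordinates in nm).

    chrom_sizes: number of 200-kb spheres in each chromosome copy.
    Assumption: a chromosome's bounding sphere has a volume proportional to
    its share of all spheres; its center is drawn so the whole bounding
    sphere stays inside the nucleus.
    """
    rng = np.random.default_rng(seed)
    total = sum(chrom_sizes)
    blocks = []
    for n in chrom_sizes:
        r_terr = r_nuc * (n / total) ** (1.0 / 3.0)
        center = random_points_in_sphere(rng, r_nuc - r_terr, 1)[0]
        blocks.append(center + random_points_in_sphere(rng, r_terr, n))
    return np.vstack(blocks)


start = random_start([50, 30, 20])
print(start.shape)  # (100, 3)
```

The cube-root factor in the sampled radius makes points uniform in volume rather than clustered near the center.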

Comparison between contact frequency maps from Hi-C experiment and model population

To quantify the agreement between the Hi-C experiment and the model population, we perform the following analyses:

1. Input and output Hi-C maps are compared using Pearson and stratum-adjusted (SCC)⁵⁷ correlation coefficients (Supplementary Table 1).
2. Restraint residual. On average, about 175,304 contact restraints are imposed in each of the 10,000 structures. The restraint residual of each contact restraint between loci k and l is calculated as $\eta_{kl} = \frac{d_{kl} - D}{D}$, where d_kl is the distance between the contact loci in the model, and D is the target contact distance (2 × R_soft).
3. Residual ratio. The residual ratio Δr is defined as:
$$\Delta r_{kl} = \left(f_{kl}^{\mathrm{input}} - f_{kl}^{\mathrm{model}}\right) / f_{kl}^{\mathrm{input}}$$
with $f_{kl}^{\mathrm{input}}$ and $f_{kl}^{\mathrm{model}}$ as the contact probabilities between regions k and l from experiment and models, respectively. Residual ratios are very small, with a median of 0.03 (mean = −0.05) for intra-chromosomal and 0.001 (mean = −0.002) for inter-chromosomal contacts (Supplementary Fig. 23), showing agreement between experiment and model.
4. Prediction of missing Hi-C data from a sparse-data model. A sparse Hi-C input data set is generated by randomly removing 50% of the non-zero entries of the Hi-C contact frequency matrix.
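
The restraint residual, the residual ratio and the sparse-input construction from the list above can be sketched in a few lines of NumPy. Removing entries from the upper triangle and mirroring the removals to keep the matrix symmetric, and skipping pairs with zero input probability in Δr, are our assumptions.

```python
import numpy as np


def restraint_residuals(d, D):
    """eta_kl = (d_kl - D) / D for model distances d of restrained pairs."""
    return (np.asarray(d, dtype=float) - D) / D


def residual_ratio(f_input, f_model):
    """Delta r_kl = (f_input - f_model) / f_input, only where f_input > 0."""
    f_input = np.asarray(f_input, dtype=float)
    f_model = np.asarray(f_model, dtype=float)
    mask = f_input > 0
    return (f_input[mask] - f_model[mask]) / f_input[mask]


def sparsify(A, fraction=0.5, seed=0):
    """Zero out `fraction` of the non-zero off-diagonal entries of a symmetric
    contact matrix (upper triangle sampled, removals mirrored)."""
    rng = np.random.default_rng(seed)
    S = np.array(A, dtype=float)
    iu, ju = np.nonzero(np.triu(S, k=1))
    drop = rng.choice(len(iu), size=int(round(fraction * len(iu))), replace=False)
    S[iu[drop], ju[drop]] = 0.0
    S[ju[drop], iu[drop]] = 0.0
    return S


eta = restraint_residuals([472.0, 236.0], D=472.0)      # [0.0, -0.5]
dr = residual_ratio([0.5, 0.0, 0.2], [0.4, 0.1, 0.2])  # approx. [0.2, 0.0]
sparse = sparsify(np.ones((6, 6)))                    # 8 of 15 pairs removed
```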

Comparison of simulated single cell chromosome structures with those from DNA-MERFISH imaging

Preprocessing of the DNA-MERFISH dataset⁶: please refer to the methods in Boninsegna et al.³².

Preprocessing of the Dip-C dataset²⁵: we collected both homologous chromosome copies from each of the 16 single cells. To match our model resolution, we generated 200-kb-resolution models by averaging the coordinates of loci that map to the same 200-kb bin.
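
The coarse-graining step can be sketched as follows, assuming each locus is assigned to bin floor(position / 200 kb) on its chromosome copy; bins that contain no locus are simply left out. `bin_coordinates` is a hypothetical helper.

```python
import numpy as np


def bin_coordinates(positions_bp, xyz, bin_size=200_000):
    """Average 3D coordinates of loci that fall into the same genomic bin.

    positions_bp: (n,) genomic positions of loci on one chromosome copy.
    xyz: (n, 3) coordinates of those loci.
    Returns (bin_ids, mean_xyz); bins without any locus are absent.
    """
    positions_bp = np.asarray(positions_bp)
    xyz = np.asarray(xyz, dtype=float)
    bins = positions_bp // bin_size
    bin_ids = np.unique(bins)
    mean_xyz = np.array([xyz[bins == b].mean(axis=0) for b in bin_ids])
    return bin_ids, mean_xyz


ids, mean_xyz = bin_coordinates([10_000, 150_000, 250_000],
                                [[0, 0, 0], [2, 0, 0], [5, 5, 5]])
print(ids, mean_xyz)  # bins 0 and 1
```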

Calculation and comparison of distance matrices: please refer to Methods in Boninsegna et al.³².

Robustness and convergence analysis

Replicates

Technical replicates are calculated from different random starting configurations. Resulting contact frequency maps and the average radial positions of all chromatin regions between replica populations are nearly identical (Supplementary Fig. 24). All observed structural features discussed in this paper are reproduced in the technical replicate population.

Population size

To assess convergence with respect to population size, we generated five populations with 50, 100, 1,000, 5,000, or 10,000 structures. Chromatin contact frequencies and structural features of each population were compared against results from a population of 10,000 structures. Already at a population size of 1,000 structures, much smaller than our target population, contact frequencies and average radial positions had converged, showing very high correlations with those from a 10,000-structure population (Supplementary Fig. 25).

Chromatin interaction networks and identification of spatial partitions

Building chromatin interaction networks

A chromatin interaction network (CIN) is calculated for each model and, separately, for the chromatin in each subcompartment (Supplementary Fig. 26): each vertex represents a 200-kb chromatin region, and an edge between two vertices i and j is drawn if the corresponding chromatin regions are in physical contact in the model, that is, if their spatial distance d_ij ≤ 2 × R_soft.
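
A minimal sketch of CIN construction for one structure, using NetworkX for the graph and SciPy's `cKDTree.query_pairs` to find all pairs within the contact distance (a neighbor search we chose for scalability, not necessarily the authors' implementation). The value R_soft = 236 nm follows from R_soft = 2 × R_ex with R_ex = 118 nm.

```python
import networkx as nx
import numpy as np
from scipy.spatial import cKDTree

R_SOFT = 236.0  # nm, 2 * R_ex with R_ex = 118 nm


def build_cin(xyz, region_ids, contact_dist=2 * R_SOFT):
    """Chromatin interaction network of one structure.

    xyz: (n, 3) coordinates (nm) of the selected 200-kb regions, e.g. all
         regions of one subcompartment in structure m.
    region_ids: n node labels, in the same order as xyz.
    An edge joins two regions whose center distance is <= contact_dist.
    """
    G = nx.Graph()
    G.add_nodes_from(region_ids)
    pairs = cKDTree(np.asarray(xyz, dtype=float)).query_pairs(r=contact_dist)
    G.add_edges_from((region_ids[i], region_ids[j]) for i, j in pairs)
    return G


cin = build_cin([[0, 0, 0], [400, 0, 0], [2000, 0, 0]], ["a", "b", "c"])
print(sorted(cin.edges))  # [('a', 'b')]
```

Isolated regions are kept as nodes, so later steps (clique counting, clustering) see every region of the subcompartment.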

Network properties

Maximal clique enrichment: A clique is a subset of nodes in a network in which every pair of nodes is connected by an edge. A maximal clique is a clique that cannot be enlarged by adding another node. The number of maximal cliques, c, is calculated using the graph_number_of_cliques function in the NetworkX Python package⁷¹. The maximal clique enrichment (MCE) of subcompartment s in structure m is calculated as:

$${{MCE}}_{s,m}=\,\frac{{c}_{s,m}}{\frac{1}{10}\mathop{\sum }\limits_{r=1}^{10}{c}_{r,m}}$$

where c_s,m is the number of maximal cliques for subcompartment s in structure m, and c_r,m is the number of maximal cliques of a CIN constructed from randomly shuffled subcompartment regions (randomization r = 1, …, 10) in the same structure m. High MCE values indicate the formation of a structural subcompartment with high connectivity between 200-kb regions of the same state.

Neighborhood connectivity: To calculate the neighborhood connectivity (NC) of a subcompartment CIN, we first calculate the average neighbor degree for each node using the average_neighbor_degree function in the NetworkX python package⁷¹. The overall neighborhood connectivity of the subcompartment s in the structure m is then calculated as:

$${{NC}}_{s,m}=\frac{1}{{N}_{s,m}}\mathop{\sum }\limits_{j=1}^{{N}_{s,m}}{\deg }_{j}\,$$

where N_s,m is the number of nodes in the CIN of the subcompartment s in the structure m, and deg_j is the average neighbor degree of node j.
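
Both network properties can be sketched with NetworkX. Here maximal cliques are counted with `nx.find_cliques`, because `graph_number_of_cliques` is no longer available in recent NetworkX releases. As a stand-in for the shuffled control, the sketch draws random region sets of the same size from a full-structure network; that control and the helper names are our assumptions.

```python
import networkx as nx
import numpy as np


def n_maximal_cliques(G):
    """Number of maximal cliques (isolated nodes count as size-1 cliques)."""
    return sum(1 for _ in nx.find_cliques(G))


def maximal_clique_enrichment(G_full, sub_nodes, n_random=10, seed=0):
    """MCE: maximal cliques of the subcompartment CIN divided by the mean
    count over CINs of random same-size region sets (assumed control).

    G_full: contact network over all regions of one structure (hypothetical).
    sub_nodes: regions of subcompartment s in that structure.
    """
    rng = np.random.default_rng(seed)
    nodes = list(G_full.nodes)
    c_s = n_maximal_cliques(G_full.subgraph(sub_nodes))
    c_r = [n_maximal_cliques(G_full.subgraph(
               rng.choice(nodes, size=len(sub_nodes), replace=False).tolist()))
           for _ in range(n_random)]
    return c_s / np.mean(c_r)


def neighborhood_connectivity(G):
    """NC: mean over nodes of the average neighbor degree."""
    avg_nbr = nx.average_neighbor_degree(G)
    return float(np.mean(list(avg_nbr.values())))


print(neighborhood_connectivity(nx.path_graph(3)))               # 5/3
print(maximal_clique_enrichment(nx.complete_graph(10), [0, 1, 2]))  # 1.0
```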

Identifying spatial partitions via Markov clustering

Spatial partitions of subcompartments are identified by applying the Markov Clustering Algorithm (MCL)⁷², a graph clustering algorithm that identifies highly connected subgraphs within a network. MCL clustering is performed for each subcompartment CIN in each structure using the mcl tool in the MCL-edge software⁷². Unless otherwise noted, the 25% smallest subgraphs (with fewer than 7 nodes, many of them singletons) are discarded from further analysis to focus on highly connected subgraphs. These highly connected subgraphs are referred to as ‘spatial partitions’ throughout the text.
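
For readers without the `mcl` binary, the core of MCL (alternating expansion and inflation of a column-stochastic flow matrix) can be sketched in NumPy. This is an illustrative re-implementation, not the MCL-edge software used here; the inflation value, convergence tolerance and attractor threshold are assumptions, and `spatial_partitions` applies the minimum-size filter described above.

```python
import networkx as nx
import numpy as np


def markov_clustering(G, inflation=2.0, max_iter=100, tol=1e-9):
    """Minimal MCL sketch: expansion (matrix square) and inflation (power)."""
    nodes = list(G.nodes)
    A = nx.to_numpy_array(G, nodelist=nodes) + np.eye(len(nodes))  # self-loops
    M = A / A.sum(axis=0, keepdims=True)        # column-stochastic flow matrix
    for _ in range(max_iter):
        prev = M
        M = M @ M                               # expansion
        M = M ** inflation                      # inflation
        M /= M.sum(axis=0, keepdims=True)
        if np.abs(M - prev).max() < tol:
            break
    clusters = set()
    for row in M:                               # attractor rows define clusters
        members = np.nonzero(row > 1e-6)[0]
        if members.size:
            clusters.add(frozenset(nodes[k] for k in members))
    return [set(c) for c in clusters]


def spatial_partitions(G, min_size=7, **kwargs):
    """Keep clusters with at least `min_size` regions; smaller ones are discarded."""
    return [c for c in markov_clustering(G, **kwargs) if len(c) >= min_size]


# Two 4-cliques joined by a single edge split into two clusters.
print(markov_clustering(nx.barbell_graph(4, 0)))
```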

In addition to subcompartment partitions, we also predict speckle and nucleoli partitions as follows:

Speckle partitions

Case 1: Predictions of speckle locations with knowledge of A1 subcompartment annotations.

Speckle locations are identified as the geometric center of A1 spatial partitions identified by Markov clustering of A1 CINs. In each structure, only A1 spatial partitions with sizes larger than three nodes (chromatin regions) are considered for downstream analysis.

Case 2: Predictions of speckle locations without knowledge of subcompartments.

We first identify chromatin expected to have high speckle association, namely regions with unusually low and stable interior radial positions. We select the 10% of chromatin regions with the lowest average radial positions (78.4% of these regions are part of the A1 subcompartment). We then generate CINs for the selected chromatin regions in each structure of the population. Approximate speckle locations are then identified as the geometric centers of the resulting spatial partitions identified by Markov clustering of the CINs. Only spatial partitions with more than three nodes (chromatin regions) are considered for downstream analysis.
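
The selection and center-finding steps of case 2 can be sketched as below, assuming the partitions of each structure are already available (for example from Markov clustering of the CIN of the selected regions). Both helper names are hypothetical.

```python
import numpy as np


def select_interior_regions(mean_radial, fraction=0.10):
    """Indices of the `fraction` of regions with the lowest mean radial position."""
    k = int(round(fraction * len(mean_radial)))
    return np.argsort(mean_radial, kind="stable")[:k]


def partition_centers(xyz, partitions):
    """Geometric centers of spatial partitions with more than three nodes.

    xyz: (n, 3) coordinates of all regions in one structure.
    partitions: iterables of region indices (e.g. MCL clusters of the CIN).
    """
    xyz = np.asarray(xyz, dtype=float)
    return np.array([xyz[list(p)].mean(axis=0)
                     for p in partitions if len(p) > 3]).reshape(-1, 3)


radial = np.array([0.9, 0.1, 0.5, 0.3, 0.2, 0.8, 0.7, 0.6, 0.4, 0.95])
print(select_interior_regions(radial, 0.2))   # regions 1 and 4
xyz = [[0, 0, 0], [2, 0, 0], [0, 2, 0], [2, 2, 0], [9, 9, 9], [9, 9, 9]]
print(partition_centers(xyz, [{0, 1, 2, 3}, {4, 5}]))  # one center at (1, 1, 0)
```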

Case 3: Predictions using locations of A2 partition centers.

For comparison, we also identify speckle locations as the geometric center of A2 spatial partitions identified by Markov clustering of A2 CINs similar to case 1. In each structure, only A2 spatial partitions with sizes larger than three nodes (chromatin regions) are considered for downstream analysis.

Nucleoli partitions

Following the same protocol as in case 2 for speckle partitions, we first identify chromatin expected to have high nucleolus association: previously reported nucleolus-associated domain (NAD)⁷³ regions and nucleolus organizing regions (NORs, on the short arms of chromosomes 13, 14, 15, 21, and 22). Using these regions, we generate CINs in each structure of the population. Approximate nucleolus locations are then identified as the centers of mass of the resulting spatial partitions identified by Markov clustering of the CINs. Only the top 25% largest spatial partitions are used as predicted nucleoli. For NORs, we use the first 25 restrained 200-kb regions closest in sequence to the NORs in these five chromosomes, as NORs have no Hi-C data and are not restrained during the modeling protocol.

Properties of partitions

Size of partitions: The size of a spatial partition is calculated as 0.2 × N Mb, where N is the number of nodes in the partition, each representing a 0.2-Mb region.

Fraction of inter-chromosomal edges (contacts): For each spatial partition, the inter-chromosomal edge fraction (ICEF) is calculated as:

$${ICEF}=\,\frac{{E}_{\mathrm{inter}}}{{E}_{\mathrm{intra}}+{E}_{\mathrm{inter}}}$$

where E_intra and E_inter are the numbers of intra- and inter-chromosomal edges in the partition, respectively.
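
Both partition properties can be sketched on the subgraph induced by a partition's nodes. Treating the two homologous copies as different chromosomes, and returning 0 for a partition without edges, are our assumptions.

```python
import networkx as nx


def partition_size_mb(n_nodes, bin_mb=0.2):
    """Partition size in Mb: 0.2 x N for N nodes of 200 kb each."""
    return bin_mb * n_nodes


def inter_chromosomal_edge_fraction(G, chrom_of):
    """ICEF = E_inter / (E_intra + E_inter) for one spatial partition.

    G: subgraph of the CIN induced by the partition's nodes.
    chrom_of: mapping node -> chromosome copy label (hypothetical input; the
    two homologous copies are treated as different chromosomes here).
    """
    e_inter = sum(1 for u, v in G.edges if chrom_of[u] != chrom_of[v])
    e_total = G.number_of_edges()  # E_intra + E_inter
    return e_inter / e_total if e_total else 0.0


part = nx.Graph([(0, 1), (1, 2), (2, 3)])
labels = {0: "chr1a", 1: "chr1a", 2: "chr2a", 3: "chr2a"}
print(partition_size_mb(part.number_of_nodes()), inter_chromosomal_edge_fraction(part, labels))
```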

Structural features

Unless otherwise noted, mean values of structural features for each genomic region I are calculated from 2 copies (i and i') and 10,000 structures (total 20,000 configurations) in the following structural feature calculations.

Mean radial position (RAD, no. 1)

Radial position of a chromatin region i in structure m is calculated as:

$${r}_{i,m}=\frac{{d}_{i,m}}{{R}_{\mathrm{nuc}}}$$

where d_i,m is the distance of region i to the nuclear center, and R_nuc is the nuclear radius (5 μm). r_i,m = 0 means that region i is at the nuclear center, while r_i,m = 1 means that it is located at the nuclear surface.
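
A short sketch of r_i,m and the population mean RAD, assuming coordinates in nm with the nuclear center at the origin and a population array of shape (M, 2N, 3) in which the two copies of region I sit at rows I and I + N (a layout we chose for illustration).

```python
import numpy as np


def radial_positions(xyz, r_nuc=5000.0):
    """r_{i,m} = d_{i,m} / R_nuc with the nuclear center at the origin (nm)."""
    return np.linalg.norm(np.asarray(xyz, dtype=float), axis=-1) / r_nuc


def mean_radial_position(coords, r_nuc=5000.0):
    """Mean RAD per region over both copies and all M structures.

    coords: (M, 2N, 3); copies of region I are rows I and I + N (assumed).
    """
    M, two_n, _ = coords.shape
    r = radial_positions(coords, r_nuc).reshape(M, 2, two_n // 2)
    return r.mean(axis=(0, 1))


print(radial_positions([[0, 0, 0], [3000, 4000, 0]]))  # [0. 1.]
```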

Local chromatin fiber decompaction (RG, no. 2)

The local compaction of the chromatin fiber at the location of a given locus is estimated by the radius of gyration (RG) of a 1-Mb region centered at the locus (that is, comprising 500 kb upstream and 500 kb downstream of the given locus). To estimate RG values along an entire chromosome, we use a sliding-window approach over all chromatin regions in the chromosome.

The RG for a 1 Mb region centered at locus i in structure m is calculated as:

$${{RG}}_{i,m}=\sqrt{\frac{1}{N}\mathop{\sum }\limits_{j=1}^{N}{{d}_{j,m}}^{2}}$$

where N is the number of chromatin regions in the 1-Mb window, and d_j,m is the distance from chromatin region j to the center of mass of the 1-Mb region in structure m.
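
A sliding-window sketch using the standard radius-of-gyration definition, RG = sqrt((1/N) Σ d_j²). At 200-kb resolution, a 1-Mb window is the locus plus two regions on each side; truncating windows at chromosome ends is our assumption.

```python
import numpy as np


def radius_of_gyration(xyz):
    """Root-mean-square distance of the points to their center of mass."""
    xyz = np.asarray(xyz, dtype=float)
    d2 = np.sum((xyz - xyz.mean(axis=0)) ** 2, axis=1)
    return float(np.sqrt(d2.mean()))


def sliding_rg(chain_xyz, half_window=2):
    """RG of a 1-Mb window (locus +/- 2 regions at 200 kb) along one chromosome
    copy; windows are truncated at the chromosome ends (assumption)."""
    chain_xyz = np.asarray(chain_xyz, dtype=float)
    n = len(chain_xyz)
    return np.array([radius_of_gyration(chain_xyz[max(0, i - half_window):i + half_window + 1])
                     for i in range(n)])


# Hypothetical straight chain with unit spacing along x.
chain = np.column_stack([np.arange(10.0), np.zeros(10), np.zeros(10)])
print(sliding_rg(chain)[4])  # sqrt(2) for a full 5-region window
```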

Mean gene–speckle and gene–nucleolus distances (SpD and NuD, nos. 3 and 4)

For each 200-kb region, the closest speckle partition (or nucleolus partition) in each single structure is identified and the center-to-center distance is calculated (from the center of the region to the geometric center of the partition). The distances across the population are then averaged for each region to calculate mean speckle (or nucleolus) distances.
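
A sketch of the mean speckle distance for one region copy; the same function applies to nucleoli. Skipping structures with no predicted speckle is our assumption, and both homologous copies can be pooled by passing them as additional entries.

```python
import numpy as np


def mean_speckle_distance(region_xyz, speckle_centers):
    """Mean over structures of the distance to the closest speckle center.

    region_xyz: list over structures of (3,) region coordinates.
    speckle_centers: list over structures of (L_m, 3) predicted centers;
    structures without any predicted speckle are skipped (assumption).
    """
    dists = []
    for x, centers in zip(region_xyz, speckle_centers):
        centers = np.asarray(centers, dtype=float).reshape(-1, 3)
        if len(centers):
            dists.append(np.linalg.norm(centers - np.asarray(x, dtype=float), axis=1).min())
    return float(np.mean(dists))


spd = mean_speckle_distance([[0, 0, 0], [0, 0, 0], [0, 0, 0]],
                            [[[3, 4, 0], [10, 0, 0]], [[0, 0, 1]], []])
print(spd)  # mean of 5 and 1
```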

Cell-to-cell variability of features (δ_RAD, δ_RG, δ_SpD, and δ_NuD, nos. 5–8)

Cell-to-cell variability of any structural feature F (${\delta }_{I}^{\mathrm{RAD}}$ for radial positions, ${\delta }_{I}^{\mathrm{SpD}}$ speckle distances, ${\delta }_{I}^{\mathrm{NuD}}$ nucleoli distances, and ${\delta }_{I}^{\mathrm{RG}}$ local decompaction) for a chromatin region I is calculated as:

$${\delta }_{I}^{F}={\log }_{2}\,\frac{{\sigma }_{I}^{F}}{\bar{{\sigma }^{F}}}$$

where ${\sigma }_{I}^{F}$ is the s.d. of the values of structural feature F calculated from both homologous copies i and i′ of region I across all 10,000 genome structures in the population, and $\overline{{\sigma }^{F}}$ is the mean s.d. of the feature calculated from all regions on the same chromosome as region I. Positive values (${\delta }_{I}^{F} > 0$) indicate high cell-to-cell variability of the feature (for example, radial position); negative values (${\delta }_{I}^{F} < 0$) indicate low variability.
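
A vectorized sketch of δ_I^F, assuming the feature values for each region are pooled over both copies and all structures into one row, and using the population s.d. (NumPy's default `ddof=0`), which is our choice.

```python
import numpy as np


def cell_to_cell_variability(values, chrom_of_region):
    """delta_I^F = log2(sigma_I^F / mean sigma^F over regions of the same chromosome).

    values: (n_regions, n_samples) feature values; each row pools both
            homologous copies across all structures.
    chrom_of_region: (n_regions,) chromosome label of each region.
    """
    values = np.asarray(values, dtype=float)
    chrom_of_region = np.asarray(chrom_of_region)
    sigma = values.std(axis=1)
    delta = np.empty_like(sigma)
    for c in np.unique(chrom_of_region):
        sel = chrom_of_region == c
        delta[sel] = np.log2(sigma[sel] / sigma[sel].mean())
    return delta


delta = cell_to_cell_variability([[-1, 1], [-3, 3], [-5, 5]], ["a", "a", "b"])
print(delta)  # negative = low variability (LV), positive = high variability (HV)
```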

Regions in the A compartment with positive and negative ${\delta }_{I}^{{RAD}}$ are called A-HV (high variability) and A-LV (low variability), respectively. Likewise, regions in the B compartment with positive and negative ${\delta }_{I}^{{RAD}}$ are called B-HV and B-LV, respectively. The number of 200-kb regions in each group is 3,164, 2,731, 3,839, and 3,918 for A-LV, A-HV, B-LV, and B-HV, respectively.

Interior localization frequency (ILF, no. 9)

For a given 200-kb region, the interior localization frequency (ILF) is calculated as:

$${{ILF}}_{I}=\,\frac{{n}_{r_I < 0.5}}{M}$$

where ${n}_{r_I < 0.5}$ is the number of structures in which either copy of region I has a radial position lower than 0.5, and M is the total number of structures (10,000 in our population).
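
The ILF counts structures, not copies: a structure contributes once if either copy is interior. A minimal sketch, assuming the radial positions of the two copies are given as an (M, 2) array:

```python
import numpy as np


def interior_localization_frequency(r, cutoff=0.5):
    """ILF_I = (# structures in which either copy has r < cutoff) / M.

    r: (M, 2) radial positions of copies i and i' of region I in M structures.
    """
    r = np.asarray(r, dtype=float)
    return float(np.mean(np.any(r < cutoff, axis=1)))


ilf = interior_localization_frequency([[0.3, 0.9], [0.6, 0.7], [0.8, 0.2], [0.55, 0.6]])
print(ilf)  # 2 of 4 structures
```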

Nuclear-body association frequencies (SAF, LAF, and NAF, nos. 10–12)

For a given 200-kb region, the association frequency to nuclear bodies (SAF, LAF, and NAF for speckle, lamina, and nucleoli association frequencies, respectively) is calculated as:

$${{SAF} (\text{or } LAF \text{ or } NAF)}_{I}=\,\frac{{n}_{{d}_{i} < {d}_{t}}+{n}_{{d}_{{i}^{{\prime} }} < {d}_{t}}}{2M}$$

where M is the number of structures in the population (two homologous copies of each chromosome are present per structure), and ${n}_{{d}_{i} < {d}_{t}}$ and ${n}_{{d}_{{i}^{{\prime} }} < {d}_{t}}$ are the numbers of structures in which region i and its homologous copy i′, respectively, have a distance to the nuclear body of interest (NB) smaller than the association threshold d_t. The thresholds d_t are set to 500 nm, 0.35 × R_nuc, and 1,000 nm for SAF, LAF, and NAF, respectively; among the different distance thresholds we tested, these resulted in the best correlations with experimental data. For SAF and NAF calculations, we use the predicted speckle and nucleolus partitions to calculate distances (see ‘Identifying spatial partitions via Markov clustering’). For LAF, we use the direct distances of regions to the nuclear envelope. For all association frequency calculations, distances are measured from the surface of the region to the center of mass of the partition or to the surface of the nuclear envelope.
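
A sketch of the association frequencies with the thresholds stated above. The surface-distance helpers follow our reading of the last sentence (subtracting R_ex = 118 nm from center distances) and are assumptions.

```python
import numpy as np

R_NUC = 5000.0  # nm
R_EX = 118.0    # nm
# Association thresholds d_t from the text (nm).
D_T = {"SAF": 500.0, "LAF": 0.35 * R_NUC, "NAF": 1000.0}


def surface_to_point(center_dist, r_ex=R_EX):
    """Region surface to a partition center: center distance minus R_ex."""
    return np.asarray(center_dist, dtype=float) - r_ex


def surface_to_envelope(radial_pos, r_nuc=R_NUC, r_ex=R_EX):
    """Region surface to the nuclear envelope: R_nuc - (r * R_nuc + R_ex)."""
    return r_nuc - (np.asarray(radial_pos, dtype=float) * r_nuc + r_ex)


def association_frequency(d_i, d_i_prime, d_t):
    """(n_{d_i < d_t} + n_{d_i' < d_t}) / (2M) for one region I."""
    d_i = np.asarray(d_i, dtype=float)
    d_i_prime = np.asarray(d_i_prime, dtype=float)
    return float((np.sum(d_i < d_t) + np.sum(d_i_prime < d_t)) / (2 * len(d_i)))


# Hypothetical LAF for one region in M = 2 structures.
laf = association_frequency(surface_to_envelope([0.9, 0.5]),
                            surface_to_envelope([0.95, 0.2]), D_T["LAF"])
print(laf)
```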

TSA-seq (S-TSA, L-TSA, N-TSA, nos. 13–15)

To predict TSA-seq signals for speckle, nucleoli, and lamina from our models, we use the following equation:

$${{sig}}_{i}=\,\frac{1}{M}\mathop{\sum }\limits_{m=1}^{M}\mathop{\sum }\limits_{l=1}^{L}{e}^{-{R}_{0}\|{d}_{{il}}\|}$$

where M is the number of models, L is the number of predicted nuclear body locations (for example, speckles) in structure m, d_il is the distance between region i and the predicted nuclear body location l, and R₀ is the decay constant estimated in the TSA-seq experiments¹³, which is set to 4 in our calculations. The normalized TSA-seq signal for region i then becomes:

$${\mathrm{predicted}\;\mathrm{TSA}{\hbox{-}}\mathrm{seq}\;\mathrm{signal}}_{i}=\log \left(\frac{{{sig}}_{i}}{\overline{{sig}}}\right)$$

where $\overline{{sig}}$ is the mean signal calculated from all regions in the genome. The predicted signal is then averaged over two copies for each region. The predicted speckle and nucleoli partitions are used for distance calculations (see ‘Identifying spatial partitions via Markov clustering’). For lamina TSA-seq, we use direct distances of each 200-kb chromatin region to the nuclear surface in each structure, which is calculated as (1 − r_i,m) × R_nuc, where r_i,m is the radial position of the 200-kb region i in structure m and R_nuc is the nucleus radius, which is set to 5 μm.
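
A sketch of the predicted TSA-seq signal. The text does not state units or the log base here, so distances in μm (R₀ = 4 μm⁻¹) and the natural logarithm are assumptions; `lamina_distance_um` implements the (1 − r_i,m) × R_nuc distance used for lamina TSA-seq.

```python
import numpy as np


def tsa_raw_signal(dists_per_structure, r0=4.0):
    """sig_i = (1/M) sum_m sum_l exp(-R0 * d_il) for one region copy.

    dists_per_structure: list of length M; entry m holds the distances from
    the region to every predicted nuclear body l in structure m (assumed um).
    """
    M = len(dists_per_structure)
    return sum(np.exp(-r0 * np.asarray(d, dtype=float)).sum()
               for d in dists_per_structure) / M


def normalized_tsa(raw_signals):
    """log(sig_i / mean sig) over all regions (natural log assumed)."""
    raw = np.asarray(raw_signals, dtype=float)
    return np.log(raw / raw.mean())


def lamina_distance_um(radial_pos, r_nuc_um=5.0):
    """Distance to the nuclear surface, (1 - r) * R_nuc, for lamina TSA-seq."""
    return (1.0 - np.asarray(radial_pos, dtype=float)) * r_nuc_um


sig = tsa_raw_signal([[0.0], [0.0, 0.0]])   # (1 + 2) / 2
print(sig, normalized_tsa([1.0, 3.0]))
```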

Mean inter-chromosomal neighborhood probability (ICP, no. 16)

For each target chromatin region i, we define its neighborhood as the set of other regions j whose center-to-center distance to the target region is smaller than 500 nm: Ne_i = {j: j ≠ i, $d_{ij}$ < 500 nm}. The inter-chromosomal neighborhood probability (ICP) is then calculated as:

$${{ICP}}_{I}=\frac{1}{2M\,}\mathop{\sum }\limits_{m=1}^{M}\mathop{\sum }\limits_{i=1}^{2}\frac{{n}_{{inter}}(m,i)}{{n}_{{inter}}\left(m,i\right)+{n}_{{intra}}(m,i)}$$

where M is the number of structures, and n_intra(m,i) and n_inter(m,i) are the numbers of intra- and inter-chromosomal regions in the set Ne_i of copy i of region I in structure m.
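
The per-structure fraction inside the ICP sum can be sketched with a SciPy k-d tree; ICP_I is then the average over both copies and all structures. Note that `query_ball_point` includes points at exactly 500 nm (the text uses a strict inequality), and both treating homologous copies as different chromosomes and returning NaN for empty neighborhoods are our assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree


def inter_chromosomal_fraction(xyz, chrom, radius=500.0):
    """n_inter / (n_inter + n_intra) for every region in one structure.

    xyz: (n, 3) coordinates (nm); chrom: (n,) chromosome-copy label per region.
    Regions with an empty neighborhood get NaN (assumption).
    """
    xyz = np.asarray(xyz, dtype=float)
    chrom = np.asarray(chrom)
    out = np.full(len(xyz), np.nan)
    for i, nbrs in enumerate(cKDTree(xyz).query_ball_point(xyz, r=radius)):
        nbrs = [j for j in nbrs if j != i]
        if nbrs:
            out[i] = np.mean(chrom[nbrs] != chrom[i])
    return out


frac = inter_chromosomal_fraction([[0, 0, 0], [100, 0, 0], [0, 100, 0], [5000, 0, 0]],
                                  ["1a", "1a", "2a", "3a"])
print(frac)  # ICP_I = nanmean of such values over copies and structures
```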

Median trans A/B ratio (no. 17)

For each chromatin region i, we define its trans neighborhood as the set of regions j on other chromosomes whose center-to-center distance to region i is smaller than 500 nm: ${{Ne}}_{i}^{t}=\{j:\,{{chrom}}_{i}\ne {{chrom}}_{j},\,{d}_{{ij}} < 500\,\mathrm{nm}\}$. The trans A/B ratio is then calculated as:

$${Trans}\,\mathrm{A/B}\,{\mathrm{ratio}}_{i}=\frac{{n}_{A}^{t}}{{n}_{B}^{t}}$$

where ${n}_{A}^{t}$ and ${n}_{B}^{t}$ are the numbers of trans A and B regions in the set ${{Ne}}_{i}^{t}$ of region i. The median trans A/B ratio of a region is then calculated from the trans A/B ratios of both homologous copies of the region across all structures of the population, and the values are rescaled to the range between 0 and 1.
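
A sketch of the per-structure trans A/B ratio and a final rescaling. The text does not say how regions without trans B neighbors are handled or which rescaling is used, so NaN for that case and min-max scaling are our assumptions; the per-region median over copies and structures could be taken with `np.nanmedian` before rescaling.

```python
import numpy as np
from scipy.spatial import cKDTree


def trans_ab_ratios(xyz, chrom, compartment, radius=500.0):
    """n_A^t / n_B^t over neighbors on other chromosomes within `radius` (nm).

    compartment: (n,) labels 'A' or 'B'. Regions without trans B neighbors
    get NaN (assumption; the text does not cover this case).
    """
    xyz = np.asarray(xyz, dtype=float)
    chrom = np.asarray(chrom)
    compartment = np.asarray(compartment)
    out = np.full(len(xyz), np.nan)
    for i, nbrs in enumerate(cKDTree(xyz).query_ball_point(xyz, r=radius)):
        trans = [j for j in nbrs if chrom[j] != chrom[i]]
        n_a = int(sum(compartment[j] == "A" for j in trans))
        n_b = len(trans) - n_a
        if n_b:
            out[i] = n_a / n_b
    return out


def rescale_01(x):
    """Min-max rescaling to [0, 1], ignoring NaNs (one possible rescaling)."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    return (x - lo) / (hi - lo)


ratios = trans_ab_ratios([[0, 0, 0], [100, 0, 0], [0, 100, 0], [100, 100, 0]],
                         ["1", "2", "2", "3"], ["A", "A", "B", "B"])
print(ratios, rescale_01(ratios))
```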

Data analysis

The analyses and most of the figure panels were generated using custom Python scripts (matplotlib v3.4 (ref. ⁷⁴), scikit-learn v1.0 (ref. ⁷⁵), SciPy v1.5 (ref. ⁷⁶), and NetworkX v2.3 (ref. ⁷¹)) together with the publicly available alabtools platform (https://github.com/alberlab/alabtools). The remaining panels and the final figures were assembled using Adobe Illustrator. Correlations between input and output contact matrices were calculated using HiCRep⁵⁷ (https://github.com/TaoYang-dev/hicrep). Spatial partitions were identified using the MCL algorithm⁷² (https://micans.org/mcl/). Chromatin interaction networks were visualized with Cytoscape⁷⁷. Images of 3D genome structures were generated using UCSF Chimera v1.13 (ref. ⁷⁸). For all other analyses, please refer to the Supplementary Information.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The genome structure population and genome-wide structural features are available at https://doi.org/10.5281/zenodo.7352276. The accession codes for the experimental data used in our analyses are as follows: GEO: GSE63525 (Hi-C), GSE63525 (subcompartments), GSE81553 (SON TSA-seq), GSE81553 (lamin-B1 TSA-seq), GSE56465 (single cell lamina DamID), GSM1480326 (GRO-seq), GSE135882 (GPSeq), GSM923451 (Repli-seq), GSM3596321 (scRNA-seq); 4DN: 4DNFIGL8MCSJ (lamin-B1 pA-DamID), 4DNFILYQ1PAY (compartments); ENCODE: ENCFF313LYI, ENCFF171MDW, ENCFF776DPQ, ENCFF309OEW, ENCFF028KBY, ENCFF601YET, ENCFF831ZHL, ENCFF039HDL, ENCFF340JIF, ENCFF803DJF, ENCFF683HCZ (ChIP–seq, histone modifications), https://zenodo.org/record/3928890 (DNA-MERFISH imaging). The complete list of the datasets used in this study and their accession numbers is also tabulated in Supplementary Table 2.

Code availability

The software used to generate the genome structure population and the accompanying documentation are available at https://github.com/alberlab/igm.

References

Misteli, T. The self-organizing genome: principles of genome architecture and function. Cell 183, 28–45 (2020).
Article CAS PubMed PubMed Central Google Scholar
Chakraborty, A. & Ay, F. The role of 3D genome organization in disease: from compartments to single nucleotides. Semin. Cell Dev. Biol. 90, 104–113 (2019).
Article CAS PubMed Google Scholar
Bintu, B. et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science 362, eaau1783 (2018).
Article PubMed PubMed Central Google Scholar
Nguyen, H. Q. et al. 3D mapping and accelerated super-resolution imaging of the human genome using in situ sequencing. Nat. Methods 17, 822–832 (2020).
Payne, A. C. et al. In situ genome sequencing resolves DNA sequence and structure in intact biological samples. Science 371, eaay3446 (2021).
Su, J. H., Zheng, P., Kinrot, S. S., Bintu, B. & Zhuang, X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Cell 182, 1641–1659 (2020).
Takei, Y. et al. Integrated spatial genomics reveals global architecture of single nuclei. Nature 590, 344–350 (2021).
Takei, Y. et al. Single-cell nuclear architecture across cell types in the mouse brain. Science 374, 586–594 (2021).
Viana, M. P. et al. Integrated intracellular organization and its variations in human iPS cells. Nature 613, 345–354 (2023).
Wang, S. et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science 353, 598–602 (2016).
Beagrie, R. A. et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519–524 (2017).
Belaghzal, H. et al. Liquid chromatin Hi-C characterizes compartment-dependent chromatin interaction dynamics. Nat. Genet. 53, 367–378 (2021).
Chen, Y. et al. Mapping 3D genome organization relative to nuclear compartments using TSA-Seq as a cytological ruler. J. Cell Biol. 217, 4025–4048 (2018).
Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
Fang, R. et al. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 26, 1345–1348 (2016).
Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951 (2008).
Hsieh, T. H. et al. Mapping nucleosome resolution chromosome folding in yeast by micro-C. Cell 162, 108–119 (2015).
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
Li, X. et al. Long-read ChIA-PET for base-pair-resolution mapping of haplotype-specific chromatin interactions. Nat. Protoc. 12, 899–915 (2017).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
Quinodoz, S. A. et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell 174, 744–757 (2018).
Ramani, V. et al. Massively multiplex single-cell Hi-C. Nat. Methods 14, 263–266 (2017).
Tan, L., Xing, D., Chang, C. H., Li, H. & Xie, X. S. Three-dimensional genome structures of single diploid human cells. Science 361, 924–928 (2018).
Zheng, M. et al. Multiplex chromatin interactions with single-molecule precision. Nature 566, 558–562 (2019).
Bickmore, W. A. & van Steensel, B. Genome architecture: domain organization of interphase chromosomes. Cell 152, 1270–1284 (2013).
Dekker, J. et al. The 4D nucleome project. Nature 549, 219–226 (2017).
Kempfer, R. & Pombo, A. Methods for mapping 3D chromosome architecture. Nat. Rev. Genet. 21, 207–226 (2020).
Mirny, L. A., Imakaev, M. & Abdennur, N. Two major mechanisms of chromosome organization. Curr. Opin. Cell Biol. 58, 142–152 (2019).
Vertii, A. et al. Two contrasting classes of nucleolus-associated domains in mouse fibroblast heterochromatin. Genome Res. 29, 1235–1249 (2019).
Boninsegna, L. et al. Integrative genome modeling platform reveals essentiality of rare contact events in 3D genome organizations. Nat. Methods 19, 938–949 (2022).
Hua, N. et al. Producing genome structure populations with the dynamic and automated PGS software. Nat. Protoc. 13, 915–926 (2018).
Tjong, H. et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl Acad. Sci. USA 113, E1663–E1672 (2016).
van Schaik, T., Vos, M., Peric-Hupkes, D., Celie, P. H. N. & van Steensel, B. Cell cycle dynamics of lamina-associated DNA. EMBO Rep. 21, e50636 (2020).
Girelli, G. et al. GPSeq reveals the radial organization of chromatin in the cell nucleus. Nat. Biotechnol. 38, 1184–1193 (2020).
Osorio, D., Yu, X., Yu, P., Serpedin, E. & Cai, J. J. Single-cell RNA sequencing of a European and an African lymphoblastoid cell line. Sci. Data 6, 112 (2019).
Finn, E. H. & Misteli, T. Molecular basis and biological function of variability in spatial genome organization. Science 365, eaaw9498 (2019).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Barbieri, M. et al. Complexity of chromatin folding is captured by the strings and binders switch model. Proc. Natl Acad. Sci. USA 109, 16173–16178 (2012).
Tjong, H., Gong, K., Chen, L. & Alber, F. Physical tethering and volume exclusion determine higher-order genome organization in budding yeast. Genome Res. 22, 1295–1305 (2012).
Bau, D. et al. The three-dimensional folding of the alpha-globin gene domain reveals formation of chromatin globules. Nat. Struct. Mol. Biol. 18, 107–114 (2011).
Chiariello, A. M., Annunziatella, C., Bianco, S., Esposito, A. & Nicodemi, M. Polymer physics of chromosome large-scale 3D organisation. Sci. Rep. 6, 29775 (2016).
Di Pierro, M., Zhang, B., Aiden, E. L., Wolynes, P. G. & Onuchic, J. N. Transferable model for chromosome architecture. Proc. Natl Acad. Sci. USA 113, 12168–12173 (2016).
Di Stefano, M., Paulsen, J., Lien, T. G., Hovig, E. & Micheletti, C. Hi-C-constrained physical models of human chromosomes recover functionally-related properties of genome organization. Sci. Rep. 6, 35985 (2016).
Esposito, A. et al. Polymer physics reveals a combinatorial code linking 3D chromatin architecture to 1D chromatin states. Cell Rep. 38, 110601 (2022).
Le, T. B., Imakaev, M. V., Mirny, L. A. & Laub, M. T. High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342, 731–734 (2013).
Li, Q. et al. The three-dimensional genome organization of Drosophila melanogaster through data integration. Genome Biol. 18, 145 (2017).
Lin, X., Qi, Y., Latham, A. P. & Zhang, B. Multiscale modeling of genome organization with maximum entropy optimization. J. Chem. Phys. 155, 010901 (2021).
Paulsen, J. et al. Chrom3D: three-dimensional genome modeling from Hi-C and nuclear lamin-genome contacts. Genome Biol. 18, 21 (2017).
Qi, Y. et al. Data-driven polymer model for mechanistic exploration of diploid genome organization. Biophys. J. 119, 1905–1916 (2020).
Serra, F. et al. Automatic analysis and 3D-modelling of Hi-C data using TADbit reveals structural features of the fly chromatin colors. PLoS Comput. Biol. 13, e1005665 (2017).
Umbarger, M. A. et al. The three-dimensional architecture of a bacterial genome and its alteration by genetic perturbation. Mol. Cell 44, 252–264 (2011).
Wong, H. et al. A predictive computational model of the dynamic 3D interphase yeast nucleus. Curr. Biol. 22, 1881–1890 (2012).
Yildirim, A. & Feig, M. High-resolution 3D models of Caulobacter crescentus chromosome reveal genome structural variability and organization. Nucleic Acids Res. 46, 3937–3952 (2018).
Zhang, B. & Wolynes, P. G. Topology, structures, and energy landscapes of human chromosomes. Proc. Natl Acad. Sci. USA 112, 6062–6067 (2015).
Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014).
Bickmore, W. A. The spatial organization of the human genome. Annu. Rev. Genomics Hum. Genet. 14, 67–84 (2013).
Takizawa, T., Meaburn, K. J. & Misteli, T. The meaning of gene positioning. Cell 135, 9–13 (2008).
Kind, J. et al. Genome-wide maps of nuclear lamina interactions in single human cells. Cell 163, 134–147 (2015).
Hildebrand, E. M. & Dekker, J. Mechanisms and functions of chromosome compartmentalization. Trends Biochem. Sci. 45, 385–396 (2020).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Carter, K. C., Taneja, K. L. & Lawrence, J. B. Discrete nuclear domains of poly(A) RNA and their relationship to the functional organization of the nucleus. J. Cell Biol. 115, 1191–1202 (1991).
Xiong, K. & Ma, J. Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions. Nat. Commun. 10, 5069 (2019).
Ashoor, H. et al. Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data. Nat. Commun. 11, 1173 (2020).
Wang, Y. et al. SPIN reveals genome-wide landscape of nuclear compartmentalization. Genome Biol. 22, 36 (2021).
Ding, F. & Elowitz, M. B. Constitutive splicing and economies of scale in gene expression. Nat. Struct. Mol. Biol. 26, 424–432 (2019).
Khanna, N., Hu, Y. & Belmont, A. S. HSP70 transgene directed motion to nuclear speckles facilitates heat shock activation. Curr. Biol. 24, 1138–1144 (2014).
Kim, J., Venkata, N. C., Hernandez Gonzalez, G. A., Khanna, N. & Belmont, A. S. Gene expression amplification by nuclear speckle association. J. Cell Biol. 219, e201904046 (2020).
Hagberg, A. A., Schult, D. A. & Swart, P. J. in 7th Python in Science Conference (SciPy2008) (eds Varoquaux, G., Vaught, T. & Millman, J.) 11–15 (2008).
Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
Nemeth, A. et al. Initial genomics of the human nucleolus. PLoS Genet. 6, e1000889 (2010).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Shin, H. et al. TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Res. 44, e70 (2016).

Acknowledgements

This work was supported by the National Institutes of Health (grants U54DK107981 and UM1HG011593 to F.A.) as part of the 4D Nucleome Initiative, and an NSF CAREER grant (1150287 to F.A.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
Asli Yildirim, Nan Hua, Lorenzo Boninsegna, Yuxiang Zhan, Guido Polles, Ke Gong, Shengli Hao & Frank Alber
Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, USA
Asli Yildirim, Nan Hua, Lorenzo Boninsegna, Yuxiang Zhan, Guido Polles, Ke Gong, Shengli Hao & Frank Alber
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Yuxiang Zhan & Frank Alber
Department of Pathology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
Wenyuan Li & Xianghong Jasmine Zhou

Authors

Asli Yildirim
Nan Hua
Lorenzo Boninsegna
Yuxiang Zhan
Guido Polles
Ke Gong
Shengli Hao
Wenyuan Li
Xianghong Jasmine Zhou
Frank Alber

Contributions

A.Y., N.H., and F.A. designed the research. A.Y., N.H., and L.B. performed all calculations and data analysis. A.Y., N.H., and F.A. interpreted the results with input from X.J.Z., G.P., K.G., and S.H. Y.Z. and W.L. helped with data interpretation. A.Y., N.H., and F.A. wrote the manuscript with input from X.J.Z. All authors agreed on the final manuscript.

Corresponding author

Correspondence to Frank Alber.

Ethics declarations

Competing interests

X.J.Z. is a co-founder and co-CEO of EarlyDiagnostics. F.A. is a shareholder of EarlyDiagnostics. A.Y. is an employee of Illumina. All other authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Raphael Mourad and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Carolina Perdigoto and Dimitris Typas, in collaboration with the Nature Structural & Molecular Biology team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 3D chromatin structure modeling and assessment.

a, The contact probability matrix calculated from Hi-C (left) and the structure population (right) for chromosome 2. Zoomed-in heatmaps show the matrix for 40–80 Mb. b, Density scatter plot of contact probabilities from Hi-C data and the structure population (Pearson correlation: r = 0.98, p ≈ 0). c, Histograms of restraint residuals from the structure population (Methods). A restraint with a residual below 1.05 is considered satisfied and is not displayed in the histograms (99.9% of restraints fall into this category). d, The contact probability matrix for chromosome 2, showing the randomly chosen 50% of the data used as input (lower triangle) versus the matrix generated from the structure population (upper triangle). e, Density plot of Hi-C contact probabilities missing from the input and their predictions from the structure population (Pearson correlation: r = 0.93, p ≈ 0). f, Average radial positions of chromatin in different replication phases⁵⁸. Numbers of regions in each boxplot are (from left to right): 1,714, 1,921, 2,166, 2,246, 2,068, 2,064. g, Comparison of inter-chromosomal loci co-localization frequencies observed in FISH experiments¹⁸ and in the structure population, shown as bar plots (left) and a scatter plot (right). h, A FISH image with three probes at widely separated loci on chromosome 6 (left), comparison of the pairwise distances of these loci in experiment and models (middle), and their relative radial positions in experiment and models (right). Numbers of regions in each boxplot for models: 10,000, for experiment: 892. In f and h, the box and the middle line in the box show the interquartile range (IQR = Q3 – Q1) and the median. The vertical lines outside the box (and whiskers in h) extend to a maximum of 1.5 × IQR beyond the box. Q1 and Q3 are the lower and upper quartiles of the distribution. Outliers are shown as dots.
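Panels b and e amount to correlating two contact probability matrices over their unique (upper-triangular) entries, in panel e restricted to the pairs that were withheld from the modeling input. The following is a minimal sketch assuming both matrices are dense NumPy arrays at the same resolution; the diagonal offset and the boolean-mask scheme are illustrative choices, not necessarily those of the study. The 1.05 residual cutoff comes from the caption, but the residual definition is in the Methods and is not reproduced, so `fraction_satisfied` only applies the threshold.

```python
import numpy as np
from scipy.stats import pearsonr


def upper_entries(matrix, offset=1):
    """Unique entries of a symmetric matrix above the given diagonal offset."""
    matrix = np.asarray(matrix)
    rows, cols = np.triu_indices(matrix.shape[0], k=offset)
    return matrix[rows, cols]


def contact_map_pearson(hic, model, mask=None, offset=1):
    """Pearson r between experimental and modeled contact probabilities.

    If `mask` (boolean, same shape) is given, only masked pairs are compared,
    e.g. the Hi-C contacts that were withheld from the modeling input.
    """
    x, y = upper_entries(hic, offset), upper_entries(model, offset)
    if mask is not None:
        keep = upper_entries(mask, offset).astype(bool)
        x, y = x[keep], y[keep]
    r, p = pearsonr(x, y)
    return float(r), float(p)


def fraction_satisfied(residuals, cutoff=1.05):
    """Fraction of restraints whose residual falls below the satisfaction cutoff."""
    return float(np.mean(np.asarray(residuals) < cutoff))


# Usage with synthetic data: evaluate on a random half of the pairs.
rng = np.random.default_rng(0)
hic = rng.random((50, 50))
hic = (hic + hic.T) / 2
model = hic + rng.normal(scale=0.01, size=hic.shape)
held_out = rng.random(hic.shape) < 0.5
print(contact_map_pearson(hic, model, mask=held_out))
print(fraction_satisfied([0.9, 1.0, 1.2, 1.04]))  # → 0.75
```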

Extended Data Fig. 2 Assessment of single cell chromosome structures.

a, Comparison of selected structures of chromosome 6 modeled in our simulations and imaged in DNA-MERFISH experiments⁶. 3D chromosome structures are shown to the right of the corresponding normalized distance matrices for structures from DNA-MERFISH experiments (top row) and structure modeling (bottom row). Modeled structures at 200-kb resolution have higher genomic coverage than those from experiment. To allow direct comparison, genomic regions imaged in DNA-MERFISH are highlighted as opaque ball-and-stick representations, while all other regions are shown as translucent. b, Comparison of distance matrices for single-cell structures of chromosome 6 from DNA-MERFISH experiments (top) and structure models (bottom); the models reproduce the folding patterns of different chromosome conformations observed in experiment. c, Comparison of distance matrices for single-cell structures of chromosome 2 from DNA-MERFISH experiments (top) and structure models (bottom). d, Comparison of chromosome structures between models from Dip-C experiments²⁵ (top row) and our structure population (bottom row) for chromosome 2 (left) and chromosome 6 (right). Shown are the normalized distance matrices for chromosome structures in different conformational states. Experimental and modeled structures show a high degree of similarity. In b, c, and d, the numbers indicate the Pearson correlation between the experimental and predicted distance matrices.
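The single-cell comparisons reduce to building a distance matrix for each structure, restricting the model to the imaged loci, and correlating the two matrices over pairs observed in both. The following is a minimal sketch under stated assumptions: coordinates are `(n_loci, 3)` arrays, undetected imaged loci are NaN, and `imaged_idx` is a hypothetical mapping from imaged loci to model beads. Pearson correlation is invariant to rescaling, so a normalization that divides each matrix by a constant does not change r.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between loci given as an (n_loci, 3) array.

    Loci with missing (NaN) coordinates, e.g. failed detections in imaging,
    produce NaN distances.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def distance_matrix_pearson(d_exp, d_model):
    """Pearson r between two distance matrices over pairs observed in both."""
    rows, cols = np.triu_indices(d_exp.shape[0], k=1)
    x, y = d_exp[rows, cols], d_model[rows, cols]
    ok = np.isfinite(x) & np.isfinite(y)
    return float(np.corrcoef(x[ok], y[ok])[0, 1])


# Usage with synthetic coordinates.
rng = np.random.default_rng(1)
model_coords = rng.normal(size=(40, 3))            # one modeled chromosome, 40 beads
imaged_idx = np.arange(0, 40, 4)                   # hypothetical: beads matching imaged loci
exp_coords = 2.0 * model_coords[imaged_idx] + 5.0  # rescaled, shifted copy
exp_coords[3] = np.nan                             # one locus not detected
r = distance_matrix_pearson(distance_matrix(exp_coords),
                            distance_matrix(model_coords[imaged_idx]))
print(round(r, 6))  # → 1.0
```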

Extended Data Fig. 3 Assessment of radial positions.

a, Average radial position profiles for chromosomes 3 (left), 5 (middle), and 11 (right). Also shown in blue are lamina contact frequencies (CF) from single-cell lamin DamID experiments⁶¹. Valleys in the average radial position plots match well with low lamina CF regions (red dashed lines). b, Density scatter plot of average radial positions (RAD) of chromatin regions from the structure population against the lamina CF from single-cell lamin DamID experiments in haploid KBM7 cells⁶¹. Of the chromatin regions with the 25% lowest average radial positions, 93% show either no detectable or only occasional contact with the lamina (CF < 20%). Vertical and horizontal black dashed lines show the 25th-percentile average radial position and the 20% CF value, respectively. c, Scatter plot comparing experimental and predicted GPSeq scores³⁵ (Pearson correlation: r = 0.80, p ≈ 0). d, Comparison of experimental and predicted GPSeq³⁵ profiles for the 0–80 Mb region of chromosome 2. e, Probabilities for a chromatin region of a given subcompartment to be located in each of five concentric shells, each containing the same total amount of chromatin (Methods). Shell 1 is the innermost shell. Error bars show mean ± s.d. Numbers of regions used in each bar for each shell: A1: 372, A2: 545, B1: 316, B2: 402, B3: 837. f, Violin plots of the distributions of cell-to-cell variability of radial positions (δ_RAD) for chromatin regions in different subcompartments. White circles and black bars show the median and the interquartile range (IQR: Q3 – Q1). Whiskers show minima and maxima. Q1 and Q3 are the lower and upper quartiles of the distribution. Numbers of regions used in each violin plot are: A1: 1,858, A2: 2,723, B1: 1,581, B2: 2,008, B3: 4,187. The dashed line separates low and high levels of variability.
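Panel e uses five concentric shells that each contain the same total amount of chromatin, and panel b reports a fraction computed from two percentile cut-offs. The following is a minimal sketch of both calculations under assumptions not stated in this caption. It assumes equal-sized beads, so equal bead counts stand in for equal chromatin amounts. It also assumes radial positions are already normalized; the study's normalization is described in its Methods. All function names are hypothetical.

```python
import numpy as np


def equal_chromatin_shell_edges(radial_positions, n_shells=5):
    """Radial cut points so that each shell holds the same number of beads.

    Assumes equal-sized beads, so equal bead counts stand in for equal
    amounts of chromatin; radial_positions pools all beads of all structures.
    """
    quantiles = np.linspace(0.0, 1.0, n_shells + 1)[1:-1]
    return np.quantile(np.ravel(radial_positions), quantiles)


def shell_probabilities(region_radial, edges):
    """Fraction of structures placing one region in each shell (index 0 = innermost)."""
    shells = np.digitize(region_radial, edges)
    counts = np.bincount(shells, minlength=len(edges) + 1)
    return counts / counts.sum()


def interior_low_contact_fraction(mean_rad, lamina_cf, percentile=25, cf_cutoff=0.20):
    """Among regions in the lowest `percentile` of mean radial position,
    the fraction with lamina contact frequency below `cf_cutoff`."""
    mean_rad, lamina_cf = np.asarray(mean_rad), np.asarray(lamina_cf)
    interior = mean_rad <= np.percentile(mean_rad, percentile)
    return float(np.mean(lamina_cf[interior] < cf_cutoff))


# Usage: uniform synthetic radial positions give five equally populated shells.
edges = equal_chromatin_shell_edges(np.arange(100) / 100)
print(shell_probabilities([0.1, 0.1, 0.5, 0.95], edges).tolist())  # → [0.5, 0.0, 0.25, 0.0, 0.25]
```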

Extended Data Fig. 4 Predictions of nuclear body related experimental data using 3D models.

a, Fraction of mapped histone mark peaks and number of A1/A2 chromatin regions in experimental¹³ (top) and predicted (bottom) SON TSA-seq deciles. b, Predicted mean speckle distances (SpD) for chromatin in experimental SON TSA-seq deciles¹³. The box and the middle line show the interquartile range (IQR = Q3 – Q1) and the median. The vertical lines and whiskers extend to a maximum of 1.5 × IQR. Q1 and Q3 are the lower and upper quartiles of the distribution. The number of regions used in each decile boxplot is 1,368. c, Spearman correlations between the experimental¹³ and predicted SON TSA-seq signals (top) using sequence distances to A1 clusters (red), 3D distances to A1 partitions in random territories (blue), 3D distances to A1 regions in the same chromosome (green), and 3D distances to A1 partitions (black), and the corresponding chromosome 17 profiles (Spearman correlations: 0.37, 0.30, 0.38, and 0.78, respectively; bottom). d, Spearman correlations between experimental¹³ and predicted SON TSA-seq signals (top) using A1 (black) or A2 (red) partitions, or partitions from chromatin with the 10% lowest average radial positions (INT, blue), and the corresponding chromosome 3 profiles (Spearman correlations: 0.88, 0.89, and 0.58, respectively; bottom). e, Cell-to-cell variability of speckle distances (δ_SpD) from DNA-MERFISH⁶ and models. Error bars show mean ± s.d. of predicted δ_SpD values in each imaged δ_SpD quartile. Numbers of regions used in each quartile (left to right): 245, 241, 247, 246. f,g, Scatter plots of the experimental and predicted signals (left) and chromosome 7 profiles (right) for lamin-B1 TSA-seq¹³ (f; Pearson correlation: r = 0.78, p ≈ 0) and lamin-B1 pA-DamID³⁴ (g; Pearson correlation: r = 0.80, p ≈ 0). h, Scatter plot of lamina association frequencies (LAF) from DNA-MERFISH⁶ and models (Pearson correlation: r = 0.64, p ≈ 0). i, Scatter plot of median trans A/B ratios and LAF for each region predicted from models (Pearson correlation: r = −0.90, p ≈ 0). j, Comparison of nucleoli association frequencies (NAF) from DNA-MERFISH⁶ and models (Pearson correlation: r = 0.71, p ≈ 0).
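Panels a to d group regions by deciles of the experimental SON TSA-seq signal and compare experimental and predicted signals by rank (Spearman) correlation. The following is a minimal sketch of that bookkeeping on synthetic data. How the predicted TSA-seq signal is computed from speckle distances is defined in the Methods and is not reproduced here. The names `decile_labels` and `median_per_decile` are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def decile_labels(signal):
    """Decile index (0 = lowest, 9 = highest) of each region's experimental signal."""
    edges = np.quantile(signal, np.linspace(0.0, 1.0, 11)[1:-1])
    return np.digitize(signal, edges)


def median_per_decile(signal, predicted):
    """Median of a predicted feature (e.g. mean speckle distance) in each signal decile."""
    signal, predicted = np.asarray(signal), np.asarray(predicted)
    labels = decile_labels(signal)
    return np.array([np.median(predicted[labels == d]) for d in range(10)])


# Usage with synthetic data: regions closer to speckles have a higher signal.
rng = np.random.default_rng(2)
tsa = rng.normal(size=1000)                    # synthetic experimental TSA-seq signal
spd = -tsa + rng.normal(scale=0.3, size=1000)  # synthetic predicted speckle distances
rho, p = spearmanr(tsa, spd)
print(median_per_decile(tsa, spd))
print(rho)
```

Because Spearman correlation compares ranks, it does not require the predicted signal to share the units or scale of the experimental one.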

Extended Data Fig. 5 Chromatin compaction and TAD borders.

a, Average radius of gyration (RG, that is, local decompaction) profile for chromatin in the 40–90 Mb region of chromosome 4. The background is color coded by the subcompartment annotations of chromatin (top). Cell-to-cell variability of RG values (δ_RG) in the structure population for the same chromatin regions; negative values indicate regions with low RG variability (bottom). Bars are color coded by the subcompartment annotations of the corresponding chromatin regions. b, RG peak frequencies (that is, the fraction of models showing an RG maximum at a given position) for a 6-Mb region of chromosome 4 (80–86 Mb) (top), and Hi-C contact frequency heatmap for the same region showing TAD borders identified by TopDom⁷⁹ (bottom). Positions of RG peak frequency maxima are shown as gray dashed lines and either overlap or lie very close to TAD borders identified by TopDom (red dashed lines). c, Two representative structures showing chromatin folding patterns for the same chromatin region as in b. TAD identities are color coded. d, Averaged RG peak frequencies for loci at TopDom TAD borders (green) compared with randomly selected loci (gray). In around 50% of structures, there is an RG peak in the immediate neighborhood of a TAD border (±200 kb; Mann-Whitney-Wilcoxon test, two-sided, p = 1.47 × 10⁻²⁰⁰ compared with random). In ~70% of structures, there is an RG peak within ±400 kb of a TAD border (Mann-Whitney-Wilcoxon test, two-sided, p = 2.46 × 10⁻⁷ compared with random). Data are presented as mean values ± s.d. Numbers of data points at each distance used for TopDom borders (green): 1,839, for random borders (gray): 2,491.
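Panel b defines the RG peak frequency as the fraction of models with an RG maximum at a given position. The following is a minimal sketch of that idea. It computes RG over a sliding window of consecutive beads in one structure, then counts strict local maxima across structures. The window size used in the study is given in its Methods. Here `half_window` is an illustrative assumption, and plateaus are not counted as peaks.

```python
import numpy as np


def window_rg(coords, half_window=2):
    """Radius of gyration of a sliding window of consecutive beads (n_beads x 3).

    Entries whose window would run past either chromosome end are NaN.
    """
    coords = np.asarray(coords, dtype=float)
    rg = np.full(len(coords), np.nan)
    for i in range(half_window, len(coords) - half_window):
        window = coords[i - half_window:i + half_window + 1]
        centered = window - window.mean(axis=0)
        rg[i] = np.sqrt(np.mean(np.sum(centered ** 2, axis=1)))
    return rg


def local_maxima(profile):
    """Indices of strict local maxima; comparisons involving NaN are False."""
    p = np.asarray(profile, dtype=float)
    is_peak = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])
    return np.flatnonzero(is_peak) + 1


def rg_peak_frequency(rg_profiles):
    """Fraction of structures with a local RG maximum at each locus.

    rg_profiles: array of shape (n_structures, n_loci).
    """
    rg_profiles = np.asarray(rg_profiles, dtype=float)
    hits = np.zeros(rg_profiles.shape[1])
    for profile in rg_profiles:
        hits[local_maxima(profile)] += 1
    return hits / len(rg_profiles)


# Usage with two toy RG profiles.
print(rg_peak_frequency([[0, 1, 0, 1, 0], [0, 2, 1, 0, 0]]).tolist())  # → [0.0, 1.0, 0.0, 0.5, 0.0]
```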

Extended Data Fig. 6 Chromatin trajectories with speckle and lamina anchor points.

a, Structural feature profiles for a representative long trajectory (chromosome 3, 128.4–146.8 Mb). Feature profiles include experimental SON TSA-seq data, lamin-B1 TSA-seq data, predicted mean radial positions (RAD), and predicted structural variability (δ_RAD). b, The same structural feature profiles for a representative short trajectory (chromosome 7, 5.4–9.2 Mb). Anchor points (arrows in a and b) are defined by extreme average radial positions (interior for speckle anchor points and exterior for lamina anchor points) together with high SON TSA-seq and lamin-B1 TSA-seq values, respectively. c, Schematic illustration of short and long chromatin trajectories with speckle and lamina anchor points and two hypothetical chromatin conformations connecting the anchor points. A chromatin trajectory is defined as the consecutive chromatin sequence between a speckle anchor point and the nearest lamina anchor point. d, Box plots of radial structural variability (δ_RAD) distributions for chromatin regions in short and long trajectories (Mann-Whitney-Wilcoxon test, two-sided, p = 1.48 × 10⁻¹⁸). The box and the middle line in the box show the interquartile range (IQR = Q3 – Q1) and the median. The vertical lines outside the box extend to a maximum of 1.5 × IQR beyond the box. Q1 and Q3 are the lower and upper quartiles of the distribution. Outliers are shown as dots. Numbers of regions used in each boxplot for short: 321, for long: 1,929.
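The trajectory definition in panel c and the box-plot and test conventions in panel d can be sketched as follows. This is a minimal illustration, not the study's procedure. "Nearest lamina anchor point" is read here as the next anchor of the opposite kind along the sequence with no other anchor in between, which is one possible interpretation. The anchor labels "S" and "L" are hypothetical, and the variability values are synthetic. Whiskers follow the caption's rule, ending at the most extreme data point within 1.5 × IQR of the box.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def trajectories(anchors):
    """Sequence segments joining a speckle anchor ("S") and a lamina anchor ("L").

    anchors: iterable of (position, kind). This sketch reads "nearest" as the
    next anchor of the opposite kind with no other anchor in between.
    """
    ordered = sorted(anchors)
    return [(p0, p1) for (p0, k0), (p1, k1) in zip(ordered, ordered[1:]) if k0 != k1]


def box_stats(values):
    """Box-plot summary with whiskers at the most extreme points within 1.5 x IQR."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low) & (values <= high)]
    return {"q1": q1, "median": median, "q3": q3,
            "whiskers": (inside.min(), inside.max()),
            "outliers": values[(values < low) | (values > high)]}


# Usage with synthetic variability values for short and long trajectories.
print(trajectories([(10, "S"), (30, "L"), (50, "L"), (70, "S")]))  # → [(10, 30), (50, 70)]
rng = np.random.default_rng(3)
short = rng.normal(0.0, 1.0, 300)
long_ = rng.normal(0.5, 1.0, 300)
stat, p = mannwhitneyu(short, long_, alternative="two-sided")
print(box_stats(short)["median"], p)
```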

Extended Data Fig. 7 Structural features of chromatin in different subcompartments.

a, Violin plots of the distributions of the 17 structural features calculated from the structure population for chromatin in different subcompartments. White circles and black bars in the violins show the median and the interquartile range (IQR: Q3 – Q1). Whiskers show minima and maxima. Q1 and Q3 are the lower and upper quartiles of the distribution. Numbers of regions used in each violin plot are: A1: 1,858, A2: 2,723, B1: 1,581, B2: 2,008, B3: 4,187. b–d, Fold-change enrichment of the 17 structural features for chromatin in different SCI states⁶⁶ (b), SPIN states⁶⁷ (c), and PC1 decile groups³⁸ (d). Structural feature abbreviations are as defined in Fig. 1.
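The caption does not state how the fold-change enrichment in panels b to d is computed. The following is a minimal sketch of one common definition, assumed here: the mean of each feature within a state divided by its genome-wide mean. The ratio is only meaningful for positive-valued features. Variability scores that can be negative, such as δ_RG, would need a different normalization, for example z-scores. The column values in the usage example are synthetic.

```python
import pandas as pd


def fold_change_enrichment(features, states):
    """Mean of each feature within each state divided by its genome-wide mean.

    features: DataFrame (regions x features); states: Series of state labels
    indexed like `features`. Only meaningful for positive-valued features.
    """
    genome_wide = features.mean()
    return features.groupby(states).mean().div(genome_wide, axis=1)


# Usage with synthetic values for two features in two states.
features = pd.DataFrame({"RAD": [1.0, 1.0, 3.0, 3.0], "SAF": [0.4, 0.2, 0.1, 0.1]})
states = pd.Series(["A1", "A1", "B3", "B3"])
print(fold_change_enrichment(features, states))
```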

Extended Data Table 1 Properties of subcompartment interaction networks and spatial partitions


Extended Data Table 2 Genome-wide Pearson and Spearman correlations between experimental and predicted SON TSA-seq data using different approaches


Supplementary information

Supplementary Information

Supplementary Figs. 1–26, Tables 1 and 2, Discussion.

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yildirim, A., Hua, N., Boninsegna, L. et al. Evaluating the role of the nuclear microenvironment in gene function by population-based modeling. Nat Struct Mol Biol 30, 1193–1206 (2023). https://doi.org/10.1038/s41594-023-01036-1

Received: 03 November 2021
Accepted: 16 June 2023
Published: 14 August 2023
Issue Date: August 2023
DOI: https://doi.org/10.1038/s41594-023-01036-1
