Abstract
The nuclear folding of chromosomes relative to nuclear bodies is an integral part of gene function. Here, we demonstrate that population-based modeling—from ensemble Hi-C data—provides a detailed description of the nuclear microenvironment of genes and its role in gene function. We define the microenvironment by the subnuclear positions of genomic regions with respect to nuclear bodies, local chromatin compaction, and preferences in chromatin compartmentalization. These structural descriptors are determined in single-cell models, thereby revealing the structural variability between cells. We demonstrate that the microenvironment of a genomic region is linked to its functional potential in gene transcription, replication, and chromatin compartmentalization. Some chromatin regions feature a strong preference for a single microenvironment, due to association with specific nuclear bodies in most cells. Other chromatin shows high structural variability, which is a strong indicator of functional heterogeneity. Moreover, we identify specialized nuclear microenvironments, which distinguish chromatin in different functional states and reveal a key role of nuclear speckles in chromosome organization. We demonstrate that our method produces highly predictive three-dimensional genome structures, which accurately reproduce data from a variety of orthogonal experiments, thus considerably expanding the range of Hi-C data analysis.
Similar content being viewed by others
Main
The spatial organization of eukaryotic genomes is linked to regulation of gene transcription, DNA replication, cell differentiation, and, upon malfunction, to cancer and other diseases1,2. Recent advances have led to a prolific development of improved technologies in live-cell and super-resolution microscopy3,4,5,6,7,8,9,10, as well as mapping technologies based on high-throughput sequencing11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26, for probing chromosome interactions and three-dimensional (3D) organization27,28,29,30. However, mapping the 3D nuclear locations of all genes simultaneously in single cells remains a major challenge. Several experimental technologies probe the mean distances (tyramide signal amplification sequencing (TSA-seq)13) or association frequencies (nucleolus-associated domain sequencing (NAD-seq)31; DNA adenine methyltransferase identification (DamID)16) of genes with nuclear speckles, lamina-associated domains (LADs), and nucleoli. However, these methods do not have the technical capacity to collect all this information simultaneously within the same cell, and the considerable cell-to-cell variability of chromosomal structures adds additional layers of complexity. Several multiplex fluorescence in situ hybridization (FISH) and super-resolution microscopy techniques have recently provided such information5,6,7. For instance, DNA- and RNA-multiplexed error-robust FISH (MERFISH) imaging has detected, within the same cells, the nuclear locations of 1,137 genes, together with the positions of nuclear speckles and nucleoli, as well as the amount of mRNA transcripts6. However, at this point, the amount of probed genomic DNA regions is still sparse, representing ~1% of entire genomes.
Here, we introduce an approach for modeling a population of single-cell 3D genome structures to describe the nuclear microenvironment of all genomic regions in single-cell models, defined by their nuclear locations relative to nuclear landmarks and nuclear compartments. Our aim is to evaluate the roles of the nuclear microenvironment and its cell-to-cell variability in chromatin function and identify characteristic nuclear microenvironments that distinguish chromatin in different functional states.
We achieve this goal by using a population-based genome structure modeling approach, which takes Hi-C data to generate a population of diploid genome structures statistically consistent with it32,33,34. We demonstrate that our method produces—from Hi-C data alone—highly predictive genome structures, which predict with high correlation the cytological distances of genomic regions to nuclear speckles and lamina from SON TSA-seq13 and lamin-B1 TSA-seq13 experiments, contact probabilities to the nuclear lamina from lamin-B1 protein A-DamID (pA-DamID)35 experiments, mean radial positions from genomic loci positioning by sequencing (GPSeq)36 experiments, and distance distributions and single-cell chromosome tracing data from 3D FISH18 and DNA-MERFISH6 experiments, respectively. We define the nuclear microenvironment of a genomic region by an array of structural descriptors, including its nuclear radial position; association frequencies with and mean distances to nuclear speckles, the lamina, and nucleoli; the local chromatin fiber compaction; and local compartmentalization in form of the trans A/B ratio, defined as the fraction of its inter-chromosomal interactions with chromatin in the A (active) or B (inactive) compartment6 (Fig. 1a,b). These structural descriptors are determined in single-cell models, thereby revealing the cell-to-cell variability of the nuclear microenvironment for a genomic region across the population of models.
Our genome structure analysis provides several key findings. First, genomic regions with a strong preference for the same specific microenvironment across cells, thus having low structural cell-to-cell variability, are also most homogeneous in their functional properties. These chromatins are associated in most cells with either nuclear speckles or constitutive LADs and act as structural anchor points to genome organization. Second, our analysis shows that the subnuclear microenvironment of a genomic region reflects its transcriptional potential upon activation. Genes with high expression heterogeneity37 often show increased structural variability in the nucleus, indicating a contribution of extrinsic noise to gene expression heterogeneity38. Third, our observations confirm that Hi-C subcompartments39 define physically distinct chromatin environments, some of which (like A1) are linked to associations with nuclear speckles.
Although other computational approaches have modeled entire chromosomes, or even diploid genomes, from Hi-C data18,34,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56, none so far has documented the predictive accuracy in reproducing multimodal experimental data, as presented here. Our findings demonstrate that our approach, from Hi-C data alone, produces predictive models that provide a detailed description of the subnuclear locations, folding, and compartmentalization of chromatin in diploid genomes. Therefore, our approach considerably expands the scope of Hi-C data analysis and is widely applicable to any cell type for which Hi-C data are available.
Results
Assessment of 3D genome structures
Here, we study 3D structures of diploid lymphoblastoid genomes (GM12878) from in situ Hi-C data39 at 200-kb (kilobase) resolution. Our method generates a population of 10,000 genome structures, in which all accumulated chromatin contacts are statistically consistent with contact probabilities from Hi-C experiments33,34,32. Structure optimization is achieved by solving a maximum likelihood estimation problem in an iterative fashion33,34,48,32 (Methods). The resulting genome structure population accurately reproduces experimental Hi-C contact probabilities (Pearson’s r = 0.98, genome-wide; 0.99 and 0.83 for cis and trans contacts, P = ~0, average chromosome SCC57 = 0.87; Extended Data Fig. 1a–c and Supplementary Information).
Our method is robust against missing data, as models generated from sparse Hi-C data (50% entries randomly removed) accurately predict the missing Hi-C contact frequencies (Pearson’s r = 0.93 (cis) and 0.69 (trans) of missing data, P = ~0; Extended Data Fig. 1d,e and Methods). Moreover, our models accurately predict, with significant correlation to their experimental values, a host of orthogonal data from lamin-B1 pA-DamID35, lamin-B1 TSA-seq13, SON TSA-seq13, and genomic loci positioning by sequencing (GPSeq)36 experiments (Pearson’s r = 0.80, 0.78, 0.87, and 0.80, respectively; Table 1 and Methods), which we will discuss in greater detail throughout this paper. Our models also confirm preferences for interior radial positions of chromatin replicated in the earliest G1b phase (P = 2.39 × 10−77, Mann–Whitney–Wilcoxon test, two-sided) and predict a gradual increase in average radial positions for chromatin replicated at later times58 (Extended Data Fig. 1f). Our results also agree with those of 3D FISH experiments18, namely co-location frequencies of four inter-chromosomal pairs of loci (Pearson’s r = 0.99, P = 0.014; Extended Data Fig. 1g) and distance distributions between three loci on chromosome 6 and relative differences in radial positions of these loci (Extended Data Fig. 1h). We also assessed our single-cell chromosome structures with data from multiplex DNA-MERFISH6 imaging and single cell Dip-C25 experiments, and found good agreement between the single-cell chromosome conformations in our models and those from DNA-MERFISH (Extended Data Fig. 2a–c) and Dip-C experiments (Extended Data Fig. 2d and Methods). All results were reproduced using technical replicates (Methods and Supplementary Information).
We now characterize the nuclear microenvironment of genomic regions by calculating a variety of structural descriptors for each genomic region in each single-cell model (Fig. 1a,b). Our aim is to identify characteristic nuclear microenvironments distinguishing chromatin of different functional states and to evaluate the roles of the nuclear topography and its cell-to-cell variability in regulating transcription and replication.
Average nuclear position and its cell-to-cell heterogeneity
The nuclear positions of genes are of functional relevance: FISH experiments revealed for some genes, upon transcriptional activation, a statistical shift of their locations towards the nuclear center59,60. Owing to the stochastic nature of genome structures, the radial nuclear position of a locus can vary between individual cells (Fig. 2a). However, the average radial position over all the models in the population reveals distinct preferences, which vary between different genomic loci (Fig. 2b, upper panel). The minima in the average radial profiles of chromosomes overlap with regions of lowest lamin-B1 DamID signals which have the lowest probabilities to interact with the nuclear envelope61 (Extended Data Fig. 3a,b). Our predictions also reproduce average radial locations, inferred from DNA digestion timing, detected in GPSeq experiments36 (Pearson’s r = 0.80, P = ~0; Extended Data Fig. 3c,d).
Notably, sequence positions that coincide with large transitions in the average radial profile often overlap with borders between the five primary Hi-C subcompartments identified by Rao et al.39 (Fig. 2b, top, and Methods) (that is, two transcriptionally active (A1, A2) and three inactive subcompartments (B1, B2, B3)). Chromatin in different subcompartments displays distinct distributions of average radial positions (Fig. 2c) and radial shell occupancy (Extended Data Fig. 3e), confirming previous observations25,36. For example, both A1 and B1 chromatins preferentially occupy the most interior radial shells of the nucleus, whereas B3 chromatin (mostly associated with LADs) shows a preferential location at the periphery and A2 chromatin shows a wide range of average locations without a marked radial preference (Fig. 2c and Extended Data Fig. 3e).
Structural variability correlates with functional properties
We also calculated the cell-to-cell variability for gene locations (δRAD), to quantify stochastic variations of radial positions between cells (Methods). δRAD differs distinctly between genomic loci (Fig. 2b, bottom). Sections of chromatin with high structural variability (δRAD > 0) alternate, in sharp transition, with regions of low variability (δRAD < 0)—transitions between high and low variability occur over relatively small sequence distances (Fig. 2b, bottom). These transitions align well with borders between subcompartments, most prominently between the A2 and B3 subcompartments (Fig. 2b, bottom). Continuous sections with similar δRAD values are often part of the same subcompartment.
We noticed that the structural variability of a genomic region is a strong indicator of its functional properties, for both the active A and inactive B compartment. Chromatin in the A compartment with low structural variability (δRAD < 0) (A-LV) (Fig. 2d) is enriched for high SON TSA-seq13 signals and low signals from lamin-B1 pA-DamID35 experiments; thus, A-LV regions have relatively short mean distances to nuclear speckles and are excluded from the nuclear periphery (Fig. 2e). Moreover, A-LV chromatin is highly enriched for constitutive inter-LADs61 (that is, regions never observed as LADs in any cell type) and is mostly replicated at the earliest G1b phase58. A-LV chromatin also shows significantly higher transcriptional activity than chromatin in the A compartment with high structural variability (δRAD > 0) (A-HV) (P = 1.35 × 10−40, Mann–Whitney–Wilcoxon test, two-sided; Fig. 2f). Overall, active genes with the highest number of transcripts in single-cell RNA-seq (scRNA-seq) experiments37 have a significantly lower δRAD compared to genes with the lowest number of transcripts (P = 3.45 × 10−18, Mann–Whitney–Wilcoxon test, two-sided; Fig. 2g).
By contrast, A-HV chromatin lacks SON TSA-seq signal enrichment and thus has larger mean distances to nuclear speckles, and is enriched for facultative inter-LADs (Fig. 2e). Notably, A-HV regions with the largest structural variability often show a bimodal distribution in their single-cell radial positions, an indication of two favored nuclear locations—at the nuclear interior and a peripheral location (Fig. 2h). We hypothesize that genes in these regions may exist in two functional states: active in the transcriptionally favorable interior, and silenced in the periphery. Indeed, compared with A-LV chromatin, A-HV chromatin is more enriched for the repressive trimethylated H3 K9 (H3K9me3) mark and depleted of the activating acetylated H3 K9 (H3K9ac) mark (Fig. 2i), which could point to a higher functional heterogeneity in single cells. Notably, the structural variability can distinguish A1 from A2 subcompartment chromatin (Fig. 2j, left, and Extended Data Fig. 3f)—93% of all A-HV regions in the active compartment are A2 chromatin, whereas A1 chromatin is strongly enriched in A-LV (Fig. 2j, left, and 2k).
Similar to the active compartment, B compartment chromatin also shows substantial differences in functional properties between the highly variable (B-HV) (δRAD > 0) and lowly variable (B-LV) (δRAD < 0) (Fig. 2d, right) genomic regions (Fig. 2e). Subsequently, the B1, B2, and B3 subcompartments are well distinguished by their structural variability and average radial positions (Fig. 2j, right), and B2 and B3 are enriched in B-HV and B-LV regions, respectively (Fig. 2k).
Subcompartments separate into spatial partitions
Chromosome folding permits functionally related chromatin, separated in sequence, to assemble into spatial compartments (Fig. 3a). The single-cell interaction networks (CINs) of chromatin in the same subcompartment show a heterogeneous network organization with clusters of highly connected and physically separated subgraphs (that is, local partitions) reminiscent of microphase fragmentation62 (Fig. 3b and Methods). These spatial partitions can be visualized in single genome structures by the occupied volume of the contained chromatin (Fig. 3b,c).
Network structures differ between individual subcompartments. While A1 chromatin is fragmented into the smallest number of partitions with the largest sizes (Fig. 3d,e and Extended Data Table 1) and highest fraction of inter-chromosomal interactions (Fig. 3f), A2 chromatin is fragmented into substantially larger numbers of smaller partitions, dominated by intra-chromosomal interactions (Extended Data Table 1 and Fig. 3d–f). Among the B compartment, B3 chromatin has the largest partitions, dominated by intra-chromosomal interactions (Fig. 3e,f).
The larger partition sizes of A1 and B3 chromatin lead to a more homogenous compartmentalization, with each having a higher neighborhood enrichment score with its own kind (see high enrichment fold along the diagonal in Fig. 3g and Methods). Smaller partition sizes of A2 and B1 chromatin lead to relatively high neighborhood enrichment with other chromatin (see off diagonal enrichment in Fig. 3g). A2 partitions are often associated with B3 chromatin, whereas B1 partitions are associated with A1 chromatin36 (Fig. 3g,h).
When we mapped nascent RNA expression from GRO-seq experiments63 onto our genome structures, we found increasing transcriptional activities towards the centers of A1 partitions (Fig. 4a). A2 partitions show similar trends, although substantially lower signals (Fig. 4a). We also observe that highly expressed genes reside preferably in larger partitions, and expression levels at the centers of large A1 and A2 partitions are notably higher than those of smaller ones (Fig. 4b). These observations indicate that spatial partitions of active chromatin are regional territories of highest transcriptional activities.
Predicting locations of nuclear speckles
Mapping TSA-seq data13 onto our genome structures revealed the strongest TSA-seq signals—and thus the smallest mean speckle distances—for chromatin located towards the central regions of A1 partitions (Fig. 4c); this suggests that the center locations of A1 partitions could represent positions of nuclear speckles in individual cell models. To test this assumption, we simulated the experimental TSA-seq process by using A1 partition centers as approximate speckle locations (Fig. 4d and Methods). The simulated, population-averaged SON TSA-seq data from our models show highly significant correlation with the experimental SON TSA-seq values13 (Pearson’s r = 0.87, P = ~0), capturing well both peak sizes and signal distributions (Fig. 4e,f). For instance, the TSA-seq profile of chromosome 2 is reproduced with high correlation (Pearson’s r = 0. 90, P = ~0) across the entire chromosome profile, despite containing few A1 regions (6.4%) (Fig. 4e). Chromatins grouped by predicted TSA-seq signals show characteristic enrichment of histone modifications, identical to those observed in the experiment13 (Extended Data Fig. 4a). Moreover, predicted speckle locations confirm the proposed correlation between mean speckle distances of chromatin and its experimental TSA-seq signal (Extended Data Fig. 4b).
We then found out that speckle locations can be predicted accurately even without relying on A1 subcompartment annotations, which are only available for a limited number of cell lines. We found that spatial partitions of chromatin with lowest average radial positions in the bottom 10% (labeled as internal (INT) in Fig. 4c) predict speckle locations within 500 nm to those derived from A1 partitions in 99% of structures (78% of chromatin with 10% lowest average radial positions are part of A1). Subsequently, the SON TSA-seq data can also be predicted from INT centers with almost identical accuracy (Pearson’s r = 0.86, P = ~0) (Extended Data Fig. 4d and Extended Data Table 2). Further investigations showed that only INT or A1 chromatin partition centers predict accurately the nuclear speckle locations in our models (Extended Data Fig. 4c,d and Extended Data Table 2).
Predicting speckle-associated structural features
With predicted speckle locations as reference points, we can now calculate speckle-associated features (SpD, SAF, δSpD, S-TSA in Fig. 1b) for each genomic region (Methods). The predicted speckle association frequencies (SAFs) of genomic regions agree with a recent DNA-MERFISH microscopy study6 with high correlation (Pearson’s r = 0.77, P = 1.2 × 10−202; Fig. 4g and Methods). Predicted trans A/B ratios also show a high correlation with those from DNA-MERFISH6 (Pearson’s r = 0.70, P = 7.6 × 10−109; Fig. 4h). We also found a moderate but highly significant correlation for the cell-to-cell variability of speckle distances (δSpD) between our models and the experiment (Extended Data Fig. 4e; Pearson’s r = 0.352, P = 7 × 10−30). Interestingly, we find a strong anticorrelation between the inter-chromosomal contact probability (ICP) of a genomic region and its mean speckle distance (SpD) (Pearson’s r = −0.95, P = ~0; Supplementary Fig. 1). Thus, the surroundings of speckles are strongly enriched in interchromosomal interactions, in particular for A compartment chromatin. This observation is confirmed by a strong correlation between a gene’s SAF and trans A/B ratio6 (Pearson’s r = 0.98, P = ~0; Fig. 4i).
Defining lamina- and nucleoli-associated features
Our models also accurately predict structural features describing chromatin positioning relative to the nuclear lamina (LAF, L-TSA in Fig. 1b). For instance, our models predict experimental lamin-B1 TSA-seq data with high correlation13, thus revealing accurate mean distances of genomic regions to the nuclear envelope (Pearson’s r = 0.78, P = ~0; Extended Data Fig. 4f and Table 1). Our models also predict lamin-B1 pA-DamID35 data with high correlation (Pearson’s r = 0.80, P = ~0; Extended Data Fig. 4g and Table 1), and thus could predict well the contact frequencies of genomic regions with the nuclear periphery. Finally, our models also reproduce experimental lamina association frequencies (LAFs)6 (Pearson’s r = 0.64, P = ~3.6 × 10−119; Extended Data Fig. 4h), despite the differences in shape between IMR-90 and GM12878 cell nuclei. Predicted LAF values are inversely correlated with a gene’s trans A/B ratios, confirming previous observations from DNA-MERFISH imaging6 (Extended Data Fig. 4i).
Moreover, our models also predict nucleolus-associated structural features (NuD, δNuD, NAF (nucleoli association frequencies), N-TSA in Fig. 1; Extended Data Fig. 4j and Methods).
Finally, we also calculate structural features of the chromatin fiber (Extended Data Fig. 5a and Methods), including local chromatin compaction (RG), which confirm the locations of TAD borders (Extended Data Fig. 5b–d).
The role of the nuclear microenvironment in gene function
Overall, we calculate a total of 17 structural features from our single-cell genome structure models (Fig. 1 and Supplementary Figs. 2–22). Collectively, these features define the nuclear microenvironment of each genomic region, which allows us to assess the role of the nuclear microenvironment in explaining functional differences between chromatin, in particular for gene transcription, DNA replication, and chromatin compartmentalization.
Gene transcription
First, we compare the stochastic variability of gene–speckle distances (δSpD) in single-cell models with the heterogeneity of single-cell gene expression from single-cell RNA sequencing (scRNA-seq) experiments37. Cumulatively ranked single-cell distances of a genomic region to its nearest predicted speckle (Fig. 5a, top) show striking similarities to the cumulatively ranked number of gene transcripts of the corresponding genes in a cell population from scRNA-seq37 (Fig. 5b, top, and Methods). Subsequently, the gene transcription frequency (TRF), defined as the fraction of cells a gene transcript is detected in scRNA-seq37 (Fig. 5a,b, bottom), shows a highly significant correlation with the SAF predicted from the models (Fig. 5c, left panel, Spearman’s r = 0.51, P = ~0). Thus, genes with transcripts in a large fraction of cells are also located close to speckles in a large fraction of models. We also validated these findings with transcription frequencies measured from RNA-MERFISH microscopy for 1,137 genes6. Here as well, we observe the identical highly significant correlation between TRF and SAF (Spearman’s r = 0.51, P = 1.6 × 10−64) (Fig. 5c, right panel). Interestingly, a gene’s interior location frequency (ILF; Methods) shows substantially smaller correlation with the TRF than with the SAF, for both scRNA-seq37 and RNA-MERFISH6 data (Spearman’s r = 0.42, P = ~0 (scRNA-seq) and r = 0.45, P = 4.1 × 10−50 (RNA-MERFISH)) (Fig. 5c). These observations indicate a possible role for single-cell variations of a gene’s nuclear microenvironment in its expression heterogeneity.
Moreover, we found that genes in the top 10% of genes with the highest numbers of transcripts (T10) are distinguished in their nuclear microenvironment from genes in the bottom 10% (B10). T10 genes show strong enrichment for several structural features (Fig. 5d, for example SAF and trans A/B), while being depleted in δRAD, δSpD, and δNuD. Thus, T10 genes show a strong preference for the same specific microenvironment in different cells, while B10 genes do not—their microenvironment is highly variable between cells without clear association preferences to nuclear bodies.
The distribution of feature values for the most discriminative features (SpD, ILF, SAF, RAD, and trans A/B) are quite different between the T10 and B10 genes (Fig. 5e). However, SAF and the highly correlated trans A/B ratio outperform all other features, including the radial gene position (RAD), in distinguishing T10 from B10 genes, as shown by the receiver operating characteristic (ROC) curves (Fig. 5f) (AUC for SAF = 0.85, RAD = 0.65). This finding could indicate that the general preference of highly expressed genes at interior radial positions may be an indirect consequence of favored associations with nuclear speckles, which themselves show stochastic preferences towards the nuclear interior13,64.
Moreover, genes controlled by superenhancers (SEN) show overall higher fold enrichments in structural features than genes controlled by regular enhancers (EN) (Methods). Thus, SEN genes reveal stronger preferences in their nuclear microenvironment between cells, particularly for higher SAF, interior positions, trans A/B, ICP, and depletion of LAF values (Fig. 5g).
The organizing role of nuclear speckles and lamina
Our approach allows a detailed analysis of chromatin speckle interactions. Chromatins divided into ten groups on the basis of their experimental SON TSA-seq signals13 show distinct structural enrichment patterns, which gradually change with increasing SON TSA-seq values (Fig. 6a). Chromatins in deciles d4–d7 (intermediate mean speckle distances) are highly variable in their nuclear positions (δRAD) and show no preferred associations with nuclear bodies studied here (Fig. 6a,b). By contrast, chromatins in the first (d1, d2) and last (d9, d10) deciles show the highest fold enrichments and thus the most stable microenvironment with strong structural homogeneity between cells in the population; these regions show the lowest δRAD and have the smallest and largest SpD, respectively (Fig. 6b). The latter coincides with mostly B-LV chromatin located at the nuclear periphery (Fig. 2d, right panel) and subsequently high lamin-B1 TSA-seq signal enrichment (Fig. 2e, left panel). Thus, these genomic regions provide stable structural anchor points at the nuclear periphery. Speckle-associated chromatin with the highest SON TSA-seq signals (d8–d10 in Fig. 6b) and SAF values also show relatively low cell-to-cell structural variability in their radial positions (δRAD) (Fig. 6b, mostly A-LV in Fig. 2d, left panel). Since speckle locations are mostly excluded from the nuclear periphery13,64, these regions act as stable anchor points at the nuclear interior (mostly A-LV in Fig. 2d, left panel). Therefore, both the lamina compartment and nuclear speckles act as anchor points for scaffolding the organization of the spatial genome. These observations provide a structural interpretation of the steep transitions between low and high signal peaks in SON TSA-seq profiles, previously reported as TSA-seq trajectories13 (Fig. 6c and Extended Data Fig. 6a). These transitions correspond to the sequence stretches between two consecutive anchor points, each with relatively low δRAD, and coincide with steep transitions in average speckle distances and radial positions (Fig. 6c and Extended Data Fig. 6a–c). In a fraction of models, these chromosome regions fold from anchor regions at the outer nuclear periphery towards anchor points at the nuclear interior, where the SON TSA-seq peak region is often associated with a nuclear speckle and forms the apex of a chromosomal loop, which then traces back to the nuclear periphery (Fig. 6d and Extended Data Fig. 6c). We found that δRAD in long trajectories (median length of 19.1 Mb between two consecutive anchor points) is significantly larger than for chromatin regions in short trajectories (median length 4.8 Mb) (Mann–Whitney two-sided test, P = 1.48 × 10−18; Extended Data Fig. 6d). Therefore, sequence locations of consecutive anchor points can modulate the structural properties for chromatin between anchor points over an extended genomic range, and disruption of an anchor point would likely affect structural properties of genomic regions over an extended sequence distance.
We also observe that SON TSA-seq signals (that is, mean speckle distances) positively correlate with both the ICP (Methods) (Pearson’s r = 0.76 P = ~0, Fig. 6e, top) and trans A/B ratio (Fig. 6e, bottom). These observations imply that surroundings of nuclear speckles act as major hubs for inter-chromosomal interactions of transcriptionally active genomic regions, confirming similar findings reported earlier13,23.
Finally, our models reveal distinct structural differences for genomic regions with high and intermediate SON TSA-seq signal peaks (that is, previously labeled type I and type II transcription ‘hot zones’13) (Fig. 6f). The vast majority of type II peaks show significantly higher speckle distance and radial variability (δSpD, δRAD) (Fig. 6f,g) than type I peaks and thus do not reside stably at intermediate speckle distances. Instead, they show a wider, in many cases bimodal, speckle distance distribution in comparison to type I peaks (Fig. 6h).
The role of microenvironment in replication timing
Variations in the replication timing58 of chromatin are mirrored by distinct differences in their nuclear microenvironment (Fig. 6i). For example, chromatin that replicates at early time points (G1b, S1) is most enriched for high SAF and trans A/B ratio, as well as low structural variability (Fig. 6i,j), whereas late-replicating chromatin (S4 and G2 phase) are depleted of interior locations and SAF, and strongly enriched for lamina-associated features (Fig. 6i). Overall, SAF, SpD, and trans A/B ratio are more discriminative (that is, these features have higher fold changes) than features related to radial positions (RAD, ILF) in distinguishing early-replicating (G1b) from late-replicating chromatin (G2) (Fig. 6k).
Chromatin compartmentalization
Chromatins in different subcompartments are well separated in terms of their enrichment patterns for structural features, and thus represent distinct physical microenvironments (Fig. 6l, Extended Data Fig. 7a, and Methods) (speckle features are predicted without A1 subcompartment annotations). While A1 chromatin shows strong preferences in its nuclear microenvironment, particularly for speckle-associated features, A2 chromatin lacks clear location preferences, with high cell-to-cell variability in radial locations, overall weak enrichment patterns, and wide distributions of feature values (Fig. 6l and Extended Data Fig. 7a). Similarly, the three inactive B subcompartments are well distinguished in terms of their characteristic enrichment patterns. Indeed, these differences are so pronounced that we are able to predict Hi-C subcompartments from structural features alone without explicit considerations of chromatin interactions. Unsupervised K-means clustering based on structural feature vectors for compartment A chromatin predicts A1 and A2 subcompartment annotations with 94% accuracy. Chromatins in inactive subcompartments were predicted with an accuracy of 84% (Fig. 6m and Methods). These results are comparable in accuracy to supervised methods using Hi-C contact frequencies65. Our approach provides an alternative way of detecting subcompartment annotations while providing underlying structural interpretations.
Moreover, we confirmed our findings with other chromatin compartment annotations, such as SCI states66 and SPIN states67, which showed distinct structural enrichment patterns for each chromatin state (Extended Data Fig. 7b–d).
Discussion
We introduce an approach to determine a population of single-cell 3D genome structures from ensemble Hi-C data. Our method predicts a host of structural features in single-cell models to provide information about the nuclear microenvironment of genomic regions in single cells, which is not available from ensemble Hi-C data itself. Therefore, our method expands the scope of Hi-C data analysis and is widely applicable to other cell types and tissues for which Hi-C data is available.
The models and derived structural features are a powerful resource to unravel relationships between genome structure and function. We found that cell-to-cell heterogeneity of structures varies by genomic loci and is a strong indicator of functional properties. Structurally stable chromatin in the A compartment is dominantly associated with nuclear speckles, and shows relatively high speckle association frequencies, a high trans A/B ratio, and the overall lowest average radial positions. These regions contain highly transcribed genes, are enriched for superenhancers and SON TSA-seq signals, and are replicated at the earliest time points. Moreover, these genomic regions compartmentalize in relatively large spatial partitions, formed by a high fraction of inter-chromosomal interactions. Chromatin in the A1 subcompartment is enriched in this category.
By contrast, active chromatin with high structural variability is characterized by a lack of preferences in nuclear locations. In a fraction of cells, these regions can be located in a silencing environment at the nuclear periphery; in others, it can be located towards the transcriptionally favorable interior. These genes show relatively low transcript frequencies, low inter-chromosomal contact probabilities with low trans A/B ratios, and intermediate replication timing (phases S2, S3). In TSA-seq experiments, most of these regions were identified as type II peaks, with intermediate TSA-seq values. We also noticed that these regions compartmentalize into relatively small spatial partitions, dominated by intra-chromosomal interactions. Chromatin in the A2 subcompartment is enriched in this category. It is possible that the high structural variability of these regions could be linked to functional heterogeneity between cells. For instance, although they are transcriptionally active, these regions have higher levels of the silencing H3K9me3 mark and reduced levels of the activating H3K9ac mark than active regions with low structural variability. Moreover, gene transcripts for these regions are found in a smaller fraction of cells and show lower transcriptional activity.
Interestingly, structural heterogeneity is also an indicator that can distinguish nucleoli- and lamina-associated chromatin in the B compartment. Genomic regions with low structural variability are dominantly associated with the lamina compartment, contain constitutive LADs and are enriched in the B3 subcompartment. Genomic regions with high structural variability are associated with nucleoli and pericentromeric heterochromatin and are enriched in the B2 subcompartment.
Our results suggest that nuclear speckles, together with the lamina compartment, are a major organizing factor in genome structure. Chromatin with low structural variability between cells is dominantly associated with either nuclear speckles or constitutive LADs. LADs are mostly located at the nuclear periphery while speckles are mostly excluded from the periphery13,64. Therefore, LADs and nuclear speckles provide structural anchor points at the periphery and nuclear interior. We hypothesize that A-LV and B-LV regions associated to these anchors act similarly to recently reported fixed points in the nuclear organization of mouse embryonic stem cells7.
Moreover, the observed anticorrelation between inter-chromosomal contact probabilities and mean speckle distances suggests that speckles are hubs that facilitate inter-chromosomal interactions for active chromatin, confirming similar observations from SPRITE experiments23. The high fraction of inter-chromosomal interactions for speckle-associated chromatin could explain the preferential locations of speckles toward the nuclear interior. The probability of inter-chromosomal interactions increases towards the nuclear interior (Fig. 7a). If speckles associate with multiple chromosomes, their locations are more likely at the nuclear interior. Over time, dynamic interactions with multiple chromosomes may restrain their locations towards the interior (Fig. 7b). These cooperative effects could bias the global speckle distributions towards the nuclear interior.
Chromatins with highest and lowest transcriptional activity are distinguished by their nuclear microenvironment. The SAF shows the highest correlation with the gene transcription frequency7,68,69,70. Therefore, the interior preferences of highly activated genes could be a consequence of preferential locations close to nuclear speckles, which in turn have a stochastic preference towards the nuclear interior, confirming previous observations from TSA-seq experiments13. Chromatin replicated at the earliest time are also distinguished in their structural features from late-replicating chromatin. Moreover, our observations confirm that Hi-C subcompartments define physically distinct chromatin environments, some of which (such as A1) linked to associations with nuclear bodies.
In summary, our method defines the nuclear microenvironment of a genomic region by calculating a large number of structural features from 3D genome structures. The nuclear microenvironment of a gene can be linked to its functional potential in transcription and replication and thus is relevant for a better understanding of genome structure function relationships. These features can be calculated from Hi-C data, and thus are applicable to many different cell types.
Methods
Population-based 3D structural modeling
General description
Our goal is to generate a population of 10,000 diploid genome structures, so that the accumulated chromatin contacts across the entire population are statistically consistent with the contact probability matrix A = (aIJ)(N × N) derived from Hi-C experiments18,34, with I and J as two chromatin regions in the genome. To achieve this goal, we utilize population-based modeling, our previously described probabilistic framework to de-multiplex the ensemble Hi-C data into a large population of individual genome structures of diploid genomes statistically consistent with all contact frequencies in the ensemble Hi-C data33,34,32.
The structure optimization is formulated as a maximum likelihood estimation problem solved by an iterative optimization algorithm with a series of optimization strategies for efficient and scalable model estimation33,34,48. Briefly, given a contact probability matrix A = (aIJ)(N × N), we aim to reconstruct all 3D structures X = {X1, X2…XM} in the population of M models, each containing 2 × N genomic regions for the diploid genome (at 200 kb base-pair resolution), and \( \vec{x}_{im} \in {{\mathfrak{R}}}^{3}\), i = 1,…,2N as coordinates of all diploid genomic regions in model m (we use lowercase letters i and i' to indicate a given copy of the genomic region I). We introduce a latent indicator variable \(\mathbf{W} = \left( w_{ijm} \right)_{2N}\) for complementing missing information (that is, missing phasing and ambiguity owing to genome diploidy). W is a binary-valued third-order tensor specifying the contacts of homologous genomic regions in each individual structure of the population, such that \(\mathop{\sum }\limits_{m=1}^{M}{{\boldsymbol{W}}}^{m}/M={\boldsymbol{A}}\), with Wm = (wm)2N × 2N such that \({w}_{{ij}}^{m}={w}_{{ijm}}\). We can jointly approximate the structure population (X) and the contact tensor (W) by maximizing the log-likelihood of the probability:
where
-
i.
Nuclear volume constraint: all chromatin spheres are constrained to the nuclear volume with radius Rnuc; \({\|{\vec{x}}_{{im}}\|}_{2}\le {R}_{\mathrm{nuc}},\) where \({\|{\vec{x}}_{{im}}\|}_{2}\) is the distance of the region i from the nuclear center in structure m.
-
ii.
Excluded volume constraint: this constraint prevents overlap between two regions represented by spheres, defined by their excluded volume radii (Rex); \({d_{ijm} = \|{\vec{x}}_{{im}}-{\vec{x}}_{{jm}}\|}_{2}\ge 2{\times R}_{\mathrm{ex}}\).
-
iii.
Polymer chain constraint: distances between two consecutive 200-kb spheres within the same chromosomes are constrained to their contact distance to ensure chromosomal chain integrity; \({\|{\vec{x}}_{\left(i+1\right)m}-{\vec{x}}_{{im}}\|}_{2}\le 2{\times R}_{\mathrm{soft}}\), where \({R}_{\mathrm{soft}}=\,2{\times R}_{\mathrm{ex}}\).
Our modeling pipeline uses a step wise iterative process in which the optimization hardness is gradually increased by adding contacts with decreasing contact probabilities in the input matrix. The iterative optimization procedure involves two steps, each optimizing local approximations of the likelihood function: (1) assignment step (A-step)—given the estimated structures Xk at step k, estimate Wk; and (2) modeling step (M-step)—given the estimated Wk, generate model population Xk+1 at step k + 1 that maximizes likelihood to observe W. Structures in the M-step are calculated using a combination of optimization approaches, including simulated annealing molecular dynamics simulations.
Moreover, during each optimization cycle, we also use iterative refinement steps, a methodological innovation for effective reassignment of restraints during the optimization process, which allows genome structure generation at higher resolution and improved accuracy in comparison to our previous approach33,34 (see iterative refinement method in Supplementary Information).
After 11 iterations, our method converged and the genome-wide contact probabilities from the structure population agreed with those from the Hi-C experiment.
Genome representation
The nucleus is modeled as a sphere with 5-μm radius (Rnuc)34. Chromosomes are represented by a chromatin chain model at 200-kb base-pair resolution. Each 200-kb chromatin region, in the diploid genome, is modeled as a sphere, defined by an excluded volume radius (Rex = 118 nm). Rex is estimated from the sequence length, the nuclear volume and the genome occupancy (40%), as described in ref. 34. The full diploid genome is represented with a total of 30,332 spheres.
Random starting configurations
Optimizations are initiated with random chromosome configurations. Chromatin regions are randomly placed in a bounding sphere proportional to its chromosome territory size and randomly placed within the nucleus.
Comparison between contact frequency maps from Hi-C experiment and model population
To quantify the agreement between Hi-C experiment and model population, we perform the following analyses:
-
1.
Comparison between input and output Hi-C maps are evaluated by Pearson and stratum adjusted (SCC)57 correlation coefficients (Supplementary Table 1).
-
2.
Restraint residual. On average about 175,304 contact restraints are imposed in each of the 10,000 structures. The restraint residual of each contact restraint between loci k and l is calculated as: \({\eta }_{kl}=\frac{{d}_{kl}-D}{D}\), where dkl is the distance between the contact loci in the model, and D is the target contact distance (2 × Rsoft).
-
3.
Residual ratio. The residual ratio Δr is defined as:
$${\Delta r}_{{kl}}=\,\left({f}_{{kl}}^{\,\mathrm{input}}-\,{f}_{{kl}}^{\,\mathrm{model}}\right)/{f}_{{kl}}^{\,\mathrm{input}}$$with \({f}_{{kl}}^{\,\mathrm{input}}\) and \({f}_{{kl}}^{\,\mathrm{model}}\) as the contact probabilities between regions k and l from experiment and models, respectively. Residual ratios are very small, and centered at a median of 0.03 (mean = −0.05) for intra-chromosomal and 0.001 (mean = −0.002) for inter-chromosomal contacts (Supplementary Fig. 23), showing agreement between experiment and model.
-
4.
Prediction of missing Hi-C data from sparse data model. A sparse Hi-C input data set is generated by randomly removing 50% of the non-zero data entries from the Hi-C contact frequency matrix.
Comparison of simulated single cell chromosome structures with those from DNA-MERFISH imaging
Preprocessing of the DNA-MERFISH dataset6: please refer to the methods in Boninsegna et al.32.
Preprocessing Dip-C dataset25: we collected both homologous chromosome copies from each of the 16 single cells. To match our model resolution, we generated 200-kb-resolution models by averaging coordinates of loci that map to 200-kb bins.
Calculation and comparison of distance matrices: please refer to Methods in Boninsegna et al.32.
Robustness and converge analysis
Replicates
Technical replicates are calculated from different random starting configurations. Resulting contact frequency maps and the average radial positions of all chromatin regions between replica populations are nearly identical (Supplementary Fig. 24). All observed structural features discussed in this paper are reproduced in the technical replicate population.
Population size
To assess convergence with respect to population size, we generated 5 populations with 50, 100, 1,000, 5,000, or 10,000 structures. Chromatin contact frequencies and structural features for each structure populations were compared against results with a population size of 10,000 structures. At a population of 1,000 structures, a size much smaller than our target population, contact frequency values and average radial positions were already converged at a very high correlation with those from a 10,000-structure population (Supplementary Fig. 25).
Chromatin interaction networks and identification of spatial partitions
Building chromatin interaction networks
A chromatin interaction network (CIN) is calculated for each model and for chromatin in each subcompartment separately as follows (Supplementary Fig. 26): each vertex represents a 200-kb chromatin region. An edge between two vertices i and j is drawn if the corresponding chromatin regions are in physical contact in the model, if the spatial distance dij ≤ 2 × Rsoft).
Network properties
Maximal clique enrichment: A clique is a subset of nodes in a network where all nodes are adjacent to each other and fully connected. The maximal clique refers to the clique that cannot be further enlarged. The number of maximal cliques, c, is calculated using the graph_number_of_cliques function in the NetworkX python package71. The maximal clique enrichment (MCE) of the subcompartment s in the structure m is calculated as:
Where cs,m is number of maximal cliques for subcompartment s in structure m; and cr,m is the number of maximal cliques of a CIN constructed from randomly shuffled subcompartment regions in the same structure m. High MCE values show formation of a structural subcompartment with high connectivity between 200-kb regions of the same state.
Neighborhood connectivity: To calculate the neighborhood connectivity (NC) of a subcompartment CIN, we first calculate the average neighbor degree for each node using the average_neighbor_degree function in the NetworkX python package71. The overall neighborhood connectivity of the subcompartment s in the structure m is then calculated as:
where Ns,m is the number of nodes in the CIN of the subcompartment s in the structure m, and degj is the average neighbor degree of node j.
Identifying spatial partitions via Markov clustering
Spatial partitions of subcompartments are identified by applying the Markov Clustering Algorithm (MCL)72, a graph clustering algorithm, which identifies highly connected subgraphs within a network. MCL clustering is performed for each subcompartment CIN in each structure by using the mcl tool in the MCL-edge software72. Unless otherwise noted, the 25% smallest subgraphs (with less than 7 nodes, many of those being singletons) are discarded from further analysis, to focus on highly connected subgraphs. The highly connected subgraphs are referred to as ‘spatial partitions’ throughout the text.
In addition to subcompartment partitions, we also predict speckle and nucleoli partitions as follows:
Speckle partitions
Case 1: Predictions of speckle locations with knowledge of A1 subcompartment annotations.
Speckle locations are identified as the geometric center of A1 spatial partitions identified by Markov clustering of A1 CINs. In each structure, only A1 spatial partitions with sizes larger than three nodes (chromatin regions) are considered for downstream analysis.
Case 2: Predictions of speckle locations without knowledge of subcompartments.
We first identify chromatin expected to have high speckle association. These regions are identified as those with unusually low and stable interior radial positions. We select 10% chromatin regions with the lowest average radial positions (78.4% of these regions are part of the A1 subcompartment). We then generate CINs for the selected group of chromatin regions in each structure of the population. Approximate speckle locations are then identified as the geometric center of the resulting spatial partitions identified by Markov clustering of the CINs. Only spatial partitions with sizes larger than three nodes (chromatin regions) are considered for downstream analysis.
Case 3: Predictions using locations of A2 partition centers.
For comparison, we also identify speckle locations as the geometric center of A2 spatial partitions identified by Markov clustering of A2 CINs similar to case 1. In each structure, only A2 spatial partitions with sizes larger than three nodes (chromatin regions) are considered for downstream analysis.
Nucleoli partitions
Following the same protocol as in case 2 for speckle partitions, we first identify chromatin expected to have high nucleoli association. These regions are identified as those previously reported nucleoli-associated domain (NAD)73 regions and nucleolus organizing regions (NOR, on short arms of chromosomes 13, 14, 15, 21, and 22). Using these regions, we generate CINs in each structure of the population. Approximate nucleoli locations are then identified as the center of mass of the resulting spatial partitions identified by Markov clustering of the CINs. Only the top 25% largest spatial partitions are used as predicted nucleoli. For NOR regions, we use the first 25 restrained 200-kb regions that are closest in sequence to NOR regions in these five chromosomes, as NOR regions do not have Hi-C data and they are not restrained during the modeling protocol.
Properties of partitions
Size of partitions: The size of a spatial partition is calculated as 0.2 × N Mb, where N is the number of nodes in the partition that represents a 0.2-Mb region.
Fraction of inter-chromosomal edges (contacts): For each spatial partition, the inter-chromosomal edge fraction (ICEF) is calculated as:
where Eintra and Einter are the number of intra- and inter- edges in the partition, respectively.
Structural features
Unless otherwise noted, mean values of structural features for each genomic region I are calculated from 2 copies (i and i') and 10,000 structures (total 20,000 configurations) in the following structural feature calculations.
Mean radial position (RAD, no. 1)
Radial position of a chromatin region i in structure m is calculated as:
where di,m is the distance of i to the nuclear center, and Rnuc is the nucleus radius which is 5 μm. ri,m = 0 means the region i is at the nuclear center, while ri,s = 1 means it is located at the nuclear surface.
Local chromatin fiber decompaction (RG, no. 2)
The local compaction of the chromatin fiber at the location of a given locus is estimated by the radius of gyration (RG) for a 1 Mb region centered at the locus (that is, comprising +500 kb up- and 500 kb downstream of the given locus). To estimate the RG values along an entire chromosome we use a sliding-window approach over all chromatin regions in a chromosome.
The RG for a 1 Mb region centered at locus i in structure m is calculated as:
where N is the number of chromatin regions in the 1-Mb window, and dj,m is the distance between the chromatin region j to the center of mass of the 1-Mb region, in structure m.
Mean gene–speckle and gene–nucleolus distances (SpD and NuD, nos. 3 and 4)
For each 200-kb region, the closest speckle partition (or nucleolus partition) in each single structure is identified and the center-to-center distance is calculated (from the center of the region to the geometric center of the partition). The distances across the population are then averaged for each region to calculate mean speckle (or nucleolus) distances.
Cell-to-cell variability of features (δ RAD, δ RG, δ SpD, and δ NuD, nos. 5–8)
Cell-to-cell variability of any structural feature F (\({\delta }_{I}^{\mathrm{RAD}}\) for radial positions, \({\delta }_{I}^{\mathrm{SpD}}\) speckle distances, \({\delta }_{I}^{\mathrm{NuD}}\) nucleoli distances, and \({\delta }_{I}^{\mathrm{RG}}\) local decompaction) for a chromatin region I is calculated as:
where \({\sigma }_{I}^{F}\) is the s.d. of the values for structural feature F calculated from both homologous copies i and i' of the region I across all 10,000 genome structures in the population; \(\bar{{\sigma }^{F}}\) is the mean s.d. of the feature value calculated from all regions within the same chromosome of region I. Positive \({\delta }_{I}^{F}\) values (\({\delta }_{i}^{F} > 0\)) result from high cell-to-cell variability of the feature (for example radial position); negative values (\({\delta }_{i}^{F} < \,0\)) indicate low variability.
Regions in the A compartment with positive and negative \({\delta }_{I}^{{RAD}}\) are called A-HV (high variability) and A-LV (low variability), respectively. Likewise, regions in the B compartment with positive and negative \({\delta }_{I}^{{RAD}}\) are called B-HV and B-LV, respectively. The number of 200-kb regions in each group are 3,164, 2,731, 3,839, and 3,918 for A-LV, A-HV, B-LV, and B-HV, respectively.
Interior localization frequency (ILF, no. 9)
For a given 200-kb region, the interior localization frequency (ILF) is calculated as:
where \({{n}_{rI < 0.5}}\) is the number of structures where either copy of the region I has a radial position lower than 0.5, and M is the total number of structures which is 10,000 in our population.
Nuclear-body association frequencies (SAF, LAF, and NAF, nos. 10–12)
For a given 200-kb region, the association frequency to nuclear bodies (SAF, LAF, and NAF for speckle, lamina, and nucleoli association frequencies, respectively) is calculated as:
where M is the number of structures in the population (two homologous copies of each chromosome are present per structure); \({n}_{{d}_{i} < {d}_{t}}\) and \({n}_{{d}_{{i}^{{\prime} }} < {d}_{t}}\) are the number of structures, in which region i and its homologous copy i′ have a distance to the nuclear body of interest (NB) smaller than the association threshold, dt, respectively. The dts are set to 500 nm, 0.35 × Rnuc, and 1,000 nm for SAF, LAF, and NAF, respectively. We tried different distance thresholds, and the selected thresholds resulted in the best correlations with experimental data. For SAF and NAF calculations, we use the predicted speckle and nucleolus partitions to calculate distances (see ‘Identifying spatial partitions via Markov clustering’). For LAF, we use the direct distances of regions to the nuclear envelope. For all association frequency calculations, we calculate distances from the surface of the region to the center-of-mass of the partition or to the surface of the nuclear envelope.
TSA-seq (S-TSA, L-TSA, N-TSA, nos. 13–15)
To predict TSA-seq signals for speckle, nucleoli, and lamina from our models, we use the following equation:
where M is the number of models, L is the number of predicted speckle locations in structure m, dil is the distance between the region i and the predicted nuclear body location l, and R0 is the estimated decay constant in the TSA-seq experiment13 which is set to 4 in our calculations. The normalized TSA-seq signal for region i then becomes:
where \(\overline{{sig}}\) is the mean signal calculated from all regions in the genome. The predicted signal is then averaged over two copies for each region. The predicted speckle and nucleoli partitions are used for distance calculations (see ‘Identifying spatial partitions via Markov clustering’). For lamina TSA-seq, we use direct distances of each 200-kb chromatin region to the nuclear surface in each structure, which is calculated as (1 − ri,m) × Rnuc, where ri,m is the radial position of the 200-kb region i in structure m and Rnuc is the nucleus radius, which is set to 5 μm.
Mean inter-chromosomal neighborhood probability (ICP, no. 16)
For each target chromatin region i, we define the neighborhood {j} if the center-to-center distances of other regions {j} to the target region are smaller than 500 nm, which can be expressed as a set; Nei = {j: j ≠ i, \(d_{ij}\) < 500 nm}. Inter-chromosomal neighborhood probability (ICP) is then calculated as:
where M is the number of structures, nintra (m,i) and ninter (m,i) are the number of intra- and inter-chromosomal regions in the set Nei in structure m for region i.
Median trans A/B ratio (no. 17)
For each chromatin region i, we define the trans neighborhood {j} if the center-to-center distances of other regions from other chromosomes to itself are smaller than 500 nm, which can be expressed as a set; \({{Ne}}_{i}^{t}=\{j:\,{{chrom}}_{i}\ne \,{{chrom}}_{j},\,{d}_{{ij}} < 500{nm}\}\). The trans A/B ratio is then calculated as:
where \({n}_{A}^{t}\) and \({n}_{B}^{t}\) are the number of trans A and B regions in the set \({{Ne}}_{i}^{t}\) for region i. The median of the trans A/B ratios for a region is then calculated from all the trans A/B ratios of the homologous copies of the region observed in all the structures of the population. The values are then rescaled to have values between 0 and 1.
Data analysis
The analyses and most of the figure panels were performed using custom Python scripts (matplotlibv3.4 (ref. 74), Scikit-learnv1.0 (ref. 75), scipyv1.5 (ref. 76), and networkxv2.3 (ref. 71)) together with the publicly available alabtools platform (https://github.com/alberlab/alabtools). The remaining panels and the final figures were assembled using Adobe Illustrator. Correlations between input and output contact matrices were calculated using HiCRep57 (https://github.com/TaoYang-dev/hicrep). Spatial partitions were identified using the MCL algorithm72 (https://micans.org/mcl/). Chromatin interaction networks were visualized with Cytoscape77. Images of 3D genome structures were generated using UCSF Chimera1.13 (ref. 78). For all other analysis, please refer to the Supplementary Information.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The genome structure population and genome-wide structural features are available at https://doi.org/10.5281/zenodo.7352276. The accession codes for the experimental data used in our analyses are as follows: GEO: GSE63525 (Hi-C), GSE63525 (subcompartments), GSE81553 (SON TSA-seq), GSE81553 (lamin-B1 TSA-seq), GSE56465 (single cell lamina DamID), GSM1480326 (GRO-seq), GSE135882 (GPSeq), GSM923451 (Repli-seq), GSM3596321 (scRNA-seq); 4DN: 4DNFIGL8MCSJ (lamin-B1 pA-DamID), 4DNFILYQ1PAY (compartments); ENCODE: ENCFF313LYI, ENCFF171MDW, ENCFF776DPQ, ENCFF309OEW, ENCFF028KBY, ENCFF601YET, ENCFF831ZHL, ENCFF039HDL, ENCFF340JIF, ENCFF803DJF, ENCFF683HCZ (ChIP–seq, histone modifications), https://zenodo.org/record/3928890 (DNA-MERFISH imaging). The complete list of the datasets used in this study and their accession numbers are also tabulated in Supplementary Table 2.
Code availability
The software used to generate the genome structure population and the accompanying documentation are available at https://github.com/alberlab/igm.
References
Misteli, T. The self-organizing genome: principles of genome architecture and function. Cell 183, 28–45 (2020).
Chakraborty, A. & Ay, F. The role of 3D genome organization in disease: from compartments to single nucleotides. Semin. Cell Dev. Biol. 90, 104–113 (2019).
Bintu, B. et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science 362, eaau1783 (2018).
Nguyen, H. Q. et al. 3D mapping and accelerated super-resolution imaging of the human genome using in situ sequencing. Nat. Methods 17, 822–832 (2020).
Payne, A. C. et al. In situ genome sequencing resolves DNA sequence and structure in intact biological samples. Science 371, eaay3446 (2021).
Su, J. H., Zheng, P., Kinrot, S. S., Bintu, B. & Zhuang, X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Cell 182, 1641–1659 (2020).
Takei, Y. et al. Integrated spatial genomics reveals global architecture of single nuclei. Nature 590, 344–350 (2021).
Takei, Y. et al. Single-cell nuclear architecture across cell types in the mouse brain. Science 374, 586–594 (2021).
Viana, M. P. et al. Integrated intracellular organization and its variations in human iPS cells. Nature 613, 345–354 (2023).
Wang, S. et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science 353, 598–602 (2016).
Beagrie, R. A. et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519–524 (2017).
Belaghzal, H. et al. Liquid chromatin Hi-C characterizes compartment-dependent chromatin interaction dynamics. Nat. Genet. 53, 367–378 (2021).
Chen, Y. et al. Mapping 3D genome organization relative to nuclear compartments using TSA-Seq as a cytological ruler. J. Cell Biol. 217, 4025–4048 (2018).
Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
Fang, R. et al. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 26, 1345–1348 (2016).
Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951 (2008).
Hsieh, T. H. et al. Mapping nucleosome resolution chromosome folding in yeast by micro-C. Cell 162, 108–119 (2015).
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
Li, X. et al. Long-read ChIA-PET for base-pair-resolution mapping of haplotype-specific chromatin interactions. Nat. Protoc. 12, 899–915 (2017).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
Quinodoz, S. A. et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell 174, 744–757 (2018).
Ramani, V. et al. Massively multiplex single-cell Hi-C. Nat. Methods 14, 263–266 (2017).
Tan, L., Xing, D., Chang, C. H., Li, H. & Xie, X. S. Three-dimensional genome structures of single diploid human cells. Science 361, 924–928 (2018).
Zheng, M. et al. Multiplex chromatin interactions with single-molecule precision. Nature 566, 558–562 (2019).
Bickmore, W. A. & van Steensel, B. Genome architecture: domain organization of interphase chromosomes. Cell 152, 1270–1284 (2013).
Dekker, J. et al. The 4D nucleome project. Nature 549, 219–226 (2017).
Kempfer, R. & Pombo, A. Methods for mapping 3D chromosome architecture. Nat. Rev. Genet. 21, 207–226 (2020).
Mirny, L. A., Imakaev, M. & Abdennur, N. Two major mechanisms of chromosome organization. Curr. Opin. Cell Biol. 58, 142–152 (2019).
Vertii, A. et al. Two contrasting classes of nucleolus-associated domains in mouse fibroblast heterochromatin. Genome Res. 29, 1235–1249 (2019).
Boninsegna, L. et al. Integrative genome modeling platform reveals essentiality of rare contact events in 3D genome organizations. Nat. Methods 19, 938–949 (2022).
Hua, N. et al. Producing genome structure populations with the dynamic and automated PGS software. Nat. Protoc. 13, 915–926 (2018).
Tjong, H. et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl Acad. Sci. USA 113, E1663–E1672 (2016).
van Schaik, T., Vos, M., Peric-Hupkes, D., Hn Celie, P. & van Steensel, B. Cell cycle dynamics of lamina-associated DNA. EMBO Rep. 21, e50636 (2020).
Girelli, G. et al. GPSeq reveals the radial organization of chromatin in the cell nucleus. Nat. Biotechnol. 38, 1184–1193 (2020).
Osorio, D., Yu, X., Yu, P., Serpedin, E. & Cai, J. J. Single-cell RNA sequencing of a European and an African lymphoblastoid cell line. Sci. Data 6, 112 (2019).
Finn, E. H. & Misteli, T. Molecular basis and biological function of variability in spatial genome organization. Science 365, eaaw9498 (2019).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Barbieri, M. et al. Complexity of chromatin folding is captured by the strings and binders switch model. Proc. Natl Acad. Sci. USA 109, 16173–16178 (2012).
Tjong, H., Gong, K., Chen, L. & Alber, F. Physical tethering and volume exclusion determine higher-order genome organization in budding yeast. Genome Res. 22, 1295–1305 (2012).
Bau, D. et al. The three-dimensional folding of the alpha-globin gene domain reveals formation of chromatin globules. Nat. Struct. Mol. Biol. 18, 107–114 (2011).
Chiariello, A. M., Annunziatella, C., Bianco, S., Esposito, A. & Nicodemi, M. Polymer physics of chromosome large-scale 3D organisation. Sci. Rep. 6, 29775 (2016).
Di Pierro, M., Zhang, B., Aiden, E. L., Wolynes, P. G. & Onuchic, J. N. Transferable model for chromosome architecture. Proc. Natl Acad. Sci. USA 113, 12168–12173 (2016).
Di Stefano, M., Paulsen, J., Lien, T. G., Hovig, E. & Micheletti, C. Hi-C-constrained physical models of human chromosomes recover functionally-related properties of genome organization. Sci. Rep. 6, 35985 (2016).
Esposito, A. et al. Polymer physics reveals a combinatorial code linking 3D chromatin architecture to 1D chromatin states. Cell Rep. 38, 10601 (2022).
Le, T. B., Imakaev, M. V., Mirny, L. A. & Laub, M. T. High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342, 731–734 (2013).
Li, Q. et al. The three-dimensional genome organization of Drosophila melanogaster through data integration. Genome Biol. 18, 145 (2017).
Lin, X., Qi, Y., Latham, A. P. & Zhang, B. Multiscale modeling of genome organization with maximum entropy optimization. J. Chem. Phys. 155, 010901 (2021).
Paulsen, J. et al. Chrom3D: three-dimensional genome modeling from Hi-C and nuclear lamin-genome contacts. Genome Biol. 18, 21 (2017).
Qi, Y. et al. Data-driven polymer model for mechanistic exploration of diploid genome organization. Biophys. J. 119, 1905–1916 (2020).
Serra, F. et al. Automatic analysis and 3D-modelling of Hi-C data using TADbit reveals structural features of the fly chromatin colors. PLoS Comput. Biol. 13, e1005665 (2017).
Umbarger, M. A. et al. The three-dimensional architecture of a bacterial genome and its alteration by genetic perturbation. Mol. Cell 44, 252–264 (2011).
Wong, H. et al. A predictive computational model of the dynamic 3D interphase yeast nucleus. Curr. Biol. 22, 1881–1890 (2012).
Yildirim, A. & Feig, M. High-resolution 3D models of Caulobacter crescentus chromosome reveal genome structural variability and organization. Nucleic Acids Res. 46, 3937–3952 (2018).
Zhang, B. & Wolynes, P. G. Topology, structures, and energy landscapes of human chromosomes. Proc. Natl Acad. Sci. USA 112, 6062–6067 (2015).
Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014).
Bickmore, W. A. The spatial organization of the human genome. Annu. Rev. Genomics Hum. Genet. 14, 67–84 (2013).
Takizawa, T., Meaburn, K. J. & Misteli, T. The meaning of gene positioning. Cell 135, 9–13 (2008).
Kind, J. et al. Genome-wide maps of nuclear lamina interactions in single human cells. Cell 163, 134–147 (2015).
Hildebrand, E. M. & Dekker, J. Mechanisms and functions of chromosome compartmentalization. Trends Biochem. Sci. 45, 385–396 (2020).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Carter, K. C., Taneja, K. L. & Lawrence, J. B. Discrete nuclear domains of poly(A) RNA and their relationship to the functional organization of the nucleus. J. Cell Biol. 115, 1191–1202 (1991).
Xiong, K. & Ma, J. Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions. Nat. Commun. 10, 5069 (2019).
Ashoor, H. et al. Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data. Nat. Commun. 11, 1173 (2020).
Wang, Y. et al. SPIN reveals genome-wide landscape of nuclear compartmentalization. Genome Biol. 22, 36 (2021).
Ding, F. & Elowitz, M. B. Constitutive splicing and economies of scale in gene expression. Nat. Struct. Mol. Biol. 26, 424–432 (2019).
Khanna, N., Hu, Y. & Belmont, A. S. HSP70 transgene directed motion to nuclear speckles facilitates heat shock activation. Curr. Biol. 24, 1138–1144 (2014).
Kim, J., Venkata, N. C., Hernandez Gonzalez, G. A., Khanna, N. & Belmont, A. S. Gene expression amplification by nuclear speckle association. J. Cell Biol. 219, e201904046 (2020).
Hagberg, A. A., Schult, D. A. & Swart, P. J. in 7th Python in Science Conference (SciPy2008) (ed. Gäel Varoquaux, T. V. & Millman, J.) 11–15 (2008).
Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
Nemeth, A. et al. Initial genomics of the human nucleolus. PLoS Genet. 6, e1000889 (2010).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Shin, H. et al. TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Res. 44, e70 (2016).
Acknowledgements
This work was supported by the National Institutes of Health (grant U54DK107981 and UM1HG011593 to F.A.) as part of the 4D Nucleome Initiative, and an NSF CAREER grant (1150287 to F.A.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
A.Y., N.H., and F.A. designed the research. A.Y, N.H., and L.B. performed all calculations and data analysis. A.Y., N.H., and F.A. interpreted results with input from X.J.Z., G.P., K.G., and S.H. Y.Z. and W.L. helped with data interpretations. A.Y., N.H., and F.A. wrote the manuscript with input from X.J.Z. All authors agreed on the manuscript.
Corresponding author
Ethics declarations
Competing interests
X.J.Z. is a co-founder and co-CEO of EarlyDiagnostics. F.A. is shareholder of EarlyDiagnostics. A.Y. is an employee of Illumina. All other authors declare no competing interests.
Peer review
Peer review information
Nature Structural & Molecular Biology thanks Raphael Mourad and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Carolina Perdigoto and Dimitris Typas, in collaboration with the Nature Structural & Molecular Biology team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 3D chromatin structure modeling and assessment.
a, The contact probability matrix calculated from Hi-C (left) and the structure population (right) for chromosome 2. Zoomed-in heatmaps show the matrix for 40–80 Mb. b, Density scatter plot of contact probabilities from Hi-C data and structure population (Pearson correlation: r = 0.98, p = ~0). c, Histograms of restraint residuals from the structure population (Methods). A restraint residual less than 1.05 is considered satisfied and is not displayed in the histograms (99.9% of restraints fall in this category). d, The contact probability matrix for chromosome 2 showing the 50% randomly chosen dataset used as input (lower triangle) vs. the matrix generated from the structure population (upper triangle). e, Density plot of Hi-C contact probabilities missing in the input and their predictions from the structure population (Pearson correlation: r = 0.93, p = ~0). f, Average radial positions of chromatin in different replication phases58. Numbers of regions in each boxplot are (from left to right): 1,714, 1,921, 2,166, 2,246, 2,068, 2,064. g, Comparison of the inter-chromosomal loci co-localization frequencies between the observed occurrence in FISH experiments18 and in the structure population shown as bar plots (left), and scatter plot (right). h, A FISH image with three different probes at far-separating loci on chromosome 6 (left), the comparison of pair-wise distances of these loci in experiment and models (middle), and their relative radial positions in experiment and models (right). Numbers of regions in each boxplot for models: 10,000, for experiment: 892. In f and h, the box and the middle line in the box show the the interquartile range (IQR = Q3 – Q1) and the median. The vertical lines outside the box (and whiskers in h) extend to a maximum of 1.5*IQR beyond the box. Q1 and Q3 are the lower and upper quartile of the distribution. Outliers are shown as dots.
Extended Data Fig. 2 Assessment of single cell chromosome structures.
a, Comparison of selected structures of chromosome 6 modeled in our simulations and imaged in DNA-MERFISH experiments6. 3D chromosome structures are shown to the right of the corresponding normalized distance matrix for structures from DNA MERFISH experiments (top row) and structure modeling (bottom row). Modeled structures at 200 kb base pair resolution have higher coverage than those from experiment. To allow direct comparison, we highlight those genomic regions that are imaged in DNA-MERFISH by opaque ball and stick representations, while other genomic regions are shown as translucent genomic regions. b, Comparison of distance matrices for single cell structures of chromosome 6 from DNA-MERFISH experiments (top) and structure models (bottom); models reproduce folding patterns of different chromosome conformations observed in experiment. c, Comparison of distance matrices for single cell structures of chromosome 2 from DNA-MERFISH experiments (top) and structure models (bottom). d, Comparison of chromosome structures between models from Dip-C experiment25 (top row) and our structure population (bottom row) for chromosome 2 (left) and chromosome 6 (right). Shown are the normalized distance matrices for chromosome structures in different conformational states. Both experimental and modeled structures show high degree of similarity. In b, c, and d, the numbers indicate the Pearson correlation between the experimental and the predicted distance matrices.
Extended Data Fig. 3 Assessment of radial positions.
a, Average radial position profiles in chromosomes 3 (left), 5 (middle), and 11 (right). Also shown in blue are lamina contact frequencies (CF) from single cell lamin DamID experiments61. Valleys in the average radial position plots match well with low lamina CF regions (red dashed lines). b, Density scatter plot of average radial positions (RAD) of chromatin regions from the structure population against the lamina CF from single cell lamin DamID experiments in haploid KBM7 cell type61. 93% of chromatin regions with the 25% lowest average radial positions show either no detectable or only occasional contact with lamina (CF < 20%). Vertical and horizontal black dashed lines show the 25th percentile average radial position and the 20% CF values, respectively. c, Scatter plot showing the comparison between experimental and predicted GPSeq scores35 (Pearson correlation: r = 0.80, p = ~0) d, Comparison of experimental and predicted GPSeq35 profiles for the 0–80 Mb region in chromosome 2. e, Probabilities for chromatin region of a given subcompartment to be located in any of the five concentric shells, each containing the same total amount of chromatin (Methods). Shell 1 is the most interior shell. Error bars show mean +/− s. d. Numbers of regions used in each bar for each shell for A1: 372, A2: 545, B1: 316, B2: 402, B3: 837. f, Violin plots for distributions of cell-to-cell variabilities of radial positions (δRAD) for chromatin regions in different subcompartments. White circles and black bars show the median value and the interquartile range (IQR: Q3 – Q1). Whiskers show minima and maxima. Q1 and Q3 are the lower and upper quartile of the distribution. Numbers of regions used in each violin plot are: A1: 1,858, A2: 2,723, B1: 1,581, B2: 2,008, B3: 4,187. Dashed line separates low and high levels of variability.
Extended Data Fig. 4 Predictions of nuclear body related experimental data using 3D models.
a, Fraction of mapped histone mark peaks and number of A1/A2 chromatin regions in experimental13 (top) and predicted SON TSA-seq deciles (bottom). b, Predicted mean speckle distances (SpD) for chromatin in experimental SON TSA-seq deciles13. The box and the middle line show the interquartile range (IQR = Q3–Q1) and the median. The vertical lines and whiskers extend to a maximum of 1.5*IQR. Q1 and Q3 are the lower and upper quartile of the distribution. Number of regions used in each decile boxplot is 1,368. c, Spearman correlations between the experimental13 and predicted SON TSA-seq signals (top) using sequence distances to A1 clusters (red), 3D distances to A1 partitions in random territories (blue), 3D distances to A1 regions in the same chromosome (green), and 3D distances to A1 partitions (black), and chromosome 17 profiles (Spearman correlations: 0.37, 0.30, 0.38, 0.78, respectively, bottom). d, Spearman correlations between experimental13 and predicted SON TSA-seq signals (top) using A1 (black) or A2 (red) partitions, or partitions from chromatin with 10% lowest average radial positions (INT, blue), and chromosome 3 profiles (Spearman correlations: 0.88, 0.89, 0.58, respectively, bottom). e, Cell-to-cell variability of speckle distances (δSpD) from DNA-MERFISH6 and models. Error bars show mean +/− s.d. of predicted δSpD values in each imaged δSpD quartile. Number of regions used in each quartile (left to right): 245, 241, 247, 246. Scatter plots of the experimental and predicted signals (left) and chromosome 7 profiles (right) for (f) LaminB1 TSA-seq13 (Pearson correlation: r = 0.78, p = ~0) and (g) LaminB1 pA-DamID34 (Pearson correlation: r = 0.80, p = ~0). h, Scatter plot of lamina association frequencies (LAF) from DNA-MERFISH6 and models (Pearson correlation: r = 0.64, p = ~0). i, Scatter plot of median trans A/B ratios and LAF for each region predicted from models (Pearson correlation: r = −0.90, p = ~0). j, Comparison of nucleoli association frequencies (NAF) from DNA-MERFISH6 and models (Pearson correlation: r = 0.71, p = ~0).
Extended Data Fig. 5 Chromatin compaction and TAD borders.
a, Average radius of gyration (RG, that is local decompaction) profile for chromatin in the 40–90 Mb region of chromosome 4. The background is color coded by the subcompartment annotations of chromatin (top). Cell-to-cell variability of RG values (δRG) in the structure population for the same chromatin regions. Negative values indicate regions with low RG variability (bottom). Bars are color coded by the subcompartment annotations of the corresponding chromatin regions. b, RG peak frequencies (that is, the fraction of models showing a RG maximum at a given position) for a 6-Mb region in chromosome 4 (80–86 Mb) (top), and Hi-C contact frequency heatmap for the same region showing TAD borders identified by TopDom79 (bottom). Regions with RG peak frequency maxima are shown with gray dashed lines, and either overlap or are very close to TAD borders identified by TopDom (red dashed lines). c, Two representative structures showing chromatin folding patterns for the same chromatin region in b. TAD identities are shown by color code. d, Averaged RG peak frequencies for loci at TopDom TAD borders (green) compared to randomly selected loci (gray). In around 50% of structures, there is a RG peak in the immediate neighboring region of a TAD border (±200 kb, Mann-Whitney-Wilcoxon test, two-sided, p = 1.47×10−200 compared to random) In ~70% of structures there is a RG peak within a ±400 kb range of a TAD border (Mann-Whitney-Wilcoxon test, two-sided, p = 2.46 × 10−7 compared to random). Data are presented as mean values +/− s.d. Numbers of data points at each distance used for TopDom borders (green): 1,839, for random borders (gray): 2,491.
Extended Data Fig. 6 Chromatin trajectories with speckle and lamina anchor points.
a. Structural feature profiles for a representative example of a long trajectory (chromosome 3 128.4–146.8 Mb). Calculated feature profiles include the experimental SON TSA-seq data, Lamin B1 TSA-seq data, predicted mean radial positions (RAD), and predicted structural variability (δRAD). b. The same structural feature profiles for a representative example of a short trajectory (chromosome 7 5.4–9.2 Mb). Anchor points (indicated by arrows in both a and b) are defined by extreme average radial positions (interior for speckle anchor and exterior position for lamina anchor points) as well as high SON TSA-seq and lamin B1 TSA seq values. c. Schematic illustration of short and long chromatin trajectories with speckle and lamina anchor points and two hypothetical chromatin conformations connecting the anchor points. Chromatin trajectories are defined as consecutive chromatin sequence regions between a speckle and nearest lamina anchor points. d. Box plots of radial structural variability (δRAD) distributions for chromatin regions in short and long trajectories (Mann-Whitney-Wilcoxon test, two-sided, p-value = 1.48 × 10–18). The box and the middle line in the box show the the interquartile range (IQR = Q3 – Q1) and the median. The vertical lines outside the box extend to a maximum of 1.5*IQR beyond the box. Q1 and Q3 are the lower and upper quartile of the distribution. Outliers are shown as dots. Numbers of regions used in each boxplot for short: 321, for long: 1,929.
Extended Data Fig. 7 Structural features of chromatin in different subcompartments.
a, Violin plots for the distributions of the 17 structural features calculated from the structure population for chromatin in different subcompartments. White circles and black bars in the violins show the median value and the interquartile range (IQR: Q3 – Q1). Whiskers show minima and maxima. Q1 and Q3 are the lower and upper quartile of the distribution. Numbers of regions used in each violin plot are: A1: 1,858, A2: 2,723, B1: 1,581, B2: 2,008, B3: 4,187. (b-d): Fold-change enrichment of the 17 structural features for chromatin in different: b, SCI states66, c, SPIN states67, d, PC1 decile groups38. Structural feature abbreviations are as defined in Fig. 1.
Supplementary information
Supplementary Information
Supplementary Figs. 1–26, Tables 1 and 2, Discussion.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yildirim, A., Hua, N., Boninsegna, L. et al. Evaluating the role of the nuclear microenvironment in gene function by population-based modeling. Nat Struct Mol Biol 30, 1193–1206 (2023). https://doi.org/10.1038/s41594-023-01036-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41594-023-01036-1
This article is cited by
-
Computational methods for analysing multiscale 3D genome organization
Nature Reviews Genetics (2024)