A subset of topologically associating domains fold into mesoscale core-periphery networks

Mammalian genomes are folded into a hierarchy of compartments, topologically associating domains (TADs), subTADs, and long-range looping interactions. The higher-order folding patterns of chromatin contacts within TADs and how they localize to disease-associated single nucleotide variants (daSNVs) remains an open area of investigation. Here, we analyze high-resolution Hi-C data with graph theory to understand possible mesoscale network architecture within chromatin domains. We identify a subset of TADs exhibiting strong core-periphery mesoscale structure in embryonic stem cells, neural progenitor cells, and cortical neurons. Hyper-connected core nodes co-localize with genomic segments engaged in multiple looping interactions and enriched for occupancy of the architectural protein CCCTC binding protein (CTCF). CTCF knockdown and in silico deletion of CTCF-bound core nodes disrupts core-periphery structure, whereas in silico mutation of cell type-specific enhancer or gene nodes has a negligible effect. Importantly, neuropsychiatric daSNVs are significantly more likely to localize with TADs folded into core-periphery networks compared to domains devoid of such structure. Together, our results reveal that a subset of TADs encompasses looping interactions connected into a core-periphery mesoscale network. We hypothesize that daSNVs in the periphery of genome folding networks might preserve global nuclear architecture but cause local topological and functional disruptions contributing to human disease. By contrast, daSNVs co-localized with hyper-connected core nodes might cause severe topological and functional disruptions. Overall, these findings shed new light into the mesoscale network structure of fine scale genome folding within chromatin domains and its link to common genetic variants in human disease.

how chromatin architecture is connected to genome function is important because it sheds light on the molecular mechanisms governing healthy development and how these mechanisms go awry in disease.
Computational tools from network science and graph theory have been employed to characterize the structural features of real-world complex systems across a range of length scales 20,21 . Local and global properties of networks are straightforward to compute because the units of analysis, individual nodes and the whole network, are immediately evident and require no additional search. Mesoscale structure, however, is not always evident 22 . The difficulty largely lies in the fact that the identification of mesoscale structure requires the specification of a partition of a network's nodes into groups (the units of analysis) based on a given criterion 23 . Real-world networks are composed of many nodes and edges arranged in complex patterns that can obscure structural regularities. Due to this complexity, if one wishes to identify mesoscale structure in networks, one must frequently search for it algorithmically 24 . Recently, algorithmic approaches explicitly developed for this purpose have uncovered assortative communities 25,26 , cores and peripheries 27 , and bipartite or disassortative structures 28 in a wide range of diverse complex systems 29,30 .
We reasoned that an exploration of mesoscale network properties in high-resolution 3-D genome folding maps might lead to new understanding of the link between genome structure and function. Recently, our group and others have used network modularity maximization and the tuning of a resolution parameter, γ, to uncover a hierarchy of partially overlapping, nested communities in human Hi-C data genome-wide 31,32 . By smoothly tuning the resolution parameter from high to low extremes, we can effectively obtain estimates of a network's community structure, spanning from the coarsest scale at which all network nodes fall into the same community to the finest scale where network nodes form singleton communities 32,33 . This so-called multi-scale community detection methodology ultimately revealed that nested communities in network science are synonymous with TADs/subTADs and have strong utility in their sensitive and specific detection genome-wide from Hi-C maps 31,32 .
Another feature of 3-D genome folding thought to fall in the range of mesoscale intermediate-length structure is the long-range loop 11 . We hypothesized that looping interactions within chromatin domains might form meso-scale network structures with importance for genome function. Here, we demonstrate that a subset of TADs genome-wide are folded into core-periphery networks, whereas size-matched, randomly positioned genomic intervals show no such structure. Core nodes directly correspond to genomic fragments engaged in multiple, highly connected looping interactions. Interconnected core nodes are strongly enriched for constitutively bound CTCF, whereas cell type-specific enhancers and genes are generally evenly distributed across both core and periphery nodes. Experimental knockdown of CTCF and in silico mutation of CTCF-occupied nodes disrupts core-periphery network structure, whereas in silico mutation of cell type specific enhancer or gene nodes has a negligible effect. Importantly, we discovered that common single nucleotide variants (SNVs) associated with key neuropsychiatric diseases, including autism spectrum disorder (ASD), schizophrenia, and obsessive-compulsive disorder (OCD), are strongly enriched in core-periphery TADs in cortical neurons. Together, these results reveal a subset of TADs that are folded in mesoscale core-periphery networks. We hypothesize a working model in which daSNVs that disrupt the highly interconnected core nodes might result in severe topological and functional disruption. By contrast, we posit that daSNVs in peripheral nodes might preserve global genome folding, but result in slight local topological and functional disruptions contributing to human disease.

Results
The mesoscale network patterns created by genome folding within TADs are poorly understood. Progress in exploring such a question has been hindered by the paucity of ultra-high resolution, genome-wide Hi-C maps across multiple mammalian cell types. Recently, Cavalli and colleagues published kilobase (kb)-resolution Hi-C maps with the highest read depth to date allowing for the genome-wide identification of looping interactions in mouse embryonic stem (ES) cells, primary neural progenitor cells (NPCs), and primary cortical neurons (CNs) 34 . We created interaction frequency maps at four kb resolution in all three cell types (Fig. 1A, Supplementary Methods, Supplementary Table 1). By applying the well-established directionality index and hidden Markov model methodology 6 , we identified TADs genome-wide in all three cell types (Supplementary Tables 2-7). We also modeled the distance-dependence and local TAD structure to compute an expected count for every pixel (Supplementary Figs 1-2, Supplementary Methods), thus enabling us to compute background-corrected interaction frequency (Observed/Expected) heatmaps (Fig. 1A). Thus, we created Observed/Expected heatmaps to enable the exploration of mesoscale network structure created by looping interactions within TADs genome-wide in mouse ES cells, NPCs, and CNs.
Visual inspection of the heatmaps revealed that many TADs contain bins that form multiple looping interactions (green arrows, Fig. 1A) with other highly interconnected nodes. This observation is reminiscent of core-periphery structure, a mesoscale organization of a graph that is often studied in network science. Core nodes exhibit high signal strength and are highly connected (high degree) to other highly connected nodes. By contrast, periphery nodes interact with core nodes but are rarely connected to each other. To explore the possibility of core-periphery network structure within TADs, we conceptualized Hi-C data as a network with each genomic bin represented as a node and Observed/Expected interaction frequencies between nodes represented as edges (Fig. 1B). We represented each Observed/Expected interaction frequency matrix for every TAD and cell type as an independent network (Fig. 1B). To quantify the strength of a TAD network's core-periphery structure, we computed a core-periphery statistic, Q, using the Kernighan-Lin algorithm to optimize a quality index measuring core-periphery node separation (detailed in Supplementary Methods). In brief, we sought to maximize the interaction strength among core nodes and to minimize the interaction strength among periphery nodes. Intuitively, this quality index Q will be high when a TAD contains strong core-periphery network structure and low in TADs devoid of such structure. We observed that genome folding within TADs genome-wide in mouse ES cells, NPCs, and CNs exhibit significantly stronger core-periphery structure than expected in two null models: (i) Figure 1. Genome folding within topologically associating domains (TADs) can exhibit mesoscale coreperiphery network structure. (A) Relative interaction frequency (Observed) and background-corrected interaction frequency (Observed/Expected) Hi-C heatmaps for a given TAD in mouse embryonic stem (ES) cells, neural progenitor cells (NPCs), and cortical neurons (CNs). Green lines, borders of TADs. Green arrowheads, genome structure signal indicative of a persistent looping interaction in ensemble Hi-C maps. (B) 3-D genome folding can be represented as a network where nodes correspond to bins of DNA and edges correspond to bin-bin interaction frequencies. Interaction frequency heatmaps of genome folding can be represented as adjacency matrices, allowing tools from graph theory and network science to be readily applied. Blue lines represent TAD boundaries. (C,D) To test for the possible presence of core-periphery structure created by looping interactions within TADs, we computed the core-periphery quality index Q for all true TADs compared to pseudo-TADs generated by choosing size-matched genomic regions at random. A Kolmogorov-Smirnov test was used to compare Q values for TADs vs. randomly placed, size-matched genomic regions in ES cells, NPCs, and CNs.
www.nature.com/scientificreports www.nature.com/scientificreports/ pseudo-TADs where matched-size genomic intervals are positioned at random (Fig. 1C,D) and (ii) Erdős-Rényi random graphs ( Supplementary Fig. 3). These results suggest that genome folding within at least a subset of TADs can exhibit mesoscale core-periphery network structure.
We next set out to identify chromatin domains genome-wide that exhibit the highest core-periphery network structure in ES cells, NPCs, and cortical neurons. For each individual TAD network in each independent cell type, we compared its core-periphery test statistic, Q, to a null distribution of Q values computed on 10,000 randomly placed, size-matched genomic intervals ( Fig. 2A, Supplementary Methods). We rejected the null hypothesis that the specific TAD does not exhibit core-periphery structure in cases where the multiple-testing corrected empirical p-value was less than a rigorous threshold of alpha <= 0.012 ( Fig. 2A, Supplementary Methods). We identified a subset of TADs with the strongest possible core-periphery structure in mouse ES cells, NPCs, and cortical neurons ( Supplementary Fig. 4) and compared their structural features to TADs that were not significantly core-periphery. We noticed that strong core-periphery TADs contain nodes that form multiple looping events with other highly looping nodes ( Fig. 2B-F left column, Fig. 2G). By contrast, TADs devoid of core-periphery network structure are either depleted of looping interactions or contain single anchor-to-anchor looping interactions that do not involve multiple nodes ( Fig. 2B-F right column, Fig. 2H). Importantly, we observed that core nodes directly correspond to genomic fragments engaged in multiple, highly connected looping interactions. These results reveal that a gradient of core-periphery network architectures can be found across TADs genome-wide, with complex core-periphery TADs containing complex hubs of hyper-connected genomic anchors engaged in multiple long-range loops.
We next sought to understand the differences between core and periphery nodes by integrating epigenetic annotations on the linear genetic sequence with all nodes in TADs with significantly strong core-periphery network architecture. We analyzed CTCF and H3K27ac ChIP-seq data from Cavalli and colleagues in the same mouse ES cells, NPCs, and cortical neurons used to generate the Hi-C data (Supplementary Table 8). Specifically, we stratified (1) Fig. 5A-D). We observed a notable group of constitutively occupied CTCF sites across all three lineages (n = 8,853) and negligible CN-specific CTCF sites. By contrast, there were notable groups of ES cell-specific (n = 13,634), NPC-specific (n = 1,088), and CN-specific (n = 6,343) putative enhancers marked by strong H3K27ac+ signal distal from transcription start sites ( Supplementary Figs 5E-H, 6). We also stratified groups of cell type-specific, inactive, and constitutively active genes by H3K27ac+ signal at transcription start sites (Supplementary Figs 5I-L, 6). These results confirm the known groups of cell type-specific genes and enhancers and that CTCF occupancy largely falls into a class of ES cell-specific or constitutive occupancy across cell types.
We next canvased the set of all possible cell type-specific annotations for their co-localization at core versus periphery nodes in TADs with significant core-periphery network structure. We found that core nodes were markedly more co-localized with CTCF occupied sites than periphery nodes (Fig. 3). Consistent with this observation, constitutively bound CTCF sites were strongly enriched for core nodes and depleted for periphery nodes across core-periphery TADs from all three cell types ( Fig. 4A-D, Supplementary Fig. 7). Nearly all cell type-specific enhancer and gene classes were evenly distributed between core and periphery nodes with modest Odds Ratios (  Tables 30-32). The one exception was constitutively expressed genes significantly enriched in core nodes versus periphery nodes in cortical neuron core-periphery TADs at our stringent p-value cutoff of P<0.0005 (Odds Ratio = 3.977) (Fig. 4L, Supplementary  Fig. 9). Taken together, these results reveal that hyper-connected core genome folding nodes are occupied by CTCF and that cell type-specific enhancers and genes show no clear bias to core vs. periphery nodes in TADs with core-periphery network structure.
To determine if looping core-periphery structure could be disrupted, we performed both in silico mutagenesis and in vitro experimental knock down experiments. We first conducted in silico mutagenesis experiments by computing the core-periphery test statistic Q after either deleting nodes containing a particular 1-D Epigenomic annotation or deleting nodes at random (Fig. 5, Supplementary Fig. 10). We found that in silico mutation of CTCF binding sites significantly disrupted core-periphery structure in both ES cells (Fig. 5A) and CNs ( Supplementary  Fig. 10A). This effect was specific to significantly core-periphery TADs and was not made manifest in the same analysis on TADs that did not exhibit notable core-periphery structure (Fig. 5B, Supplementary Fig. 10B). Moreover, in silico mutation of cell type-specific enhancers or genes had a minimal effect on core-periphery structure. We also knocked down CTCF to 20% of its wild type levels in mouse ES cells using lentiviral shRNA ( Supplementary Fig. 11, Supplementary Table 33). We compared genome folding in wild type and CTCF knock-down ES cells by using Chromosome-Conformation-Capture-Carbon-Copy (5C) as previously described 10,[35][36][37][38][39] . Although CTCF levels were not completely ablated, we found that network core-periphery structure was slightly reduced in some but not all chromatin domains (Fig. 6). Together, these results demonstrate that core-periphery network structure in a subset of TADs can be altered upon disruption of CTCF protein levels or mutagenesis of CTCF's occupied sites in the genome. Finally, we assessed the possible link between the 3-D genome's core-periphery network structure and neuropsychiatric disease-associated single nucleotide variants (daSNVs). We first observed that TADs in CNs with significant core-periphery structure are significantly more conserved across placental mammals than domains (2019) 9:9526 | https://doi.org/10.1038/s41598-019-45457-9 www.nature.com/scientificreports www.nature.com/scientificreports/ www.nature.com/scientificreports www.nature.com/scientificreports/ devoid of such structure ( Supplementary Fig. 12A). Core nodes in highly core-periphery TADs have significantly higher conservation and gene density compared to periphery nodes in core-periphery TADs ( Supplementary  Fig. 12B,C). Importantly, we observed that schizophrenia, ASD, and OCD daSNVs in non-coding genomic regions were significantly more likely to co-localize with CN TADs with significant core-periphery structure than to CN TADs devoid of such structure (Fig. 7A, Supplementary Methods). We then compared the distribution of neuropsychiatric daSNVs across core and periphery genome folding nodes (Fig. 7B,C). Importantly, we found that the linkage disequilibrium (LD) blocks encompassing schizophrenia daSNVs were significantly enriched at core nodes, wheras OCD daSNVs were enriched at periphery nodes, and ASD daSNVs were equally distributed across core or periphery nodes (Fig. 7B,C). Size-matched, background SNV LD blocks exhibited a relatively even distribution across core and periphery nodes (Fig. 7B, Supplementary Methods). We hypothesize that daSNVs disrupting highly interconnected core nodes would result in severe topological and functional disruption leading to severe pathological phenotypes (Fig. 7D). We also posit that daSNVs in the periphery of genome folding networks might preserve global genome folding but result in slight local topological and functional disruptions that contribute in aggregate to human disease (Fig. 7D).

Discussion
Complex systems from a wide range of orthogonal disciplines can be represented as networks. Networks are defined mathematically as a collection of nodes and the pattern of edges between them. Because data from genome-wide proximity ligation studies are traditionally represented in the form of a square, symmetric array, we reasoned that computational tools and mathematical models from network science might have utility in capturing mesoscale structure not visible to the human eye in maps of higher-order chromosome architecture. Previously, by applying graph theory methods to Hi-C data, we found that the topology of the mammalian genome is organized into nested community structures that correspond to TADs and subTADs 31 . Here, we again demonstrate that graph theory tools have utility in uncovering an additional layer of mesoscale genome folding. We find that looping interactions in a subset of TADs and subTADs can fold into core-periphery network structures. Prior to this study it was unknown whether genome folding at the highest resolution within large scale TADs had a specific mesoscale network structure. Although network algorithms to query mesoscale structure have existed for some time, progress was limited by the paucity of high-resolution Hi-C data required to detect looping interactions genome-wide. Our finding of core-periphery network structure within a subset of chromatin domains was made possible by the very recent publication of the highest read depth Hi-C libraries to date across mouse ES cells, NPCs, and CNs 34 .
Together, our data suggest that understanding the mesoscale network structures of 3-D genome folding in a comprehensive, quantitative manner could shed new light into mechanisms governing how gene expression patterns are altered over long genomic distances by common genetic variation. First, we find that hyper-connected core nodes are highly enriched for the architectural protein CTCF and its knockdown leads to topological disruption of genome folding core-periphery networks. Second, in cortical neurons, we observed that neuropsychiatric daSNVs are co-localized with TADs exhibiting significant core-periphery network structure. Our working hypothesis is that daSNVs enriched at core looping nodes might lead to severe or early onset forms of human disease due to global disruption of large-scale chromatin architecture and gene expression (Fig. 7D). It is tempting to speculate that mutations in the periphery of genome folding networks might preserve global genome folding networks, but result in local topological and functional disruptions contributing slightly to human disease. In light of the growing knowledge that many complex traits and diseases are thought to be driven by a large number www.nature.com/scientificreports www.nature.com/scientificreports/ of common variants with small effect sizes [40][41][42][43] , an exciting area of future inquiry will be to understand the localization and functional importance of human genetic variation contributing to disease and cell type-specific gene expression signatures compared to core and periphery nodes in genome folding networks. www.nature.com/scientificreports www.nature.com/scientificreports/ Core-periphery structure is abundant in a wide range of complex networks across disciplines, and has critical implications for system function and dynamics. Intuitively, the architecture itself allows for the top-down influence, or even control of, core regions on a receptive periphery. In social systems, core-periphery architecture is thought to support an optimal trade-off in the capacity for social groups to both broadcast (to the periphery) and receive (from the periphery) information critical to the group's survival and success 44 . In information processing systems such as Figure 5. In silico mutation of CTCF occupied nodes results in significant ablation of core-periphery network structure. (A) Difference in Q values before and after node deletion in significant core-periphery TADs for three classes of CTCF (top), three classes of enhancers (middle), and four classes of gene TSS (bottom). We deleted nodes that co-localize with target 1-D Epigenome marks (orange) or the same number of random nodes (grey) for each TAD in ES cells. A single asterisk indicates a marginally significant difference in distribution means (two-tailed Mann-Whitney U test, P < 0.05), and a double asterisk indicates a highly significant difference in distribution means (two-tailed Mann-Whitney test U test, P < 0.005). (B) The same analysis was performed for TADs in ES cells that do not exhibit core-periphery network structure (Supplementary Methods). No significant difference was found for any target annotation (two-tailed Mann-Whitney U test). (2019) 9:9526 | https://doi.org/10.1038/s41598-019-45457-9 www.nature.com/scientificreports www.nature.com/scientificreports/ www.nature.com/scientificreports www.nature.com/scientificreports/ the brain, core-periphery structure is thought to support the ability of the core to integrate the multimodal information obtained from the periphery. In transportation systems, often dual or multiple cores are identified, possibly driven by constraints of geography and the evolution of the system (and the culture it supports) over time 30 . www.nature.com/scientificreports www.nature.com/scientificreports/ Core-periphery structure also impacts the manner in which the system evolves over time and responds to exogenous perturbations 45 . In studies of population dynamics, cores are associated with relatively stable population dynamics, while increasing variance and importance of density-independent processes operate at the periphery 46 . Similar inferences have been drawn in the context of human brain dynamics, where a core of brain regions associated with task performance displays more stable dynamics than a periphery of brain regions associated with supportive roles in cognition 47 . Studies of population-level behavior in humans have also provided initial evidence that resources may be over-utilized in the network core, and under-utilized in the network periphery 48 . Finally, the architecture has important implications for robustness to failures, spreading dynamics, or collective behaviors across systems spanning biological, social, and technological domains 49 . Exciting areas of future inquiry will involve the causal studies necessary to determine how the functional roles for core-periphery structure in complex systems may be made manifest in TADs with core-periphery genome folding structure and human gene regulation.
Methods from graph theory and network science have recently been applied to find biologically meaningful patterns in chromatin folding data 31,32,[50][51][52][53][54][55][56] . Multi-dimensional scaling has been used to convert 2-D proximity ligation matrices into 3-D models of the physical folding of chromosomes 50 . Ruan and colleagues assembled a binary network of chromatin interactions between RNA polymerase II bound genomic regions from ChIA-PET data 53 . They reported that RNA II-mediated connections in the human genome resemble a modular, scale-free network in which most nodes are weakly connected and then a small number of nodes have a high number of interactions. The authors hypothesized that chromatin networks may have evolved scale-free properties to better tolerate random perturbations 57 . Babaei and colleagues recently pursued the possibility of predicting co-expressed gene pairs by network metrics computed from mouse cortex Hi-C data such as the Jaccard index, the betweenness centrality, and the clustering coefficient 55 . Similarly, independent studies have explored genome folding patterns at master replication origins, histone modifications and cohesin binding sites using network metrics such as centrality, clustering coefficient, and assortativity 51,52,56 . Finally, we and others have also recently demonstrated that larger TADs and smaller subTADS can be parsed across length scales via the fine-tuning of the structural resolution parameter in a Louvain-like locally greedy algorithm to maximize a modularity quality function 31,32 . Although the concept of analyzing 3-D genome folding as a network has been pursued previously 31,32,50-56 , here we take an entirely new angle, which is to discover mesoscale core-periphery network structure in a subset of mammalian TADs. The ability to accurately quantify core-periphery network structure in Hi-C genome folding maps provides future opportunities to shed new light on how genome sequence influences 3-D structure to govern function across development and during the onset and progression of disease.

Code and Data Availability
Data in this manuscript were downloaded from GEO as detailed in the Supplementary Methods. We will freely share code upon request.