Extensive fragmentation and re-organization of transcription in Systemic Lupus Erythematosus

Systemic Lupus Erythematosus (SLE) is the prototype of autoimmune diseases, characterized by extensive gene expression perturbations in peripheral blood immune cells. Circumstantial evidence suggests that these perturbations may be due to altered epigenetic profiles and chromatin accessibility but the relationship between transcriptional deregulation and genome organization remains largely unstudied. In this work we propose a genomic approach that leverages patterns of gene coexpression from genome-wide transcriptome profiles in order to identify statistically robust Domains of Co-ordinated gene Expression (DCEs). Application of this method on a large transcriptome profiling dataset of 148 SLE patients and 52 healthy individuals enabled the identification of significant disease-associated alterations in gene co-regulation patterns, which also correlate with SLE activity status. Low disease activity patient genomes are characterized by extensive fragmentation leading to overall fewer DCEs of smaller size. High disease activity genomes display extensive redistribution of co-expression domains with expanded and newly-appearing (emerged) DCEs. The dynamics of domain fragmentation and redistribution are associated with SLE clinical endophenotypes, with genes of the interferon pathway being highly enriched in DCEs that become disrupted and with functions associated to more generalized symptoms, being located in domains that emerge anew in high disease activity genomes. Our results suggest strong links between the SLE phenotype and the underlying genome structure and underline an important role for genome organization in shaping gene expression in SLE.


Methods
Analysis of gene expression. We obtained RNA sequencing data from a total of 142 SLE patients and 58 healthy individuals originally published in a SLE transcriptomics study 6 . Both groups contained individuals mainly of Caucasian ethnicity and an approximate ratio of 1:5 for male over female. The sequencing material was derived from whole blood samples. Εxtensive information regarding patient characteristics, mRNA extraction, sequencing protocol, quality control and mapping are thoroughly reported in 6 .
We used FeatureCounts 25 to extract raw counts and quantify expression levels for a comprehensive set of human genes, as compiled under the GENCODE annotation v15 (https ://www.genco degen es.org/human /relea se_15.html, GRCh37). A fragment was counted in case of any overlap with an exon feature and the counts were grouped based on the "gene_name" attribute of the annotation entities. Only fragments with both ends successfully mapped were considered for summarization. Fragments that were chimeric, overlapping multiple metafeatures (genes), not uniquely mapped, or having any read marked as duplicate were discarded.
The initial number of genes included in the raw count table was 51,716. A multi-step filtering approach was adopted. At first, the "type" of each gene was extracted from the annotation GTF file used in fragment summarization. Then, genes belonging to any of the following types were filtered out: "pseudogene", "processed transcript", "polymorphic pseudogene", "antisense", "sense intronic", "sense overlapping", IG_V pseudogene", "IG_C pseudogene", "TR_V pseudogene", "TR_J pseudogene", "IG_J pseudogene", "non_coding", "Mt-tRNA" and "Mt-rRNA". The total number of genes belonging to those categories were 20,190. Subsequently, 167 genes, which had multiple entries in the annotation file, with the same "gene_name", but different chromosome attribute, and could therefore generate errors in the fragment summarization process, were removed from our dataset as well. The number of genes that passed the filtering procedure was 31,318. Of those 27,061 with non-zero values were included in our analysis.
At the final stage, a two-step normalization was implemented on raw counts (filtered for the different irrelevant gene types), using relative log expression (RLE), followed by normalization for gene length.
Stratification of the patient cohort. We grouped patient samples according to a clinical SLE disease activity index (SLEDAI) 26 . A value of 0 for SLEDAI indicates inactive patients and it increases with higher disease severity. Patient samples were separated into three groups. A low disease activity group, with a maximum SLEDAI value of 2, an intermediate, with SLEDAI that ranged from 3 to 8, and a high disease activity group with SLEDAI greater than 8. The number of samples in each group were 55, 61 and 26 respectively. Both Differential and Topological analyses of gene expression have been performed on these three groups in comparison to the healthy group.
Differential gene expression analysis. Differentially expressed genes (DEGs) were called using zero inflated generalized linear models provided by the MDSeq tool 27 . For this analysis we applied an additional filtering layer. Genes with a mean cpm value lower than 0.05 were excluded. The remaining genes were 18,447. Furthermore, we incorporated gender and drug treatment as covariates in our models. DEGs were identified based on both statistical significance and effect size. They were defined as genes with corrected p value lower than or equal to 0.05 and absolute log2(Fold-Change) value greater than or equal to 0.5. Modular analysis of differential gene expression. We followed a gene set enrichment approach, in order to investigate the over-representation of specific functional modules in our dataset. In a gene set enrichment analysis, the objective is to detect functional modules, whose gene members tend to cluster towards the top (or bottom) of a ranked list. Here, we ranked genes according to absolute differential expression values (log 2 |Fold-Change|), and we tested the over-representation of functional groups of genes (modules www.nature.com/scientificreports/ modules are closely related to blood tissue and immunity, as identified by two independent studies 28,29 , that analyzed a plethora of blood gene expression datasets in a variety of conditions. Finally, we assessed statistical significance by applying a CERNO test 30 . We filtered modules according to statistical significance (corrected p value ≤ 0.05) and their DEG content, i.e. at least 15% of gene members had to be DEGs according to our previous analysis.
Weighted gene co-expression network analysis (WGcnA). In order to detect modules of genes with correlated expression independently of their genome topology, we implemented weighted gene co-expression network analysis (WGCNA) 31 . Briefly, WGCNA represents genes as nodes in a network. These are connected to each other by edges, to which an adjacency score is attributed. The adjacency score of a node pair is calculated by a power function of the absolute value of the correlation of the corresponding pair of genes. Here, the soft threshold parameter (the power in the adjacency function) was selected to be 10, according to scale free topology criterion, which suggests choosing the lowest possible value, such that approximate scale free topology is reached in the network. Modules of co-expressed genes were extracted from the network based on the hierarchical clustering of a topological overlap measure and the subsequent implementation of a dynamic cutter. Those initial modules were merged using hierarchical clustering of their eigengene vectors and by cutting the resulted tree at the height of 0.25. The final modules were functionally characterized by utilizing pathway enrichment and calculating the correlations of the module eigengene vectors with a variety of clinical traits.
Robust co-expression matrix calculations. Each chromosome was split in 10 kb bins, starting from the start of the first gene, till the end of the last gene. A mean gene count for every bin was then calculated from the normalized counts of the genes it contained. Bin counts were then grouped according to four sample categories (healthy, low, intermediate and high SLE activity). Next, we calculated the Spearman correlation matrix between all bins that resided in the same chromosome for all samples within each group. Chromosomal bins with zero expression were ignored for the rest of the analysis. This procedure produced a square correlation matrix for each chromosome. To statistically evaluate the correlation coefficients, a Monte-Carlo-like approach was implemented. The bin counts, of each individual separately, were shuffled randomly and afterwards the correlation matrix was re-constructed. That procedure was repeated 1000 times for each chromosome. In every iteration the calculated correlation coefficients were compared to the original correlation coefficients that were calculated using the intact bin counts. The p value for each coefficient was set as equal to the fraction of those 1000 permutations, in which the corresponding coefficient had the same or more extreme value compared to the actual one. The correlation coefficients with p value greater than 0.05 were discarded from the analysis (turned into 0 s).

Definition of domains of coordinated expression (DCEs).
To call domains of co-ordinated expression, we modified a methodology that was introduced for the definition of topologically associated domains (TADs) 32 , in our case, by using expression correlation data instead of chromosomal contact frequencies. Statistically robust expression correlation matrices (see "Methods" section) were used as input. Domains of co-ordinated expression (DCEs) were defined as genomic regions of consecutive chromosomal bins with correlation above average, delimited by statistically significant boundaries. More specifically, DCE detection is a four-step pipeline, which is repeated for each chromosome and for every study group (see Fig. 1a).
1. First, we compute a signal that runs along the chromosomes and is indicative of the local average correlation of expression. We achieve this by sliding two juxtaposed windows of equal size along a chromosome with a single-bin displacement, until the whole chromosome has been covered. In every iteration, we use the correlation matrix that has already been constructed and statistically evaluated. We look up the correlation values concerning the relationship of the two regions and calculate their average. That value is assigned to the chromosomal bin located in the middle, more precisely, the downstream-most bin inside the upstream window. 2. Subsequently, the calculated signal is used to detect DCE boundaries. Hence, the second step of the pipeline is to compute a smoothed function of that signal, using a smoothing spline, and to detect all local minima of that function. DCEs are initially detected as regions between local minima with a value lower than 0.25, which is the average genome signal for the healthy, control group. 3. The third step is to statistically evaluate and refine the boundaries. We estimate the significance of the boundaries by utilizing a Mann-Whitney U test to compare "within" and "in-between" correlation coefficients. In case any of the initially calculated boundaries does not reach the required statistical significance threshold (p value > 0.05), we "chop" that boundary by one bin towards the centre of DCE, and repeat the test. DCEs with any remaining non-significant boundary are discarded. 4. In the last step we fuse neighbouring DCEs, according to the following criteria: (a) they are separated by at most two bins with bin signal < 0.25. (b) The total number of such "intervening" low signal bins is less or equal to 2. This means that no more than three neighbouring DCEs may be fused in one and that fusion events cannot span intervening sequences longer than two bin sizes. This step enhances the robustness of the pipeline and decreases the noise in our data. The window size used in bin-signal calculation and in boundary evaluation was set to be equal to 3, based on the maximization of average intra-DCE correlation of chromosome 1 of the healthy group.
cell type estimation and entropy calculation. We used the results of CIBERSORT 33  where H is the Shannon (information) entropy, p(xi) is the estimated proportion of x i cell type in whole blood and n is the total number of estimated cell types. Entropy was calculated for every individual in the dataset. Subsequently, the difference between the distribution of entropies of healthy and SLE groups were statistically evaluated by a non parametric Wilcoxon-Mann-Whitney test.
Dce analysis. Two different metrics were applied to explore the differences of DCE sets of different groups.
DCEs were handled as a set of chromosomal intervals. The first metric used was the Jaccard similarity coefficient. DCE pairs between two different groups (e.g. healthy and patient groups) with chromosomal coordinates that overlap were detected. For every pair the Jaccard index was calculated. The second metric was the BP distance score 34 . BPscore takes into account the relative chromosome size and thus may provide a more nuanced We start by calculating the expression profile of each chromosomal bin using the expression profile of the encompassed genes (i). We then calculate the correlation coefficients between the bins located on the same chromosome (ii). Next, the correlation profile of each chromosome is transformed into a one-dimensional binsignal profile (iii). We analyze that profile, detecting local minima and maxima in order to determine the borders of the domains. Finally, a statistical evaluation of those borders results in the final DCE coordinates (iv). www.nature.com/scientificreports/ assessment of coordinate similarity. For the calculation of BP-score we used a publicly available python script, (https ://githu b.com/rz6/bp-metri c), that is provided by the authors.
"Disruptor gene" definition. "Disruptor" genes are defined as those, which reside between two "patient DCEs" derived from a split of a "healthy DCE". More precisely, for each "split" characterized healthy DCE, attributed genes were listed. Subsequently, they were compared as a set with those genes, attributed to patient DCEs, that overlapped the initial healthy DCE. Genes absent from the patient DCEs and present in the healthy DCE were characterized as disruptors. Moreover, those genes were filtered, in order to capture only the genes located in the area between the patient DCEs (the locus of the "split" event). That process was repeated for every distinct patient group.

Results
Gene co-expression patterns are fragmented in SLe patients. Neighbouring gene expression correlation and modelling 16 have been recently introduced to define how gene expression propagates in space. We employed a topologically-inspired approach to quantify the correlation of gene expression genome-wide. After splitting each chromosome in fixed-size bins, we calculated the transcript count correlation and defined regions of significant co-expression based on a permutation test, followed by local minima localization (see "Methods" section). The domains of co-ordinated expression (DCEs) produced through this analysis are supported by permutation analysis involving 1000 random reshuffling events of transcript counts along each chromosome (see "Methods" section, Fig. 1a). In this respect, they correspond to statistically robust chromosomal domains, within which gene co-expression is significantly higher, when compared to the surrounding regions. Analysis of DCE patterns between SLE and healthy individuals shows significant differences, with SLE gene co-expression being organized into smaller and more fragmented regions. This finding is not confined to specific chromosomes, although gene-dense chromosomes with a more compact transcript pattern show increased overall signal (Fig. 1b). Notably, DCE patterns correlate with the activity of the disease (SLEDAI). We found that DCE sizes are smaller in low activity patients, where the percentage of the genome organized in DCE does not exceed 9% as compared to 13% and 17%, for intermediate and high activity respectively, and 19% for healthy individuals. Decreased gene co-expression in SLE patients is evidenced by the: (a) significantly lower numbers of total DCE for low and intermediate disease activity (Fig. 1b), (b) smaller DCE sizes (Fig. 1c) (c) decreased co-expression signal (Fig. 1d) and, (d) smaller overall percentage of the genome covered by DCEs (Fig. 1e).
It is important to note, that the observed differences cannot be explained by batch effects in either sequencing output or the genomic distribution of reads. Sequencing throughput was very similar for all disease activity groups ( Supplementary Figure 1), as was the overall distribution of mapped reads in the annotated transcriptome (Supplementary Figure 2). Differences in DCE patterns cannot be attributed to cell type heterogeneity either, as shown by an entropy analysis of cell type variability (Supplementary Figure 3). Thus, the more fragmented expression patterns in low activity SLE genomes are most likely due to generalized perturbations in gene regulation, which could provide a mechanistical explanation for the recurrent flares that tend to develop in patients who are inactive. This may indicate that, while a desirable outcome, clinical remission may not necessarily be lacking a molecular fingerprint and the combination of the recently suggested susceptibility signature 6 with our fragmented DCE pattern may provide an interesting framework for the assessment of its stability.
Dces are dynamically redistributed in SLe. To gain additional insight into the dynamics of DCEs, we classified DCE patterns into four main groups according to their changes between patient and healthy genomes. We used an implementation of the Jaccard Index to group the domains into: (a) DCEs that were left intact, (b) DCEs that were absent (depleted) in patients while present in healthy individuals, (c) DCEs that were only present (emerged) in patients and, (d) DCEs whose coordinates were altered between patient and healthy genomes. The last group was further categorized into DCEs that were split (one fragmented into two or more smaller sub-DCEs) or merged (two or more joined into one larger) and expanded or contracted.
Low and high disease activity patients showed the most extensive changes in the pattern of DCEs as compared to the healthy state (Fig. 2a). A detailed analysis shows that, in agreement with the changes observed at genomescale level (Fig. 1), there is extensive fragmentation and redistribution of domains in SLE versus healthy genomes. Contraction and depletion of DCEs are more pronounced in low activity patients, with contracted and depleted DCEs corresponding to nearly 73% of DCEs in low activity, as compared to 56% and 48% for intermediate and high disease activity genomes, respectively (Fig. 2a). Conversely, expanded and emerged DCEs comprise over 30% in high activity versus less than 10% in low activity patients (Fig. 2a). These observations are suggestive of different modes of dynamic changes in co-expression domains, with low SLE activity genomes characterized by DCE fragmentation and high activity ones featuring a redistribution of co-expression with increased percentages of expanded and emerged DCEs. This redistribution was also supported by a simple value measure of DCE pattern similarity, calculated with the implementation of BPscore 34 , which showed that in spite of being comparable in genome coverage, the DCEs between high activity patients and healthy controls were radically different in terms of genomic localization (Supplementary Figure 4).

Gene expression changes are reflected upon DCE dynamics.
Changes in the patterns of co-expression may be linked to differential gene expression and underlying chromatin dynamics. To address this, we employed Modular and Weighted Gene Co-Expression Network Analysis (WGCNA) 31 of differential gene expression on the three disease activity groups against healthy individuals. The results were strongly suggestive of quantifiable phenotypic variability between patients with different clinical activity states, in agreement with the previously defined susceptibility and severity gene signatures (Supplementary Figure 5). In addition, we were www.nature.com/scientificreports/ able to define gene expression modules and to correlate them to clinical characteristics such as disease activity. Comparison of WGCNA results with clinical characteristics of the cohort samples allowed us to identifiy a "SLE-DAI gene module", which comprised 224 genes, enriched for innate and adaptive immune pathways, particularly signaling through the Fc-γ and B-cell receptors (BCR). A "Nephritis module" (184 genes) and an "IFN module" (282 genes) were also identified, the latter being highly associated with anti-nuclear and anti-DNA antibodies (Supplementary Figure 6). In order to investigate how changes in the gene expression of these modules may be correlated to the patterns of positional co-expression, we analyzed the degree of overlap between the different DCE categories and the WGCNA modules obtained from our dataset (Fig. 2b). The IFN module was over-represented in split DCEs across all SLE groups and was particularly enriched in DCEs that become depleted within the high activity group, implying that increased IFN pathway gene activity may be linked to loss of co-expression structure. Conversely, the nephritis/neutrophil-specific module was enriched in emergent DCEs from low disease activity genomes. This may be indicative of underlying tendencies in gene deregulation being present even in patients without developed symptoms but who may yet be predisposed to disease flares. Consistent with observations at the level of functional enrichments, the B-cell module was enriched in DCEs that are depleted and largely absent from high disease activity DCEs. Taken together, these findings indicate that functional aspects of gene expression pertaining to distinct clinical characteristics are reflected on the genome organization.
Dce dynamics are strongly associated with chromatin accessibility and chromosomal compartments. Transcriptional coordination in self-contained domains is tightly linked to underlying chromatin organization at various levels ranging from topologically associated domains (TADs) to more extended chromosomal compartments. We went on to correlate the dynamics of DCE patterns with underlying genomic features related to chromatin accessibility and chromosomal compartmentalization. By comparing the coordi- www.nature.com/scientificreports/ nates of stable and dynamic DCEs against ATAC-Seq peaks defined for B-cells in severe-case SLE against healthy individuals 8 , we found split, contracted and merged DCEs (of all disease activity groups) to be enriched in peaks of decreased chromatin accessibility (Supplementary Figure 7). Conversely, depleted and emerged DCEs of all SLE activity groups were enriched (although with a smaller effect size), almost exclusively, in over-accessible regions. This finding suggests a clear distinction between the DCEs that are locally modified, which tend to be confined in under-accessible regions, and those that are dynamically re-distributed, which are preferentially located in more accessible chromatin. We performed a similar analysis at the level of chromosomal compartments (at 100kbp resolution) as defined in a B-lymphoblastoid cell line 9 . On a large scale, chromosomes may be organised into two broad compartments labelled A and B, corresponding to active and inactive chromatin, and also bearing other distinct properties. These may be further subdivided to A1 and A2 and B1 to B4 9 . A chromosomal coordinate overlap enrichment analysis showed DCEs to be generally enriched in the euchromatic A compartment (Fig. 2c). When focusing on specific DCE subtypes, we found that regions, belonging to the most dynamic subsets of emerged and depleted DCEs, were enriched in the A2 subcompartment, which is associated with late-replicating, low GC content DNA, enriched in H3K9me3 and longer gene transcripts 9 . On the other hand, intact DCEs and in general, DCEs that are less dynamic, appear to be more enriched in the gene dense, early-replicating A1 subcompartment. Enrichments in the B4 subcompartments are probably due to the over-representation of particular DCEs in chromosome 19, which hosts the entirety of this very small subcompartment.
Together, the differential enrichments of split and contracted DCEs, compared to the dynamically redistributed emerged and depleted regions, in terms of chromatin accessibility and genome compartments, indicate an interplay between gene regulation and underlying chromatin environment. Regions of high gene density tend to have highly correlated gene expression, but this pattern changes radically with the splitting of co-expression domains in low disease activity and the emergence of new, probably re-arranged domains in high disease activity SLE patients. We hypothesize that epigenetic changes that increase chromatin accessibility, in particular in A2 genomic compartments, may create a permissive environment for the redistribution of co-regulated genomic domains, which are, moreover, associated with functions characteristic of increased disease activity.

Dce splits disrupt enhancer-promoter interactions of key biological functions. While split
DCEs represent no more than 5-10% of the total genome coverage, they are highly enriched among differentially expressed genes and in particular with the IFN gene module. Given their additional enrichment in low disease activity patients and therefore, their possible implication in further disease progression, we performed a focused analysis of split DCEs and the genes lying on their boundaries (see "Methods" section). These were predominantly enriched among the targets of specific transcriptional regulators, a number of which belonged to the broad categories of zinc fingers (SALL1, Ikaros, ZIC3 etc.) and oncogenes (GLI1, ING4) (Fig. 3a). Members of the Ikaros transcriptional regulators have been genetically associated with SLE 2 , and interestingly, IKZF3 lies within a disrupted DCE in all SLE groups.
Based on the differences in the extent of split DCEs between low and high activity genomes, we next assessed their overlaps with the SLE susceptibility and severity gene signatures as previously defined for the same dataset 6 . We found significant differences between the two gene sets with susceptibility genes being highly enriched in split DCEs in contrast to a depletion of severity genes (Fig. 3b). Genes belonging to the susceptibility signature are also enriched in the subset of differentially expressed genes that are found in low disease activity split DCE boundaries (p = 0.0061). Protein and regulatory interaction network analysis of these genes, performed through STRING-DB 35 , revealed an IFN gene signature (Fig. 3c) and interestingly, a set of highly connected genes associated with DAP12 signaling (Fig. 3c, cyan polygon). DAP12 (TYROBP) is a key activator of NK cells, which are reported to have impaired function in SLE patients 36 . Smaller network modules were associated with neutrophils (lime) and B-cells (yellow). We may thus see how, by focusing on split DCE regions we may prioritize genes of the broader susceptibility signature and to investigate their functional connections.
Given the DCE definition as regions with increased regulatory interactions, it is plausible to expect that gene promoters are more likely to be associated with enhancers that are lying within the same region. To test this hypothesis, we obtained cell-type specific promoter-enhancer interactions for CD4, CD8 and CD14 and CD19 cells from Enhancer Atlas 37 and identified genes whose promoter-enhancer pairs were nested within the same DCE in the healthy state but disrupted in SLE. We found that a significant percentage of enhancers-promoter connections that are completely nested in healthy DCEs are disrupted by a DCE split or depletion in one of the SLE disease activity states. Thus, it seems that the redistribution of gene co-regulation domains in disease may also be disrupting the regulatory links between enhancers and their cognate promoters.
Functional enrichment of the genes, whose enhancer-promoter associations are disrupted in SLE, revealed relevant biological functions (Fig. 3d). More specifically, functions related to the immune system are, as expected, highly enriched in all disease activity groups. Others, such as protein metabolism, translation and protein turnover are particularly enriched in high disease activity patients. Interestingly, interleukin-15 (Il15) and interleukin-21 (Il21) signaling are specifically enriched in high activity patients even though with low effect sizes (Fig. 3d). In the context of SLE, increased Il15 levels may regulate the function of NK cells and also enhance the expression of the costimulatory receptor CD40L (CD154) on T-cells via STAT5 38 . Interleukin-21 is released by CD4+ T follicular helper cells and plays an important role in SLE pathogenesis by promoting the maturation of B-cells into autoantibodies-producing plasma cells 39 . More interestingly, the receptors of Il15 and Il21 share the common gamma chain (γc) subunit (CD132) and mediate intracellular effects through activation of the Janus kinase (JAK)-1 and JAK-3 kinases, which are implicated in SLE and are currently tested as putative therapeutic targets 40

Discussion
Genome organization is intricately linked to gene expression and regulation in health and disease, with differentially expressed genes creating clusters under various conditions. Our study, the first such conducted in SLE, shows that genes are organized in extended domains of coordinated expression but, moreover, that these domains are highly dynamic and extensively reorganized during disease progression. While high activity patient patterns are suggestive of a general re-organization of gene regulation that extends to broader chromosomal domains, increased fragmentation of gene co-expression is observed even in the genomes of patients with low disease activity. This may suggest that the observed disruptive patterns of gene expression are be related to the way initial cellular signals propagate in the genome in order to affect hundreds of abnormally regulated genes. Thus, the more disconnected co-expression in low activity SLE genomes could be linked to mechanisms, with which flares occur even in patients that are in remission. While, the governing principles of such mechanisms are yet to be resolved, our analyses suggest a key role for the chromatin environment. Differential enrichment of DCE patterns between open and closed chromatin and chromosomal compartments pertaining to early and late-replicating chromatin, are strong indications of epigenetic patterns underlying the fragmentation and re-organization of gene co-expression. Epigenetic effects, downstream of environmental triggers are expected to lie at the basis of SLE aetiopathogenesis, given the limited association of genetic factors reported for the disease. Further investigation of the mechanisms linking chromatin structure and the organization of gene expression in SLE could be assisted by our approach, through the prioritization of chromosomal domains with increased regulatory potential.
Besides epigenetic phenomena, the formation of co-expression domains could occur more transiently as the result of differential expression in any given setting 41 , through the clustering of differentially expressed genes, that have been positionally constrained through evolution 20,42 . Such a notion is supported by our data in two ways. First through the association of the observed DCEs with functions that are known to be activated in SLE. Major pathways related to the intensity of the symptoms (such as the IFN signature) are associated with the disruption of co-expression, while downstream effects of SLE, related to the damage of organs (e.g. nephritis) are correlated www.nature.com/scientificreports/ with the general re-organization of co-expression in emergent domains. In addition gene signatures from both expression and genome-wide association data are enriched in various types of DCEs (Supplementary Figure 8), a strong indication that transcriptomic as well as genetic data may reveal a hidden layer of information when studied through the lens of genome organization. The dynamics of co-expression clustering are also linked to differential expression, through the tendency of deregulated genes to occur in the boundaries of split DCEs. Inspection of the dynamics of DCE splits is, moreover, indicative of the general pattern of fragmentation and redistribution as is showcased in a number of examples where, compared to a contiguous DCE pattern in the healthy state, we observe splits in low disease activity and more generalized reorganization in high disease activity patients (Fig. 4). The fact that split/disrupted regions are more prominent in low disease activity genomes, combined with their proximity to genes belonging to the susceptibility signature, may come as an indication of an underlying hierarchy behind the gene deregulation  www.nature.com/scientificreports/ program. Indeed, we find enhancer-promoter associations of high relevance to be affected by the disrupted patterns of gene co-expression, which is strongly indicative of DCE splits having a possible multiplicative effect on gene regulation. Overall, our findings suggest that disruption of co-regulation patterns may represent a hallmark of the disease and that, moreover, that this process is constrained by epigenetic factors and the overall chromatin conformation (Fig. 5). The approach we present here constitutes a first attempt to analyze gene expression at the level of genome organization in a complex disease and points to a number of interesting hypotheses linking the SLE phenotype with the underlying genome structure. Targeted conformation capture experiments on homogeneous cell cultures could be implemented in order to test these hypotheses. At the same time, the implementation of singlecell approaches at both transcriptome and genome conformation levels, could provide a data-rich framework for the application of our approach, with the final aim of obtaining cell-type specific co-expression profiles at increased resolution. Figure 5. Patterns of gene co-expression in SLE. Graphical representation of the most prominent characteristics of gene co-expression patterns. Healthy genomes have extended domains of co-ordinated gene expression (DCE), but these become shorter and more fragmented in the genomes of patients with low disease activity. A significant number of differentially expressed genes (DEGs) in SLE low-activity genomes are associated with a disease "susceptibility" signature, located in split DCEs, linked to interferon and other signaling pathways and enriched in regions of low chromatin accessibility. In high disease activity genomes DCEs are re-distributed in new regions, where genes linked a SLE "severity" signature and more generalized manifestations of the disease (e.g. nephritis) localize in areas of DCE contraction, depletion and overall increased chromatin accessibility.