Non-backtracking walks reveal compartments in sparse chromatin interaction networks

Chromatin communities stabilized by protein machinery play an essential role in gene regulation and refine the global polymeric folding of the chromatin fiber. However, treatment of these communities in the framework of classical network theory (the stochastic block model, SBM) does not take into account the intrinsic linear connectivity of the chromatin loci. Here we propose the polymer block model, paving the way for community detection in polymer networks. On the basis of this new model we modify the non-backtracking flow operator and suggest the first protocol for annotation of compartmental domains in sparse single-cell Hi-C matrices. In particular, we prove that our approach corresponds to the maximum entropy principle. The benchmark analyses demonstrate that the spectrum of the polymer non-backtracking operator resolves the true compartmental structure up to the theoretical detectability threshold, while all commonly used operators fail above it. We test various operators on real data and conclude that the sizes of the non-backtracking single-cell domains are closest to the sizes of compartments from the population data. Moreover, the found domains clearly segregate in gene density and correlate with the population compartmental mask, corroborating the biological significance of our annotation of the chromatin compartmental domains in single-cell Hi-C matrices.

Interest in polymer modular networks has appeared recently in the context of spatial genome folding. Proximity of chromatin loci in space is believed to be deeply connected with gene regulation and function. Hi-C experiments [13,14,16] provide genome-wide colocalization data of chromatin loci. As the main outcome of the experiment, large genome-wide matrices of contacts from each individual cell or from the population are produced. Analyses of these matrices have revealed that the eukaryotic genome is organized in various biologically relevant communities, whose main function is to insulate certain regions of DNA and to provide easy access to others. In particular, the data collected from a population of cells suggest that transcribed ("active") chromatin segregates from the "inactive" one, forming two compartments in the volume of the nucleus [14,18]. Inside the compartments chromatin is organized further into a set of topologically-associated domains (TADs) [19][20][21] that regulate chromatin folding at finer scales. However, the interpretation and validation of communities in individual cells remain poorly defined due to the sparsity of the respective data.
The broad field of applications of stochastic modular networks has stimulated rapid development of community detection methods. Spectral algorithms exploit the spectrum of various operators (adjacency, Laplacian, modularity) defined on a network to identify the number of communities and to infer the optimal network partition [22][23][24][25][26]. Typically, the leading eigenvectors of these operators correlate positively with the true community structure. These algorithms, along with the majority of theoretical results in the field, are derived for the stochastic block model (SBM) [22,26], an extension of Erdős-Rényi graphs [27] with explicitly defined communities. One of the strongest limitations of the SBM is that the edges between vertices belonging to the same cluster inevitably attain equal weights. At the same time, biological networks typically have several levels of organization within their communities [28]. In particular, identification of several hierarchical levels in the network becomes tremendously important in the case of polymer networks, where different pairs of loci have marginally different probabilities of forming a spatial contact, caused by the frozen linear connectivity along the chain.
Already for the simplest polymer systems the contact probability demonstrates power-law behavior, with the exponent characterizing universal long-range properties of the folding and depending on the space dimension [15]. In this work we propose the "polymer stochastic block model", which reflects a particular global polymer organization of the network together with explicit structuring into communities. The main new ingredient is the average contact probability $P(s = |i - j|)$ between pairs of loci $(i, j)$, which is constant for non-polymeric networks but cannot be neglected for polymers.
Chromatin single-cell networks are not only polymeric, but also sparse [16,30]. It is known that upon reduction of the total number of edges in the network, there is a fundamental limit beyond which no community detection method can succeed [26,31]. Furthermore, traditional operators (adjacency, Laplacian, modularity) fail far above this resolution limit, i.e. their leading eigenvectors become uncorrelated with the true community structure above the threshold [32]. This is explained by the "Lifshitz tails" at the edge of their spectral density, which are a manifestation of the localization of simple random walks on hubs of a sparse network [33,34]. Localization on hubs rather than on true communities is a drawback of all conventional spectral methods in the sparse regime.
To prevent localization on hubs and to make spectral methods useful in the sparse regime, Krzakala et al. proposed to use non-backtracking random walks on a directed graph, which cannot return to the node visited at the previous step [32]. The crucial property of non-backtracking walks [35] is that they do not concentrate on hubs. It has been shown that the non-backtracking operator is able to resolve the community structure in the sparse stochastic block model up to the theoretical resolution limit. The spectrum of the non-backtracking operator (which is a non-symmetric matrix with complex eigenvalues) consists of the bulk within a disc and a number of isolated eigenvalues on the real axis.
For the sake of community detection in sparse polymer networks we construct polymer-type non-backtracking walks, appropriate for community detection in graphs with hidden linear memory. We establish the connection between this operator and the generalized polymer modularity, thus bridging the gap to the maximum entropy principle. We test the performance of different spectral methods (with and without polymer background) on sparse artificial benchmarks of polymer networks that mimic compartmentalization in single-cell Hi-C graphs. We show that the polymer non-backtracking walks resolve the structure of communities up to the detectability threshold, while all other operators fail above it. In order to demonstrate the efficiency of the method on real data, we partition a set of single-cell Hi-C contact maps of mouse oocytes into active (A) and inactive (B) compartments using different operators. The found domains are shown to have sizes similar to those of the compartmental domains in the population-averaged data. Analysis of the GC content within the domains demonstrates enrichment and depletion of gene density in the two clusters and corroborates their biological significance.
The structure of the paper is as follows. In Section II we propose the polymer stochastic block model and derive the entropy and the corresponding generalized modularity functional. In Section III we construct the polymer non-backtracking walks, demonstrate their robustness on benchmarks emulating compartments and apply them to oocyte single-cell data. In Section IV we draw the conclusions.
Consider a polymer chain of $N$ beads, $i = 1, 2, ..., N$, with spatial configuration $\{x_i\}$, and construct a corresponding topological graph $G = (V, E)$ (as described below) with the adjacency matrix $A_{ij}$. Such graphs are typically constructed upon processing of chromatin single-cell Hi-C data and in computer simulations of DNA folding [13,14]. The graph $G$ does not contain the pairwise spatial distances of the polymer configuration; however, it provides information on spatial proximity, which is usually of major biological relevance. For the 1-bin resolution of $G$ the polymer beads (bins) are the nodes $V$. The edge between a pair of nodes $(i, j)$ is defined by the condition $(i, j) \in E$ iff $|x_i - x_j| < \varepsilon$, where the threshold $\varepsilon$ is the cutoff radius within which contacts between two loci are registered in Hi-C. Due to the finite excluded volume of chromatin, the theoretical number of contacts per monomer that can be registered in single-cell experiments is of the order of a few units, while the total size of the polymer chain is huge ($N \sim O(10^5)$ at 1-kb resolution for human chromosomes). Thus, the single-cell contact matrices are essentially sparse [16,30]. Summation over realizations of adjacency matrices $A_{ij}$ obtained from different cells results in a "population-averaged" matrix $\mathcal{A}_{ij}$. By construction, the entries of the weight matrix $\mathcal{A}_{ij}$ are proportional to the probability that the spatial distance between the monomers $(i, j)$ is less than $\varepsilon$.
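The contact rule $|x_i - x_j| < \varepsilon$ can be illustrated with a minimal sketch. The function name `contact_graph` and the dense distance computation are illustrative assumptions, not part of the original protocol; for chains of realistic length a neighbour search would be needed instead of an $N \times N$ distance matrix.

```python
import numpy as np

def contact_graph(coords, eps):
    """Adjacency matrix of the contact graph of a bead configuration.

    coords: array of shape (N, dim) with bead positions (hypothetical input).
    eps:    cutoff radius; beads closer than eps are linked by an edge.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # all pairwise distances
    adj = (dist < eps).astype(int)
    np.fill_diagonal(adj, 0)                   # no self-edges
    return adj
```

For a short chain, `contact_graph(coords, eps).sum() // 2` gives the number of registered contacts.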
Already for the simplest configurations of a polymer chain, such as an ideal polymer chain isomorphic to the random walk, $\mathcal{A}$ is not expected to be a homogeneous matrix. This is due to the polymeric power-law behaviour of the contact probability,

$$P_{ij} = P(s = |i - j|) \sim s^{-\alpha}. \tag{1}$$

In polymer physics the critical exponent $\alpha$ is an important parameter, characterizing the memory in the chain and the dimension of the embedding space [15]. The memory can arise from some peculiar topological state of chromatin or be a result of partial relaxation of mitotic chromosomes [38]. Notable examples of $\alpha$ that typically appear in the chromatin context are $\alpha = 3/2$ for the ideal chain and $\alpha \approx 1$ for the crumpled globule (both in 3D space) [14,29,39,40].
Communities of folded chromatin refine the background contact probability at small scales and are biologically significant. We treat them as canonical stochastic blocks [22,26] superimposed over the background. The stochastic block model is a network model in which the $N$ nodes of a network are split into $q$ different groups $G_r$, $r = 1, 2, ..., q$, and the edges between each pair of nodes are distributed independently with a probability that depends on the group labels ("colors") of the respective nodes. That is, there is a matrix of pairwise group probabilities $\Omega = \{\omega_{rt}\}$ with $r, t = 1, 2, ..., q$, and a randomly chosen pair of nodes $(i, j)$ belonging to groups $i \in G_r$, $j \in G_t$ is linked by an edge with probability $\omega_{rt}$. The corresponding entry in the adjacency matrix $A_{ij}$ is 1 with probability $\omega_{rt}$ and 0 otherwise. The sum of many such "single-cell" Bernoulli matrices generates an analogue of the "population-averaged" Hi-C matrix $\mathcal{A}$ with a Poisson distributed number of contacts with the mean $\lambda_{ij} = \omega_{rt}$, where $i \in G_r$, $j \in G_t$. To a first approximation, the communities can be considered identical (also known as the planted version of the model):

$$\omega_{rt} = \begin{cases} w_{in}, & r = t, \\ w_{out}, & r \neq t. \end{cases} \tag{2}$$

Having (1) and (2), the simplest assumption one can make is that the formation of compartments in chromatin is independent of the global memory of the folding. Indeed, the phenomenon of compartments is likely related to the preferential interactions of nodes of the same epigenetic type (e.g., "active" or "inactive") and is modelled as a phase separation of block copolymers [17]. This allows us to propose that (1) and (2) factorize, so that the final probability for the edge $(i, j)$ reads

$$p_{ij} = P_{ij}\, \omega_{g_i g_j}, \tag{3}$$

where $g_i$ is the group label of node $i$.

To emulate compartments in a single-cell Hi-C network, we consider a simple adjacency benchmark of a polymer with two contiguous communities. Namely, we color the chain into alternating segments of A and B type, whose lengths are Poisson distributed with the mean length $\lambda$. An example of the resulting adjacency matrix is depicted in Fig. 1(a). Note that due to the decay of the contact probability, the "checkerboard" compartmentalization pattern is hardly seen in single-cell Hi-C data [30,42]. Because segments of the same type are surrounded by segments of the other type, they form local "blob-like" clusters along the main diagonal of the adjacency matrix, reminiscent of topologically-associated domains [19]; however, they are likely formed by a different mechanism and are an order of magnitude larger than TADs [17]. Such a multi-domain blob structure in Fig. 1(a) is a reflection of the polymeric nature of the network and cannot be reproduced with communities of general networks, i.e. in the framework of the canonical stochastic block model with two clusters; see Fig. 1(b) for comparison.
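The benchmark described above can be sketched as follows. This is a minimal illustration under stated assumptions: the edge probability is taken as the product of the background $P(s) = s^{-\alpha}$ and the block weight ($w_{in}$ for equal labels, $w_{out}$ otherwise), clipped to 1; the function name `polymer_sbm`, the minimum segment length of one bead, and the seeding are hypothetical choices not specified in the text.

```python
import numpy as np

def polymer_sbm(n=1000, mean_len=100, w_in=1.0, w_out=0.1, alpha=1.0, seed=0):
    """Sample one sparse adjacency matrix of a two-compartment polymer SBM."""
    rng = np.random.default_rng(seed)
    # Alternating segments with Poisson-distributed lengths (at least one bead).
    labels = np.empty(n, dtype=int)
    pos, colour = 0, 1
    while pos < n:
        length = max(1, rng.poisson(mean_len))
        labels[pos:pos + length] = colour
        pos += length
        colour = -colour
    idx = np.arange(n)
    s = np.abs(idx[:, None] - idx[None, :]).astype(float)
    np.fill_diagonal(s, 1.0)          # placeholder; the diagonal is never sampled
    same = labels[:, None] == labels[None, :]
    prob = np.clip(s ** (-alpha) * np.where(same, w_in, w_out), 0.0, 1.0)
    # Sample each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels
```

With `w_in = 1` and `alpha = 1`, neighbouring beads of the same type are always linked, while the backbone link across a segment boundary is present only with probability `w_out`; whether the benchmark in the text enforces the full backbone is not stated, so this detail is an assumption of the sketch.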

B. Statistical inference of polymer SBM and generalized modularity functional
Suppose that the population-averaged matrix $\mathcal{A}$ is observed. The statistical weight of $\mathcal{A}$ conditioned on the cluster probability matrix $\Omega$, the background contact probability $P$ and the group labels of the nodes $\{g_i\}$ reads

$$Z(\mathcal{A}\,|\,\Omega, P, \{g_i\}) = \prod_{i<j} \frac{\left(P_{ij}\,\omega_{g_i g_j}\right)^{\mathcal{A}_{ij}}}{\mathcal{A}_{ij}!}\, \exp\left(-P_{ij}\,\omega_{g_i g_j}\right), \tag{4}$$

where the product runs over all pairs of nodes in the network. Since there are no self-edges in the network, all the diagonal elements of the matrix $\mathcal{A}$ are zero and we do not include them in the product (4). The corresponding partition entropy of the polymer SBM is

$$\log Z(\mathcal{A}\,|\,\Omega, P, \{g_i\}) = \sum_{i<j} \left( \mathcal{A}_{ij} \log \omega_{g_i g_j} - P_{ij}\,\omega_{g_i g_j} \right), \tag{5}$$

where we have omitted the constant terms $-\log \mathcal{A}_{ij}!$ and $\mathcal{A}_{ij} \log P_{ij}$, which are independent of the partition. For identical communities (see (2)), we get

$$\omega_{g_i g_j} = (w_{in} - w_{out})\,\delta_{g_i g_j} + w_{out}, \qquad \log \omega_{g_i g_j} = (\log w_{in} - \log w_{out})\,\delta_{g_i g_j} + \log w_{out}. \tag{6}$$

Taking into account (6) and omitting again all irrelevant constant terms, we arrive at the final expression for the entropy (5),

$$\log Z(\mathcal{A}\,|\,\Omega, P, \{g_i\}) = \beta^{-1} \sum_{i<j} \left( \mathcal{A}_{ij} - \gamma P_{ij} \right) \delta_{g_i g_j}, \tag{7}$$

where $\beta = (\log w_{in} - \log w_{out})^{-1}$ is some coefficient and

$$\gamma = \frac{w_{in} - w_{out}}{\log w_{in} - \log w_{out}} \tag{8}$$

is a parameter describing the cluster probabilities inherited from the initial definition of the stochastic blocks.
The entropic functional (7), up to normalization coefficients and constant terms, is the generalized modularity functional. For $P_{ij} = d_i d_j / \sum_k d_k$, where $d$ is the vector of degrees, (7) reduces to the modularity proposed by M. Newman [3,43] for spectral community detection in scale-free networks. The operator of the generalized modularity reads

$$Q_{ij} = \mathcal{A}_{ij} - \gamma P_{ij}. \tag{9}$$

The second term in (9) can be understood as the expected number of contacts between the nodes $(i, j)$ in the population-averaged data or as the probability of a link in the single-cell graph. Indeed, without the stochastic blocks, this value equals $P_{ij}$ by definition. The factor $\gamma$ accounts for the cluster structure superimposed over the background. In the limit of "weak" communities, when $w_{in} = w_{out} \to 1$, the partition parameter yields $\gamma \to 1$, which corresponds to the pure background. To determine the optimal value of $\gamma$, one can run a recursive procedure, which consists of iterative maximization of the generalized modularity and renormalization of $\gamma$ according to (8). We implement this approach in our numerical analyses below.
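A minimal sketch of spectral partitioning with the generalized modularity may help here. It assumes that the operator has the form $Q = A - \gamma P$ and that the resolution parameter takes the form $\gamma = (w_{in} - w_{out}) / (\log w_{in} - \log w_{out})$, which is the form that follows from a Poisson likelihood with identical communities and satisfies $\gamma \to 1$ as $w_{in} = w_{out} \to 1$. The function names are hypothetical.

```python
import numpy as np

def gamma_from_weights(w_in, w_out):
    """Assumed form of the resolution parameter for identical communities."""
    if np.isclose(w_in, w_out):
        return float(w_in)            # limit of the ratio as w_out -> w_in
    return (w_in - w_out) / (np.log(w_in) - np.log(w_out))

def modularity_partition(adj, background, gamma):
    """Label nodes +/-1 by the sign of the leading eigenvector of A - gamma * P."""
    q = np.asarray(adj, dtype=float) - gamma * np.asarray(background, dtype=float)
    q = 0.5 * (q + q.T)               # guard against round-off asymmetry
    _, eigvecs = np.linalg.eigh(q)    # eigenvalues are returned in ascending order
    return np.where(eigvecs[:, -1] >= 0, 1, -1)
```

The overall sign of an eigenvector is arbitrary, so the two labels may come out swapped between runs or platforms; only the grouping of nodes is meaningful.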

III. POLYMER NON-BACKTRACKING FLOW OPERATOR
A. Non-backtracking walks on a directed polymer network
The search for the global maximum of the modularity functional is a very hard problem. One of the most promising approaches, which avoids brute force, is to suggest that if the community structure is sufficiently strong, there is an operator whose eigenvectors encode the partitioning of the network into these communities [3,26]. However, as was first noted by Krzakala et al. [32], for sparse networks the leading eigenvectors become uncorrelated with the true community structure well above the theoretical threshold. As a result, all conventional operators such as the adjacency, Laplacian and modularity fail to find communities in rather sparse networks.
To overcome this difficulty, it was proposed to exploit the spectrum of the Hashimoto matrix $B$, which is a transfer matrix of non-backtracking walks on a graph [35]. It is defined on the edges of the directed graph, $i \to j$, $k \to l$, as follows:

$$B_{i \to j,\, k \to l} = \delta_{jk}\,(1 - \delta_{il}). \tag{10}$$

It is seen from (10) that the non-backtracking operator prohibits returns to the point that the walker visited at the previous step. Since the matrix $B$ is non-symmetric, its spectrum is complex. For Poissonian graphs the spectral density of $B$ is constrained within a circle of radius $\sqrt{\langle d \rangle}$, where $\langle d \rangle$ is the mean degree, and exhibits no Lifshitz tails, in contrast to the conventional operators [32]. Real eigenvalues lying outside the circle remain relevant to the community structure even in sparse networks. Association of the corresponding eigenvectors with the network partitioning results in detection of communities all the way down to the theoretical limit. In [23] M. Newman suggested a normalized operator that conserves the probability flow at each step of the walker.
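As an illustration, the Hashimoto matrix of a small undirected graph can be assembled explicitly and its isolated real eigenvalues counted. The sketch assumes the convention in which the entry for the pair of directed edges $(i \to j, k \to l)$ equals 1 when $k = j$ and $l \neq i$; other references use the transposed convention, which has the same spectrum. The function names and the tolerance are hypothetical, and the dense construction is only practical for small graphs.

```python
import numpy as np

def hashimoto(adj):
    """Non-backtracking matrix on the directed edges of an undirected graph.

    adj is a symmetric 0/1 adjacency matrix (or nested list) with zero diagonal.
    Returns the matrix and the list of directed edges (source, target).
    """
    n = len(adj)
    edges = [(i, j) for i in range(n) for j in range(n) if adj[i][j]]
    index = {e: a for a, e in enumerate(edges)}
    b = np.zeros((len(edges), len(edges)))
    for (i, j), a in index.items():
        for l in range(n):
            if adj[j][l] and l != i:      # continue from j, but do not return to i
                b[a, index[(j, l)]] = 1.0
    return b, edges

def real_eigenvalues_outside(b, radius, tol=1e-8):
    """Real eigenvalues of b whose modulus exceeds the given bulk radius."""
    ev = np.linalg.eigvals(b)
    mask = (np.abs(ev.imag) < tol) & (np.abs(ev) > radius + tol)
    return np.sort(ev[mask].real)[::-1]
```

For a $d$-regular graph every row of the matrix sums to $d - 1$, which gives a quick sanity check of the construction.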
For the sake of community detection in sparse polymer graphs, we propose a conceptually similar operator, (11), that describes the evolution of the non-backtracking probability flow on a graph with intrinsic linear memory. One can establish the connection between the non-backtracking operator and the generalized modularity derived from the statistical inference of the polymer SBM in the previous section (see Appendix). Thus, partitioning of a polymer network into two communities according to the leading eigenvector of the polymer non-backtracking flow operator (11) corresponds to the maximum entropy principle.
An example of the non-backtracking walk on a polymer graph is illustrated in Fig. 2(a). Note that although immediate revisiting of nodes is forbidden, the walker is allowed to make cycles. The second term in (11) compensates for the marginal contact probability arising from the linear organization of the network. This compensation allows the non-backtracking operator to disentangle communities from the fluctuations imposed by the polymeric scaling. Trivially, the proposed non-backtracking operator reduces to Newman's flow operator when the background is not polymeric but corresponds to the network with fixed expected degrees, $H_{ij} = d_i d_j / 2m$ [23]. For a pure polymer graph without contamination by communities, the spectrum of (11) is constrained within the circle of radius $r = d(d - 1)^{-1}$. As sufficiently resolved communities form in the network, isolated eigenvalues appear on the real axis.
In Fig. 2(b) we depict the non-backtracking spectrum of a polymer SBM corresponding to the fractal globule polymer network with $P(s) = s^{-1}$ of size $N = 1000$ with two compartments, organized as contiguous alternating segments with the mean length $\lambda = 100$. For the parameters $w_{in}$, $w_{out}$ used, the two compartments are well resolved, as indicated by the isolated eigenvalue separated from the circle. Since the leading eigenvector $u^{(1)}$ of the polymer non-backtracking flow, in contrast to that of the adjacency or modularity, is defined on the directed edges of the network, one needs to evaluate the Potts spin variables $g_i = \pm 1$ in order to classify the nodes. From the connection between the modularity and the polymer flow operator one sees that the contribution to the $i$-th node $g_i$ comes from the flow along all the directed edges pointing to $i$. Thus, in order to switch from edges to nodes, one needs to evaluate the sign of the sum $v_i = \sum_{j \to i} u^{(1)}_{j \to i}$ and to assign the node $i$ accordingly, $g_i = \mathrm{sign}(v_i)$.
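The switch from an edge-defined eigenvector to node labels can be sketched in a few lines. The function name and the convention of assigning $+1$ when the sum is exactly zero are assumptions of this illustration; if the eigenvector comes from a general eigensolver, its real part should be taken first.

```python
def nodes_from_edge_vector(edges, u, n):
    """Aggregate an edge vector onto nodes and take signs.

    edges: list of directed edges (source, target), in the same order as u.
    u:     one real component per directed edge.
    n:     number of nodes.
    Returns (v, g): v[i] sums u over edges pointing to i, g[i] is +1 or -1.
    """
    v = [0.0] * n
    for (_, target), value in zip(edges, u):
        v[target] += value
    g = [1 if x >= 0 else -1 for x in v]
    return v, g
```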

B. Spectral clustering of the polymer stochastic block model
In this section we investigate the spectral properties of the polymer non-backtracking flow and compare the performance of different operators in partitioning the polymer SBM. The two compartments with $\lambda = 100$ are superimposed over the fractal globule, $P(s) = s^{-1}$, with the total size of the network $N = 1000$. We fix the weight of internal edges at $w_{in} = 1$ and change the resolution of the compartments by tuning the weight of external edges over the range $w_{out} = 0.1$ to $0.8$. The efficiency of splitting is assessed by the fraction of correctly classified nodes.
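The quality measure can be written down as a short sketch. Since the sign of an eigenvector is arbitrary, the score below is taken up to a global swap of the two labels; this invariance is an assumption of the illustration rather than something stated in the text.

```python
def fraction_correct(predicted, truth):
    """Fraction of nodes whose +/-1 label matches the truth, up to a global flip."""
    if len(predicted) != len(truth):
        raise ValueError("label vectors must have equal length")
    agree = sum(1 for p, t in zip(predicted, truth) if p == t)
    return max(agree, len(truth) - agree) / len(truth)
```

Under this convention a score of 0.5 corresponds to a partition uncorrelated with the true labels.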
In Fig. 3(a) we compare the performance of the adjacency, normalized Laplacian, M. Newman's flow operator, polymer modularity and polymer non-backtracking flow matrices. For the latter two, the optimal value (8) of the parameter $\gamma$ was chosen. It is evident that the polymer flow operator surpasses all conventional operators without the background, as well as the polymer modularity, everywhere above $w_{out} \approx 0.5$. Qualitatively similar behaviour was demonstrated by the traditional non-backtracking operator without the background when it was compared to the other operators in [32]. Therefore, this analysis (i) underscores the importance of taking into account the contact probability (polymer background) when dealing with polymer graphs and (ii) recapitulates the efficiency of non-backtracking walks in resolving communities in sparse networks.
It is worth noting that the abrupt fall in performance of the polymer flow operator coincides with the point at which the number of its isolated eigenvalues levels off at zero, see Fig. 3(d). Thus, external weights around $w_{out} = 0.5$ correspond to the detectability transition, above which the leading eigenvector becomes uncorrelated with the true node assignment. To understand whether this corresponds to the theoretical detectability limit, we translate $w_{out}$ into the average numbers of inner, $c_{in}$, and outer, $c_{out}$, edges and plot them as functions of $w_{out}$. As shown in Fig. 3(c), the performance of the polymer flow operator drops close to the theoretical detectability transition for regular stochastic block models [44] (in which each node has exactly $c_{in}$ random links with other nodes in its community and exactly $c_{out}$ random links to nodes from a different community). For the stochastic block model the number of isolated eigenvalues of $B$ equals the number of communities [32]. However, in the case of the polymer operator $R$ the number of isolated eigenvalues can be larger, and "apparent" clusters might be formed "locally" at the main diagonal due to the frozen linear connectivity, see Fig. 1(a). This is evident from Fig. 3(d), which shows that the number of isolated eigenvalues for the polymer flow operator can be of the order of the number of segments ($N/\lambda$) if $w_{out}$ is sufficiently low. Indeed, the probability of an edge between two distant segments of the same type is smaller by a factor $1/s$ than the probability for two neighboring monomers from the same segment (recall that $s = |k - m|$ is the genomic distance between segments $k$ and $m$). Because of the small number of contacts between the segments, the polymer non-backtracking flow ends up treating them as separate clusters.
The value of $\gamma$ cannot be chosen arbitrarily and should reflect the optimal parameters of the stochastic blocks. Thus, one may propose the following iterative approach: (i) begin with the initial value $\gamma_0 = 1$, for which we obtain the network partition, (ii) use the resulting numbers of inner and outer edges to estimate $w_{in}$, $w_{out}$, (iii) recalculate $\gamma_1$ according to (8), (iv) repeat the procedure iteratively until $\gamma$ converges to $\gamma_{opt}$. Results of this approach are demonstrated in Fig. 3(b) for five different values of $w_{out}$. It is seen that just a few iterations are sufficient to obtain reasonable convergence towards the theoretical values provided by (8). A drawback of this iterative procedure is that at each step one needs to evaluate the spectrum of an operator of size $2m \times 2m$, which could become a hard computational task for large and dense networks. As a reasonable approximation to the optimal value of $\gamma$ for the polymer flow operator, one can evaluate $\gamma_{opt}$ similarly for the polymer modularity, which is smaller in size and symmetric.
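The iterative procedure can be sketched for the polymer modularity, which the text suggests as a cheaper proxy for the flow operator. Several ingredients are assumptions of this illustration: the operator is taken as $A - \gamma P$, the weights are estimated as the ratio of observed to background contacts within and between the two groups, and $\gamma$ is updated as $(w_{in} - w_{out}) / (\log w_{in} - \log w_{out})$. The function name `estimate_gamma` is hypothetical, and the background is assumed to be positive off the diagonal.

```python
import numpy as np

def estimate_gamma(adj, background, max_iter=20, tol=1e-6):
    """Iterate partition -> (w_in, w_out) -> gamma until gamma stops changing."""
    adj = np.asarray(adj, dtype=float)
    background = np.asarray(background, dtype=float)
    off_diag = ~np.eye(len(adj), dtype=bool)
    gamma, history = 1.0, [1.0]
    for _ in range(max_iter):
        q = adj - gamma * background
        _, vecs = np.linalg.eigh(0.5 * (q + q.T))
        g = np.where(vecs[:, -1] >= 0, 1, -1)          # current partition
        same = (g[:, None] == g[None, :]) & off_diag
        diff = g[:, None] != g[None, :]
        if not same.any() or not diff.any():
            break                                      # degenerate partition
        w_in = adj[same].sum() / background[same].sum()
        w_out = adj[diff].sum() / background[diff].sum()
        if min(w_in, w_out) <= 0 or np.isclose(w_in, w_out):
            break                                      # update is undefined
        new_gamma = (w_in - w_out) / (np.log(w_in) - np.log(w_out))
        history.append(new_gamma)
        converged = abs(new_gamma - gamma) < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, history
```

The returned `history` makes it easy to plot the convergence of $\gamma$ against the iteration number.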

C. Polymer non-backtracking flow resolves compartments in a single cell Hi-C network
To check the robustness of the polymer non-backtracking flow operator on real Hi-C data we run it on a set of individual mouse oocyte cells [30]. From the public repository we have taken the single-cell Hi-C data on cis-contacts of 20 chromosomes from 13 single cells (260 adjacency matrices in total). While single-cell matrices with a sufficiently large number of contacts are not sparse and can be split into compartments using conventional methods widely used for bulk data (e.g., the leading eigenvectors of the observed/expected transformation of a population-averaged Hi-C map [14]), here we take the cells with a low to moderate number of contacts for the sake of a comparative analysis of the clustering performance of different spectral methods on sparse polymer graphs.
Before proceeding with the analysis of compartments in single cells, the raw data must be preprocessed. In order to extract the compartmentalization signal from the maps, one has to coarse-grain them to the resolution of 200 kb. At this resolution all finer structuring of the genome folding (like topologically-associated domains) is encoded within the coarse-grained blobs and does not interfere with the two large-scale A and B compartments. Most of the contacts in cells have degeneracy 1 at the chosen resolution; however, several pairs of the coarse-grained 200 kb blobs have more than one contact between their internal monomers. To preserve this feature of enhanced connectivity, we consider the counts of contacts between the pairs as weights of the corresponding edges. Furthermore, the single-cell maps are noisy and some actually existing contacts are lost due to technical shortcomings of the protocol. As long as the neighboring blobs in the chromatin chain are connected with probability 1, all lost contacts $A_{i,i+1}$ need to be added to the adjacency matrix manually; we assign the weight 1 to such edges. We also clean the coarse-grained data of self-edges, setting $A_{ii} = 0$.
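The preprocessing steps above can be sketched as follows. The input format (a list of pairs of genomic positions in base pairs on one chromosome, all smaller than the chromosome length) and the function name `coarse_grain` are assumptions of this illustration; the actual file format of the data is not specified in the text.

```python
import numpy as np

def coarse_grain(contacts, chrom_length, bin_size=200_000):
    """Build a weighted, symmetric contact matrix at the given bin size.

    contacts:     iterable of (pos_a, pos_b) in base pairs (hypothetical format).
    chrom_length: chromosome length in base pairs.
    """
    n = -(-chrom_length // bin_size)           # number of bins, rounded up
    adj = np.zeros((n, n))
    for a, b in contacts:
        i, j = a // bin_size, b // bin_size
        if i != j:                             # intra-bin contacts are self-edges
            adj[i, j] += 1                     # contact counts become edge weights
            adj[j, i] += 1
    for i in range(n - 1):                     # restore lost backbone contacts
        if adj[i, i + 1] == 0:
            adj[i, i + 1] = adj[i + 1, i] = 1
    np.fill_diagonal(adj, 0)                   # explicit removal of self-edges
    return adj
```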
To determine the background model for our analyses we calculate the contact probability for each individual single cell and for the merged cell (summed single-cell matrices), see the inset in Fig. 4(a). The resulting dependence turns out to be fairly close to the fractal globule contact probability, $P(s) \sim s^{-\alpha}$ with $\alpha = 1$, at scales from $\approx 2$ Mb to the end of the chromosome. A shoulder at lower scales around 1 Mb reflects an enhancement of the contact probability due to compartmentalization. Importantly, the fractal globule scaling at the megabase scale is universal across different species and cell types; it is evident in the population-averaged contact matrices of mouse oocytes [30], human lymphoblastoid cells [14] and Drosophila cells [37]. As was shown in the previous section, in order to extract the compartmentalization profile overlaying a specific long-range folding, it is crucial to incorporate the respective background contact probability into the polymer model of the stochastic blocks. Having determined the background model, we construct the polymer non-backtracking flow operator with the variable parameter $\gamma$ and run the iterative clustering procedure to derive its optimal value $\gamma_0$. Similarly to the analyses on the benchmarks, see Fig. 3(b), a swift convergence to the optimal value is observed here. The spectrum of the polymer flow operator for cell 29749, chromosome 3, at $\gamma_0 \approx 0.9$ is shown in the inset of Fig. 4(b). Nineteen isolated eigenvalues on the real axis are separated from the bulk spectrum. As we have shown in the previous section, this is quite a typical scenario for sparse polymer stochastic block models. In the sparse limit of the polymer SBM, the number of isolated eigenvalues could be much larger than the number of compartments.
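An estimate of the contact probability from a single contact matrix can be sketched as below. The logarithmic spacing with factor 1.4 follows the caption of Fig. 4; the exact binning rule (integer bin edges, rounding, and the geometric bin centre) and the function name are assumptions of this illustration.

```python
import numpy as np

def contact_probability(adj, log_factor=1.4):
    """Average contact weight per pair as a function of genomic distance s."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    # Integer bin edges growing roughly geometrically: 1, 2, 3, 4, 6, 8, 11, ...
    edges = [1]
    while edges[-1] < n:
        edges.append(max(edges[-1] + 1, int(round(edges[-1] * log_factor))))
    edges[-1] = n                               # distances run from 1 to n - 1
    centres, prob = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        total = sum(np.trace(adj, offset=s) for s in range(lo, hi))
        pairs = sum(n - s for s in range(lo, hi))
        centres.append(np.sqrt(lo * (hi - 1)))  # geometric centre of the bin
        prob.append(total / pairs)
    return np.array(centres), np.array(prob)
```

A straight line of slope close to $-1$ on a log-log plot of the two returned arrays would be consistent with the fractal globule scaling discussed above.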
The partition of single-cell networks into two compartments has been performed in the leading eigenvector approximation for the different operators. The boundaries of active and inactive domains are determined in accordance with the sign of the respective compartmental signals (see Fig. 4(b) and Fig. S5). It is known that the gene density is higher in the actively transcribed A compartment; thus, the fraction of GC letters in bins of active compartmental domains is expected to be larger than in inactive domains. To test whether the clusters found in single cells correspond to the transcriptional domains and are biologically significant, we calculate the GC content profiles around the centers of all A and B domains separately (the types of the domains were phased in accordance with the leading eigenvector of the bulk data) and then take the average of these profiles in each group.
As expected, the GC content for the population-averaged map (embryonic stem cells, data taken from [46]) has a pronounced peak at the center of A domains and a symmetrical dip at the center of B domains, with the z-score amplitude equal to 0.4. Single-cell profiles demonstrate notably lower amplitudes (see Fig. 4(c,d) and Fig. S6). However, only the polymer non-backtracking flow yields an annotation with a similar shape and span. Both profiles (for A and for B) of the polymer non-backtracking flow fall symmetrically to zero at the same genomic distance, around 4 to 5 bins from the center of the domains, which also strikingly coincides with the span of the bulk profiles. This is a consequence of the similar characteristic sizes of the compartmental domains determined by the non-backtracking flow operator ($l \approx 10.9$ bins) and of the domains from the bulk data ($l \approx 8.4$ bins). Notably, the partitions of the polymeric operators agree visibly better with the apparent clustering of contacts in a particular cell (Fig. S5). Despite the similarity in compartmental signals from the polymer modularity and from the polymer non-backtracking flow, the modularity domains are twice as large and show negative z-scores both for the active and inactive compartments. This is a consequence of sparsity, which results in poor performance of all traditional spectral methods.
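The averaging of GC profiles around domain centres can be sketched in plain Python. The window half-width, the rule for the centre of an even-length domain, and the function names are assumptions of this illustration; the input is a sequence of $\pm 1$ bin labels and a z-scored GC track of the same length.

```python
def contiguous_domains(labels):
    """Split a label sequence into runs (start, end_exclusive, label)."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start, i, labels[start]))
            start = i
    return runs

def mean_profile(zscores, runs, label, half_width=10):
    """Average the signal in a window around the centres of domains of one type."""
    size = 2 * half_width + 1
    sums, counts = [0.0] * size, [0] * size
    for start, end, lab in runs:
        if lab != label:
            continue
        centre = (start + end - 1) // 2
        for k in range(-half_width, half_width + 1):
            pos = centre + k
            if 0 <= pos < len(zscores):        # skip positions off the chromosome
                sums[k + half_width] += zscores[pos]
                counts[k + half_width] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

The mean domain size in bins follows from the same runs as `sum(e - s for s, e, _ in runs) / len(runs)`.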

IV. CONCLUSION
In this paper we have developed theoretical grounds for spectral community detection in sparse polymer networks. On the basis of the suggested polymeric extension of the stochastic block model, we have proposed the polymer non-backtracking flow operator and have proved that its leading eigenvector partitions a polymer network with two clusters in accordance with the maximum entropy principle. The established connection with the modularity functional provides auxiliary and computationally efficient means for partitioning the network and for finding the optimal resolution parameter of the partition in polymer networks; the modularity, however, is inferior to the non-backtracking operator in sparse cases.
The proposed theoretical framework is verified by extensive numerical simulations of polymer benchmarks constructed to emulate compartmentalization in sparse chromatin networks. Comparative analyses of different operators on the benchmark have suggested that the polymer flow detects the communities up to the theoretical detectability limit, while all other operators fail above it. At the same time, the number of isolated eigenvalues of the polymer flow operator can be larger than the number of true communities present in the network, due to the frozen linear connectivity that forces the chain to form "blobs" along the chain contour. This result puts the polymer system in contrast with the canonical stochastic block model, where the number of isolated eigenvalues of the non-backtracking operator exactly matches the number of communities.
Analyses of the single-cell Hi-C data of mouse oocytes suggest that the non-backtracking walks efficiently split experimental sparse networks into biologically significant communities, characterized by enrichment and depletion of gene density. The sizes of the compartmental domains are fairly close to the sizes of the population-averaged domains. Comparison with the characteristics of the domains inferred by other operators underscores the superiority of the non-backtracking walks in partitioning sparse polymer networks.
In this study we have used only the simplest spectral characteristics for polymer network analysis: spectral densities and the corresponding eigenvectors. More involved ones, such as spectral correlators and the level spacing distribution, carry additional information about the spectral statistics and the propagation of excitations. The spectral statistics and non-ergodicity in clustered networks have been discussed in [11,12]. In the context of gene interaction, the spectral statistics has been discussed in [47] for matrices with a real spectrum. The non-backtracking matrices have a complex spectrum; hence, special means are required to analyze the spectral statistics in this case. The corresponding tool has been developed recently [48,49]; hence, the spectral statistics of the non-backtracking flow operator certainly deserves a separate study.

Figure 1: Adjacency matrices of networks of size $N = 1000$ with two clusters, generated according to the polymer stochastic block model (left; $w_{in} = 1$, $w_{out} = 0.1$, $P(s) = s^{-1}$, $\lambda = 100$) and the canonical stochastic block model (right; $w_{in} = 0.1$, $w_{out} = 0.01$, $\lambda = 500$). Vertices are enumerated by the polymer coordinate (left) and with all red ones first, then all blue ones (right). The decay of the contact probability induced by the linear memory completely washes out the compartmental checkerboard pattern away from the main diagonal.

Figure 2: (a) Depiction of the polymer SBM network: the backbone (bold), contacts between genomically distant monomers (dashed) and two chemical sorts of monomers (red and blue), arranged into contiguous alternating segments. An example of the non-backtracking walk on such a graph is shown by the arrows. Immediate returns are forbidden, which prevents localization on hubs; (b) Spectrum of the polymer non-backtracking flow (11) for the fractal globule ($P(s) = s^{-1}$) large-scale organization of the chain with two overlaid compartments with mean length $\lambda = 100$.

Figure 3: (a) Comparison of the performance of different classical operators without background, the polymer modularity and the polymer non-backtracking flow operators ($N = 1000$, $P(s) = s^{-1}$, $w_{in} = 1$, $\lambda = 100$); (b) The iterative approach that can be used to determine the optimal value of $\gamma$, shown for five values of $w_{out}$; the true optimal values of $\gamma$ calculated from (8) are shown by dashed lines; (c) The mean numbers of inner, $c_{in}$, and outer, $c_{out}$, edges, calculated for each value of $w_{out}$ in order to estimate the detectability threshold for the corresponding regular network; (d) Number of isolated eigenvalues of the polymer flow operator plotted against $w_{out}$. Full spectra of the polymer flow operator for two values of $w_{out}$ are shown in the insets.

Figure 4: (a) The average contact probability $P(s)$ of single cells (gray) and of the merged cell (solid, black), computed for logarithmically spaced bins with the log factor 1.4; the fractal globule scaling $P(s) \sim s^{-1}$ is shown by the dashed line for comparison. (b) Annotation of active (red) and inactive (blue) compartmental domains for one of the contact maps (cell 29749, chromosome 3, length $N = 492$, 200 kb resolution) by the polymer non-backtracking flow operator. Below the map the compartmental signal from the corresponding leading eigenvector of the polymer non-backtracking flow matrix is shown. Inset: the full spectrum of the polymer flow for the same contact map. (c, d) Averaged profiles of the GC content (z-scores) plotted around the centers of the compartmental domains (active: red, inactive: blue) for the population of cells and for a pool of single cells.

Figure S5: Annotations of active (red) and inactive (blue) compartmental domains for three chromosomes (16, 3 and 13; resolution 200 kb) of cell 29749 by the polymer non-backtracking flow operator, polymer modularity, M. Newman's flow modularity, normalized Laplacian and adjacency. Below each map the compartmental signal from the leading eigenvector of the corresponding operator is provided. Hi-C data are taken from [30].