The economy of chromosomal distances in bacterial gene regulation

In the transcriptional regulatory network (TRN) of a bacterium, the nodes are genes and a directed edge represents the action of a transcription factor (TF), encoded by the source gene, on the target gene. The TRN is a condensed representation of a large number of biological observations and facts. Nonrandom features of the network are structural evidence of requirements for a reliable systemic function. For the bacterium Escherichia coli, we here investigate the (Euclidean) distances covered by the edges in the TRN when its nodes are embedded in the real space of the circular chromosome. Our work is motivated by 'wiring economy' research in computational neuroscience and starts from two contradictory hypotheses: (1) TFs are predominantly employed for long-distance regulation, while local regulation is exerted by chromosomal structure, locally coordinated by the action of structural proteins; hence, long distances should occur often. (2) A large distance between the regulator gene and its target requires a higher expression level of the regulator gene, because longer reaching times entail increased degradation (proteolysis) of the TF; hence, large distances will be evolutionarily reduced. Our analysis supports the latter hypothesis.

time, space and millions of cells. The two quantities are presumably proportional: $X_i = C_i x_i$, where the multiplicative factor $C_i$ may depend on the TF $i$. We cannot infer the value of $C_i$, so the only choice is to assume a constant factor $C$. For the basic considerations presented here, we thus write $X_i = C x_i$ with the same constant factor $C$ for all source loci $i$. Deviations from this approximation can be expected to account for part of the noise in Figure 4 in the main text.
The second assumption is first-order kinetics, yielding an exponential decrease $e^{-K_i t}$ of TF concentration over time. Presumably there is no degradation during 1D sliding (the fine-tuning of the search process, passing from weak non-specific DNA binding to strong binding at a specific recognition site); hence only the 3D search time has to be taken into account in quantifying TF proteolysis. The possibility of 1D sliding also decreases the dilution effect (irreversible loss of TFs), but only by a roughly homogeneous factor. Overall, we ignore the contribution of 1D sliding in this basic investigation of the relationship between $x_i$ and the wiring length $w_i$.
We thus investigate the relationship between $X_i \approx C x_i$ and the average wiring length $w_i$ (the distance in space between the source and target nodes, averaged over the target nodes). The initial TF amount $x_i$ experiences both dilution (TFs not reaching the target in a sufficiently short time) and degradation (with an exponential dependence on the reaching time $T(w_i)$). For proper regulation to be achieved, the amount $x_i$ has to be increased to compensate for both the degradation loss (factor $e^{K_i T(w_i)}$) and the dilution loss.
For a simple deterministic flux, the dilution loss would be proportional to the surface $\pi w_i^2$ of the sphere, so that the prediction becomes:
$$X_i \propto w_i^2 \, e^{K_i T(w_i)}.$$
To simplify the problem further, we assume that the degradation kinetics occur on the same scale for all TFs, i.e. the order of magnitude of $K_i$ does not depend on $i$, so that the dependence of $K_i$ on $i$ can be ignored. We then need an estimate of the reaching time $T(w)$.

Three options for estimating the reaching time $T(w)$
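Our first approach is the naive estimate of a characteristic time based on (possibly anomalous) diffusion, $T(w) \sim w^{2\alpha}$. Together with the dilution and degradation factors, this gives $\ln X \approx \mathrm{const} + 2 \ln w + K w^{2\alpha}$, with different behaviors of $\ln X$ at small wiring lengths (where $\ln w$ dominates) and at large wiring lengths (where $K w^{2\alpha}$ dominates).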
A second possibility is the full computation of the mean first-passage time. Here it is unclear whether we should consider motion in a confined space or not. A general feature of normal diffusion in a confined (rectangular 3D) space is that the reaching time saturates at large distances, i.e. $T(w)$ tends to $T_\infty$ at large $w$. The behavior at large wiring lengths would then be $X(w) \sim w^2$.
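The asymptotic behaviors of the first two options can be checked numerically. The following is a minimal sketch, assuming the dilution factor $w^2$ and the degradation factor $e^{K T(w)}$ discussed above; the saturating form $T_\infty (1 - e^{-w/w_0})$, the length scale `w0` and all parameter values are hypothetical choices made only for illustration, not quantities taken from the analysis.

```python
import math

def ln_expression(w, K, reaching_time):
    """ln X up to an additive constant: dilution term 2 ln w plus degradation term K*T(w)."""
    return 2.0 * math.log(w) + K * reaching_time(w)

def power_law_time(alpha, prefactor=1.0):
    """Naive estimate T(w) = prefactor * w**(2*alpha) (prefactor is an assumed constant)."""
    return lambda w: prefactor * w ** (2.0 * alpha)

def saturating_time(t_inf, w0):
    """Hypothetical saturating form: T(w) tends to t_inf for w much larger than w0."""
    return lambda w: t_inf * (1.0 - math.exp(-w / w0))

def local_slope(w, K, reaching_time, rel_step=1e-4):
    """Central finite-difference estimate of d ln X / d ln w at wiring length w."""
    hi, lo = w * (1.0 + rel_step), w * (1.0 - rel_step)
    d_ln_x = ln_expression(hi, K, reaching_time) - ln_expression(lo, K, reaching_time)
    return d_ln_x / (math.log(hi) - math.log(lo))

if __name__ == "__main__":
    K = 1.0  # assumed common degradation rate
    for w in (0.01, 0.1, 1.0, 3.0, 50.0):
        s_power = local_slope(w, K, power_law_time(alpha=1.0))
        s_sat = local_slope(w, K, saturating_time(t_inf=5.0, w0=1.0))
        print(f"w={w:6.2f}  slope(power law)={s_power:10.3f}  slope(saturating)={s_sat:6.3f}")
```

In this sketch, a log-log slope close to 2 signals that the dilution term dominates. For the power-law estimate the slope grows without bound at large $w$, whereas for a saturating reaching time it returns to 2, which corresponds to $X(w) \sim w^2$.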
As a third possibility, one can make use of the calculations of Pulkkinen and Metzler [1]. They provide an expression, Eq. (8) in Ref. [1], for the local concentration $\Phi(t|r)$ of the TF in a neighborhood of its target gene, given that the TF experienced a transcriptional burst at $t = 0$ and that the distance between the TF and the target gene is $r$ (i.e. $r = w$). This concentration displays a maximum over time, at a time $T^*(r)$ that increases with $r$, which provides a plausible expression for the reaching time $T$.
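The idea of reading off a reaching time from the peak of a concentration profile can be illustrated with a simple stand-in. The sketch below does not implement Eq. (8) of Ref. [1]; it assumes instead the textbook profile of free 3D diffusion from a point release, $(4\pi D t)^{-3/2} e^{-r^2/(4Dt)}$, with a hypothetical diffusion coefficient `D`, and locates the time of maximal concentration on a logarithmic time grid. The function names and grid parameters are illustrative assumptions.

```python
import math

def free_diffusion_profile(t, r, D=1.0):
    """Stand-in concentration at distance r and time t after a point release at t = 0.

    Free 3D diffusion is assumed here; this is NOT the expression of Ref. [1].
    """
    return (4.0 * math.pi * D * t) ** -1.5 * math.exp(-r * r / (4.0 * D * t))

def peak_time(profile, r, t_min=1e-4, t_max=1e4, n=20001):
    """Time at which profile(t, r) is largest on a log-spaced grid of n points."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    t = t_min
    best_t, best_value = t, profile(t, r)
    for _ in range(n - 1):
        t *= ratio
        value = profile(t, r)
        if value > best_value:
            best_t, best_value = t, value
    return best_t

if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0, 4.0):
        print(f"r={r:4.1f}  peak time={peak_time(free_diffusion_profile, r):.4f}")
```

For this stand-in profile the maximum lies at $t^* = r^2/(6D)$, so the peak time increases with $r$ as required of a reaching time. Any other profile with the signature `profile(t, r)` can be passed to `peak_time` in the same way.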
It should be emphasized again that any test of such detailed predictions of $T(w)$ and of the scaling of $x_i$ with $w_i$ would require gene expression data beyond the currently available RNA-seq averages.

We assess the quality of our randomization methods by selecting one of the randomized networks as a reference network (or 'base model') and then contrasting it with its randomized versions. We generate randomized TRNs using the $n$th generated network as the input (reference) network to generate the $(n+1)$st network. Then, we build the CRN of each randomized TRN to check the consistency of the randomized CRNs. Both the generated random TRNs and CRNs show wiring length distributions similar to those observed in Figure 3 in the main text, with z-scores of −15.40 at the TRN level and −33.83 at the CRN level.
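The chained randomization and the z-score comparison can be sketched as follows. This is a toy illustration under stated assumptions: the randomization step (shuffling targets among the edges), the 1D gene positions, the distance measure and the network itself are hypothetical and are not the procedure or data used in the analysis; only the chaining of the $n$th network into the $(n+1)$st and the definition of a z-score are taken from the description above.

```python
import random
import statistics

def mean_wiring_length(edges, position):
    """Mean source-target distance; a toy 1D distance |x_s - x_t| is assumed."""
    return statistics.mean(abs(position[s] - position[t]) for s, t in edges)

def shuffle_targets(edges, rng):
    """Hypothetical randomization step: permute the targets among the edges.

    Each source keeps its out-degree and each target keeps its in-degree;
    self-loops and repeated edges are not excluded in this sketch.
    """
    sources = [s for s, _ in edges]
    targets = [t for _, t in edges]
    rng.shuffle(targets)
    return list(zip(sources, targets))

def chained_null(reference_edges, position, n_networks, rng):
    """Generate network n+1 from network n and record the statistic of each."""
    values, current = [], reference_edges
    for _ in range(n_networks):
        current = shuffle_targets(current, rng)
        values.append(mean_wiring_length(current, position))
    return values

def z_score(observed, null_values):
    """(observed - null mean) / null sample standard deviation."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)

# Toy data: 200 genes on a line, each regulating its right-hand neighbor.
rng = random.Random(0)
position = {gene: float(gene) for gene in range(200)}
reference = [(gene, gene + 1) for gene in range(199)]
null = chained_null(reference, position, 500, rng)
z = z_score(mean_wiring_length(reference, position), null)
print(f"z-score of the toy reference network: {z:.2f}")
```

A strongly negative z-score means that the reference network has a shorter mean wiring length than its randomized versions. Running the same chain with a randomized network as the reference gives a quick consistency check of the randomization itself.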