Box scaling as a proxy of finite size correlations

The scaling of correlations as a function of size provides important hints to understand critical phenomena on a variety of systems. Its study in biological structures offers two challenges: usually they are not of infinite size, and, in the majority of cases, dimensions can not be varied at will. Here we discuss how finite-size scaling can be approximated in an experimental system of fixed and relatively small extent, by computing correlations inside of a reduced field of view of various widths (we will refer to this procedure as “box-scaling”). A relation among the size of the field of view, and measured correlation length, is derived at, and away from, the critical regime. Numerical simulations of a neuronal network, as well as the ferromagnetic 2D Ising model, are used to verify such approximations. Numerical results support the validity of the heuristic approach, which should be useful to characterize relevant aspects of critical phenomena in biological systems.

www.nature.com/scientificreports/ The article is organized as follows. First the connected correlation function (CCF) is defined. Then the CCF is studied for a neuronal network model under two scenarios: in the first one we proceed in the standard manner, increasing the system size and determining its correlation behavior. In the second setting, the CCF is examined using a fixed system size (relatively large) while varying the size of the field of view (changing the observable window size). Motivated by currently available experiments on neuronal dynamics, we focus here on the case where the window is much smaller than the system size. After that, similar calculations are described in the ferromagnetic 2D Ising model. Finally, the collapse of all the CCF's after rescaling is demonstrated for the two models. The article concludes with a discussion on the potential application of box-scaling to biological data, and a summary of the main results.

Connected correlation function
The connected correlation function measures how a local quantity loses spatial correlation as distance is increased 25 . Two important quantities are usually extracted from this function, the decay with distance and the correlation length. Following previous work 4,7,8,10,12,17,21 we compute it as, where δ(r − r ij ) is a smoothed Dirac δ function (we have taken δ(x) = H(x + 0.5)H(0.5 − x) , where H is the Heaviside function), selecting pairs of signals (i.e., pairs of spin values, neuron states, or some other position dependent variable) at mutual distance r; r ij is the Euclidean distance from the site i to site j; δv i is the value of the signal v i of site i at time t, after subtracting the overall mean of signals V (t) = (1/N) N i v i (t) , i.e., δv i (t) = v i (t) − V (t) ; 1 c 0 is a normalization factor to ensure that C(r = 0) = 1 . Notice that, potential spurious long-range correlations introduced by common external fields or perturbations, are not entering in C(r) of Eq. (1) since V(t) is the instantaneous space average. This also implies that the correlation length can be directly calculated as the zero crossing of the function.
Equation (1) is defined as the space-averaged connected correlation function, which differs from the timeaveraged connected correlation function used to describe correlations between observables measured at different points in space, where v i represents an average of v i over sufficiently long times (see e.g. 26 ). In this case, the values of the correlation length and the spatial decay are determined by regression and fitting 27 . Thus, there is a subtle but important difference between Eqs. (1) and (2): the fluctuation δv i in C(r) is the difference between the local value of the signal v i and its instantaneous average over all the sample V at time t, while in C time , fluctuations are computed with respect to time average v i . It is straightforward to demonstrate (at least for equilibrium systems, for long time series, without external perturbations) the equivalence between both quantities 10 , i.e., they only differ by a constant. Thus, for experimental purposes averaged connected correlation function, Eq. (1) is preferred. To our knowledge, it was first used in 28 , in the context of bird flocks, and was later used in 4,7,8,12,17,22 , among others. Further details can be consulted in 10 .
The correlation length ξ measures the scale at which two points start to become uncorrelated. In a critical system the correlation length is infinite, meaning that the decay of the correlation lacks a characteristic scale. In this case, the position of the zero of C(r) in Eq. (1) is a useful length scale, because then it increases proportionally to the system size L (the CCF defined with instantaneous space averages subtracted must have a zero 10 ). This functional dependence, attesting scale invariance, suggests the presence of critical dynamics. However, it can not be used when the system size is fixed, as in the case of brain networks. Instead, we can subdivide the system in boxes of side W, (with W < L ), and define a characteristic scale r 0 by the first zero of C W (r) , where C W (r) is the CCF, Eq. (1), computed for partial regions of the entire system, of size W. Thus, the implementation follows the same logic and limitations than the box-counting algorithm commonly used to compute the fractal dimension of a data set, image or object 24 .
The hypothesis tested here is that the behavior of r 0 when W is varied with L fixed is the same as would be obtained with W = L and varying L, at least when W ≪ L , namely This behavior can be justified for physical systems in equilibrium by extending the arguments of (Sect. 2.3.3 of Ref. 10) to the box-scaling case (see Supplemental Material). In the cases we show next, we find that the relations in (4) hold for all W up to W ≃ L/2.
In the following, we will study the scaling behavior of the characteristic length r 0 in two models: a 2D neuronal network, and the 2D ferromagnetic Ising model. In all simulations, C(r) and C W (r) were measured for all integer values of r. Subsequently, the smallest value of r for which C(r) [ C W (r) ] is negative, r m , was computed, and r 0 was found as the zero crossing of the linear fit of C(r) [ C W (r) ] between r = r m − 1 and r = r m . (1)

2D Neuronal network model
We study a neuronal network described previously 29,30 . In short, the model is a cellular automata network on a square lattice, in which each neuron can be in one of three states at each time step: 0 for resting, 1 for active (lasting one time step), and 2 for refractory (lasting two time steps). Each neuron connects to K other neurons (here, K = 16 always), in which Euclidean nearest neighbor neurons are favored by an exponential decaying function of the distance r between them (fixed here to r d = 5 ). To ease the interpretation we imposed a cutoff in the interaction probability by preventing neurons to be connected at distances r greater than a given value (called here interaction length I, fixed here at I = 20).
The network overall rate of activity is set by a very small ( h = 10 −7 step −1 ) independent Poisson perturbation to each neuron. We have verified that the results are robust over a wide range of h values (e.g., h = 10 −9 to 10 −4 step −1 ). The model includes a control parameter σ = K × T , in which T is the probability that an active neuron (i.e., in state 1) can excite one of the K neighbors that are connected to it. Therefore, as shown previously 29 , for any given value of K, the model can be made critical by changing the transmission probability T such that σ ≃ 1 . We study two different scenarios: in the first we compute the CCF of the neuronal activities collected from systems of increasing sizes, from L = 40 up to L = 640 . In the second case, a system of (fixed) large size ( L = 1000 , i.e., L × L neurons) was simulated, from which the activity of each neuron inside of windows of sizes smaller than L (from W = 40 up to W = 640 ) were extracted for the correlation analysis. Since we are interested on situations where system size, although finite, is much larger than window size, we considered periodic boundary conditions. Results presented below correspond to averages of twenty or more realizations lasting 2.5 × 10 4 steps for each parameter value.
There are four scale lengths to consider in this model: The first is the interaction length I, it is the scale at which neurons interact via direct connections. The second is the system size L. The third is the size of the observation window ( W < L ) which determines the subset of neurons selected to compute the CCF. The last scale is the characteristic length r 0 which will be determined from the CCF. Figure 1 shows the connected correlation functions computed for various system and window sizes, considering v = 1 in Eq. (1), if neuron is active or refractory, and 0 otherwise, and three values of the control parameter σ . Computation of Eq. (1) was used for all pairs of neurons, taking one snapshot every 100 time steps, on the last 80% of the data (i.e., after 5000 equilibration steps). Only one window of each size was considered at each snapshot. Similar results can be found if Eq. (1) is calculated only for pairs of neurons on the same line, taking one snapshot at each time step (not shown). The results correspond to representative values of the control parameter: www.nature.com/scientificreports/ sub-critical σ = 0.64 , super-critical ( σ = 1.6 ) and critical ( σ ∼ 1 ) as indicated in the legend. Top panels (a, b and c) correspond to computations from increasing system sizes. Bottom panels (d, e and f) correspond to CCF computed using various window sizes from a system of size L = 1000 . It can be seen that the functions obtained when changing system size L or changing window size W are qualitatively very similar. In particular we note that, as expected from (4) The values of r 0 extracted from the curves in Fig. 1 (as well as two other σ values) are plotted in Fig. 2. Here the same data is presented using different axis formats to visualize the different functional dependency near and away the critical point. Dashed lines in panels b and e are a guide to the eye illustrating the expected logarithmic behavior of r 0 for sub-critical and super-critical regimes (open circles). The dashed lines in panels a and d denote the linear dependence expected for r 0 (filled circles) in the critical regime. Finally, the same data is plotted in log-log axis in panels c and f to reveal the crossover behavior for W values close to and smaller than the interaction length ( I = 20 ), denoted by the deviation from the asymptotic linear dependency for large W. Also it is worth to notice the small deviation of the linear scaling observed at criticality when the value of W approaches the system size L ( W = 640 in panel d of Fig. 2). This deviation is expected from the theory, as discussed in the Supp. Material. In brief, r 0 depends on both W and L. If W ≪ L , then r/L ≪ 1 and the dependance of r 0 on L does not need to be considered. On the other hand, for small L, r 0 (W, L) grows with L, for fixed W, having it's minimum for W = L.
Overall, these results show that the scaling of the characteristic length r 0 follows a similar functional dependence with either the box-scaling or the system-size.

2D ferromagnetic Ising model
The results obtained from the neuronal model were replicated in numerical simulations of the ferromagnetic 2D Ising model on a square lattice, using periodic boundary conditions, with Hamiltonian H = − �i,j� s i s j , where s i = ±1 and i, j stands for sum over nearest-neighbors. Similar to the previous model, the simulations used  Figure 3 shows representative results for the two scenarios at three different temperatures: sub-critical ( T = 2.0 , Panels a and d), critical ( T = 2.27 , Panels b and e) and super-critical ( T = 3.0 , Panels c and f). The top panels represent results computed for increasing system sizes and the bottom panels for a fixed lattice size and various window sizes. Note that, as already seen in the simulations of the neuronal model, the computation of the CCF by changing system size L or by changing window size W produces very similar results.
The dependence of r 0 with system and window size is shown in Fig. 4 using the same format as in Fig. 2 for the neuronal model. It is clear that the results obtained from varying the system or the window size show a striking similarity, suggesting that for this system the approximation is also valid. Notice the small deviation from linearity observed at criticality for W = 512 in panel d of Fig. 4 which is similar to that exhibited by the neuronal model for W sizes near the value of L.

CCF rescaling and data collapse
The box-scaling analysis presented here can be considered as a variant of the finite-size scaling (FSS) method 15,19,20,31,32 used extensively in a broad range of situations, including dynamic parameters such as viscosity or coalescence point (see for instance 33,34 ), or time decaying functions (i.e., dynamic scaling 35,36 ).
FSS often allows to expose the scale invariance observed close to a critical point by the successful collapse of some observable, using the system size, the correlation length, and some universal critical exponents 20 . In analogy with FSS, using box-scaling we show in Fig. 5, the collapse of c 0 C W (r)W η as a function of r/r 0 for both Ising and neuronal model (see the Supp. Material for a derivation). For the Ising model, we find that the best collapse is for η ≃ 0.25 in accordance with results for standard FSS collapse.
For the neuronal model, we lack of any previous estimate of η . Empirically, we found that η ≃ 0.4 − 0.7 gives good collapse results for C W (r) in the range of r > I . For distance values r < I collapse of C W (r) is not expected, since within this range the correlations are dominated by the interactions rather than by the collective   www.nature.com/scientificreports/ phenomena. Finally, results for W = 640 were not considered since the condition W ≪ L is not fulfilled. The critical exponent values used to rescale were chosen based on the minimization of the relative collapse error (Fig. 5c,d), defined in the Supp. Material. Notice that for both cases, the error is two orders of magnitude smaller than the observable values. While it is not the main focus of this work, scaling and collapse can be demonstrated for other observables, including magnetization |�m�| , where m(t) = 1 W 2 i s i (t) , and its related susceptibility, χ = W 2 T [�m 2 � − �m� 2 ] in the Ising model. Box-scaling data can be collapsed using the same values of critical exponents ν = 1 , β = 1/8 and γ = 7/4 used in FSS, just replacing system size L by window size W (see Supp. Material). Analogous results can be obtained for the neuronal model (see Supp. Material). It need to be noted that the formal connection between the box-scaling studied here and FSS could be traced back to Binder' work 23 who described spin distributions, fluctuations and other observables in the equilibrium Ising system as a function of window size.

Discussion
We presented results for a correlation function well suited for out of equilibrium experiments. We have studied the behaviour of the CCF as a window length is varied for two models: the Ising model, which represents an equilibrium material, and it's dynamics are governed by a Hamiltonian, and the neuronal network model, which represents (a piece of) living matter, whose elements are cellular automatons, and their dynamics may be modified by external perturbations. Both models present critical dynamics and in both cases, box-scaling reproduces finite-size scaling results. The most relevant difference between them, in the context of box-scaling, is that in the Ising model, interaction length is much smaller than any considered window, while it is smaller but comparable to smallest windows for the neuronal model. This difference is noticed, for instance, from the bending of r 0 v.s. W in Fig. 2.
Regarding the feasibility of box-scaling on experimental setups, where system size cannot easily be varied, it needs to be noticed that the method is affected by the same limitations as other approaches for biological data: inhomogeneities, non-stationarity, and, in most cases, finite length of the data. However, we should stress that this approach has already been shown to be practical in some settings 4,7,8,12,22 . In Supp. Material (see Supp. Fig. 4) we reproduce results for characteristic length as a function of window size from already published 12 human fMRI data (where the already mentioned limitations are present), and we find that critical dynamics is unambiguiously distinguished from sub or super critical, using the same scaling as in Fig. 2d-f of Fig. 4d-f.
Relatively small windows may be affected by inhomogeneities such as the local fluctuations in the number of elements (i.e., neurons). This effect is present on a variety of situations whenever a portion of a system is assumed to represent the whole. While it may not be easy to account for all inhomogeneity effects, it is straightforward to estimate it's contribution. For instance, several small non-overlapping windows may be taken from the largest available window, and box-scaling analysis can be performed on each of them, measuring to what extent the result on one of them is representative of the whole group.
Regarding non-stationarity, it should be stressed that box-scaling computes the space-averaged connected correlation function, where the instantaneous average is subtracted to each signal. This correlation function differs from the more frequently used time averaged correlation function, or the Pearson correlation (where time averages are subtracted). Thus the calculation can be performed independently on different snapshots, and box-scaling analysis can be performed as a function of time.
Indeed, box-scaling does not require a large amount of data: as an example, we have successfully reproduced the results for the Ising model using 1% of the data shown in Figs. 3 and 4 (see Supp. Material), entailing few hundreds of samples. The reason behind this is that system state is not derived by a single value of r 0 (which may be noisy in some experimental conditions), but it is derived from the whole r 0 (W) curve, which shows a linear/ logarithmic dependence for systems at/away from criticality. Moreover, the scale-invariance typical of criticality was well captured by the data collapse after rescaling demonstrated in Fig. 5, remarking that at odds with zerocrossing, the collapse takes information from all recorded data points.
Finally, we would like to point out that an analysis similar to box-scaling method has recently been proposed in the context of scale-free networks 37 . There, a scaling hypothesis for finite size scale-free networks is proposed and tested over several naturally occurring networks.
In summary, the results obtained from a neuronal network model and from the ferromagnetic 2D Ising model show that the finite-size scaling of the correlation length ξ can be approximated-near the small-size limit-by the dependence of the characteristic length r 0 on window size. Results are derived from CCF where instantaneous averages are considered, in such way that the result is insensitive to inhomogeneous distribution of signals or external perturbations on the whole window. The results are particularly relevant at the experimental level in neuroscience, in which techniques to map different areas of the brain cortex are now available 38 , while changing system size is not feasible. In that direction, the present analysis is fully consistent with the experimental observations being reported using optogenetic tools 21 and also multi-electrode arrays 22 . It should also be straightforward to apply box-scaling to in vitro results, where both system and window sizes may be changed 39 .