Self-organized criticality in cortical assemblies occurs in concurrent scale-free and small-world networks

The spontaneous activity of cortical networks is characterized by the emergence of different dynamic states. Although several attempts were accomplished to understand the origin of these dynamics, the underlying factors continue to be elusive. In this work, we specifically investigated the interplay between network topology and spontaneous dynamics within the framework of self-organized criticality (SOC). The obtained results support the hypothesis that the emergence of critical states occurs in specific complex network topologies. By combining multi-electrode recordings of spontaneous activity of in vitro cortical assemblies with theoretical models, we demonstrate that different ‘connectivity rules’ drive the network towards different dynamic states. In particular, scale-free architectures with different degree of small-worldness account better for the variability observed in experimental data, giving rise to different dynamic states. Moreover, in relationship with the balance between excitation and inhibition and percentage of inhibitory hubs, the simulated cortical networks fall in a critical regime.


Computational Model
As mentioned in the main text, the neuron model is based on the Izhikevich equations 1 . This model depends on four parameters, which allow to reproduce the spiking and bursting behavior of specific types of cortical neurons. From a mathematical point of view, the model is described by a two-dimensional system of ordinary differential equations 1 : = 0.04 2 + 5 + 140 − + (1) with the after-spike resetting conditions: In Equations (1)(2)(3), v is the membrane potential of the neuron, u is a membrane recovery variable which takes into account the activation of K + and inactivation of Na + channels.
Equation (4) displays the used values for the four parameters. In Eqs. (4), the first row relates to the excitatory (regular spiking, RS), while the second one to the inhibitory (fast spiking, FS) neurons. Although we used the class of RS and FS models, we introduced per each neuron a random variable ri (which spans from 0 to 1) in order to introduce a further variability in the neuron dynamics: a RS neuron is obtained if ri = 0, whereas if ri = 1, a bursting neuron is obtained. Such distribution is biased towards RS neuron.
To model the morphological network connectivity, we made extensive use of graph theory 2 .
Briefly, all graphs are composed by nodes which represent the neurons and edges which model the morphological connections among the neurons. If edges take into account the directionality of the connection (i.e., from a pre-to a post-synaptic neuron), the graph is named directed, otherwise it is called undirected. The structure of the graph is described by the adjacency matrix, a square symmetric matrix of size equal to the number of nodes N with binary entries. If the element aij = 1, a connection between the node j to i is present, otherwise aij = 0 means no connection. All the autoconnections (aii = 1) are avoided. Then, the value 1 of the non-zero aij elements are substituted with numbers representing the different synaptic weights drawn from a normal distribution. To model the dynamics of the network, each node of the graph is "replaced" by a neuron model, whose dynamics is simulated according to the Izhikevich equations (see Eqs. [1][2]. Graphs are topologically characterized by evaluating the path length (L), the clustering coefficient (C), and the connectivity degree (D).
A path is an ordered sequence of distinct edges and nodes that link a source node j to a target i: thus the path length L is the minimum number of distinct connections to reach i from j. The clustering coefficient Ci of a generic node i is evaluated as the ratio between the sum of the connections (with directionality) existing among the neighbors of node i and the sum of all nodes linked to i (without directionality). Ci lies between 0 and 1 and typically it is averaged over all nodes of the graph to obtain the mean clustering coefficient or the graph clustering coefficient.
The connectivity degree, or degree for short, Di of a node i is the sum of all the incoming and outcoming connections. By averaging over the entire graph, we obtained the so called average degree.
A graph can be created according to different algorithms and can assume different properties in terms of the aforementioned metrics. Here below, we provided a description of the architectures used to model the connectivity of in vitro cortical networks, namely random (RND) 3 , scale-free (SF) 4 , and small-world (SW) 5 .
Random Network. The model of RND network implemented in this work follows the original one devised by Erdõs and Renyi 3 . The fundamental assumption of random networks is that, despite the random placement of links, the correspondent graph is characterized by a uniform connection probability and a Poissonian degree distribution. The independent variables for building up a random graph are the number of nodes N and the total number of edges, with the condition that the minimum number of edges must be • 2 . In its original formulation, the random algorithm was characterized by an evolution process, where, starting from N isolated nodes, subsequent random edges were added 6 . In our implementation, a random graph is chosen at random from the set of all symmetric graphs with N nodes and m edges.
Scale-Free Network. In SF networks 4 the degree distribution follows a power-law: thus, if m is the number of edges which incident to a node (i.e. the connectivity degree), the power-law distribution is given by 7 : where in neuronal systems the characteristic exponent  usually lies between 1.3 (slice recordings [7]) and 2 (fMRI recordings [5]). To build a SF network, we made use of the algorithm described in 8 , that has been proved to be efficient in terms of computation when dealing with large-scale networks. Nodes are added one by one successively. For each added node, m edges are generated, 4 where m is the minimum node degree. The endpoints are selected from the nodes whose edges have already been created, with a bias towards high degree nodes.
Small-World Network. The term small-world (SW) was coined by Milgram 9 with reference to social networks in which a person reaches any other person with a relatively short number of links (i.e., the so called "six degrees of separation" problem 9 ). More recently, Watts and Strogatz 5 formalized and studied the features of this network topology. A SW graph lies in between a condition of regular lattice and randomness. In fact, by increasing the probability p of rewiring, the order of a regular lattice is disrupted, and when p = 1 a random graph is generated. Increasing the probability of rewiring, both the characteristic path length L and the clustering coefficient C decrease. In our case SW networks are obtained with a one-dimensional network made up of N neurons with connections between the k nearest and the next-nearest neighbors. Then, each link is rewired with a given probability p (i.e., shifting one end of the bond to a new node chosen at random from the whole system) with the constraint that no vertex can have a link with itself 5 . In the presented simulations, we set such a value to 0.5. It is worth noticing that similar results can be achieved when 0.4  p < 1.0.
If we consider RND, SF, and SW networks with the same average degree, the following relationships among the aforementioned metrics hold 6 : where the subscript "REG" indicates a regular network. In this topology, each node is only connected to the neighbors and has the same number of links which implies a high degree of clusterization.

Cell culture
Dissociated cortical neurons were extracted from rat embryos and plated onto 60 planar TiN/SiN micro-electrodes (30 µm diameter, 200 µm spaced) at the density of 5-8 × 10 4 cells/device, which means about 1200 -1400 cells/mm 2 (Fig. S1a). The procedure was approved by the European Neurons are maintained in culture dishes, each containing 1 ml of nutrient medium (i.e. serumfree Neurobasal medium supplemented with B27 and Glutamax-I) and placed in a humidified incubator having an atmosphere of 5% CO2 at 37 °C. Further details can be found in 10 .

Experimental setup and Electrophysiology
Electrophysiological activity was recorded during the third-fourth week in vitro to allow the maturation of the network 11 . The experimental set-up is based on the MEA60 System (Multi Channel Systems, MCS, Reutlingen, Germany) consisting of a mounting support with integrated 60 channels pre-and filter amplifier (gain 1200×) and a personal computer equipped with a PCI data acquisition board for real time signal monitoring and recording. The electrophysiological activity was recorded without any chemical or electrical stimulation (i.e., spontaneous activity). The recorded signals ranged from random spike activity to more complicated and synchronized burst signals, as depicted in Fig. S1b and c. Raw signals were recorded and sampled at 10 kHz, and data were then processed off-line by using a custom software to extract the spike trains (see Data Analysis).

Dataset
The dataset consists of n = 10 cortical cultures recorded for periods of at least 1 hour each, from 21 days in vitro (DIVs) to 35 DIVs.

Data analysis
Experimental data were processed off-line by using a custom software package 12  Spike detection. Extracellularly recorded spikes were detected using the PTSD (Precise Timing Spike Detection) algorithm 13 . Briefly, spike trains were built using three parameters: (1) a differential threshold set to 8 times the standard deviation of the baseline noise independently for each channel; (2) a peak lifetime period (set at 2 ms); (3) a refractory period (set at 1 ms). The data presented in the text were not spike sorted. This choice was made according to the fact that during a burst a global increase of the activity produces a fast sequence of spikes with different and overlapping shapes which make the sorting difficult and unreliable 14 .
Simulated data have been peak detected by means of a simple hard-threshold spike detection algorithm. In our simulations, we set the threshold value at 0 mV.
Burst detection. Both experimental and simulated data have been burst detected by means of the algorithms devised in 15 . Briefly, Detected bursts are sequences of spikes having an ISI smaller than a reference value (set at 100 ms in our experiments and simulations), and containing at least a minimum number of consecutive spikes (set at 5 spikes in our experiments and simulations).
For all the simulations, we first computed the mean firing rate (MFR), mean bursting rate (MBR), burst duration (BD), inter burst interval (IBI). Then, to characterize the interdependency between dynamics and connectivity, we focused on: (i) cross-correlation, to evaluate the degree of synchronization among the neurons; (ii) neuronal avalanches, to determine whether or not the system can be considered in critical state 16 .

Cross-correlation, coincidence index. The cross-correlation (CC) function 17 was built by
considering the spike trains (X and Y) of two neurons. It measures the frequency at which one cell fires (target) as a function of time, relative to the firing of a spike in another cell (reference) within a time frame around the spikes of the X train of ±T (T = 150 ms). Mathematically, CC is defined as: In Eq. (7),  is the time shift or time lag (set at 1 ms) and ts indicates the timing of an event in the X train. The corrected Cxy(τ) is obtained by means of a normalization procedure: each element of the array is divided by the square root of the product between the number of peaks in the X (NX in Eq. 7) and the Y (NY in Eq. 7) train 18 . The CC function was evaluated by considering all pairs of neurons in the network. From the CC function, we extracted the coincidence index (CI0) which is defined as the ratio of the integral of the CC function in a specified area (± 1ms) around the zero bin to the integral of the total area. This parameter allows to quantify the degree of synchronization of the neurons in the networks 10 .

Inter-event interval (IEI).
Considering the electrophysiological activity of the whole network, we derived the probability density of time intervals between successive spikes occurring at all the neurons, namely the inter-event interval (IEI) distribution 16 . Computing the average value of the IEI distribution, we obtained for every simulated network an estimate of the average time between two successive activations of any pair of neurons (Fig. S2)  Avalanche detection. Following the pioneer works of Beggs and Plenz 16,19 , a neuronal avalanche is defined as an event of widespread spontaneous electrical activity, preceded and followed by a silent period. Recordings are divided into time windows of duration Δt (called bins); inside each bin the spatial distribution of activity over the network represents a frame. A frame, which does not contain any spike, is called a blank frame. A neuron is active in a time bin Δt if, at least one spike is recorded, within that time window. Thus, a frame is active if, at least, one electrode is active.
Following these definitions, a neuronal avalanche is a continuous sequence of active frames, preceded and followed by at least one blank frame (Fig. S3a). Without loss of generality, the time window used to bin the spiking activity was adjusted for each simulation according to the signal's timescale (i.e. average inter-event interval (IEI)).
Avalanches are usually characterized by their size: in this work, we defined the size of an avalanche as the number of neurons (or electrodes when experimental recordings are considered) being active at least once within the avalanche. From the simulated spike trains, we computed neuronal avalanches and we estimated the probability density function (PDF) of avalanche sizes, as the relative frequency histogram, in which the height of each bar represents the proportion of avalanches of a given size. We also computed the complementary cumulative distribution function (CDF), which describes the probability that a random variable X with a given probability distribution is found at a value greater than or equal to x, (i.e., ( ) = Pr ( ≥ )). Figure S3. Sketches illustrating the algorithms used to detect neuronal avalanches and perform power-law fitting assessment. a) Avalanche detection algorithm. b) Fitting procedures used to assess the presence of a power-law distribution.

Model distributions
The statistically significance of the computed power-law distributions have been performed by following the procedures proposed by Clauset et al. 20 : after the evaluation of the goodness-of-fit, we compared the power-law model accuracy with alternative distribution models (namely power-law with exponential cut-off, exponential, and log-normal). Fig. S3b illustrates the aforementioned steps used to assess the presence of a power-law distribution. Here below, a description of the model distributions is provided.

Power-law.
A continuous power-law distribution is described by a PDF ( ) such that: where X is the observed value and C is a normalization constant.
In our case, x can only assume integer values (e.g. number of neurons), hence we must consider the corresponding discrete PDF of the form: In both cases, ( ) diverges as → 0, so one can identify a lower bound to the power-law regime. The normalizing constant C can be easily computed by solving ∫ ( ) +∞ = 1 (continuous case) or ∑ ( ) +∞ = = 1 (discrete case), providing > 1. In this study (and in general for empirical data), an upper bound is also present and is given by the finite system size, i.e. the number of electrodes in the recording array 21 . Therefore, will depend on all parameters , and .
Finally, we also considered the CDF, defined to be ( ) = Pr( ≥ ) = 1 − ∑ ( ) Exponential. The PDF for the exponential distribution, with parameter > 0 and normalizing constant , is Power-law with exponential cut-off. The power-law distribution with exponential cut-off is defined as: with ≥ 0 and > 1. (11) Log-normal. The PDF of the lognormal distribution is given by: (12) with scale parameter and location parameter ≥ 0. In this work, we applied such method by excluding from the fitting the first bin, corresponding to avalanches of unit-dimension and the last bins of the histogram, corresponding to avalanches whose probability is less than 1‰ of the maximum value of the distribution.

Maximum Likelihood Estimation.
Let x be the independent variable (e.g., avalanche sizes' occurrence frequencies) to which we wish to fit the power-law distribution reported in Eq. (13). A power-law distribution is described by the probability density function p(x), reported in Eq. (8).
In Eq. (9), if x  0 the probability density p(x) diverges: this means that not all the values of x  0 can be taken into account. Thus, it becomes necessary to define a lower bound (xmin) for the powerlaw behavior. The ML method estimates first such xmin value, and then, the scaling parameter  of Eq. (13) by means of the two following relationships in the case of continuous and discrete case 20 : where xi are the observed values of x which satisfy the condition xi  xmin.
To choose the best value of xmin, several strategies can be found in the literature 22 . The approach followed in this work is based on the work of Clauset and co-workers 23 . Practically, we consider the value of xmin that makes the probability distribution of the measured data and the best-fit power-law model as similar as possible above xmin. To measure the distance between the cumulative distribution function of the data and the fitted model, the Kolmogorov-Smirnov distance was used, since it can be used also for non-normal distributed data: In Eq. (15) S(x) and P(x) are respectively the cumulative distribution function of the observations under the condition x > xmin and the power-law model which best fits the data in the interval x > xmin.

Goodness-of-Fit.
After the estimation of the critical exponent and of the power-law fitting, we have to verify whether the power-law hypothesis is reasonable or not, given the data distribution. To this end, it becomes necessary the use of a goodness-of-fit test which generates a p-value that quantifies the likelihood of such hypothesis. This method works as follows: the fitted distribution is sampled to generate artificial data-sets; then the Kolmogorov-Smirnov distance between each data-set and the fitted distribution is evaluated, producing the distribution of Kolmogorov-Smirnov distances expected if the fitted distribution is the true distribution of the data. A p-value is calculated as the proportion of artificial data sets showing a poorer fit than fitting the observed data set. When this value is close to 1, the data set can be considered to be drawn from the fitted distribution, and if not, the hypothesis should be rejected. We chose a p-level equal to 0.1, meaning that the power-law hypothesis is ruled out if p  0.1 according to 20,24 .

Log-Likelihood Ratio Test (LRT). Although the Kolmogorov-Smirnov goodness-of-fit test
provides results in favor of a power-law distribution, we would like to compare it with other likely distributions. To this end, we used the log-likelihood ratio test (LRT) statistics 20 . Practically, the likelihoods of two competing distributions are evaluated and compared as follows: where L indicates the likelihood of a distribution. If LRT > 0 the power-law model is preferred, otherwise it should be rejected. However, the sign of LRT is not sufficient to assess whether the power-law is the best model to the data, because of possible statistical fluctuations. Following the method proposed by Vuong and colleagues 25 to test the significance of the LRT, we computed the pvalue: if it is small (p < 0.1) the power-law model can be considered as the best candidate to fit the experimental data. Fig. S3b summarizes the whole procedure.

Sub-critical distribution
The same procedure was applied to verify whether the cumulative distribution function of a subcritical dynamics is ruled by an exponential distribution function. Following the previously described approach, we proceeded as follows.
Let x be the independent variable (e.g., avalanche sizes' occurrence frequencies) to which we wish to fit the exponential distribution. An exponential distribution is described by the probability density function p(x) of the form: where Rx = [0, +), and 0 is the parameter to estimate with ML by means of the following equation: Equation (18) shows that the estimation of 0 corresponds to the reciprocal of the mean.
The procedure to evaluate the goodness-of-fit of the exponential distribution follows the same steps used for the power-law one, namely: i) calculation of the Kolmogorov-Smirnov statistics and likelihood ratios (see Eq. 15); ii) comparison of the exponential accuracy with alternative distribution models (power-law, exponential cut-off, and log-normal) with Eq. (16).

Excitatory/inhibitory links balance
The same percentages are approximately maintained by considering the total number of excitatory/inhibitory links for both SF and RND networks ( Fig. S4a and b, black-square lines). By sweeping the percentage of inhibitory neurons (5, 20, 40, 60, 70, 80%), also the percentage of inhibitory links follows the same proportionality rule (Fig. S4). Figure S4. Percentage of inhibitory links obtained by varying the percentage of inhibitory neurons for a) SF and b) RND networks of increasing average degree (from Net_1 to Net_9).

Functional networks resemble structural features
We tested whether the structural features of the simulated networks are preserved in the functional networks inferred by means of the cross-correlation algorithm (see Supplementary   Information, Data Analysis). For each realization of each SF and RND network, we applied the cross-correlation with time lag set to 1 ms and time frame set to 150 ms. After that, for each simulation we obtained an adjacency matrix containing the peak values of the cross-correlograms.
Following the approach proposed in 26 , we sorted all the statistically significant links based on the connection strengths, and we took into account only the ones that overcome an arbitrary threshold set as mean + 2 standard deviation of the non-zero values of the cross-correlogram peaks.
For the obtained thresholded adjacency matrices, the degree distributions were evaluated and they are presented in Fig. S5a and b for SF and RND networks, respectively. Each colored line with symbols shows the average degree distribution computed over 10 realizations. It is rather evident that for SF networks, the degree distribution can be fitted with a power-law (Fig. S5a) with an exponent lying between -1.81 and -1.23. Interestingly enough, these exponents are not far from the corresponding ones obtained for the structural degree distributions (cf. Fig. S5c). On the other hand, as expected, the degree distributions of RND networks display a quasi-gaussian distribution, although some fluctuations can be observed.
Coherently with the results found by comparing diffusion imaging and resting state functional magnetic resonance imaging (fMRI) that show a tight relationship between structural and functional connections [27][28][29] , also from our simulations we found that functional connections are predictive of the structural ones, and that the topological parameters of structural networks are qualitatively preserved in functional networks.

Distribution comparison
The plausibility of the power-law fitting can be estimated by comparing the fitting results with the ones obtained by fitting other similar data distributions, namely: truncated power-law, exponential and lognormal, by means of the LRT 20 (cf. Methods, Fitting procedures). Particularly, the exponential distribution is indicative of a sub-critical dynamic state; the log-normal distribution can be assimilated to a power-law because of its heavy-tail which makes it difficult to distinguish between the two; the truncated power-law distribution is indeed a power law distribution for finitesize systems (i.e., finite number of neurons). In the latter case, it is likely to observe an exponential cut-off above the system size. For each simulated network, Table S1 shows the mean values of the LRT and the corresponding p-level for the alternative distributions.
The first two networks (i.e., Net_1, Net_2) were not considered since the goodness of the corresponding power-law fitting was not sufficient (cf. Fig. 4e of the main text). Net_3 and Net_4, although displaying a good power-law fitting, are better described by a truncated power-law model, as witnessed by the negative value of LRT, see Table S1. Net_5 displays positive LRTs when comparing power-law fitting with alternative models, but since the associated p-values exceed 0.1 the plausibility for a power-law distribution is not significant. Finally, avalanche size distributions from Net_6 to Net_9, are best fit by a power-law than by any other tested distribution, so they can be considered as ruled by a power-law. These results support the hypothesis that criticality is achieved by SF networks having a relatively high average connectivity degree, which also display, in turn, consistent small-world features.

Synaptic weight distribution affects critical dynamics
In Fig. 6 of the main text, we explored whether criticality is affected by the synaptic weigth distribution, in particular by sweeping the mean value of the excitatory connections. In this section, two further analyses have been performed: first, we swept the standard deviation of the synaptic weight distribution (from 0.0 to 2.2) for all SF and RND networks, keeping constant the mean to the default value ( ̅ = 10); second, only for the SF networks, we swept simultaneously both the standard deviation and the mean of the synaptic weight distribution for each degree (i.e., Net_i), and we computed the GoF.
The false color map of Fig. S6 (organized as the correspondent Fig. 6 of the main text), displays for SF and RND networks the GoF of the power-law distribution ( Fig. S6a and b), the MFR ( Fig   S6a and b), and the percentage of active neurons ( Fig. S6e and f) as a function of the standard deviation of synaptic weight (y-axis) and the average degree (x-axis). A black dashed box highlights the standard deviation of the synaptic weights that was used in all previous simulations. We delimited with a solid black line, the critical regime area obtained by thresholding the GoF and performing the LRT. By sweeping the standard deviation of the synaptic weights, two main considerations can be done. For SF networks, we observe that below the default value of the standard deviation (i.e., s5), we cannot achieve criticality independently of the degree (i.e., Net_i).
The chosen value (s5) of the standard deviation guarantees also a good propagation of the electrophysiological activity of the network, as Fig. S6e shows: below s5, the percentage of active neurons is below 30%, while for values of standard deviation greater than s5, and with a medium/high level of degree (i.e., from Net_6), more than 80% of the neurons are active. Finally, as Figure S6c shows, we can see how the firing rate grows as a function of the standard deviation: the greater is the standard deviation of the synaptic weight distribution, the higher is the value of the achieved firing rate.  For RND networks, the increase of the standard deviation of the synaptic weights induces a wider region of criticality than the one obtained by sweeping the mean value ( Fig. 6b and S6b). However, also in this configuration, the obtained firing rate values present non-physiological values (Fig.   S6d), although smaller than the ones of Fig. 6d (up to 80 spikes/s). The percentage of active neurons in RND networks displays a similar trend of the SF ones (Fig. S6f).
Finally, we simulated the emergent dynamics of each SF network (i.e., from Net_1 to Net_9) by sweeping simultaneously both the mean value and the standard deviation of the synaptic weight distribution, accordingly to the values used in Fig. 6 (w0-w13) and Fig. S6 (s0-s11). Figure S7 shows in a 3D fashion the GoF (z-axis) as a function of the mean (y-axis) and standard deviation (xaxis) of the synaptic weight distribution. The green asterisk highlights the default values used for the mean (̂) and the standard deviation (). The white-square planes identify the GoF threshold p = 0.1. Values greater than 0.1 suggest a critical behavior (cf. Methods, Fitting procedures).
From these simulations, two considerations emerge: (i) the degree of connectivity of the network is fundamental to push the network toward a critical state. As an example, considering the case of Net_1 of Fig. S7, an increase of both the mean and the standard deviation of the synaptic weight distribution does not guarantee a wide critical area, which can be easily obtained in Net_6, … Net_9. (ii) Comparing the effect of the mean and standard deviation of the synaptic weight distribution, it results that the mean is the most sensible parameter to move the network towards a critical regime.

Pharmacological manipulation
We tested if, by modifying the inhibitory synaptic transmission efficacy, without changing the composition of the network (i.e., 30% of inhibitory neurons) and the topology of the graph (i.e., SF networks with the same average degree), the power-law regime is preserved or not. By reducing the efficacy of all inhibitory connections (i.e. 50% and 100% of the original value) to mimic the effect of bicuculline (BIC), a supercritical behavior is achieved (Fig. S8b, grey and dark grey bars) for all the considered degrees.
The simulated effect of BIC was also evaluated in terms of network synchronization by means of the CI0. Fig. S8a shows the trend of the CI0 for all the three simulated conditions (i.e. spontaneous activity and different levels of BIC). These modeling results show the same qualitative trend of the experimental data 10 , confirming that a supercritical state is typical of cultures with a high synchronization level, as induced by BIC stimulation. Figure S8. Effect of BIC on a) Coincidence Index, b) Goodness-of-Fit of the power-law model. Two levels of BIC concentration have been simulated in order to consider half-blocked (circle and grey bars) and fully-blocked (triangles and dark grey bars) inhibitory transmission.

Abbreviation List
Here below, we reported the abbreviations we used in both the main text and Supplementary