Evaluating the Small-World-Ness of a Sampled Network: Functional Connectivity of Entorhinal-Hippocampal Circuitry

The amount of publicly accessible experimental data has gradually increased in recent years, which makes it possible to reconsider many longstanding questions in neuroscience. In this paper, an efficient framework is presented for reconstructing functional connectivity using experimental spike-train data. A modified generalized linear model (GLM) with L1-norm penalty was used to investigate 10 datasets. These datasets contain spike-train data collected from the entorhinal-hippocampal region in the brains of rats performing different tasks. The analysis shows that entorhinal-hippocampal network of well-trained rats demonstrated significant small-world features. It is found that the connectivity structure generated by distance-dependent models is responsible for the observed small-world features of the reconstructed networks. The models are utilized to simulate a subset of units recorded from a large biological neural network using multiple electrodes. Two metrics for quantifying the small-world-ness both suggest that the reconstructed network from the sampled nodes estimates a more prominent small-world-ness feature than that of the original unknown network when the number of recorded neurons is small. Finally, this study shows that it is feasible to adjust the estimated small-world-ness results based on the number of neurons recorded to provide a more accurate reference of the network property.


Results
Model Estimation. This study investigates the relationship among neurons with firing rates higher than 0.5 Hz. The K-S plot and the prediction accuracy provide us with parameters for model verification. The parameters selected as significant were used to reconstruct the neural network model. Figure 1 presents the reconstructed functional network from the Rat #3 (ec016) recorded for 28 minutes in session 437 (ec016.437). Before the recording, the rat had trained on the wheel task 8 times. Our reconstructed functional network contains 36 neurons across 4 regions in entorhinal-hippocampual area and, in each region, the neurons are nearly fully connected, which shows a strong regional small-world feature. Although the structural (anatomical) connectivity at hippocampus is relatively simpler than most of the other brain regions, the functional connectivity of Minimum Input Nodes to Control Network. As shown in Fig. 3, when the size of the reconstructed network is small, one can find a perfect matching in the directed graph, which means only one node in the network can control the dynamics of the whole network. As more neurons become involved in the network, more input nodes (driver nodes) need to be controlled. More complex network require more driver nodes to control the whole network in entorhinal-hippocampal regions of rats.  Simulating the Topology of a Biological Neural Network. The actual synaptic connections between two neurons were shown to depend on their physical distance 9,14 , which leads us to use two distance-dependent models to simulate a neural network and make comparisons between them. Our results show that this idea can help form small-world properties very well: (1) small average path lengths and (2) large average clustering coefficients, which meet with experimental data and our interest in analyzing the small-world features of the sampled network compared to the original unknown network. Model #1 is Here, d is the distance between two neurons. The parameters α and λ are in (0, 1). If α is 1, the neighbours are certainly connected by edges with length 1. If α < 1, the global connection depends on both parameters as shown in Fig. 4(a,b). As λ increases from zero, the expected number of long-range associations increases and α determines the initial states of the probability of connections among neighbors, with negligible influence on the longest range to establish connections. One significant difference between these two models is the final state, which is the probability of connection at longer range. Figure 4(c,d) shows the effect of varying the parameters of model #2. We set γ equals to 1. When β increases, the final state is decreasing but it is larger than 0. Model #2 provides a condition when the range between two neurons is long, they can form a connection with a higher probability when compared with model #1. Model #2 works to change the initial states of connections by adjusting γ and gives a big tolerance to a long-range connection formation. The biases P 0 and P 1 in these two models are used to adjust the performance of the outputs, and without loss of generality, we set them to 0 in the following discussion.
We simulated a biological neural network based on the above distance-dependent probability models in a three-dimensional space, which reflects the actual relationship better than one-dimensional or two-dimensional grids that neglected unique spatial properties of entorhinal-hippocampal structure. Although one or two-dimensional grids is convenient mathematical derivation of graph properties, intensive simulations of three-dimensional model would reveal more realistic graph properties of biological neural networks. Six electrodes were randomly inserted into this space to measure neurons extracelluarly. The number of electrodes inserted in the experimental datasets is different across the sessions. An average number of electrodes (six) was included in the simulations. Only neurons close to the electrodes would be sampled.
Performance of Two Metrics for Describing Small-world-ness. The distance-dependent probability model #1 and #2 both indicate that the longer distance between two neurons lowers the connection probability. In each simulation, 500 randomly distributed neurons were generated as test nodes. A distance threshold l was set. When the distance between two neurons is larger than l, they have a probability P out to form a connection; otherwise, they are connected with a probability P in . Normally, P in > P out . Figure 5 shows the influence of different sets of P in and P out on PL and CC . When the sum of these probabilities are larger, PL is smaller and CC is higher. Note that when P in is increased in the first 5 groups, the slopes of PL and CC are smaller than that when increasing P out in the last 3 groups. Then a linear model is used to fit PL and CC using both P in and P out respectively. The results could be described by  Here, R 2 are 0.9994 and 0.9897 respectively, which show that PL and CC have strong linear relationships with the two connection probabilities, P in and P out . We have simulated 50 random graphs adjusting the local region of P in , which can accurately represent the PL and CC of those derived from experimental data as sum-    Table 1. We used genetic algorithm to perform global optimization to minimize the differences of patterns ( PL and CC ) between experimental and simulated networks, which does not rely on the initial values of P in and P out . As shown in Table 2, our model is sufficient for providing us with similar properties as experimental data, especially when the size of network is large (N ≈ 50).
Without loss of generality, we selected P in = 0.6 and P out = 0.1 to analyze the performance of the two small-world-ness metrics. As shown in Fig. 6, the metrics S w of ON and SN are both larger than 1, which indicates that ON and SN are small-world networks. S w of ON is significantly different from the base value 1, with P-value (P) of t-test close to 0, lower than the significance level 0.05. For S w of SN, compared with base value 1, P ≈ 0, and compared with S w of ON, P = 0.00014. For ⁎ S w of ON, compared with base value 0, P = 0. When compared with base value 0, P = 0.05 for ⁎ S w of SN, and compared with ⁎ S w of ON, P ≈ 0. Yet, the small-world-ness is overestimated. The metric ⁎ S w provides more information about the network properties. When ⁎ S w of ON is 0.49, it indicates that ON is a weak small-world network. Actually, the S w of ON is 1.32, slightly larger than 1, so we can not consider it as a strong small-world network. However, S w can not convey information clearly as there is no limit on the values taken by S w . The metric ⁎ S w of SN is 0.05, so it is more like a small-world network as compared with ON. This case shows that, based on the distance-dependent probability model, ⁎ S w conveys much more information, and can distinguish the properties of different networks clearly.
The values displayed in Fig. 7 are the calculated values of averaging 50 realizations with the 6 electrodes randomly inserted in a three-dimensional space. Then, we found that the standard deviance of S w and ⁎ S w for ON is 0.0013 and 0.0005, while the SN is 0.63 and 0.25, respectively, indicating that S w is very sensitive to the positions of electrodes when the network size is small. In real experiments, the electrodes are inserted without much prior knowledge to aid the estimation of the small-world-ness, so a consistent metric is preferred to describe this property and then justify whether we overestimate or underestimate the small-world features. In Fig. 7, we list the details of 50 trials, which show the differences of small-world-ness for SN and ON. S w varies a lot in different trails. However, the performance of ⁎ S w is consistent over 50 trials. It shows that we consistently overestimated the small-world-ness when using SN except for one case.
In reality, only the neurons distributed around each electrode can be sampled in experiments. Hence, the ability of each electrode to detect electrical signal determines the number of sampled neurons. We simulated different P in and P out connection probabilities to see how the two small-world-ness metrics are changed with the increase in the percentage of sampled neurons. In the three-dimensional space, we inserted six electrodes to  Table 2. Simulated small world patterns. Figure 6. Small-world-ness of the original and the sampled networks described by two metrics. (a) S w larger than '1' indicates it is a small-world network; (b) ⁎ S w close to '0' indicates it is a small-world network, close to '1' or '− 1' presents it is more like a random or lattice network. sample neurons around them. We set a threshold l′ , and only the neurons distributed within the range l′ of the electrode can be sampled. By enlarging this threshold, more neurons can be recorded. Figure 8, (a) − ( f ) suggest that, when the percentage of sampled neurons is increasing, the ⁎ S w of SN is always smaller than that of ON. This demonstrates that ⁎ S w overestimates the small-world-ness of ON based on SN. Figure 8, (g) − (l) indicate that when the percentage of sampled neurons is below 20% (100 neurons), the S w of SN is smaller than that of ON, but it would quickly increase to the peak, giving an indication of overestimating the small-world-ness. However, when the number of sampled neurons increases, the S w of SN is close to that of ON. Figure 8 also shows that ⁎ S w can give a consistent overestimation in the small-world-ness feature from SN. However, it converges to the small-world-ness of ON with higher sampling percentage. On the other hand, S w is unable to provide us with a consistent result when the number of sampled neurons is small. Yet, S w converges to the actual small-world-ness more quickly than ⁎ S w .
Estimation of Actual Small-world-ness Based on Sampled Topology. As the ⁎ S w gives us a consistent and promising method to describe the small world feature of the network in multi-electrode sampling case, we use it as a quantitative method to evaluate the small-world-ness. To mimic experimental situation, we should estimate the number of neurons in the measured region. The region is approximate to a cuboid. The overall average density of the brain is around 9.2 × 10 4 neurons/mm 3 and 7.2 × 10 8 synapses/mm 3 15 . The edge density is therefore around 8.5%. Here we have 4 shanks with intershank distance around 200 μm, and the electrodes measured approximately a column in 50 μm radius. Thus, we can calculate the length of the piece of tissue measured to be around 780 μm, the width is approximately 320 μm, and the height of tissue measured is around 100 μm. The approximate number of neurons is around 2296. If a tighter tissue measurement is adopted, less neurons will be included in the regions measured. We simulated different sizes of networks (N = 500, 1000, 1500, 2000, 2500) with various combinations of P in and P out , and then sampled different numbers of neurons from networks to find the relationship between the small-world-ness of the original network and that of the sampled network. The distance-dependent model configuration is straightforward to simulate networks with similar properties as biological networks, such as edge density, and is also convenient to form small world networks. The combination of P in and P out ensures that we can simulate a variety of edge density of brain network. We take P in = 0.25 and P out = 0.05 to get the edge density about 8.32% (≈ 8.5%) as an example in Fig. 9.
The percentage error (σ) of our simulated network is calculated by the small-world-ness of original network ⁎ S w and sampled network ⁎ S w using In Fig. 9, we intend to find the relationship between percentage error and the number of sampled neurons no matter what the size of original network is. Two-term exponential function was selected to fit our simulated data. The general model of our fitted function is  Thus, ⁎ S w of sampled network could be adjusted by equation (9) to infer actual small-world-ness. By exploring edge density probabilities from 5.8% to 23%, which is consistent with estimation of cortical connectivity from pair-wise cell recordings 16 , the corresponding range of adjustment can be identified for reference.
In Table 3, the results are adjusted accordingly, and the small-world networks identified remain to be significant.

Discussion
The reconstructed networks in this study show that the neural network in the entorhinal-hippocampal region of well-trained rats demonstrate small-world features, which are consistent with some previous studies 17,18 . Such small-world network has been observed not only at the regional level but also at the neuronal level 16,18 . However, few previous studies have been conducted on constructing networks based on multi-neuron recordings of local circuits during behavioral tasks. We focused on the entorhinal-hippocampal region at the level of cells when a rat was performing a task for several rats with different tasks. Our results on small-world properties of networks reconstructed from individual neurons are consistent with previous studies 17 . However, while their methods are suitable for reconstructing small networks, sparse GLM is more desirable when estimating larger numbers of parameters if more neurons are involved in the network. Furthermore, complex network studies, which investigate whether the properties derived from the sampled network could accurately represent the original network accurately at the neuronal level, are lacking. We have, therefore, compared two popular metrics estimating small-world-ness based on a multi-electrode sampled network. Finally, we estimated the true small-world-ness of the original unknown network based on the number of neurons recorded. Our results also show that, when the network size is small, it can be controlled with a few nodes, where these networks are homogeneous and dense. When the network size is increasing, it becomes much more difficult to control the whole network.
We have also compared the similarities and the differences of neural networks at different training stages. As shown in Table 1, Rat #1 did the same task Mwheel for 10 times, the size of reconstructed network is similar. Although the placement of electrodes and training time are the same for Rat #2 for example, we found that for different tasks, the size of the reconstructed network is quite different. Furthermore, when the training time increases for the same task like Rat #4, more neurons and more connections are involved in the functional network, which is in agreement with previous studies 19 . The results show us that when the rats conducted different tasks, their brain functional networks showed significant small-world features. However, different tasks or different training stages influenced the size of reconstructed network. With the further development of statistical models for network reconstructions in the future, we hope to be able to understand more about functional anatomy of cell assemblies. Instead of thresholding a binary matrix as in our case, the model should work directly with both directed and weighted networks. Our sparse kernel GLM to reconstruct functional connectivity provides a topology for analysis of other network properties such as the relationship between neuronal firing patterns and community modules. When the size of a reconstructed neural network increases, detecting communities can provide valuable information for understanding biological systems 20 .
We have observed that the graph theory has been commonly used to analyze large-scale networks such as different brain regions in neuroscience, but very few studies analyzed networks at the neuronal level. Also, only limited portion of neurons could be recorded by multi-electrode techniques. When the original network is sparsely sampled, the average path lengths and average clustering coefficients are quite different from that of the actual network. Therefore, we have utilized several quantitative methods to evaluate the small-world-ness. In this paper, we test the performance of two commonly used small-world-ness metrics based on a widely used subsampling scheme. After intensive simulations based on the distance-dependent probability model, a quantifiable relationship between the error in the estimated small-world-ness and the number of sampled neurons has been identified. Along the way, we have found possible adjustments to the small-world-ness measures to reflect the actual network properties from sampled data. Such rules were then applied to adjust the small-world-ness  Table 3. Small-world-ness adjustment on 10 sessions. ⁎ S w Small-world-ness of sampled network. ⁎ S w Smallworld-ness of original network when the edge density is from 5.8% to 23%.
measures of the reconstructed microcircuit network in the earlier analysis of experimental data. In the range around average edge density observed in the brain, the corrected measures indicate the EC-CA1 circuitry is indeed a small-world network. Our results show that one can use the number of sampled neurons to estimate a more accurate small-world-ness compared to that of sub-graph, which could be applied to the studies of other small-world networks when there are only sampled data available.

Methods
Experimental Data. The datasets, curated from CRCNS 1 , contain multi-unit recordings from the entorhinal-hippocampal regions of 11 rats when the animals were conducting different behavioral tasks. The data was obtained from 442 recording sessions, and in each session the animal finished one of 14 behavioral tasks. The details of analyzed sessions are shown in Table 4. The selected sessions contain 4 different rats doing 4 tasks and the number of recorded neurons number ranges from 25 to 105. The neurons were recorded from 5 different brain regions. The selected sessions can help us do comparative studies. The neurons with firing rates less than 0.5 Hz and isolated ones in final reconstructed network are left out.
Generalized Linear Model. Spike trains can be viewed as point processes 21 . The conditional intensity function λ ( ( )) t H t based on the firing probability H(t) in the past can generate an instantaneous firing rate, which can be explained as where N(t) denotes the number of events which occurred within the time interval [0, t] 22 . We take each neuron's spiking activity as the output, and the spike trains from all the other neurons as the input, to establish a GLM. The estimated coefficients can be regarded as coupling strengths between the output neuron and input neurons. One can estimate the coefficients using the GLM as where k 0 is a scalar zeroth-order kernel function, and k (n) are first-order kernel functions describing the relationship between the output neuron i. ' s spike probability λ i (t) and the n-th input x n (t). Without loss of generality, the first-order model will be adopted for demonstration in this session. Here, is a known link function. Since the firing rate of neuron i obeys a Gaussian distribution, the equation (11) can be written as Global Basis Functions. From equation (11), it can be seen that the number of parameters to estimate is N × (M + 1) + 1. Thus, if a long memory of the neuron is taken into consideration, meaning M is large, too many estimated parameters need much computational efficiency. A common method to reduce the number of model coefficients is the functional expansion technique, which first decomposes the kernel functions k (n) (τ) into  where ∈ X R N is the state vector, ∈ u R M is the input vector, ∈ × A R N N is the state matrix, and ∈ × B R N M is the input matrix. The element a ij in the state matrix A is 0, if there is no link from node j to node i. Because the parameters in both A and B matrices are either independent free ones or zero, A and B are called structured matrices. The structured properties show that in a real system, if one can fix the parameters to some values so that the system is controllable, then the system is structurally controllable.
In Liu et al. 11 , it is proved that in order to fully control a network of size N, the minimum number of input vertices (N i ) needed is related to the size of maximum matching (M * ) in the corresponding digraph: If the directed graph has a perfect matching, the minimum input vertice is 1; otherwise, it is equal to − ⁎ N M . In an undirected graph, a matching M is an independent edge set without common vertices. In a directed graph, an edge subset M is a matching if none of the edges therein share a common starting vertex or a common ending vertex. We find the maximum matching of a directed graph by transforming it into an undirected bipartite graph. The Hopcroft-Karp algorithm is chosen to determine the maximal matching in the corresponding transformed bipartite graph as this algorithm is both accurate and efficient for the small size neuronal network commonly recorded (< 100 neurons).
Approaches to Describe Small-world-ness. Our study shows that the reconstructed functional neural network presents significant small-world features due to its large average clustering coefficient CC and small average path length PL . Although these two parameters have been widely adopted to identify a small-world network 29 , they were not defined for comparing the small-world-ness among different networks. For example, network A has larger CC but also larger PL than network B, so we can not tell which network shows stronger small-world-ness. Thus, it is necessary to use a quantitative measure to describe the small-world feature. In an earlier study, the significance in small-world-ness of a network could be described by the following metric 30 where C r and L r are CC and PL from the random graph with the same number of nodes and edges as that of the reconstructed network. A small-world network usually has  S 1 w , which should meet  1 C C r and ≈ 1 L L r . However, this metric is sensitive to the size of network, and it becomes larger as the size of network increases. These drawbacks in metric S w have been tackled by another metric to describe the small-world-ness 31 . It focuses on analyzing whether the measured network is more like a regular lattice or a random graph. The Watts and Strogatz model was proposed to produce a small-world network which had both large CC as with the corresponding regular lattice and small PL as with the random graph. The new metric is based on these two properties, and the new measure is where C l is the CC from the corresponding regular lattice. When the new metric is close to 0, the network shows stronger small-world-ness feature, which means that C ≈ C l and L ≈ L r . When ≈ ⁎ S 1 w , the network is more like a random graph. Whereas, when ≈ − ⁎ S 1 w , it has characteristics like a regular lattice. This metric is not sensitive to the size of network. We consider both these two metrics for measuring the small-world-ness of the network model.