Good Neighbors, Bad Neighbors: The Frequent Network Neighborhood Mapping of the Hippocampus Enlightens Several Structural Factors of the Human Intelligence on a 414-Subject Cohort

The human connectome has become a very frequent subject of study for brain scientists, psychologists, and imaging experts in the last decade. With diffusion magnetic resonance imaging techniques, combined with advanced data processing algorithms, today we are able to compute braingraphs with several hundred anatomically identified nodes and thousands of edges, corresponding to the anatomical connections of the brain. The analysis of these graphs without refined mathematical tools is hopeless. These tools need to address the high error rate of the MRI processing workflow, and need to find structural causes, or at least correlations, of psychological properties and cerebral connections. Until now, structural connectomics was only rarely able to identify such causes or correlations. In the present work, we study the frequent neighbor sets of the most deeply investigated brain area, the hippocampus. By applying the Frequent Network Neighborhood Mapping method, we identified frequent neighbor sets of the hippocampus, which may influence numerous psychological parameters, including intelligence-related ones. We have found neighbor sets that have a significantly higher frequency in subjects with high-scored Penn Matrix tests, and in subjects with low-scored Penn Word Memory tests. Our study utilizes the braingraphs computed from the imaging data of 414 subjects of the Human Connectome Project, each with 463 anatomically identified nodes.


Introduction
Our brain contains approximately 80 billion neurons, each connected to hundreds or even thousands of other neurons. All brain functions are closely connected to this network of the brain, frequently called "the connectome" [1,2,3]. Today, the neuronal-level connectome (or braingraph), where the nodes correspond to the 80 billion neurons, and two nodes are connected by an edge if the corresponding neurons are connected by an axon, is unknown. The only fully developed organism with a known neuronal-level braingraph is the nematode Caenorhabditis elegans, with 302 neurons; its braingraph was determined in the eighties by electron microscopic techniques [4] (the graph can be downloaded from braingraph.org [5]). More recently, serious progress has been reported in the mapping of the neuronal-level braingraph of the fruit fly Drosophila melanogaster, with 100,000 neurons [6].
With currently available techniques, the human braingraph can be constructed and analyzed only in a much coarser resolution than the neuronal level, with the help of diffusion magnetic resonance imaging (MRI) [7]. In these graphs, the nodes are anatomically identified 1-1.5 cm² areas of the gray matter (frequently called "ROIs", i.e., Regions of Interest), and two nodes are connected by an edge if the diffusion MRI analyzing workflow [7,8,9,10] finds axonal fiber tracts between them. In this way, we can construct today braingraphs with up to 1015 nodes and several thousand edges. Perhaps the most reliable large human MRI datasets to date are the public releases of the Human Connectome Project (HCP) [11].

The graph-theoretical analysis of the braingraph
The exact, robust, graph-theoretical analysis of human braingraphs is a fast-developing and important area today. Our research group has contributed numerous results to this field by analyzing the HCP data. We have computed hundreds of braingraphs [5], and prepared the Budapest Reference Connectome Server, which generates the graph of the k-frequent edges of the human connectome of n=477 people, where 1 ≤ k ≤ n, and the k-frequent edges are those that are present in at least k braingraphs out of the n=477. The parameter k is selectable, along with other parameters, at the webserver https://pitgroup.org/connectome/, and the resulting consensus graph can be visualized and downloaded from the site [12,13].
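The notion of k-frequent edges can be illustrated with a minimal Python sketch. It is not the implementation of the Budapest Reference Connectome Server; the function names and the three toy graphs are hypothetical and serve only to show the counting step.

```python
from collections import Counter


def edge(u, v):
    """Represent an undirected edge as an unordered pair."""
    return frozenset((u, v))


def k_frequent_edges(edge_sets, k):
    """Return the edges that are present in at least k of the given graphs.

    edge_sets: one set of undirected edges per subject (hypothetical input format).
    """
    counts = Counter(e for edges in edge_sets for e in edges)
    return {e for e, c in counts.items() if c >= k}


# Toy data: three hypothetical braingraphs on the nodes "a".."d".
graphs = [
    {edge("a", "b"), edge("b", "c")},
    {edge("a", "b"), edge("c", "d")},
    {edge("b", "a"), edge("b", "c"), edge("c", "d")},
]

consensus_2 = k_frequent_edges(graphs, k=2)  # edges present in at least 2 graphs
consensus_3 = k_frequent_edges(graphs, k=3)  # edges present in all 3 graphs
```

Increasing k shrinks the consensus graph: in the toy data, only the edge between "a" and "b" survives at k = 3.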
In the work [14] we have mapped the individually more and less variable lobes of the human brain on 395 subjects, with the help of a natural measure: the distribution function. We have shown that the frontal and the limbic lobes are more conservative, while the edges in the temporal and occipital lobes show more diversity between the individual braingraphs. We have also compared the lobes of the brain by computing numerous graph-theoretical parameters in the sub-graphs induced by the vertices of the lobes in [15]. We have found that the right temporal and the right parietal lobes have better connectedness-related graph-theoretical parameters than the left ones (e.g., larger minimum vertex cover, larger Hoffman bound). More interestingly, the left frontal lobe has better such parameters than the right one.
We have compared the volumetric properties of the male and female brain areas in [16], and the sex differences in the human brain connectomes in [17,18,19]. We have shown a strong statistical advantage of the female connectomes in the connectedness-related advanced graph-theoretical parameters in a smaller cohort in [17] and in a larger cohort in [18]. In [19] we have clarified that the better, connectedness-related braingraph parameter-results of women cannot be due to the brain-volume differences: we have identified 36 large-brain females and 36 small-brain males, such that the brain volumes of all females were larger in the group than those of all males, and the advantage of the women remained valid even after this highly specific subject selection.
The development of the connections in mammalian brains is a hot research area today with many open questions. A lot of information was learned from embryonic rat and mouse brain microscopy on the development of single neuronal tracts [20,21]. In the human brain, much less is known about the phases of axonal development and growth. By analyzing the features of the publicly available Budapest Reference Connectome Server http://connectome.pitgroup.org, we have discovered the phenomenon of the Consensus Connectome Dynamics (CCD), which, by our hypothesis, describes the individual axonal development of the human brain [22,23,24,25]. The CCD phenomenon can also be applied to assign directions to the edges of the braingraph [24,25].

Robust methods
The robust analysis of the MR imaging data is an important point in all applications, since the image processing workflow has numerous complex steps where noise or data processing artifacts may appear. One such step is the tractography phase, where crossing axonal fibers may induce errors in the processing [26,27,28]. Therefore, error-correcting analytical methods are of utmost importance in the processing of these data.
Our research group pioneered several such methods by examining the frequently appearing substructures. This approach disregards rarely appearing errors: if we deal with substructures that appear with a minimum frequency of 80% or 90%, then the infrequent errors will be filtered out. The Budapest Reference Connectome Server generates the k-frequent edges [12,13]. In the work [29] we have mapped the frequently appearing subgraphs of the human connectome. The frequent complete subgraphs of the human braingraph were identified in [30].
Numerous publications attempt to find correlations between the psychological and anatomical, more exactly, connectomical, or graph-theoretical properties of the braingraph (e.g., [31]). The difficulty of identifying structural-psychological correlations lies in the individual diversity of the cerebral connections. One possible solution to this difficulty is the comparison of the frequent substructures with the results of psychological measurements.
In the publication [32] we defined the Frequent Network Neighborhood Mapping.

The Frequent Network Neighborhood Mapping
Here we formalize the frequent neighborhood mapping. The motivation of the formalism below is the identification of the robust, frequent neighborhoods of some important node $u$, where the word "frequent" means that the same neighborhood of $u$ appears frequently in the braingraphs of our $N$ subjects.
Let $G(V, E)$ be a graph with vertex-set $V$ and edge-set $E$, and let $u$ be a vertex. Vertex $v$ is a neighbor of $u$ if the unordered pair $\{u, v\}$ is an edge of $G$. Then $\Gamma(u)$, called the neighbor-set of $u$, contains all the neighbors of vertex $u$, that is:
$$\Gamma(u) = \{ v \in V : \{u, v\} \in E \}.$$
Now consider $N$ graphs on the same vertex-set $V$, namely $G_1(V, E_1), \ldots, G_N(V, E_N)$, and let
$$\Gamma_i(u) = \{ v \in V : \{u, v\} \in E_i \}, \quad i = 1, \ldots, N.$$
In other words, $\Gamma_i(u)$ is the neighborhood of $u$ in graph $G_i$. We say that the vertex-set $W \subset V$ is a $k$-frequent neighborhood of $u$ if there are at least $k$ indices $i$ such that $W \subset \Gamma_i(u)$. If, say, $k/N \ge 0.8$, then $W$ is a frequent neighbor set of $u$ with a cut-off value (or threshold) of 80%.
In the work [32] we have identified the frequent neighbor sets of the hippocampus of size at most 4, with a threshold of 80%. We have also identified the frequent neighbor sets of the hippocampus that were more frequent in male subjects and those that were more frequent in female subjects.
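The definition above translates directly into code. The following Python sketch is only an illustration under assumptions: the node label "hip", the other labels and the five toy graphs are hypothetical, and edges are stored as unordered pairs.

```python
def neighbor_set(edges, u):
    """Gamma(u): all vertices v such that {u, v} is an edge of the graph."""
    return {v for e in edges if u in e for v in e if v != u}


def is_k_frequent_neighborhood(W, neighbor_sets, k):
    """True if W is a subset of Gamma_i(u) for at least k indices i."""
    return sum(1 for gamma in neighbor_sets if W <= gamma) >= k


def e(u, v):
    """Hypothetical helper: an undirected edge as an unordered pair."""
    return frozenset((u, v))


# Five toy graphs (N = 5) on the same vertex set; "hip" plays the role of u.
toy_graphs = [
    {e("hip", "a"), e("hip", "b"), e("a", "b")},
    {e("hip", "a"), e("hip", "b"), e("hip", "c")},
    {e("hip", "a"), e("b", "c")},
    {e("hip", "a"), e("hip", "b")},
    {e("hip", "b"), e("hip", "c")},
]
gammas = [neighbor_set(g, "hip") for g in toy_graphs]

# With N = 5, a cut-off value of 80% corresponds to k = 4.
a_is_frequent = is_k_frequent_neighborhood({"a"}, gammas, k=4)
ab_is_frequent = is_k_frequent_neighborhood({"a", "b"}, gammas, k=4)
```

In the toy data, the single-element set {"a"} is contained in four of the five neighbor sets and is therefore frequent at 80%, while {"a", "b"} is contained in only three and is not.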

Discussion and Results
The hippocampus is, perhaps, the most frequently and deeply investigated area of the brain: it is a part of the limbic system, and it has a role in turning short-term memory into long-term memory, and in spatial orientation, navigation and memory [33,34,35,36]. It is a sea-horse-shaped organ, and it is present in both the left and the right hemispheres: that is, there is a left and a right hippocampus in the brain.
Here we identify the frequent hippocampus neighbor sets of size up to 4, for the hippocampi in both hemispheres. Next, we investigate whether the frequency of these neighbor sets differs significantly between subjects with low and high results in some intelligence-related tests. We call the hippocampus neighbor sets with such significant differences in frequencies "significant sets" in short.
The motivation of this study is as follows: to the best of our knowledge, no connections were proven between the presence or absence of any single connectome edge and any psychological property of the subjects examined. This failure may be due to the great variability and plasticity of the brain connections [14,12,13]. Here we want to overcome these difficulties with a two-fold strategy: (i) instead of the individual appearances of graph-theoretical objects we consider frequent objects; (ii) instead of frequent single edges from vertex u we consider frequent subsets of the neighbor-set Γ(u), where u is the hippocampus.

Measures of intelligence
In the present study, we consider two psychological tests, which were administered to the subjects of the Human Connectome Project:
PMAT24_A_CR: Penn Matrix Test, Number of Correct Responses; scored from 0 through 24. This is a multiple-choice test where the subject needs to choose the best fit from a list of objects for the one empty position of a small matrix of objects. The PMAT test is believed to assess mental abstraction and flexibility [37]. Higher scores show better mental abilities. We grouped the scores as "low" between 0 and 16, and "high" between 17 and 24; the cut-off score 17 is the median.
IWRD_TOT: Penn Word Memory Test, Total Number of Correct Responses; scored from 0 through 40. In the first phase of the test, the subjects need to memorize 20 written words. In the recognition phase, 40 words are shown, and the participants need to decide whether the words were seen in the first phase or not. The score is the number of the correct answers. We classified the scores 0-35 as "low" and 36-40 as "high"; the cut-off score 36 is the median.
Table 1 shows the results of the Frequent Network Neighborhood Mapping for these two tests. The table lists the numbers of the frequent neighbor sets of the left and the right hippocampus in the connectomes of the subjects with high and low PMAT24 and IWRD test scores, respectively.
In the columns labeled 1, 2, 3 and 4, the numbers of the 1-, 2-, 3- and 4-element frequent neighbor sets are given, for the subjects with high and low test scores. The threshold for "frequent" sets is 80% in the cases of both the right and the left hippocampus, and 90% when their union is considered. The column labeled "significant" contains the number of the neighbor sets whose frequencies differ statistically significantly (p=0.01) between the subjects with "low" and "high" test scores (called briefly "significant sets"). In the case of the PMAT24 test, the majority of the significant sets are related to the high test values. This may imply that these neighborhoods of the hippocampus are beneficial for the PMAT24 test results; that is, these are the "good neighbors" of the hippocampus.
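The grouping of the subjects by test score, which underlies the "low" and "high" rows of Table 1, can be sketched as follows. The cut-off scores are those given in the text; the dictionary keys, the function names and the toy scores are hypothetical.

```python
# Cut-off scores from the text: a score at or above the cut-off counts as "high".
CUTOFFS = {"PMAT24_A_CR": 17, "IWRD_TOT": 36}


def score_group(test_name, score):
    """Classify a single test score as "low" or "high"."""
    return "high" if score >= CUTOFFS[test_name] else "low"


def split_subjects(scores, test_name):
    """Split a {subject_id: score} mapping into (low_ids, high_ids)."""
    low = [s for s, v in scores.items() if score_group(test_name, v) == "low"]
    high = [s for s, v in scores.items() if score_group(test_name, v) == "high"]
    return low, high


# Hypothetical subject identifiers and scores, for illustration only.
toy_scores = {"s1": 12, "s2": 17, "s3": 24, "s4": 16}
low_ids, high_ids = split_subjects(toy_scores, "PMAT24_A_CR")
```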

Good and bad neighbors of the hippocampus for the Penn Matrix test
Here we list some neighbor sets with significant differences of frequencies in low- and high-scored PMAT24 subjects.
The full lists can be downloaded from http://uratim.com/hc/hc_neighbors_PMAT_IWRD_xls.zip; for the naming conventions of the files there, we refer to the "Data availability" section below.
The naming of the nodes below follows the names listed in the CMTK nipype GitHub repository: https://github.com/LTS5/cmp_nipype/blob/master/cmtklib/data/parcellation/lausanne2008/Parcel
In this test, most of the significant sets are related to the higher scores.
The following three neighbor sets of the left hippocampus have a significantly higher frequency in low-scored PMAT24 subjects:
Table 1: The table lists the numbers of the frequent neighbor sets of the left and the right hippocampus in the connectomes of the subjects with high and low PMAT24 and IWRD test scores, respectively. In the columns labeled 1, 2, 3 and 4, the numbers of the 1-, 2-, 3- and 4-element frequent neighbor sets are given, for the subjects with high and low test scores. The threshold for "frequent" sets is 80% in the cases of both the right and the left hippocampus, and 90% when their union is considered. The column labeled "significant" contains the number of the neighbor sets whose frequencies differ statistically significantly (p=0.01) between the subjects with "low" and "high" test scores (called briefly "significant sets").

Materials and Methods
The braingraphs in our work were computed from the MRI data of the Human Connectome Project's Public Data Release at http://www.humanconnectome.org/documentation/S500 [11]. The subjects of this study were healthy young adults between the ages of 22 and 35 years. The braingraphs were computed by us, applying the CMTK toolkit [10], with randomized seeding, 1 million streamlines and deterministic tractography. We have used the 463-vertex graphs for the present work. The graphs are freely available for download at our site: https://braingraph.org/cms/download-pit-group-connectomes/. The workflow by which the graphs were computed from the HCP data is described in detail in [5].
The computation of the frequent neighbor sets of the hippocampus, which facilitated the Frequent Network Neighborhood Mapping, used an apriori-like algorithm [38,39] with small modifications (see http://adataanalyst.com/machine-learning/apriori-algorithm-python-3-0/), similarly as in [32]. Succinctly, the apriori algorithm makes use of the following observation: if vertex-set A is frequent with a cut-off value of, say, 80%, then all of the subsets of A have a frequency of at least the cut-off value, i.e., 80%. Therefore, the 2-element frequent sets can be built from the 1-element frequent sets, the 3-element frequent sets from the already identified 2-element frequent sets, and so on.
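The level-wise construction described above can be sketched in Python as follows. This is a minimal illustration of an apriori-style search, not the modified implementation used in this work; the function name, the integer parameter k (the minimum number of subjects) and the toy neighbor sets are hypothetical.

```python
from itertools import combinations


def frequent_neighbor_sets(neighbor_sets, k, max_size=4):
    """Find all vertex sets W with W contained in at least k neighbor sets.

    neighbor_sets: one set Gamma_i(u) per subject.
    Returns a dict mapping each set size to the frequent sets of that size.
    """
    def count(W):
        return sum(1 for gamma in neighbor_sets if W <= gamma)

    # Level 1: the frequent single vertices.
    vertices = set().union(*neighbor_sets)
    level = {frozenset([v]) for v in vertices if count(frozenset([v])) >= k}
    result = {1: level}
    size = 1
    while level and size < max_size:
        size += 1
        # Candidates are unions of two frequent sets from the previous level.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every (size-1)-element subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
        level = {c for c in candidates if count(c) >= k}
        result[size] = level
    return result


# Toy neighbor sets of a hypothetical node u in five graphs.
toy_gammas = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"a", "b"}, {"b", "c"}]
toy_result = frequent_neighbor_sets(toy_gammas, k=3)
```

The pruning step is where the observation above pays off: a candidate is counted against the data only if all of its smaller subsets were already found to be frequent.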
For the identification of the frequent neighbor-sets, we have applied a two-step strategy: First, we partitioned the braingraphs into two groups by the parity of the second digit of the ID of the subjects. Next, we identified the frequent neighbor sets of the hippocampus within both classes of the partition (using the apriori algorithm), with a cut-off value of 80%. Only those sets were accepted to be frequent, which were frequent with cut-off value 80% in both classes. In a certain sense, this strategy modeled the frequency counting in random subsets; consequently, those neighbor sets, which were frequent only in one of the classes, were identified as such.
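The two-step strategy can be sketched as below. The sketch assumes that the subject IDs are digit strings with at least two digits; the IDs, node labels and function names are hypothetical and chosen only for illustration.

```python
def split_by_second_digit_parity(subject_ids):
    """Partition the subject IDs by the parity of their second digit."""
    even, odd = [], []
    for sid in subject_ids:
        second_digit = int(str(sid)[1])
        (even if second_digit % 2 == 0 else odd).append(sid)
    return even, odd


def frequency(W, neighbor_sets):
    """Fraction of the neighbor sets that contain W (0.0 for an empty class)."""
    if not neighbor_sets:
        return 0.0
    return sum(1 for gamma in neighbor_sets if W <= gamma) / len(neighbor_sets)


def frequent_in_both_classes(W, gamma_by_id, threshold=0.8):
    """Accept W only if it reaches the threshold in both classes of the partition."""
    classes = split_by_second_digit_parity(gamma_by_id)
    return all(
        frequency(W, [gamma_by_id[s] for s in ids]) >= threshold
        for ids in classes
    )


# Hypothetical subject IDs mapped to toy neighbor sets of the hippocampus.
toy_gamma_by_id = {
    "100001": {"a", "b"},       # second digit 0: even class
    "120002": {"a", "b", "c"},  # second digit 2: even class
    "110003": {"a", "b"},       # second digit 1: odd class
    "130004": {"a"},            # second digit 3: odd class
}
a_accepted = frequent_in_both_classes({"a"}, toy_gamma_by_id)
ab_accepted = frequent_in_both_classes({"a", "b"}, toy_gamma_by_id)
```

Here {"a", "b"} is contained in every neighbor set of the even class but in only half of those in the odd class, so it is not accepted.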

Statistical Analysis
Next, we analyzed the appearance of the frequent hippocampus neighbor sets in the high- and low-scored PMAT24_A_CR and IWRD_TOT subjects. We identified the neighbor sets that were significantly more frequent in the connectomes of the high-scored subjects and those that were significantly more frequent in the connectomes of the low-scored subjects. For the statistical analysis we used the χ² test with a significance bound of p = 0.01, with Holm-Bonferroni corrections [40].
Our null hypothesis is that the frequencies are the same in the connectomes of the low- and high-scored subjects, and we reject this hypothesis at the significance level p = 0.01.
For a neighbor set $F$, its occurrences were counted in the low-scored dataset by $\mathrm{count}_1(F)$ and in the high-scored dataset by $\mathrm{count}_2(F)$. The support was computed as follows:
$$\mathrm{supp}_i(F) = \frac{\mathrm{count}_i(F)}{S_i},$$
where $S_i$, for $i = 1, 2$, is the number of the subjects with low and high scores, respectively.
For the significance analysis of the difference between $\mathrm{supp}_1(F)$ and $\mathrm{supp}_2(F)$, we used the $\chi^2$ test for categorical data. The number of degrees of freedom for this test is one (the number of samples minus one, times the number of categories minus one).
Holm-Bonferroni correction [40]: The p-values for the frequent sets were ordered as $p_1 \le p_2 \le p_3 \le \ldots \le p_m$. For a significance level $\alpha = 0.01$, let the Holm-Bonferroni value for the $k$-th frequent set be defined as
$$p'_k = \frac{\alpha}{m + 1 - k}.$$
Then let $t$ be the minimum index such that $p_t > p'_t$. The null hypotheses for the indices $i < t$ are rejected; if no such index exists, all the null hypotheses are rejected.
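The statistical steps above can be sketched in Python with the standard library only. The sketch rests on assumptions that the text does not state: it uses the uncorrected Pearson χ² statistic on the 2×2 table of group (low or high) against presence or absence of the neighbor set F, without a continuity correction, and all function names and numbers are hypothetical.

```python
from math import erfc, sqrt


def support(count, group_size):
    """supp_i(F) = count_i(F) / S_i."""
    return count / group_size


def chi2_test_2x2(count1, s1, count2, s2):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table.

    Rows: low- and high-scored subjects; columns: F present or absent.
    Returns (statistic, p_value). No continuity correction (an assumption).
    """
    a, b = count1, s1 - count1
    c, d = count2, s2 - count2
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # degenerate table: no evidence against the null
    statistic = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return statistic, erfc(sqrt(statistic / 2.0))


def holm_bonferroni(p_values, alpha=0.01):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if p_values[i] > alpha / (m + 1 - rank):
            break  # first index t with p_t > p'_t: stop, keep the rest
        rejected[i] = True
    return rejected


# Toy example: F occurs in 20 of 20 low-scored and 0 of 20 high-scored subjects.
toy_statistic, toy_p = chi2_test_2x2(20, 20, 0, 20)
toy_rejected = holm_bonferroni([0.001, 0.04, 0.004, 0.02], alpha=0.01)
```

In the last line, only the smallest p-value falls below its Holm-Bonferroni value (0.01/4); the second smallest, 0.004, exceeds 0.01/3, so the procedure stops there.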

Conclusions
By the application of the Frequent Network Neighborhood Mapping, we examined the neighbors of the human hippocampus, and found that some frequent neighbor sets correlate with better Penn Matrix test results, and some frequent neighbor sets correlate with worse Penn Word Memory test results. To our knowledge, this is the first result that connects intelligence-related measures with the neighbors of the human hippocampus, with strict statistical significance analysis.

Data availability
The data source of this study is Human Connectome Project's Public Data Release at http://www.humanconnectome.org/documentation/S500 [11].
The result tables, listing the frequent neighbor sets whose frequencies differ significantly in low- and high-scored subjects, can be downloaded in Excel format from the site http://uratim.com/hc/hc_neighbors_PMAT_IWRD_xls.zip. The archive contains 12 files. Six of them have the prefix IWRD_TOT, and six of them the prefix PMAT24_A_CR, containing the significant sets for these tests. After the prefix, the filenames carry the string hc_l, hc_r or hc, meaning that the neighbor sets are those of the left hippocampus, the right hippocampus, or the union of those, respectively. Next, the word "lower" or "upper" means that the neighbor sets are significantly more frequent in the lower half or in the upper half of the scored subjects, respectively. Those tables that correspond to 0s in Table 1 are empty.