Introduction

The abundance of high-quality structural MRI data from the human brain makes it possible to study cerebral anatomy in an unprecedented way. Among other large projects, the NIH-funded Human Connectome Project1 records and publishes multimodal MRI data from hundreds of healthy individuals. High-angular resolution diffusion imaging (HARDI) data can be processed to discover connections, consisting of axonal fibers, between anatomically identified2 gray matter areas. Consequently, a braingraph or connectome can be constructed as follows: the nodes or vertices of the graph correspond to the anatomically identified gray matter areas, and two nodes are connected by an edge if axonal fibers are discovered between them by processing the diffusion-weighted data3,4,5.

If we have braingraphs from several hundred subjects, then, since the vertices of the different braingraphs correspond to the very same brain map2, we can describe the diversity of the edges between different subjects and in different lobes or smaller brain areas, as in6, or we can describe the edges common to numerous subjects, as in the Budapest Reference Connectome Server http://connectome.pitgroup.org 7,8. The data source of these studies was the Human Connectome Project1.

Distinguishing the frequently and rarely appearing connections within the human brain may help neuroscientists separate the usual, standard connections from the non-standard ones. Non-standard connections can cause, or can be caused by, some disease, or can simply be the result of personal variability, with or without psychological consequences. Therefore, the mapping of the frequent and the infrequent connections by the Budapest Reference Connectome Server can have straightforward clinical significance.

Surprisingly, we have discovered a phenomenon, called Consensus Connectome Dynamics (CCD), on the Budapest Reference Connectome Server, which may open up new horizons in the study of the development of the human brain9. The discovery was surprising because the server was not built to study brain development: the imaging data originate from adults between 22 and 35 years of age, so the age span is seemingly inadequate for studying the early development of brain connections, which occurs in the several months just before and after birth10. To clarify this discovery, we need to cover some details of the Budapest Reference Connectome Server http://connectome.pitgroup.org.

The server computes and visualizes consensus connectomes according to several user-set parameters. The braingraphs of \(n=418\) subjects are processed on the server. Let k be an integer such that \(1\le k\le n\). We call an edge e k-frequent if e is present in at least k of the n braingraphs. We call a connectome the k-consensus connectome if it consists of all the k-frequent edges, that is, of all edges that are present in at least k connectomes. For \(k=1\), the 1-consensus connectome contains all the edges that are present in at least one subject’s braingraph. The n-consensus connectome contains only the edges that are present in every subject’s connectome. Clearly, the n-consensus connectome contains far fewer edges than the 1-consensus connectome. More generally, if \(1\le i\le j\le n\), then every edge of the j-consensus connectome is also present in the i-consensus connectome. Hence the i-consensus connectome gains more and more edges as i decreases.
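The definition of the k-consensus connectome can be illustrated with a minimal Python sketch. The function names and the toy data below are hypothetical and serve only as an illustration; this is not the implementation used by the server. Each braingraph is assumed to be given as a set of undirected edges.

```python
from collections import Counter

def edge_frequencies(graphs):
    """Count, for every edge, the number of braingraphs that contain it."""
    counts = Counter()
    for edges in graphs:
        # Store each undirected edge as a sorted tuple so (u, v) equals (v, u).
        counts.update({tuple(sorted(e)) for e in edges})
    return counts

def k_consensus(graphs, k):
    """Return the set of k-frequent edges (present in at least k graphs)."""
    counts = edge_frequencies(graphs)
    return {e for e, c in counts.items() if c >= k}

# Toy data: three hypothetical "subjects" over string-labelled areas.
toy_graphs = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "D")},
    {("B", "A"), ("B", "C"), ("C", "D")},
]
consensus = {k: k_consensus(toy_graphs, k) for k in (1, 2, 3)}
```

In this toy example the 3-consensus graph contains only the edge A-B, and the edge sets are nested as k decreases, in line with the containment property stated above.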

The fascinating observation9 is that the new edges that appear in the larger i-consensus connectome, relative to the smaller (i + 1)-consensus connectome, are not placed randomly: they seem to connect to the edges of the (i + 1)-consensus connectome. Consequently, if we consider the k-consensus connectomes for decreasing values \(k=n, n-1, \ldots, 3, 2, 1\), then we get more and more edges, and the edges form a growing, complex, tree-like structure.

The observation is visualized in a video at https://youtu.be/yxlyudPaVUE for the whole brain and at https://youtu.be/wBciB2eW6_8 restricted to the frontal lobe9,11. The observation is statically visualized on a very large component-tree at http://pitgroup.org/static/graphmlviewer/index.html?src=connectome_dynamics_component_tree.graphml, which is described in detail in9. The observation is analyzed quantitatively in Fig. 2 of9 for the whole brain and in Fig. 1 of11 for the frontal lobe only.

Figure 2

The comparison of the sum of the isolated edges in the CCD phenomenon with the corresponding value in the doubly-preferential attachment random model we suggest. The horizontal axis contains the numbering of the steps: step i corresponds to the \((n-i)\)-consensus connectome.

Figure 1

The comparison of the edge numbers in the CCD phenomenon with the edge numbers in the doubly-preferential attachment random model we suggest. The horizontal axis contains the numbering of the steps: step i corresponds to the \((n+1-i)\)-consensus connectome.

The interested reader can also explore the Consensus Connectome Dynamics phenomenon on the Budapest Reference Connectome Server http://connectome.pitgroup.org by (i) clicking the “Show options” button, (ii) moving the “Minimum edge confidence” slider to the rightmost position, and (iii) slowly moving the “Minimum edge confidence” slider from right to left.

In9,11 we hypothesized that the Consensus Connectome Dynamics (CCD) phenomenon describes the order in which the connections of the brain develop: the deviation of the oldest, first-developed connections is the smallest, and the more recently developed connections gradually accumulate the deviations of the connections that they connect to. Because of this, their deviation becomes higher and higher; that is, they appear only in k-consensus connectomes for smaller values of k.

Results

In the present contribution we examine two relevant questions concerning the CCD phenomenon:

  (a) Robustness: The CCD phenomenon can be characterized by the order of appearance of the edges in the growing graphs of k-consensus connectomes, for decreasing values of k. To show that this order has biological meaning and is not just a product of the specific dataset processed, we need to demonstrate that the order of appearance of the edges is independent of the particular choice of the underlying dataset.

  (b) Random graph model for CCD: The main reason for preparing a random network model for a known, interesting graph is to uncover the possible mechanism involved in the development (or evolution) of the graph. As the most famous example, the Barabási-Albert model of the degree distribution of the webgraph12,13 uses the “preferential attachment” principle to describe the development of the webgraph. In the Barabási-Albert model, roughly, one new vertex appears in every step, and it connects to the older vertices with probabilities proportional to their degrees. It was shown first by computer simulation12, and later by an exact mathematical proof13, that this process leads to a power-law degree distribution with exponent −3. Most importantly, since the random simulation process describes the degree distribution of the webgraph well, the model also uncovers the mechanism that guides the users of the web in hyperlinking new vertices (web pages) into the webgraph.

Discussion

In what follows we show that the CCD phenomenon is robust in the sense described above, and we also define a probabilistic graph model with a “doubly preferential attachment” that describes the CCD phenomenon well.

Robustness

Here we examine the independence of the CCD phenomenon from the particular choice of the underlying datasets of braingraphs. To this end, we partitioned the 418 braingraphs into 4 disjoint sets of almost equal size, with a ±1 margin (we call these sets “quarter sets”), and we computed the order of appearance of the edges in the CCD for each quarter set. Next, we compared the order of appearance of edge pairs as follows:

We chose those edges that are present in at least one graph in each of the four quarter sets; there are 31,873 such edges. Then we randomly chose two of the four quarter sets, say X and Y, and two random edges, say e and f, out of the 31,873 that are present in all four quarter sets. Next, we computed the empirical probability that, in the CCD based on X, edge e appears strictly before f, while in the CCD based on Y, f appears strictly before e (that is, their orders of appearance differ).

If the edges appeared randomly in the CCD, this probability would be equal to 1/2. The smaller the probability, the more robust the CCD phenomenon. We found this probability to be 0.104.
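The comparison of two quarter sets can be sketched as follows. This is a hedged illustration under two assumptions that are ours: the order of appearance of an item is encoded by the largest k at which it is present in the k-consensus connectome of the given quarter set, and "differing order" means a strict reversal in either direction. The sketch enumerates all pairs instead of sampling random ones, and all names and values in it are hypothetical.

```python
from itertools import combinations

def discordance(step_x, step_y):
    """Fraction of unordered pairs whose strict order of appearance is
    reversed between two experiments. step_*[item] is the largest k at
    which the item is present, so a larger value means earlier appearance."""
    items = sorted(set(step_x) & set(step_y))
    pairs = list(combinations(items, 2))
    reversed_pairs = sum(
        1 for e, f in pairs
        # Opposite signs of the two differences mean a strict reversal.
        if (step_x[e] - step_x[f]) * (step_y[e] - step_y[f]) < 0
    )
    return reversed_pairs / len(pairs)

# Hypothetical appearance levels of three edges in two quarter sets.
quarter_x = {"e1": 5, "e2": 3, "e3": 1}
quarter_y = {"e1": 4, "e2": 1, "e3": 2}
rate = discordance(quarter_x, quarter_y)
```

Here only the pair (e2, e3) changes its order, so one of the three pairs is discordant. For two independent random orderings without ties, the same quantity would be close to 1/2, which is the baseline mentioned above.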

Similarly, we also computed the order in which the vertices are connected to the consensus connectomes, and for a randomly chosen vertex pair u, v we computed the empirical probability that in the X-based CCD vertex u appears strictly before v, while in the Y-based CCD v appears strictly before u. This probability is 0.053.

Dealing with possible artifacts: It is known that the algorithmic details of the MR image processing workflow may influence the connectomes constructed from these images (e.g.,14 compares the effects of different tractography algorithms on connectomes). The 418 graphs in our computations above were constructed using a deterministic fiber-tracking algorithm (the SD_STREAM option in MRtrix 0.2). To exclude processing artifacts, we re-computed all 418 graphs with probabilistic tractography (MRtrix 0.2, with 1 million streamlines, white matter seeding/masking and probabilistic fiber-tracking [the SD_PROB option]). Next, we compared the new graphs (constructed with probabilistic tractography) with the old ones (constructed with deterministic tractography) in several ways:

The probability that two random edges appear in a different order in the CCD in two random quarter sets of the new graphs: 0.076 (better than in the old graphs, where this value was 0.104).

The probability that two random vertices are connected to the consensus connectomes in a different order in the growing CCD structure in the new graphs: 0.067 (slightly worse than the value for the old graphs, 0.053).

We have also compared the old and the new graphs as follows: we took the edges that were present in all 4 quarter sets of both the old and the new graphs, then we took two random edges and two random quarter sets, one from the old and the other from the new quarter sets. We found that the probability that the order of appearance of these two edges differs between the old and the new graphs in the CCD phenomenon is 0.101.

Similarly, we have found that the probability that two random vertices are connected in a different order in the CCD phenomenon in two random quarter sets (one from the old, one from the new quarter sets) is 0.085 (for the old graphs this value was 0.053).

Therefore, we can conclude that for the edges and the vertices, the order of their appearance in the CCD phenomenon is almost independent of the underlying dataset, so, in our opinion, this order describes a property of the brain, and not of the datasets.

Random graph model for the CCD

There are three significant differences — relative to the webgraph-simulation — that need to be addressed in developing the random graph model for the CCD phenomenon:

  (i) In CCD, we have the data of the buildup of the graph; in other words, in CCD we have a dynamic process of the appearance of the new edges, while in the case of the webgraph we have only a static image: the degree distribution of the vertices in the graph;

  (ii) In CCD we observe new edges between those “old” vertices that were connected to some edges in the previous steps (i.e., for larger values of k in the k-consensus connectomes). In the Barabási-Albert model, new edges are always connected to the new vertex, and they never appear between two old nodes.

  (iii) We do not intend to model an unboundedly growing graph as in12,13; our goal is to model the 1015-vertex CCD phenomenon.

Here we suggest a doubly-preferential attachment probability distribution for the new edges: the probability of the appearance of a new edge uv between vertices u and v is proportional to the sum of their degrees, \(\deg \,u+\deg \,v\). We call this rule “doubly-preferential attachment”, because in the Barabási-Albert model12 the new vertex u is connected to the old vertex v with a probability proportional to \(\deg \,v\) (the “preferential attachment model”). The mathematical details and the parameter choices are given in the “Methods” section.

Figure 1 compares the increase of the edge numbers in the real CCD phenomenon and in the doubly-preferential attachment model we suggest. Step i on the horizontal axis corresponds to the \((n+1-i)\)-consensus connectome.

Figure 2 compares the sum of the isolated edges in the CCD and in the random, doubly-preferentially attached model. An edge is called “isolated” in the k-consensus connectome if it does not connect to any other edges and it was not present in the (k + 1)-consensus connectome. One quantitative characterization of the CCD phenomenon is the very small number of isolated edges (cf. Fig. 2 in9 and Fig. 1 of11). Therefore, the sum of the isolated edges is an appropriate measure of how well a model characterizes the CCD phenomenon.
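Counting the new isolated edges between two consecutive consensus connectomes can be sketched as below. The sketch assumes one particular reading of the definition, namely that "does not connect to any other edges" is checked within the current (k-consensus) graph; the function name and the toy data are hypothetical.

```python
from collections import Counter

def new_isolated_edges(prev_edges, curr_edges):
    """Edges that are new in curr_edges and share no endpoint with any
    other edge of curr_edges (both endpoints have degree 1)."""
    degree = Counter()
    for u, v in curr_edges:
        degree[u] += 1
        degree[v] += 1
    return {
        (u, v) for (u, v) in curr_edges - prev_edges
        if degree[u] == 1 and degree[v] == 1
    }

# Toy example: (3, 4) attaches to the existing structure, (7, 8) does not.
prev = {(1, 2), (2, 3)}
curr = {(1, 2), (2, 3), (3, 4), (7, 8)}
isolated = new_isolated_edges(prev, curr)
```

Summing the sizes of these sets over all steps gives a cumulative count of the kind plotted in Figure 2.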

Conclusions

We have shown that the CCD phenomenon is robust; in other words, it most probably describes a biological phenomenon and is not just a property of the particular datasets chosen.

We have also shown that the doubly-preferentially attached model describes the CCD phenomenon well. This fact also strengthens our hypothesis described in9,11 that the CCD phenomenon copies the axonal development of the brain on a macroscopic level: there we hypothesized that the new axonal connections prefer connecting to neurons with numerous existing connections. The success of the doubly-preferentially attached model is in line with this assumption, since here new edges are more likely to appear between nodes that already have a high degree.

Methods and Materials

We used a Barabási-Albert-like model for approximating the connectome distribution. First, we observed that the number of edges is approximately an exponential function of k, with sharp increases at the beginning and at the end. An exponential regression yielded the equation \(46.37{e}^{0.014k}\) (\({R}^{2}=0.99\)). Let \(A\,:=46.37\) and \(B\,:=0.014\). From this equation we have derived the following simple model: we start from a \(\lfloor A\rfloor \)-edge graph (i.e., a 46-edge graph), generated randomly over D selected nodes (where N denotes the number of vertices and \(D\le N\) is a parameter of the model); then, in each step, we add each edge uv with the probability

$${p}_{uv}\,:=\frac{B}{2(N-1)}(\deg u+\deg v),$$
(1)

where \(\deg \,u\) denotes the degree of node u in the previous step.

This indeed yields an exponentially growing number of edges. Observe that the expected number of new edges in the next step is (if we do not account for multiple edges)

$$\sum_{u\in V}\sum_{\substack{v\in V\\ v>u}}\frac{B}{2(N-1)}(\deg u+\deg v)=\frac{B}{2(N-1)}\cdot 2\cdot \frac{N-1}{2}\sum_{u\in V}\deg u=B|E|,$$
(2)

where \(|E|\) is the number of edges in the previous step. Thus the expected number of edges in our model indeed grows exponentially: it is approximately \(\lfloor A\rfloor {e}^{Bk}=46{e}^{0.014k}\) in the kth step, which is consistent with the exponential regression.
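The identity in Eq. (2) can be checked numerically on a small graph. The following sketch sums the probabilities of Eq. (1) over all vertex pairs, as Eq. (2) does, without excluding the pairs that are already edges; the function name and the toy graph are hypothetical.

```python
from itertools import combinations

def expected_new_edges(edges, n_nodes, b):
    """Sum p_uv = b / (2 (n_nodes - 1)) * (deg u + deg v) over all u < v."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = b / (2 * (n_nodes - 1))
    return sum(
        scale * (degree[u] + degree[v])
        for u, v in combinations(range(n_nodes), 2)
    )

# Toy graph with 6 nodes and 5 edges; node 5 is isolated.
toy_edges = {(0, 1), (1, 2), (2, 3), (0, 3), (1, 4)}
total = expected_new_edges(toy_edges, n_nodes=6, b=0.014)
```

For this toy graph the sum equals 0.014 · 5 = 0.07, that is, B|E|, as Eq. (2) states.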

Unfortunately, this model does not allow adding new edges between zero-degree (isolated) nodes, because \({p}_{uv}\) becomes 0 for those nodes. To circumvent this problem, we have modified the equation to allow a certain probability for the inclusion of these “isolated” edges. We added a constant to \({p}_{uv}\); that is, in our new model, the probability of inclusion for the edge uv becomes

$${p}_{uv}\,:=\frac{B}{2(N-1)}(\deg u+\deg v)+C,$$
(3)

where C is the inclusion probability for isolated edges.
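A minimal simulation sketch of the model with the inclusion probability of Eq. (3) is given below. Several details are assumptions made only for this illustration: the D selected nodes are taken to be the first D node indices, pairs that are already edges are skipped instead of producing multiple edges, and the toy parameter values (60 nodes, 18 initial edges) are hypothetical and chosen so that the example runs quickly.

```python
import random
from itertools import combinations

def initial_graph(n_edges, d_nodes, rng):
    """Pick n_edges distinct random edges among the first d_nodes nodes."""
    candidates = list(combinations(range(d_nodes), 2))
    return set(rng.sample(candidates, n_edges))

def model_step(edges, n_nodes, b, c, rng):
    """Add each absent edge uv with probability
    b / (2 (n_nodes - 1)) * (deg u + deg v) + c,
    where the degrees are taken from the previous step."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = b / (2 * (n_nodes - 1))
    new_edges = set(edges)
    for u, v in combinations(range(n_nodes), 2):
        if (u, v) in edges:
            continue  # assumption: no multiple edges
        if rng.random() < scale * (degree[u] + degree[v]) + c:
            new_edges.add((u, v))
    return new_edges

def simulate(n_nodes, d_nodes, n_initial, b, c, steps, seed=0):
    """Return the list of edge sets after 0, 1, ..., steps steps."""
    rng = random.Random(seed)
    edges = initial_graph(n_initial, d_nodes, rng)
    history = [edges]
    for _ in range(steps):
        edges = model_step(edges, n_nodes, b, c, rng)
        history.append(edges)
    return history

history = simulate(n_nodes=60, d_nodes=20, n_initial=18,
                   b=0.014, c=7.6e-7, steps=50)
edge_counts = [len(e) for e in history]
```

The list `edge_counts` is non-decreasing by construction; averaging it over many seeds would give a curve that can be compared with the exponential regression.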

This causes the number of edges to be not \(\lfloor A\rfloor {e}^{Bk}\), but approximately \((\lfloor A\rfloor +\binom{N}{2}C/B){e}^{Bk}\). This is because the expected number of new edges contributed by C is \(\binom{N}{2}C\) in each step, and the number of edges is multiplied by about \(1+B\) in each step. Therefore, by using the formula \(1+z+{z}^{2}+\ldots =\frac{1}{1-z}\) for \(z\,:=\frac{1}{1+B}\), we can derive the number of edges relative to \({e}^{Bk}\). Based on this formula, we need to decrease the initial number of edges from \(\lfloor A\rfloor \) to \(\lfloor A-\binom{N}{2}C/B+0.5\rfloor \) (0.5 is added for rounding to the nearest integer).
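The correction of the initial edge count can be reproduced with a few lines of arithmetic. The sketch below takes the multiplier of C to be the number of vertex pairs, N(N−1)/2, uses N = 1015 and C = 7.6 × 10⁻⁷ as quoted elsewhere in the text, and iterates a simplified expected-value recurrence that ignores saturation and multiple edges. The variable and function names are hypothetical.

```python
import math

A, B = 46.37, 0.014      # regression constants from the text
N, C = 1015, 7.6e-7      # vertex count and isolated-edge constant from the text

pairs = math.comb(N, 2)          # number of candidate uv pairs
offset = pairs * C / B           # extra prefactor caused by the constant C
initial_edges = math.floor(A - offset + 0.5)

def expected_edges(e0, steps):
    """Iterate E_{k+1} = (1 + B) * E_k + pairs * C (simplified recurrence)."""
    e = e0
    for _ in range(steps):
        e = (1 + B) * e + pairs * C
    return e

# Starting from A - offset, the recurrence gives A * (1 + B)**k - offset.
after_100 = expected_edges(A - offset, 100)
```

Under these assumptions the corrected initial edge count is 18, and the closed form in the last comment shows why lowering the starting value by the offset restores the prefactor A.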

Since it is unlikely that two isolated edges appear from the same node at once, the value of C influences the total number of new isolated edges in an almost linear fashion. So we can start from a value of C, count the total number of new isolated edges, compare it with the desired total number of isolated edges, and divide C by the ratio of the two. This way, we determined the optimal value for C as \(7.6\times {10}^{-7}\). In the real data, the cumulative number of new isolated edges is 0 up until step 47, then increases in an approximately linear fashion up until step 186, and after that it levels off at 50–55 edges in total. In comparison, the average curve of the cumulative number of new isolated edges in our simulation increased linearly until about step 130 and had a concave section until about step 230, where it leveled off at 56 edges.
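The rescaling of C described above can be sketched as a short loop. The function `calibrate_c`, its `rounds` parameter, and the linear stand-in for the simulation are all hypothetical; in practice the callback would run the random model and return the simulated total number of new isolated edges.

```python
def calibrate_c(c_start, isolated_total, target, rounds=3):
    """Rescale C so that the simulated total number of new isolated edges
    approaches the target, assuming an almost linear dependence on C."""
    c = c_start
    for _ in range(rounds):
        observed = isolated_total(c)
        if observed == 0:
            break  # nothing to rescale against
        c /= observed / target  # divide C by the observed-to-desired ratio
    return c

# Hypothetical stand-in for the simulation: an exactly linear response.
def linear_response(c):
    return 7.0e7 * c

c_fit = calibrate_c(1e-6, linear_response, target=53)
```

With an exactly linear response a single round already reaches the target; the additional rounds only matter when the dependence on C is approximately, but not exactly, linear.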

We can thus conclude that our model for CCD approximates well not only the number of edges in each step, but also the cumulative number of isolated edges. Furthermore, the model is simple and has only 4 parameters, which helps to avoid over-fitting.

Data availability

The Human Connectome Project’s MRI data is accessible at: http://www.humanconnectome.org/documentation/S500 1.

The graphs (both undirected and directed) that were prepared by us from the HCP data can be downloaded at the site http://braingraph.org/download-pit-group-connectomes/.