Functional orderly topography of brain networks associated with gene expression heterogeneity

The human cerebral cortex is vastly expanded relative to nonhuman primates and rodents, leading to a functional orderly topography of brain networks. Here, we show that functional topography may be associated with gene expression heterogeneity. The neocortex exhibits greater heterogeneity in gene expression, with a lower expression of housekeeping genes, a longer mean path length, fewer clusters, and a lower degree of ordering in networks than archicortical and subcortical areas in human, rhesus macaque, and mouse brains. In particular, the cerebellar cortex displays greater heterogeneity in gene expression than cerebellar deep nuclei in the human brain, but not in the mouse brain, corresponding to the emergence of novel functions in the human cerebellar cortex. Moreover, the cortical areas with greater heterogeneity, primarily located in the multimodal association cortex, tend to express genes with higher evolutionary rates and exhibit a higher degree of functional connectivity measured by resting-state fMRI, implying that such a spatial distribution of gene expression may be shaped by evolution and is favourable for the specialization of higher cognitive functions. Together, the cross-species imaging and genetic findings may provide convergent evidence to support the association between the orderly topography of brain function networks and gene expression.

human brains to characterize their evolutionary rates. The synonymous and non-synonymous substitution rates between human and mouse were obtained from Ensembl (http://www.ensembl.org/biomart/martview/).

Small-world structure of gene expression network.
Previous studies on the macro-level brain connectome have demonstrated that the human brain has a small-world structure with a characteristic path length that is lower than that of the regular network and a clustering coefficient that is greater than that of a random network 4,5 . At the micro level, we expected that the gene expression networks of all brain samples would exhibit the small-world property. We generated populations of regular networks and random graphs that preserved the same number of nodes and edges in the gene expression networks and extracted their mean path length and average clustering coefficient. We found that the gene expression networks have shorter path lengths and larger clustering coefficients compared to those of the random and regular networks ( Supplementary Fig.   2), confirming the small-world property of the gene expression networks in human brains.
Topological properties of the networks. The most elementary characteristic of a node is its degree (or connectivity), k, which provides information regarding the number of links that a node has to other nodes. The mean path length represents the average of the shortest paths between all pairs of nodes and offers a measure of a network's overall navigability 6 . An undirected network with n nodes and l links is characterized by an average degree, k=l/n. In addition, the average clustering coefficient characterizes the overall tendency of the nodes to form clusters or groups 7 .
The closer the local clustering coefficient is to 1, the more likely that the network will form clusters. The betweenness centrality is a measure of the importance of a node in a network, i.e., the relative importance of a node to the network. The betweenness centrality is equal to the fraction of the shortest paths that pass through a node and is counted over the shortest paths among all pairs of nodes.
The eigen entropy of networks was defined as the entropy of the normalized largest eigen vector of an adjacency matrix 8 . For a network with n genes, its adjacency matrix can be represented as Integration of brain-wide gene expression dataset from Allen Human Brain Atlas. We combined the gene expresion dataset across all six brains using the typical workflow for processing brain-wide transcriptomic data described in previous paper 19 . By mapping the samples to cerebral cortex of human brains, we obtain an integrated dataset including 15,745 genes expressed in 1280 samples. The following probe filtering criteria were applied during the process: 1) the probe-to-gene annotations were updated using Re-Annotator package; 2) probes where expression measures do not exceed the background in more than 50% samples were removed; 3) a representative probe for a gene was selected based on the highest intensity; 4) applying limma batch effect removal on cross-subject aggregated data, followed by the scaled robust sigmoid normalization. After normalization, samples have no inter-individual differences in gene expression.

Functional connectivity of human brains. Functional connectivity was analyzed using
resting-state fMRI data from fifty randomly selected subjects from the Human Connectome Project (HCP, http://www.humanconnectome.org/documentation/S500/). The HCP minimal preprocessing pipeline was used for the resting-state fMRI data 9 . This pipeline includes artifact removal, motion correction, and registration to a standard space. The fMRI data were then processed using previously described procedures 10 as follows: 1) spatial smoothing using a 6-mm full-width half-maximum Gaussian kernel; 2) normalization of the global mean signal intensity across runs; 3) linear detrending and bandpass temporal filtering (0.01~0.08 Hz); and 4) regression of nuisance variables, including the six parameters obtained by rigid body head motion correction, ventricular and white matter signals, and the first temporal derivatives of all the above items. The AAL atlas, which divided the whole brain into 116 regions, was used for the region-to-region functional connectivity measures in the current study 11 . All AAL regional masks were generated using the free software WFU_PickAtlas (Version 2.0, http://www.ansir.wfubmc.edu). We evaluated the functional connectivity between each pair of regional averaged time courses using Pearson's correlation coefficient. Significant functional correlations were selected using one-sample t-tests (P<0.05, Bonferroni correction), resulting in the binary 116×116 symmetric connectivity matrix C of the functional connectivity network in human brains (Supplementary Table 7).
To match the functional connectivity network in the human brains, we mapped 2,703 samples containing transcriptional data to 116 brain regions. By averaging the attribute values of samples of the same region, we obtained the mean gene expression characteristics and network properties of the brain regions (104, 104, 50, 59, 60, and 62 regions mapped to Brains #1-6, respectively).
The gene expression network in the mouse brain. We established the expression networks in mouse brain structures based on the Allen Institute mouse brain atlas, which offers finely sampled whole-genome expression data 12 . According to the Allen Reference Atlas, a 56-day-old male C57BL/6J mouse brain was partitioned into 73 structures and 12 regions. We computed the expression levels of 719,905 genes in 73 brain structures contained in the coronal planes. Based on the expression intensity in each voxel, we obtained the expression levels of genes in 73 structures by averaging across all voxels in the brain structures. The expression levels in the mouse in situ hybridization data from the Allen Mouse Brain Atlas were quantified using a metric called expression energy (fraction of stained volume × the average intensity of staining) as previously described 12 . In total, 2,873 genes were found to be expressed in at least one structure by selecting genes with fractions expressing pixels above 0.02 to omit genes with extremely low expression. We downloaded 33,145 protein interactions among 8,499 mouse gene products from the BIOGRID database 13 . By integrating the gene expression data and protein interactions in the mouse brain, we established interaction networks in 73 structures of the adult mouse brain.
We identified 570 HKGs that were expressed in 72 or 73 structures and 360 specific genes that were expressed in one or two structures. Regarding the genes expressed in more structures in the mouse brain, their homologous genes in humans tended to be more widely expressed in the human  Table   8). We mapped the functional connectivity data of the mouse brains to 22 brain regions and computed the degree of each region in the functional connectivity network. Based on D99 template of macaque brain 18 , the functional connectivity between each pair of regional averaged time courses was evaluated using Pearson's correlation coefficient. Significant functional correlations (r>0.15) were selected to obtain the binary 304×304 symmetric connectivity matrix of the functional connectivity network in macaque brains (Supplementary Table 9). We mapped the functional connectivity data of the macaque brains to 304 brain structures and computed the degree of each structure in the functional connectivity network.

The influence of methodological variability on the analyses results
Considering that choice of processing steps and parameters can have a potential influence on the statistical outcomes of research with the AHBA, we used abagen as a tool to investigate the influence of methodological variability on our results. We varied the methods of gene normalization, sample normalization and probe selection and took other parameters as default values and computed the heterogeneity index of brain regions based on the Desikan-Killiany atlas.
The results are consistent under different standardizing workflows. In 10 distinct processing pipelines, the heterogeneity of gene expression in the neocortex was found to be significantly greater than those in the archicortex and subcortex, with the regions in neocortex tend to have lower standard errors of gene expression levels than those in archicortex and subcortex.