Using a recently invented technique for gene expression mapping in the whole-anatomy context, termed transcriptome tomography, we have generated a dataset of 36,000 maps of overall gene expression in the adult-mouse brain. Here, using an informatics approach, we identified a broad co-expression network that follows an inverse power law and is rich in functional interaction and gene-ontology terms. Our framework for the integrated analysis of expression maps and graphs of co-expression networks revealed that groups of combinatorially expressed genes, which regulate cell differentiation during development, were present in the adult brain and that each of these groups was associated with a discrete cell type. These groups included non-coding genes of unknown function. We found that these genes specifically linked developmentally conserved groups in the network. A previously unrecognized robust expression pattern covering the whole brain was related to the molecular anatomy of key biological processes occurring in particular areas.
An integrated understanding of biological structure and function based on knowledge of gene expression is critical in systemically approaching complex structures, such as the mammalian brain, where an estimated 25,000 genes are expressed1,2,3. The combinatorial number of randomly selected genes is greater than the number of cells in the brain. Indeed, a non-random combination of expressed genes is required for the formation of functional regions. The activation or suppression of specific subsets of genes, referred to as co-expressed or co-suppressed genes, is thought to regulate cell type- or region-specific brain functions. Thus, well-designed experimental systems for profiling combinatorial gene expression in relation to anatomical structures are required to reveal novel aspects that will allow an integrated understanding of brain structure and function.
Previous in situ hybridization (ISH)-based resources that were employed to comprehensively map co-expression networks in the mouse brain include the Anatomic Gene Expression Atlas (AGEA) of the Allen Mouse Brain Atlas (ABA), which is a database for searching expression similarity of approximately 4,500 genes that are selected by image processing of ABA expression maps as regionally expressed genes in the adult mouse brain, and Eurexpress, which offers co-expression clusters as well as annotated images of more than 18,000 gene expression patterns in E14.5 mouse embryos4,5. The systematic investigation of expression data within these databases provides detailed information in two dimensions at the cellular level for individual genes. However, information in the third dimension is difficult to obtain at the same resolution. Additionally, quantifying expression densities in a three-dimensional (3D) anatomical context and comparing these data across genes is difficult, despite time-consuming and labor-intensive experimental efforts.
An alternative approach is high-throughput analysis of gene expression data obtained through microarray or RNA sequencing methods. Weighted correlation network analysis (WGCNA) is a software package that performs computational analyses to reveal topological modules of co-expression that are based on microarray expression data6. Using archived microarray data, statistically significant co-expression patterns with biological relevance are revealed; and anatomical areas of co-expression are informed by anatomical descriptions associated with the data. High-throughput transcriptome analyses have been performed in a large number of selected anatomical areas in the brains of humans and mice7,8,9. However, due to the nature of these studies, the examined anatomical areas are restricted to key components, and an alternative strategy is, therefore, required to uncover gene expression networks across the brain.
Recently, we developed a system for high-throughput gene expression mapping in a broad 3D-anatomical context, denoted as transcriptome tomography10. In the data acquisition process, a block-face image is obtained before each section is cut. A serial set of sections is collected in batches as a “fraction”, and a series of fractions is obtained using a whole brain. The materials of the fractions are subjected to probe-based gene expression density measurements by microarray. At least three whole brains sectioned in each of three orthogonal planes are required for 3D reconstruction of the gene expression profiles (Supplementary Fig. S1). The block-face images are used to indicate the 3D outlines of the whole brains and their fractions. The outlines of the whole brains are used for transformation to locate all three in the same x-y-z coordinates. Values of the measured expression densities are assigned to voxels of the transformed fractions. The values in each voxel obtained from at least three fractions are averaged and the average represents the expression density of the voxel. Using this system, we have generated a 3D dataset that delivers a comprehensive spatial overview of gene expression patterns of the mouse brain in the ViBrism database (ViBrism DB: http://vibrism.neuroinf.jp/, including more than 36,000 maps of 25,000 genes). The first dataset was created using 61 fractions of six series of sectioning in total (two replicates of three orthogonal sectioning series). Because of the limited number of data points, the anatomical resolution is limited, and the maps show expression patterns only roughly, at the level of sub-anatomical regions. However, the expression maps cover the whole brain, and the expression density values for each gene-probe in the fractions can be correlated with data for any other probe, so that they are directly usable for probe-by-probe correlation analysis of co-expression.
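The voxel-averaging step described above can be illustrated with a minimal sketch. It assumes an idealized case that the text does not guarantee: the three orthogonal series are already in a common coordinate frame, each fraction is an axis-aligned slab, and one voxel spans exactly one slab along each axis. The function name `reconstruct` and the toy densities are hypothetical and are not taken from the published pipeline.

```python
def reconstruct(series_x, series_y, series_z):
    """Average slab densities from three orthogonal series into a voxel grid.

    series_x[i] is the measured density of the i-th fraction cut along x,
    and likewise for y and z. Each voxel (x, y, z) lies in exactly one slab
    of each series, so its value is the mean of those three measurements.
    Registration of real brains to shared coordinates is assumed done.
    """
    return [
        [
            [(dx + dy + dz) / 3 for dz in series_z]
            for dy in series_y
        ]
        for dx in series_x
    ]


# Toy example: a signal confined to slab 1 along x, slab 0 along y, slab 2 along z.
volume = reconstruct([0.0, 3.0, 0.0], [3.0, 0.0], [0.0, 0.0, 3.0])
peak = volume[1][0][2]    # voxel where all three slabs carry signal
corner = volume[0][0][0]  # voxel where only the y slab carries signal
```

In this toy case the voxel at the intersection of the three signal-carrying slabs receives the highest average, which is the sense in which orthogonal series localize expression in three dimensions.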
Here, we introduce a framework for comprehensively assessing co-expression patterns that are latent within expression maps. We elucidated functions associated with gene co-expression networks, finding that the networks followed an inverse power law and were enriched in functional gene groups, as defined by gene ontology (GO) terms and biological interactions. We focused on genes encoding transcription factors and related molecules and found that the combinatorial expression of genes functioning in cell differentiation during development also occurs in the adult brain and that each combination, including previously uncharacterized genes, is associated with a discrete spatial location (Supplementary Video S1).
Co-expression network graphs following an inverse power law
To measure expression similarities between genes and to characterize expression patterns in a gene probe-based analysis, which used 36,558 total probes, we calculated the Pearson correlation coefficient, r, of the expression density values in the fractions between the gene probes as a measure of correlation. We then calculated κ, which is the number of similarly expressed gene-probes to a target probe including itself (SET), using each probe as a target (the SET search results are provided in the “Similarly Expressed Gene Search” database at the ViBrism website). The number of probes exhibiting each κ value was plotted against κ (Fig. 1A). This κ distribution followed an inverse power law, which indicated formation of non-random robust clusters of similarly expressed genes: that is, the clusters contained a significant number of genes associated with a large number of gene-probes in the SET analysis, corresponding to co-expressed hub genes11.
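As an illustration of the κ calculation, the following sketch computes pairwise Pearson r over per-fraction expression profiles, builds each probe's SET at a chosen threshold, and tabulates the κ distribution. The probe names and toy values are hypothetical, and treating a flat profile as uncorrelated with everything is an assumption made here for convenience, not a rule stated in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def similarly_expressed_sets(profiles, r_threshold=0.85):
    """Map each probe to its SET: probes with r > threshold, plus itself."""
    names = list(profiles)
    sets = {name: {name} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson_r(profiles[a], profiles[b]) > r_threshold:
                sets[a].add(b)
                sets[b].add(a)
    return sets


def kappa_distribution(sets):
    """Count how many probes exhibit each kappa (SET size)."""
    counts = {}
    for members in sets.values():
        counts[len(members)] = counts.get(len(members), 0) + 1
    return counts


# Hypothetical profiles: one value per fraction for each probe.
profiles = {
    "probe_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "probe_b": [2.1, 4.0, 6.2, 8.1, 9.9],
    "probe_c": [5.0, 3.0, 4.0, 1.0, 2.0],
    "probe_d": [1.0, 1.0, 1.0, 1.0, 1.0],
}
sets = similarly_expressed_sets(profiles)
kappa = {name: len(members) for name, members in sets.items()}
distribution = kappa_distribution(sets)
```

The double loop visits every probe pair, so for tens of thousands of probes a vectorized or chunked correlation computation would be preferable; the sketch only fixes the definitions.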
To evaluate the κ distribution, we computationally randomized the fraction data to produce a dataset representing 36,558 patterns of random expression for the total probes; then, using this dataset, we calculated κ values of the probes with a variety of r thresholds and compared the κ distributions of the randomized dataset to the κ distributions of the fraction data (randomized and non-randomized datasets, respectively, in Supplementary Fig. S2: r > 0.85, 0.75, and 0.65). With a threshold of r > 0.85, the κ distribution of the non-randomized dataset was statistically distinguishable from the distribution of the randomized dataset, with a larger number of probes exhibiting κ throughout the range of κ values except for κ = 1 (p < 0.05). We concluded that the SETs with r > 0.85 were non-randomly formed with statistical significance in the fraction data. Thus, we decided to perform SET analysis with a threshold of r > 0.85 unless otherwise indicated. Indeed, at this r threshold, in the randomized dataset, no gene-probes were associated with κ larger than 100. Therefore, we decided to set a threshold of κ > 100 for co-expression hub genes in this study (Fig. 1B). Moreover, using an informatics approach, we inspected the r values between genes that are known to co-localize within cells or in closely located cells. As shown in sample analyses of genes in the hypothalamus and the midbrain12,13,14,15, genes that are co-expressed in serotonergic neurons of the raphe nuclei, in other monoamine neurons of the midbrain and in magnocellular neurons or other functional neurons of the hypothalamic nuclei displayed r values higher than 0.85 (Fig. 1C).
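The comparison against randomized data can be sketched as follows. The text does not specify how the fraction data were randomized, so shuffling each probe's values independently across fractions is an assumption here, as are the helper names and the toy dataset. The largest κ seen in the randomized data then serves as a candidate cutoff for hubs, mirroring the reasoning behind the κ > 100 threshold.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for flat profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def kappa_values(profiles, r_threshold=0.85):
    """Kappa for each probe: number of probes with r > threshold, plus itself."""
    kappa = [1] * len(profiles)
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if pearson_r(profiles[i], profiles[j]) > r_threshold:
                kappa[i] += 1
                kappa[j] += 1
    return kappa


def randomize(profiles, seed=0):
    """Shuffle each probe's values across fractions (assumed null model)."""
    rng = random.Random(seed)
    return [rng.sample(profile, len(profile)) for profile in profiles]


# Toy data: five probes sharing one spatial pattern over twelve fractions.
base = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
observed = [[v * (k + 1) + k for v in base] for k in range(5)]

observed_kappa = kappa_values(observed)
shuffled = randomize(observed)
random_kappa = kappa_values(shuffled)

# Candidate hub cutoff: the largest kappa that arises in the randomized data.
hub_cutoff = max(random_kappa)
```

A real analysis would repeat the shuffle many times and compare whole κ distributions at each r threshold rather than rely on a single randomized draw.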
To investigate the underlying biological features of the observed non-random co-expression patterns, we searched databases for biological pathway analysis and GO terms. We revealed that co-expressed hub genes were enriched in functions related to biological interactions (p < 0.001, Supplementary Table S1-sheet 1), and probes exhibiting larger numbers of κ were related to more GO terms of neuronal cell components (p < 0.01, Supplementary Table S1-sheet 2). Moreover, we found that GO terms accumulated in SETs of area marker genes that were implied in a previous study10 (Supplementary Table S2 as spreadsheets). These findings support the idea that the SETs formed exhibited biological relevance.
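The text reports enrichment p-values but does not name the statistical test used. One common choice for this kind of analysis, shown here purely as an assumption, is a one-sided hypergeometric test: given a population of genes, some of which carry an annotation, it asks how likely a gene set of a given size is to contain at least the observed number of annotated genes. The function name and the numbers are illustrative.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, annotated_in_sample):
    """Upper-tail hypergeometric probability P(X >= annotated_in_sample).

    population          -- total number of genes considered
    annotated           -- genes in the population carrying the term
    sample              -- size of the gene set under test (e.g. one SET)
    annotated_in_sample -- genes in the set carrying the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(annotated_in_sample, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 genes, 4 annotated, a set of 3 containing 2 annotated genes.
p = enrichment_p_value(population=10, annotated=4, sample=3, annotated_in_sample=2)
```

When many terms are tested at once, the resulting p-values would normally be corrected for multiple testing before being compared with a threshold such as 0.001.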
Framework for integrated analysis of graphs of co-expression networks and anatomical maps
To further investigate the biological significance of these SETs, we developed a framework for integrating co-expression network analysis through cross-validation with spatial maps of gene expression. In our maps, anatomical space is defined by gene expression. For example, brain areas that are rich in cholinergic neurons are defined by the marker gene choline acetyltransferase, Chat10. A significant result of the integrated analysis framework was that a broad range of gene expression patterns can be quantified in a whole-brain context and that the graphs of the co-expression networks that were generated with a stochastic algorithm16 represented functional brain regions, without adding any anatomical information, as shown in the graphs of genes that are expressed in distinct cell types in the hypothalamus and the midbrain (Fig. 1C).
In the following studies, the framework proceeded with statistical significance, as follows: genes with an expression variance were chosen based on a variable V that is a false discovery rate (FDR) of ANOVA analysis, representing the variance of expression densities in the orthogonal series of fractions compared to the variance in the biological replicate fractions (see the Methods). We set the FDR value at < 0.05 to choose genes with variable expression10. These genes were then employed in searches for the associated SETs. Genes that are selectively expressed within the same cells should exhibit high r-values, theoretically r = 1, and these genes therefore represented a SET. Then, the cell type-specific functions of the genes were revealed through accumulated GO terms or literature searches. Finally, the regions of gene expression were visualized in maps, along with the graphs of co-expression networks (r > 0.85). We focused on homeodomain transcription factors, many of which play important roles in embryonic patterning and cell differentiation17.
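The gene-selection step can be sketched under one explicit assumption: that the FDR is obtained by Benjamini-Hochberg adjustment of per-gene ANOVA p-values. The text states only that V is an FDR of an ANOVA analysis, so the adjustment procedure, the function name and the p-values below are illustrative rather than the authors' exact method.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical ANOVA p-values (orthogonal-series variance vs. replicate variance).
anova_p = {
    "gene_a": 0.001,
    "gene_b": 0.008,
    "gene_c": 0.039,
    "gene_d": 0.041,
    "gene_e": 0.6,
}
names = list(anova_p)
fdr = dict(zip(names, benjamini_hochberg([anova_p[n] for n in names])))

# Keep genes whose expression varies across fractions at FDR < 0.05.
variable_genes = [n for n in names if fdr[n] < 0.05]
```

Note that gene_c and gene_d have raw p-values below 0.05 but are not selected after adjustment, which is the practical difference between thresholding on p and thresholding on FDR.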
Lhx and co-expressed genes in distinct cell types
LIM-homeodomain transcription factors (Lhx) developmentally specify discrete types of neurons18. All 12 genes of the Lhx family were detected in the adult brain, and all except for Lmx1b were variably expressed (FDR < 0.05, Fig. 2A). Lhx1 and Lhx5 (Lhx1/5) and Lhx8/Isl1 were co-expressed, whereas the remainder of the genes were not co-expressed with each other and displayed non-redundant expression patterns. Four genes, Lhx1/2/5/6, were co-expression hubs; however, no SETs were found for five of the genes, Lhx3/4, Isl2 and Lmx1a/1b.
To demonstrate one advantage of this integrated analysis, we first mined genes in the SET of Lhx8 (also called Lhx7), which is an area marker gene for the caudate putamen (CPu)10. In total, 15 genes were included in the graph of the co-expression network for the SET, and five genes are known to be expressed and function in the CPu (Fig. 2B, the list of genes is provided in Supplementary Table S2: a spreadsheet for Lhx8)19,20,21,22,23. A subpopulation of cells in the adult CPu is composed of cholinergic neurons that develop as follows: post-mitotic progenitor cells expressing Nkx2-1 (a transcription factor also known as Titf1) begin migrating from the ventricular zone in the ventral part of the forebrain, and, during migration, the cells differentiate into mature neurons under the combined activity of Lhx8 and Isl1, which are associated with the suppression of Lhx619. In this study, the genes encoding these three transcription factors, Nkx2-1, Isl1 and Lhx8, were all included in this network graph, whereas Lhx6 was not.
Moreover, the expression densities that were observed in the maps could be accounted for by the developmentally programmed fate of the Nkx2-1-expressing cell lineage derived from the embryonic ventricular zone (Fig. 2C). Moderately high expression of Nkx2-1 accompanied by high expression of Lhx8/Isl1 was detected in the ventral part of the forebrain, which is rich in cholinergic neurons, as defined by Chat expression. The highest Nkx2-1 expression was observed in the area caudal to the olfactory bulb, where cells with neurogenesis potential in the adult brain that were derived from the ventricular zone are present24,25. We found that figures in the open resource GENSAT26 showed Nkx2-1-expressing cells in these areas (Supplementary Fig. S3).
In addition, novel genes, including a predicted gene (Gm136) and a non-coding gene of unknown function (D830015G02), and Spint1, which regulates the proliferation and cell fate of neural progenitor cells27, were first revealed, to our knowledge, as highly expressed in these areas. We predict these genes to specifically function in the CPu and nearby areas.
We next investigated genes in the SETs of other Lhx family members, focusing on genes encoding morphogenetic transcription factors, morphogens and functional markers that are characteristic of cell functions, to determine whether we could obtain additional graphs of co-expression networks with distinct functions (Supplementary Table S3).
The Lhx1/5 proteins control Purkinje cell differentiation in the developing cerebellum28; Lhx1/5 were detected in the adult cerebellum in our maps, as shown in Figure 2A. We generated a graph of the co-expression network, which was composed of genes in the Lhx1/5 SETs (Fig. 3A). The graph included genes encoding molecules that are highly expressed in adult Purkinje cells, such as a morphogen, Wnt329 and a glutamate transporter, Slc1a630, in addition to genes encoding the Wnt receptor Fzd1, the transcription factor En2, which regulates the morphological and molecular patterning of the embryonic cerebellum31, the sodium/glucose co-transporter Slc5a1, which is regulated by En232, and the transcription factor Pax3, which is expressed in cells surrounding Purkinje cells, which presumably correspond to Bergmann glia in the adult cerebellum33. Because the gene expression in these cells was confirmed in the ISH figures of ABA (Supplementary Fig. S4A), we infer, based on the integrated analysis of anatomical maps and network graphs, that Fzd1 transduces the Wnt3 signal that is produced by Purkinje cells to the surrounding Bergmann-glial cells, wherein Pax3 is expressed. This local Wnt3/Fzd1/Pax3 signaling could regulate the differentiation of the glial cells, maintaining stem cell characteristics in the mature stage34 in relation to Purkinje cells. The Lhx1/5 SET contained more than 100 genes. Among these genes, we nominate Dao1 as a functional marker gene of this SET, in terms of Purkinje and Bergmann-glial cell interactions. Dao1 encodes an enzyme35 that is exclusively active in the mature cerebellum for the catabolization of a neuromodulator molecule, D-serine, which is produced in the Bergmann glia and modifies the glutamatergic synaptic plasticity of the Purkinje cells.
Lhx2 specifies neurons in the upper layer of the cerebral neocortex and suppresses the organization of the hippocampus36,37. The graph of the Lhx2 co-expression network included genes encoding a forkhead transcription factor, Foxg1, and a homeodomain transcription factor, Emx2 (Fig. 3B). These three genes are expressed across the dorsolateral wall of the embryonic forebrain, and each gene plays crucial roles in specifying the location of the progenitor cells of projection neurons in the cerebral neocortex38. In our maps, these three genes were localized to the dorsolateral areas of the adult cerebrum. The co-expression graph also contained a gene encoding a transcription factor, Nr2e1, which controls the division of progenitor cells to generate subsets of neurons in the upper layer neocortex of the cerebrum39: Lhx2 and Nr2e1 are known to cooperate in demarcating the eye field40. A form of muscarinic cholinergic receptor gene, Chrm1, was included in the network as a functional marker gene. Chrm1 is involved in a distinct aspect of brain function: object recognition memory with delay41.
Lhx6 is a determinant of GABAergic interneurons in the CPu and parts of the cerebral cortex and a determinant of GABAergic projection pathway in subcortical regions of the ventral forebrain19,42,43. The Lhx6 co-expression network contained genes encoding Foxo6 and Wnt2, which play roles in GABAergic neuron specification during forebrain development44 (Fig. 3B). Intriguingly, this network also included genes encoding a zinc finger transcription factor, Fezf2, which can be used to experimentally generate glutamatergic projection neurons directly from GABAergic precursors in the CPu45, as well as a T-box transcription factor, Tbr1, which restrains Fezf2 activity to the lower-layer neocortex, where glutamatergic projection neurons normally originate46,47. A nicotinic cholinergic receptor gene, Chrna5, which regulates GABAergic and glutamatergic transmission48, was also present in this network and was considered a functional-marker gene, similar to Chrm1 in the Lhx2 network. Moreover, the co-expression networks of Lhx2 and Lhx6 were linked through Tbr1, which is expressed in both the upper and lower layers of the neocortex because of its function in regulating the layer identity of postmitotic neurons49 (Supplementary Fig. S5).
Lhx9 was found to be co-expressed with a Wnt signaling mediator gene, Tcf7l27 (Fig. 3C); both were present in the caudal forebrain and in the dorsal midbrain in the expression maps. Lhx9 determines cell differentiation in the caudal forebrain by regulating Wnt signals in the zebrafish50. In the mouse brain, Tcf7l2 is expressed in subcortical regions, the thalamus and the dorsal midbrain, where neurons are specialized for the integration of diverse sources of sensory information51. Therefore, we hypothesize that Lhx9 could determine the fate of the neurons that participate in sensory input integration via Wnt/Tcf7l2 signaling and that this gene network continues to function in the adult brain. Indeed, we found experimental evidence that supported this hypothesis in figures of GENSAT26 (Supplementary Fig. S6). Lhx9-expressing cells are present in selective neuron groups of the visual and auditory input systems and of the integration system52,53 in the adult mouse brain.
Pax and co-expressed genes in subsets of cell types
To test how broadly developmentally programmed co-expression networks appeared in the adult brain, we investigated the expression of genes encoding other homeodomain transcription factors: Pax/PRD/Arx and ANTP, including Hox, Nkx, Dlx, Emx, Irx and En (Supplementary Table S3).
The expression of genes in the Pax family is developmentally programmed54. All nine genes were detected using our platform, and their spatial expression patterns were comparable to the areas originating from the cells expressing these genes in embryonic stages33 (Fig. 4A). The informatics approach to the co-expression networks indicated potential roles for the genes in neurogenesis or plasticity in local environments, as described above for Pax3/Wnt3/Fzd1.
Pax6 was detected in an area caudal to the olfactory bulb and in the cerebellum. The co-expression network of Pax6 contained genes encoding Wnt signaling molecules, including Wnt7a, which controls central nervous system vasculature differentiation55, Fzd756 and Dact1 (also known as Frodo), which is downstream of Wnt and is essential for optic and neural development57 (Fig. 4B). Indeed, Fzd7 contributes to self-renewal signaling in embryonic stem cells, similar to the role of Wnt in neuronal stem cells56,58. In addition, the inhibition of Wnt signaling induces Pax6, consequently promoting the differentiation of retinal precursor cells from neuronal progenitor cells in vitro59. These experimental findings allow us to conclude that we can select a set of genes that are related to stem/progenitor cell regulation wherein local Wnt7a signaling via Fzd7/Dact1 restrains Pax6 from inducing excessive neurogenesis. This scenario appears to explain the regulatory mechanism of neurogenesis that takes place in the olfactory bulb areas60 and in the cerebellum61 (Supplementary Fig. S7). Because no genes were common to the Pax6 and Pax3 SETs, these two co-expression networks, though both were in the cerebellum, seem to participate in different local biological systems (Supplementary Fig. S4).
A gene for Pax7, which regionalizes the embryonic dorsal part of the midbrain62 and persists there in mature superior collicular visual-related neurons63, was detected in the superior colliculus of the dorsal midbrain in our map and was selected as the area marker gene10. Pax7 was co-expressed with Sox14 (Fig. 4C). Sox14 organizes the development of the visual system, including the superior colliculus, and orchestrates the diurnal cycle in response to light and dark conditions64. We suggest that Pax7 may also participate in the regulation of the diurnal cycle. Pax8 was present in the hindbrain, as discussed below.
Genes encoding other homeodomain transcription factors expressed in subsets of cell types
Of the 18 Nkx family genes, 15 were expressed in the adult brain (Fig. 5A). The gene encoding Nkx6-1, which participates in motor neurogenesis65, was co-expressed with motor neuron-specific genes that play a role in synaptic plasticity, such as Grin3b, which encodes an NMDA-type ionotropic glutamate receptor66, and Calcb, which encodes a motor neuron-derived astrocyte activation protein67. Nkx6-1/6-3 were co-expressed. Both genes were co-expressed with Hox genes that display a co-linear expression density pattern of a posterior to anterior gradient and function in the segmental organization of the ventral hindbrain68 (Fig. 5B). Pax8 also participated in this network. Nkx2-2/6-2 and Nkx2-9 were hypothesized to be expressed in non-neuronal cells in the adult brain because these genes were co-expressed with genes encoding oligodendroglial cell markers, such as Mbp/Edg869, and astrocyte markers, such as Klk670, respectively (Fig. 5C). Nkx5-1/5-2 were co-expressed in the ventro-caudal forebrain and the ventral hindbrain. Nkx2-1/3-1 are mentioned elsewhere. No SETs were identified for the other Nkx genes.
Arx, which is a gene encoding a transcription repressor that is expressed in developing and mature interneurons under the influence of the transcription factors Dlx1/271,72, showed a co-expression network that was composed of genes with a similar function to genes that are present in the Nkx6-1 network (Fig. 5B and 5D). Arx was co-expressed with Nkx3-1, and its network included Dlx1/2, as expected. Among the 6 Dlx genes, Dlx1/2/5/6 displayed an expression density pattern of an anterior to posterior gradient, which was analogous to that found for Hox genes. The Arx network contained a gene encoding another form of NMDA-type receptor, Grin2b, which is localized to the forebrain and controls synaptic plasticity73, and a gene encoding a neural dendrite-specific adhesion molecule, Icam5, which regulates synapse formation74, analogous to Calcb in the Nkx6-1 network.
Emx1, which is an area marker gene for the dorsal forebrain during developmental stages, is inducible through Wnt signaling75. Emx1 was localized to the dorso-lateral forebrain and co-expressed with Wnt9a. Emx1 was also co-expressed with esPRC2p48, which encodes a polycomb complex protein (also called E130012A19Rik) that regulates stem cell differentiation to glutamatergic neurons in the cerebral cortex and the hippocampus76. Emx2 was described above, and Emx3 was not detected. En1, Irx3, Irx4 and Irx6 were expressed in the hindbrain and displayed unique patterns. Irx1/2/5 formed a co-expression network in the cerebellum.
Combinatorial expression of transcription factor genes and previously uncharacterized coding and non-coding genes
Among 1332 probes for genes of transcription factors, 1251 probes (94%) were detected in at least one fraction of this study. The mean value of the variable I10, which represents the intensity medians for the probes in the fractions, was approximately 40% lower for the 1251 probes (18.44 +/− 1.77) than for the total probes (30.97 +/− 0.62). Among the examined genes encoding homeodomain transcription factors17, all 12 Lhx, 9 Pax, 1 Arx, 2 Emx, 6 Irx and 2 En, 30 of 39 Hox, 15 of 18 Nkx and 5 of 6 Dlx genes, totaling 82 of 95 genes (86.3%), were detected in the adult brain, although the mean value of I (3.27 +/− 1.22) was approximately 10% of the value for the total probes. This result indicated that the sensitivity of our system was sufficient to detect genes expressed at low levels.
The examined genes formed co-expression gene groups (Supplementary Table S3). The combinatorial expression of distinct gene members of homeodomain transcription factor families, including Lhx, Pax/PRD/Arx and ANTP, as well as other transcription factor families was repeatedly observed as modules that constructed parts of the co-expression network, which followed an inverse power law. Approximately 40% of the genes encoding the homeodomain transcription factors covering the entire anatomical context of the adult brain have been described here. For further examination, the SETs of all genes expressed in the brain are searchable, and the 3D expression maps can be viewed and downloaded from the ViBrism DB website. Then, network graphs can be depicted as described in the methods.
Previously uncharacterized genes were involved in the SETs of transcription factor genes. In particular, six genes displayed extremely high r (> 0.95, Table 1) to the co-expression hub genes Lhx1/2/6 and Arx. To the best of our knowledge, our framework provided the first visualization of the location of action of the six genes and provided information about their function in relation to co-expressed genes (Fig. 6). Except for 2010300C02Rik, which encodes a putative protein with a rat ortholog, these gene sequences are not conserved in other species and are non-protein coding; thus, the functions of the genes are unique to the rodent or mouse brain. In the network graphs, five genes were in a position that linked the co-expression gene groups: BM939341 linked Fezf2/Tbr1 to Lhx6; Gm11549 and 2010300C02Rik linked Fezf2/Tbr1 to Lhx2/Emx2/Foxg1/Nr2e1; and AI835086 and 3110039M20Rik linked Lhx2/Emx2/Foxg1/Nr2e1 to Arx/Dlx1/2. This linkage suggests that these previously uncharacterized genes play roles in coordinating functions of the gene groups that are repeatedly observed and developmentally conserved and that consequently participate in creating mouse-specific neural designs.
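The notion of a gene occupying a linking position between two co-expression groups can be made concrete with a small sketch. It assumes that a linking gene is one that lies outside both groups and has at least one edge (a pair with r above the threshold) into each; this definition, the function name and the placeholder node names are illustrative and are not taken from the network graphs in Fig. 6.

```python
def linking_genes(edges, group_a, group_b):
    """Return genes outside both groups with a neighbour in each group.

    edges   -- iterable of (gene, gene) pairs whose r exceeds the threshold
    group_a -- genes of the first co-expression group
    group_b -- genes of the second co-expression group
    """
    group_a, group_b = set(group_a), set(group_b)
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return sorted(
        gene
        for gene, adjacent in neighbours.items()
        if gene not in group_a | group_b
        and adjacent & group_a
        and adjacent & group_b
    )


# Placeholder graph: x touches both groups, y touches only the first.
edges = [("a1", "a2"), ("b1", "b2"), ("a1", "x"), ("x", "b2"), ("a2", "y")]
links = linking_genes(edges, group_a=["a1", "a2"], group_b=["b1", "b2"])
```

Applied to an edge list exported from a SET search, a function of this kind would list candidate linking genes for any pair of gene groups of interest.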
We introduced a framework for the integrated analysis of 3D gene expression maps and graphs of co-expression networks and demonstrated how this framework can be used to gain insights into the underlying biology of co-expression networks, which were found to follow an inverse power law. The integration of expression maps and co-expression networks demonstrated in this study is based not on archived information from selected key areas of the brain but on our whole-brain analysis results. Consequently, unexpectedly wide varieties of combinatorially expressed genes, which are known to be involved in the regulation of cell differentiation during development, were also observed in the adult brain. The gene combinations on which we focused included family members of homeodomain transcription factors, Wnt signaling molecules, functional markers and uncharacterized genes. Distinct members of the families were expressed in particular cell types. Consequently, analogous combinations appeared as modules in a variety of cells, for example: Lhx2/Foxg1/Nr2e1/Tbr1/Chrm1 in the upper-layer neurons are analogous to Lhx6/Foxo6/Fezf2/Tbr1/Chrna5 in the lower-layer and subcortical neurons of the cerebrum; Pax3/Wnt3/Fzd1 in Purkinje/Bergmann cells are analogous to Pax6/Wnt7a/Fzd7 in stem/progenitor cells; and Nkx6-1/Nkx6-3/Hox/Calcb/Grin3b in the cranial motor neurons are analogous to Arx/Nkx3-1/Dlx/Icam5/Grin2b in the cerebral interneurons (Supplementary Table S3). Moreover, each module appeared to display a particular biological function: the first example is a module operating in neural pathway formation, the second functions in stem cell regulation, and the third functions in synaptic plasticity. Finally, we revealed that mouse-specific non-coding genes were in positions linking the conserved co-expression gene networks. This linkage suggests that these genes coordinate multiple gene groups and create mouse-specific neurogenesis designs.
Genetic modules constituting the robust structures of co-expression networks are widely observed in various species as a mechanism underlying networks that are characterized by a power law1,6,11; however, such studies are limited to pre-determined anatomical areas or are based on retrieved datasets. Our framework enables the analysis of co-expression gene networks throughout the brain, along with an anatomical overview. Networks in complex areas that could not be described in anatomical terms, for instance, the Lhx2 network in Chrm1-expressing cells that are associated with delayed memory formation, are detectable, and the anatomical locations of the networks can be compared with areas that are defined by area marker genes. These advantages enable the visualization of previously unrecognized robust expression patterns.
An advantage of the data on which this study is built is the straightforward quantification of gene expression densities using our tomographic technique, in contrast to the AGEA data, which are produced by elaborate image processing of the ISH-based two-dimensional datasets of ABA. The expression densities of a large number of genes are measured in each fraction. Consequently, all the 3D expression maps are reconstructed into the same 3D anatomical space, and the expression similarities of any measured genes can be detected using r as a similarity measure in the fractions. There is no need for one-by-one spatial registration of expression maps or for pre-selection of 4,500 regionally expressed genes out of approximately 25,000 gene maps by image processing, as is performed in the ABA dataset analysis. Therefore, novel genes that have not been mapped with other methods were included in our network analysis, and their functions were predicted. The pairwise comparison method using r as an expression similarity measure for network analysis can be applied to any numerical dataset. It may be applied to the numerical data of ABA and may produce precise co-expression results.
The mapping resolution of the tomography technique depends on the number of data points. In this dataset, only 61 fractions are used for the mapping, and consequently the anatomical resolution of the maps is not high. In 40 anatomical sub-regions, as we showed in the previous paper10, the 3D distributions of the expression maps are comparable to the maps in ABA. However, very selective expression cannot be mapped properly because of the limited number of fractions. This limit is set by the amount of RNA that can be extracted from the fraction material. A 50-μm-thick fraction is sufficient to obtain enough RNA for analysis with the present molecular measurement technique. Therefore, 3D maps of higher resolution than the existing ABA (200-μm spacing in the sagittal plane) can be produced, and co-expression can be analyzed in them, using the transcriptome tomography approach.
We believe that our framework will contribute to a gene expression-based understanding of complex brain structures and functions. Gene expression densities measured in the “fractions” are used to create both 3D expression maps and co-expression networks, so that the maps and the networks are directly integrated without any additional experimental procedures. The variety of gene combinations observed with the framework suggests a broader potential for neurogenesis and plasticity in the adult brain. Whole gene expression maps and the SET search are provided in ViBrism DB. Transcriptome tomography, together with the framework for performing co-expression searches using the ViBrism DB platform, represents an important tool for the systematic investigation of the molecular anatomy of key biological processes that involve uncharacterized genes.
All experiments on mice and their care were performed in accordance with procedures approved under the RIKEN Regulations for Animal Experiments (approval ID: H19-1W009).
The tomography methodology was described in a previous study10 and in the Introduction and Supplementary Fig. S1 of this paper. Briefly, two types of data were obtained from 1-mm-thick tissue fractions of 8-week-old male C57BL/6J mouse brains via sequential cross-sectioning (5-μm-thick cross-sectioning slices × 200 sectionings per fraction): 1) gene expression density data measured with microarrays and 2) block-face images of each cross-sectioning plane. This process was performed in six brains and resulted in six sectioning series (two biological replicates of three orthogonal series) of 61 fractions in total.
The gene expression densities in the fractions were measured as the intensities detected with microarray probes (Whole Mouse Genome 012694, Agilent, CA), as described in the previous paper10. They were normalized per chip using 75th-percentile normalization, which was sufficiently robust to batch effects in this study, which used only a few batches, and subjected to probe-based analysis (36,558 probe data points from the 61 fractions, hereafter referred to as fraction data). Supplementary Table S4, sheet 1, shows the probe information, and sheet 2 shows the quality control index values of the extracted RNAs and microarray analyses. These probes are sufficient to detect most genes (approximately 25,000 genes) and splicing variants. The microarray data discussed in this publication have been deposited in the NCBI Gene Expression Omnibus database77 and are accessible through GEO Series accession number GSE36408 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36408).
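The per-chip 75th-percentile normalization described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name, the toy intensities and the choice to divide raw intensities by the chip's 75th percentile (equivalent to subtracting it on a log scale) are assumptions made for illustration only.

```python
import statistics


def normalize_chip_75th(intensities):
    """Scale one chip's raw intensities so that its 75th percentile becomes 1.0."""
    # statistics.quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile.
    q75 = statistics.quantiles(intensities, n=4)[2]
    if q75 <= 0:
        raise ValueError("75th percentile must be positive for scaling")
    return [value / q75 for value in intensities]


# Hypothetical toy data: two chips (fractions), five probes each.
# The second chip is uniformly twice as bright as the first.
chips = {
    "fraction_01": [120.0, 300.0, 80.0, 950.0, 410.0],
    "fraction_02": [240.0, 600.0, 160.0, 1900.0, 820.0],
}

normalized = {name: normalize_chip_75th(values) for name, values in chips.items()}
```

In this toy case the two chips differ only by a global brightness factor, so their normalized profiles coincide; this is the kind of chip-wide offset that a per-chip normalization is meant to remove.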
Fraction data analyses and statistics
Two sets of variables, I and V, were calculated from the log-transformed fraction data10. Briefly, I represents the median intensity of a probe in the 61 fractions; the mean ± s.e.m. of the indicated probes is shown. The variable V, representing the variance of the probe intensity in the orthogonal series of fractions compared to the variance in the biological replicate fractions, was defined as the FDR calculated through one-way ANOVA with Benjamini and Hochberg multiple-testing correction. The FDR is the rate of falsely discovering genes whose expression variance is at the level of the replicates as variably expressed genes (see Note in Supplementary Fig. S1). Another variable, κ, represents the number of probes that are expressed similarly to a target probe, including the target itself (SET). The r of the intensity values in the 61 fractions between the probes was calculated and used as a correlation measure of expression similarity. The applied threshold was r > 0.85, unless otherwise indicated. Power law fit lines and 95% confidence intervals of the κ distributions were created by linear regression modeling of log-scaled values. The hypergeometric p-values, with or without multiple-testing corrections, are shown for the GO analyses of the selected gene sets. The alpha level is 0.01 in all analyses.
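The calculation of κ and the power law fit can be sketched as follows. The sketch is illustrative and is not the code used in this study: it assumes that r denotes Pearson's correlation coefficient, all function names and toy profiles are hypothetical, and the fit returns only the slope and intercept of the regression line on log10-scaled values, without the 95% confidence intervals.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long intensity profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def kappa_values(profiles, threshold=0.85):
    """Return {probe: kappa}: probes with r > threshold, counting the probe itself."""
    names = list(profiles)
    kappa = {name: 1 for name in names}  # every probe belongs to its own SET
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson_r(profiles[a], profiles[b]) > threshold:
                kappa[a] += 1
                kappa[b] += 1
    return kappa


def fit_power_law(kappa):
    """Least-squares line through (log10 kappa, log10 number of probes)."""
    counts = {}
    for k in kappa.values():
        counts[k] = counts.get(k, 0) + 1
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("at least two distinct kappa values are needed")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical profiles over four fractions (the real data have 61).
profiles = {
    "probe_a": [1.0, 2.0, 3.0, 4.0],
    "probe_b": [2.1, 3.9, 6.2, 8.0],  # closely follows probe_a
    "probe_c": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with probe_a
    "probe_d": [1.0, 3.0, 2.0, 1.5],
}
kappa = kappa_values(profiles)
```

A negative slope returned by `fit_power_law` corresponds to an inverse power law of the κ distribution. The pairwise loop is quadratic in the number of probes, so a dataset of tens of thousands of probes would in practice call for a vectorized or parallel implementation.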
Randomization of the fraction data to generate random expression patterns
To randomize the fraction data, the order of the gene expression densities in the fractions of a series was randomized for each probe using the Ruby programming language (http://www.ruby-doc.org/core-2.1.0/Random.html), in which a pseudo-random number generator is implemented using the Mersenne Twister (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/mt.html). The density data in the paired fractions (see Note in Supplementary Fig. S1) were randomized in the same order within the series so that they were maintained as replicates in the randomized dataset. Consequently, the statistical characteristics of the fraction data, variables I and V for the probes, were not changed by the randomization process.
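The randomization scheme can be sketched as below. The study used Ruby; this sketch uses Python's standard random module instead, whose generator is also the Mersenne Twister. The function name, the seed and the toy densities are hypothetical, and the sketch assumes, as a simplification, that every fraction of a series has exactly one paired fraction in the replicate series.

```python
import random


def randomize_series(replicate_a, replicate_b, rng):
    """Shuffle the fraction order of one probe within one series.

    replicate_a and replicate_b hold the probe's densities in the paired
    fractions of the two biological replicates. The same permutation is
    applied to both, so paired fractions remain paired after shuffling.
    """
    if len(replicate_a) != len(replicate_b):
        raise ValueError("paired replicates must have the same number of fractions")
    order = list(range(len(replicate_a)))
    rng.shuffle(order)  # a new permutation is drawn on every call, i.e. per probe
    shuffled_a = [replicate_a[i] for i in order]
    shuffled_b = [replicate_b[i] for i in order]
    return shuffled_a, shuffled_b


rng = random.Random(2014)  # arbitrary seed, chosen only for reproducibility

# Hypothetical log-intensities of one probe in four paired fractions.
probe_rep_a = [5.1, 6.3, 7.8, 6.0]
probe_rep_b = [5.0, 6.5, 7.6, 6.2]

shuffled_a, shuffled_b = randomize_series(probe_rep_a, probe_rep_b, rng)
```

Because only the order of the values changes, the median intensity of the probe is preserved, and because replicate pairs are kept together, the within-pair differences are preserved as well, which is consistent with I and V remaining unchanged.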
Graphs of co-expression networks at the indicated threshold of r were drawn with R78 (version 3.0.0), using the force-based graph layout algorithm described by Kamada and Kawai16. Biological pathways were analyzed using GeneSpring GX v12 software based on the open resource IntAct (http://www.ebi.ac.uk/intact/main.xhtml), together with BioPAX-format data imported from Reactome (http://reactome.org/download/index.html) and the results of a PubMed search using a natural language processing algorithm. The biological meanings of the co-expression networks were manually curated. Probes for transcription factor genes were selected by their association with GO:0003700, sequence-specific transcription factor activity.
Anatomical map analysis
Gene expression maps were visualized using an 80% cut-off filter to reveal areas of high expression, unless otherwise indicated. The instructions for highlighting areas based on gene expression in 3D expression maps are provided in the Quick Manual for VCAT (downloadable from our website).
ISH of genes expressed in the mouse brain at postnatal stages
The spatial cellular expression patterns of the genes were analyzed by ISH histochemistry of male C57BL/6J mouse brains at postnatal days 7 and 21. Sagittal sections of the brain were subjected to hybridization as previously described79. High-resolution digital images of the hybridized sections were used for this study.
We thank C. Armit (MRC Human Genetics Unit, University of Edinburgh, UK) for critical reading and help in manuscript editing, S. Onami (RIKEN Quantitative Biology Center, Japan) and S. Takagi (University of Tokyo, Japan) for suggestions regarding the network analyses, and the task force members of the International Neuroinformatics Coordinating Facility (INCF) Program on Digital Brain Atlasing for discussions of 2D-3D image analysis. This research was supported by funding from the RIKEN Strategic Programs for R&D (President's Discretionary Fund) to H.Y., Y.O., K.S. et al.; JSPS KAKENHI Grant Numbers 25560428 and 26280110 to Y.O. and H.Y., and 23300137 and 258057 to T.F.; JST Acceleration Utilization of University IP Program to Y.O.; and RIKEN Neuroinformatics Japan Center to Y.O. and T.F.