Overcoming false-positive gene-category enrichment in the analysis of spatially resolved transcriptomic brain atlas data

Transcriptomic atlases have improved our understanding of the correlations between gene-expression patterns and spatially varying properties of brain structure and function. Gene-category enrichment analysis (GCEA) is a common method to identify functional gene categories that drive these associations, using gene-to-category annotation systems like the Gene Ontology (GO). Here, we show that applying standard GCEA methodology to spatial transcriptomic data is affected by substantial false-positive bias, with GO categories displaying an over 500-fold average inflation of false-positive associations with random neural phenotypes in mouse and human. The estimated false-positive rate of a GO category is associated with its rate of being reported as significantly enriched in the literature, suggesting that published reports are affected by this false-positive bias. We show that within-category gene–gene coexpression and spatial autocorrelation are key drivers of the false-positive bias and introduce flexible ensemble-based null models that can account for these effects, made available as a software toolbox.


S1.1 Within-category coexpression drives false-positive bias
We first aimed to understand why, under the ensemble of random spatial brain phenotypes (SBPs), the 'SBP-random' ensemble, RDSM (CFPR = 13%) has a higher category false-positive rate (CFPR) than ZA (0.07%). Region × gene expression matrices and gene × gene coexpression matrices are plotted for RDSM and ZA in Figs S3A and B, respectively. Compared to a representative random set of 40 genes, shown in Fig. S3C, we find a much more consistent spatial patterning of gene expression in the RDSM category, resulting in increased within-category (gene-gene) coexpression. This spatial coherency of expression patterning is consistent with genes associated with regulating dendritic-spine morphogenesis (RDSM) having a coordinated brain function that varies characteristically across brain regions. By contrast, GO categories that play a minimal or less-specific role in brain function, such as zymogen activation (ZA), exhibit noisier expression patterns and lower within-category coexpression (that is visually similar to that of a random set of genes).
To understand how these differences in within-category coexpression affect category falsepositive rates, we investigated the distribution of mean correlation coefficients between the expression profiles of genes in each category and the ensembles of randomized SBPs analyzed above. These category-score distributions are plotted for SBP-random and SBP-spatial ensembles in Figs S3D and E, respectively. A category with wider distributions than a random set of genes (the null comparison used in GSEA) will have a greater probability of obtaining a significant correlation to that ensemble of phenotypes. First we note that, consistent with both SBP-random and SBP-spatial ensembles containing no information about gene expression, all category-score distributions are symmetric about zero.
Under random phenotypes (Fig. S3D), RDSM has a wider distribution of category scores than ZA; it is more likely to exhibit a higher correlation to a random SBP. This widening is driven by the increased coexpression of RDMS genes, such that a chance correlation between a 1 random SBP and any single RDMS gene is amplified through a similar correlation with many other genes in the category. By contrast, categories like ZA contain genes with lower gene-gene coexpression, such that a chance correlation between an SBP and a given ZA gene is more likely to be cancelled out by a chance correlation in the opposite direction for another gene, driving category scores towards zero and resulting in a narrower category-score distribution.
In estimating category p-values, conventional GSEA compares the category score to that of a random set of the same number of genes, shown gray in Fig. S3D (where we have performed many randomizations of sets of 40 genes so as not to bias towards any particular random set). Due to the high coexpression of RDSM genes, this occurs for approximately 13% of SBP-random phenotypes, but for ZA, which has a similar coexpression structure as random genes, this occurs for just 0.07% of random SBPs.

S1.2 The role of spatial autocorrelation
Relative to the SBP-random ensemble (of purely random phenotypes), the CFPR of RDSM increased under the SBP-spatial ensemble: 13% → 28%, whereas the CFPR for ZA decreased: 0.07% → 0.02%. Examining the category-score distributions in Fig. S3E, we first note that they are all much wider than for the SBP-random ensemble (Fig. S3D), including for random sets of genes. This is due to the predominance of spatial autocorrelation in the expression patterns of individual genes: a gene that exhibits a strongly autocorrelated expression map is more likely to be strongly correlated to a spatially autocorrelated SBP than a random spatial map. Relative to random genes, we see a wider category-score distribution for RDSM, and a narrower distribution for ZA, consistent with their CFPRs. Imposing the constraint of spatial autocorrelation (i.e., SBP-rand → SBP-spatial) can thus either increase a GO category's CFPR (if it exhibits a more similar spatial autocorrelation structure to the SBP-spatial ensemble) or decrease it (if it exhibits a less similar spatial autocorrelation structure).

S2 Phenotype Enrichment
Full GO enrichment results for all phenotypes across all three null models are provided as data files in the data repository accompanying this article [1].
For human cortex: • Node structural connectivity betweenness, B, EnrichmentThreeWays_human_cortex_betweenness.csv • Node structural connectivity degree, k, EnrichmentThreeWays_human_cortex_degree.csv And for mouse (whole-brain and cortex), where all files share the EnrichmentThreeWays_mouse prefix; suffixes are given below:

S3 Literature Survey
Enrichment on the basis of spatial patterns of expression has not been performed consistently in the existing literature. Studies have processed the data differently (including substantially different methods for normalizing and filtering genes), and performed the enrichment differently (using different software packages, annotation systems, and including different sets of categories for enrichment). Our survey focuses on studies that have reported gene-set enrichment for Biological Processes in the Gene Ontology (GO) [2].

French and Pavlidis
3. Ji et al. [7]: Tabulated enrichment categories that appeared often across many analyses using 4084 coronal section genes: all brain structures and different ways of measuring connectivity. Data taken from Table 1. 4. Fakhry and Ji [8]: 4084 (coronal-section) genes that are predictive of voxel-level brain connectivity using ORA. Data taken from Fig. 5, summarizing each GO category as the number of injection structures for which it was significant.
5. Rubinov et al. [9]: 3380 genes (assayed multiple times) scored by partial least squares for nodal participation metrics, doing enrichment on the genes in the top 25% of positive weights. Data taken from Table S3. 6. Fulcher and Fornito [10]: 17 642 genes scored by differences in gene coexpression contribution (GCC) scores using GSR using ermineJ [5]. Two GSEA were performed, for: (i) connected versus unconnected pairs of brain areas (data taken from Table S1), (ii) rich and feeder connections versus peripheral connections (data taken from Table S5).
7. Mills et al. [11]: 3079 (coronal-section) genes, enrichment for processes showing a strong relationship between CGE and functional connectivity (FC) using ORA (ermineJ [5]). Data taken from Table 3. 8. Ko et al. [12]: 170 neuron-, 44 oligodendrocyte-, and 50 astrocyte-specific genes for the coronal plane (gene sets taken from [13] using a 10-fold threshold, and including only genes that could be matched to AMBA data). Enrichment performed on each gene set, with results summarized in the text (Page 2).

S3.2 Allen Human Brain Atlas
The following studies report the results of GSEA using data from the Allen Human Brain Atlas (AHBA) [ (Table S5), (iii) brain-wide functional connectivity association study of autism (Table S7), and (iv) chronic schizophrenia (Table S8).
10. Kuncheva et al. [29]: 16 906 pre-selected genes, enrichment on clusters of spatial expression networks (SENs) for GO biological process (uncorrected threshold, p < 0.001) with categories reduced using REVIGO. Full results not provided; summaries taken from text.
11. Ritchie et al. [30]: 13 384 genes, enrichment on spatial correlation (Spearman's ρ) between expression patterns and the T1w:T2w ratio, using the AUROC method, ranked genes by correlation, including GO categories with between 10 and 200 genes after intersection with AHBA data (6885 GO categories). Data taken as p FWER for biological processes: negative correlations ( Table 2) and positive correlations (Table 3).
12. Diez and Sepulcre [31]: 3719 neural genes (determined from browsing AmiGO), enrichment on correlation between stepwise functional connectivity maps and cortical expression profiles, focusing a priori on brain-related categories. Enrichment performed as overrepresentation analysis on biological processes using PANTHER 13.1 at a threshold FDR < 0.005. Data taken from Supplementary Tables 3-6 as FDR-corrected p-values.
13. Wen et al. [32] analyzed for gene modules that are correlated to R2t * using ToppGene, with additional testing using DAVID. We noted GO Biological Process categories with p < 0.05 from the top 30 Annotation Clusters (up to an enrichment score of 1.27) from Dataset S3.
14. Liu et al. [33] analyzed how dynamic brain networks (analyzed from the viewpoint of the chronnectome) are spatially configured, and how these spatial maps are associated with gene transcription. We took data from 16. Anderson et al. [35] analyzed enrichment for genes highly correlated to interneuron markers PVALB and SST using ToppGene. We took GO:BP data from Table 1. 17. Betzel et al. [36] used GOrilla to understand gene subsets that exhibit correlated expression patterns that are most strongly related to ECoG FC, providing results in frequency bands 1 − 4 Hz (biological process results taken from Table S5) and 4 − 8 Hz (biological process results taken from Table S7).
18. Meijer et al. [37] used GOrilla to find genes that are differentially expressed genes in the stress network (brain areas activated by stress in individuals with low or high stress sensitivity). Data was taken from Table S4. 19. Romero-Garcia et al. [38] performed PLS to find genes associated with changes in cortical thickness in autism, using Enrichr for enrichment in GO biological processes. Data was taken from the main text.
20. Liu et al. [39] investigated how emotion regulation and memory control related fMRI task activation maps correlate with gene expression. 1061 'inhibition-related genes' are correlated with all four tasks (memory control, emotion regulation, stop-signal, and go/nogo). Enrichment done using GOATOOLS. Data taken from Table S3. 21. Grothe et al. [40] analyzed differentially-expressed gene sets in Alzheimer's disease vulnerable brain regions using GSEA (gene set enrichment analysis) using all 20 737 genes (1036 GO-based gene sets were included). Enrichment results for GO biological processes were taken from Table 3 (differentially expressed gene sets in neurodegeneration-vulnerable brain regions).
22. Vidal-Pineiro et al. [41] investigated transcriptional patterns related to cortical thinning across the lifespan. GO-term enrichment was performed with VisuaL Annotation Display (VLAD). Data was taken from Table S1. 23. Yao et al. [42] proposed a method to perform enrichment by jointly considering gene sets (GS) and brain circuits (BC) to examine if a GS-BC pair is enriched in a list of gene-neuroimaging quantitative traits (QT, such as the average amyloid deposition). Enrichment results for GO biological processes were taken from Table 3.   Figure S2: Across five equally-spaced bins of CFPR, bars show the percentage of GO categories that have been reported as significant in the literature survey of A human, and B mouse. The larger number of human studies allowed us to distinguish between GO categories reported in two, or three or more studies; in mouse we distinguish between one, or two or more studies. We plot expression (brain area × gene) and coexpression (gene × gene) heat maps for two example GO categories in the mouse brain:

S4 Supplementary Figures
A 'regulation of dendritic spine morphogenesis' (40 genes); and B 'zymogen activation' (42 genes); as well as C a random set of 40 genes, for comparison. Each gene's expression is normalized (low to high) for visualization purposes. Genes annotated to 'regulation of dendritic spine morphogenesis' display a more characteristic spatial patterning and hence have higher coexpression. Distributions of each category's score (mean correlation between the genes in that category and a phenotype) across an ensemble of null phenotypes, are plotted as violin plots for: D the SBP-random ensemble of random-number phenotypes [cf. Fig. 2A(ii)], and E the SBP-spatial ensemble of random spatially autocorrelated phenotypes [cf. Fig. 2A(iii)]. The mean of each distribution is annotated with a horizontal line. Distance scale (mm) Figure S4: CFPR (SBP-spatial) > CFPR (SBP-rand) is greatest for GO categories with strong spatial autocorrelation (R 2 exp ) and a similar distance scale of autocorrelation. For A human, and B mouse, we show a distribution of fitted distance scales, λ, across all GO categories. For each category, this was estimated from an exponential fit to correlated gene expression (of a given category of genes) as a function of distance. The global value, d 0 (obtained from including all genes in the correlated gene expression calculation, and used to construct the SBP-spatial ensemble) is annotated in the upper right of the distribution plot. Across equiprobable bins of R 2 exp and for a given distance range, upper plots show the percentage of GO categories in each bin that display an increase in CFPR under the SBP-spatial null ensemble relative to the SBP-random null ensemble (as Fig. 3B). 8 Table S1: GO categories with high category false-positive rates (CFPRs) are predominantly related to neuronal and metabolic biological function in mouse and human. We list selected GO categories with amongst the highest CFPRs across mouse and human, across ensembles of spatially independent random phenotypes (SBP-random) and spatially autocorrelated random phenotypes (SBP-spatial). CFPR is listed for SBP-random [and for SBP-spatial in brackets]. A full ordered list is in Supplementary File CFPRTable.csv.