
A data-driven framework for mapping domains of human neurobiology

Abstract

Functional neuroimaging has been a mainstay of human neuroscience for the past 25 years. Interpretation of functional magnetic resonance imaging (fMRI) data has often occurred within knowledge frameworks crafted by experts, which have the potential to amplify biases that limit the replicability of findings. Here, we use a computational approach to derive a data-driven framework for neurobiological domains that synthesizes the texts and data of nearly 20,000 human neuroimaging articles. Across multiple levels of domain specificity, the structure–function links within domains better replicate in held-out articles than those mapped from dominant frameworks in neuroscience and psychiatry. We further show that the data-driven framework partitions the literature into modular subfields, for which domains serve as generalizable prototypes of structure–function patterns in single articles. The approach to computational ontology we present here is the most comprehensive characterization of human brain circuits quantifiable with fMRI and may be extended to synthesize other scientific literatures.

Fig. 1: Approach to data-driven ontology.
Fig. 2: Top-performing solutions of the data-driven framework.
Fig. 3: Approach to mapping expert-determined frameworks for brain function (RDoC) and mental illness (DSM).
Fig. 4: Data-driven framework of brain functions related to expert-determined frameworks for brain function (RDoC) and mental illness (DSM).
Fig. 5: Mental functions defined in a data-driven manner have more reproducible links with locations of brain activity.
Fig. 6: The data-driven framework partitions the neuroimaging literature into modular subfields, for which domains are generalizable representations of brain circuits and mental functions.

Data availability

Data not subject to restrictions can be accessed at http://github.com/ehbeam/neuro-knowledge-engine. This repository contains data generated from the corpus of 18,155 human neuroimaging articles, including matrices of the terms and brain structures reported in each document, as well as GloVe embeddings (ref. 25) trained on the expanded neuroimaging and psychiatric corpora. Due to copyright restrictions, article PDF files and extracted texts have not been made publicly available. Article metadata including PMIDs are provided so that the corpus contents can be retrieved from PubMed. Subsets of the brain coordinate data were previously made available online by BrainMap (http://www.brainmap.org/software.html) and Neurosynth (http://github.com/neurosynth/neurosynth-data).
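
Because only PMIDs are distributed, the corpus texts and metadata must be re-retrieved from PubMed. The sketch below shows one way to do this with NCBI's E-utilities; it is a minimal illustration, and the input file name (pmids.txt), batch size, and output path are assumptions rather than part of the released repository.

    # Sketch: fetch MEDLINE records for the released PMIDs via NCBI E-utilities.
    # The file "pmids.txt" (one PMID per line) and output name are illustrative assumptions.
    import time
    import requests

    EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

    def fetch_medline(pmids, batch_size=200, delay=0.4):
        """Yield MEDLINE-formatted records for successive batches of PMIDs."""
        for i in range(0, len(pmids), batch_size):
            batch = pmids[i:i + batch_size]
            resp = requests.get(EFETCH, params={
                "db": "pubmed",
                "id": ",".join(batch),
                "rettype": "medline",
                "retmode": "text",
            })
            resp.raise_for_status()
            yield resp.text
            time.sleep(delay)  # stay under NCBI's request-rate limit

    if __name__ == "__main__":
        with open("pmids.txt") as f:
            pmids = [line.strip() for line in f if line.strip()]
        with open("corpus_metadata.txt", "w") as out:
            for chunk in fetch_medline(pmids):
                out.write(chunk)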

Code availability

The code used to generate and assess knowledge frameworks is available at http://github.com/ehbeam/neuro-knowledge-engine. The code supporting the interactive viewer for the data-driven framework can be accessed at http://github.com/ehbeam/nke-viewer.

References

  1. Bard, J. B. L. & Rhee, S. Y. Ontologies in biology: design, applications and future challenges. Nat. Rev. Genet. 5, 213–222 (2004).

  2. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).

  3. Alterovitz, G. et al. Ontology engineering. Nat. Biotechnol. 28, 128–130 (2010).

  4. Price, C. J. & Friston, K. J. Functional ontologies for cognition: the systematic definition of structure and function. Cogn. Neuropsychol. 22, 262–275 (2005).

  5. Nuzzo, R. How scientists fool themselves – and how they can stop. Nature 526, 182–185 (2015).

  6. Lindquist, K. A., Satpute, A. B., Wager, T. D., Weber, J. & Barrett, L. F. The brain basis of positive and negative affect: evidence from a meta-analysis of the human neuroimaging literature. Cereb. Cortex 26, 1910–1922 (2016).

  7. Liu, X., Hairston, J., Schrier, M. & Fan, J. Common and distinct networks underlying reward valence and processing stages: a meta-analysis of functional neuroimaging studies. Neurosci. Biobehav. Rev. 35, 1219–1236 (2011).

  8. Wager, T. D., Jonides, J. & Reading, S. Neuroimaging studies of shifting attention: a meta-analysis. NeuroImage 22, 1679–1693 (2004).

  9. Siegel, E. H. et al. Emotion fingerprints or emotion populations? A meta-analytic investigation of autonomic features of emotion categories. Psychol. Bull. 144, 343–393 (2018).

  10. Redick, T. S. & Lindsey, D. R. B. Complex span and n-back measures of working memory: a meta-analysis. Psychon. Bull. Rev. 20, 1102–1113 (2013).

  11. Binder, J. R., Desai, R. H., Graves, W. W. & Conant, L. L. Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cereb. Cortex 19, 2767–2796 (2009).

  12. Insel, T. et al. Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. Am. J. Psychiatry 167, 748–751 (2010).

  13. Stephan, K. E. et al. Charting the landscape of priority problems in psychiatry, part 1: classification and diagnosis. Lancet Psychiatry 3, 77–83 (2016).

  14. Fox, P. T. & Lancaster, J. L. Mapping context and content: the BrainMap model. Nat. Rev. Neurosci. 3, 319–321 (2002).

  15. Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C. & Wager, T. D. Large-scale automated synthesis of human functional neuroimaging data. Nat. Methods 8, 665–670 (2011).

  16. Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31, 968–980 (2006).

  17. Diedrichsen, J., Balster, J. H., Cussans, E. & Ramnani, N. A probabilistic MR atlas of the human cerebellum. NeuroImage 46, 39–46 (2009).

  18. Poldrack, R. A. et al. The Cognitive Atlas: toward a knowledge foundation for cognitive neuroscience. Front. Neuroinform. 5, 17 (2011).

  19. Poldrack, R. A. Inferring mental states from neuroimaging data: from reverse inference to large-scale decoding. Neuron 72, 692–697 (2011).

  20. Schröter, M., Paulsen, O. & Bullmore, E. T. Micro-connectomics: probing the organization of neuronal networks at the cellular scale. Nat. Rev. Neurosci. 18, 131–146 (2017).

  21. Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198 (2009).

  22. Kragel, P. A. et al. Generalizable representations of pain, cognitive control, and negative emotion in medial frontal cortex. Nat. Neurosci. 21, 283–289 (2018).

  23. Wang, X. et al. Representational similarity analysis reveals task-dependent semantic influence of the visual word form area. Sci. Rep. 8, 3047 (2018).

  24. von Luxburg, U., Williamson, R. C. & Guyon, I. Clustering: science or art? JMLR: Workshop Conf. Proc. 27, 65–79 (2012).

  25. Pennington, J., Socher, R. & Manning, C. GloVe: global vectors for word representation. Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 1532–1543 (2014).

  26. McCoy, T. H. et al. High throughput phenotyping for dimensional psychopathology in electronic health records. Biol. Psychiatry 83, 997–1004 (2018).

  27. Kessler, R. C. et al. Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch. Gen. Psychiatry 62, 617–627 (2005).

  28. Contractor, A. A. et al. Latent profile analyses of posttraumatic stress disorder, depression and generalized anxiety disorder symptoms in trauma-exposed soldiers. J. Psychiatr. Res. 68, 19–26 (2015).

  29. Williams, L. M. Precision psychiatry: a neural circuit taxonomy for depression and anxiety. Lancet Psychiatry 3, 472–480 (2016).

  30. Russell, J. A. A circumplex model of affect. J. Pers. Soc. Psychol. 39, 1161–1178 (1980).

  31. Barrett, L. F. The theory of constructed emotion: an active inference account of interoception and categorization. Soc. Cogn. Affect. Neurosci. 12, 1833 (2017).

  32. Kornblum, S., Hasbroucq, T. & Osman, A. Dimensional overlap: cognitive basis for stimulus-response compatibility – a model and taxonomy. Psychol. Rev. 97, 253–270 (1990).

  33. Corbetta, M. Frontoparietal cortical networks for directing attention and the eye to visual locations: identical, independent, or overlapping neural systems? Proc. Natl Acad. Sci. USA 95, 831–838 (1998).

  34. McCoy, T. H. et al. Genome-wide association study of dimensional psychopathology using electronic health records. Biol. Psychiatry 83, 1005–1011 (2018).

  35. Drysdale, A. T. et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat. Med. 23, 28–38 (2017).

  36. Janak, P. H. & Tye, K. M. From circuits to behaviour in the amygdala. Nature 517, 284–292 (2015).

  37. Cottaris, N. P. & De Valois, R. L. Temporal dynamics of chromatic tuning in macaque primary visual cortex. Nature 395, 896–900 (1998).

  38. Salmelin, R., Hari, R., Lounasmaa, O. V. & Sams, M. Dynamics of brain activation during picture naming. Nature 368, 463–465 (1994).

  39. Gutschalk, A., Patterson, R. D., Scherg, M., Uppenkamp, S. & Rupp, A. Temporal dynamics of pitch in human auditory cortex. NeuroImage 22, 755–766 (2004).

  40. Menon, V. & Uddin, L. Q. Saliency, switching, attention and control: a network model of insula function. Brain Struct. Func. 214, 655–667 (2010).

  41. van den Heuvel, M. P. & Sporns, O. Network hubs in the human brain. Trends Cogn. Sci. 17, 683–696 (2013).

  42. McTeague, L. M. et al. Identification of common neural circuit disruptions in cognitive control across psychiatric disorders. Am. J. Psychiatry 174, 676–685 (2017).

  43. Eisenberg, I. W. et al. Uncovering the structure of self-regulation through data-driven ontology discovery. Nat. Commun. 10, 2319 (2019).

  44. Bolt, T. et al. Ontological dimensions of cognitive-neural mappings. Neuroinformatics 18, 451–463 (2020).

  45. Bertolero, M. A., Yeo, B. T. T., Bassett, D. S. & D’Esposito, M. A mechanistic model of connector hubs, modularity and cognition. Nat. Hum. Behav. 2, 765–777 (2018).

  46. Ioannidis, J. P. A., Fanelli, D., Dunne, D. D. & Goodman, S. N. Meta-research: evaluation and improvement of research methods and practices. PLoS Biol. 13, e1002264 (2015).

  47. Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Proc. Syst. 2016, 4349–4357 (2016).

  48. Voytek, J. B. & Voytek, B. Automated cognome construction and semi-automated hypothesis generation. J. Neurosci. Methods 208, 92–100 (2012).

  49. Yarkoni, T. Automated Coordinate Extractor (ACE) (GitHub, 2015).

  50. Lancaster, J. L. et al. Bias between MNI and Talairach coordinates analyzed using the ICBM-152 brain template. Hum. Brain Mapp. 28, 1194–1205 (2007).

  51. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57, 289–300 (1995).

Acknowledgements

We thank D. McFarland and S. Baccus for discussing versions of the approaches and figures presented here. Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under award number DP1 MH116506 to A.E. and award number F30 MH120956 to E.B. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Authors

Contributions

E.B. conceptualized the structure of the meta-analysis, collected the data, conducted the computational and statistical analyses, and composed the manuscript. C.P. provided guidance on the natural language processing and deep learning approaches, including their text descriptions. R.A.P. and A.E. advised the overall study design and provided input to the manuscript.

Corresponding author

Correspondence to Amit Etkin.

Ethics declarations

Competing interests

A.E. holds equity in Akili Interactive, Mindstrong Health, and Alto Neuroscience, and receives salary from Alto Neuroscience. The remaining authors declare no competing interests.

Additional information

Peer review information Nature Neuroscience thanks Jessica A. Turner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Corpora of human neuroimaging articles.

a, Articles reporting locations of activity in the human brain in standard MNI or Talairach space. Article metadata and coordinates were curated first from BrainMap (n = 3,346), then from Neurosynth (n = 12,676), then by deploying the Automated Coordinate Extractor (n = 2,133). b, A comprehensive corpus of human neuroimaging articles served as the basis for a computational linguistics approach to selecting mental function terms for the RDoC framework. Articles were retrieved in response to a PubMed query (Supplementary Table 3, top) and combined with those reporting coordinate data. c, A corpus of human neuroimaging articles enriched with studies addressing psychiatric illness served as the basis for selecting mental function and dysfunction terms for the DSM framework. As before, articles were retrieved through a PubMed query (Supplementary Table 3, bottom) and combined with those reporting coordinate data.
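
As an illustration of how the three coordinate sources can be combined without double-counting articles, the sketch below deduplicates by PMID in the curation order stated above (BrainMap, then Neurosynth, then the Automated Coordinate Extractor). The file and column names are assumptions for illustration and do not reflect the repository's actual layout.

    # Sketch: merge coordinate tables from three sources, keeping each PMID once,
    # with precedence following the curation order described above.
    # File and column names ("pmid", "source") are illustrative assumptions.
    import pandas as pd

    sources = [
        ("brainmap", "brainmap_coordinates.csv"),
        ("neurosynth", "neurosynth_coordinates.csv"),
        ("ace", "ace_coordinates.csv"),
    ]

    frames, seen = [], set()
    for name, path in sources:
        df = pd.read_csv(path)
        df = df[~df["pmid"].isin(seen)]    # drop articles already curated upstream
        seen.update(df["pmid"].unique())
        df["source"] = name
        frames.append(df)

    coords = pd.concat(frames, ignore_index=True)
    print(coords.groupby("source")["pmid"].nunique())   # article counts per source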

Extended Data Fig. 2 Data-driven solutions for k = 2 to 5 using logistic regression classifiers to select function terms.

Domains generated at lower values of k than selected through the optimization procedure detailed in Fig. 1a.

Extended Data Fig. 3 Data-driven solutions for k = 7 to 10 using logistic regression classifiers to select function terms.

Domains generated at higher values of k than selected through the optimization procedure detailed in Fig. 1a. Additional framework solutions up to k = 50 are visualized at http://neuro-knowledge.org.

Extended Data Fig. 4 Neural network approach to data-driven ontology generation.

The procedure in Fig. 1a was repeated using neural network classifiers in place of logistic regression. Neural network classifiers comprised 8 fully connected layers and were fit with learning rate = 0.001, weight decay = 0.001, neurons per layer = 100, dropout probability = 0.1 (first 7 layers), and batch size = 1,024. In Step 3, neural networks were trained over 100 epochs to predict term and structure occurrences within domains. In Step 4, neural networks were trained over 500 epochs to predict domain term list and circuit occurrences. a, Validation set ROC-AUC plotted for forward inference, reverse inference, and their average. b, Data-driven solution for k = 6. Term size is scaled to frequency in the corpus of 18,155 articles with activation coordinate data. The number of terms per domain was selected in Step 3 to maximize neural network performance in the validation set. Brain maps show structures included in each circuit as a result of clustering by PMI-weighted co-occurrences with function terms. c, Article partitioning based on maximal similarity to terms and structures in domain prototypes visualized by multidimensional scaling. d, Modularity was assessed by comparing the mean Dice distance of function and structure occurrences of articles between domains versus within domains. Observed values are colored by domain; null distributions in gray were computed by shuffling distance values across article partitions over 1,000 iterations. e, Generalizability was assessed by Dice similarity of each domain’s ‘prototype’ vector of function terms and brain structures with the occurrences of terms and structures in each article of the domain’s partition. Observed values are colored by domain; null distributions in gray were computed by shuffling terms and structures in each prototype over 1,000 iterations.
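
The layer sizes and training settings below mirror those listed in this caption (eight fully connected layers of 100 units, dropout of 0.1 on the first seven layers, learning rate and weight decay of 0.001, batch size of 1,024). The choice of PyTorch, the binary cross-entropy loss, and the variable names are illustrative assumptions; the authors' implementation is available in the repository noted under Code availability.

    # Sketch of an 8-layer fully connected classifier with the hyperparameters listed above.
    # PyTorch, BCE loss, and variable names are illustrative assumptions, not the authors' code.
    import torch
    from torch import nn

    def build_classifier(n_inputs, n_outputs, n_hidden=100, dropout=0.1, n_layers=8):
        """8 fully connected layers; ReLU + dropout on the first 7, sigmoid on the output."""
        layers, dim = [], n_inputs
        for _ in range(n_layers - 1):
            layers += [nn.Linear(dim, n_hidden), nn.ReLU(), nn.Dropout(dropout)]
            dim = n_hidden
        layers += [nn.Linear(dim, n_outputs), nn.Sigmoid()]
        return nn.Sequential(*layers)

    model = build_classifier(n_inputs=118, n_outputs=6)   # e.g., 118 structures -> 6 domain scores
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.001)
    criterion = nn.BCELoss()

    def train_epoch(loader):                              # loader: DataLoader with batch_size=1024
        model.train()
        for X, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(X), y)
            loss.backward()
            optimizer.step()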

Extended Data Fig. 5 Framework generated from mental function terms and subsequently mapped to brain circuits.

To control for the contribution of brain coordinate data, the framework was rederived solely from mental function terms. Mental function terms were clustered by k-means according to PMI-weighted co-occurrences in the training set of 12,708 articles. The top 25 function terms were assigned to each domain by the point-biserial correlation (rpb) of binarized occurrences with the centroid of occurrences across ‘seed’ terms from clustering. The number of terms per domain and name for each domain were determined as before. Circuits were mapped from PPMI of brain structures with the centroid of domain terms (FDR < 0.01). a, Domains are visualized for k = 6, the same dimensionality as the data-driven framework in the main text. Term size is scaled to rpb with the centroid of seed terms. Term-based domains are linked to the RDoC and DSM domains illustrated in Fig. 4. Links between domains were computed across the corpus of 18,155 articles by Dice similarity of mental function terms and brain structures (FDR < 0.05 based on permutation testing over 10,000 iterations). The Dice similarity of links with RDoC and DSM frameworks across the corpus is shown for b, the data-driven framework based on brain coordinates and mental function terms (as in Fig. 4), and c, the framework based only on terms. Dice similarity with d, RDoC and e, the DSM was macro-averaged across domains, and a one-sided bootstrap test assessed the difference in means between the data-driven frameworks. The term-based framework was more similar to RDoC than the framework also based on coordinates (99.9% CI = [0.022, 0.098]), and to the DSM (95% CI = [0.001, 0.034]). Bootstrap distributions were computed by resampling function terms and brain structures over 10,000 iterations.
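
A compact sketch of this term-only clustering step is given below, assuming a binary article-by-term occurrence matrix. The PPMI weighting and the k-means call follow the description above; the synthetic stand-in matrix and variable names are illustrative assumptions.

    # Sketch: cluster mental function terms by PPMI-weighted co-occurrence (k-means, k = 6).
    # X stands in for the binary articles-by-terms matrix of the 12,708 training-set articles.
    import numpy as np
    from sklearn.cluster import KMeans

    def ppmi_from_occurrences(X):
        """Positive PMI between term pairs computed from binary occurrences."""
        n_docs = X.shape[0]
        p_joint = (X.T @ X) / n_docs              # term-by-term co-occurrence probabilities
        p_term = X.mean(axis=0)
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log(p_joint / np.outer(p_term, p_term))
        pmi[~np.isfinite(pmi)] = 0.0
        return np.maximum(pmi, 0.0)               # keep only positive PMI

    X = (np.random.rand(2000, 300) < 0.05).astype(float)   # illustrative stand-in matrix
    ppmi = ppmi_from_occurrences(X)
    labels = KMeans(n_clusters=6, n_init=100, random_state=42).fit_predict(ppmi)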

Extended Data Fig. 6 Between-framework comparisons with k = 9 data-driven domains.

To control for dimensionality in comparisons with the DSM, analyses were repeated with the k = 9 solution of the data-driven framework. Differences between framework pairs were assessed by two-sided bootstrap tests. With k = 9 in place of k = 6 domains, no differences in the data-driven, RDoC, and DSM rankings were observed. a, Reverse inference ROC-AUC in the test set was higher for the k = 9 data-driven framework than both RDoC (99.9% CI of the difference = [0.010, 0.066]) and the DSM (99.9% CI of the difference = [0.029, 0.102]). b, ROC curves of the k = 9 reverse inference classifiers. c, ROC-AUC of the k = 9 reverse inference classifiers. d, Forward inference ROC-AUC in the test set was higher for the k = 9 data-driven framework than both RDoC (99.9% CI of the difference = [0.027, 0.052]) and the DSM (99.9% CI of the difference = [0.061, 0.095]). e, ROC curves of the k = 9 forward inference classifiers. f, ROC-AUC of the k = 9 forward inference classifiers. Articles were partitioned into the k = 9 data-driven domains within the g, discovery set (n = 12,708) and h, replication set (n = 5,447). i, Domain-averaged modularity (left panels) was higher for the k = 9 data-driven framework than for the k = 6 solution (99.9% CI of the difference = [0.007, 0.016] discovery, [0.009, 0.024] replication), RDoC (99.9% CI of the difference = [0.046, 0.053] discovery, [0.045, 0.059] replication), and the DSM (99.9% CI of the difference = [0.011, 0.022] discovery, [0.007, 0.026] replication). j, Domain-averaged generalizability (left panels) was higher for the k = 9 data-driven framework than for the DSM (99.9% CI of the difference = [0.043, 0.160] discovery, [0.039, 0.164] replication). Observed values for the k = 9 domains in the i-j right panels were compared against null distributions generated by shuffling over 1,000 iterations (* FDR < 0.001).
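
The bootstrap comparisons reported throughout these panels can be sketched as follows, assuming per-article labels and predicted scores for two frameworks. The resampling over articles and the percentile confidence interval follow the procedure described here; the helper function, its argument names, and the default iteration count are illustrative.

    # Sketch: bootstrap CI for the difference in macro-averaged ROC-AUC between two frameworks.
    # y_true_*: (n_articles, n_labels) binary arrays; y_score_*: matching predicted scores.
    # The number of iterations is illustrative; the percentile CI follows the text.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bootstrap_auc_difference(y_true_a, y_score_a, y_true_b, y_score_b,
                                 n_iter=1000, alpha=0.001, seed=0):
        rng = np.random.default_rng(seed)
        n = y_true_a.shape[0]
        diffs = np.empty(n_iter)
        for i in range(n_iter):
            idx = rng.integers(0, n, size=n)      # resample articles with replacement
            auc_a = roc_auc_score(y_true_a[idx], y_score_a[idx], average="macro")
            auc_b = roc_auc_score(y_true_b[idx], y_score_b[idx], average="macro")
            diffs[i] = auc_a - auc_b
        lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return lo, hi                             # e.g., the 99.9% CI of the difference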

Extended Data Fig. 7 Forward inference classification with logistic regression.

a, Logistic regression classifiers were trained to predict whether coordinates were reported within brain structures based on the occurrences of mental function terms in full texts. Classifier features included term occurrences thresholded by mean frequency across the corpus, then the mean frequency of terms in each domain. Activation coordinate data were mapped to 118 structures in a whole-brain atlas. Training was performed in 70% of articles (n = 12,708), hyperparameters were tuned on a validation set containing 20% of articles (n = 3,631), then classifiers were evaluated in a test set containing 10% of articles (n = 1,816). Plots are colored by the domain to which structures were assigned in the data-driven framework, and by the domain with highest PPMI for the RDoC and DSM frameworks. Test set ROC curves are shown for b, the data-driven framework, c, RDoC, and d, the DSM. e-g, For each brain structure, the significance of the test set ROC-AUC was determined by a one-sided permutation test comparing the observed value to a null distribution, and the p value was FDR-corrected for multiple comparisons (* FDR < 0.001). Observed test set values are shown with solid lines. Null distributions (gray) were computed by shuffling true labels over 1,000 iterations. Bootstrap distributions (colored) were computed by resampling articles in the test set with replacement over 1,000 iterations. h, The difference in mean ROC-AUC was assessed for each framework pair by a two-sided bootstrap test. The data-driven framework had higher ROC-AUC than both RDoC (99.9% CI of the difference = [0.020, 0.049]) and the DSM (99.9% CI of the difference = [0.055, 0.091]). RDoC had higher ROC-AUC than the DSM (99.9% CI of the difference = [0.024, 0.058]). Solid lines denote means of the bootstrap distributions obtained by macro-averaging across brain structure classifiers. i-j, Difference in ROC-AUC between the data-driven and expert-determined frameworks. Maps were thresholded to show differences with FDR < 0.001 based on permutation testing.
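
A minimal sketch of this forward inference setup is shown below, assuming a matrix of domain-level term-frequency features and a binary matrix of structure occurrences. scikit-learn's logistic regression with default settings stands in for the tuned classifiers described above, and the synthetic stand-in data are only so the example runs.

    # Sketch: forward inference with one logistic regression classifier per brain structure.
    # X stands in for domain term-frequency features; Y for binary occurrences of 118 structures.
    # The 70/20/10 train/validation/test proportions follow the text; names are illustrative.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((1000, 6))                            # stand-in feature matrix
    Y = (rng.random((1000, 118)) < 0.2).astype(int)      # stand-in structure occurrences

    X_train, X_rest, Y_train, Y_rest = train_test_split(X, Y, test_size=0.30, random_state=42)
    X_val, X_test, Y_val, Y_test = train_test_split(X_rest, Y_rest, test_size=1/3, random_state=42)

    aucs = []
    for j in range(Y.shape[1]):                          # one classifier per atlas structure
        clf = LogisticRegression(max_iter=1000).fit(X_train, Y_train[:, j])
        scores = clf.predict_proba(X_test)[:, 1]
        aucs.append(roc_auc_score(Y_test[:, j], scores))
    print("macro-averaged test ROC-AUC:", np.mean(aucs))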

Extended Data Fig. 8 Reverse inference classification with neural networks.

Neural network classifiers were trained to perform reverse inference, using brain activation coordinates to predict occurrences of mental function terms grouped by domains shown in Extended Data Fig. 4. Classification models comprised 8 fully connected (FC) layers, all with ReLU activation functions except the output layer which was activated by a sigmoid. The optimal learning rate, weight decay, number of neurons per layer, and dropout probability were determined for each framework through a randomized grid search. ROC curves are shown for the test set performance of classifiers with mental function features defined by b, the data-driven framework, c, RDoC, and d, the DSM. e-g, For each domain, the significance of the test set ROC-AUC was determined by a one-sided permutation test comparing the observed value to a null distribution, and the p value was FDR-corrected for multiple comparisons (* FDR < 0.001). Observed values in the test set are shown with solid lines. Null distributions (gray) were computed by shuffling true labels for term list scores over 1,000 iterations; the 99.9% CI is shaded, and distribution means are shown with dashed lines. Bootstrap distributions of ROC-AUC (colored) were computed by resampling articles in the test set with replacement over 1,000 iterations. h, Differences in bootstrap distribution means were assessed for each framework pair. While there were no differences between frameworks at the 99.9% confidence level, the data-driven framework had higher ROC-AUC than RDoC at the 99% confidence level (99% CI of the difference = [0.007, 0.050]), and higher ROC-AUC than the DSM at the 95% confidence level (95% CI of the difference = [0.0003, 0.049]). Solid lines denote bootstrap distribution means.
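
The randomized search over learning rate, weight decay, layer width, and dropout probability can be sketched as below, with validation-set ROC-AUC as the selection criterion. The candidate values and the number of sampled configurations are illustrative assumptions; only the set of hyperparameters searched is taken from this caption.

    # Sketch: randomized search over the hyperparameters named above, scored by
    # validation-set ROC-AUC. Candidate values and n_samples are illustrative assumptions.
    import random

    GRID = {
        "lr": [1e-4, 5e-4, 1e-3, 5e-3],
        "weight_decay": [1e-4, 1e-3, 1e-2],
        "n_hidden": [50, 100, 200],
        "dropout": [0.1, 0.25, 0.5],
    }

    def random_search(train_and_score, n_samples=20, seed=0):
        """train_and_score(config) -> validation ROC-AUC; returns the best configuration."""
        rng = random.Random(seed)
        best_config, best_auc = None, float("-inf")
        for _ in range(n_samples):
            config = {k: rng.choice(v) for k, v in GRID.items()}
            auc = train_and_score(config)
            if auc > best_auc:
                best_config, best_auc = config, auc
        return best_config, best_auc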

Extended Data Fig. 9 Forward inference classification with neural networks.

a, Neural network classifiers were trained to perform forward inference, using function term occurrences grouped by the domains in Extended Data Fig. 4 to predict brain activation coordinates. Forward inference classifiers were optimized over a grid search with the same hyperparameter values as reverse inference classifiers in Extended Data Fig. 8. ROC curves are shown for test set performance of classifiers with mental function features defined by b, the data-driven framework, c, RDoC, and d, the DSM. Plots are colored by the domain assignment for structures in the data-driven framework, and by the domain with the highest PPMI for the structure in RDoC and DSM frameworks. e-g, For each brain structure, the significance of the test set ROC-AUC was determined by a one-sided permutation test comparing the observed value to a null distribution, and the p value was FDR-corrected for multiple comparisons (* FDR < 0.001). Observed values in the test set are shown with solid lines. Null distributions (gray) were computed by shuffling true labels for term list scores over 1,000 iterations; the 99.9% CI is shaded, and distribution means are shown with dashed lines. Bootstrap distributions of ROC-AUC (colored) were computed by resampling articles in the test set with replacement over 1,000 iterations. h, Differences in bootstrap distribution means were assessed for each framework pair. The data-driven framework had higher ROC-AUC than both RDoC (99.9% CI of the difference=[0.014, 0.046]) and the DSM (99.9% CI of the difference=[0.049, 0.086]). RDoC had higher ROC-AUC than the DSM (99.9% CI of the difference=[0.018, 0.058]). Solid lines denote bootstrap distribution means. i-j, Difference in ROC-AUC scores between the data-driven and expert-determined frameworks. Maps were thresholded to show differences with FDR < 0.001 based on permutation testing.
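
The per-structure significance testing used in panels e-g and i-j (a one-sided permutation test against a label-shuffled null, followed by Benjamini-Hochberg FDR correction, ref. 51) can be sketched as follows for a single structure; the function names are illustrative.

    # Sketch: one-sided permutation test of ROC-AUC against a label-shuffled null,
    # with Benjamini-Hochberg FDR adjustment (ref. 51) across structures. Illustrative names.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def permutation_pvalue(y_true, y_score, n_iter=1000, seed=0):
        """One-sided p value: how often shuffled labels match or beat the observed ROC-AUC."""
        rng = np.random.default_rng(seed)
        observed = roc_auc_score(y_true, y_score)
        null = np.array([roc_auc_score(rng.permutation(y_true), y_score)
                         for _ in range(n_iter)])
        return (np.sum(null >= observed) + 1) / (n_iter + 1)

    def benjamini_hochberg(p_values):
        """FDR-adjusted p values (step-up procedure) for a 1-D array of raw p values."""
        p = np.asarray(p_values, dtype=float)
        order = np.argsort(p)
        ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
        adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
        out = np.empty_like(adjusted)
        out[order] = np.minimum(adjusted, 1.0)
        return out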

Extended Data Fig. 10 Additional visualizations of article partitioning by domains.

Dice distance between articles is shown for binarized vectors of the mental function terms that occurred in the full text and the brain structures to which reported coordinate data were mapped. Articles were split into sets for discovery (n = 12,708) and replication (n = 5,447), then matched to domains based on the Dice similarity of their term-structure vectors. Domain assignments are represented by the color coding scheme established in Fig. 4 for a, the data-driven framework, b, RDoC, and c, the DSM. Shaded areas represent the lower triangle of distances between articles within each domain partition. d-f, Dice distance between articles visualized with t-SNE. Distances were computed between the terms and structures of articles in the full corpus (n = 18,155), and dimensionality of the 18,155 × 18,155 matrix was reduced by principal component analysis. The first 10 principal components (18,155 × 10) were taken as inputs to t-SNE (perplexity = 25, early exaggeration = 15, learning rate = 500, and maximum iterations = 1,000). Articles are visualized separately for the discovery and replication sets, with colors and shapes corresponding to domain assignments in each framework.
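
The embedding in panels d-f can be sketched as follows, assuming a binary article-by-feature matrix of term and structure occurrences. The Dice metric, the 10-component principal component analysis, and the t-SNE settings mirror those stated above; the stand-in matrix is only so the example runs.

    # Sketch: pairwise Dice distances between articles, PCA to 10 components, then t-SNE
    # with the settings listed above (maximum iterations left at the default of 1,000).
    # X stands in for the binary articles-by-(terms + structures) occurrence matrix.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X = np.random.rand(500, 150) < 0.15              # illustrative stand-in matrix
    dist = squareform(pdist(X, metric="dice"))       # articles-by-articles Dice distance
    components = PCA(n_components=10).fit_transform(dist)
    embedding = TSNE(perplexity=25, early_exaggeration=15,
                     learning_rate=500).fit_transform(components)
    print(embedding.shape)                           # (n_articles, 2)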

Supplementary information

Supplementary Information

Supplementary Figs. 1–9 and Supplementary Tables 1–5

Reporting Summary

About this article

Cite this article

Beam, E., Potts, C., Poldrack, R.A. et al. A data-driven framework for mapping domains of human neurobiology. Nat Neurosci 24, 1733–1744 (2021). https://doi.org/10.1038/s41593-021-00948-9
