Meta-matching as a simple framework to translate phenotypic predictive models from big to small data

He, Tong; An, Lijun; Chen, Pansheng; Chen, Jianzhong; Feng, Jiashi; Bzdok, Danilo; Holmes, Avram J.; Eickhoff, Simon B.; Yeo, B. T. Thomas

doi:10.1038/s41593-022-01059-9

Article
Published: 16 May 2022

Tong He^1,2,3,
Lijun An^1,2,3,
Pansheng Chen^1,2,3,
Jianzhong Chen^1,2,3,
Jiashi Feng^4,
Danilo Bzdok^5,6 (ORCID: orcid.org/0000-0003-3466-6620),
Avram J. Holmes^7 (ORCID: orcid.org/0000-0001-6583-803X),
Simon B. Eickhoff^8,9 &
B. T. Thomas Yeo^1,2,3,10,11 (ORCID: orcid.org/0000-0002-0119-3276)

Nature Neuroscience volume 25, pages 795–804 (2022)

Abstract

We propose a simple framework—meta-matching—to translate predictive models from large-scale datasets to new unseen non-brain-imaging phenotypes in small-scale studies. The key consideration is that a unique phenotype from a boutique study likely correlates with (but is not the same as) related phenotypes in some large-scale dataset. Meta-matching exploits these correlations to boost prediction in the boutique study. We apply meta-matching to predict non-brain-imaging phenotypes from resting-state functional connectivity. Using the UK Biobank (N = 36,848) and Human Connectome Project (HCP) (N = 1,019) datasets, we demonstrate that meta-matching can greatly boost the prediction of new phenotypes in small independent datasets in many scenarios. For example, translating a UK Biobank model to 100 HCP participants yields an eight-fold improvement in variance explained with an average absolute gain of 4.0% (minimum = −0.2%, maximum = 16.0%) across 35 phenotypes. With a growing number of large-scale datasets collecting increasingly diverse phenotypes, our results represent a lower bound on the potential of meta-matching.
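The matching idea described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: the data are randomly generated, plain ridge regression stands in for the kernel ridge regression and other models used in the paper, and the selection rule (pick the large-dataset prediction with the highest absolute correlation in the small sample) is one simple choice assumed here. All names (`fit_ridge`, `new_phenotype`, `W_big`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; returns weights of shape (features, targets)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

# Synthetic stand-ins (assumptions): 200 "connectivity" features and
# 5 phenotypes measured in a large dataset of 2,000 participants.
n_features, n_source = 200, 5
W_true = rng.normal(size=(n_features, n_source))
X_big = rng.normal(size=(2000, n_features))
Y_big = X_big @ W_true + rng.normal(scale=5.0, size=(2000, n_source))

# Step 1: train one multi-output model on the large dataset.
W_big = fit_ridge(X_big, Y_big)

def new_phenotype(X):
    """A new phenotype that correlates with, but is not the same as, source phenotype 2."""
    return 0.8 * (X @ W_true[:, 2]) + rng.normal(scale=10.0, size=X.shape[0])

# Small "boutique" study with K = 100 participants, plus held-out test participants.
X_small = rng.normal(size=(100, n_features))
y_small = new_phenotype(X_small)
X_test = rng.normal(size=(1000, n_features))
y_test = new_phenotype(X_test)

# Step 2: apply the large-dataset model to the K participants and find the
# source phenotype whose prediction best tracks the new phenotype.
pred_small = X_small @ W_big
corrs = np.array(
    [np.corrcoef(pred_small[:, j], y_small)[0, 1] for j in range(n_source)]
)
best = int(np.argmax(np.abs(corrs)))
sign = np.sign(corrs[best])  # flip the prediction if the relationship is negative

# Step 3: use the matched output to predict the new phenotype in unseen participants.
meta_pred = sign * (X_test @ W_big[:, best])

# Baseline: train directly on the 100 participants only.
w_small = fit_ridge(X_small, y_small[:, None])
base_pred = (X_test @ w_small).ravel()

meta_r = np.corrcoef(meta_pred, y_test)[0, 1]
base_r = np.corrcoef(base_pred, y_test)[0, 1]
print(f"matched source phenotype: {best}")
print(f"test correlation, matched model: {meta_r:.2f}")
print(f"test correlation, small-sample baseline: {base_r:.2f}")
```

In this toy setting the matched model benefits from weights estimated on 2,000 participants, whereas the baseline must estimate 200 weights from 100 participants. Real gains depend on how strongly the new phenotype correlates with phenotypes available in the large dataset.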

**Fig. 1: Experimental setup for meta-matching in the UK Biobank.**

**Fig. 2: Application of basic and advanced meta-matching to the UK Biobank.**

**Fig. 3: Meta-matching reliably outperforms predictions from classical KRR in the UK Biobank.**

**Fig. 4: Examples of phenotypic prediction performance in the test meta-set (N = 9,900) in the case of 100-shot learning.**

**Fig. 5: Prediction improvements were driven by correlations between training and test meta-set phenotypes.**

**Fig. 6: Experiment setup for meta-matching in the HCP.**

**Fig. 7: Meta-matching reliably outperforms classical KRR in the HCP.**

**Fig. 8: Agreement (correlation) of PNFs with pseudo ground truth in the HCP dataset.**

Data availability

This study used publicly available data from the UK Biobank (https://www.ukbiobank.ac.uk/) and the HCP (https://www.humanconnectome.org/). Data can be accessed via data use agreements.

Code availability

Code for the classical (KRR) baseline and meta-matching algorithms can be found here: https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/predict_phenotypes/He2022_MM. The trained models for meta-matching (that is, meta-matching model 1.0) are also publicly available (https://github.com/ThomasYeoLab/Meta_matching_models). The code was reviewed by two co-authors (L.A. and P.C.) before merging into the GitHub repository to reduce the chance of coding errors.

Acknowledgements

We would like to thank C. Annette, T. Akhtar and Z. Li for their help on the HORD algorithm. This work was supported by the Singapore National Research Foundation (NRF) Fellowship Class of 2017 (B.T.T.Y.), the NUS Yong Loo Lin School of Medicine NUHSRO/2020/124/TMR/LOA (B.T.T.Y.), the Singapore National Medical Research Council (NMRC) LCG OFLCG19May-0035 (B.T.T.Y.), the NMRC STaR20nov-0003 (B.T.T.Y.), the Healthy Brains Healthy Lives initiative from the Canada First Research Excellence Fund (D.B.), the Canada Institute for Advanced Research CIFAR Artificial Intelligence Chairs program (D.B.), Google Research Award (D.B.) and National Institutes of Health (NIH) R01AG068563A (D.B.), NIH R01MH120080 (A.J.H.) and NIH R01MH123245 (A.J.H.). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the Singapore NRF or NMRC. Our computational work was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). The Titan Xp GPUs used for this research were donated by Nvidia Corporation. This research has been conducted using the UK Biobank resource under application 25163 and the Human Connectome Project, the WU-Minn Consortium (principal investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH institutes and centers that support the NIH Blueprint for Neuroscience Research and by the McDonnell Center for Systems Neuroscience at Washington University.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Tong He, Lijun An, Pansheng Chen, Jianzhong Chen & B. T. Thomas Yeo
Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), National University of Singapore, Singapore, Singapore
Tong He, Lijun An, Pansheng Chen, Jianzhong Chen & B. T. Thomas Yeo
N.1 Institute for Health & Institute for Digital Medicine (WisDM), National University of Singapore, Singapore, Singapore
Tong He, Lijun An, Pansheng Chen, Jianzhong Chen & B. T. Thomas Yeo
Bytedance, Beijing, China
Jiashi Feng
Department of Biomedical Engineering, McConnell Brain Imaging Centre (BIC), Montreal Neurological Institute (MNI), Faculty of Medicine, School of Computer Science, McGill University, Montreal QC, Canada
Danilo Bzdok
Mila – Quebec Artificial Intelligence Institute, Montreal, QC, Canada
Danilo Bzdok
Departments of Psychology and Psychiatry, Yale University, New Haven, CT, USA
Avram J. Holmes
Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Simon B. Eickhoff
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
Simon B. Eickhoff
NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore
B. T. Thomas Yeo
Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA
B. T. Thomas Yeo

Contributions

T.H., L.A., P.C., J.C., J.F., D.B., A.J.H., S.B.E. and B.T.T.Y. designed the research. T.H. conducted the research. T.H., L.A., P.C., J.C., J.F., D.B., A.J.H., S.B.E. and B.T.T.Y. interpreted the results. T.H. and B.T.T.Y. wrote the manuscript and created the figures. T.H., L.A. and P.C. reviewed and published the code. All authors contributed to project direction via discussion. All authors edited the manuscript.

Corresponding author

Correspondence to B. T. Thomas Yeo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Janine Bijsterbosch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–14, Supplementary Methods and Supplementary Tables 1–3

Reporting Summary

About this article

Cite this article

He, T., An, L., Chen, P. et al. Meta-matching as a simple framework to translate phenotypic predictive models from big to small data. Nat Neurosci 25, 795–804 (2022). https://doi.org/10.1038/s41593-022-01059-9

Received: 22 October 2020
Accepted: 23 March 2022
Published: 16 May 2022
Issue Date: June 2022
DOI: https://doi.org/10.1038/s41593-022-01059-9
