Meta-matching as a simple framework to translate phenotypic predictive models from big to small data

Abstract

We propose a simple framework—meta-matching—to translate predictive models from large-scale datasets to new unseen non-brain-imaging phenotypes in small-scale studies. The key consideration is that a unique phenotype from a boutique study likely correlates with (but is not the same as) related phenotypes in some large-scale dataset. Meta-matching exploits these correlations to boost prediction in the boutique study. We apply meta-matching to predict non-brain-imaging phenotypes from resting-state functional connectivity. Using the UK Biobank (N = 36,848) and Human Connectome Project (HCP) (N = 1,019) datasets, we demonstrate that meta-matching can greatly boost the prediction of new phenotypes in small independent datasets in many scenarios. For example, translating a UK Biobank model to 100 HCP participants yields an eight-fold improvement in variance explained with an average absolute gain of 4.0% (minimum = −0.2%, maximum = 16.0%) across 35 phenotypes. With a growing number of large-scale datasets collecting increasingly diverse phenotypes, our results represent a lower bound on the potential of meta-matching.
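The core idea can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration, not the authors' implementation: the simulated features stand in for functional connectivity, closed-form ridge regression stands in for the prediction model, the source phenotype is selected by absolute correlation in the K participants, and its prediction is rescaled with a fitted slope and intercept. The names `fit_ridge`, `cod`, `meta_match` and `simulate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_sources = 50, 5
# Hypothetical ground-truth weights linking features to 5 source phenotypes.
B = rng.normal(size=(n_features, n_sources)) / np.sqrt(n_features)

def simulate(n):
    """Simulate features, source phenotypes, and a new phenotype that
    correlates with (but is not the same as) source phenotype 2."""
    X = rng.normal(size=(n, n_features))
    signal = X @ B
    Y_src = signal + rng.normal(size=signal.shape)
    y_new = 0.8 * signal[:, 2] + 0.6 * rng.normal(size=n)
    return X, Y_src, y_new

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form multi-output ridge regression (no intercept; data are zero-mean)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def cod(y_true, y_pred):
    """Coefficient of determination (fraction of variance explained)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def meta_match(W_big, X_k, y_k):
    """Select the source phenotype whose prediction best tracks the new
    phenotype in the K participants, then fit a linear rescaling."""
    preds = X_k @ W_big
    corrs = [abs(np.corrcoef(preds[:, j], y_k)[0, 1]) for j in range(preds.shape[1])]
    best = int(np.argmax(corrs))
    slope, intercept = np.polyfit(preds[:, best], y_k, 1)
    return best, slope, intercept

# Large-scale dataset: many participants, several source phenotypes.
X_big, Y_big, _ = simulate(2000)
W_big = fit_ridge(X_big, Y_big)

# Small study: K = 100 participants with the new phenotype, plus held-out test data.
X_k, _, y_k = simulate(100)
X_te, _, y_te = simulate(500)

# Meta-matching: reuse the large-scale model, adapted with the K participants.
best, slope, intercept = meta_match(W_big, X_k, y_k)
cod_mm = cod(y_te, slope * (X_te @ W_big[:, best]) + intercept)

# Baseline: train directly on the K participants only.
w_direct = fit_ridge(X_k, y_k[:, None])
cod_direct = cod(y_te, (X_te @ w_direct).ravel())

print(f"matched source phenotype: {best}")
print(f"variance explained, meta-matching: {cod_mm:.2f}")
print(f"variance explained, direct fit:    {cod_direct:.2f}")
```

In this toy setting the large-scale model contributes well-estimated weights, so only two parameters have to be learned from the K participants, whereas the direct fit must estimate all 50 weights from the same small sample.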

Fig. 1: Experimental setup for meta-matching in the UK Biobank.
Fig. 2: Application of basic and advanced meta-matching to the UK Biobank.
Fig. 3: Meta-matching reliably outperforms predictions from classical KRR in the UK Biobank.
Fig. 4: Examples of phenotypic prediction performance in the test meta-set (N = 9,900) in the case of 100-shot learning.
Fig. 5: Prediction improvements were driven by correlations between training and test meta-set phenotypes.
Fig. 6: Experiment setup for meta-matching in the HCP.
Fig. 7: Meta-matching reliably outperforms classical KRR in the HCP.
Fig. 8: Agreement (correlation) of PNFs with pseudo ground truth in the HCP dataset.

Data availability

This study used publicly available data from the UK Biobank (https://www.ukbiobank.ac.uk/) and the HCP (https://www.humanconnectome.org/). Data can be accessed via data use agreements.

Code availability

Code for the classical (KRR) baseline and meta-matching algorithms can be found here: https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/predict_phenotypes/He2022_MM. The trained models for meta-matching (that is, meta-matching model 1.0) are also publicly available (https://github.com/ThomasYeoLab/Meta_matching_models). The code was reviewed by two co-authors (L.A. and P.C.) before merging into the GitHub repository to reduce the chance of coding errors.

Acknowledgements

We would like to thank C. Annette, T. Akhtar and Z. Li for their help on the HORD algorithm. This work was supported by the Singapore National Research Foundation (NRF) Fellowship Class of 2017 (B.T.T.Y.), the NUS Yong Loo Lin School of Medicine NUHSRO/2020/124/TMR/LOA (B.T.T.Y.), the Singapore National Medical Research Council (NMRC) LCG OFLCG19May-0035 (B.T.T.Y.), the NMRC STaR20nov-0003 (B.T.T.Y.), the Healthy Brains Healthy Lives initiative from the Canada First Research Excellence Fund (D.B.), the Canada Institute for Advanced Research CIFAR Artificial Intelligence Chairs program (D.B.), Google Research Award (D.B.) and National Institutes of Health (NIH) R01AG068563A (D.B.), NIH R01MH120080 (A.J.H.) and NIH R01MH123245 (A.J.H.). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the Singapore NRF or NMRC. Our computational work was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). The Titan Xp GPUs used for this research were donated by Nvidia Corporation. This research has been conducted using the UK Biobank resource under application 25163 and the Human Connectome Project, the WU-Minn Consortium (principal investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH institutes and centers that support the NIH Blueprint for Neuroscience Research and by the McDonnell Center for Systems Neuroscience at Washington University.

Author information

Contributions

T.H., L.A., P.C., J.C., J.F., D.B., A.J.H., S.B.E. and B.T.T.Y. designed the research. T.H. conducted the research. T.H., L.A., P.C., J.C., J.F., D.B., A.J.H., S.B.E. and B.T.T.Y. interpreted the results. T.H. and B.T.T.Y. wrote the manuscript and created the figures. T.H., L.A. and P.C. reviewed and published the code. All authors contributed to project direction via discussion. All authors edited the manuscript.

Corresponding author

Correspondence to B. T. Thomas Yeo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Janine Bijsterbosch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figs. 1–14, Supplementary Methods and Supplementary Tables 1–3

About this article

Cite this article

He, T., An, L., Chen, P. et al. Meta-matching as a simple framework to translate phenotypic predictive models from big to small data. Nat Neurosci 25, 795–804 (2022). https://doi.org/10.1038/s41593-022-01059-9

  • DOI: https://doi.org/10.1038/s41593-022-01059-9
