Machine learning and data mining: strategies for hypothesis generation

Oquendo, M A; Baca-Garcia, E; Artés-Rodríguez, A; Perez-Cruz, F; Galfalvy, H C; Blasco-Fontecilla, H; Madigan, D; Duan, N

doi:10.1038/mp.2011.173

Perspective
Published: 10 January 2012

Molecular Psychiatry volume 17, pages 956–959 (2012)

Abstract

Strategies for generating knowledge in medicine have included observation of associations in clinical or research settings and, more recently, development of pathophysiological models based on molecular biology. Although critically important, these strategies limit hypothesis generation to an incremental pace. Machine learning and data mining are alternative approaches to identifying new vistas to pursue, as is already evident in the literature. In concert with these analytic strategies, novel approaches to data collection can enhance the hypothesis pipeline as well. In data farming, data are obtained in an ‘organic’ way, in the sense that they are entered by patients themselves and are available for harvesting. In contrast, in evidence farming (EF), it is the provider who enters medical data about individual patients. EF differs from regular electronic medical record systems because frontline providers can use it to learn from their own past experience. Beyond generating large databases with farming approaches, data-mining and machine-learning strategies are likely to further harness the power of large data sets, whether collected by farming or by more standard techniques. Exploiting large databases to develop new hypotheses regarding neurobiological and genetic underpinnings of psychiatric illness is useful in itself, but also affords the opportunity to identify novel mechanisms to be targeted in drug discovery and development.

Acknowledgements

Dr Blasco-Fontecilla acknowledges the Spanish Ministry of Health (Rio Hortega CM08/00170), Alicia Koplowitz Foundation, and Conchita Rabago Foundation for funding his post-doctoral rotation at CHRU, Montpellier, France. SAF2010-21849.

Author information

Authors and Affiliations

Department of Psychiatry, New York State Psychiatric Institute and Columbia University, New York, NY, USA
M A Oquendo, E Baca-Garcia, H C Galfalvy & N Duan
Fundacion Jimenez Diaz and Universidad Autonoma, CIBERSAM, Madrid, Spain
E Baca-Garcia & H Blasco-Fontecilla
Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Madrid, Spain
A Artés-Rodríguez & F Perez-Cruz
Princeton University, Princeton, NJ, USA
F Perez-Cruz
Department of Statistics, Columbia University, New York, NY, USA
D Madigan
Department of Biostatistics, Columbia University, New York, NY, USA
N Duan

Corresponding author

Correspondence to M A Oquendo.

Ethics declarations

Competing interests

Dr Oquendo has received unrestricted educational grants and/or lecture fees from Astra-Zeneca, Bristol Myers Squibb, Eli Lilly, Janssen, Otsuka, Pfizer, Sanofi-Aventis and Shire. Her family owns stock in Bristol Myers Squibb. The remaining authors declare no conflict of interest.

About this article

Cite this article

Oquendo, M., Baca-Garcia, E., Artés-Rodríguez, A. et al. Machine learning and data mining: strategies for hypothesis generation. Mol Psychiatry 17, 956–959 (2012). https://doi.org/10.1038/mp.2011.173

Received: 15 July 2011
Revised: 20 October 2011
Accepted: 21 November 2011
Published: 10 January 2012
Issue Date: October 2012
DOI: https://doi.org/10.1038/mp.2011.173
