Abstract
Scientific publications in mental health are growing rapidly in number and complexity, making the curation of abstracts for systematic reviews increasingly time consuming and challenging. Systematic reviews on broad topics are further complicated by variability and a lack of objectivity among multiple human reviewers; resolving these discrepancies is time consuming and can affect the accuracy and breadth of the resulting reviews. To address these challenges, we propose and evaluate multiple machine-learning-based approaches that capture inclusion and exclusion criteria and automate the abstract selection process. We fine-tuned or trained models on psychiatry abstracts from four systematic-review topic areas and then applied them to abstracts from an independently curated oncology literature database. Transformer-based models outperformed trained human reviewers in abstract screening for three of the four topic areas, with differences in accuracy ranging from −4% to 17.7%. Such approaches may facilitate the sharing and synthesis of research expertise across disciplines.
Data availability
Source data are provided with this paper. All papers used as source data for abstracts are available at Embase, Web of Science, PsycInfo, CINAHL, PubMed, and NCI. All datasets, including abstracts, used for machine learning and evaluation of results (human reviewers and automated) are clearly identified and publicly available at https://github.com/MetaAnalysisPipeline/MetaAnalysisPipeline.
Code availability
The code for text preprocessing and the machine-learning pipelines (SciBERT, BERT, and naïve Bayes) is available at https://github.com/MetaAnalysisPipeline/MetaAnalysisPipeline.
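To illustrate the kind of baseline classifier used alongside the transformer models, the following is a minimal, self-contained sketch of a multinomial naïve Bayes abstract screener written with only the Python standard library. It is a hypothetical illustration, not the authors' pipeline: the class name, tokenizer, and toy training abstracts are invented for this example, and a production pipeline would use the repository's actual preprocessing and a library implementation such as scikit-learn's.

```python
# Illustrative sketch only: a bag-of-words multinomial naive Bayes
# classifier that labels abstracts "include" or "exclude".
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer (simplified preprocessing step)."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesScreener:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, abstracts, labels):
        self.classes = sorted(set(labels))
        # Log prior for each class from label frequencies.
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for text, y in zip(abstracts, labels):
            toks = tokenize(text)
            self.counts[y].update(toks)
            self.vocab.update(toks)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        # Pick the class with the highest log posterior.
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = self.priors[c]
            for t in tokenize(text):
                lp += math.log((self.counts[c][t] + self.alpha) /
                               (self.totals[c] + self.alpha * v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Toy usage with invented training abstracts.
train = [
    "randomized controlled trial of antidepressants in adults",
    "meta-analysis of cytokines in major depressive disorder",
    "case report of a single patient with a rare dermatologic condition",
    "editorial commentary on health policy",
]
labels = ["include", "include", "exclude", "exclude"]
model = NaiveBayesScreener().fit(train, labels)
print(model.predict("a randomized trial of antidepressants for depression"))
```

In practice the paper's stronger results came from fine-tuned transformer models (BERT and SciBERT); a word-count classifier like this serves only as a fast, interpretable baseline for the same include/exclude screening task.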
Acknowledgments
We express appreciation to the members of our Pediatric Emotion and Resilience Lab (PEARL) for their contributions to this work. We have no funding to disclose related to this work.
Author information
Contributions
A.J.G., M.G.G., K.K.R., and M.K.S. conceptualized and executed the study protocol. A.J.G., M.G.G., K.K.R., and M.K.S. contributed to the analyses. A.J.G., M.G.G., M.K.S., K.K.R., V.P., and A.N. contributed to writing of the manuscript. A.J.G. and M.G.G. contributed to visualizations. K.K.R., A.N., A.F.N., V.P., S.R.K., T.P., M.L., S.S., and M.K.S. contributed to the manual tagging of abstracts.
Ethics declarations
Competing interests
M.K.S. has received research support from Stanford’s Maternal Child Health Research Institute and Stanford’s Department of Psychiatry and Behavioral Sciences, the National Institute of Mental Health, the National Institute on Aging, the Patient-Centered Outcomes Research Institute, Johnson and Johnson, and the Brain and Behavior Research Foundation. She is on the advisory boards of Sunovion and Skyland Trail and is a consultant for Johnson and Johnson, Alkermes, and Neumora. She has previously consulted for X, the Moonshot Factory (Alphabet Inc.), and Limbix Health. She receives honoraria from the American Academy of Child and Adolescent Psychiatry and royalties from American Psychiatric Association Publishing and Thrive Global. K.K.R. receives support from The Permanente Medical Group’s Physician Researcher Program. No other authors report any biomedical financial interests or potential conflicts of interest.
Peer review
Nature Mental Health thanks Federica Colombo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–10, Tables 1–14, and Methods.
Supplementary Data
Source data for Supplementary Figs. 1–5, 7, and 8 and Supplementary Tables 11–13.
Source data
Source Data Fig. 2
Source data for Fig. 2.
Source Data Fig. 3
Source data for Fig. 3.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gorelik, A.J., Gorelik, M.G., Ridout, K.K. et al. Evaluating efficiency and accuracy of deep-learning-based approaches on study selection for psychiatry systematic reviews. Nat. Mental Health 1, 623–632 (2023). https://doi.org/10.1038/s44220-023-00109-w