Speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting

Hansen, Lasse; Rocca, Roberta; Simonsen, Arndis; Olsen, Ludvig; Parola, Alberto; Bliksted, Vibeke; Ladegaard, Nicolai; Bang, Dan; Tylén, Kristian; Weed, Ethan; Østergaard, Søren Dinesen; Fusaroli, Riccardo

doi:10.1038/s44220-023-00152-7

Article
Published: 09 November 2023

Nature Mental Health volume 1, pages 971–981 (2023)

Abstract

Speech patterns are argued to be promising diagnostic markers for neuropsychiatric conditions. However, most studies only compare one condition with healthy controls, which does not reflect the challenge faced in clinical practice. Here, to address this, we assessed recordings from 420 participants with major depressive disorder, schizophrenia, autism spectrum disorder and non-psychiatric controls. We trained and tested a variety of models on both binary and multiclass classification tasks using speech and text features. While binary classification models performed similarly to prior research (F1: 0.54–0.92), multiclass classification performance was markedly lower (F1: 0.35–0.75). By combining voice- and text-based models, relative overall performance improved by 9.4% F1 macro. Our findings suggest that binary models may not capture markers specific to individual conditions. Future research should aim to collect larger transdiagnostic datasets to capture the complexity of neuropsychiatric conditions.
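The two quantities the abstract reports, macro F1 and a relative improvement from combining voice and text models, can be illustrated with a minimal sketch. This is not the authors' pipeline: the fusion rule (averaging per-class probabilities from the two models), the function names and the toy probabilities below are all assumptions made for illustration, and the abstract does not state how the ensemble was built.

```python
from typing import Dict, List, Sequence

Probs = Dict[str, float]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_probabilities(voice: List[Probs], text: List[Probs]) -> List[Probs]:
    """Assumed fusion rule: mean of the two models' class probabilities."""
    return [{c: (v[c] + t[c]) / 2 for c in v} for v, t in zip(voice, text)]


def predict(probs: List[Probs]) -> List[str]:
    """Pick the most probable class for each participant."""
    return [max(p, key=p.get) for p in probs]


def relative_improvement(baseline: float, combined: float) -> float:
    """Relative (not absolute) change, e.g. 0.094 corresponds to 9.4%."""
    return (combined - baseline) / baseline


# Toy data (hypothetical): one participant per group.
y_true = ["ASD", "MDD", "SCZ", "CTRL"]
voice_probs = [
    {"ASD": 0.6, "MDD": 0.2, "SCZ": 0.1, "CTRL": 0.1},
    {"ASD": 0.1, "MDD": 0.3, "SCZ": 0.2, "CTRL": 0.4},
    {"ASD": 0.2, "MDD": 0.2, "SCZ": 0.5, "CTRL": 0.1},
    {"ASD": 0.1, "MDD": 0.1, "SCZ": 0.1, "CTRL": 0.7},
]
text_probs = [
    {"ASD": 0.3, "MDD": 0.4, "SCZ": 0.2, "CTRL": 0.1},
    {"ASD": 0.1, "MDD": 0.6, "SCZ": 0.2, "CTRL": 0.1},
    {"ASD": 0.2, "MDD": 0.2, "SCZ": 0.4, "CTRL": 0.2},
    {"ASD": 0.1, "MDD": 0.1, "SCZ": 0.2, "CTRL": 0.6},
]

fused_probs = average_probabilities(voice_probs, text_probs)
voice_f1 = macro_f1(y_true, predict(voice_probs))
text_f1 = macro_f1(y_true, predict(text_probs))
fused_f1 = macro_f1(y_true, predict(fused_probs))
gain = relative_improvement(max(voice_f1, text_f1), fused_f1)
print(f"voice={voice_f1:.3f} text={text_f1:.3f} fused={fused_f1:.3f} gain={gain:.1%}")
```

Macro F1 weights every diagnostic group equally regardless of its size, which matters when groups are unbalanced. In this toy example each single-modality model errs on a different participant, so averaging corrects both errors; real gains depend on how complementary the two models' errors are.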

**Fig. 1: Processing and training pipelines for feature-based text, speech and ensemble models.**

**Fig. 2: Binary and multiclass performance per group and classification setup.**

**Fig. 3: Detailed performance overview for the ten best multiclass models.**

**Fig. 4: Confusion matrices of the performance of the best multiclass models on the test set.**

Data availability

Because of patient confidentiality and the GDPR, the raw data used for the present study (transcripts, audio recordings) cannot be shared. The output of all models is supplied at https://github.com/HLasse/multidiagnosis-speech for replication of the figures and tables in the paper.

Code availability

The code used for the analysis in the study can be found in the following two GitHub repositories: https://github.com/HLasse/multidiagnosis-speech/tree/main and https://github.com/rbroc/multidiagnosis-text/tree/master

Acknowledgements

We acknowledge seed funding from the Interacting Minds Centre (‘Clinical voices’). R.R. is partly supported by funding from the Volkswagen Stiftung. A.P. was supported by Marie Skłodowska-Curie Actions—H2020-MSCA-IF-2018 grant (ID: 832518, Project: MOVES). D.B. is supported by a Lundbeck Foundation Fellowship (R368-2021-325).

Author information

Authors and Affiliations

Department of Affective Disorders, Aarhus University Hospital–Psychiatry, Aarhus, Denmark
Lasse Hansen, Nicolai Ladegaard & Søren Dinesen Østergaard
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Lasse Hansen, Ludvig Olsen, Vibeke Bliksted, Nicolai Ladegaard & Søren Dinesen Østergaard
Center for Humanities Computing, Aarhus University, Aarhus, Denmark
Lasse Hansen & Roberta Rocca
Interacting Minds Centre, Aarhus University, Aarhus, Denmark
Roberta Rocca, Arndis Simonsen, Vibeke Bliksted, Kristian Tylén, Ethan Weed & Riccardo Fusaroli
Psychosis Research Unit, Aarhus University Hospital–Psychiatry, Aarhus, Denmark
Arndis Simonsen & Vibeke Bliksted
Department of Molecular Medicine, Aarhus University, Aarhus, Denmark
Ludvig Olsen
Department of Linguistics, Cognitive Science and Semiotics, Aarhus University, Aarhus, Denmark
Alberto Parola, Kristian Tylén, Ethan Weed & Riccardo Fusaroli
Centre for Language Technology, University of Copenhagen, Copenhagen, Denmark
Alberto Parola
Center of Functionally Integrative Neuroscience, Aarhus University, Aarhus, Denmark
Dan Bang
Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA
Riccardo Fusaroli

Contributions

L.H., R.R. and R.F. conceived and conceptualized the research idea, and developed the analysis plan. A.S., A.P., V.B., N.L., D.B., K.T., E.W. and R.F. collected the data; L.H., R.R. and L.O. cleaned and preprocessed the data; and L.H. and R.R. conducted the data analysis. L.H., R.R. and R.F. drafted the paper. A.S., L.O., A.P., V.B., N.L., D.B., K.T., E.W. and S.D.Ø. reviewed the paper and provided critical comments. R.F. supervised and administered the project.

Corresponding author

Correspondence to Lasse Hansen.

Ethics declarations

Competing interests

S.D.Ø. received the 2020 Lundbeck Foundation Young Investigator Prize. Furthermore, S.D.Ø. owns/has owned units of mutual funds with stock tickers DKIGI, IAIMWC, SPIC25KL and WEKAFKI, and has owned units of exchange traded funds with stock tickers BATE, TRET, QDV5, QDVH, QDVE, SADM, IQQH, USPY, EXH2, 2B76 and EUNL. R.F. has been a paid consultant for F. Hoffmann-La Roche. L.H. has been an intern at F. Hoffmann-La Roche. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Mental Health thanks Heather MacFarlane, Alban Voppel and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Tables 1–9, animated triangles task, data processing, models and Fig. 1.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hansen, L., Rocca, R., Simonsen, A. et al. Speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting. Nat. Mental Health 1, 971–981 (2023). https://doi.org/10.1038/s44220-023-00152-7

Received: 01 February 2023
Accepted: 27 September 2023
Published: 09 November 2023
Issue Date: December 2023
DOI: https://doi.org/10.1038/s44220-023-00152-7

This article is cited by

Relative importance of speech and voice features in the classification of schizophrenia and depression
- Mark Berardi
- Katharina Brosch
- Maria Dietrich
Translational Psychiatry (2023)