Speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting

Abstract

Speech patterns have been proposed as promising diagnostic markers for neuropsychiatric conditions. However, most studies compare only one condition with healthy controls, which does not reflect the challenge faced in clinical practice. To address this, we assessed recordings from 420 participants with major depressive disorder, schizophrenia, autism spectrum disorder and non-psychiatric controls. We trained and tested a variety of models on both binary and multiclass classification tasks using speech and text features. While binary classification models performed similarly to prior research (F1: 0.54–0.92), multiclass classification performance was markedly lower (F1: 0.35–0.75). Combining voice- and text-based models improved overall performance by 9.4% (relative) in macro F1. Our findings suggest that binary models may not capture markers specific to individual conditions. Future research should aim to collect larger transdiagnostic datasets to capture the complexity of neuropsychiatric conditions.
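To make the reported metrics concrete, the following is a minimal Python sketch of how macro F1 and a relative improvement can be computed for a four-class task. The toy labels, the probability values, and the probability-averaging fusion step are all illustrative assumptions: this preview does not describe how the voice- and text-based models were combined, so the sketch should not be read as the authors' method.

```python
# Illustrative sketch only: toy data and a simple late-fusion rule (assumed,
# not taken from the paper) to show how macro F1 and relative gain are computed.

CLASSES = ["ASD", "CTRL", "MDD", "SCZ"]  # hypothetical label order


def macro_f1(y_true, y_pred, labels=CLASSES):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def argmax_labels(probs, labels=CLASSES):
    """Pick the most probable class for each sample."""
    return [labels[row.index(max(row))] for row in probs]


def average_fusion(probs_a, probs_b):
    """Assumed fusion rule: average the class probabilities of two models."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(probs_a, probs_b)]


def relative_improvement(baseline, improved):
    """Relative change, e.g. 0.094 corresponds to a 9.4% relative gain."""
    return (improved - baseline) / baseline


# Toy example: one sample per class, probabilities ordered as in CLASSES.
y_true = ["MDD", "SCZ", "ASD", "CTRL"]
speech_probs = [[0.1, 0.2, 0.6, 0.1], [0.1, 0.1, 0.5, 0.3],
                [0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]]
text_probs = [[0.1, 0.3, 0.4, 0.2], [0.1, 0.1, 0.2, 0.6],
              [0.3, 0.4, 0.2, 0.1], [0.1, 0.6, 0.2, 0.1]]

speech_f1 = macro_f1(y_true, argmax_labels(speech_probs))
text_f1 = macro_f1(y_true, argmax_labels(text_probs))
fused_f1 = macro_f1(y_true, argmax_labels(average_fusion(speech_probs, text_probs)))
gain = relative_improvement(max(speech_f1, text_f1), fused_f1)
```

Because macro F1 weights every class equally, a model that misses one diagnostic group entirely is penalized heavily even if it does well on the others, which is one reason multiclass scores can fall well below binary ones.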

Fig. 1: Processing and training pipelines for feature-based text, speech and ensemble models.
Fig. 2: Binary and multiclass performance per group and classification setup.
Fig. 3: Detailed performance overview for the ten best multiclass models.
Fig. 4: Confusion matrices of the performance of the best multiclass models on the test set.

Data availability

Because of patient confidentiality and GDPR requirements, the raw data used for the present study (transcripts, audio recordings) cannot be shared. The output of all models is available at https://github.com/HLasse/multidiagnosis-speech for replication of the figures and tables in the paper.

Code availability

The code used for the analysis in the study can be found in the following two GitHub repositories: https://github.com/HLasse/multidiagnosis-speech/tree/main and https://github.com/rbroc/multidiagnosis-text/tree/master

Acknowledgements

We acknowledge seed funding from the Interacting Minds Centre (‘Clinical voices’). R.R. is partly supported by funding from the Volkswagen Stiftung. A.P. was supported by Marie Skłodowska-Curie Actions—H2020-MSCA-IF-2018 grant (ID: 832518, Project: MOVES). D.B. is supported by a Lundbeck Foundation Fellowship (R368-2021-325).

Author information

Contributions

L.H., R.R. and R.F. conceived and conceptualized the research idea, and developed the analysis plan. A.S., A.P., V.B., N.L., D.B., K.T., E.W. and R.F. collected the data; L.H., R.R. and L.O. cleaned and preprocessed the data; and L.H. and R.R. conducted the data analysis. L.H., R.R. and R.F. drafted the paper. A.S., L.O., A.P., V.B., N.L., D.B., K.T., E.W. and S.D.Ø. reviewed the paper and provided critical comments. R.F. supervised and administered the project.

Corresponding author

Correspondence to Lasse Hansen.

Ethics declarations

Competing interests

S.D.Ø. received the 2020 Lundbeck Foundation Young Investigator Prize. Furthermore, S.D.Ø. owns/has owned units of mutual funds with stock tickers DKIGI, IAIMWC, SPIC25KL and WEKAFKI, and has owned units of exchange traded funds with stock tickers BATE, TRET, QDV5, QDVH, QDVE, SADM, IQQH, USPY, EXH2, 2B76 and EUNL. R.F. has been a paid consultant for F. Hoffmann-La Roche. L.H. has been an intern at F. Hoffmann-La Roche. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Mental Health thanks Heather MacFarlane, Alban Voppel and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Tables 1–9, animated triangles task, data processing, models and Fig. 1.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hansen, L., Rocca, R., Simonsen, A. et al. Speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting. Nat. Mental Health 1, 971–981 (2023). https://doi.org/10.1038/s44220-023-00152-7

  • DOI: https://doi.org/10.1038/s44220-023-00152-7
