Detecting clinically significant events through automated language analysis: Quo imus?

Foltz, Peter W; Rosenstein, Mark; Elvevåg, Brita

doi:10.1038/npjschz.2015.54

Brief Communication
Open access
Published: 06 January 2016

npj Schizophrenia volume 2, Article number: 15054 (2016)

We found the recent paper by Bedi et al.¹ simultaneously exciting, heartening and, sadly, a bit discouraging. It shows that modern statistical natural language processing (NLP) and machine-learning (ML) techniques can potentially be useful as a component of diagnosis, in this case predicting which at-risk individuals will eventually transition to full-blown psychosis. This result closely follows our own and others' observations of the value of these techniques in, for example, discriminating patients with schizophrenia from controls;² discriminating schizophrenia probands, first-degree relatives and unrelated healthy controls;³ differentiating those at high risk of psychosis from unrelated, putatively healthy participants;⁴ and, in a candidate gene study, linking language in general to underlying neurobiology.⁵ All of these outcomes are quite encouraging.

Our disappointment is not directly with the Bedi et al.¹ paper itself, but with the fact that we as a field are, after this long proving period, still at the ‘promising’ stage. This inertia arises from two primary factors. The first is the use of small, often second-hand data sets produced for other studies, which severely constrains both the NLP techniques that can be applied and the generality of the results obtained. The second is that the methodologies have yet to be sufficiently assimilated into the field to be used effectively in analyses that provide valid, reliable measures of the constructs of interest. Such understanding would permit better linking of the appropriate features of language to the underlying etiologies of interest.

To take the transformative next steps, we must routinely and systematically strive to obtain larger data sets containing multiple language samples from each participant, collected over time. This will allow us to quantify the joint time course of the disease(s) and of changes in language. Larger samples also strengthen the methodology, allowing us to move beyond the less reliable cross-validation to the gold standard for validating ML results: a ‘hold-out’ data set. In such an approach, all modeling is conducted blind to the hold-out set; only when modeling is complete is the model run on the held-out set, to estimate its expected performance in the larger population. This tests generalization directly and lowers the risk of reporting overfitted results. At least as importantly, realistically sized data sets allow the application of larger combinations of more sophisticated NLP/ML techniques that go beyond the often-used simple word-count features. These in turn permit deeper characterization of more important aspects of language, such as semantic structure, discourse organization and acoustic characteristics.⁶
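The hold-out protocol described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' method: the function names, the hold-out fraction, the toy nearest-centroid classifier and the synthetic two-feature data are all hypothetical stand-ins for a real feature-extraction and modeling pipeline. The split is made by participant rather than by sample, on the assumption that a longitudinal data set holds several samples per person; splitting by sample would let the same speaker appear on both sides and inflate the hold-out estimate.

```python
import random
from collections import defaultdict


def holdout_split(participant_ids, holdout_fraction=0.2, seed=0):
    """Split participants (not individual samples) into development and hold-out sets."""
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_holdout = max(1, round(len(ids) * holdout_fraction))
    return set(ids[n_holdout:]), set(ids[:n_holdout])  # (development, hold-out)


def fit_centroids(samples):
    """Hypothetical toy model: the mean feature vector of each label."""
    sums, counts = {}, defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    correct = sum(predict(centroids, f) == y for f, y in samples)
    return correct / len(samples)


def evaluate_with_holdout(data, holdout_fraction=0.2, seed=0):
    """data: list of (participant_id, feature_vector, label) tuples."""
    dev_ids, holdout_ids = holdout_split(
        [pid for pid, _, _ in data], holdout_fraction, seed
    )
    dev = [(f, y) for pid, f, y in data if pid in dev_ids]
    holdout = [(f, y) for pid, f, y in data if pid in holdout_ids]
    if not dev or not holdout:
        raise ValueError("need participants on both sides of the split")
    # Every modeling decision (features, model choice, tuning) may use `dev` only.
    model = fit_centroids(dev)
    # The hold-out set is scored exactly once, after modeling is finished.
    return accuracy(model, holdout)


# Synthetic toy data for illustration only: 10 hypothetical participants,
# 3 samples ("visits") each, two made-up numeric features per sample.
toy = []
for pid in range(10):
    label = "group_a" if pid < 5 else "group_b"
    base = 0.0 if label == "group_a" else 5.0
    for visit in range(3):
        toy.append((pid, [base + 0.1 * visit, base - 0.1 * visit], label))

print(evaluate_with_holdout(toy, holdout_fraction=0.3, seed=1))  # prints 1.0 on this trivially separable toy data
```

In a real study the modeling stage would itself involve feature selection and tuning (for example, cross-validation within the development set); the point of the sketch is only that the hold-out participants are set aside before any of that begins and are scored once at the end.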

From our perspective, Figure 3 from Bedi et al.¹ is a beautiful but small, low-dimensional, incremental step toward our vision: a truly high-dimensional language-feature space that could align with the aspirational goals of the NIMH Research Domain Criteria by using language to locate and pinpoint those with severe mental illness at coordinates within this space. Once individuals are localized, the features that define the resulting hypothesized clusters could potentially be calibrated for use in early detection, in continuous evaluation of treatment and in providing links to the biology underlying these diseases, while at the same time superseding our existing diagnostic categories. But this vision is achievable only with purpose-designed studies that include sufficiently large populations of both healthy participants and individuals sampled across multiple diagnostic categories. Our field must become versed in more powerful applications of NLP/ML techniques and offer more reproducible methodologies. These results, taken together with others, are sufficiently encouraging that it is now time for us to move beyond ‘promising’.

References

1. Bedi, G. et al. Automated analysis of free speech predicts psychosis onset in high-risk youths. npj Schizophr. 1, 15030 (2015).
2. Elvevåg, B., Foltz, P. W., Weinberger, D. R. & Goldberg, T. E. Quantifying incoherence in speech: an automated methodology and novel application to schizophrenia. Schizophr. Res. 93, 304–316 (2007).
3. Elvevåg, B., Foltz, P. W., Rosenstein, M. & DeLisi, L. E. An automated method to analyze language use in patients with schizophrenia and their first-degree relatives. J. Neurolinguistics 23, 270–284 (2010).
4. Rosenstein, M., Foltz, P. W., DeLisi, L. E. & Elvevåg, B. Language as a biomarker in those at high-risk for psychosis. Schizophr. Res. 165, 249–250 (2015).
5. Nicodemus, K. K. et al. Category fluency, latent semantic analysis and schizophrenia: a candidate gene approach. Cortex 55, 182–191 (2014).
6. Cohen, A. S., Hong, L. S. & Guevara, A. Understanding emotional expression using prosodic analysis of natural speech: refining the methodology. J. Behav. Ther. Exp. Psychiatry 41, 150–157 (2010).

Author information

Authors and Affiliations

Institute of Cognitive Science, University of Colorado, Boulder, CO, USA
Peter W Foltz
Pearson, Boulder, CO, USA
Peter W Foltz & Mark Rosenstein
Department of Clinical Medicine, Psychiatry Research Group, University of Tromsø, Norway
Brita Elvevåg
Norwegian Centre for Integrated Care and Telemedicine, University Hospital of North Norway, Tromsø, Norway
Brita Elvevåg

Corresponding author

Correspondence to Peter W Foltz.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Foltz, P., Rosenstein, M. & Elvevåg, B. Detecting clinically significant events through automated language analysis: Quo imus? npj Schizophr 2, 15054 (2016). https://doi.org/10.1038/npjschz.2015.54

Received: 28 September 2015
Accepted: 05 October 2015
Published: 06 January 2016
DOI: https://doi.org/10.1038/npjschz.2015.54