To the Editor — A recent study by Just et al.1 used a Gaussian Naïve Bayes classifier, cross-validated with a leave-one-out procedure, to classify participants as healthy controls or suicidal ideators with 91% accuracy, using functional magnetic resonance imaging (fMRI) activation patterns associated with 30 emotion stimulus concepts presented as linguistic stimuli. The classifier's accuracy is excellent, and these results hold promise for the future detection of individuals at risk of suicidality. We wonder, however, how well the results of the current study generalize across culturally diverse populations.
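For readers less familiar with the evaluation scheme, the following is a minimal sketch of the kind of pipeline the study describes: a Gaussian Naïve Bayes classifier scored with leave-one-out cross-validation. The data here are synthetic stand-ins for activation patterns (random draws with a hypothetical group mean shift), not the study's fMRI data; group sizes and effect size are illustrative assumptions only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_per_group, n_features = 17, 30  # e.g. one feature per emotion concept (illustrative)

# Two synthetic groups whose feature means differ by a hypothetical offset.
controls = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
ideators = rng.normal(0.8, 1.0, size=(n_per_group, n_features))
X = np.vstack([controls, ideators])
y = np.array([0] * n_per_group + [1] * n_per_group)

# Leave-one-out: each participant is held out once; accuracy is the
# fraction of held-out participants classified correctly.
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.2f}")
```

With leave-one-out, every participant contributes exactly one test prediction, which is why small samples can still yield a single overall accuracy figure such as the reported 91%.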

One of the article’s main assumptions is that the neural representations of emotional concepts are homogeneous across individuals and within groups. This assumption is problematic and not justified by the authors. First, due to a small sample size, their groups are unlikely to capture and adequately represent the diversity in the broader population. Second, studies show that cultural, linguistic and experiential background can influence responses to emotion stimulus concepts2,3,4. Neuroimaging studies have also shown cultural differences in brain activity in response to emotion-evoking stimuli5,6. A review of transcultural neuroimaging research suggests that cultural differences affect cognitive processes, including perception, attention, language and emotion7. A discussion addressing this point would help clarify the reach of the article’s results. In addition, reporting participant demographics that reflect language and culture would be informative.

The article’s main assumption directs choices in the design of the classifier, which could introduce bias. For example, the authors excluded 21 participants with suicidal ideation based on “poor classification accuracy of the 30 concepts,” which they attributed to “some combination of excessive head motion and an inability to sustain attention to the task”. We question whether the concept identification for these participants was indeed inaccurate or simply different. This approach trained the classifier on only “ideal” data, potentially capturing only a subgroup of suicidal ideators from the population. The classification accuracy decreased, though not significantly, when the excluded group was tested (87%). We worry that with greater population heterogeneity, the classification accuracy will decrease further. It would be useful to investigate whether this drop in accuracy reflects important heterogeneity.
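The concern above can be made concrete with a small sketch: a classifier trained only on “ideal” participants, then scored on an excluded subgroup whose signatures differ. All data are synthetic; the larger noise scale of the “excluded” group is a hypothetical stand-in for whatever made their concept identification differ, and the specific numbers carry no empirical weight.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
n, d = 17, 30
X_train = np.vstack([rng.normal(0.0, 1.0, (n, d)),    # controls
                     rng.normal(0.8, 1.0, (n, d))])   # included ideators
y_train = np.array([0] * n + [1] * n)

# Hypothetical excluded ideators: same class label, noisier signatures.
X_excl = rng.normal(0.8, 2.5, (21, d))
y_excl = np.ones(21, dtype=int)

clf = GaussianNB().fit(X_train, y_train)
# Accuracy on the "ideal" training population (leave-one-out) versus
# accuracy on the excluded subgroup the model never trained on.
acc_included = cross_val_score(GaussianNB(), X_train, y_train, cv=LeaveOneOut()).mean()
acc_excluded = clf.score(X_excl, y_excl)
print(f"included LOO: {acc_included:.2f}, excluded: {acc_excluded:.2f}")
```

Reporting both figures side by side, as the authors did (91% versus 87%), is exactly the comparison that would reveal whether the gap widens as heterogeneity grows.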

The initial approach to building this classifier was based on insightful hypotheses regarding emotion processing and fMRI neural signatures. However, the convergence to the final classifier appears to have been achieved largely by trial and error. Perhaps the authors can clarify their approach. We present here a set of recommendations for the development of future iterations of the model. First, it would be helpful to directly test why the brain signatures of the excluded population were “inaccurate” in concept identification, given that these concepts form the basis of the classifier’s design. Second, classifying suicidal ideators against relevant psychiatric groups (for example, depression, anxiety) may prove more revealing than a comparison with healthy controls, as it is in individuals from these groups that suicidality is most often assessed. Third, an increased sample size and sampling from populations across different cultural and linguistic strata will shed light on the validity of the assumption of common neural activation patterns to emotion concepts.
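The third recommendation suggests a natural evaluation scheme: hold out an entire cultural or linguistic stratum and ask how well a classifier trained on the remaining strata generalizes to it. The sketch below illustrates this with leave-one-group-out cross-validation; the strata, the per-stratum offset in the signatures, and all sample sizes are hypothetical placeholders, and the point is the evaluation design rather than the numbers.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(2)
n, d, n_strata = 12, 30, 3
X_parts, y_list, group_list = [], [], []
for s in range(n_strata):
    shift = 0.3 * s  # hypothetical stratum-specific offset in the signatures
    X_parts.append(rng.normal(0.0 + shift, 1.0, (n, d)))  # controls
    X_parts.append(rng.normal(0.8 + shift, 1.0, (n, d)))  # ideators
    y_list += [0] * n + [1] * n
    group_list += [s] * (2 * n)
X, y, groups = np.vstack(X_parts), np.array(y_list), np.array(group_list)

# Each score is accuracy on one fully held-out stratum: a direct test of
# whether activation patterns learned in some strata transfer to another.
scores = cross_val_score(GaussianNB(), X, y, groups=groups,
                         cv=LeaveOneGroupOut())
print({s: round(float(a), 2) for s, a in enumerate(scores)})
```

Uniformly high per-stratum scores would support the assumption of common neural activation patterns; a stratum with markedly lower accuracy would flag meaningful heterogeneity.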

The authors’ reported findings hold significant clinical promise, including for monitoring current suicide risk and response to treatment. We believe the next steps suggested here could strengthen the clinical applicability of this classifier.