Development and evaluation of the HRSD-D, an image-based digital measure of the Hamilton rating scale for depression

Berko, Adi; Bar-Sella, Avigail; Fisher, Hadar; Sobolev, Michael; Pollak, J. P.; Zilcha-Mano, Sigal

doi:10.1038/s41598-022-18434-y

Published: 22 August 2022

Scientific Reports volume 12, Article number: 14342 (2022)

Abstract

The Hamilton rating scale for depression (HRSD) is considered the gold standard for the assessment of major depressive disorder. Nevertheless, it has drawbacks such as reliance on retrospective reports and a relatively long administration time. Using a combination of an experience sampling method with mobile health technology, the present study aimed at developing and conducting initial validation of HRSD-D, the first digital image-based assessment of the HRSD. Fifty-three well-trained HRSD interviewers selected the most representative image for each item from an initial sample of images. Based on their responses, we developed the prototype of HRSD-D in two versions: trait-like (HRSD-DT) and state-like (HRSD-DS). HRSD-DT collects one-time reports on general tendencies to experience depressive symptoms; HRSD-DS collects daily reports on the experience of symptoms. Using a total of 1933 responses collected in a preclinical sample (N = 86), we evaluated the validity and feasibility of HRSD-D, based on participant reports of HRSD-DT at baseline, and 28 consecutive daily reports of HRSD-DS, using smartphone devices. HRSD-D showed good convergent validity with respect to the original HRSD, as evident in high correlations between HRSD-DS and HRSD (up to Bstd = 0.80). Our combined qualitative and quantitative analyses indicate that HRSD-D captured both dynamic and stable features of symptomatology, in a user-friendly monitoring process. HRSD-D is a promising tool for the assessment of trait and state depression and contributes to the use of mobile technologies in mental health research and practice.

Introduction

Major depressive disorder (MDD) is the leading cause of disability worldwide, with more than 260 million people affected¹. The burden of MDD is on the rise globally, making the improvement of treatments a high priority (World Health Organization¹). Although treatments are overall effective in reducing symptoms, less than half the patients display full remission^2,3. The reason may lie in problems of symptom monitoring, as insensitive monitoring may impede the identification of subtle differences between individuals and between patterns of symptomatology.

Available monitoring methods can be divided into two main categories: self-report questionnaires (such as the Beck Depression Inventory, BDI, Beck et al.⁴) and semi-structured interviews (such as the Montgomery-Asberg Depression Rating Scale⁵). One of the common tools for MDD symptom monitoring is the Hamilton Rating Scale for Depression (HRSD), a semi-structured interview, containing 17 items assessing the patient's symptoms in the preceding week⁶. This scale is the most commonly used tool for evaluating MDD symptoms in psychotherapy and psychiatric research, and is considered a gold standard^7,8,9. Despite its extensive use, the HRSD has been criticized for the unequal contribution of the items to the global score, due to their different scaling, and for the poor inter-rater and retest reliability displayed by many of its items¹⁰.

In addition to these psychometric drawbacks, the HRSD has some conceptual drawbacks, which also characterize other traditional MDD monitoring tools. First, HRSD has low ecological validity, as the interview takes place in clinical or lab settings, while asking patients to retrospectively report on their everyday experiences¹¹. As the memory of depressed individuals is negatively biased^12,13, relying on their retrospective reports is problematic. Second, HRSD consumes a considerable amount of time and resources, as patients are asked to attend the interview weekly, and each interview may take 30 min to complete⁷. This procedure, which relies heavily on the patient's cooperation, might be very challenging, given the poor motivation that usually characterizes MDD^14,15. Third, HRSD is usually based on a weekly assessment of symptoms⁸, and occasionally, assessment points are even more distant in time (e.g., Moran & Mohr, 2005). This weekly (or even monthly) monitoring is partial, as a great deal of information regarding symptomatology patterns is missing¹⁶. Finally, HRSD fails to distinguish between trait-like and state-like components of symptomatology; that is, between stable general features of the individual's symptomatology and the dynamic daily manifestation of symptoms. Distinguishing between them is essential for the understanding of mechanisms of change in treatments and for the personalization of treatments^17,18. These drawbacks of the HRSD create a need for additional, complementary tools.

A potential solution for the low ecological validity of the HRSD can be found in the well-supported experience sampling method (ESM) and ecological momentary assessment (EMA). ESM/EMA have already been used in research into mental conditions, including depression^19,20,21. In ESM/EMA studies, people react to repeated assessments and report their experiences while functioning in their everyday settings. This real-time assessment reduces memory biases^22,23. In addition, the frequent assessment of the ESM/EMA seems to better capture the dynamic pattern of symptoms, addressing another drawback of the original tool²⁴. As for efficiency, the integration of smartphones into ESM/EMA research has taken the field forward, providing the opportunity to create new assessment tools which are considerably more efficient and less demanding for participants than the traditional ones^25,26.

The use of smartphones in physical and mental medical research, especially for remote monitoring, is becoming increasingly common^27,28. To ensure high efficiency and keep the monitoring process user-friendly for participants, some researchers have used non-verbal (e.g., image-based) formats (e.g., Arthritis symptom monitoring²⁹). Non-verbal digital tools have also been used to capture complicated concepts, such as emotional states and mental symptoms, in clinical and non-clinical samples^30,31,32. Non-verbal digital ESM/EMA assessment tools enable a quick and intuitive response, and thus, are considered to be better suited for populations with poor motivation, such as MDD patients^31,33.

In light of these significant advantages, ESM/EMA in their technological form (e.g., smartphones), have spread in psychotherapy and psychiatric research, as a means of monitoring the progress of treatment³⁴. Yet, this trend is still in its infancy, with not enough evidence-based tools available^34,35,36,37. Even fewer attempts have been made to develop digital ESM/EMA versions of existing MDD monitoring tools³⁶. Two such studies sought to develop digital versions (apps) of the Center for Epidemiologic Studies Depression Scale-Revised (CESD-R) and of the Patient Health Questionnaire-9 (PHQ-9^38,39). Both studies displayed promising results in terms of validity and adherence rates. However, neither of them used a non-verbal format, which is better suited for the engagement of MDD patients^31,33. Furthermore, neither of the studies distinguished between trait-like and state-like aspects; that is, between a baseline report on the individual's general tendencies to experience MDD symptoms, and repeated reports on the daily experience of symptoms. In addition, to our knowledge, no previous effort has been made to develop a digital ESM/EMA version of the HRSD, despite being the most commonly used MDD scale in randomized controlled trial research^7,8. Therefore, to the best of our knowledge, our project is the first attempt to create an ESM/EMA digital version of the gold standard HRSD, in order to monitor MDD symptoms during treatment, while addressing the mentioned drawbacks of the original tool.

The present study

The aim of the present study was to develop a digital tool for monitoring MDD symptoms during treatment. The development of HRSD-D, a digital image-based version of the gold standard HRSD, aims to address four drawbacks of the existing tool: low ecological validity, low efficiency, missing information due to long intervals between assessments, and lack of discrimination between state-like and trait-like aspects of symptomatology. HRSD-D collects daily real-time reports on MDD symptoms by smartphone and in everyday settings, unlike the original HRSD, which is a retrospective report based on a weekly interview in a clinical or lab setting. To improve efficiency and make HRSD-D as user-friendly as possible, we used images to report symptoms. This approach of using non-verbal content was found efficient and effective in assessing mental health symptoms and in differentiating between emotional states in clinical and non-clinical samples^30,31,32,33,40,41,42. The current study constitutes the first phase of the HRSD-D development program, and thus a prototype version was used for validation on a preclinical sample. The study focused on four main aims: (a) development of HRSD-D (two versions: HRSD-DS, state-like, and HRSD-DT, trait-like) by the selection of the images to be included; (b) validation of the two versions of HRSD-D; (c) assessment of the feasibility of HRSD-DS by examining its ability to capture the dynamic features of MDD manifestations occurring in parallel with significant stable features of symptomatology; and (d) replication in an independent sample.

General method

We developed and evaluated HRSD-D in three stages. In stage 1, we created a pool of items consisting of three potential images for each original HRSD item and asked well-trained HRSD interviewers to select the most representative image for each item. Based on the results of stage 1, we developed the prototypes of HRSD-DS and HRSD-DT. To disentangle the two components, we used two versions of the same construct with different instructions¹⁷ (e.g., STAI⁴³). In stage 2, we evaluated HRSD-D on a preclinical sample and tested its validity and feasibility using qualitative and quantitative approaches. In stage 3 we replicated the feasibility findings of stage 2, using an independent sample.

Stage 1: development of HRSD-D

We aimed to find a single representative image for each original HRSD item, to be included in HRSD-D. We focused on the first 17 items of HRSD (HRSD-17), a commonly used version in psychotherapy and psychiatric research⁴⁴.

Method

The preparatory process for stage 1 included consultations with a focus group of well-trained HRSD interviewers. Three items were discussed (retardation, agitation, and insight) because their evaluation is based on the interviewer's observation. The focus group led to the exclusion of two items, retardation and insight, which are based on the interviewer's observation, and the inclusion of one item, agitation, which is based to a larger degree on self-report. This process resulted in the inclusion of 15 items in HRSD-D. In addition, we added short titles to the images to make them easier to understand. Stage 1 included the following two phases: (a) finding three potential images for each HRSD item, and (b) selecting the most representative image for each item.

In the first phase, we looked for three potential images for each of the 15 HRSD items included in HRSD-D. To this end, we used "Thinkstock" online images (since then, moved to iStock: https://www.istockphoto.com/). In the second phase, we created an online survey, using "Google Forms," asking the respondents (n = 53) to choose one out of the three potential images for each item. The survey included animations and human images of different genders. The respondents were licensed clinical psychologists, and undergraduate and graduate students in psychology, working directly with individuals with MDD. Every HRSD item was presented in a separate block of the survey, which included the three potential images, with a title above each, and the text of the original item below them, as well as the question: "Which image do you find to be the most representative of this item?" After completing the 15 blocks of the survey, respondents were asked how well they thought the images captured the items overall. Answers were provided on a 1–5 scale. Respondents were also asked whether they thought the short titles were essential for understanding the items. At the end of the survey, respondents had the opportunity to comment in an open format. The procedure for image selection and the choice of the respondent sample size were in accordance with previous studies examining evidence-based digital image-based tools for mental health^31,45.

Results

Respondents' answers indicated that, overall, the images successfully captured the idea of the original items (M = 3.96, SD = 0.8), and 85% confirmed that the short titles were essential. For 13 out of the 15 items, a single image was selected to be included in HRSD-D, based on the majority of votes. For the remaining two items (insomnia—early in the morning and hypochondriasis) the respondents' open-format feedback indicated that none of the potential images were good enough to represent them. For these two items, a second round of the survey was conducted, using a small sample of respondents who were well-trained in HRSD administration (n = 10). Eventually, 15 images were selected to be included in HRSD-D.

Stage 2: evaluation of HRSD-D

Based on the results from stage 1, we used the Qualtrics software to construct the prototypes of the two versions of HRSD-D: HRSD-DS and HRSD-DT. At this stage, we evaluated the validity and feasibility of HRSD-D on a preclinical sample, using both qualitative and quantitative approaches. We tested convergent validity against the original HRSD. To test the feasibility of HRSD-DS, we followed Wright and Simms’s⁴⁶ analyses of the daily dynamics of personality disorders. We examined the ability of HRSD-D to capture the dynamic nature of MDD symptom manifestations, as well as significant stable features of symptomatology (levels of symptoms and of fluctuations). The feasibility of HRSD-D was assessed also based on qualitative feedback from the participants and by calculating adherence rates across the month.

Method

Participants

Fifty participants reported on their history of depression or similar mood affective disorders, such as anxiety or dysthymia. Recruitment was based on non-probabilistic convenience sampling and snowball sampling methods, which are common in pilot studies^47,48, including participation of first-year undergraduate students (see Table 1 for the demographic characteristics of the sample, and Table S6 in the online supplement for clinical characteristics).

Table 1 Demographic characteristics of the sample.

Procedure

Potential participants were asked to attend an introductory session, in which the procedure was explained in great detail. The procedure was approved by the Ethics Committee of the University of Haifa, and the experiment was performed in accordance with relevant guidelines and regulations. All participants signed an informed consent form. In the introductory session, participants completed the HRSD. Next, using their smartphones, they completed the two digital questionnaires (HRSD-DT and HRSD-DS). Participants were then instructed to complete the HRSD-DS questionnaire every day for the following 28 consecutive days, roughly at the same time of day. Every day, a link was sent to the participants’ smartphones by SMS at the same time of day, in accordance with the participants' preference and their awakening patterns. Once a week, participants underwent the original HRSD interview, on the same day of the week. The final session included a semi-structured interview regarding the user experience. Goodwin et al.⁴⁹ emphasized the important role of service users (patients) in the evaluation of mental health apps. Transforming HRSD-D into an app is one of the possible future development paths for this tool.

Measures

Hamilton rating scale for depression (HRSD-17)⁶

A 17-item clinically administered measure assessing the severity of depression. The final score, ranging from 0 to 52, is calculated by summing the 17 items. Higher scores indicate more severe depression. Interviews were conducted by one of the authors, who is highly trained and experienced in the administration and coding of the HRSD. The interviewer was blind to the HRSD-D reports of the participants until the end of the study period.

HRSD-D state-like (HRSD-DS)

A daily digital ESM assessment tool of MDD symptoms. HRSD-DS is a digital image-based version of the HRSD (HRSD-17), consisting of a single image for each of the original HRSD items (excluding insight and retardation). Each time the participants start the questionnaire, a screen with instructions is displayed, asking them to recall the preceding day, including sleep quality, activities, and emotional states. Next, 15 images are presented vertically, with a short title for each, and the question "How well does this image represent me in the past 24 h?" below the image. Participants are asked to rate every image on a scale of 1 (not at all) to 5 (very much). A scale is presented below the question and participants answer by pressing the number. HRSD-DS calculates a daily score by summing up all the 15 items (ranging from 15 to 75), with higher scores representing higher severity of MDD symptoms.
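The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and the input validation are assumptions; only the 15 items, the 1–5 rating scale, and the summed score come from the text.

```python
from typing import Sequence

N_ITEMS = 15                    # items retained in HRSD-D (insight and retardation excluded)
MIN_RATING, MAX_RATING = 1, 5   # 1 = not at all, 5 = very much


def hrsd_ds_daily_score(ratings: Sequence[int]) -> int:
    """Sum the 15 image ratings of one daily report (possible range 15-75).

    Hypothetical helper: name and validation are illustrative assumptions.
    """
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for r in ratings:
        if not MIN_RATING <= r <= MAX_RATING:
            raise ValueError(f"rating {r} is outside {MIN_RATING}-{MAX_RATING}")
    return sum(ratings)
```

Because every item uses the same 1–5 scale, each item contributes equally to the total, in contrast to the mixed item scaling of the interview-based HRSD.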

HRSD-D trait-like (HRSD-DT)

The trait-like version of HRSD-D is intended to produce a baseline measure. The questionnaire is identical to HRSD-DS, with one difference: it asks the participants to report their general tendency to experience MDD symptoms. Participants are therefore asked to recall their emotional tendencies during their adulthood. For each image, the question is formulated as follows: "How well does this image represent me in general?".

Qualitative interview

Semi-structured interview, asking participants about their experience with HRSD-DS (the version that was used repeatedly). Interviews took place at the last session and were conducted by the author. The interview focused on three main issues: (a) strengths of HRSD-DS, (b) weaknesses of HRSD-DS, and (c) key principles in developing the final tool or app. The guiding questions were as follows: How was the HRSD-DS experience? Did you have any problems with the tool? Did you find any weaknesses in the tool? What did you like about completing the questionnaire? What do you think are the key principles that need to be followed in developing the final tool? All interviews were recorded and transcribed.

Statistical Analyses

Validity of HRSD-D

The data were hierarchically nested, with assessments nested within individuals. To account for the resulting non-independence of assessments, and to prevent inflation of effects, we added the individual as a random effect to the analyses, using the SAS PROC MIXED procedure for multilevel modeling (MLM)⁵⁰. To test the validity of HRSD-DS, we investigated whether the daily HRSD-DS scores in a given week tended to covary with the weekly HRSD interview score for the same week. We conducted a series of multilevel models (MLM) to compute the correlations between the one-week averages of HRSD-DS scores and weekly HRSD scores. As for HRSD-DT, it is supposed to reflect general tendencies, and thus we examined its validity against the monthly averages. We tested the correlation between HRSD-DT scores and the average of the HRSD interviews conducted during the month. We also examined the correlation between HRSD-DT scores and the monthly average of the daily HRSD-DS scores. Post-hoc power analyses, supporting the ability of the sample size to produce accurate estimates and item-wise analysis ensuring the structural equivalence of the HRSD-D, are available in the online supplement.
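As a rough illustration of the aggregation step, the sketch below averages daily HRSD-DS scores within each week and correlates one week's averages with that week's HRSD interview scores. It is a simplification under stated assumptions: it uses a plain Pearson correlation for a single week rather than the multilevel models fitted with SAS PROC MIXED, the function names are hypothetical, and the data are invented for demonstration.

```python
from math import sqrt
from statistics import mean


def weekly_means(daily_scores, days_per_week=7):
    """Average consecutive daily scores week by week; None marks a missed day."""
    weeks = []
    for start in range(0, len(daily_scores), days_per_week):
        observed = [s for s in daily_scores[start:start + days_per_week] if s is not None]
        weeks.append(mean(observed) if observed else None)
    return weeks


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented week-1 data for four participants (not from the study).
week1_ds = [weekly_means(days)[0] for days in (
    [20, 22, 21, 19, 20, 23, 22],
    [35, 36, None, 34, 33, 35, 36],
    [28, 27, 29, 30, 28, 27, 26],
    [45, 44, 46, 47, None, 45, 44],
)]
week1_hrsd = [4, 11, 8, 17]
r = pearson_r(week1_ds, week1_hrsd)
print(f"week-1 r = {r:.2f}, shared variance = {r * r:.0%}")
```

Squaring the correlation gives the proportion of shared variance, which is how a correlation and a percentage of shared variance relate to each other.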

Daily fluctuations in MDD symptoms

To test the ability of HRSD-DS to capture the daily fluctuations in MDD symptoms, we first calculated proportions of item endorsement and descriptive statistics for each HRSD-DS item. Next, we examined the proportion of total variance in each item attributable to individual differences (between-persons variability) in contrast to daily fluctuations (within-person variability). To isolate the variance in daily expressions of MDD attributable to individual differences, we calculated the intraclass correlation coefficient (ICC) from unconditional MLMs, with HRSD-DS items as the outcomes. This measure can be interpreted as the proportion of variance at the between-persons level. Within-person variance is then calculated as 1.00 − ICC.
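The variance decomposition can be illustrated with a small sketch. Note the substitution: the study estimated ICCs from unconditional multilevel models, whereas the code below uses the one-way ANOVA estimator ICC(1), which is a common approximation for balanced data (every participant has the same number of days). The function name is hypothetical.

```python
from statistics import mean


def icc_oneway(scores_by_person):
    """ICC(1) from a one-way random-effects ANOVA (balanced data only).

    scores_by_person: one list of daily ratings of a single item per participant.
    The estimate can be negative when within-person variance dominates, and it is
    undefined (division by zero) when all ratings are identical.
    """
    n = len(scores_by_person)
    k = len(scores_by_person[0])
    if n < 2 or k < 2 or any(len(p) != k for p in scores_by_person):
        raise ValueError("need >= 2 participants with the same number (>= 2) of days")
    person_means = [mean(p) for p in scores_by_person]
    grand_mean = mean(person_means)
    # Mean square between persons and mean square within persons.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for p, m in zip(scores_by_person, person_means) for x in p
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented ratings for two participants over two days.
icc = icc_oneway([[1, 2], [3, 4]])
within_share = 1.0 - icc  # proportion attributed to daily fluctuations
```

With real, unbalanced diary data (missed days), a mixed-model estimate as used in the study would be the more appropriate choice.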

Stability of symptom levels and fluctuations

To test the ability of HRSD-DS to capture the stable features of individual symptomatology, and ascertain whether individuals maintain their relative position to each other in their level and variance⁴⁶, we investigated the stability of individual differences in average levels of symptoms and average levels of fluctuations. We divided individual time series into quarters (weeks 1, 2, 3, and 4) and calculated individual means (iMs) and individual standard deviations (iSDs) for each quarter. We then correlated the resulting iM and iSD scores across quarters. This autocorrelation represents the degree of similarity between a given quarter and a lagged version of itself over successive quarters.
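A minimal sketch of this stability analysis follows, assuming complete seven-day weeks and using the sample standard deviation. The function names are hypothetical and the approach (a plain Pearson correlation of weekly iMs or iSDs across participants) is one straightforward reading of the description, not a confirmed reproduction of the authors' code.

```python
from math import sqrt
from statistics import mean, stdev


def weekly_im_isd(daily_scores, days_per_week=7):
    """Return [(iM, iSD), ...] for each complete week of one participant's series."""
    stats = []
    for start in range(0, len(daily_scores) - days_per_week + 1, days_per_week):
        week = daily_scores[start:start + days_per_week]
        stats.append((mean(week), stdev(week)))
    return stats


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def stability(series_by_person, week_a, week_b, which):
    """Correlate iM (which=0) or iSD (which=1) between two weeks across participants."""
    per_person = [weekly_im_isd(s) for s in series_by_person]
    a = [p[week_a][which] for p in per_person]
    b = [p[week_b][which] for p in per_person]
    return pearson_r(a, b)
```

For example, `stability(series, 0, 1, 0)` would give the week 1 to week 2 correlation of symptom levels, and `stability(series, 0, 1, 1)` the corresponding correlation of fluctuation levels.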

Predicting state items based on corresponding trait items

Finally, we sought to investigate the relation between the trait-like and state-like scores. To this end, we examined whether HRSD-DT scores predict individual differences in HRSD-DS scores, using MLMs. In these models, HRSD-DS items served as the Level 1 outcomes and were regressed on HRSD-DT items adjusted for gender and age at Level 2.
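One plausible way to write such a two-level model is shown below. The notation and the random-intercept-only structure are assumptions made for illustration; the text does not give the exact specification. Here DS_{ti} is the HRSD-DS item rating of participant i on day t, and DT_i is the corresponding HRSD-DT item rating at baseline.

```latex
\begin{aligned}
\text{Level 1 (days):}\quad & \mathrm{DS}_{ti} = \beta_{0i} + e_{ti},
  \qquad e_{ti} \sim \mathcal{N}(0, \sigma^{2}) \\
\text{Level 2 (persons):}\quad & \beta_{0i} = \gamma_{00}
  + \gamma_{01}\,\mathrm{DT}_{i}
  + \gamma_{02}\,\mathrm{Gender}_{i}
  + \gamma_{03}\,\mathrm{Age}_{i}
  + u_{0i},
  \qquad u_{0i} \sim \mathcal{N}(0, \tau^{2})
\end{aligned}
```

Under this formulation, the coefficient \gamma_{01} captures how strongly the baseline trait rating predicts a participant's average daily state rating, after adjusting for gender and age.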

Results

Validity of HRSD-D

The available data suggest that the mean administration time of daily HRSD-DS reports was 94.32 s (SD = 94.44), and the adherence rate for HRSD-DS was 96.29% over the 28-day study period. The mean HRSD-D and HRSD scores as well as their mean item scores are presented in Table 2, which shows that HRSD-DS items varied across participants and measurement occasions, as indicated by their SDs. Results of the multilevel modeling analyses relating the one-week average of HRSD-DS scores to weekly HRSD scores are presented in Table 3. Over the four-week study period, the one-week average of daily HRSD-DS scores correlated significantly and positively with the HRSD score obtained at the interview conducted the same week. The correlations between the two measures were high, with the proportion of shared variance ranging from 50 to 62%. Additionally, HRSD-DT scores correlated positively and significantly with the monthly average of the HRSD scores (r = 0.66, p < 0.001), and with the monthly average of HRSD-DS scores (r = 0.82, p < 0.001).

Table 2 Mean scores and SDs of the HRSD-DT, HRSD-DS, and HRSD.

Table 3 Correlations between one-week average of HRSD-DS scores and weekly HRSD score across the four weeks of study.

Daily fluctuations in MDD symptoms

We examined the proportion of variance in daily HRSD-DS scores attributable to between-persons differences by calculating ICCs from intercept-only MLMs. ICCs for HRSD-DS items are shown in Table 4 (note that in Tables 4, 5, 6, 7, 8 the names of the items are listed according to the titles displayed in HRSD-D, not necessarily as they appear in the original HRSD). All ICCs were significant (p < 0.001). At the item level, the average ICC was 0.57 (range: 0.25–0.77). This suggests that, on average, approximately 57% of the variance in the daily manifestation of MDD symptoms can be attributed to individual differences, and the remaining 43% to daily fluctuations. At the same time, we found differences depending on the individual item. The items concerning suicidal thoughts, loss of appetite, and loss of weight had the lowest ICCs, indicating that most of the variance in their manifestations was due to daily fluctuations. Feelings of guilt, low motivation, anxiety, somatic symptoms of anxiety, low energy, low sexual desire, and hypochondriasis were associated with the largest ICCs, indicating that most of the variance in their manifestations was due to stable individual differences, rather than daily fluctuations. Table 4 also summarizes patterns of endorsement for each HRSD-DS item. The third column of the table shows that the items varied considerably in the proportion of the sample that endorsed them, ranging from 70% or more of the sample for agitation and anxiety, to only 8% for suicidal thoughts and 17% for loss of weight.

Table 4 Descriptive statistics for endorsement of daily manifestations of MDD symptoms based on HRSD-DS.

Table 5 Stability in individual level of symptoms (Mean) over 4 weeks of the assessment period.

Table 6 Stability in individual levels of fluctuations (SD) over 4 weeks of the assessment period.

Table 7 Predicting individual differences in rates of daily state items from their corresponding baseline traits.

Table 8 Summary of themes and sample responses extracted from participants' feedback.

Stability of symptom levels and fluctuations

We tested whether individual differences in average levels of MDD symptoms and in levels of daily fluctuations were stable features of the individual over the weeks. To this end, we divided the individual time series into quarters (weeks 1, 2, 3, and 4) and calculated individual means (iMs) and individual standard deviations (iSDs) for each week. We then correlated the resulting iM and iSD scores across quarters to estimate the stability of these features. Results are presented in Tables 5 and 6, respectively. A high correlation between weeks (> 0.6) suggests that individuals who showed low levels of symptoms or very little change over time in one week also showed low levels of symptoms or little change in the assessments of other weeks, respectively. Therefore, a high correlation reflects stability in the level of symptoms or in the level of change over time. On average, levels of symptoms were highly stable over weeks, and levels of fluctuations displayed moderate stability from one week to the next. Thus, the observed individual differences in mean levels of MDD symptoms and in levels of fluctuations represent distinct, stable, and meaningful patterns of symptomatology.

Predicting state items with trait corresponding items

Table 7 shows regression coefficient estimates and p values of the association between baseline trait scores (based on HRSD-DT), adjusted for age and gender, and corresponding daily state scores (based on HRSD-DS), using MLMs estimated with robust standard errors, and treating outcomes as continuously distributed. As shown, baseline trait scores were significant predictors of individual differences in state scores.

Qualitative feedback

We used thematic analysis of the transcripts. Because of their relatively short length (around 10 min on average), we followed the basic principles of inductive thematic analysis according to Braun and Clarke⁵¹: familiarizing ourselves with the data, generating initial codes, searching for themes, self-reviewing the themes, defining and naming the themes and producing the final report. Themes extracted from the semi-structured interviews were divided according to the three main topics of the interview: strengths of HRSD-DS, weaknesses of HRSD-DS, and key principles in developing the final tool. Themes are presented in Table 8.

Stage 3: replication of stage 2

To test the replicability of the findings reported in stage 2, an independent preclinical sample was used.

Method

Thirty-six participants took part in the replication stage. The procedure and the characteristics of the sample are reported in the online supplement.

Results

The findings were largely replicated, including daily fluctuations of MDD symptoms, stability of symptom levels and fluctuations, and predictions of state items based on corresponding trait items, as reported in stage 2. For further details see the online supplement. Most of the ICCs were slightly lower in the second sample, a decrease that was visible mostly for negative mood. Two items showed an increase in ICC: "loss of weight" and "suicidal thought."

Discussion

The present study sought to develop the first digital image-based version of the HRSD, the HRSD-D, an innovative tool for MDD symptom monitoring. The final version of the HRSD-D includes the HRSD-DT, a one-time baseline report on general tendencies of the individual to experience MDD symptoms, and the HRSD-DS, a daily report on the experience of symptoms, capturing daily fluctuations of symptom severity. The findings demonstrate the high feasibility of daily monitoring using the HRSD-D, with 94% of participants completing all the study measurements. HRSD-D showed promising preliminary findings regarding validity and strong correlations with the original HRSD. HRSD-DS was found to be sensitive to daily fluctuations not captured by the weekly HRSD, and the findings were replicated in an independent sample. This study provides empirical evidence of the importance of exploring changes in depressive symptoms at a higher time resolution.

HRSD-D was also able to capture both differences between individuals in MDD symptoms (HRSD-DT) and daily fluctuations within individuals (HRSD-DS). The findings suggest a high level of stability of symptoms differentiating between individuals, which may serve as a trait-like characteristic of the individual, and a moderate level of fluctuations within individuals. This is indicated by highly stable levels of symptoms over weeks, and moderate stability in levels of fluctuations from one week to the next. On average, approximately half the variance in the daily manifestation of symptoms was found to be attributed to daily fluctuations within individuals. This is consistent with previous research on daily manifestations of mental symptoms⁴⁶.

HRSD-D can provide an efficient, ecological, and fine-grained approach to research into the nature of MDD and may solve many of the drawbacks of traditional MDD symptom monitoring tools. First, HRSD-D provides more ecologically valid data and reduces reliance on memories. Weekly assessments might be inaccurate, as noted by our participants, and are especially prone to negative biases in MDD patients^52,53. Second, HRSD-D is efficient and requires less time and resources than does HRSD. This is a significant advantage considering the poor motivation that often characterizes MDD patients^14,15. From the point of view of researchers, HRSD-D provides an opportunity for assessment that does not require investing resources in the training of interviewers. Third, daily assessments provide a finer-grained clinical image than do weekly assessments¹⁶ and can support measurement-based care⁵⁴. Daily monitoring is more sensitive to the dynamic pattern of symptoms and provides more precise information, which is especially beneficial given the finding that half the variability in MDD symptom manifestations is attributed to daily fluctuations. The rich data can be used to reveal correlational and causal links between symptoms to personalize treatment^16,55. Fourth, HRSD-D may also deal with two main psychometric HRSD shortcomings¹⁰: the frequent daily measure might improve retest reliability, as short intervals between assessments were previously associated with much higher retest reliability scores⁵⁶; and the uniform scaling turns the items into equal contributors to the global score. Finally, the two versions of HRSD-D make it possible to distinguish between the stable baseline features of symptomatology—general tendencies to experience MDD symptoms (a trait-like component) and the dynamic features — the daily manifestations of symptoms (a state-like component).

The ability of HRSD-D to separate assessment of trait- and state-like components may be essential to understanding the potential role of a stable level of depression vs. the development and progress of depression over time¹⁷. At any time, the level of depression is influenced by both a constant trait and temporary changes (e.g., environmental stressors, social support, or biological dispositions⁵⁷). This description of depression is consistent with our results showing that HRSD-DT scores explain half the variance of HRSD-DS. Additional support for the diverse roles of state and trait depression can be found in studies showing that the two correlate differently with psychopathology. For example, patients diagnosed with schizoaffective disorder were found to show greater trait, but not state, depressive symptoms than a healthy control group⁵⁸.

Evaluating trait depression is also important for clinical practice because it enables evaluating baseline depression without the noise originating from temporary changes. A limitation of measures not designed to measure trait depression is their inability to evaluate baseline levels of depression from which change related to treatment can be evaluated. Baseline assessment can be influenced by the state of the patient. For example, the baseline depression of patients being evaluated after a bad day at work or a fight at home may be higher than usual, and subsequent changes cannot be attributed to treatment. The high correlation that was found between trait depression as evaluated by the HRSD-DT and trait depression as evaluated by averaging the HRSD-DS suggests that HRSD-DT indeed measures a consistent trait that is not influenced by temporary changes and therefore can accurately measure baseline depression. Finally, our sample reported both high anxiety and agitation, which may point to some overlap between the two. This overlap is consistent with previous literature indicating that these two items load on the same factor¹⁰.
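The comparison between a one-time trait report and the average of repeated daily reports can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis reported in this paper: the function names `pearson_r` and `trait_vs_mean_state`, the participant ids, and all scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trait_vs_mean_state(trait_scores, daily_scores):
    """Correlate each person's baseline trait score with the mean of
    that person's daily state scores.

    trait_scores: dict of participant id -> single baseline score.
    daily_scores: dict of participant id -> list of daily scores.
    """
    ids = sorted(trait_scores)
    trait = [trait_scores[i] for i in ids]
    mean_state = [mean(daily_scores[i]) for i in ids]
    return pearson_r(trait, mean_state)


# Hypothetical scores for three participants.
trait_scores = {"p1": 4, "p2": 10, "p3": 16}
daily_scores = {
    "p1": [3, 5, 4],
    "p2": [9, 12, 9],
    "p3": [15, 18, 15],
}
r = trait_vs_mean_state(trait_scores, daily_scores)
print(f"r = {r:.2f}")
```

Averaging the daily reports before correlating removes day-to-day fluctuation from the state measure, so a high correlation indicates that the one-time trait report tracks the person's typical level rather than the mood on the day of assessment.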

Limitations and future directions

The limitations of this study can be divided into those of the current efforts to validate HRSD-D and those applicable to HRSD-D in general. The main limitation of the current study is the use of a relatively small preclinical sample, which may affect the level of variability in some of the items (e.g., suicidal thoughts). However, the variability observed in most of the items suggests that they were sensitive enough to daily changes in symptoms even in a preclinical sample. Future studies should further explore the validity of HRSD-D with a larger sample of depressed patients. Another limitation is that we measured symptoms daily, rather than at a time resolution at which symptoms would not be expected to change (e.g., minutes). Therefore, we were not able to disentangle fluctuations within individuals from measurement error. In this study, we followed the statistical pipeline suggested by Wright and Simms⁴⁶ to test the utility of daily measurements, and the approaches of Pollak et al.³¹ and Haim et al.⁴⁵ to demonstrate the face validity of HRSD-D; future studies should complement the current findings with additional ones. Finally, although the present sample may represent the population it came from, future studies in different socio-cultural populations would be needed to further adjust the HRSD-D images, so that they both capture the content of the items of the original HRSD-D and are culturally sensitive.

The limitations of HRSD-D itself have to do with the fact that it is based on self-report and, as such, depends on the willingness of the participant to cooperate. Whereas the HRSD also includes the interviewer's viewpoint, HRSD-D is a pure self-report tool. A possible solution to this limitation may be the addition of implicit measures (e.g., audio recordings⁵⁹) to HRSD-D. Another limitation, mentioned by our participants, is the possible overwhelming effect of negative content. This limitation also emerged in previous research on mental health apps⁵⁹, pointing to the need for the inclusion of positive content.

Conclusion

HRSD-D is an innovative image-based tool for MDD symptom monitoring and, to our knowledge, the first digital version of the gold standard HRSD. Our study demonstrates the feasibility of monitoring symptoms using HRSD-D and provides promising preliminary findings regarding the validity of the data collected. The development of two HRSD-D versions (HRSD-DT, HRSD-DS), assessing the trait-like and state-like components of symptomatology, enables researchers to explore each of them separately, as well as the important interactions between them.

References

1. World Health Organization. Depression fact sheet. https://www.who.int/en/news-room/fact-sheets/detail/depression (2020).
2. Casacalenda, N., Perry, J. C. & Looper, K. Remission in major depressive disorder: a comparison of pharmacotherapy, psychotherapy, and control conditions. Am. J. Psychiatry 159, 1354–1360 (2002).
3. Rush, A. J. et al. Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report. Am. J. Psychiatry 163, 1905–1917 (2006).
4. Beck, A. T., Steer, R. A. & Brown, G. K. Manual for the Beck Depression Inventory-II. vol. 1 (The Psychological Corporation, 1996).
5. Montgomery, S. A. & Åsberg, M. A new depression scale designed to be sensitive to change. Br. J. Psychiatry 134, 382–389 (1979).
6. Hamilton, M. A rating scale for depression. J. Neurol. Neurosurg. Psychiatry 23, 56–62 (1960).
7. Lam, R. W., Michalak, E. E. & Swinson, R. P. Assessment Scales in Depression, Mania and Anxiety. (Taylor and Francis, 2006).
8. Cusin, C., Yang, H., Yeung, A. & Fava, M. Rating scales for depression. Handb. Clin. Rat. Scales Assess. Psychiatry Ment. Heal. 7–35 (2009).
9. Worboys, M. The Hamilton Rating Scale for Depression: The making of a “gold standard” and the unmaking of a chronic illness, 1960–1980. Chronic Illn. 9, 202–219 (2013).
10. Bagby, R. M., Ryder, A. G., Schuller, D. R. & Marshall, M. B. The Hamilton depression rating scale: Has the gold standard become a lead weight? Am. J. Psychiatry 161, 2163–2177 (2004).
11. Rohan, K. J. et al. A protocol for the Hamilton Rating Scale for Depression: Item scoring rules, rater training, and outcome accuracy with data on its application in a clinical trial. J. Affect. Disord. 200, 111–118 (2016).
12. Mathews, A. & MacLeod, C. Cognitive vulnerability to emotional disorders. Annu. Rev. Clin. Psychol. 1, 167–195 (2005).
13. Matt, G. E., Vázquez, C. & Campbell, W. K. Mood-congruent recall of affectively toned stimuli: A meta-analytic review. Clin. Psychol. Rev. 12, 227–255 (1992).
14. Treadway, M. T., Bossaller, N. A., Shelton, R. C. & Zald, D. H. Effort-based decision-making in major depressive disorder: a translational model of motivational anhedonia. J. Abnorm. Psychol. 121, 553–558 (2012).
15. Smith, B. Depression and motivation. Phenomenol. Cogn. Sci. 12, 615–635 (2013).
16. Fisher, A. J. Toward a dynamic model of psychological assessment: Implications for personalized care. J. Consult. Clin. Psychol. 83, 825–836 (2015).
17. Zilcha-Mano, S. Toward personalized psychotherapy: The importance of the trait-like distinction for understanding therapeutic change. Am. Psychol. 76, 516–529 (2020).
18. Zilcha-Mano, S. Major developments in methods addressing for whom psychotherapy may work and why. Psychother. Res. 29, 693–708 (2019).
19. Myin-Germeys, I. et al. Experience sampling research in psychopathology: opening the black box of daily life. Psychol. Med. 39, 1533–1547 (2009).
20. Myin-Germeys, I. et al. Experience sampling methodology in mental health research: new insights and technical developments. World Psychiatry 17, 123–132 (2018).
21. Telford, C., McCarthy-Jones, S., Corcoran, R. & Rowse, G. Experience sampling methodology studies of depression: the state of the art. Psychol. Med. 42, 1119–1129 (2012).
22. Napa Scollon, C., Prieto, C.-K. & Diener, E. Experience sampling: Promises and pitfalls, strength and weaknesses. in Assessing Well-Being 157–180 (Springer, 2009).
23. Verhagen, S. J. W., Hasmi, L., Drukker, M., van Os, J. & Delespaul, P. A. E. G. Use of the experience sampling method in the context of clinical trials. Evid. Based. Ment. Health 19, 86–89 (2016).
24. Ebner-Priemer, U. W. & Trull, T. J. Ecological momentary assessment of mood disorders and mood dysregulation. Psychol. Assess. 21, 463 (2009).
25. Miller, G. The smartphone psychology manifesto. Perspect. Psychol. Sci. 7, 221–237 (2012).
26. Van Berkel, N., Ferreira, D. & Kostakos, V. The experience sampling method on mobile devices. ACM Comput. Surv. 50, 1–40 (2017).
27. Sim, I. Mobile devices and health. N. Engl. J. Med. 381, 956–968 (2019).
28. Torous, J., Friedman, R. & Keshavan, M. Smartphone ownership and interest in mobile applications to monitor symptoms of mental health conditions. JMIR mHealth uHealth 2, e2994 (2014).
29. Yang, L. et al. Your activities of daily living (YADL): An image-based survey technique for patients with arthritis. arXiv Prepr. https://arxiv.org/abs/1601.03278 (2016).
30. Meschtscherjakov, A., Weiss, A. & Scherndl, T. Utilizing emoticons on mobile devices within ESM studies to measure emotions in the field. Proc. MME Conjunct. MobileHCI 9, 3361–3366 (2009).
31. Pollak, J. P., Adams, P. & Gay, G. PAM: a photographic affect meter for frequent, in situ measurement of affect. in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems 725–734 (2011).
32. Sobolev, M. et al. The Digital Marshmallow Test (DMT) diagnostic and monitoring mobile health app for impulsive behavior: development and validation study. JMIR mHealth uHealth 9, e25018 (2021).
33. Desmet, P. M. A., Vastenburg, M. H. & Romero, N. Mood measurement with Pick-A-Mood: review of current methods and design of a pictorial self-report scale. J. Des. Res. 14, 241–279 (2016).
34. Trull, T. J. & Ebner-Priemer, U. W. Using experience sampling methods/ecological momentary assessment (ESM/EMA) in clinical assessment and clinical research: introduction to the special section. (2009).
35. Bauer, S. & Moessner, M. Technology-enhanced monitoring in psychotherapy and e-mental health. J. Ment. Heal. 21, 355–363 (2012).
36. Van Ameringen, M., Turna, J., Khalesi, Z., Pullia, K. & Patterson, B. There is an app for that! The current state of mobile applications (apps) for DSM-5 obsessive-compulsive disorder, posttraumatic stress disorder, anxiety and mood disorders. Depress. Anxiety 34, 526–539 (2017).
37. Lui, J. H. L., Marcus, D. K. & Barry, C. T. Evidence-based apps? A review of mental health mobile applications in a psychotherapy context. Prof. Psychol. Res. Pract. 48, 199–210 (2017).
38. Torous, J. et al. Utilizing a personal smartphone custom app to assess the patient health questionnaire-9 (PHQ-9) depressive symptoms in patients with major depressive disorder. JMIR Ment. Heal. 2, e3889 (2015).
39. Chung, K. et al. Development and evaluation of a mobile-optimized daily self-rating depression screening app: A preliminary study. PLoS ONE 13, e0199118 (2018).
40. Manassis, K. et al. Mood assessment via animated characters: A novel instrument to evaluate feelings in young children with anxiety disorders. J. Clin. Child Adolesc. Psychol. 38, 380–389 (2009).
41. Laurans, G. F. G. & Desmet, P. M. A. Introducing PREMO2: New directions for the non-verbal measurement of emotion in design. in Out of Control: Proceedings of the 8th International Conference on Design and Emotion 11–14 (2012).
42. Broekens, J. & Brinkman, W.-P. Affect Button: A method for reliable and valid affective self-report. Int. J. Hum. Comput. Stud. 71, 641–667 (2013).
43. Spielberger, C. D., Gorsuch, R. L., Lushene, R., Vagg, P. R. & Jacobs, G. A. Manual for the State-Trait Anxiety Inventory (Form Y). (Mind Garden, Palo Alto, CA, 1983).
44. Carmody, T. J. et al. The Montgomery Åsberg and the Hamilton ratings of depression: a comparison of measures. Eur. Neuropsychopharmacol. 16, 601–611 (2006).
45. Haim, S. et al. The mobile photographic stress meter (MPSM): a new way to measure stress using images. in Adjunct Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers 733–742 (2015).
46. Wright, A. G. C. & Simms, L. J. Stability and fluctuation of personality disorder features in daily life. J. Abnorm. Psychol. 125, 641–656 (2016).
47. Kitchenham, B. & Pfleeger, S. L. Principles of survey research: part 5: populations and samples. ACM SIGSOFT Softw. Eng. Notes 27, 17–20 (2002).
48. Connelly, L. M. Pilot studies. Medsurg Nurs. 17, 411–412 (2008).
49. Goodwin, J., Cummins, J., Behan, L. & O’Brien, S. M. Development of a mental health smartphone app: Perspectives of mental health service users. J. Ment. Heal. 25, 434–440 (2016).
50. Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D. & Oliver, S. SAS for Mixed Models. (SAS publishing, 2006).
51. Braun, V. & Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 3, 77–101 (2006).
52. Clark, D. M. & Teasdale, J. D. Diurnal variation in clinical depression and accessibility of memories of positive and negative experiences. J. Abnorm. Psychol. 91, 87–95 (1982).
53. Greenberg, M. S. & Beck, A. T. Depression versus anxiety: a test of the content-specificity hypothesis. J. Abnorm. Psychol. 98, 9–13 (1989).
54. Scott, K. & Lewis, C. C. Using measurement-based care to enhance any treatment. Cogn. Behav. Pract. 22, 49–59 (2015).
55. Fisher, A. J. & Boswell, J. F. Enhancing the personalization of psychotherapy with dynamic assessment and modeling. Assessment 23, 496–506 (2016).
56. Trajković, G. et al. Reliability of the Hamilton rating scale for depression: A meta-analysis over a period of 49 years. Psychiatry Res. 189, 1–9 (2011).
57. Teasdale, J. D. Cognitive vulnerability to persistent depression. Cogn. Emot. 2, 247–274 (1988).
58. Chiappelli, J., Nugent, K. L., Thangavelu, K., Searcy, K. & Hong, L. E. Assessment of trait and state aspects of depression in schizophrenia. Schizophr. Bull. 40, 132–142 (2014).
59. Rickard, N., Arjmand, H.-A., Bakker, D. & Seabrook, E. Development of a mobile phone app to support self-monitoring of emotional well-being: a mental health digital innovation. JMIR Ment. Heal. 3, e6202 (2016).

Acknowledgements

This study was supported by the Israel Science Foundation (Grant no. 186/15; 395/19).

Author information

Authors and Affiliations

The Department of Psychology, University of Haifa, Mount Carmel, 31905, Haifa, Israel
Adi Berko, Avigail Bar-Sella, Hadar Fisher & Sigal Zilcha-Mano
Feinstein Institutes for Medical Research, Northwell Health, New York, USA
Michael Sobolev
Cornell Tech, Cornell University, New York, USA
Michael Sobolev & J. P. Pollak

Contributions

All the authors participated in the design and planning of the development and validation of the assessment tool described here. Specifically, S.Z.M. initiated the idea of the study, and A.B. ran the research under the supervision of S.Z.M. M.S. and J.P. advised in the process of designing and implementing the development and validation of the described tool. A.B. wrote the first draft of the manuscript. A.B.S. and H.F. contributed to the writing of the manuscript, and S.Z.M. supervised the work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sigal Zilcha-Mano.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Berko, A., Bar-Sella, A., Fisher, H. et al. Development and evaluation of the HRSD-D, an image-based digital measure of the Hamilton rating scale for depression. Sci Rep 12, 14342 (2022). https://doi.org/10.1038/s41598-022-18434-y

Received: 01 December 2021
Accepted: 11 August 2022
Published: 22 August 2022
DOI: https://doi.org/10.1038/s41598-022-18434-y
