Introduction

Major depressive disorder (MDD) is the leading cause of disability worldwide, affecting more than 260 million people1. The burden of MDD is rising globally, making the improvement of treatments a high priority (World Health Organization1). Although treatments are effective overall in reducing symptoms, fewer than half of patients achieve full remission2,3. One reason may lie in problems of symptom monitoring, as insensitive monitoring may impede the identification of subtle differences between individuals and between patterns of symptomatology.

Available monitoring methods can be divided into two main categories: self-report questionnaires (such as the Beck Depression Inventory, BDI; Beck et al.4) and semi-structured interviews (such as the Montgomery-Asberg Depression Rating Scale5). One of the most common tools for MDD symptom monitoring is the Hamilton Rating Scale for Depression (HRSD), a semi-structured interview containing 17 items that assess the patient's symptoms in the preceding week6. This scale is the most commonly used tool for evaluating MDD symptoms in psychotherapy and psychiatric research and is considered a gold standard7,8,9. Despite its extensive use, the HRSD has been criticized because its items contribute unequally to the global score, owing to their different scaling, and because many of its items show poor inter-rater and retest reliability10.

In addition to these psychometric drawbacks, the HRSD has some conceptual drawbacks that also characterize other traditional MDD monitoring tools. First, the HRSD has low ecological validity: the interview takes place in a clinical or lab setting, yet asks patients to report retrospectively on their everyday experiences11. Because the memory of depressed individuals is negatively biased12,13, relying on their retrospective reports is problematic. Second, the HRSD consumes considerable time and resources, as patients are asked to attend the interview weekly, and each interview may take 30 min to complete7. This procedure relies heavily on the patient's cooperation and may be very challenging given the poor motivation that usually characterizes MDD14,15. Third, the HRSD is usually administered weekly8, and assessment points are occasionally even further apart (e.g., Moran & Mohr, 2005). Such weekly (or even monthly) monitoring is partial, as a great deal of information about symptomatology patterns is missed16. Finally, the HRSD fails to distinguish between trait-like and state-like components of symptomatology, that is, between stable general features of the individual's symptomatology and the dynamic daily manifestation of symptoms. Distinguishing between them is essential for understanding mechanisms of change in treatment and for personalizing treatment17,18. These drawbacks of the HRSD create a need for additional, complementary tools.

A potential solution to the low ecological validity of the HRSD can be found in the well-supported experience sampling method (ESM) and ecological momentary assessment (EMA). ESM/EMA have already been used in research into mental conditions, including depression19,20,21. In ESM/EMA studies, people respond to repeated assessments and report their experiences while functioning in their everyday settings. This real-time assessment reduces memory biases22,23. In addition, the frequent assessments of ESM/EMA seem to better capture the dynamic pattern of symptoms, addressing another drawback of the HRSD24. As for efficiency, the integration of smartphones into ESM/EMA research has advanced the field, making it possible to create new assessment tools that are considerably more efficient and less demanding for participants than traditional ones25,26.

The use of smartphones in physical and mental health research, especially for remote monitoring, is becoming increasingly common27,28. To ensure high efficiency and keep the monitoring process user-friendly for participants, some researchers have used non-verbal (e.g., image-based) formats (e.g., for arthritis symptom monitoring29). Non-verbal digital tools have also been used to capture complex concepts, such as emotional states and mental symptoms, in clinical and non-clinical samples30,31,32. Non-verbal digital ESM/EMA tools enable quick and intuitive responses and are therefore considered better suited to populations with poor motivation, such as MDD patients31,33.

In light of these advantages, technology-based ESM/EMA (e.g., via smartphones) have spread in psychotherapy and psychiatric research as a means of monitoring treatment progress34. Yet this trend is still in its infancy, with too few evidence-based tools available34,35,36,37. Even fewer attempts have been made to develop digital ESM/EMA versions of existing MDD monitoring tools36. Two such studies developed digital versions (apps) of the Center for Epidemiologic Studies Depression Scale-Revised (CESD-R) and of the Patient Health Questionnaire-9 (PHQ-9)38,39. Both studies reported promising results in terms of validity and adherence rates. However, neither used a non-verbal format, which is better suited to engaging MDD patients31,33. Furthermore, neither distinguished between trait-like and state-like aspects, that is, between a baseline report on the individual's general tendency to experience MDD symptoms and repeated reports on the daily experience of symptoms. In addition, to our knowledge, no previous effort has been made to develop a digital ESM/EMA version of the HRSD, even though it is the most commonly used MDD scale in randomized controlled trials7,8. Therefore, to the best of our knowledge, our project is the first attempt to create a digital ESM/EMA version of the gold-standard HRSD for monitoring MDD symptoms during treatment, while addressing the drawbacks of the original tool described above.

The present study

The aim of the present study was to develop a digital tool for monitoring MDD symptoms during treatment. HRSD-D, a digital image-based version of the gold-standard HRSD, was developed to address four drawbacks of the existing tool: low ecological validity, low efficiency, missing information due to long intervals between assessments, and lack of discrimination between state-like and trait-like aspects of symptomatology. HRSD-D collects daily real-time reports of MDD symptoms by smartphone in everyday settings, unlike the original HRSD, which relies on retrospective reports given in a weekly interview in a clinical or lab setting. To improve efficiency and make HRSD-D as user-friendly as possible, we used images for reporting symptoms. Such non-verbal content has been found efficient and effective in assessing mental health symptoms and in differentiating between emotional states in clinical and non-clinical samples30,31,32,33,40,41,42. The current study constitutes the first phase of the HRSD-D development program; thus, a prototype version was validated on a preclinical sample. The study had four main aims: (a) to develop HRSD-D in two versions (HRSD-DS, state-like, and HRSD-DT, trait-like) by selecting the images to be included; (b) to validate the two versions of HRSD-D; (c) to assess the feasibility of HRSD-DS by examining its ability to capture the dynamic features of MDD manifestations alongside significant stable features of symptomatology; and (d) to replicate the findings in an independent sample.

General method

We developed and evaluated HRSD-D in three stages. In stage 1, we created a pool of items consisting of three potential images for each original HRSD item and asked well-trained HRSD interviewers to select the most representative image for each item. Based on the results of stage 1, we developed the prototypes of HRSD-DS and HRSD-DT. To disentangle the trait-like and state-like components, we used two versions of the same instrument with different instructions17, as in the State-Trait Anxiety Inventory (STAI43). In stage 2, we evaluated HRSD-D on a preclinical sample and tested its validity and feasibility using qualitative and quantitative approaches. In stage 3, we replicated the feasibility findings of stage 2 in an independent sample.

Stage 1: development of HRSD-D

We aimed to find a single representative image for each original HRSD item, to be included in HRSD-D. We focused on the first 17 items of HRSD (HRSD-17), a commonly used version in psychotherapy and psychiatric research44.

Method

The preparatory process for stage 1 included consultations with a focus group of well-trained HRSD interviewers. Three items were discussed (retardation, agitation, and insight) because their evaluation relies on the interviewer's observation. The focus group led to the exclusion of two of them, retardation and insight, which depend on the interviewer's observation, and the inclusion of the third, agitation, which depends to a larger degree on self-report. This process resulted in 15 items being included in HRSD-D. In addition, we added short titles to the images to make them easier to understand. Stage 1 included two phases: (a) finding three potential images for each HRSD item, and (b) selecting the most representative image for each item.

In the first phase, we looked for three potential images for each of the 15 HRSD items included in HRSD-D, using "Thinkstock" online images (since moved to iStock: https://www.istockphoto.com/). In the second phase, we created an online survey in "Google Forms" asking respondents (n = 53) to choose one of the three potential images for each item. The images included animations and depictions of people of different genders. The respondents were licensed clinical psychologists and undergraduate and graduate psychology students working directly with individuals with MDD. Each HRSD item was presented in a separate block of the survey, which showed the three potential images with a title above each, the text of the original item below them, and the question: "Which image do you find to be the most representative of this item?" After completing the 15 blocks, respondents rated how well they thought the images captured the items overall on a 1–5 scale. Respondents were also asked whether they thought the short titles were essential for understanding the items. At the end of the survey, respondents could comment in an open format. The image-selection procedure and the choice of sample size followed previous studies examining evidence-based digital image-based tools for mental health31,45.

Results

Respondents' answers indicated that, overall, the images successfully captured the idea of the original items (M = 3.96, SD = 0.8), and 85% confirmed that the short titles were essential. For 13 out of the 15 items, a single image was selected to be included in HRSD-D, based on the majority of votes. For the remaining two items (insomnia—early in the morning and hypochondriasis) the respondents' open-format feedback indicated that none of the potential images were good enough to represent them. For these two items, a second round of the survey was conducted, using a small sample of respondents who were well-trained in HRSD administration (n = 10). Eventually, 15 images were selected to be included in HRSD-D.
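The stage 1 selection rule (one image per item, chosen by the majority of votes) amounts to a simple tally. The sketch below is illustrative only and is not the authors' procedure: the function name `select_image`, the image labels, and the rule of returning no winner on a tie are hypothetical assumptions, and the vote counts are invented.

```python
from collections import Counter


def select_image(votes):
    """Pick the most-voted image for one HRSD item.

    votes: list of image labels (e.g., "A", "B", "C"), one per respondent.
    Returns (label, vote_share). As an assumption of this sketch, a tie for
    first place returns label None, flagging the item for another round.
    """
    if not votes:
        raise ValueError("no votes recorded")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    share = top_count / len(votes)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share
    return top_label, share


# Hypothetical votes from 53 respondents for a single item (not study data).
votes = ["A"] * 30 + ["B"] * 15 + ["C"] * 8
label, share = select_image(votes)
print(label, round(share, 2))
```

In the study itself, the second round for two items was triggered by respondents' open-format feedback rather than by vote counts, so a tie rule like this is only one possible way to flag items for review.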

Stage 2: evaluation of HRSD-D

Based on the results from stage 1, we used the Qualtrics software to construct the prototypes of the two versions of HRSD-D: HRSD-DS and HRSD-DT. At this stage, we evaluated the validity and feasibility of HRSD-D on a preclinical sample, using both qualitative and quantitative approaches. We tested convergent validity against the original HRSD. To test the feasibility of HRSD-DS, we followed Wright and Simms’s46 analyses of the daily dynamics of personality disorders, examining the ability of HRSD-D to capture the dynamic nature of MDD symptom manifestations as well as significant stable features of symptomatology (levels of symptoms and of fluctuations). The feasibility of HRSD-D was also assessed on the basis of qualitative feedback from participants and adherence rates across the month.
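Adherence across the month can be expressed as the share of scheduled daily reports that were actually submitted. The sketch below assumes that definition (one expected HRSD-DS report per participant per day over 28 days, pooled across participants); the source does not spell out the exact formula, and the function name `adherence_rate` and the participant counts are hypothetical.

```python
def adherence_rate(completed_days, n_days=28):
    """Percentage of scheduled daily HRSD-DS reports that were completed.

    completed_days: {participant_id: number of daily reports submitted}.
    Assumes each participant should submit exactly one report per day for
    n_days days; extra submissions are capped so they cannot inflate the rate.
    """
    expected = n_days * len(completed_days)
    completed = sum(min(n, n_days) for n in completed_days.values())
    return 100.0 * completed / expected


# Hypothetical counts for three participants (not study data).
print(round(adherence_rate({"p01": 28, "p02": 27, "p03": 26}), 2))  # 96.43
```

With equal numbers of scheduled days per participant, this pooled rate equals the mean of per-participant rates; the two would diverge only if schedules differed in length.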

Method

Participants

Fifty participants who reported a history of depression or related affective conditions, such as anxiety or dysthymia, took part in this stage. Recruitment was based on non-probabilistic convenience and snowball sampling, which are common in pilot studies47,48, and included first-year undergraduate students (see Table 1 for the demographic characteristics of the sample, and Table S6 in the online supplement for clinical characteristics).

Table 1 Demographic characteristics of the sample.

Procedure

Potential participants were asked to attend an introductory session, in which the procedure was explained in detail. The procedure was approved by the Ethics Committee of the University of Haifa, and the experiment was performed in accordance with relevant guidelines and regulations. All participants signed an informed consent form. In the introductory session, participants completed the HRSD and then, using their smartphones, the two digital questionnaires (HRSD-DT and HRSD-DS). Participants were then instructed to complete the HRSD-DS questionnaire every day for the following 28 consecutive days, at roughly the same time of day. Each day, a link was sent to the participants' smartphones by SMS at a fixed time chosen according to each participant's preference and waking pattern. Once a week, on the same day of the week, participants underwent the original HRSD interview. The final session included a semi-structured interview about the user experience, reflecting the important role of service users (patients) in the evaluation of mental health apps emphasized by Goodwin et al.49; transforming HRSD-D into an app is one possible future development path for this tool.

Measures

Hamilton rating scale for depression (HRSD-17)6

A 17-item clinically administered measure assessing the severity of depression. The final score, ranging from 0 to 52, is calculated by summing the 17 items. Higher scores indicate more severe depression. Interviews were conducted by one of the authors, who is highly trained and experienced in the administration and coding of the HRSD. The interviewer was blind to the HRSD-D reports of the participants until the end of the study period.

HRSD-D state-like (HRSD-DS)

A daily digital ESM assessment tool of MDD symptoms. HRSD-DS is a digital image-based version of the HRSD (HRSD-17), consisting of a single image for each of the original HRSD items (excluding insight and retardation). Each time the participants start the questionnaire, a screen with instructions is displayed, asking them to recall the preceding day, including sleep quality, activities, and emotional states. Next, 15 images are presented vertically, with a short title for each, and the question "How well does this image represent me in the past 24 h?" below the image. Participants are asked to rate every image on a scale of 1 (not at all) to 5 (very much). A scale is presented below the question and participants answer by pressing the number. HRSD-DS calculates a daily score by summing up all the 15 items (ranging from 15 to 75), with higher scores representing higher severity of MDD symptoms.
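The daily HRSD-DS total described above is a plain sum of the 15 image ratings. A minimal sketch, with input range checks added as an assumption (the function name `hrsd_ds_score` is hypothetical):

```python
def hrsd_ds_score(ratings):
    """Sum 15 HRSD-DS image ratings (each 1-5) into a daily score of 15-75."""
    if len(ratings) != 15:
        raise ValueError("HRSD-DS has 15 items")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return sum(ratings)


print(hrsd_ds_score([1] * 15))  # 15, the lowest possible daily score
print(hrsd_ds_score([5] * 15))  # 75, the highest possible daily score
```

Because every item uses the same 1-5 scale, each contributes equally to the total, unlike the mixed item scaling of the original HRSD.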

HRSD-D trait-like (HRSD-DT)

The trait-like version of HRSD-D is intended to produce a baseline measure. The questionnaire is identical to HRSD-DS, with one difference: it asks the participants to report their general tendency to experience MDD symptoms. Participants are therefore asked to recall their emotional tendencies during their adulthood. For each image, the question is formulated as follows: "How well does this image represent me in general?".

Qualitative interview

A semi-structured interview asking participants about their experience with HRSD-DS (the version that was used repeatedly). Interviews took place at the last session and were conducted by one of the authors. The interview focused on three main issues: (a) strengths of HRSD-DS, (b) weaknesses of HRSD-DS, and (c) key principles in developing the final tool or app. The guiding questions were as follows: How was the HRSD-DS experience? Did you have any problems with the tool? Did you find any weaknesses in the tool? What did you like about completing the questionnaire? What do you think are the key principles that need to be followed in developing the final tool? All interviews were recorded and transcribed.

Statistical Analyses

Validity of HRSD-D

The data were hierarchically nested, with assessments nested within individuals. To account for the resulting non-independence of assessments and to prevent inflation of effects, we included the individual as a random effect in the analyses, using the SAS PROC MIXED procedure for multilevel modeling (MLM)50. To test the validity of HRSD-DS, we investigated whether the daily HRSD-DS scores in a given week covaried with the HRSD interview score for the same week, fitting a series of MLMs to compute the correlations between one-week averages of HRSD-DS scores and weekly HRSD scores. Because HRSD-DT is intended to reflect general tendencies, we examined its validity against monthly averages: we tested the correlation between HRSD-DT scores and the average of the HRSD interviews conducted during the month, as well as the correlation between HRSD-DT scores and the monthly average of the daily HRSD-DS scores. Post-hoc power analyses supporting the ability of the sample size to produce accurate estimates, and item-wise analyses ensuring the structural equivalence of HRSD-D, are available in the online supplement.

Daily fluctuations in MDD symptoms

To test the ability of HRSD-DS to capture daily fluctuations in MDD symptoms, we first calculated proportions of item endorsement and descriptive statistics for each HRSD-DS item. Next, we examined the proportion of total variance in each item attributable to individual differences (between-persons variability) as opposed to daily fluctuations (within-person variability). To isolate the variance in daily expressions of MDD attributable to individual differences, we calculated the intraclass correlation coefficient (ICC) from unconditional (intercept-only) MLMs, with HRSD-DS items as the outcomes. The ICC is the between-persons variance divided by the total variance and can be interpreted as the proportion of variance at the between-persons level. The proportion of within-person variance is then calculated as 1 − ICC.
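In an unconditional model, the ICC is the between-person variance divided by the total variance, and 1 − ICC is the within-person share. The sketch below shows that ratio and, as a stand-in for fitting the MLM, the classic one-way ANOVA estimator ICC(1) for balanced data (equal numbers of daily reports per person). The ANOVA estimator is a swapped-in technique, not the SAS PROC MIXED estimate used in the study, and it can fall below zero when between-person differences are small; function names and data are hypothetical.

```python
from statistics import mean


def icc_from_components(tau00, sigma2):
    """ICC from variance components: between / (between + within)."""
    return tau00 / (tau00 + sigma2)


def icc1_anova(groups):
    """One-way random-effects ICC(1) for balanced data.

    groups: one list of daily scores per person, all of equal length k.
    """
    n, k = len(groups), len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("balanced data required")
    grand = mean(x for g in groups for x in g)
    person_means = [mean(g) for g in groups]
    ms_between = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, person_means)
                    for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


icc = icc_from_components(tau00=3.0, sigma2=1.0)
print(icc, 1 - icc)  # 0.75 0.25: between- vs. within-person shares

# Hypothetical item scores: two people, three days each (not study data).
print(icc1_anova([[1, 2, 3], [4, 5, 6]]))
```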

Stability of symptom levels and fluctuations

To test the ability of HRSD-DS to capture stable features of individual symptomatology, and to ascertain whether individuals maintain their position relative to each other in level and variance46, we investigated the stability of individual differences in average symptom levels and average levels of fluctuation. We divided individual time series into quarters (weeks 1, 2, 3, and 4) and calculated individual means (iMs) and individual standard deviations (iSDs) for each quarter. We then correlated the resulting iM and iSD scores across quarters. This autocorrelation represents the degree of similarity between a given quarter and a lagged version of itself over successive quarters.

Predicting state items based on corresponding trait items

Finally, we sought to investigate the relation between the trait-like and state-like scores. To this end, we examined whether HRSD-DT scores predict individual differences in HRSD-DS scores, using MLMs. In these models, HRSD-DS items served as the Level 1 outcomes and were regressed on HRSD-DT items adjusted for gender and age at Level 2.
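One way to write this model for a single item, assuming a random intercept only and no Level 1 predictors (the source does not give the equations, so the notation is ours): i indexes participants, t indexes days, DS is the daily HRSD-DS rating of the item, and DT, Gender, and Age are person-level predictors. The coefficient γ01 carries the trait-to-state association; setting γ01 = γ02 = γ03 = 0 gives the unconditional model whose ICC is τ00 / (τ00 + σ²).

```latex
\begin{aligned}
\text{Level 1:}\quad & \mathrm{DS}_{ti} = \beta_{0i} + e_{ti}, && e_{ti} \sim \mathcal{N}(0, \sigma^{2}) \\
\text{Level 2:}\quad & \beta_{0i} = \gamma_{00} + \gamma_{01}\,\mathrm{DT}_{i} + \gamma_{02}\,\mathrm{Gender}_{i} + \gamma_{03}\,\mathrm{Age}_{i} + u_{0i}, && u_{0i} \sim \mathcal{N}(0, \tau_{00})
\end{aligned}
```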

Results

Validity of HRSD-D

The available data suggest that the mean administration time of a daily HRSD-DS report was 94.32 s (SD = 94.44), and the adherence rate for HRSD-DS was 96.29% over the 28-day study period. Mean HRSD-D and HRSD scores, as well as their mean item scores, are presented in Table 2; the SDs show that HRSD-DS item scores varied across participants and measurement occasions. Results of the multilevel modeling analyses relating the one-week average of HRSD-DS scores to weekly HRSD scores are presented in Table 3. Across all four weeks of the study period, the one-week average of daily HRSD-DS scores correlated significantly and positively with the HRSD score from the interview conducted the same week. These correlations were high, with the proportion of shared variance ranging from 50% to 62%. Additionally, HRSD-DT scores correlated positively and significantly with the monthly average of the HRSD scores (r = 0.66, p < 0.001) and with the monthly average of HRSD-DS scores (r = 0.82, p < 0.001).
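Assuming the reported proportion of shared variance is the squared correlation, the figures above translate between r and r² as follows (positive correlations, rounded to two decimals):

```latex
\begin{aligned}
r^{2} \in [0.50,\ 0.62] \;&\Longleftrightarrow\; r \in [\sqrt{0.50},\ \sqrt{0.62}] \approx [0.71,\ 0.79] \\
r = 0.66 \;&\Rightarrow\; r^{2} \approx 0.44 \quad (\text{HRSD-DT vs. monthly HRSD average}) \\
r = 0.82 \;&\Rightarrow\; r^{2} \approx 0.67 \quad (\text{HRSD-DT vs. monthly HRSD-DS average})
\end{aligned}
```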

Table 2 Mean scores and SDs of the HRSD-DT, HRSD-DS, and HRSD.
Table 3 Correlations between one-week average of HRSD-DS scores and weekly HRSD score across the four weeks of study.

Daily fluctuations in MDD symptoms

We examined the proportion of variance in daily HRSD-DS scores attributable to between-persons differences by calculating ICCs from intercept-only MLMs. ICCs for HRSD-DS items are shown in Table 4 (note that in Tables 4, 5, 6, 7, and 8 the items are named according to the titles displayed in HRSD-D, which do not necessarily match the original HRSD). All ICCs were significant (p < 0.001). At the item level, the average ICC was 0.57 (range: 0.25–0.77). This suggests that, on average, approximately 57% of the variance in the daily manifestation of MDD symptoms can be attributed to individual differences, and the remaining 43% to daily fluctuations. At the same time, the ICCs differed across items. The items concerning suicidal thoughts, loss of appetite, and loss of weight had the lowest ICCs, indicating that most of the variance in their manifestations was due to daily fluctuations. Feelings of guilt, low motivation, anxiety, somatic symptoms of anxiety, low energy, low sexual desire, and hypochondriasis had the largest ICCs, indicating that most of the variance in their manifestations was due to stable individual differences rather than daily fluctuations. Table 4 also summarizes patterns of endorsement for each HRSD-DS item. As its third column shows, the items varied considerably in the proportion of the sample that endorsed them, from 70% or more for agitation and anxiety to only 17% for loss of weight and 8% for suicidal thoughts.

Table 4 Descriptive statistics for endorsement of daily manifestations of MDD symptoms based on HRSD-DS.
Table 5 Stability in individual level of symptoms (Mean) over 4 weeks of the assessment period.
Table 6 Stability in individual levels of fluctuations (SD) over 4 weeks of the assessment period.
Table 7 Predicting individual differences in rates of daily state items from their corresponding baseline traits.
Table 8 Summary of themes and sample responses extracted from participants' feedback.

Stability of symptom levels and fluctuations

We tested whether individual differences in average levels of MDD symptoms and in levels of daily fluctuation were stable features of the individual over the weeks. To this end, we divided the individual time series into quarters (weeks 1, 2, 3, and 4), calculated individual means (iMs) and individual standard deviations (iSDs) for each week, and correlated the resulting iM and iSD scores across quarters to estimate the stability of these features. Results are presented in Tables 5 and 6, respectively. A high correlation between weeks (> 0.6) indicates that individuals who showed low levels of symptoms (or very little change over time) in one week also showed low levels of symptoms (or little change) in the other weeks. A high correlation therefore reflects stability in individuals' relative standing, both in symptom level and in the amount of change over time. On average, symptom levels were highly stable across weeks, whereas levels of fluctuation showed moderate week-to-week stability. Thus, the observed individual differences in mean levels of MDD symptoms and in levels of fluctuation represent distinct, stable, and meaningful patterns of symptomatology.

Predicting state items based on corresponding trait items

Table 7 shows regression coefficient estimates and p values for the association between baseline trait scores (based on HRSD-DT), adjusted for age and gender, and the corresponding daily state scores (based on HRSD-DS), from MLMs estimated with robust standard errors and treating outcomes as continuously distributed. As shown, baseline trait scores significantly predicted individual differences in state scores.

Qualitative feedback

We used thematic analysis of the transcripts. Because the interviews were relatively short (around 10 min on average), we followed the basic principles of inductive thematic analysis according to Braun and Clarke51: familiarizing ourselves with the data, generating initial codes, searching for themes, reviewing the themes, defining and naming the themes, and producing the final report. Themes extracted from the semi-structured interviews were grouped according to the three main topics of the interview: strengths of HRSD-DS, weaknesses of HRSD-DS, and key principles in developing the final tool. Themes are presented in Table 8.

Stage 3: replication of stage 2

To test the replicability of the findings reported in stage 2, an independent preclinical sample was used.

Method

Thirty-six participants took part in the replication stage. The procedure and the characteristics of the sample are reported in the online supplement.

Results

The findings reported in stage 2 were largely replicated, including the daily fluctuations of MDD symptoms, the stability of symptom levels and fluctuations, and the prediction of state items from corresponding trait items (for further details, see the online supplement). Most ICCs were slightly lower in the second sample, a decrease visible mostly for negative mood. Two items showed an increase in ICC: "loss of weight" and "suicidal thought."

Discussion

The present study sought to develop the first digital image-based version of the HRSD, the HRSD-D, an innovative tool for MDD symptom monitoring. The final version of the HRSD-D includes the HRSD-DT, a one-time baseline report on the individual's general tendency to experience MDD symptoms, and the HRSD-DS, a daily report on the experience of symptoms that captures daily fluctuations in symptom severity. The findings demonstrate the high feasibility of daily monitoring with the HRSD-D, with 94% of participants completing all study measurements. HRSD-D showed promising preliminary evidence of validity, including strong correlations with the original HRSD. HRSD-DS was sensitive to daily fluctuations not captured by the weekly HRSD, and these findings were replicated in an independent sample. This study provides empirical evidence of the importance of examining changes in depressive symptoms at a higher time resolution.

HRSD-D was also able to capture both differences in MDD symptoms between individuals (HRSD-DT) and daily fluctuations within individuals (HRSD-DS). The findings suggest a high level of stability in the symptom levels that differentiate between individuals, which may serve as a trait-like characteristic of the individual, and a moderate level of fluctuation within individuals. This is indicated by highly stable symptom levels across weeks and moderate week-to-week stability in levels of fluctuation. On average, a little less than half (about 43%) of the variance in the daily manifestation of symptoms was attributable to daily fluctuations within individuals. This is consistent with previous research on daily manifestations of mental symptoms46.

HRSD-D can provide an efficient, ecologically valid, and fine-grained approach to research into the nature of MDD and may address many of the drawbacks of traditional MDD symptom monitoring tools. First, HRSD-D provides more ecologically valid data and reduces reliance on memory. Weekly assessments might be inaccurate, as noted by our participants, and are especially prone to negative biases in MDD patients52,53. Second, HRSD-D is efficient and requires less time and fewer resources than the HRSD. This is a significant advantage considering the poor motivation that often characterizes MDD patients14,15. For researchers, HRSD-D offers an assessment that does not require investing resources in training interviewers. Third, daily assessments provide a finer-grained clinical picture than weekly assessments16 and can support measurement-based care54. Daily monitoring is more sensitive to the dynamic pattern of symptoms and provides more precise information, which is especially beneficial given the finding that nearly half the variability in MDD symptom manifestations is attributable to daily fluctuations. The rich data can be used to reveal correlational and causal links between symptoms in order to personalize treatment16,55. Fourth, HRSD-D may also address two main psychometric shortcomings of the HRSD10: the frequent daily measurement might improve retest reliability, as short intervals between assessments have previously been associated with much higher retest reliability56; and the uniform scaling makes the items equal contributors to the global score. Finally, the two versions of HRSD-D make it possible to distinguish between the stable baseline features of symptomatology, namely general tendencies to experience MDD symptoms (a trait-like component), and the dynamic features, namely the daily manifestations of symptoms (a state-like component).

The ability of HRSD-D to assess trait-like and state-like components separately may be essential to understanding the potential role of a stable level of depression vs. the development and progress of depression over time17. At any time, the level of depression is influenced by a constant trait and by temporary changes (e.g., environmental stressors, social support, or biological dispositions57). This description of depression is consistent with our results showing that HRSD-DT scores explain half the variance of HRSD-DS. Additional support for the distinct roles of state and trait depression comes from studies showing that they correlate differently with psychopathology. For example, patients diagnosed with schizoaffective disorder were found to show greater trait, but not state, depressive symptoms than healthy controls58.

Evaluating trait depression is also important for clinical practice because it enables evaluating baseline depression without the noise originating from temporary changes. A limitation of measures not designed to assess trait depression is their inability to establish the baseline level of depression against which treatment-related change can be evaluated, because a baseline assessment can be influenced by the patient's current state. For example, the baseline depression of a patient evaluated after a bad day at work or a fight at home may be higher than usual, so subsequent changes cannot be attributed to treatment. The high correlation found between trait depression as evaluated by the HRSD-DT and trait depression evaluated by averaging the HRSD-DS suggests that the HRSD-DT indeed measures a consistent trait that is not influenced by temporary changes and can therefore accurately measure baseline depression. Finally, our sample reported both high anxiety and high agitation, which may point to some overlap between the two. This overlap is consistent with previous literature indicating that these two items load on the same factor10.

Limitations and future directions

The limitations of this study can be divided into those of the current effort to validate HRSD-D and those applicable to HRSD-D in general. The main limitation of the current study is its use of a relatively small preclinical sample, which may have limited the variability of some items (e.g., suicidal thoughts). However, the variability in most items suggests that they were sensitive enough to daily changes in symptoms even in a preclinical sample. Future studies should further explore the validity of HRSD-D in a larger sample of depressed patients. Another limitation is that we measured symptoms daily rather than at a time resolution at which symptoms would not be expected to change (e.g., minutes); we were therefore unable to disentangle within-individual fluctuations from measurement error. In this study, we followed the statistical pipeline suggested by Wright and Simms46 to test the utility of daily measurements, and the procedures of Pollak et al.31 and Haim et al.45 to demonstrate the face validity of HRSD-D; future studies should complement these findings with additional analyses. Finally, although the present sample may represent the population from which it was drawn, studies in different socio-cultural populations will be needed to further adjust the HRSD-D images so that they both capture the content of the items of the original HRSD and are culturally sensitive.

The limitations of HRSD-D itself relate to the fact that it is based on self-report and, as such, on the participant's willingness to cooperate. Whereas the HRSD also includes the interviewer's viewpoint, HRSD-D is a pure self-report tool. A possible solution may be to add implicit measures (e.g., audio recordings59) to HRSD-D. Another limitation, mentioned by our participants, is the possibly overwhelming effect of negative content. This limitation also emerged in previous research on mental health apps59, pointing to the need to include positive content.

Conclusion

HRSD-D is an innovative image-based tool for MDD symptom monitoring, and to our knowledge, the first digital version of the gold standard HRSD. Our study demonstrates the feasibility of monitoring symptoms using HRSD-D and promising preliminary findings regarding the validity of the data collected. The development of two HRSD-D versions (HRSD-DT, HRSD-DS), assessing the trait-like and state-like components of symptomatology, enables researchers to explore each of them separately, as well as the important interactions between them.