Introduction

Social imagination often pictures the psychiatrist as a doctor with a kind of mystical power enabling direct perception of the deep, closed, mysterious human psyche. More modestly, in their day-to-day practice, psychiatrists try to apply the crucial “art of understanding”1 to address the specific needs of patients who are suffering from psychiatric illness. The clinical interview is a key component of psychiatric care, since it aims both to gain the confidence of patients and to gather critical information. It provides a clinical impression which is the cornerstone of both diagnostic and therapeutic reasoning. But like all social interactions, the impression resulting from a given interview can be misled by subtle, unconsciously perceived cues such as the presentability of the patient.

In some countries, such as France, the use of pyjamas for inpatients (mainly to prevent suicide and/or escape) is one possibly stigmatizing aspect of the management of inpatients suffering from major depressive episodes (MDE)2. Wearing a uniform was part of the norm in early “asylum care”. After World War Two, and concomitantly with the discovery of new and effective drugs, psychiatric care changed profoundly and the use of uniforms was gradually phased out. Nevertheless, many acute psychiatric in-patient units maintained a policy of placing newly admitted patients, both voluntary and involuntary, in night attire and withholding their day clothes3. Despite the absence of evidence for its usefulness, this policy is still common practice in most French psychiatric hospitals, raising ethical debates within and outside the mental health profession4. Without entering further into these debates, our clinical intuition was that presentation in blue pyjamas (i.e. blue scrubs) resulted in an exaggerated impression of severity. We call this possible bias the “blue pyjama syndrome”, modelled on the “white coat syndrome” in the context of hypertension5.

Aims of the study

In this paper, we report a study of the reliability of assessments in psychiatry. It aimed to better understand subjective measurement processes in MDE by exploring the existence of the “blue pyjama syndrome” and quantifying its impact.

Material and Methods

Eligibility

The study was conducted in the mood disorder unit of the Adult Psychiatry University Department of Rennes, France. Adult inpatients with mood disorders and a current MDE (as defined by the DSM-IV6), diagnosed using the Mini-International Neuropsychiatric Interview7 (MINI), who were able to understand the study design were eligible. Patients under guardianship or trusteeship, suffering from schizophrenia, or with a medical need to be in pyjamas were not included in the study. Eligible patients were each given an information letter describing the study design (see https://osf.io/24r7k/ for the detailed letter in French). All participants provided written informed consent to take part in the trial. All procedures contributing to this work are in accordance with the relevant guidelines and regulations. The protocol was approved by the local ethics committee (Comité d’éthique du CHU de Rennes, Rennes, France) on 05 March 2015 (Avis n° 15.15, see https://osf.io/vpu8w/).

Trial design, randomisation and masking

This was a 5-day, prospective, randomized, cross-over study performed in a single centre comparing presentation in pyjamas to presentation in day clothes. After an initial assessment of clinical (medical history, MINI) and socio-demographic data, all patients were randomly assigned to one of two sequences of two assessments: (1) Assessment in pyjamas at day one (D1) and in day clothes at day five (D5) or (2) Assessment in day clothes at D1 and in pyjamas at D5. The period of 5 days was chosen for practical reasons. This 5-day time-lapse was intended 1/ to avoid any constraint for patients (for example the need to come back to the hospital after a hospitalization), 2/ to minimize heterogeneity and 3/ to limit missing data. The investigator used closed envelopes provided by the methodologist containing the randomization status for each patient, in accordance with a computer-generated randomisation list with a 1:1 ratio.
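The 1:1 allocation to the two sequences can be illustrated with a short sketch. The text does not say how the computer-generated list was built, so the permuted blocks, the block size, the seed and the function name below are all assumptions made for illustration only.

```python
import random

# Hypothetical labels for the two sequences described in the text.
SEQUENCES = ("pyjamas_D1_clothes_D5", "clothes_D1_pyjamas_D5")


def randomisation_list(n_patients, block_size=2, seed=2015):
    """Build a 1:1 sequence list using permuted blocks.

    Blocking, block_size and seed are illustrative assumptions;
    the source only states that the list was computer-generated
    with a 1:1 ratio.
    """
    if block_size % 2:
        raise ValueError("block_size must be even to keep a 1:1 ratio")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_patients:
        block = list(SEQUENCES) * (block_size // 2)
        rng.shuffle(block)  # random order within each block
        allocation.extend(block)
    return allocation[:n_patients]


allocation = randomisation_list(26)
```

With a block size of 2 and 26 patients, each sequence is assigned exactly 13 times; in practice each entry would then be placed in its own closed envelope.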

D1 and D5 assessments were identical and based on (1) the video recording of a 5-minute standardised interview (details are described elsewhere8) and (2) the Beck Depression Inventory (BDI)9, a self-report inventory. We used our hospital blue pyjamas as the standardised intervention (as illustrated by our team in Fig. 1) and patients wore their day clothes in the control condition. All interviews were performed by the same investigator.

Figure 1

Four members of the team in pyjamas and in day clothes.

Ten psychiatrists were recruited to participate in the study. Psychiatrists from the same team as the investigators were not included in order 1/ to avoid them assessing their own patients and 2/ to be sure that they would not guess the study design. After collection of their socio-demographic characteristics, each psychiatrist was asked to rate 10 videos (or, in the case of 4 psychiatrists, 11). The videos were randomly assigned to each psychiatrist using the following rules: 1/ Each video was to be seen by two psychiatrists; 2/ Each psychiatrist was to see an imbalanced distribution (3:7 or 4:7, or 7:3 or 7:4) of patients in pyjamas and in day clothes (this device was intended to avoid drawing the psychiatrists’ attention to the study objective); 3/ Each psychiatrist was to see only one video for each patient. This second computer-generated randomisation list was prepared by the methodologist and sent to the investigator, who prepared the videos for each psychiatrist.
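The allocation rules above imply a simple rating budget, which can be checked with a few lines of arithmetic. The sketch assumes the 26 completing patients reported in the Results, and the function names are hypothetical.

```python
def ratings_needed(n_patients, videos_per_patient=2, raters_per_video=2):
    """Total number of ratings required by rule 1/ (each video seen twice)."""
    return n_patients * videos_per_patient * raters_per_video


def ratings_available(n_raters=10, n_raters_with_extra=4, base_load=10):
    """Total ratings supplied when some raters view one extra video."""
    n_base = n_raters - n_raters_with_extra
    return n_base * base_load + n_raters_with_extra * (base_load + 1)


needed = ratings_needed(26)      # 26 patients x 2 videos x 2 raters
available = ratings_available()  # 6 raters x 10 videos + 4 raters x 11 videos
```

Both quantities come to 104 ratings, which is presumably why four of the ten psychiatrists were given 11 videos rather than 10.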

Each psychiatrist was instructed in the scoring of the Clinical Global Impressions scale (CGI) and was asked to read a scoring guide, as described previously8. For each video, the psychiatrist was to answer the following question: “Considering your overall clinical experience with this type of patient, how mentally ill is the patient at this time?”.

The CGI scoring was performed using a Visual Analogue Scale (VAS) in order to collect a continuous outcome that would be easier to handle in our statistical models than the usual discrete outcomes collected with the traditional CGI. The VAS was graduated so as to present the 7 usual categories of the CGI. The use of a VAS is common in clinical research. In psychiatry in particular, it has already been used and has demonstrated good inter-rater reliability and good correlation with other depression scales10,11,12.

Importantly, the psychiatrists were not aware of the study objective (deceptive design) and were told that the purpose of this study was to improve the CGI scale with the use of a VAS. They were debriefed after study completion as to the exact objective of the study.

The study was registered with the Open Science Framework on 18 May 2016, as soon as we became aware that this framework offered the possibility of preregistration with an embargo period (registration number: osf.io/gcw9e; see https://osf.io/24r7k/ for the detailed protocol in French). Indeed, because the study design involved deceit towards psychiatrists, we did not want a preregistration to be publicly available before data collection was complete.

Statistical Analysis

The principal outcome was the difference in CGI score between the pyjama condition and the day clothes condition. Because the study was a cross-over study in the course of a one-week hospitalisation (the unit hospitalises one week at a time, which can be renewed), it was possible to assess the pre-post CGI difference as a secondary outcome. We planned this assessment in order to put the “pyjama effect” into perspective with the “one week of hospitalisation effect”. To take into account the correlated nature of the data gathered, this analysis was performed using a mixed model with the CGI score as the dependent variable and the following explanatory variables: 1/ pyjamas (yes/no) and 2/ hospitalisation (D1 or D5). The “patient” and the “psychiatrist” factors were specified as random effects. The results of this model are reported as the effects of the explanatory variables, expressed in CGI points, with their 95% confidence intervals. Inter-rater agreement was assessed using the intraclass correlation coefficient (ICC), defined as the ratio of the inter-patient variance to the sum of the inter-patient variance, the inter-rater variance and the residual variance13.
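The ICC definition given above can be written out as a minimal sketch. The function takes the three variance components, as one would read them from the random-effects part of a fitted mixed model, and returns the ratio described in the text. The function name and the example variances are illustrative assumptions and are not values from the study; the analysis code actually used is the one deposited on OSF.

```python
def intraclass_correlation(var_patient, var_rater, var_residual):
    """ICC as inter-patient variance over total variance.

    total variance = inter-patient + inter-rater + residual
    """
    total = var_patient + var_rater + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_patient / total


# Made-up variance components, chosen only to show the computation.
example_icc = intraclass_correlation(var_patient=2.0, var_rater=1.0, var_residual=1.0)
```

Under this definition, disagreement between raters lowers the ICC through both the inter-rater variance (systematic differences in rater severity) and the residual variance.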

Finally, in order to explore whether any pyjama effect evidenced on the CGI was due to a pyjama effect in the clinicians’ evaluations or whether pyjamas had a genuine effect on patients’ mood, we analysed BDI scores as another secondary outcome. We also analysed the D1-D5 difference as assessed on the BDI. These analyses were performed using a paired t-test (two-tailed, P < 0.05).

Descriptive data were summarized numerically, with mean (+/−standard deviation) for quantitative data and numbers (percentages) for categorical data. All the statistical analyses were performed with R (R Development Core Team, version 3.2.1), with the library lme414.

Because of the complex nature of the design used here, we calculated the number of subjects that would be required for a univariate analysis and considered that it would be sufficient for our multivariate analysis. We hypothesized that the mean severity as measured by the CGI VAS would be 5 points (+/−1 point) in the pyjama condition and 4 points (+/−1 point) in day clothes. On the basis of these hypotheses, the number of subjects required was 26, with 2 videos for each subject, with the alpha and beta risks set at 0.05. To avoid a saturation or fatigue effect among observers, which would have increased measurement error, we decided to ask observers to participate for no more than an estimated 2 hours (including instructions and assessment of the videos). We therefore decided that 10 psychiatrists would assess 10–11 videos.
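The figure of 26 subjects can be reproduced with the usual normal-approximation formula for comparing two means. The text does not state which formula was used, so this is only a plausible reconstruction: it assumes a two-sided alpha, treats the two conditions as independent samples (ignoring the within-patient correlation of the cross-over) and uses a hypothetical function name.

```python
from math import ceil
from statistics import NormalDist


def n_per_condition(delta, sd, alpha=0.05, beta=0.05):
    """Sample size per condition for a two-sided comparison of two means.

    n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sd / delta) ** 2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypotheses from the text: 5 vs 4 points, standard deviation of 1 point.
n_required = n_per_condition(delta=5 - 4, sd=1)
```

With these inputs the formula gives 26 observations per condition, which matches the 26 patients each filmed once in pyjamas and once in day clothes.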

Sensitivity analysis

Following a comment made during the peer review process, we performed a post-hoc sensitivity analysis to explore whether there was an interaction between the effect of hospitalisation on the rater’s perception and the effect of pyjamas on his/her perception. We fitted the same model as in our pre-specified analysis, with the addition of an interaction term. An analysis of variance and a model fit criterion (Akaike’s Information Criterion (AIC)) were used to compare the two models.

Role of the funding source

The sponsor had no role concerning the preparation, review, or approval of the manuscript.

Results

Patients and psychiatrists

From May 2015 to June 2016, a total of 52 eligible patients were screened. Twenty-two patients refused to participate in the study. For 2 patients, the second video was not useable due to technical problems, and 2 patients left the hospital before the D5 assessment (it was not possible to film them after discharge because the video recorder had to stay in the hospital). Because these problems were independent of the study, it was decided to replace these patients. Therefore 26 patients completed the 2 video assessments (see Fig. 2). The clinical and demographic characteristics of these patients are presented in Table 1. From June 2016 to July 2016, a total of 11 eligible psychiatrists were identified and 10 agreed to participate in the study (one had no time for this research). Their mean age was 43 (+/−9) years and 5 (50%) were women.

Figure 2

Study flowchart.

Table 1 Demographic and Clinical Characteristics of the 26 Patients with Major Depressive Episode (MDE). For all results, data are summarized numerically, with mean (+/−Standard Deviation) for quantitative outcomes and numbers (percentage) for categorical outcomes.

Primary outcome: analysis of the CGI

Figure 3 presents the results concerning the CGI: pyjamas significantly increased the psychiatrists’ global impression of severity by 0.65 [0.27; 1.02] points (p = 0.001). The psychiatrists’ global impressions significantly rated patients as less severe at D5 in comparison with D1 by −0.66 [−1.03; −0.29] (p < 0.001). The ICC was 0.51. Data to reproduce this analysis are available here (https://osf.io/kh28j/) and the corresponding code is available here (https://osf.io/e6zqu/). After the study, all psychiatrists who rated the videos were debriefed, and none had guessed the objective of the study.

Figure 3

Pyjama and hospitalization effects on Clinical Global Impressions (CGI). Panel A: Distribution of CGI scores in the day clothes and pyjama conditions. Data are presented for descriptive purposes only. The dots represent each value for each patient (a given patient has 2 values in each condition). Panel B: Distribution of CGI scores at Day 1 and Day 5. Data are presented for descriptive purposes only. The dots represent each value for each patient (a given patient has 2 values in each condition). Panel C: CGI analysis; forest plot of coefficients and their 95% confidence intervals observed with the mixed model (mixed model performed with the “patient” and the “psychiatrist” factors specified as random effects).

Secondary outcome

No difference was found in the analysis of the BDI self-report inventory scores between the pyjama and the day clothes conditions (mean difference = 0.69+/−8.88; p = 0.69) while the score at D1 was 5.84 (+/−6.62) points higher than the score observed at D5 (p < 0.001). These results were also observed in the multivariate model (Fig. 4). Data to reproduce these analyses are available here (https://osf.io/jdmtq/) and the code is available here (https://osf.io/e6zqu/).
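As a rough consistency check, the paired t statistics can be recomputed from the summary values reported above. The sketch assumes that the reported standard deviations are those of the within-patient differences and that all 26 patients contributed to both comparisons; neither point is stated explicitly, and the function name is hypothetical.

```python
from math import sqrt


def paired_t_statistic(mean_diff, sd_diff, n):
    """t statistic of a paired t-test from summary data (n - 1 degrees of freedom)."""
    standard_error = sd_diff / sqrt(n)
    return mean_diff / standard_error


# Summary values taken from the text, with n = 26 assumed.
t_pyjamas = paired_t_statistic(mean_diff=0.69, sd_diff=8.88, n=26)
t_days = paired_t_statistic(mean_diff=5.84, sd_diff=6.62, n=26)

# Two-sided 5% critical value of Student's t with 25 degrees of freedom.
T_CRITICAL_25_DF = 2.06
```

Under these assumptions the pyjama comparison gives a t statistic of about 0.4, well below the critical value, while the D1 versus D5 comparison gives about 4.5, in line with the reported p-values.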

Figure 4

Pyjama and hospitalization effects on the Beck Depression Inventory (BDI). Panel A: Distribution of BDI scores in the day clothes and pyjama conditions. Panel B: Distribution of BDI scores at Day 1 and Day 5. Panel C: BDI analysis; forest plot of coefficients and their 95% confidence intervals observed with a mixed model (mixed model performed with the “patient” factor specified as a random effect).

Sensitivity analysis

Figure 5 presents the results of the post-hoc sensitivity analysis. In this model, pyjamas significantly increased the psychiatrists’ global impression of severity by 1.42 [0.52; 2.31] points (p = 0.004). In addition, the psychiatrists’ global impressions did not necessarily rate patients as less severe at D5 in comparison with D1 (difference of 0.11 [−0.78; 1.01], p = 0.803). Although the result did not reach significance, the assessment rather depended on the patient’s attire at Day 5 (interaction of 1.53 [−0.08; 3.15], p = 0.075). The AIC suggested that this model fitted the data slightly better than our pre-specified model (AIC of 338.42 versus 339.93), although this improvement did not reach statistical significance (p = 0.06). Data to reproduce this analysis are available here (https://osf.io/kh28j/) and the corresponding code is available here (https://osf.io/e6zqu/).
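The two AIC values and the p-value of the model comparison can be related to each other with a short calculation. This sketch assumes that the interaction model has exactly one more parameter than the pre-specified model and that the reported p-value comes from a likelihood ratio test on 1 degree of freedom; both are assumptions, and the function names are hypothetical.

```python
from math import erfc, sqrt


def lr_statistic_from_aic(aic_reduced, aic_full, extra_params=1):
    """Likelihood ratio statistic implied by two AIC values.

    Since AIC = 2k - 2 * logLik, the statistic 2 * (logLik_full - logLik_reduced)
    equals aic_reduced - aic_full + 2 * extra_params.
    """
    return aic_reduced - aic_full + 2 * extra_params


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


# AIC values taken from the text.
lr = lr_statistic_from_aic(aic_reduced=339.93, aic_full=338.42)
p_value = chi2_upper_tail_1df(lr)
```

Under these assumptions the statistic is about 3.51 and the p-value is close to 0.06, consistent with the value reported above.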

Figure 5

Pyjama and hospitalization effects and interaction on Clinical Global Impressions (CGI). Panel A: Distribution of CGI scores at Day 1 (D1) and Day 5 (D5) in the day clothes and pyjama conditions. Data are presented for descriptive purposes only. The dots represent each value for each patient (a given patient has 2 values in each condition). Panel B: CGI analysis; forest plot of coefficients and their 95% confidence intervals observed with the mixed model including an interaction (mixed model performed with the “patient” and the “psychiatrist” factors specified as random effects). Negative values for the interaction term (Clothes/days) mean that the positive effect perceived after 5 days of hospitalization is more marked when patients are in day clothes than in pyjamas.

Discussion

Statement of principal findings

This study confirmed our first intuition concerning the “blue pyjama syndrome”. While presentation in pyjamas did not affect patients’ self-report of depression severity, it affected the clinicians’ subjective impressions, since they rated higher levels of severity for patients in pyjamas. This difference was of the same order of magnitude as the improvement observed after one week of hospitalisation, an improvement that was also confirmed by the self-report of depression severity.

Interestingly, although not statistically significant, the post-hoc sensitivity analysis suggested that this pre-post improvement was mostly perceived by psychiatrists when the patient was not dressed in pyjamas at Day 5. This result suggests stimulating hypotheses. It could be that “the blue pyjama syndrome” is more marked in less severe clinical presentations. Alternatively, it is possible that the improvement observed during hospitalization might translate into changes in presentation that could be easy to identify but are masked by the presentation in blue pyjamas. Further research is needed to confirm this possible interaction and disentangle the possible interpretations of this result.

Strengths and weaknesses of the study

While the pre-post difference could be due to various factors such as regression to the mean, spontaneous improvement, placebo effect, or, indeed, effects of the different therapeutic changes that occurred during the one-week hospitalization15, none of these factors is likely to have confounded our estimation of the ‘blue pyjama syndrome’ in the context of this randomized controlled trial. The inter-rater agreement was acceptable in our study: a value of 0.50 is usually considered fair16. Nevertheless, certain limitations of this specific design should be taken into account. First, any experiment is somewhat artificial, and even if the video CGI 1/ is validated, 2/ is considered as holistic in its approach and 3/ enables a phenomenological understanding of a given patient9, the 5-minute videoed interview cannot recreate a typical clinical encounter where the clinical impressions are built up session after session, providing material for a more comprehensive understanding. Moreover, a video could induce artefactual attitudes or coping strategies that could be different from those in a direct face-to-face interview. Furthermore, when only a brief videotaped interview is available, the clinician could be more likely to rely on non-verbal information than in a normal interview setting. Specifically, CGIs, which are focused not on symptoms but on overall functioning, may have limited use in an inpatient setting where more detailed information is usually required. On the other hand, at least in France, most clinicians do not use depression rating scales in their day-to-day clinical practice and rely rather on their impressions. For this specific reason, we think that this choice was relevant. In addition, a substantial number of eligible patients were not included in the study because they refused to participate (mainly because they were not comfortable with the idea of being videotaped).

The single ward location and the limited external validity in relation to other settings are two limitations that should be taken into account. Indeed, all participants were recruited from the same unit. In this unit patients 1/ are typically admitted for one or two weeks, 2/ are not under compulsory hospitalisation and 3/ are always in day clothes. Therefore, the patients included could represent an overly selected and homogeneous subgroup of depressed patients. One might hypothesise that an opposite trend to the “blue pyjama syndrome” might be observed in patients from a different population, such as, for example, very severely depressed patients with extreme self-neglect. In this case, the presentation in clean blue pyjamas could bias the psychiatrist’s opinion toward an underestimation of the current episode severity. Despite the artificial situation of this randomised study, it might in fact be appropriate for psychiatrists to use diagnostic information from a patient’s attire. For instance, since pyjamas are often prescribed to patients because of risk of suicide or self-harm, the use of pyjamas might in fact be related to the clinical state and would probably be relevant to consider for clinicians, especially when the only information available is a short (5 min.) videotaped interview. In other words, patients in pyjamas are likely to be seen as more severely ill because it suggests that they are too ill to look after their appearance and maintain a daily routine, or because they are severely ill or at significant risk. This might sometimes be true in a real life setting but it was nonetheless not the case in our study. Finally, the sample size in our study is small. The wide confidence intervals observed in the post-hoc sensitivity analysis suggest that the study might be underpowered to provide a refined picture of the “blue pyjama” syndrome. Therefore, our results should not be over-interpreted and, as in most studies, the main interest is rather to invite reflection among clinicians. Clinicians should be aware of this possible bias related to their assessment of a given patient’s severity.

Results in the context of the “pyjama literature”

Numerous studies have explored the possibility of a “rater bias” in randomized controlled trials. These studies explored whether blind raters were able to identify drug vs. placebo treatment in proportions that exceeded chance, for instance, because of adverse events17, 18 and how this could impact study outcomes15, 19,20,21,22,23,24. There is also a large body of literature on stigma in general, in the neurosciences25 and in psychiatry26. But to our knowledge, there is no literature about the bias resulting from the subjective perception of patients’ appearance nor specifically on the use of pyjamas.

The literature on pyjamas is indeed very sparse. A search of the PubMed database found 27 references with “pyjama OR pajama” as keywords: only two references3, 27 addressed the use of pyjamas in psychiatric hospitals, and neither was experimental. The first one is a one-page paper published in 1982 that is no longer accessible, even after contacting the author. The second is a critical analysis of this practice focused on enforced pyjama wearing3. Along these lines, Richard Lakeman suggests that this practice might be unequivocally incompatible with notions of patient-perceived recovery, and remarks that there is no evidence to suggest that it could contribute to clinical recovery. Indeed, bearing in mind the limitations we have raised, our data suggest that the use of pyjamas may interfere with clinicians’ judgment about recovery. We found no epidemiological data about their use internationally. Further searches with the keyword “clothing” found a few references focused on sociological descriptions of the effect of nurses wearing street clothes in place of uniforms28, but no reference about patients wearing pyjamas. We are thus confident that we are the first to provide experimental data in “the pyjama literature” and encourage efforts to replicate this finding, including the use of alternative and more naturalistic designs such as observational studies.

Interestingly, the “blue pyjama syndrome” touches on a crucial issue in the field of therapeutic research. There is indeed considerable debate about the small although statistically significant differences29 that are usually observed between antidepressants and placebos in MDE. Nonetheless, there is still no consensus and no convincing data to establish whether the differences are clinically meaningful or not30, 31. Let us take a concrete example. A recent meta-analysis of vortioxetine (the newest antidepressant) versus placebo, using the CGI, found a statistically significant drug-placebo disparity of 0.55 points for the 20 mg dose (smaller differences were observed for other doses)32. The efficacy of vortioxetine is thus based on evidence that is of the same order of magnitude as that for our “blue pyjama syndrome”.

Perspectives

Of course, we used the term “blue pyjama syndrome” to be thought-provoking, not to claim that we have discovered a genuinely new psychiatric “syndrome”. In fact, the interest lies not in a study of pyjama-wearing per se, but in what our study says about the reliability and validity of observer ratings for depression. It indicates that ratings of the severity of depression are liable to be influenced by superficial factors concerning the patient’s attire that do not necessarily have any relationship with the severity of the condition. We can imagine that if such ratings are affected by what the patient is wearing, they will also be affected by the patient’s background and environment and other factors that may have no relationship with the condition that is supposed to be the subject of the assessment.

The controversial Rosenhan experiment33 is an emblematic and historical example addressing the issue of reliability in psychiatric evaluation. Rosenhan and other pseudo-patients gained admission to psychiatric hospitals by briefly reporting that they had been hearing voices. After admission, they no longer reported symptoms and behaved as they ‘normally’ would. Despite this, many were treated as inpatients for substantial periods of time. Rosenhan suggested that “the hospital itself imposes a special environment in which the meaning of behavior could easily be misunderstood”. Of course, our study is not as disruptive and provocative, but, in the same spirit, it questions whether clinicians and researchers are really able to rate the severity of a patient’s condition in an unbiased, objective manner that reflects the real facts about the condition itself, or whether they are making a judgement based on preconceptions that probably reflect more about their own beliefs than the condition of the patient. This opens onto the complex issue of how we make judgements and produce quantitative data in psychiatry, and what it might really mean when we do so34, 35.