Introduction

The Edinburgh Postnatal Depression Scale (EPDS) is a questionnaire that was developed to evaluate postpartum depression1. Postpartum depression is a common mental illness that can exert harmful effects on both mothers and their offspring2,3. The prevalence of postpartum depression is approximately 10–15%4,5. The EPDS has been validated against clinical diagnosis in over 37 languages since its development6, and it is regarded as the most frequently used and well-validated screening tool for postpartum depression7,8.

Although the original purpose of EPDS was postpartum depression screening, recent studies revealed that subscales of the EPDS can be used in new ways, such as anxiety disorder screening9,10,11. Given that postpartum depression is often accompanied by anxiety3,12, this means that application of its subscales can help evaluate mothers’ mental health conditions in greater detail13. Interestingly, some researchers suggest that there is a difference in pathogenesis between postpartum depression and major depression14,15. Thus, identifying subtypes of postpartum depression, such as having anxiety symptoms or not, is also expected to help elucidate the pathogenesis of postpartum depression16. Furthermore, recent studies revealed that the subscales predict the lack of mothers’ bonding to their offspring differentially17,18. Given that bonding failure might form a hotbed of maltreatment or child abuse19, the subscales might inform parental care for beleaguered mothers.

However, despite the common understanding of its actual or potential usefulness, the weakness of the EPDS lies in its low factorial validity. That is, whereas a well-designed questionnaire should measure the same construct in various settings, there are many differing views concerning the factor structure of the EPDS. For example, the EPDS was first developed as a one-factor construct1. However, studies conducted since its development have suggested two-factor (e.g. ‘anxiety’ and ‘depression and anhedonia’ factors, or ‘anxiety and depression’ and ‘anhedonia’ factors9,20) or three-factor (e.g. ‘anxiety’, ‘depression’, and ‘anhedonia’ factors21) structures. Although no study to our knowledge has indicated a structure of four or more factors, the inconsistent factor structure of the EPDS prevents researchers and caregivers from promoting new EPDS applications.

One plausible reason for this discrepancy lies in how the factor structures were extracted. Although there are no gold standard psychometric settings for exploratory factor analysis (EFA), it is recommended (1) not to use principal component analysis (PCA) as an extraction method, (2) not to use the Kaiser criterion (eigenvalue ≥ 1) to determine the number of factors to be extracted, and (3) not to use an orthogonal rotation method (e.g. varimax, which assumes that the extracted factors are uncorrelated with one another) for rotating factors22. Additionally, a larger sample is preferable when conducting an EFA (for example, N ≥ 300 is considered ‘good’ according to Comrey and Lee23), and the goodness-of-fit of an extracted model should be examined via a confirmatory factor analysis (CFA)22. Furthermore, it is generally accepted that the factor structure should be assessed at multiple time points. However, to date, few studies have fulfilled all these criteria simultaneously21,24,25, and those that did were all from Western countries.

Therefore, we examined the factor structures of the EPDS utilising a large dataset (N > 90,000) from the nationwide birth cohort of the Japan Environment and Children’s Study (JECS). We selected this dataset because it contains EPDS data at two postpartum time points from outside Western countries (i.e. in Japan). To extract factors of the EPDS, an EFA was conducted using settings suitable for psychometrics, using neither PCA, Kaiser criterion, nor orthogonal rotation. Factor structures were obtained, and their goodness-of-fit indices were compared with those derived from previous studies that examined factors with a CFA.

Results

Data from 91,063 mothers were analysed. Their mean age was 31.3 ± 5.05 years, mean body mass index before pregnancy was 22.5 ± 3.25, 42.6% of mothers were primipara, 36.0% had less than 12 years of education, 40.0% had an annual income of less than 4 million yen, 4.3% were current smokers, 4.2% were current alcohol drinkers, and 7.9% had a past history of psychiatric illness. The prevalence of postpartum depression, defined by an EPDS score of ≥ 9 (refs. 26, 27), was 14.4% and 11.7% at 1 and 6 months, respectively.

EFA

The factor structure of the EPDS at 1 and 6 months postpartum, derived from the EFA with extraction settings for one, two, and three factors, is presented in Table 1. Means and standard deviations of each item are also shown in Table 1. Initial eigenvalues and cumulative % variance of the first three factors were 4.18 (41.8%), 1.28 (54.6%), and 1.03 (64.9%) at 1 month; and 4.13 (41.3%), 1.33 (54.6%), and 0.997 (64.6%) at 6 months, respectively (a solution explaining ≥ 50% of the total variance is considered necessary for meaningful factor analysis28). The Kaiser–Meyer–Olkin measures of sampling adequacy at 1 and 6 months were 0.849 and 0.852, respectively (≥ 0.5 suggests appropriateness for factor analysis29). Bartlett’s tests of sphericity were significant (p < 0.001) at both 1 and 6 months (significance indicates that the items are sufficiently intercorrelated, i.e. that the correlation matrix differs from an identity matrix; thus, the data are appropriate for factor analysis).
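The cumulative percentages quoted above can be reproduced from the eigenvalues alone. The following is a minimal sketch, assuming the initial eigenvalues are those of the 10 × 10 item correlation matrix (so that they sum to the number of items), which is consistent with the reported figures; the function name is illustrative only.

```python
# Reported initial eigenvalues of the first three factors (taken from the text).
# Assumption: eigenvalues of a correlation matrix of 10 items sum to 10.
N_ITEMS = 10

eigenvalues = {
    "1 month": [4.18, 1.28, 1.03],
    "6 months": [4.13, 1.33, 0.997],
}


def cumulative_variance(eigs, n_items=N_ITEMS):
    """Return the cumulative % of total variance after each successive factor."""
    cumulative, running = [], 0.0
    for eig in eigs:
        running += eig
        cumulative.append(round(100 * running / n_items, 1))
    return cumulative


for label, eigs in eigenvalues.items():
    print(label, cumulative_variance(eigs))
```

Under this assumption, each eigenvalue divided by 10 gives the share of total variance attributable to that factor, and the three-factor solution passes the 50% guideline at both time points.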

Table 1 Factor structures of the Edinburgh Postnatal Depression Scale derived from exploratory factor analyses using maximum likelihood extraction with Oblimin rotation.

In the three-factor solution, extracted factors were identical at 1 and 6 months. The ‘anxiety’ factor consisted of EPDS items 3, 4, 5, and 6; the ‘depression’ factor consisted of EPDS items 7, 9, and 10; and the ‘anhedonia’ factor consisted of EPDS items 1 and 2. Cronbach’s α for ‘anxiety’, ‘depression’, and ‘anhedonia’ were 0.75, 0.72, and 0.81 at 1 month; and 0.74, 0.72, and 0.76 at 6 months, respectively (≥ 0.7 is considered good30). Factor correlations between ‘anxiety’ and ‘depression’, ‘depression’ and ‘anhedonia’, and ‘anxiety’ and ‘anhedonia’ were 0.57, 0.54, and 0.42 at 1 month; and 0.60, 0.51, and 0.37 at 6 months, respectively.

In the two-factor solution, extracted factors were also identical at 1 and 6 months. An ‘anxiety and depression’ factor, consisting of EPDS items 3 through 10; and an ‘anhedonia’ factor, consisting of EPDS items 1 and 2, were extracted. Cronbach’s α for ‘anxiety and depression’ and ‘anhedonia’ were 0.81 and 0.81 at 1 month; and 0.82 and 0.76 at 6 months, respectively. Factor correlations between the two factors at 1 and 6 months were 0.50 and 0.45, respectively.

In the one-factor solution, all EPDS items contributed to the one-factor ‘total’. Cronbach’s αs for the total were 0.82 both at 1 and 6 months.
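For readers unfamiliar with the internal-reliability statistic reported above, the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The scores are hypothetical and are not JECS data; the analyses in this study were run in SAS, not with this code.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a subscale.

    item_scores: list of items, each a list of respondents' scores
    (all items must cover the same respondents in the same order).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total subscale score for each respondent.
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores (0-3) for a two-item subscale and five respondents.
toy_subscale = [
    [0, 1, 2, 1, 3],
    [0, 1, 1, 2, 3],
]
alpha = cronbach_alpha(toy_subscale)
print(round(alpha, 2))
```

Values of 0.7 or higher would be read as good under the guideline cited in the text.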

CFA

Figure 1 shows a three-factor path diagram that was extracted from the EFA. Table 2 shows various goodness-of-fit indices of the current three-factor model shown in Fig. 1, together with the current two-factor models, and 18 other factor models constructed identically to the one presented in Fig. 1. Goodness-of-fit indices used in the study were χ2/degree of freedom (df), adjusted goodness-of-fit index (AGFI), root mean square error of approximation (RMSEA), comparative fit index (CFI), parsimonious CFI (PCFI), Tucker-Lewis index [TLI; also known as the non-normed fit index (NNFI)], standardised root mean square residual (SRMR), and the Akaike information criterion (AIC). As a guideline, smaller χ2/df, AGFI ≥ 0.95, RMSEA ≤ 0.06, CFI ≥ 0.95, larger PCFI, TLI ≥ 0.95, SRMR ≤ 0.08, and smaller AIC are considered to indicate a good model31,32. One model yielded negative error variance on item 10. Overall, when comparing one-, two-, and three-factor models, three-factor models yielded better fit than two-factor models, and two-factor models yielded better fit than one-factor models. The current three-factor model identified in the EFA yielded acceptable goodness-of-fit; however, the current two-factor models did not attain sufficient fit.
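The guideline cut-offs listed above can be applied mechanically when reading a table of fit indices. The sketch below shows one way to do so; the index values are hypothetical and are not taken from Table 2. Indices without an absolute cut-off in the text (χ2/df, PCFI, AIC) are compared between models rather than against a threshold, so they are omitted here.

```python
# Guideline cut-offs quoted in the text; the operator says which direction is better.
CUTOFFS = {
    "AGFI": (">=", 0.95),
    "RMSEA": ("<=", 0.06),
    "CFI": (">=", 0.95),
    "TLI": (">=", 0.95),
    "SRMR": ("<=", 0.08),
}


def check_fit(indices):
    """Return {index name: True/False} for every index that has a cut-off."""
    verdict = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        if name not in indices:
            continue
        value = indices[name]
        verdict[name] = value >= cutoff if direction == ">=" else value <= cutoff
    return verdict


# Hypothetical values for illustration only (NOT results from this study).
example_model = {"AGFI": 0.97, "RMSEA": 0.05, "CFI": 0.96, "TLI": 0.94, "SRMR": 0.03}
print(check_fit(example_model))
```

Because views differ on the optimal cut-off for each index, a check of this kind is best treated as a screening aid rather than a verdict on a model.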

Figure 1

Current three-factor model of the Edinburgh Postnatal Depression Scale, with standardised parameter estimates.

Table 2 Goodness-of-fit indices of various factor models of the Edinburgh Postnatal Depression Scale.

Additional analysis

The results of the CFA with and without items 6 and 8 are shown in Table 3.

Table 3 Goodness-of-fit indices of various factor models of the Edinburgh Postnatal Depression Scale with/without items 6 and 8.

Sensitivity analysis

The results of sensitivity analyses using the complete-case dataset were not meaningfully different from those calculated using the full information maximum likelihood (FIML) dataset. In addition, the use of ordinary least squares (OLS) extraction and Promax rotation did not produce meaningful differences.

Discussion

In this study, we extracted the factor structures of the EPDS at two postpartum time points using EFA with settings suitable for psychometrics (using neither PCA, the Kaiser criterion, nor orthogonal rotation) and taking advantage of a large sample size (N > 90,000). We then compared their goodness-of-fit indices with those of various one-, two-, and three-factor solutions from previous studies by conducting CFA. The results supported the current three-factor solution derived from the EFA at both time points; specifically, ‘anxiety’ (items 3, 4, 5, and 6), ‘depression’ (items 7, 9, and 10), and ‘anhedonia’ (items 1 and 2). The current model explained about 65% of the total variance and demonstrated acceptable goodness-of-fit indices. Additionally, the model was found to be stable across time, which is consistent with results reported by Coates et al.21, who examined two antenatal and two postpartum periods.

In addition to the current model, several other models have been reported with acceptably high goodness-of-fit indices. These models are very similar to the current model. For example, Coates et al.’s21 three-factor structure [i.e. ‘anxiety’ (items 3, 4, 5, and 6), ‘depression’ (items 7, 8, 9, and 10), and ‘anhedonia’ (items 1 and 2)] would be identical to the current structure if item 8 were removed from the depression factor. Interestingly, the three-factor models by Kozinszky et al.25, Tuohy and McVey33, Kubota et al.16, Takehara et al.13, and Chiu et al.34 are similar to ours. Although the best CFI, TLI, and SRMR were attained by the three-factor model presented by Kozinszky et al.25 [i.e. ‘anxiety’ (items 4 and 5), ‘depression’ (items 8 and 9), and ‘anhedonia’ (items 1 and 2)], this model adopted only six of the 10 items and yielded the worst PCFI among all models examined. Generally, removal of items leads to improved goodness-of-fit, but at the cost of reduced content validity. This trade-off is known as the ‘bandwidth-fidelity dilemma’35. Thus, it should be noted that the best goodness-of-fit indices do not necessarily denote the best model, but rather the model that most plausibly describes the data. Taken together, we could conclude that the basic factor structure of the EPDS should be three-factor: ‘anxiety’ = items 3, 4, 5, and (6); ‘depression’ = items 7, (8), 9, and (10); and ‘anhedonia’ = items 1 and 2, where items in parentheses represent a low degree of confirmation. These findings raise questions about the one- or two-factor structure of the EPDS.

If the number of extracted factors had been determined using the Kaiser criterion in this study, the number of factors at 1 and 6 months would have been three and two, respectively. This is because the third and the fourth eigenvalues at 1 month were 1.03 and 0.72 (data not shown), respectively, whereas the second and the third eigenvalues at 6 months were 1.33 and 0.997, respectively. However, as noted by Osborne22, this criterion does not always yield the best result, as goodness-of-fit indices derived from the three-factor solution were far better than those from two-factor solutions. Thus, when using EFA, utilisation of other criteria such as a scree plot, parallel analysis, minimum average partial criteria, and goodness-of-fit indices, in addition to theoretical considerations, is recommended rather than adopting the Kaiser criterion alone22.
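The Kaiser-criterion counts in this paragraph follow directly from the eigenvalues. The sketch below is a minimal illustration that uses only the leading eigenvalues quoted in the text (with the third 6-month eigenvalue taken as reported in the EFA results); because eigenvalues are sorted in descending order, the remaining ones are smaller and cannot change the count.

```python
KAISER_THRESHOLD = 1.0  # retain factors with eigenvalue >= 1


def kaiser_factor_count(eigs, threshold=KAISER_THRESHOLD):
    """Number of factors retained under the Kaiser criterion."""
    return sum(1 for eig in eigs if eig >= threshold)


# Leading eigenvalues quoted in the text, in descending order.
leading_eigenvalues = {
    "1 month": [4.18, 1.28, 1.03, 0.72],
    "6 months": [4.13, 1.33, 0.997],
}

for label, eigs in leading_eigenvalues.items():
    print(label, kaiser_factor_count(eigs))
```

The third eigenvalue at 6 months falls only marginally below 1, which illustrates how sensitive a hard threshold can be to small differences between time points.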

The factor structure of the EPDS could plausibly depend on culture and/or language. For example, the cut-off value of the EPDS varies among countries and ranges from 9 (e.g. the Japanese version) to 13 (the original version)36. Culturally sensitive cut-off points were recommended by the EPDS developers. This difference is plausible owing to cultural variations in the expression of depressive symptoms37. For example, Japanese women are typically reluctant to disclose depressive symptoms38. Instead, they tend to express emotional problems by referring to physical problems or concern for their child, whereas the EPDS contains items on neither somatic symptoms nor childcare. This tendency is partially owing to the traditional concepts that emotions are a weakness of the mind and that enduring physical and/or psychological distress is a virtue. However, despite these variations, previous studies from the UK21; Australia24; Japan13,16; and of U.S. African American, Hispanic, and White samples34, which used EFA methods similar to ours (using neither PCA, the Kaiser criterion, nor orthogonal rotation), consistently supported a similar three-factor structure. Thus, examining cultural or linguistic differences in factor structure appropriately requires proper methodology.

Interestingly, factor correlations between ‘anxiety’, ‘depression’, and ‘anhedonia’ in the three-factor model were lower than those from previous studies21,24,39. Therefore, compared to previous studies, our result shows high discriminant validity; i.e. factors should not correlate too highly (≥ 0.85 is considered problematic40). Positively worded items in the EPDS are used only in the items constituting the ‘anhedonia’ factor, which might contribute to separating the ‘anhedonia’ factor from the ‘depression’ factor21. Such a separation arising from mixed valences of items (and response scales) has been previously reported41,42. In fact, all the two-factor models from previous studies examined here10,39,43 consist of an independent anxiety factor and a combined depression/anhedonia factor. Although our data seem to support ‘depression’ and ‘anhedonia’ being distinct factors, further studies examining this point using bifactor models42,44 are needed.

When evaluating each EPDS item, three items were notable. First, while item 6 was not problematic in this study, it has been frequently reported as cross-loaded with anxiety and depression factors10,21,33,39,44,45,46,47,48. In contrast, item 8 was deleted in the current study owing to cross-loading but has been regarded as a good item in other studies, having discriminatively high factor loadings for depression alone21. However, the same cross-loading tendency was observed in Kubota et al.16, who also used the Japanese version of the EPDS. Given Coates et al.’s21 indication that item 6 is open to interpretation, the essence of item 6 may have been altered during translation. In contrast, item 8 might reflect a characteristic of Japanese women, namely an unwillingness to express their emotions, as previously mentioned. Perhaps this is a difficult question for Japanese women to answer ‘yes’ to. Supporting this view, the raw score of item 8 (approximately 0.4) in our study was in fact far lower than that (approximately 0.8) in a previous study21, though the timing of measurement was not the same. Further studies examining this point taking advantage of translated versions of the EPDS are needed.

Second, a previous study conducted by Chiu et al.34 excluded item 10 (self-harm) before conducting analyses to avoid negative eigenvalues. They noted that the existence of a ‘quite rare’ response category sometimes yields negative eigenvalues. In fact, the answer ‘quite often’ only occurred 0.8% of the time in that study. However, inclusion of item 10 in the current analyses was not an issue even though the answer ‘quite often’ only occurred among 0.5% of our respondents, which was lower than the 0.8% rate in Chiu et al.34. Considering that suicide is a leading cause of maternal death49 and self-harm is associated with an increased risk of suicide50, the existence of EPDS item 10 is significant in and of itself51. While this was an interesting and welcomed attempt to reorganise the EPDS9,52, as it will promote a new application of EPDS and will lead to improved performance of EPDS concerning factorial validity, sensitivity, and/or specificity, item 10 should not be excluded merely because it may lead to problematic analyses.

Limitations of the current study include the following. First, we examined the factor structure of the EPDS at two postpartum time points but did not examine the factor structure antepartum. Prior studies support both differences25 and similarities10,21 between the antepartum and postpartum periods. Further studies should examine this issue. Second, in relation to the above-mentioned point, we did not measure the EPDS before 1 month or after 6 months. Third, while our sample consisted of participants in the JECS, which is a nationwide birth cohort, the extent to which our results are generalisable to other populations cannot be determined. For example, it is not clear whether the present factor structure also yields the best goodness-of-fit indices in fathers.

Despite these limitations, analyses revealed that a three-factor structure consisting of anxiety (items 3, 4, 5, and 6), depression (items 7, 9, and 10), and anhedonia (items 1 and 2) showed acceptably high goodness-of-fit, invariability across postpartum time points, sufficient explanation of total variance, and good internal reliability. The EPDS likely consists of three dimensions: anxiety, depression, and anhedonia. These findings raise questions about the one- or two-factor structure of the EPDS and may shed light on why the EPDS factor structure has been equivocal. We hope that our results will inform new usages of EPDS such as anxiety disorder screening. Further studies examining other populations and/or the antepartum period may prove fruitful.

Methods

Participants and design

Participants consisted of mothers taking part in the JECS. The JECS is an ongoing nationwide government-funded birth cohort study of various environmental factors, as well as children’s health and development. Recruitment for the study occurred across 15 regional centres, including both rural and urban locations, throughout Japan, from January 2011 to March 2014. The detailed design and baseline characteristics of the JECS cohort have been published previously53,54.

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on research involving human participants and with the Helsinki Declaration of 1975, as revised in 2008. All procedures involving human participants were approved by the Ministry of the Environment’s Institutional Review Board on Epidemiological Studies (no. 100910001), the ethics committees of all participating institutions, and the Ethics Committee of the University of Toyama (no. R2019035). Written informed consent was obtained from all participants.

This study used the dataset jecs-an-20180131, which was released in March 2018 and contains 103,062 pregnancies. To arrive at the number of unique mothers who participated for the first time, 5,647 pregnancies were excluded because of multiple registrations, 949 pregnancies were excluded because of multiple births, and 3,676 pregnancies were excluded because of miscarriage or stillbirth (Fig. 2). Among the remaining 92,790 unique mothers with singleton live births, 1,727 mothers were further excluded because of completely missing data or no response to the EPDS questionnaires administered at either 1- or 6-months postpartum. Thus, data from 91,063 mothers with singleton live births were analysed.
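The participant flow described above can be checked with simple arithmetic. The following sketch only restates the counts given in the text; the variable names are illustrative.

```python
# Counts taken from the text (dataset jecs-an-20180131).
registered_pregnancies = 103_062

exclusions = {
    "multiple registrations": 5_647,
    "multiple births": 949,
    "miscarriage or stillbirth": 3_676,
}

# Unique mothers with singleton live births.
unique_mothers = registered_pregnancies - sum(exclusions.values())

# Mothers with completely missing data or no EPDS response at 1 or 6 months.
no_epds_data = 1_727
analysed = unique_mothers - no_epds_data

print(unique_mothers, analysed)
```

The two totals agree with the 92,790 unique mothers and the 91,063 analysed mothers reported in the text.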

Figure 2

Study flow chart.

The remaining 91,063 mothers were divided into two datasets at a ratio of 1:9 (Fig. 2). This ratio was chosen so that the EFA sample would exceed the largest number of participants included in a previous EFA21,52. Stratified random sampling was used, with the 15 regional centres as the stratification variable. The first dataset (n = 9,098) was used to derive factor structures. The second dataset (n = 81,965) was used to calculate goodness-of-fit.
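A stratified 1:9 split of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run in SAS: the records, field names, seed, and rounding rule are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, stratum_of, fraction=0.1, seed=0):
    """Split records into two sets, sampling `fraction` within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for record in records:
        by_stratum[stratum_of(record)].append(record)

    first, second = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        # Assumed rounding rule: nearest integer to fraction * stratum size.
        n_first = round(len(members) * fraction)
        first.extend(members[:n_first])
        second.extend(members[n_first:])
    return first, second


# Hypothetical data: 1,500 mothers spread evenly over 15 regional centres.
mothers = [{"id": i, "centre": i % 15} for i in range(1500)]
efa_set, cfa_set = stratified_split(mothers, lambda m: m["centre"])
print(len(efa_set), len(cfa_set))
```

Sampling within each centre keeps the regional composition of the EFA and CFA datasets the same, which simple random sampling would only achieve approximately.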

Measures

A self-administered questionnaire, including the Japanese version of the EPDS26 (described below), was administered to mothers on two occasions: 1 month and 6 months after delivery. The 1-month questionnaire was distributed when participants visited the hospital at which they gave birth for their infant’s 1-month health check-up. If the questionnaires could not be collected at the time of the visit, they were returned by mail. The questionnaire at 6 months was distributed and collected via mail.

The EPDS1,6 is a 10-item self-administered questionnaire, written in a Likert-type format. Briefly, EPDS items consist of (1) laugh, (2) enjoyment, (3) self-blame, (4) anxious, (5) scared, (6) hard to cope, (7) hard to sleep, (8) sad, (9) crying, and (10) self-harm; items 1 and 2 are positively worded. Participants rate each item on a 4-point response scale (scored 0 to 3), with total scores ranging from 0 to 30.

The Japanese version of the EPDS was developed by Okano et al.26 using a back-translation technique and corresponded well with the original version in item valence and response scales; thus, we considered it a validated translated version55. It provided good internal reliability (Cronbach’s α = 0.78), test–retest reliability (r = 0.92), and an optimal cut-off score of 8/9 when screening for a clinical diagnosis of depression (75% sensitivity and 93% specificity). The 8/9 cut-off point was also validated in a study by Yamashita et al.27 and provided 82% sensitivity and 95% specificity.
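The scoring and the 8/9 cut-off can be illustrated with a short sketch. It assumes that each of the 10 items has already been coded 0 to 3 with higher values indicating greater severity (any reverse-coding of individual items is taken to have been done beforehand); the function names and example responses are hypothetical.

```python
N_ITEMS = 10
VALID_SCORES = (0, 1, 2, 3)
JAPANESE_CUTOFF = 9  # the 8/9 cut-off: a total score of 9 or more screens positive


def epds_total(item_scores):
    """Sum the 10 item scores (each 0-3) into a total score of 0-30."""
    if len(item_scores) != N_ITEMS:
        raise ValueError("the EPDS has exactly 10 items")
    if any(score not in VALID_SCORES for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def screens_positive(item_scores, cutoff=JAPANESE_CUTOFF):
    """True if the total score reaches the cut-off."""
    return epds_total(item_scores) >= cutoff


# Hypothetical responses for one participant.
responses = [1, 1, 2, 1, 1, 1, 0, 1, 1, 0]
print(epds_total(responses), screens_positive(responses))
```

A different cut-off can be passed for other language versions, since the text notes that cut-off values vary between countries.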

Statistical analyses

EFA

EPDS scores at 1 and 6 months postpartum were analysed separately using the first dataset (n = 9,098). In this analysis, extraction was set to maximum likelihood (ML) with an Oblimin oblique rotation, which was the same setting as used in a previous study that had a large sample size21. Analyses were repeated three times, setting the number of extracted factors to one, two, and three. Because no study has indicated a structure of four or more factors, we simply examined all the candidate numbers of factors by brute force. An item was assigned to a subscale when its factor loading was ≥ 0.32 for that factor29 and < 0.32 for all other factors. Missing values were handled using the FIML method. Participants with completely missing data were removed because even the FIML method cannot handle completely missing data. Owing to this deletion, the final ns were 9,038 at 1 month and 8,613 at 6 months.
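The subscale rule can be expressed compactly in code. The sketch below applies the 0.32 threshold exactly as stated in the text; the loadings are hypothetical and are not the values in Table 1, and items that fail the rule (no salient loading, or salient loadings on more than one factor) are simply listed as unassigned.

```python
LOADING_THRESHOLD = 0.32


def assign_items(loadings, threshold=LOADING_THRESHOLD):
    """Assign each item to the single factor on which it loads >= threshold.

    loadings: {item number: {factor name: loading}}
    Returns ({factor name: [items]}, [unassigned items]).
    """
    subscales, unassigned = {}, []
    for item, by_factor in loadings.items():
        salient = [name for name, value in by_factor.items() if value >= threshold]
        if len(salient) == 1:
            subscales.setdefault(salient[0], []).append(item)
        else:
            # Either no salient loading or a cross-loading on several factors.
            unassigned.append(item)
    return subscales, unassigned


# Hypothetical pattern loadings for four items (illustration only).
toy_loadings = {
    1: {"anxiety": 0.02, "depression": 0.05, "anhedonia": 0.80},
    4: {"anxiety": 0.70, "depression": 0.03, "anhedonia": 0.01},
    8: {"anxiety": 0.35, "depression": 0.45, "anhedonia": 0.10},  # cross-loads
    9: {"anxiety": 0.04, "depression": 0.66, "anhedonia": 0.02},
}
subscales, unassigned = assign_items(toy_loadings)
print(subscales, unassigned)
```

In the toy data, the item that loads on two factors is left out of every subscale, which mirrors how a cross-loading item is handled under this rule.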

CFA

Factor structures derived from the EFA, together with 18 other factor structures derived from previous studies1,9,10,13,16,21,24,25,33,34,39,43,44,45,52,56,57,58,59,60,61,62,63,64, were examined using the second dataset (n = 81,965). A previous study was selected if it was the original study1, if it had a sample size of about 300 or more, or if its final factor structure overlapped with that of a study with a sample size of about 300 or more. Candidate studies were gathered by a snowball search65 that started from the references of recent papers reviewing or examining various factor structures21,25 and then used PubMed to find more recent papers citing the studies identified. Consequently, the factor structures examined here include one-, two-, and three-factor solutions. Goodness-of-fit was evaluated in terms of χ2/df, AGFI, RMSEA, CFI, PCFI, TLI (NNFI), SRMR, and AIC. As previously mentioned, smaller χ2/df, AGFI ≥ 0.95, RMSEA ≤ 0.06, CFI ≥ 0.95, larger PCFI, TLI ≥ 0.95, SRMR ≤ 0.08, and smaller AIC represent a good model31,32. However, there are many views as to the optimal cut-off values for each index66. Missing values were handled using the FIML method.

We conducted a multiple-group analysis where each EPDS, at 1- and 6-months postpartum, forms structurally identical independent models. This analysis did not consider time-series measurement to make it possible to directly compare the present findings with previous ones. The final ns were 81,401 at 1 month and 77,239 at 6 months.

Additional analysis

Because the EFA results for items 6 and 8 were somewhat equivocal, we conducted an additional CFA in which items 6 and 8 were included or excluded.

Sensitivity analysis

Results from the complete-case analysis were compared with those from the FIML analysis to assess the differences between the strategies for addressing missing values. In addition, we used the OLS extraction method and Promax rotation in place of ML and Oblimin rotation, respectively.

All analyses were performed using SAS software (version 9.4; SAS Institute Inc., Cary, NC, USA).