Introduction

Depression is a common psychiatric disorder affecting about 15–18% of the general population worldwide [1, 2]. According to the World Health Organization (WHO), depression has been ranked as the third leading contributor to the overall global burden of disease, as well as the most common cause of disease burden among females in low-, middle-, and high-income countries since 2004 [3]. In 2017, depression was ranked as a leading cause of disability worldwide by the WHO, with a prevalence of >427 million people of all ages globally [4].

In addition to psychological symptoms, individuals with depression have been reported to experience subsequently increased risks of various somatic diseases [5,6,7,8,9,10,11,12,13]. Longitudinal studies have reported elevated risks of asthma [5, 6, 11], diabetes [7, 13], cardiovascular disease [8, 12], Parkinson’s disease [9], and dementia [10] among individuals diagnosed with depression. More recently, an increased risk of up to 30 different medical conditions and mortality following a mood disorder diagnosis (including depression) has been reported using data from the national register of Denmark [14,15,16]. Furthermore, a causal link has been suggested between depression and multiple disease outcomes through Mendelian Randomization analyses of the UK Biobank participants [17] and other populations [18, 19]. Hence, interventions to prevent future health declines are crucial for patients with depression, in addition to psychotherapy and pharmacotherapy addressing depressive symptoms. A comprehensive analysis of disease trajectories after a diagnosis of depression might assist in the identification of key pathways linking depression with subsequent somatic problems.

Disease trajectory analysis has recently been proposed as a novel approach to explore disease progression over time [20]. By visualizing the networks of emerging diseases, that is, by studying the magnitude of disease–disease associations and the temporal order of such associations, the analysis provides a foundation for examining causal relationships and sequential patterns of multiple morbidities. To date, disease trajectory analysis has been applied successfully in studies on the identification of sequential diseases, in general, or after a diagnosis of a specific medical condition, such as breast cancer [21, 22]. However, applications of this approach to explore patterns of disease networks after a specific psychiatric disorder are scarce.

With a focus on medical conditions associated with a prior diagnosis of depression, we aimed to clarify the disease trajectory network after depression using UK Biobank data. Because the clustering of sequential and interacting diseases may imply shared etiologies or disease mechanisms, our investigation can help identify the key affected pathways leading to the deterioration in general health after depression.

Methods and materials

Study design

Details of the UK Biobank study design are described elsewhere [23]. In brief, this prospective cohort study enrolled 502,507 participants from ~9.2 million people across the UK who were invited to join the cohort between 2006 and 2010. The participants were aged 40 to 69 years at recruitment, and the majority were of white ethnicity. Baseline assessments, including a computer-assisted interview using a touch-screen questionnaire, a physical examination and collection of biological samples, were conducted with all participants in 22 assessment centers. Health-related outcomes of the participants were obtained through regular linkages to multiple national databases. Inpatient hospital data on participants in England, Scotland and Wales were updated using their respective databases: the Hospital Episode Statistics database, the Scottish Morbidity Record and the Patient Episode Database [24]. The inpatient hospital data were deemed to cover all the UK Biobank participants since 1997 [24]. Mortality data on the UK Biobank participants in England and Wales were obtained from the National Health Service (NHS) Digital, whereas the mortality data for the participants in Scotland were obtained from the NHS Central Register [24].

In the present study, based on the UK Biobank data, we excluded 14 individuals who withdrew their informed consent, leaving 502,493 individuals (229,115 males and 273,378 females) for the analysis. Individuals with depression were included in the exposed cohort on the date of their diagnosis of depression. We then randomly selected up to five unexposed individuals from the study population who were free of depression on the diagnosis date of the index patient (i.e., the index date of the unexposed individual) and individually matched them to the index patient by birth year, sex, and Townsend deprivation index (transformed into a decile variable) [25], using the method of incidence density sampling. The Townsend deprivation index was assigned to each individual based on postcode location, with a higher index score indicating a higher degree of deprivation. In a sub-analysis, the Charlson comorbidity index (CCI) on the index date was calculated for each participant as their baseline comorbidity level [26], based on UK Biobank inpatient hospital data. To minimize the influence of reverse causality and surveillance bias [27] (Supplementary Fig. 1), follow-up for all participants started from 6 months after the index date and continued until death, loss to follow-up or the end of study (December 31, 2019), whichever occurred first.
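The matching step can be illustrated with a minimal sketch of incidence density sampling. This is not the authors' code: the `Person` record, its field names, and the `sample_controls` helper are hypothetical. The sketch assumes that anyone still under follow-up and not yet diagnosed on the index date is eligible as an unexposed individual, including people who are diagnosed with depression later.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; field names are illustrative only.
    pid: int
    birth_year: int
    sex: str
    townsend_decile: int
    depression_date: Optional[date] = None  # None means never diagnosed
    exit_date: date = date(2019, 12, 31)    # death, loss to follow-up, or study end


def at_risk(p: Person, index_date: date) -> bool:
    """True if p is under follow-up and free of depression on index_date."""
    free_of_depression = p.depression_date is None or p.depression_date > index_date
    return free_of_depression and p.exit_date > index_date


def sample_controls(case: Person, cohort: List[Person], k: int = 5,
                    rng: Optional[random.Random] = None) -> List[Person]:
    """Draw up to k matched unexposed individuals for one index patient."""
    rng = rng or random.Random(0)
    index_date = case.depression_date
    match_key = (case.birth_year, case.sex, case.townsend_decile)
    pool = [
        p for p in cohort
        if p.pid != case.pid
        and (p.birth_year, p.sex, p.townsend_decile) == match_key
        and at_risk(p, index_date)
    ]
    # "Up to five": fewer are returned when the matched risk set is small.
    return rng.sample(pool, min(k, len(pool)))
```

Each sampled individual inherits the index date of the case, so a person who is unexposed at one index date can still enter the exposed cohort at a later date.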

Ascertainment of depression

We defined patients with depression as individuals admitted for inpatient hospital care with a diagnosis (either primary or secondary) of major depression (according to the International Classification of Diseases, Tenth Revision [ICD-10]: F32 and F33), based on UK Biobank inpatient hospital data between January 1, 1997 and December 31, 2019. The upper boundary was set to eliminate the potential influence of the Coronavirus Disease 2019 (COVID-19) outbreak on disease occurrence [28]. The diagnoses of depression, from the Hospital Episode Statistics database in England and the equivalent databases in Scotland and Wales, have been validated against detailed clinical evaluations, demonstrating good positive predictive value (75%) [29, 30]. As we focused on the identification of severe depression based on inpatient diagnoses, the unexposed individuals were defined as those who had no inpatient care records concerning depression at the time of the selection.

Diagnoses of other medical conditions

Diagnoses of other medical conditions were retrieved from the main and secondary diagnoses in the UK Biobank inpatient hospital data, which were also documented using the ICD-10. We excluded diagnoses related to pregnancy, childbirth, perinatal conditions, and unclassified symptoms or signs, and restricted the analyses to Chapters 1–14 and 19–20 of the ICD-10. We used the 3-digit ICD-10 codes for medical condition identification and combined conditions with clinical or biological similarities (Supplementary Data 1). For conditions identified in multiple records of the same individual, only the first record was used, and the date of that first hospital visit was taken as the date of diagnosis. The diagnoses of medical conditions in the UK inpatient hospital data have been validated, yielding an overall median diagnostic accuracy of 80.3% (interquartile range, 63.3–94.1%) comparing routinely collected data sets with case notes [31].
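As an illustration of the first-record rule, the sketch below truncates each code to its first three characters and keeps the earliest date per person and condition. The function name and the `(person_id, icd10_code, admission_date)` tuple layout are assumptions made for this example, and the grouping of related codes described in Supplementary Data 1 is not reproduced here.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

Record = Tuple[int, str, date]  # (person_id, icd10_code, admission_date), hypothetical layout


def first_diagnoses(records: Iterable[Record]) -> Dict[Tuple[int, str], date]:
    """Return the earliest admission date per (person, 3-character ICD-10 code)."""
    first: Dict[Tuple[int, str], date] = {}
    for pid, code, when in records:
        key = (pid, code[:3].upper())  # "I25.1" and "I259" both map to "I25"
        if key not in first or when < first[key]:
            first[key] = when
    return first
```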

Death

The cause of death of an individual was defined as the underlying cause of death recorded in the UK Biobank mortality data. The underlying cause of death was also coded in accordance with the ICD-10, and the causes were classified into 16 categories, mainly based on the ICD-10 chapters (Supplementary Data 1) [32].

Statistical analysis

Trajectory analyses

We used a previously proposed method to identify disease trajectories following the diagnosis of depression [20, 21]. The method includes three interrelated steps (Supplementary Fig. 2). The first step was a phenome-wide association analysis (PheWAS) using conditional Cox regression, with the aim of investigating the risks of 470 medical conditions among individuals with depression compared to a matched group of unexposed individuals. To ensure statistical power, we limited the analyses to medical conditions that occurred in at least 250 patients with depression (~1%). Only diseases with a p value below the Bonferroni-corrected threshold and a hazard ratio (HR) > 1 were considered in the second step of the analysis. In the second step, for all possible disease 1 (D1) and disease 2 (D2) pairs with a temporal order experienced by at least 125 depression patients (250/2), binomial tests were conducted to investigate whether more depression patients (>50%) had D2 diagnosed after D1 among those with both D1 and D2 diagnoses. D1 → D2 disease pairs with a binomial test p value below the Bonferroni-corrected threshold were then included in the third step of the analysis, where a nested case-control study design and conditional logistic regression were used to confirm and assess the magnitude of the associations. A detailed description is available in Supplementary Data 2.
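The directionality test in the second step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses an exact one-sided binomial test written with the standard library, it assumes patients with both diagnoses on the same date have already been excluded from the counts, and the function names, the `counts` layout, and the `alpha` argument (the corrected threshold, supplied by the caller) are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def direction_p_value(n_d1_first: int, n_d2_first: int) -> float:
    """Exact one-sided binomial p value for H0: P(D1 before D2) = 0.5."""
    n = n_d1_first + n_d2_first
    # Upper tail P(X >= n_d1_first) for X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(n_d1_first, n + 1))
    return tail / 2 ** n


def directional_pairs(counts: Dict[Pair, int], alpha: float,
                      min_patients: int = 125) -> List[Pair]:
    """Keep D1 -> D2 pairs that are frequent enough and significantly directional.

    counts[(d1, d2)] is the number of depression patients diagnosed with d1
    before d2; the reverse order is looked up under (d2, d1).
    """
    kept = []
    for (d1, d2), n_forward in counts.items():
        if n_forward < min_patients:
            continue
        n_reverse = counts.get((d2, d1), 0)
        if direction_p_value(n_forward, n_reverse) < alpha:
            kept.append((d1, d2))
    return kept
```

For example, if 200 patients had D1 before D2 and 100 had the reverse order, the pair D1 → D2 passes the minimum of 125 patients and the test, whereas D2 → D1 is dropped at the count filter.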

To investigate disease pairs leading to death (e.g., D1 → D2 → Death), we repeated steps 1–3 of the above disease trajectory analyses (Supplementary Fig. 3). We restricted the analyses to D1 → D2 → Death pairs that were experienced by at least 20 depression patients.

The networks of the disease trajectories were formed by combining disease pairs with overlapping diseases (e.g., disease pairs D1 → D2 and D2 → D3 with overlapping D2 were combined in the trajectory D1 → D2 → D3).
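The combination of overlapping pairs can be pictured as path enumeration in a directed graph. The sketch below is one possible reading of that description, not the published procedure: `build_trajectories` is a hypothetical helper that starts from diseases never appearing as D2 and follows each chain until no further pair applies, skipping any disease already on the current path so that cycles cannot loop.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def build_trajectories(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Chain directed disease pairs (D1, D2) into maximal trajectories."""
    successors = defaultdict(list)
    has_predecessor = set()
    for d1, d2 in pairs:
        successors[d1].append(d2)
        has_predecessor.add(d2)

    # Trajectories start at diseases that never appear as a D2.
    roots = [d for d in successors if d not in has_predecessor]
    trajectories: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        nexts = [d for d in successors.get(path[-1], []) if d not in path]
        if not nexts:
            trajectories.append(tuple(path))
            return
        for d in nexts:
            extend(path + [d])

    for root in roots:
        extend([root])
    return trajectories
```

With the pairs D1 → D2, D2 → D3 and D1 → D4, this yields the trajectories D1 → D2 → D3 and D1 → D4. A set of pairs that forms only a closed cycle has no starting disease and returns no trajectory in this sketch.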

Subgroup and sensitivity analyses

To investigate whether the disease trajectories after depression differed by sex, we performed separate analyses for females and males. Given the reduced sample size, we adjusted the thresholds of these analyses by multiplying the original thresholds (250 for the step 1 analysis, 125 for the step 2 analysis and 20 for the analysis investigating the disease pairs leading to death) by the proportions of these subgroups in the total population of individuals with depression (i.e., females: 65% and males: 35%). To explore whether the impact of depression on other diseases changed over time, we performed separate PheWAS (i.e., step 1) for the different follow-up periods within five years of the index date and beyond. We further stratified the PheWAS analysis by season of cohort entry (January–March, April–June, July–September, and October–December) to detect any effect of seasonal variation on the disease associations analyzed.
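The threshold scaling is simple arithmetic, shown here as a small sketch. The helper name and dictionary keys are hypothetical, and the text does not state whether the scaled values were rounded, so the sketch leaves them unrounded.

```python
from typing import Dict, Optional

BASE_THRESHOLDS = {"step1": 250, "step2": 125, "death_pairs": 20}


def subgroup_thresholds(percent: int,
                        base: Optional[Dict[str, int]] = None) -> Dict[str, float]:
    """Scale each case-count threshold by the subgroup's share (in percent)."""
    base = base or BASE_THRESHOLDS
    return {name: value * percent / 100 for name, value in base.items()}


female = subgroup_thresholds(65)  # step1 162.5, step2 81.25, death_pairs 13.0
male = subgroup_thresholds(35)    # step1 87.5, step2 43.75, death_pairs 7.0
```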

As depression may be secondary to a preceding somatic disease and a large proportion of the patients with depression were hospitalized with a main diagnosis of somatic diseases (15,602/24,130, 64.66%), which might lead to subsequent diseases independent of depression, we repeated the PheWAS after excluding individuals with a history of any disease that belonged to the same category as the outcome disease (retrieved from the inpatient hospital data), as indicated in Supplementary Data 1. In this analysis, we additionally adjusted for the CCI on the index date to further control for somatic comorbidities at baseline. For the same reason, we also conducted a sensitivity analysis restricted to a subgroup of participants (both depression patients and matched unexposed individuals) without any inpatient care records during the 6 months prior to the index date. We further tested the robustness of our results in relation to the definition of depression by ascertaining depression solely through the primary diagnosis reported in the UK Biobank inpatient hospital data. All the statistical analyses were conducted using SciPy (version 1.4.1), Statsmodels (version 0.11.1) and Lifelines (version 0.25.2) in Python 3.8. The p value for statistical significance was set to 0.05/number of analyses performed (Bonferroni corrections), to account for multiple testing.
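The Bonferroni rule stated above amounts to dividing 0.05 by the number of tests. A minimal sketch, with hypothetical function names and with the choice of which tests to count left to the caller:

```python
from typing import Dict


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag each test whose p value falls below the corrected threshold."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return {name: p < threshold for name, p in p_values.items()}
```

As an illustration only, a family of 132 tests would give a per-test threshold of 0.05/132, or about 0.00038.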

Results

Based on the community-based cohort study of UK Biobank, 24,130 individuals with depression were identified, together with 120,366 age-, sex-, and Townsend deprivation index-matched unexposed individuals without such a diagnosis. Follow-up of all participants was conducted from 6 months after the index date until death or the end of 2019, for the occurrence of 470 medical conditions and 16 specific causes of death (Fig. 1). The participants' median age was 62.0 years on the index date, and most of them were females (63.63%). The median follow-up time was 4.94 years (Table 1).

Fig. 1: Flow chart of study population selection and main analysis steps.

This figure shows the study design, the inclusion and exclusion process of study population selection, and main analysis steps.

Table 1 Basic characteristics of the depression individuals and their matched individuals without depression.

Only 132, out of all 470 studied medical conditions, involved >250 patients with depression. Among these, 129 showed a statistically significant association with a prior diagnosis of depression (Supplementary Table 1). As expected, associations were found between depression and other psychiatric conditions, including bipolar disorder (hazard ratio [HR] = 15.77, 95% confidence interval [CI], 13.18–18.87), intentional self-harm (HR = 11.57, 95% CI, 9.93–13.49), and schizophrenia (HR = 9.34, 95% CI, 7.75–11.24) (Fig. 2). Subsequent somatic diseases with the strongest associations with depression were sequelae of cerebrovascular disease (HR = 4.45, 95% CI, 3.79–5.22), pressure ulcer (HR = 3.94, 95% CI, 3.45–4.49), and ulcer of lower limb (HR = 3.73, 95% CI, 3.18–4.38).

Fig. 2: Hazard ratios (HRs) of other medical conditions among depression individuals compared to matched individuals without depression.

The X axis shows the disease categories according to ICD-10 codes A-N and S-Y. The Y axis shows the significant hazard ratios after Bonferroni correction of each medical condition when comparing depression individuals to individuals without depression. Details of the hazard ratios, number of cases, and 95% confidence intervals are listed in Supplementary Table 1.

For mortality, increased risks were present in 10 of 16 studied categories of causes of death, with the strongest association noted for death due to unnatural causes (HR = 6.42, 95% CI, 4.95–8.33) (Table 2).

Table 2 Hazard ratios (HRs) with 95% confidence intervals (CIs) of different causes of death among depression individuals compared to matched individuals without depression.

Temporal disease trajectories in individuals with depression

In total, 110 disease pairs were established among all 129 depression-related medical conditions identified in step 1 (Supplementary Fig. 2). An overview of the disease trajectories subsequent to the diagnosis of depression is presented in the Supplementary Material (Supplementary Fig. 4). Among the diseases ordered as D1, three major clusters were identified according to the similarity of their underlying affected systems or their etiologies. Cluster 1 mainly included diseases of the cardiometabolic system, where the disease tree branched out after the diagnoses of chronic ischemic heart disease, angina pectoris, primary hypertension, diabetes, and disorders of lipoprotein metabolism and other lipidaemias (Fig. 3a). Cluster 2 consisted of a group of diseases related to chronic inflammation, which first presented as the development of asthma, osteoarthritis, and other inflammatory arthritis (Fig. 3b). Unlike the other two clusters, Cluster 3 was a cluster of diseases and conditions related to tobacco abuse (Fig. 3c). All odds ratios (ORs) and numbers of cases of disease pairs in the trajectories are presented in the Supplementary Material (Supplementary Table 2).

Fig. 3: Trajectories of three main disease clusters among depression individuals.

This figure illustrates the following three subgroups of disease trajectories identified in our analysis: A, cardiometabolic diseases; B, chronic inflammatory diseases; C, tobacco abuse. The combined ICD-10 codes for the medical conditions are shown within the circles and rounded rectangles. The color of each circle represents the hazard ratio of the condition when comparing individuals with depression to matched individuals without depression. The number above the arrow connecting two circles corresponds to the number of disease pairs among individuals with depression. The color of the arrows indicates the odds ratio of the sequential association between the two medical conditions among individuals with depression.

Temporal disease trajectories leading to mortality following a diagnosis of depression

In total, 38 D1 → D2 → Death pairs were recognized based on the results of steps 1 and 2 of the analyses (Supplementary Fig. 3 and Supplementary Table 3), which constituted three networks of disease trajectories leading to deaths among depression patients (Fig. 4). The mortality trajectory of cardiovascular disease was associated with chronic ischemic heart disease, tobacco abuse, and injuries due to external causes. The mortality trajectory of diseases of the respiratory system was associated with primary hypertension, chronic obstructive pulmonary disease (COPD), and tobacco abuse, followed by pneumonia and respiratory failure. The mortality trajectory of malignant neoplasm was associated with several diseases, subsequent to anxiety, dorsalgia, tobacco abuse, and COPD.

Fig. 4: Disease trajectories leading to mortality with different death causes among depression individuals.

This figure shows the identified disease trajectories leading to CVDD (Cardiovascular disease death), RSDD (Respiratory system disease death), and MND (Malignant neoplasm death) among depression individuals. The death causes are shown within the octagons. The combined ICD-10 codes for the medical conditions are shown within the circle and round rectangle. The color of the circle and octagon represents the hazard ratios of this medical condition when comparing depression individuals to matched individuals without depression. The number above the arrow connecting two circles and circle with octagon corresponds to the number of disease pairs among depression individuals. The color of the arrows indicates the odds ratio of the sequential association between the two medical conditions among depression individuals.

Subgroup and sensitivity analyses

The separate PheWAS for females and males revealed largely similar subsequent diseases in relation to a prior diagnosis of depression, although the magnitude of the associations differed slightly. Females with depression had a higher relative risk for obesity, alcohol abuse, sleep disorder, heart failure, chronic ischemic heart disease, and chronic rheumatic heart disease. Males with depression had a higher relative risk for sepsis, malnutrition, anxiety, hypothyroid conditions, functional dyspepsia, osteoporosis, and falls (Supplementary Table 1 shows the HRs and 95% CIs). In the disease trajectory analysis, female and male depression patients had several common trajectories (Supplementary Fig. 5, Supplementary Tables 4 and 5). In addition, although both groups died from malignant neoplasms and respiratory system diseases, different progression pathways were found between the male and female patients (Supplementary Fig. 6, Supplementary Tables 6 and 7). The subgroup PheWAS for the different follow-up periods revealed associations of almost identical diseases with depression both during the first 5 years of follow-up and thereafter (Supplementary Table 8). Similarly, stratified analysis by season rendered estimates largely similar to those of the main analysis (Supplementary Table 9).

In sensitivity analyses, we obtained largely similar PheWAS results when excluding depression patients who had a history of any disease that belonged to the same category of the outcome disease (Supplementary Table 10). Also, restricting the analyses to individuals without any inpatient record of a somatic disease during the 6 months before the index date did not change the results of the main analyses substantially (Supplementary Table 11). When using only the primary diagnoses (n = 1920, 7.96%) from the UK Biobank inpatient hospital data to identify depression, we found 141 out of 470 tested medical conditions to be associated with a prior diagnosis of depression. Compared with the results of the main analysis, 111 (86.05%) significant medical conditions were identified in both analyses, although differential HRs were observed in 20 of these conditions (Supplementary Table 12).

Discussion

In this large community-based cohort study using the UK Biobank data, we found that individuals who received a diagnosis of depression through inpatient care were at a subsequently elevated risk of 129 medical conditions. Adding to the existing data that demonstrated similar temporal and causal links between depression and various health consequences, our analyses of the disease trajectories identified, for the first time, three major network-based clusters according to the similarities in the underlying affected systems or etiologies. These clusters suggest that alterations in the cardiometabolic system, a chronic inflammatory state, and behavior-related changes (i.e., tobacco abuse) may be key pathways linking depression with a wide range of other diseases and conditions downstream. In addition, we observed similar disease trajectories among males and females with depression. Given the recognizable decline in the general health of patients with depression, these findings call for future investigations of potential interventions to target these key pathways.

Although no previous study has addressed the disease trajectories after a diagnosis of depression, the results of our PheWAS gain support from Danish register-based studies, showing associations between depression and a wide range of psychiatric and somatic diseases and conditions [14, 16]. Notably, in addition to consistently reported associations with many common diseases, novel relationships between depression and subsequent individual diseases with relatively low incidence, such as sepsis, visual disturbances/blindness, and urolithiasis, were also unraveled in our analyses. More recently, a disease trajectory browser has been developed based on national data from inpatient wards, outpatient clinics, and emergency room visits between 1994 and 2018 in Denmark [22]. By searching for depression (ICD-10: F32) using the browser, some of the key diseases, such as cardiometabolic diseases, identified as occurring more-than-expected subsequent to depression in our study were found to occur more-than-expected prior to depression. These results should, however, not be compared directly because in our study we aimed to visualize physiological changes after depression by studying a cohort of patients with depression whereas the Danish work examined diseases occurring both before and after depression (and any other disease) by analyzing the entire general population of Denmark. Therefore, the inconsistent results observed between these two studies do not necessarily invalidate each other. Moreover, using Mendelian Randomization, a previous study has demonstrated causal links between depression and 20, out of 42, phenotypically associated medical conditions [17]. Given that those conditions largely overlap with the key medical conditions identified in the three disease trajectory clusters in our study, results of such analysis corroborate our findings that there are key pathways linking depression to a general health decline.

In the present study, the visualized disease trajectory networks imply the cardiometabolic system may be impacted by the occurrence of depression, with initial presentations of chronic ischemic heart disease, angina pectoris, primary hypertension, diabetes, and disorders of lipoprotein metabolism and other lipidaemias. This is in line with the results of previous phenotypic studies linking depression with major types of cardiometabolic diseases, such as chronic ischemic heart disease [33], stroke [14, 34], heart failure [14], diabetes [7, 13], metabolic syndrome [35], and other non-specific heart diseases [36, 37]. Also, Mendelian Randomization analyses have suggested a causal association between depression and ischemic heart disease and lipid metabolism, independent of shared environmental and behavioral factors or medication use [17,18,19]. Other possible underlying mechanisms include sympathetic activation, hypothalamic–pituitary activation, and the chronic status of inflammation observed among individuals with depression, which may accelerate the development or progression of cardiometabolic diseases [38,39,40,41,42,43,44]. With further connections to a wide spectrum of diseases after the occurrence of cardiometabolic diseases in the disease trajectory networks, this pathway may be a promising target for the health promotion of depression patients.

We found an increased risk of asthma, followed by osteoarthritis and other inflammatory arthritis, after a depression diagnosis in the present study. Similar results were reported in a meta-analysis of six prospective studies with a total of 83,684 depression patients, indicating an increased risk (43%) of asthma after depression [11]. Likewise, a recent Mendelian randomization analysis provided evidence supporting a causal link between depression and asthma and osteoarthrosis [17]. As these diseases have key inflammatory mechanisms in common, the observed associations support the notion that depression may lead to exaggerated or prolonged inflammatory responses, as reported in the research literature on both animals and humans [45]. As a result, anti-inflammatory treatments may have the potential to prevent a general health decline after depression.

Behavior-related factors, mainly the tobacco abuse that often co-occurs and follows a diagnosis of depression, have been proposed as possible underlying mechanisms linking depression with somatic problems [46,47,48,49]. Our analyses identified an array of diseases, including diseases of the respiratory system and infectious diseases, subsequent to tobacco abuse, underscoring the necessity for enhanced interventions for smoking cessation among patients with depression.

The major strengths of our study include the application of the disease trajectory analysis to visualize the associated disease networks after depression. By illustrating a whole picture of temporal disease progression patterns, this discovery-driven analysis complements the association analyses of earlier studies, which often examined a single disease pair with a specific hypothesis. However, the nature of this analytic strategy prevents us from discovering novel disease pairs involving rare medical conditions. Additional targeted studies are, therefore, needed to examine rarer comorbidities of depression, as a complement to the present approach. Regardless, the disease-trajectory approach provides important additional data to the existing literature by clarifying the clusters and the temporal order of these subsequent diseases. The identification of common disease networks can, in turn, assist in the exploration of key physiological pathways leading to a general health decline after depression. Other strengths include the use of a large community-based cohort, where complete data on disease diagnoses were collected prospectively through linkages with national health registers.

Notable limitations include the lack of primary care data, which may have led to the exclusion of less severe medical conditions in our analyses. Furthermore, the completeness and accuracy of the observed disease trajectory networks may have been impacted by the relatively limited follow-up period of the present study (i.e., the median follow-up time was 4.94 years after excluding the first 6 months of follow-up). Additional studies with longer follow-up are needed to investigate diseases with a later onset, such as neurodegenerative diseases, to obtain a more comprehensive picture of disease trajectories subsequent to depression. However, because the findings of the present study did not differ significantly during the first 5 years after the diagnosis of depression and beyond, such truncations of data are unlikely to have a significant influence on the validity of the present results. Notably, although we aimed to explore physiological changes subsequent to depression, the disease trajectory analysis does not, by itself, provide a sufficient basis for causal inference, although it lends support to causal relationships indicated elsewhere [17,18,19]. Yet, the observed disease patterns might also be explained by the shared etiological factors between depression and these conditions, including genetics, lifestyle, and other environmental factors. Future exploration of disease trajectories by level of genetic susceptibility to depression could be valuable. Although similar estimates were obtained in the sensitivity analyses after excluding depression patients with a pre-existing diagnosis of disease in the same category as the outcome disease and individuals with an inpatient record of a somatic disease during the 6 months prior to the index date (indicating a limited effect of pre-existing somatic comorbidities), we cannot rule out the possibility that these comorbidities contributed to the disease trajectories observed after depression.
In addition, it is important to note that patients with depression in this study were identified through inpatient care (likely exhibiting a severe form of the disease) and had a median age of 62 years at the time of ascertainment, which is different from the median age of the diagnosis of depression in the general population. Further, separate analysis of patients with multiple episodes of depression was not feasible due to the limited number of cases. Therefore, disease trajectories after milder forms of depression, depression diagnosed at a younger age, or recurrent depression should be examined in future studies. Last, in a time-sequential analysis, it is challenging to investigate how time-varying confounders, such as behavioral factors and life experience, about which we had little information, would have contributed to the observed disease networks.

In conclusion, this community-based study confirmed a general increase in the risk of many mental disorders, somatic diseases, and causes of death subsequent to severe depression. We identified three main disease clusters following an inpatient diagnosis of depression, which might imply that alterations in cardiometabolic diseases, chronic inflammatory diseases, and tobacco abuse may contribute significantly to the general health decline after depression. Investigations of potential interventions targeting these key pathways are therefore warranted.