An individual participant data analysis of prospective cohort studies on the association between subclinical thyroid dysfunction and depressive symptoms

In subclinical hypothyroidism, the presence of depressive symptoms is often a reason for starting levothyroxine treatment. However, data are conflicting on the association between subclinical thyroid dysfunction and depressive symptoms. We aimed to examine the association between subclinical thyroid dysfunction and depressive symptoms in all prospective cohorts with relevant data available. We performed a systematic review of the literature from Medline, Embase, Cumulative Index to Nursing and Allied Health Literature, and the Cochrane Library from inception to 10th May 2019. We included prospective cohorts with data on thyroid status at baseline and depressive symptoms during follow-up. The primary outcome was depressive symptoms measured at first available follow-up, expressed on the Beck’s Depression Inventory (BDI) scale (range 0–63, higher values indicate more depressive symptoms, minimal clinically important difference: 5 points). We performed a two-stage individual participant data (IPD) analysis comparing participants with subclinical hypo- or hyperthyroidism versus euthyroidism, adjusting for depressive symptoms at baseline, age, sex, education, and income (PROSPERO CRD42018091627). Six cohorts met the inclusion criteria, with IPD on 23,038 participants. Their mean age was 60 years, 65% were female, 21,025 were euthyroid, 1342 had subclinical hypothyroidism and 671 subclinical hyperthyroidism. At first available follow-up [mean 8.2 (± 4.3) years], BDI scores did not differ between participants with subclinical hypothyroidism (mean difference = 0.29, 95% confidence interval =  − 0.17 to 0.76, I2 = 15.6) or subclinical hyperthyroidism (− 0.10, 95% confidence interval =  − 0.67 to 0.48, I2 = 3.2) compared to euthyroidism. This systematic review and IPD analysis of six prospective cohort studies found no clinically relevant association between subclinical thyroid dysfunction at baseline and depressive symptoms during follow-up. 
The results were robust in all sensitivity and subgroup analyses. Our results are in contrast with the traditional notion that subclinical thyroid dysfunction, and subclinical hypothyroidism in particular, is associated with depressive symptoms. Consequently, our results do not support the practice of prescribing levothyroxine in patients with subclinical hypothyroidism to reduce the risk of developing depressive symptoms.


Introduction
Subclinical thyroid dysfunction is common in the adult population and its prevalence increases with age, affecting up to 10-15% of older adults 1 . Patients are diagnosed with subclinical thyroid dysfunction when their serum thyroid-stimulating hormone (TSH) levels are below or above the reference range while their serum free thyroxine (fT4) levels remain within the reference range. Only a few symptoms are usually linked to subclinical thyroid dysfunction. Several guidelines discuss the association between depressive symptoms and thyroid dysfunction, and the potential benefit of levothyroxine for patients with both diagnoses [2][3][4][5] . However, as the certainty of the evidence is low, guidelines do not recommend routinely treating patients with subclinical hypothyroidism and depressive symptoms with levothyroxine. Nevertheless, a study among general practitioners (GPs) reported that the presence of depressive symptoms or low mood influences their decision on whether to start treatment for subclinical hypothyroidism 6 . The association between subclinical thyroid dysfunction and depressive symptoms is unclear because studies to date have yielded conflicting results. Several studies showed that participants with subclinical hypothyroidism or subclinical hyperthyroidism had more severe depressive symptoms, but other studies reported no differences [7][8][9][10][11][12][13][14][15] . The largest prospective study published showed no association between subclinical hypothyroidism and incidence of depression after 2 years of follow-up 16 , whereas depressive symptoms were associated with subclinical hyperthyroidism (but not subclinical hypothyroidism) in another prospective study 11 .
These conflicting results may be explained by differences in outcome definition and in the statistical methods used to analyse data. Individual participant data (IPD) help researchers to standardise the analyses and definitions used across studies and make it possible to identify the effects for different subgroups in large study populations 17 . For example, IPD allow the use of uniform cut-off levels of TSH to define thyroid status for each study, the same model for the analysis in each study, and stratification by age and sex.
We thus aimed to assess the association between subclinical hypothyroidism or subclinical hyperthyroidism and future development of depressive symptoms by conducting an analysis of IPD from prospective cohort studies.

Methods
We registered this systematic review and IPD analysis in the international Prospective Register of Systematic Reviews PROSPERO (CRD42018091627) and published the study protocol 18 . This study adheres to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement for IPD systematic reviews 19 .
Search strategy and selection criteria. We performed a systematic literature search in Ovid Medline, Ovid Embase, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and in the Cochrane Library, from inception to 10th May 2019. We included publications from prospective studies that measured at least baseline TSH in adults and assessed depressive symptoms during follow-up on a validated continuous depression scale or diagnosis of depression (e.g. through ICD-10 or DSM-V codes). The following search items were used: thyroid diseases, hyperthyroidism, hypothyroidism, thyroid hormones, triiodothyronine, thyroxine, thyrotropin, subclinical, sub-clinical, mild, subnormal, pre-clinical, preclinical, depression. We did not include studies of only depressed patients, pregnant women, or women wanting to get pregnant. We included studies in any language and any publication year. We worked with two experienced librarians to develop the search strategy in Ovid Medline and then translated it to match subject headings and keywords for the other databases. See Appendix 1 for details of the Medline search strategy. Details of the systematic literature search, inclusion and exclusion criteria, and IPD analysis have been described in detail elsewhere 18 . We identified additional unpublished data by contacting the Thyroid Studies Collaboration, a consortium of cohort studies that investigate the association between subclinical thyroid dysfunction and clinical outcomes 20 .
Data extraction and quality assessment. We contacted investigators from all prospective cohort studies that met the inclusion criteria to collaborate in our IPD analysis by sharing their data. We requested IPD on thyroid status at baseline (TSH, fT4, and free triiodothyronine (fT3)), socioeconomic status (education, income), demographics (sex, age), medication use (levothyroxine, anti-thyroid, antidepressant, and thyroid-altering medication, including lithium and amiodarone), and on depressive symptoms as measured on a validated continuous scale at baseline and at any follow-up. Each study was approved by its local ethics committee and all participants gave informed consent for the original studies. We used the Newcastle-Ottawa Scale (NOS) to assess the quality of the studies included 21 . The NOS contains eight items that focus on selection, comparability and outcome. The scale is scored from zero to nine stars; the highest score indicates the best methodological quality. We classed the studies as good, fair, or poor quality based on their star rating. We also assessed the certainty in the evidence with the GRADE tool (www.gradeworkinggroup.org). The certainty of evidence based on observational studies starts at "low"; it may be decreased to "very low" for several reasons, including study limitations (i.e. study quality), inconsistency, indirectness, or imprecision, or increased for other considerations, such as particular design features of extremely rigorous, well-conducted observational studies 22 . To assess the study limitations, we used the final NOS score of the studies included in the main analysis. For example, if all the included studies have a good NOS quality score, the study limitations in GRADE can be judged as "not serious" (Appendix 4).
Thyroid function testing. Consistent with our previous IPD analyses, we used uniform TSH cut-off levels and study-specific fT4 cut-off values to define thyroid status 20,23,24 . We defined euthyroidism as TSH levels between 0.45 and 4.50 mIU/L. Subclinical hyperthyroidism was defined as TSH levels < 0.45 mIU/L with normal fT4 values, and subclinical hypothyroidism as TSH levels > 4.50 and < 20 mIU/L with fT4 values within the reference range. For fT4, we used study-specific cut-offs because these measurements show greater inter-method variation than sensitive third-generation TSH assays 20 . We excluded participants with fT4 values outside the reference range. When fT4 values were missing, we considered participants with TSH levels below 0.45 mIU/L to have subclinical rather than overt hyperthyroidism and participants with TSH levels above 4.50 mIU/L and below 20 mIU/L to have subclinical rather than overt hypothyroidism, because most adults with a TSH level in this range have subclinical rather than overt thyroid dysfunction 25,26 . We performed a sensitivity analysis excluding participants with missing fT4 to verify that the results remained robust, meaning that the effect size was not clinically different from the main result. We additionally performed a sensitivity analysis examining the difference in depressive symptoms between participants with subclinical hyperthyroidism and euthyroid participants, excluding participants with missing fT3 levels or values outside the reference range. We completed two sensitivity analyses, one excluding participants taking thyroid medication (levothyroxine or anti-thyroid medication) and one excluding participants taking thyroid-altering medication (anti-thyroid drugs, levothyroxine, amiodarone, lithium) at baseline or follow-up.
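The classification rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name, the label strings, and the example fT4 reference range are hypothetical, and two points are assumptions rather than statements from the text (euthyroidism is assigned from TSH alone, and TSH values of 20 mIU/L or more are treated as excluded).

```python
from typing import Optional, Tuple

# Uniform TSH cut-offs (mIU/L) taken from the text.
TSH_LOW = 0.45
TSH_HIGH = 4.50
TSH_MAX = 20.0


def classify_thyroid_status(
    tsh: float,
    ft4: Optional[float] = None,
    ft4_range: Optional[Tuple[float, float]] = None,
) -> str:
    """Return 'euthyroid', 'subclinical_hyper', 'subclinical_hypo' or 'excluded'.

    ft4_range is the study-specific fT4 reference range (hypothetical input).
    """
    # Assumption: euthyroidism is defined by TSH alone.
    if TSH_LOW <= tsh <= TSH_HIGH:
        return "euthyroid"

    # With an abnormal TSH, an fT4 outside the study-specific range
    # leads to exclusion from the analysis.
    if ft4 is not None and ft4_range is not None:
        low, high = ft4_range
        if not (low <= ft4 <= high):
            return "excluded"

    # Missing fT4 falls through to here and is treated as subclinical,
    # as described in the text.
    if tsh < TSH_LOW:
        return "subclinical_hyper"
    if tsh < TSH_MAX:
        return "subclinical_hypo"

    # Assumption: TSH >= 20 mIU/L is outside the subclinical definition.
    return "excluded"
```

For instance, a participant with a TSH of 6.0 mIU/L and no fT4 measurement would be labelled `subclinical_hypo`, which is the group removed in the sensitivity analysis that excludes participants with missing fT4.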
Depressive symptoms. Since studies used different continuous scales to measure depressive symptoms, we converted scores from different scales to the Beck Depression Inventory (BDI) scale, a commonly used depressive symptoms scale 27,28 . The BDI scale ranges from 0 to 63, with higher values indicating greater frequency and severity of depressive symptoms; the minimal clinically important difference is 5 points 28 . To convert the Center for Epidemiological Studies Depression (CESD) scale to the BDI, we used a conversion factor of 1.05 (63 (BDI range) ÷ 60 (CESD range)) 28 . To transfer measurements from the CESD scale to the BDI scale, we then multiplied each individual's CESD value by the conversion factor. The primary outcome was depressive symptoms measured at first available follow-up, expressed on the BDI scale. As previously defined in the study protocol, we converted measurements to the BDI scale instead of a standardised scale to facilitate interpretation 18,28 . In a sensitivity analysis, we used the original scale to calculate the mean difference in each study and then pooled the standardised mean differences (SMDs) across the studies. We coded SMDs so that positive values would indicate more severe depressive symptoms in participants with subclinical thyroid dysfunction than in euthyroid controls; an SMD < 0.40 was considered a small effect, 0.40-0.70 a moderate effect, and > 0.70 a large effect 29 . Since depressive symptoms could be influenced by medications, we conducted a sensitivity analysis excluding participants taking antidepressant medication at baseline or follow-up. A secondary outcome was depressive symptoms at baseline, expressed on the BDI scale. An additional secondary outcome was depressive symptoms at the last available follow-up and at the third year of follow-up, expressed on the BDI scale. We chose a follow-up at year three because most of the cohorts had this follow-up time in common. 
Studies without a 3-year follow up were excluded from this analysis. Another secondary outcome was incidence of depressive symptoms at the first available follow-up. For the outcome of incidence of depression, we analysed data on diagnosis of depression or established cut-off points for presence of depression from the continuous depressive symptoms scales (cut-off points were defined as; ( 33 ). In the analysis of incidence of depression, we excluded participants with diagnosed depression or with depressive symptoms score above the cut-off at baseline. In the primary outcome analyses, we excluded participants with missing data on depressive symptoms at baseline or follow-up. In a sensitivity analysis, we did not exclude those participants, but used multiple imputation for missing data on depressive symptoms. We additionally conducted a sensitivity analysis excluding participants with dementia (Mini-Mental State Examination (MMSE) score < 24, or diagnosis of dementia), since the relationship between dementia and depression is complex 34 .
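The CESD-to-BDI conversion and the SMD effect-size bands described in this section amount to simple arithmetic, sketched below in Python. The function names are hypothetical, and applying the bands to the absolute value of the SMD is an assumption made here so that negative SMDs can also be labelled.

```python
BDI_RANGE = 63   # BDI scores run from 0 to 63
CESD_RANGE = 60  # CESD scores run from 0 to 60


def cesd_to_bdi(cesd_score: float) -> float:
    """Rescale a CESD score to the BDI scale (factor 63 / 60 = 1.05)."""
    return cesd_score * BDI_RANGE / CESD_RANGE


def smd_magnitude(smd: float) -> str:
    """Label an SMD as small (< 0.40), moderate (0.40-0.70) or large (> 0.70).

    Assumption: the bands are applied to the absolute value of the SMD.
    """
    size = abs(smd)
    if size < 0.40:
        return "small"
    if size <= 0.70:
        return "moderate"
    return "large"
```

Under this conversion, a CESD score of 20 corresponds to 21 points on the BDI scale, and the maximum CESD score of 60 maps to the maximum BDI score of 63.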

Analysis.
We performed a two-stage IPD analysis. In the first stage, we estimated the effect size for each study separately. For the primary outcome, we calculated the mean difference in BDI score between participants with subclinical hypothyroidism or hyperthyroidism and euthyroid controls using a multivariable linear regression model adjusted for age, sex, depressive symptoms at baseline, education, and income. We only included studies with data on depressive symptoms at baseline because adjusting for depressive symptoms at baseline adjusts for imbalance and accounts for the correlation between baseline and follow-up, which makes the effect estimates more precise 35 . In a sensitivity analysis, we additionally included studies without baseline data on depressive symptoms. When comparing incidences of depression, we calculated the odds ratio for incidence of depression at the first available follow-up between participants with subclinical hypothyroidism or hyperthyroidism and euthyroid controls using a multivariable logistic regression model adjusted for age, sex, depressive symptoms at baseline, income, and education. For the cross-sectional analysis at baseline, we calculated the mean difference in BDI score between participants with subclinical hypo- or hyperthyroidism and euthyroid controls, adjusted for age, sex, education, and income. In the second stage of the IPD analysis, we pooled the mean differences or odds ratios derived from each study in the first stage using a random-effects model.
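The second stage can be illustrated with a short Python sketch. The text does not state which random-effects estimator was used; the DerSimonian-Laird method shown here is an assumption chosen because it is a common default, and the function name and inputs are hypothetical. The inputs are the per-study estimates from the first stage (mean differences, or log odds ratios) and their standard errors.

```python
import math
from typing import Sequence, Tuple


def pool_random_effects(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, 95% CI lower, 95% CI upper, tau squared).
    """
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0

    # Random-effects weights add tau squared to each study's variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    se_pooled = math.sqrt(1.0 / sum_re)

    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, tau2
```

Odds ratios would be pooled on the log scale and exponentiated afterwards. When the between-study variance is estimated as zero, the result coincides with the fixed-effect inverse-variance estimate.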
To identify sub-populations at risk and possible sources of heterogeneity, we conducted predefined subgroup and sensitivity analyses on the primary outcome. We performed predefined subgroup analyses by age (younger and older than 75 years old), by sex, by TSH levels (4.51-6.99 mIU/L, 7.00-9.99 mIU/L, 10.0-20.0 mIU/L) 20 , and by levothyroxine use at baseline.
We performed a sensitivity analysis excluding the HUNT study as this was the biggest study included in the analysis and therefore had the biggest weight for the overall result.
Heterogeneity was estimated with I2 and the Q test. Statistical significance testing was 2-sided and P < 0.05 was considered statistically significant. All analyses were conducted with Stata, release 15 (StataCorp).
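For reference, the two heterogeneity measures are conventionally defined as follows. The text does not give the formulas, so the notation is an assumption: k is the number of studies, \hat{\theta}_i and SE_i are the estimate and standard error from study i, and \hat{\theta}_F is the inverse-variance (fixed-effect) pooled estimate.

```latex
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}_{\mathrm{F}}\right)^2,
\qquad w_i = \frac{1}{\mathrm{SE}_i^{2}},
\qquad \hat{\theta}_{\mathrm{F}} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}

I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

Under the null hypothesis of homogeneity, Q approximately follows a chi-squared distribution with k - 1 degrees of freedom, and I2 expresses the share of the total variation across studies that is attributable to heterogeneity rather than chance.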

Ethics approval and consent to participate. Statement on ethical approval from ethics committee: This manuscript analysed existing cohort data. Each study in the individual participant data set received local ethical approval. Our analysis did not include identifiable data.
Statement on guidelines followed: The manuscript adheres to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement for IPD systematic reviews and to the PRISMA statement for systematic review protocols.
Statement on written informed consent from the participants: This manuscript analysed existing cohort data; each study in the individual participant data set obtained informed consent from its participants. Our analysis did not include identifiable data.

Results
Of the 1047 studies we identified through the literature search, four studies met our inclusion criteria (Appendix 2) 11,13,16,36 . From within the Thyroid Studies Collaboration, we identified seven additional studies for which data on subclinical thyroid dysfunction and depressive symptoms had not been published. We invited the investigators of eligible studies to collaborate in this IPD analysis; only one investigator, from a study identified by the literature review, declined to participate 16 . This study only presented a dichotomised result in the publication, so we could not combine its aggregate data with our main IPD analysis. We received IPD from ten studies with 33,769 participants that met our inclusion criteria. For the primary outcome analysis, we included only studies with a continuous outcome in which depressive symptoms had been measured at baseline. Six studies met these criteria, including 23,038 participants. Mean age (± SD) was 60 years (± 13), 65% were female, and median TSH was 1.63 mIU/L (Table 1). Of these participants, 21,025 (91%) were euthyroid, 1342 (6%) had subclinical hypothyroidism, and 671 (3%) had subclinical hyperthyroidism. In sensitivity analyses, we additionally included the three studies that did not measure depressive symptoms at baseline [37][38][39] . For the secondary outcome of incidence of depression, we additionally included the Health in Men Study, which did not use a continuous scale to measure depressive symptoms but assessed incidence of depression via data linkage 36 . Depressive symptom scores at baseline were balanced between groups in each cohort, and on average the correlation between depressive symptoms at baseline and follow-up was 0.6 across studies (Appendix 6).
Quality assessment. Based on the NOS, the quality of all studies that we included in the primary outcome analysis was good (Appendix 3). Certainty in the evidence assessed with the GRADE tool for the primary outcome was low (Appendix 4). Because of the low number of studies, we did not assess publication bias 40 .
Subclinical hypothyroidism. At first available follow-up (mean 8.2 (± 4.3) years), there was no difference in the primary outcome of BDI score between participants with subclinical hypothyroidism and euthyroid controls (pooled mean difference (MD) 0.29, 95% CI −0.17 to 0.76), with low heterogeneity (I2 = 15.6%) (Fig. 1). Secondary outcomes are shown in Fig. 2. At baseline, there was no relevant difference in depressive symptoms between euthyroid participants (mean BDI = 10.28) and participants with subclinical hypothyroidism (mean BDI = 9.63), nor in multivariable analysis adjusted for age, sex, education, and income (MD in BDI
Results of sensitivity analyses in which we excluded participants taking thyroid medication (N = 21,391), thyroid-altering medication (N = 21,384) or antidepressant medication (N = 23,175), participants with dementia (N = 22,224), or participants without fT4 measurements at baseline (N = 5618), or in which we excluded the study with the biggest weight (HUNT) (N = 7043), were comparable to those from our primary analysis (Table 2). When we used multiple imputation in a sensitivity analysis that included participants with missing data on outcome and depressive symptoms at baseline (N = 45,398), depressive symptoms also did not differ between subclinical hypothyroidism and euthyroidism. Results of a sensitivity analysis in which we included studies without baseline information on depressive symptoms (N = 27,361) were similar to those of the main analysis. In a sensitivity analysis that used the original scale from each study, we found an SMD between groups of 0.04 (95% CI −0.02 to 0.09). Figure 3 shows the results of several subgroup analyses. After stratifying the population included in the primary outcome analysis by age (participants older and younger than 75), by sex, by levothyroxine treatment at baseline, and by TSH level, we found no significant differences in depressive symptom scores.
Subclinical hyperthyroidism. There was no difference in the primary outcome of depressive symptoms between participants with subclinical hyperthyroidism and euthyroid controls (MD −0.10, 95% CI −0.67 to 0.48, I2 = 3.2%) at first available follow-up (Fig. 1), at year three of follow-up, or at last available follow-up (Appendix 5a). At baseline, there was no difference in depressive symptoms expressed on the BDI scale between euthyroid participants (mean BDI = 10.26) and participants with subclinical hyperthyroidism (mean BDI = 10.28) (Appendix 5). Odds for incidence of depression were not higher for participants with subclinical hyperthyroidism than for euthyroid controls (Appendix 5a). Our results remained robust in several sensitivity and subgroup analyses (Appendix 1b,c).

Discussion
In this IPD analysis of 23,038 participants from six prospective cohort studies, we found no clinically relevant differences in depressive symptoms during follow-up between participants with subclinical hypothyroidism or hyperthyroidism and euthyroid controls. Depressive symptoms of participants with subclinical hypothyroidism or hyperthyroidism were not different from symptoms of euthyroid control participants at baseline or at any follow-up. Participants with subclinical hypothyroidism were not at increased risk for incidence of depression. Our results were robust across all sensitivity analyses. To our knowledge, no pooled IPD analysis has previously assessed the association of subclinical thyroid dysfunction and depressive symptoms. The results are in contrast with two previous meta-analyses of cross-sectional studies 7, 8 which found a positive association between subclinical hypothyroidism and depression. Reasons for the difference in results could be that we included published and unpublished studies, that we did not include studies only on depressed patients, that we did not include case-control studies and cross-sectional studies (only cross-sectional analyses of prospective studies, as they are considered of higher validity), and that we analysed individual participant data, which leads to far more reliable results 17,35 . With individual participant data, we could use a standardised definition of subclinical hypo- and hyperthyroidism for all studies, which was not possible in these study-level meta-analyses. Our results are in line with those of the Kangbuk Samsung Health Study, which showed that participants with subclinical hypothyroidism had no higher incidence of depression than euthyroid controls 16 .
Our study has some limitations. First, younger people were underrepresented because three of six studies included only participants over 64. Our sensitivity analysis that excluded participants over 75 also yielded no association, but we included too few participants to assess risk among middle-aged adults. Second, we were limited by measurement of depressive symptoms using different scales across studies, so that we were unable to combine the effect estimates from different studies using their original scales. To standardise the scale between studies, we converted scores from the various scales to the BDI scale, a common scale whose scores are easy to interpret 28 . As there was no validated conversion factor, we examined whether our results remained robust when we converted the original scores to a standardised scale, which yielded similar findings. Third, we did not have access to information about treatment prior to baseline; patients with subclinical thyroid dysfunction and depressive symptoms may have been more frequently diagnosed with subclinical thyroid dysfunction and treated to restore euthyroidism prior to baseline, in which case people who did not develop depressive symptoms would be overrepresented in the subclinical thyroid dysfunction group at baseline. Fourth, the diagnosis of subclinical hypothyroidism was based on one assessment of TSH, and did not depend on a second verified measurement, which is a limitation of most published large cohorts that have examined the risk of subclinical thyroid dysfunction 20,41 . Because classification was based on a single elevated or decreased TSH level, participants might have reverted to normal thyroid function over follow-up, which could have biased the results towards the null. However, previous IPD analyses including cohorts with just one TSH assessment documented an association between subclinical thyroid dysfunction and both coronary heart disease and fractures 20,24 . Inferring a causal relationship from observational data is challenging and, for this reason, we completed a series of sensitivity analyses to minimise the potential effects of residual confounding and bias.
Our study was strengthened by its IPD analysis, considered the most appropriate method in evidence synthesis since it offers many advantages over aggregate data analysis 42 . For example, our results do not suffer from the ecological bias of study-level meta-analyses. We could also standardise definitions of predictors and outcomes, use uniform adjustments for potential confounders to reduce heterogeneity across studies, and include unpublished data to increase the robustness of our results and our power to detect associations. Because our IPD analysis was large, we could assess the effects of age, sex, thyroid medication, antidepressant medication, and TSH levels in subgroup analyses.

Notes to the sensitivity analyses table. Sensitivity analyses 1-4, 6, 8, 10: the same studies as in the main analysis were included; only participants with a certain measurement missing were excluded. Sensitivity analysis 5: the same studies as in the main analysis without the Health ABC Study 32 because in this study fT4 was not measured in the euthyroid group. Sensitivity analysis 7: the same studies as in the main analysis plus 3 studies that did not have data for depressive symptoms at baseline (PREVEND (Prevention of Renal and Vascular end-stage Disease) 49 , MrOS (Osteoporotic Fractures in Men) 50 , SHIP (Study of Health in Pomerania) 51 ). Sensitivity analysis 9: the same studies as in the main analysis without the HUNT 48 study, as the HUNT study has the biggest weight in the summarised result of the main outcome (34.96%). ‡ Mean differences using the original scale for depressive symptoms within each study were pooled. § Overall mean BDI score at first follow-up of all 23,038 participants was 10.67 with a standard deviation of 8.97.

Figure 3. The association between subclinical hypothyroidism and depressive symptoms by subgroups*. Axis labels: Higher Depression Score in Euthyroid; Higher Depression Score in SHypo. * Analysis adjusted for depressive symptoms at baseline, sex, age, and education (the CHS, Health ABC Study, and the InChianti Study were additionally adjusted for income).
Current guidelines for the management of people with subclinical hypothyroidism tend to recommend thyroid hormone substitution for adults with TSH levels > 10 mIU/L and for people with lower TSH values who are young or symptomatic, although some recent guidelines have narrower indications 43,44 . As we found no association between subclinical hypothyroidism and depressive symptoms, one might infer that thyroid hormones would be of limited benefit for the treatment of depressive symptoms in people with subclinical hypothyroidism. This is in line with a previous meta-analysis of four small randomised clinical trials (total N = 278) that found no positive association between thyroid hormone therapy and depressive symptoms 45 . Overall, our results do not support an increased risk of depressive symptoms in adults with subclinical thyroid dysfunction.

Data availability
The datasets analysed during this study are available from the corresponding author on reasonable request. www.nature.com/scientificreports/