Introduction

In 2015, diabetes caused an estimated 1.6 million deaths, and ‘higher-than-optimal’ blood glucose was responsible for a further 2.2 million, both through direct clinical progression and as a risk factor for cardiovascular and kidney diseases [1]. Moreover, age-standardised mean fasting plasma glucose has increased globally by 0.07 mmol/L or more per decade [2]. This global deterioration of glycaemic health highlights the need to address the problem before it reaches the clinical stages of disease.

Fasting plasma glucose generally follows a relatively stable linear trajectory and has been observed to increase steeply only from up to 3 years before the onset of diabetes [3]. Thus, preserving normal fasting plasma glucose levels may be key to maintaining good metabolic health and substantially delaying diabetes onset.

The pathways leading to deterioration of glycaemic health are more complex than a linear cause-effect model may suggest. Maintaining normoglycaemia may rely on the interplay of diverse causal factors, including those at a biological, socioeconomic and psychosocial level. This follows the theory of the bio-psychosocial model for biomedicine crystallised by Engel four decades ago [4]. He formulated a personalised model for patient-care in which there would be mutual influence of the mind and body, in order to understand disease aetiology [5, 6].

Part of the challenge of implementing a comprehensive model of health lies in defining and modelling psychosocial determinants of health. Despite growing research in this field, there is no apparent consensus in the literature on a single definition of psychosocial health. In this study, we used the following WHO definition of mental health (2014) as a guiding principle: “a state of wellbeing in which every individual realises their own potential, can cope with the normal stresses of everyday life, can work productively and fruitfully, and is able to make a contribution to their community” [7]. Specifically, we focused on four components of this definition (well-being, stresses of everyday life, work and community) in order to identify the psychosocial aspect of the model and apply it in practice.

So far, despite much discussion, there has been limited focus on using individual-level data to formulate a practical model to guide clinical practice. To address this challenge, we have taken an empirical approach that is data-driven and exploratory, using data from a general birth cohort to translate this theoretical model into a practical framework that may be used to personalise preventative healthcare.

Methods

Study population and design

The study population comprised participants of the Northern Finland Birth Cohort 1966 (NFBC1966). It is an unselected, general population birth cohort including 96.3% of all births during 1966 in the two northern provinces of Finland, with clinical adult follow-ups at 31 and 46 years. The current analysis focuses on these follow-ups. The 31-year follow-up, conducted during 1997, consisted of a target population of 11,322 eligible individuals alive and living in Finland at this time (Supplementary 1). Of the target population, 77% of individuals completed the background questionnaire and 71% attended a clinical examination. The latest 46-year follow-up was conducted between April 2012 and February 2014. Of the target population consisting of 10,321 eligible individuals (Supplementary 1), 57% attended the clinical examination.

Individuals self-reporting a diabetes diagnosis at 31 years (n = 51) were excluded. A further 3,638 individuals did not participate in the 46-year clinical follow-up and thus did not have an outcome measure. After exclusion, the final study sample comprised 5,078 participants (43% male) (Supplementary 1). The final sample included 103 participants who had not fasted before blood sample collection, as statistical analyses showed no difference in laboratory measurements (t-test, P > 0.05) between fasted and non-fasted participants.

The study was approved by the Ethics Committee of the Northern Ostrobothnia Hospital District. All participants gave written informed consent.

Biological and psychosocial variable selection

We used a systematic approach for bio-psychosocial indicator selection and applied the following criteria in the given order. Following an inventory of available variables within the NFBC1966 31-year data collection, we selected all those which fulfilled the following:

Criterion 1: A component of metabolic syndrome [8], consisting of measures relating to adiposity, insulin sensitivity, lipid levels and blood pressure, or an indicator of psychosocial health, based on the four components highlighted previously in the WHO (2014) definition: well-being, stresses of everyday life, work and community. We reviewed the data inventory and selected variables using our own judgement and a priori knowledge. Four members of the team independently reviewed the available data to ensure we did not miss any relevant variables.

Criterion 2: Easily obtainable as part of a general routine health check-up. Table 1 shows the full variable list.

Table 1 Bio-psychosocial variables from 31-year data inventory according to selection criteria 1 & 2

Criterion 3: Associated with the outcome, i.e. fasting plasma glucose at 46 years (P < 0.10). An F-test determined inclusion of categorical variables (P < 0.10).
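The screening step in criterion 3 can be sketched for continuous variables as follows. This is a minimal illustration, not the authors' code: the function names and toy data are hypothetical, the P value uses a normal approximation to the t distribution (reasonable only for large samples), and the F-test used for categorical variables is not covered.

```python
from math import sqrt
from statistics import NormalDist, mean


def univariate_p_value(x, y):
    """Two-sided P value for the slope of y regressed on x.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with a normal approximation
    to the t distribution (an assumption of this sketch).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    if abs(r) >= 1.0:  # perfect linear relation: P value is effectively zero
        return 0.0
    t = r * sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def screen(candidates, outcome, alpha=0.10):
    """Keep the candidate variables associated with the outcome at P < alpha."""
    return [name for name, values in candidates.items()
            if univariate_p_value(values, outcome) < alpha]


# Toy data (hypothetical, not cohort values).
outcome = [5.0 + 0.1 * i for i in range(40)]
candidates = {
    "related": [float(i + (-1) ** i) for i in range(40)],
    "unrelated": [float(i % 2) for i in range(40)],
}
selected = screen(candidates, outcome)
```

With these toy inputs only the variable that tracks the outcome passes the P < 0.10 threshold.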

Data collection

Metabolic variables at 31 years

Participants were invited to a clinical examination as described elsewhere [9]. Height and weight were measured to an accuracy of 0.1 cm and 0.1 kg, respectively, and converted to BMI (kg/m2). Waist circumference was measured from the point midway between the costal margin and iliac crest and recorded to an accuracy of 0.1 cm. Systolic and diastolic blood pressure (SBP and DBP) were measured twice with a mercury sphygmomanometer, in a sitting position and from the right arm, after 15 minutes of rest; the average of the two readings was used.
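As a small worked example of the conversions described above, the sketch below computes BMI from weight in kilograms and height in centimetres, and averages two blood pressure readings. The function names and example values are illustrative assumptions, not cohort data.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def mean_reading(first, second):
    """Average of two repeated measurements, e.g. two SBP readings."""
    return (first + second) / 2.0


# Hypothetical participant: 70.0 kg, 175.0 cm, SBP readings of 122 and 118 mmHg.
example_bmi = bmi(70.0, 175.0)
example_sbp = mean_reading(122.0, 118.0)
```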

Laboratory samples were taken from participants. Serum high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) were analysed within 24 h and determined by enzymatic methods using a Hitachi 911 Chemistry Analyser (Roche, Boehringer Mannheim, Germany). Blood samples for glucose assays were stored at −20 °C and analysed within 7 days of sampling by a glucose dehydrogenase method (Granutest 250, Diagnostica Merck, Darmstadt, Germany). Samples for assay of serum insulin were stored at −20 °C and analysed within 7 days of sampling using RIA (Pharmacia Diagnostics, Uppsala, Sweden) [9]. A correction constant was applied for participants taking lipid-altering or blood-pressure-altering medication (Supplementary 2) [10, 11].

Psychosocial indicators at 31 years

Information on psychosocial indicators was derived from a postal questionnaire. Questions related to well-being, stresses of everyday life, work and community. The full questions and response descriptions are available in Supplementary 3.

Outcome measure (fasting glucose) at 46 years

At 46 years, all participants attending the clinical examination followed a protocol similar to that used at 31 years. Relevant to this study, blood samples were taken from participants after an overnight fast. Plasma samples were stored at −20 °C and analysed within 7 days of sampling for fasting plasma glucose (glucose dehydrogenase method; Advia 1800, Siemens Healthcare Diagnostics Inc., Tarrytown, NY, USA).

Statistical analysis

Variable selection

Descriptive statistics were generated for all explanatory variables and the outcome measure, and distributions were examined for normality. Univariate linear regression was used to assess the association of each explanatory variable with the outcome of fasting plasma glucose at 46 years. A Pearson correlation matrix was subsequently used to ensure factorability of the selected variables (Fig. 1). Males and females were analysed together as there were no major differences in factor loading patterns when stratified in preliminary analyses.
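A Pearson correlation matrix of the kind summarised in Fig. 1 can be sketched in a few lines. The variable names and values below are hypothetical and serve only to show the computation; the published matrix also includes categorical indicators, which this sketch does not handle.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical toy values (not cohort data).
toy = {
    "bmi": [21.0, 24.0, 27.0, 30.0, 33.0],
    "waist": [75.0, 82.0, 90.0, 99.0, 108.0],
    "hdl_c": [1.8, 1.6, 1.5, 1.2, 1.1],
}
matrix = correlation_matrix(toy)
```

In this toy example BMI and waist circumference correlate positively, while both correlate negatively with HDL-C.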

Fig. 1

Correlation matrix of bio-psychosocial indicators. BMI body mass index, HDL-C high density lipoprotein cholesterol, TG triglycerides, SBP systolic blood pressure, DBP diastolic blood pressure. Blue represents positive correlations and orange represents negative correlations. The size of each dot reflects the strength of correlation

Factor analysis

In all models, factor analysis was conducted using Mplus 7.0 [12]. Mplus uses the full information maximum likelihood method to estimate the model parameters in order to account for missing data [13].

Exploratory factor analysis (EFA) was conducted to identify the factor structure, followed by confirmatory factor analysis (CFA) to produce the final model. The dataset was halved using random number generation to perform cross-validation. There are no standardised guidelines for the minimum sample size needed for a valid EFA, although the general consensus is that the larger the N, and thus the N:item ratio, the better [14]. Our sample size in the development half of the dataset was 2,556, reflecting a 134:1 participant:item ratio, which is well above the 5:1 ratio suggested by some authors [15]. Geomin (oblique) rotations were used as it was hypothesised that the factors are likely to be correlated. Due to the inclusion of categorical variables, the model parameters were estimated using the mean-adjusted and variance-adjusted weighted least squares (WLSMV) method.
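The random halving and the participant:item ratio can be illustrated as below. This is a sketch under stated assumptions: the seed and function names are hypothetical, and the split here is into exactly equal halves, whereas the reported development half (n = 2,556) is slightly larger than half of 5,078, so the authors' exact procedure presumably differed.

```python
import random


def split_in_half(participant_ids, seed=0):
    """Shuffle the IDs with a seeded generator and cut the list in two."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


def participants_per_item(n_participants, n_items):
    """Participant:item ratio used to judge sample size adequacy for EFA."""
    return n_participants / n_items


# 5,078 participants in the final sample; IDs here are placeholders.
development, validation = split_in_half(range(5078))

# Reported development half and number of items entered into the EFA.
ratio = participants_per_item(2556, 19)
```

The ratio evaluates to roughly 134.5 participants per item, consistent with the 134:1 figure quoted above.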

Optimum factor structure was selected based on a combination of model fit statistics, examination of the scree plot and eigenvalues, and inspection of factor loading patterns to ensure a scientifically feasible model [16]. At this stage, strength of factor loading was used to remove variables that did not load significantly onto any factor or that loaded below 0.3 on every factor. We used 0.3 as a cut-off based on our large sample size (n = 5,078) [17]. Model fit was assessed using Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI) and Tucker Lewis Index (TLI) values. Commonly implemented guidelines of less than 0.08 for RMSEA and greater than 0.90 for CFI and TLI were used to assess model fit [18,19,20]. A chi-square test was also conducted to evaluate the congruency between the hypothesised model and empirical evidence, although it is well recognised that chi-square tests are sensitive to large sample size [18, 21].
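The loading cut-off can be expressed as a simple filter. The sketch below is one reading of the rule (retain an item if its largest absolute loading on any factor reaches 0.3); it ignores the significance criterion, and the item names and loadings are hypothetical rather than taken from Table 2.

```python
def retained_items(loadings, cutoff=0.3):
    """Return items whose largest absolute loading reaches the cut-off."""
    kept = []
    for item, row in loadings.items():
        if max(abs(value) for value in row) >= cutoff:
            kept.append(item)
    return kept


# Hypothetical loadings on four factors (not the study's estimates).
hypothetical_loadings = {
    "item_a": [0.72, 0.05, -0.10, 0.02],
    "item_b": [0.10, -0.64, 0.08, 0.01],
    "item_c": [0.21, 0.18, 0.25, 0.12],  # below 0.3 on every factor
}
kept = retained_items(hypothetical_loadings)
```

Absolute values are used so that strong negative loadings, such as that of `item_b`, count as strong loadings.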

Factor scores were extracted and used in multivariable linear regression models to assess associations and interrelationships between factor scores and fasting glucose. These models were adjusted for sex and each of the other factors using a forward stepwise approach. The lowest Akaike information criterion (AIC) was the primary criterion for choosing the best prediction model; however, we also considered R2 and the Bayesian information criterion (BIC).
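For a linear regression with Gaussian errors, AIC and BIC can be computed from the residual sum of squares (RSS), up to an additive constant that is the same for all models fitted to the same data. The sketch below uses this standard form; the model labels, RSS values and parameter counts are hypothetical and are not results from this study.

```python
from math import log


def gaussian_aic(rss, n, k):
    """AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    return n * log(rss / n) + 2 * k


def gaussian_bic(rss, n, k):
    """BIC up to an additive constant: n * ln(RSS / n) + k * ln(n)."""
    return n * log(rss / n) + k * log(n)


n = 5078  # size of the final study sample

# Hypothetical (RSS, number of regression coefficients) per candidate model.
models = {
    "sex + M": (880.0, 3),
    "sex + M + P": (872.0, 4),
    "sex + M + P + S + BP": (871.8, 6),
}

aic = {name: gaussian_aic(rss, n, k) for name, (rss, k) in models.items()}
bic = {name: gaussian_bic(rss, n, k) for name, (rss, k) in models.items()}
best_by_aic = min(aic, key=aic.get)
best_by_bic = min(bic, key=bic.get)
```

In this toy comparison the two extra factors in the largest model reduce the RSS too little to offset the penalty for additional parameters, so the intermediate model has the lowest AIC and BIC.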

Results

Variable selection

Following data inventory, 26 variables relating to bio-psychosocial health were selected. We excluded social anhedonia according to criterion 2 [22, 23]. Of the remaining 25 variables (Table 1), we excluded employment history, anxiety, functioning, optimism, and active and passive coping, as they were not associated with fasting glucose at 46 years (criterion 3).

Distinct correlation clusters were visible, highlighting groups of biological and psychosocial related variables (Fig. 1). The strongest correlations were present between the cardio-metabolic variables.

Bio-psychosocial factors

A base of 19 items relating to bio-psychosocial health were entered into an EFA using the testing half of the dataset (n = 2,556). Five eigenvalues were greater than one, and therefore we tested models with a one-factor to five-factor structure. Examination of the scree plot (Supplementary 4) suggested that a three, four or five-factor structure fitted best, although model fit statistics tended to favour the four or five-factor model (Table 2). These factors also showed consistency with the patterns observed in the correlation matrix (Fig. 1) and appeared to separate into biological, socioeconomic and psychosocial latent factors. As the number of factors increased, the biological variables separated into metabolic and blood pressure factors and psychosocial separated into a psychological and additional (psycho)social factor.

Table 2 EFA of bio-psychosocial variables at 31 years associated with fasting glucose at 46 years (n = 2,556)

In all three models, variables loaded strongly to their factor. We then looked for a clean factor structure, i.e. no cross-loadings, and scientific plausibility [14]. The five-factor structure was excluded due to strong cross-loading of insulin and marital status, and difficulty in scientifically identifying the additional (psycho)social factor (Supplementary 7). Two items (glucose and adaptive coping) were excluded from the final model due to loading below 0.3 to any one factor (Table 2).

Factors were named to reflect their included variables. Basic and further education, occupation and household income loaded onto the first factor, named ‘socioeconomic’, which accounted for 19% of the total variance of the model. Insulin, waist circumference, BMI, HDL-C and TG loaded onto a ‘metabolic’ factor, which accounted for 14% of variation. Marital status, employment status, home ownership, depression, sleep quality and life satisfaction loaded onto a factor named ‘psychosocial’, which accounted for 9% of variation. SBP and DBP loaded onto a ‘blood pressure’ factor, which accounted for 7% of variation (Supplementary 4).

This model was replicated in the other half of the dataset and CFA was then performed using the full dataset. This demonstrated a good fit for the data (RMSEA = 0.065; CFI = 0.92; TLI = 0.90) and was chosen as the final model (Fig. 2).

Fig. 2

Confirmatory factor analysis of four-factor structure containing the bio-psychosocial indicators. BMI body mass index, HDL-C high density lipoprotein cholesterol, TG triglycerides, SBP systolic blood pressure, DBP diastolic blood pressure. Boxes represent observed indicators, circles represent latent factors and two-way arrows represent correlation between factors. Pearson correlation coefficients are written in italics. χ2 = 2510.83, df = 113, N = 5,078, P < 0.01; CFI = 0.92, RMSEA = 0.065
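As a consistency check on the fit statistics reported for the final model, the RMSEA point estimate can be recomputed from the chi-square value, its degrees of freedom and the sample size. The sketch uses a common textbook formula, sqrt(max(χ2 − df, 0) / (df × (N − 1))); this is an assumption, as the exact computation in Mplus under WLSMV estimation may differ in detail.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """RMSEA point estimate from a model chi-square, its df and sample size."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported for the four-factor CFA in the full dataset.
value = rmsea(2510.83, 113, 5078)
rounded = round(value, 3)  # rounds to 0.065, matching the reported RMSEA
```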

Sensitivity analysis

EFA was also conducted in the full sample and demonstrated similar model fit, strength of loading and loading patterns (Supplementary 5). Additionally, CFAs were conducted for all three of the potential factor structures to examine model fit and strength of loading of each variable. This confirmed the four-factor structure was optimal (Supplementary 6 & 7).

Predicting fasting glucose from factor scores

Table 3 shows univariable results for associations between each factor and fasting glucose at 46 years (Model 1). The socioeconomic, metabolic and blood pressure factors were all associated (P < 0.05) with fasting glucose, and this remained following sex adjustment (Model 2). The psychosocial factor, however, was not associated with fasting glucose at 46 years until it was adjusted for sex.

Table 3 Association of 31-year factor scores with fasting glucose at 46 years

Multivariable regression analysis assessed whether associations remained significant when additionally adjusted for the other factors (Model 3), and the forest plots show their stepwise addition (Fig. 3). The metabolic factor appeared to have the most robust effect on fasting glucose. Unsurprisingly, the metabolic factor also had a large influence on the association of the other factors with fasting glucose. Although the socioeconomic factor initially had the largest effect, it was completely attenuated by the addition of the metabolic factor. This was also the case for blood pressure. Notably, the psychosocial effect on fasting glucose at 46 years was attenuated only by the socioeconomic factor.

Fig. 3

Forest plots showing the effect estimates (beta, 95% CI) of each factor on fasting plasma glucose at age 46 years. Each factor has been sequentially adjusted for sex and the other factors. S socioeconomic factor, M metabolic factor, P psychosocial factor, BP blood pressure factor

Of the 11 possible factor combinations (Supplementary 8), the best prediction model for fasting glucose at 46 years was the combination of metabolic and psychosocial factors when adjusted for sex. This explained 10.7% of the variation in fasting glucose.
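One reading that reproduces the count of 11 is that each candidate model combines two or more of the four factors. Supplementary 8 is not reproduced here, so this interpretation is an assumption; the sketch simply enumerates those combinations.

```python
from itertools import combinations

factors = ["socioeconomic", "metabolic", "psychosocial", "blood pressure"]

# All combinations of two, three or four factors (assumed reading of
# "possible factor combinations").
factor_combinations = [
    combo
    for size in range(2, len(factors) + 1)
    for combo in combinations(factors, size)
]
n_combinations = len(factor_combinations)  # 6 pairs + 4 triples + 1 full set
```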

Attrition analysis

There were 5,641 participants with a recorded fasting glucose measurement at age 46 years. However, 563 of these participants did not take part in the 31-year follow-up and thus were not included in this study. These participants were more likely to be male (55%; chi-square test, P < 0.05) and tended to have higher fasting glucose at 46 years (5.65 vs 5.50 mmol/L; P < 0.05).

Discussion

Using a systematic variable selection strategy, we derived four latent factors at age 31 years underlying fasting plasma glucose in midlife. These factors delineated socioeconomic, metabolic, psychosocial and blood pressure components and were named accordingly. Analyses on their effect size against fasting glucose 15 years later in life brought additional insights on the nature of these associations. It particularly highlighted independent effects for the metabolic and psychosocial factors.

Latent factors

EFA showed distinct variable clusters similar to biological and psychosocial groupings observed in the correlation matrix (Fig. 1). The fit statistics and general structure were consistent in both the full dataset and random half used for cross-validation. Minor differences were observed only in the loading of glucose to the socioeconomic factor in the three-factor structure and the additional cross-loading of marital status to the psychological factor in the five-factor structure (Supplementary 5).

Metabolic syndrome is characterised by simultaneous observation of metabolic abnormalities including abdominal obesity, hypertension, hyperglycaemia and dyslipidaemia [24, 25]. Previous studies attempting to capture the factorial structure of metabolic syndrome used an a priori driven CFA that enforced each of the metabolic syndrome components onto four first-order factors (adiposity, insulin resistance, hypertension, dyslipidaemia) before loading onto an overall metabolic factor [26, 27]. Although their model demonstrated good fit according to standard criteria, glucose loaded weakly onto the insulin resistance factor, and hypertension had the weakest loading onto the overall metabolic factor, suggesting that the model may be improved. We observed a similar pattern in our factor structure: initially, the blood pressure variables clustered with the metabolic components. However, the data were fitted better when these separated into an individual blood pressure factor. Although the variables in the present approach were selected using predetermined generic criteria, the procedure was data-driven. This allowed us to observe the best fit for our data without enforcing a structure based only on a priori categorisation of metabolic syndrome. It may be surprising that glucose at 31 years was excluded from our final model. It did not load strongly onto any one factor, but loaded significantly onto every factor. This is not unexpected, as variables were selected based on their association with fasting glucose at 46 years. It is also interesting that, as the factors separated further, glucose showed a higher affinity towards the socioeconomic factor rather than the biological factors, as might have been expected. This is more evident in the testing dataset results (Table 2), but can also be observed in the full dataset, particularly within the five-factor model (Supplementary 5).

In terms of the less understood non-biological factors, using a data-driven, exploratory approach allowed us to elucidate factors consisting of variables that shared some commonality. It helped differentiate between variables more representative of socioeconomic position and variables that we believe relate more to psychosocial health.

Basic and further education, occupation and household income clustered together to form the socioeconomic factor. Although these measures seem similar, they all capture slightly different pathways in which health may be influenced. Income provides access to resources, which enables greater options in food choices, a higher level of health care and more physical activity possibilities [28]. Education provides increased awareness of health issues and a greater willingness to engage in healthy behaviours [29]. Occupation may determine flexibility in working hours and schedule, thus allowing available time to engage in leisure and exercise. Additionally, worksite health promotion programmes and policies that protect occupational safety may also play a role [28]. Traditionally researchers have used only one of these measures to study socioeconomic position, but the use of a composite factor may reflect many complex differences in social health.

Marital status, home ownership, employment status, depression, sleep quality and life satisfaction loaded onto another factor, which we have termed psychosocial. Martikainen et al. [30] have suggested a working definition of psychosocial determinants of health as “pertaining to the influence of social factors on an individual’s mind or to the interrelation of behavioural and social factors”. The implication of this definition is that psychosocial factors in the context of health research can be viewed as mediating the effects of socioeconomic structural factors on individual health outcomes, or as conditioned and modified by the socioeconomic structures in which they exist.

Predicting fasting glucose from factor scores

The metabolic factor demonstrated the most robust and stable association with glucose at age 46 following multiple adjustments. The blood pressure factor was also significantly associated with fasting glucose both unadjusted and adjusted for sex. However, when additionally adjusted for all other factors, its contribution to the model was no longer observed. We noted that it is specifically the addition of the metabolic factor which attenuates this relationship, indicating that the effects of blood pressure on later fasting glucose may be mostly mediated by metabolic components. Nonetheless, despite the separation of blood pressure into a factor distinct from the metabolic factor, we observed that they were still highly correlated (Pearson correlation coefficient = 0.59; Fig. 2). We observed an especially strong correlation structure between blood pressure and the insulin-WC-BMI cluster (Fig. 1) that supports the hypothesis of a common origin of these biological functions [31]. There seems to be shared molecular architecture, probably originating from foetal development, that links these functions together with the regulation of fasting glucose [31]. However, knowledge gaps remain in delineating the causal mechanisms and their effect on the regulation of glycaemic health.

The socioeconomic factor also demonstrated a significant association with fasting glucose, which persisted following adjustment for sex but not adjustment for all other factors. Looking closer at the stepwise addition of factors to the socioeconomic factor, it appears that it is only the metabolic factor which attenuates its effect. This suggests that some of its influence may be explained, at least in part, by health behaviours acting on metabolic processes. This is consistent with findings of recent publications demonstrating relationships between socioeconomic position and cardio-metabolic health outcomes [32,33,34,35]. Health behaviours such as alcohol consumption, smoking status, dietary intake and physical activity directly impact metabolic status, particularly adiposity and lipid levels. In the present study we did not use lifestyle information as it is difficult to obtain objective measurements, especially in routine health care. However, we believe lifestyle is captured within this socioeconomic factor as it is frequently reported [32, 33] and can be included in further models to investigate the effects of modifiable factors.

The effect of the psychosocial factor contrasted with the others. No association with glucose was observed when unadjusted. However, after adjustment for sex and for all other factors, a significant association emerged, which was not diminished even by the strong effects of the metabolic factor. Closer examination of the stepwise sex-adjusted model showed that only the addition of the socioeconomic factor attenuated the psychosocial relationship with glucose. Psychosocial and socioeconomic factors are very closely related and appear to be linked by household income, as seen in cross-loadings from EFA (Table 2) and the correlation matrix (Fig. 1). Thus, it is expected that their effects on the outcome are also closely related and may act via similar pathways. We observed no significant sex differences in the formation of the factors; however, a larger sample size would allow sex-specific analysis and may help to identify what causes this factor to behave differently in males and females.

We speculate that the psychosocial factor may actually be capturing a ‘stress’ effect. Strong biological links exist between glucose metabolism and neuroendocrine responses to variation in psychosocial well-being. The hormonal outputs of the hypothalamic-pituitary-adrenal axis, such as the glucocorticoids, acutely alter short-term glucose metabolism [36]. Long-term exposure to psychosocial stressors may contribute to chronic glycaemic dysregulation in individuals. The independent effects of the psychosocial and metabolic factors suggest that there are separate biological pathways in which fasting glucose levels are maintained [37,38,39,40].

Strengths and limitations

We used a large, unselected birth cohort that is particularly rich in data during these two follow-up periods. At 31 and 46 years, the participants are still young, and therefore less likely to experience the acceleration in glucose associated with type 2 diabetes (T2D) or its pre-clinical stages. However, we acknowledge that more frequent follow-ups would allow us to identify participants at high risk of developing T2D.

We recognise that our study has some limitations. As with all longitudinal designs, there are missing data, which may be partly due to attrition. We conducted an attrition analysis to take this into consideration, and we aimed to address data missing at random by using Mplus, which accounts for missing data in estimation. We have tried to be as descriptive as possible in naming the latent factors; however, it is challenging to accurately capture the trait they represent.

There is always a trade-off between incorporating the maximum available dimensions to produce a completely comprehensive model and using widely available, easily accessible measures to achieve similar results; this study aimed for the latter. For example, we did not include lifestyle variables, as they are time-consuming to analyse in a meaningful manner and are likely to be captured already via the socioeconomic factor. However, we have demonstrated reasonably good model fit statistics for all models, which can be explained scientifically, and we have used cross-validation techniques to further strengthen the design.

Implications

This study is the first step in developing a model that may be used clinically to identify those with an increased risk of developing poor glycaemic health and T2D. Early identification of these individuals can provide an opportunity to implement targeted interventions and policy recommendations for personalised prevention. The next steps, as part of the DynaHEALTH project, will aim to translate this systematic approach into risk scores across the life course that reflect the dynamics and trajectory of deteriorating glycaemic control.

Conclusions

The present study supports evidence for the bio-psychosocial nature of adult glycaemic health and uses an example to understand and reduce its complexity. To date, most studies have attempted to analyse the biological and psychological factors separately, making it difficult to distinguish the relations between these components. However, this is critical in the context of complex, life-long, non-communicable diseases such as T2D, where physiological and social functioning are impacted. Here we reported a systematic, data-driven approach to study the relationship between the factors associated with the maintenance of normal fasting glucose. The methodology employed brings transparency to variable selection and is easily transferable to other traits and complex diseases with strong interplay between biological and psychosocial factors.