## Introduction

Self-rated health (SRH) is one of the most frequently used measures in epidemiological, clinical and social research. It is known to predict mortality, future functional status and outcome of treatment in populations that vary by age, gender, social class, health status, country and culture1,2. In many studies SRH remains a significant predictor of mortality after adjustment for several other health indicators2. However, the wider the array of objectively measured health variables, the weaker is the independent predictive power of SRH3,4. Therefore it is plausible that the association between SRH and mortality is not causal but is due to the ability of the self-assessment to more exhaustively capture the realm of health and the objective bodily condition than most other health indicators2,5.

Jylhä (2009) has suggested that when asked to evaluate their general health status, respondents will take into account any individual relevant information that they think describes their “health”. This information is then considered in the context of the social and psychological situation. Empirical studies show that individuals will mainly take into account their medical diagnoses and functional status1, but also experienced symptoms, medication and other signs of health problems, and particularly in the absence of any evident health problems, different health-related lifestyles and risk factors6.

A number of studies have found associations between SRH and biomarker levels (i.e. quantities of biomolecules) in blood, such as white cell count, albumin and haemoglobin3, HDL cholesterol3,7,8,9, leptin10, TNF-α and IL-1ra11, CD19+ cells and IgG12, CRP13,14,15, IL-6 (refs. 13,16,17), fasting plasma glucose and glycosylated haemoglobin18 and vitamin D19,20. The findings are fragmented, however, as each study usually addresses only a few indicators. Furthermore, most samples have been small, and in many cases other health data have been less than optimal. At least in part, these associations probably reflect the severity and symptoms of chronic diseases. Yet it is possible to hypothesize that the level of and changes in some biomarkers may, through interoceptive processes, be associated with sensations and symptoms that individuals take into account in their self-ratings of health. Overall, little is known about the experiential counterparts of variations in blood biochemistry, although the connections of fatigue with peripheral inflammation, for example, are well described21,22.

In order to understand the potential of SRH as a measure of health in clinical practice and research, it is important to know how accurately SRH reflects the condition of the human body. In the present explorative study, we investigated the associations of SRH with a wide array of biomolecule levels measured in blood and urine. These biomarkers provide more detailed information about the condition of the human body than diagnostic names alone, as they can reflect the stage and severity of current pathologies as well as the physiological processes taking place in individuals without clinical diagnoses. We hypothesize that the possible association between SRH and biomarkers is largely, but not entirely, explained by diseases and physical functioning, which at least to some extent reflects the severity of diseases. We also hypothesize that the association of SRH and mortality is partly explained by the measured biomarkers. We addressed the following questions: (i) to what extent are the biomarkers available in the study associated with SRH; (ii) to what extent are the associations between biomarkers and SRH explained by disease diagnoses and physical functioning; and (iii) do the biomarkers associated with SRH explain part of the association between SRH and mortality? Analyses (i) and (iii) were also performed for individuals without diagnosed diseases. We used three extensive population-based data sets: MARK-AGE, the Copenhagen Aging and Midlife Biobank (CAMB), and Health 2000, covering a total of approximately 15,000 individuals and 150 biomarkers.

## Results

### Participant characteristics

In MARK-AGE (n = 3,187) 12% rated their health as excellent, 35% as very good, 41% as good, 11% as fair and 1.4% as poor. In CAMB (n = 5,335) the figures were 9.1%, 41%, 40%, 8.6% and 1.4%, respectively. In Health 2000 (n = 6,444) 32% rated their health as good, 30% as rather good, 27% as moderate, 8% as rather poor and 3.5% as poor. Table 1 shows the participants’ characteristics by SRH. In all three data sets poorer SRH was associated with a higher number of diseases and poorer physical functioning (Table 1; Chi-square test: p < 0.001).

### Biomarkers associated with SRH

Out of the 150 biomarkers investigated (Fig. 1A, Supplementary information 1, Supplementary information 2: Table S1 and S2), 57 were significantly associated with SRH in linear regression analyses that were adjusted for age and gender (model i). Table 1 shows the means and medians by SRH for these 57 biomarkers. Eleven of them were available in two or three data sets (Table 2, Supplementary information 2: Table S1 and S2) and they showed similar directions of associations across the data sets. Of the 57 biomarkers, 46 were available for analysis in one data set only (Table 3, Supplementary information 2: Table S1 and S2).

When additionally adjusted for number of diseases and physical functioning (model ii), 26 biomarkers were still associated with SRH. Seven of these biomarkers were available in two or three data sets (Table 2) and 19 biomarkers in one data set only (Table 3). A schema of the analysis pipeline and summaries of the findings are shown in Supplementary information 2 (Figure S1, Table S1 and S2) and Fig. 1B.

As examples of replicated and new findings representing different biological domains, Fig. 2 shows the associations of eight selected biomarkers, categorized as quartiles, with SRH. Most of these analyses showed a graded association of poorer SRH with poorer biomarker levels.

### Mortality analysis

In the Health 2000 data the association between SRH and mortality (iii) was analysed using Cox regression analysis. Out of 5,957 participants with a full set of data for relevant variables, 1,207 (20%) died between 2000 and 2015. The average survival time for the deceased participants was 8.1 (SD 4.2) years.

A graded relationship was found between SRH and mortality in models adjusting for age and gender, and then additionally for diseases (Fig. 3). In the third model the analysis was further adjusted for ten biomarkers that were significantly associated with SRH (model ii, results shown in Tables 2 and 3: leptin, apolipoprotein B, cotinine, HbA1C, HDL:total cholesterol ratio, HDL cholesterol, triglycerides, 25-hydroxyvitamin D, gamma-glutamyltransferase and CRP). The association between SRH and mortality was weakened after the addition of these biomarkers to the model, but it remained significant: with good SRH as the reference category, hazard ratios (95% CIs) for rather good, moderate and poor SRH were 1.1 (0.9, 1.4), 1.3 (1.0, 1.5) and 2.0 (1.6, 2.5), respectively.

### Analyses for individuals without diseases

All analyses were also conducted separately for subsamples of individuals who had no disease diagnoses (MARK-AGE n = 1,657; CAMB n = 2,499; Health 2000 n = 2,574). Descriptive statistics for these participants are shown in Supplementary information 2: Table S3. In linear regression models adjusted for age and gender, 8 biomarkers that were available for analysis in two or three data sets (p < 0.05 in all data sets) and 8 biomarkers that were available in one data set (Bonferroni-adjusted p value < 0.05) were significantly associated with SRH (Supplementary information 2: Table S1, S4 and S5). As in the results for the whole data, CRP, triglycerides and HDL cholesterol showed a significant association in all three data sets, and vitamin D in the two data sets where it was available. For the biomarkers that were significantly associated with SRH in the full data, the direction of the associations was, with only a few exceptions, the same among the “healthy” participants, although the associations in the latter group did not always reach statistical significance. Results for all biomarkers from the analysis in healthy individuals are shown in Supplementary information 3.

For the subsample of individuals without diagnoses (n = 2,408) in the Health 2000 data, the association between SRH and mortality was analysed using Cox regression analysis. During the 15-year follow-up the number of deaths was 193 (8%), and the average survival time for the deceased participants in this sample was 9.2 (SD 4.0) years. In this subsample SRH was a significant dose-responsive predictor of mortality (Supplementary information 2: Figure S2). When the biomarkers that showed a significant association with SRH in the full data analysis (Tables 2 and 3, model ii) were included in the model, and when good SRH was set as the reference category, hazard ratios (95% CIs) for rather good, moderate and poor SRH were 1.2 (0.9, 1.7), 1.3 (0.9, 1.9) and 2.3 (1.3, 3.8), respectively.

## Discussion

The underlying assumption of this work was that SRH is a more comprehensive and sensitive indicator of the condition of the human organism than medical diagnoses or measures of physical functioning alone. We hypothesized that if this was true, SRH should show an association with blood and urine biomarkers that reflect the physiological regulation of the organism. Therefore, we analysed 150 biomarkers from almost 15,000 participants enrolled in three population-based studies. Altogether 57 biomarkers showed a significant association with SRH, and for 26 of them the association was upheld when the number of chronic diseases and physical functioning were taken into account. In subsamples of individuals without chronic diseases, 16 biomarkers were associated with SRH. These associations were almost exclusively in a logical direction, i.e. a “worse” biomarker level was associated with poorer SRH and vice versa. Moreover, biomarkers weakened the association between SRH and mortality.

We had no a priori hypothesis as to which biomarkers are important regarding SRH. In this explorative study we included all blood and urine measures that were available in the study samples. Our results confirm the previous evidence for most biomarkers that have been reported to be associated with SRH, and they additionally reveal a large number of new associations. These biomarkers are descriptive of various biological systems of the human body, including inflammation (e.g. CRP), lipid and glucose metabolisms (e.g. cholesterol and HbA1C), oxidative stress (e.g. protein carbonyls) and tissue damage (cell-free DNA), as well as of lifestyles and environmental exposures (e.g. carotenoids, vitamin D, cotinine). Many of the biomarkers associated with SRH are also known to be biomarkers of ageing23.

When selected biomarkers (CRP, HDL cholesterol, HbA1C, 25-hydroxyvitamin D, zeaxanthin, apolipoprotein B, cell-free DNA and protein carbonyls) were chosen as examples and examined as quartiles, poorer biomarker levels were fairly consistently associated with higher odds for poorer SRH in all data sets (Fig. 2). CRP is a proinflammatory marker and known to be associated with SRH13,14,15. A few studies have also reported an association between poorer SRH and lower HDL cholesterol3,7,8,9 and higher HbA1C18 levels. In our study lower vitamin D level was associated with poorer SRH in both data sets where it was available, and a similar, strong association was also seen among individuals without chronic conditions. Previous studies with smaller samples have likewise shown an association between vitamin D and SRH19,20. Recent studies have connected low vitamin D concentration with multiple extra-skeletal processes such as cancer progression, coronary heart disease, depression and a range of immune functions24,25,26. The mechanisms of these associations are not well known, but it has been suggested that low vitamin D level should be understood as a marker of ill health rather than a causal factor27,28.

This is the first study to report associations between SRH and e.g. zeaxanthin, apolipoprotein B, cell-free DNA and protein carbonyls. Zeaxanthin is a carotenoid pigment present in the eye and obtained from the diet (e.g. egg yolk and orange peppers)29. It has antioxidative properties and is suggested to have a protective role against eye diseases (especially age-related macular degeneration) as well as cardiovascular diseases and cancer30. Apolipoprotein B is mostly known as the LDL carrier protein, and it is an important contributor to atherosclerosis and cardiovascular disease31. Circulating cell-free DNA is a marker of cellular death and tissue damage in many acute and chronic conditions (e.g. sepsis, trauma, aseptic inflammation, cardiovascular diseases and cancer)32,33,34,35,36,37. Elevated levels of protein carbonyls (i.e. plasma protein oxidation levels) are a marker of oxidative stress and are observed in various pathologies such as Alzheimer’s disease, rheumatoid arthritis, diabetes, sepsis, renal dysfunction and respiratory failure38.

We suggest that there are three main pathways through which biomarkers measured in blood or urine can affect SRH. First, several biomarkers are characteristics of clinical diagnoses. For certain biomarkers such as cholesterol or glucose levels the role is well known, as is their significance as risk factors of disease. Individuals who are asked to rate their own health may be inclined to interpret high values (if they know them) as signs of poorer health. Yet respondents may not necessarily consider their biomarker levels or even be aware of them when asked to assess their health, but instead consider their disease diagnoses, symptoms or decreased physical functioning caused by their diseases. In this case the association between biomarkers and SRH is indirect and mediated by the association of SRH with diseases known to the respondents. This disease-pathway hypothesis is supported by the finding that associations between biomarkers and SRH were more marked among individuals with disease than among those without.

Second, it is known that individuals without major health problems in particular take account of health-related lifestyles and behavioural risk factors as components of SRH6,39. In the present study better SRH was associated with higher levels of carotenoids (zeaxanthin, beta-carotene, lutein and beta-cryptoxanthin) in plasma, and hippurate and trigonelline in urine. These molecules serve as markers of fruit and vegetable intake. Worse SRH was associated with higher cotinine and gamma-glutamyltransferase levels, which serve as markers for smoking exposure and alcohol consumption, respectively. Again it is plausible that the route from biomarkers to SRH is indirect, i.e. that respondents assess their health as good or poor not on the basis of their biomarker levels but rather on the basis of particular health-related lifestyles that are considered healthy or unhealthy.

Third, an interesting but poorly understood mechanism is the possibility that a biomarker level or change in biomarker level in the body might stimulate physical sensations, and that these sensations are interpreted as information about the state of one’s health. This is not a novel hypothesis but was suggested by Stenback as early as 1964 and later by Kaplan and Camacho in 1983 as one potential explanation for the association between SRH and mortality40,41. Since these studies, research has continued to accumulate about the interoceptive processes through which information on internal states of the body is communicated to the brain to enable the regulation of vital inner processes and the maintenance of physiological stability21,42,43,44. Most of the research data on interoceptive signalling of humoral processes, i.e. changes in blood substance levels, concerns inflammation: higher circulating levels of inflammatory biomarkers, cytokines, are known to underlie symptoms such as fatigue, general malaise, poor appetite and low mood21,22, and they are known to be associated with poor SRH45,46. In our study, higher levels of inflammatory markers such as CRP and IL-18 showed associations with poor SRH, and for CRP this was true in all three data sets independently of diseases and physical functioning. Yet the empirical evidence on interoceptive signalling of humoral processes remains haphazard, and for blood-measured substances other than inflammatory markers it is almost non-existent.

In our study, as in many previous ones, SRH showed a strong, robust association with mortality. Poorer SRH predicted mortality even after adjusting for chronic conditions in the total sample and in the subsample without chronic conditions. Adjusting for biomarkers weakened this association in both situations, which supports our initial hypothesis.

In conclusion, our study demonstrated strong and logical associations of SRH with numerous biomarkers measured in blood and urine, even independently of chronic diseases and functional status. Poorer SRH was associated with worse biomarker levels and vice versa. These biomarkers were descriptive of many different organ systems and bodily processes. The findings suggest that SRH has a solid biological basis. Our results also lend support to the notion that SRH is a robust, comprehensive but non-specific indicator that can more exhaustively capture health-related processes than many conventional measures of health and disease. To verify the potential of SRH in research and in clinical practice, multidisciplinary research is needed to explore the mechanisms that convey messages from body biology to individuals’ subjective assessments.

## Methods

### Study populations

In MARK-AGE, questionnaires and interviews were conducted and biological data collected between 2008 and 2012 at the following recruiting centres: Hall in Tyrol/Innsbruck (Austria), Namur (Belgium), Esslingen (Germany), Athens and surrounding regions (Greece), Bologna (Italy), Warsaw (Poland), Tampere (Finland) and Leiden (The Netherlands)50,51. The total number of participants in this analysis was 3,187 (age range 18–92 years).

CAMB collected questionnaire data and biological samples from 5,335 participants (age range 48–62 years) in 2009–2011. This data set comprises participants from three cohort studies: the Metropolit 1953 Danish Male Birth Cohort (MP), the Copenhagen Perinatal Cohort (CPC) born in 1959–1961, and the Danish Longitudinal Study on Work, Unemployment and Health (DALWUH) born in 1949 or 1959 (ref. 52).

Health 2000 is a nationwide survey conducted in 2000–2001 with a randomly selected sample (n = 8,028) of the Finnish population aged 30 years or over53. For this analysis we used a subsample of 6,444 participants (age range 30–99 years) with relevant information.

No human participants were directly involved in the current study; only existing data were used.

### Measures

SRH was assessed in interviews and questionnaires. In MARK-AGE and CAMB, SRH was elicited by asking: “In general, would you say your health is…?”; and in Health 2000 by asking: “Is your present state of health…?”. The response options were “poor”, “fair”, “good”, “very good” or “excellent” (CAMB and MARK-AGE) and “poor”, “rather poor”, “moderate”, “rather good” or “good” (Health 2000). In linear regression models, SRH was used as a continuous variable ranging from 0 to 4, with a higher value referring to poorer SRH. For the other analyses, the two poorest SRH categories (poor and fair in MARK-AGE and CAMB; poor and rather poor in Health 2000) were combined into one. Then, as an outcome in logistic regression analyses (in Fig. 2), SRH was dichotomized as poor versus all other categories. In mortality analysis (Health 2000) SRH was grouped into four categories: (1) good, (2) rather good, (3) moderate and (4) poor.
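
The SRH coding described above can be sketched as follows. This is an illustrative sketch only, not the code used in the analysis: the function names and the dictionary are hypothetical, and the binary outcome reflects our reading that "poor" in the logistic models means the merged poorest category.

```python
# Response options ordered from best (index 0) to poorest (index 4),
# taken from the questionnaire wording; all names here are hypothetical.
SRH_LEVELS = {
    "MARK-AGE/CAMB": ["excellent", "very good", "good", "fair", "poor"],
    "Health 2000": ["good", "rather good", "moderate", "rather poor", "poor"],
}


def srh_continuous(response, levels):
    """0-4 score for linear regression; a higher value means poorer SRH."""
    return levels.index(response)


def srh_collapsed(response, levels):
    """0-3 score after merging the two poorest categories into one."""
    return min(levels.index(response), 3)


def srh_poor(response, levels):
    """Binary outcome: 1 for the merged poor category, 0 for all others."""
    return int(srh_collapsed(response, levels) == 3)
```

Under this coding, the four mortality-analysis categories in Health 2000 (good, rather good, moderate, poor) correspond to `srh_collapsed` values 0 to 3.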

A total of 150 biomarkers measured in blood and urine were available for analysis (full list shown in Supplementary information 1): 134 biomarkers in MARK-AGE, 14 in CAMB, and 28 in Health 2000. All measurements were carried out in accordance with relevant guidelines and regulations. Altogether 17 biomarkers were available in two or three of the study populations. A few biomarkers were measured with two different but equivalent measurements. The proportion of missing biomarker data ranged from 0.05% to 25%. A few biomarkers representing different biological domains were selected for inclusion in Fig. 2 as examples of previously shown and new associations with SRH. For this illustration, biomarker levels were categorized as quartiles.

The indicator of physical functioning came from interviews, questionnaires and hand grip strength measurements. A summary variable for physical functioning was constructed out of three components: (1) ability to walk 0.5 mile (in MARK-AGE), 0.25 mile (in CAMB) or 0.5 km (in Health 2000); (2) ability to run 100 m (in CAMB and Health 2000) or do vigorous activities such as running, lifting heavy objects, participating in strenuous sports (in MARK-AGE); and (3) hand grip strength (MARK-AGE, CAMB, Health 2000). In each of these three components, more points corresponded to poorer functioning. The components of walking and running & vigorous activities were scored as 0 = no limitations, 1 = moderate limitations and 2 = highly limited or cannot do at all. Hand grip strength was grouped in tertiles (categories 0, 1 and 2). The scores from the three components were added together to obtain a sum score of physical functioning, ranging from 0 to 6.
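
A minimal sketch of the sum score is given below. It is illustrative only: the function names, the example grip values, the kilogram unit and the way the tertile cut points are computed are all assumptions, since the text does not specify them (for instance, whether tertiles were derived separately by sex).

```python
import statistics


def grip_score(grip_kg, cut_low, cut_high):
    """Tertile score: 0 = strongest third, 2 = weakest third
    (more points correspond to poorer functioning)."""
    if grip_kg < cut_low:
        return 2
    if grip_kg < cut_high:
        return 1
    return 0


def physical_functioning(walk, run, grip_kg, cut_low, cut_high):
    """Sum of three 0-2 components, giving a score from 0 to 6."""
    for component in (walk, run):
        if component not in (0, 1, 2):
            raise ValueError("walking and running scores must be 0, 1 or 2")
    return walk + run + grip_score(grip_kg, cut_low, cut_high)


# Made-up grip strengths, used only to derive example tertile cut points.
grips = [18.0, 22.5, 27.0, 31.0, 35.5, 40.0, 44.0, 48.5, 52.0]
cut_low, cut_high = statistics.quantiles(grips, n=3)

best = physical_functioning(0, 0, 52.0, cut_low, cut_high)   # no limitations, strong grip
worst = physical_functioning(2, 2, 18.0, cut_low, cut_high)  # highly limited, weak grip
```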

Disease diagnoses, including cardiovascular diseases, hypertension, diabetes, cancer/tumour, respiratory diseases and arthritis, were obtained from interview and questionnaire data. The variable “number of diseases” ranged from 0 to 6 diseases, but in the final analyses it was categorized as 0, 1, 2, 3 or 4+ diseases. In addition, subsamples with participants without any of the above-mentioned diagnoses were extracted in each data set (MARK-AGE n = 1,657; CAMB n = 2,499; Health 2000 n = 2,574; characteristics in Supplementary information 2: Table S3).
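
The disease count and its categorization can be sketched as below. The labels in `DISEASES` paraphrase the six diagnosis groups named above, and the function names are hypothetical.

```python
# The six diagnosis groups counted in the "number of diseases" variable.
DISEASES = (
    "cardiovascular disease",
    "hypertension",
    "diabetes",
    "cancer/tumour",
    "respiratory disease",
    "arthritis",
)


def disease_category(diagnoses):
    """Count the listed diagnoses (0-6) and top-code the count at 4+."""
    count = sum(1 for disease in DISEASES if disease in diagnoses)
    return "4+" if count >= 4 else str(count)


def in_healthy_subsample(diagnoses):
    """True for participants with none of the six diagnoses."""
    return disease_category(diagnoses) == "0"
```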

Mortality data were only available for the Health 2000 sample. Dates of death were drawn from the National Register on Causes of Death maintained by Statistics Finland, and the length of follow-up was 15 years.

### Statistical analysis

The association between each individual biomarker and SRH was first explored using linear regression analysis in the three independent cross-sectional data sets (MARK-AGE, CAMB and Health 2000). SRH was the dependent variable, and the models were adjusted for (i) age as a continuous variable and gender, and (ii) additionally for the number of diseases and physical functioning. All MARK-AGE analyses were adjusted for recruitment centre. The nominal p-value threshold was set at 0.05 for biomarkers that were available in two or three data sets (specifically, the threshold had to be met in all data sets), and, to control for multiple testing, at a Bonferroni-adjusted p-value of 0.05 for biomarkers that were available in one data set only. Additionally, logistic regression models were used to analyse the associations of eight selected biomarkers, categorized as quartiles, with poor SRH, adjusted for age and gender. The results for the eight selected biomarkers were visualized as forest plots (Fig. 2).
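
The two significance criteria can be expressed as a small decision rule. The sketch below is illustrative: the function name is hypothetical, and the number of tests entering the Bonferroni correction is left as a parameter because it is not stated in the text.

```python
def is_significant(p_values, n_single_set_tests, alpha=0.05):
    """Decide significance for one biomarker.

    p_values: mapping from data set name to the regression p-value.
    n_single_set_tests: assumed number of tests for the Bonferroni correction.
    """
    if len(p_values) >= 2:
        # Available in two or three data sets: nominal threshold in every one.
        return all(p < alpha for p in p_values.values())
    # Available in one data set only: Bonferroni-adjusted p-value, capped at 1.
    (p,) = p_values.values()
    return min(1.0, p * n_single_set_tests) < alpha
```

With made-up numbers, a biomarker with p = 0.01 in MARK-AGE and p = 0.06 in CAMB would not count as associated with SRH, because the threshold is missed in one of its data sets.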

Next, the association of SRH with all-cause mortality in the Health 2000 data set was analysed using Cox proportional hazards modelling (iii). The model was adjusted for (1) age and gender; (2) additionally for number of diseases; and (3) additionally for the biomarkers that were associated with SRH in the linear regression analysis (model ii). The nominal p-value threshold was set at 0.05. These results were visualized as a forest plot (Fig. 3).
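
In generic notation, the three nested models can be written as below. The symbols are illustrative and do not appear in the original analysis; in particular, entering the number of diseases as indicator variables is an assumption.

$$
\begin{aligned}
\text{Model 1:}\quad & h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\big(\boldsymbol{\beta}^{\top}\mathbf{s} + \alpha_1\,\mathrm{age} + \alpha_2\,\mathrm{gender}\big) \\
\text{Model 2:}\quad & h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\big(\boldsymbol{\beta}^{\top}\mathbf{s} + \alpha_1\,\mathrm{age} + \alpha_2\,\mathrm{gender} + \boldsymbol{\delta}^{\top}\mathbf{d}\big) \\
\text{Model 3:}\quad & h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\big(\boldsymbol{\beta}^{\top}\mathbf{s} + \alpha_1\,\mathrm{age} + \alpha_2\,\mathrm{gender} + \boldsymbol{\delta}^{\top}\mathbf{d} + \boldsymbol{\gamma}^{\top}\mathbf{b}\big)
\end{aligned}
$$

Here $h_0(t)$ is the unspecified baseline hazard, $\mathbf{s}$ contains indicator variables for rather good, moderate and poor SRH (good SRH being the reference), $\mathbf{d}$ codes the number of diseases and $\mathbf{b}$ contains the biomarker levels. The hazard ratio for SRH category $k$ relative to good SRH is $\exp(\beta_k)$, so a weakening of the SRH association after biomarker adjustment appears as $\exp(\beta_k)$ moving towards 1 from Model 2 to Model 3.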

Finally, we repeated the analyses of the associations between SRH and the biomarkers in all three data sets and the mortality analysis in the Health 2000 data in subsamples without disease diagnoses. The criteria for statistical significance were the same as in the main analysis.

The data were processed, analysed and visualized using R software (R 3.4.0) and IBM SPSS software version 24.0 (IBM Corp., Armonk, New York, USA). In each model, participants with missing data for a biomarker, mortality, age, gender, SRH, physical functioning or number of diseases were excluded from the analyses.
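
Because exclusions were applied model by model, the analysed sample can differ between biomarkers. A minimal sketch of such complete-case filtering follows; the records, variable names and helper function are made up for illustration, and missing values are assumed to be coded as `None`.

```python
def complete_cases(records, required):
    """Keep records in which every required variable is present (not None)."""
    return [r for r in records if all(r.get(v) is not None for v in required)]


# Covariates shared by the fully adjusted models (hypothetical variable names).
COVARIATES = ("age", "gender", "srh", "n_diseases", "physical_functioning")

# Three made-up participants with different missing biomarkers.
records = [
    {"id": 1, "age": 54, "gender": "F", "srh": 1, "n_diseases": 0,
     "physical_functioning": 1, "crp": 1.2, "leptin": None},
    {"id": 2, "age": 61, "gender": "M", "srh": 3, "n_diseases": 2,
     "physical_functioning": 4, "crp": None, "leptin": 8.4},
    {"id": 3, "age": 47, "gender": "F", "srh": 0, "n_diseases": 0,
     "physical_functioning": 0, "crp": 0.7, "leptin": 5.1},
]

# Each biomarker model uses its own complete-case sample.
crp_sample = complete_cases(records, COVARIATES + ("crp",))
leptin_sample = complete_cases(records, COVARIATES + ("leptin",))
```

In this toy example the CRP model would use participants 1 and 3, whereas the leptin model would use participants 2 and 3.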

### Ethics approval

Human participants were not directly involved in the current study; only existing data were used for the current analysis. The study was conducted in accordance with the ethical principles of the Declaration of Helsinki, and all research participants gave their informed consent to be part of the study. The studies (MARK-AGE, CAMB, Health 2000) were approved by the local ethics committees51,52,53.