Self-rated health in individuals with and without disease is associated with multiple biomarkers representing multiple biological domains

Self-rated health (SRH) is one of the most frequently used indicators in health and social research. Its robust association with mortality in very different populations implies that it is a comprehensive measure of health status and may even reflect the condition of the human organism beyond clinical diagnoses. Yet the biological basis of SRH is poorly understood. We used data from three independent European population samples (total N ≈ 15,000) to investigate the associations of SRH with 150 biomolecules measured in blood or urine (biomarkers). Altogether 57 biomarkers representing different organ systems were associated with SRH. In almost half of the cases the association was independent of disease and physical functioning. Biomarkers weakened but did not remove the association between SRH and mortality. We propose three potential pathways through which biomarkers may be incorporated into an individual’s subjective health assessment: (1) their role in clinical diseases; (2) their association with health-related lifestyles; and (3) their potential to stimulate physical sensations through interoceptive mechanisms. Our findings indicate that SRH has a solid biological basis and that it is a valid but non-specific indicator of the biological condition of the human organism.


Results
Participant characteristics. In MARK-AGE (n = 3,187) 12% rated their health as excellent, 35% as very good, 41% as good, 11% as fair and 1.4% as poor. In CAMB (n = 5,335) the figures were 9.1%, 41%, 40%, 8.6% and 1.4%, respectively. In Health 2000 (n = 6,444) 32% rated their health as good, 30% as rather good, 27% as moderate, 8% as rather poor and 3.5% as poor. Table 1 shows the participants' characteristics by SRH. In all three data sets poorer SRH was associated with a higher number of diseases and poorer physical functioning (Table 1; Chi-square test: p < 0.001).
Biomarkers associated with SRH. Out of the 150 biomarkers investigated (Fig. 1A, Supplementary information 1, Supplementary information 2: Tables S1 and S2), 57 were significantly associated with SRH in linear regression analyses adjusted for age and gender (model i). Table 1 shows the means and medians by SRH for these 57 biomarkers. Eleven of them were available in two or three data sets (Table 2, Supplementary information 2: Tables S1 and S2), and they showed similar directions of association across the data sets. The remaining 46 were available for analysis in one data set only (Table 3, Supplementary information 2: Tables S1 and S2).
When additionally adjusted for the number of diseases and physical functioning (model ii), 26 biomarkers were still associated with SRH. Seven of these biomarkers were available in two or three data sets (Table 2) and 19 in one data set only (Table 3). A schema of the analysis pipeline and summaries of the findings are shown in Supplementary information 2 (Figure S1, Tables S1 and S2) and Fig. 1B. As examples of replicated and new findings representing different biological domains, Fig. 2 shows the associations of eight selected biomarkers, categorized as quartiles, with SRH. Most of these analyses showed a graded association of poorer SRH with poorer biomarker levels.

www.nature.com/scientificreports/

Mortality analysis.
In the Health 2000 data, the association between SRH and mortality was analysed using Cox regression (model iii). Of the 5,957 participants with a full set of data for the relevant variables, 1,207 (20%) died between 2000 and 2015. The average survival time of the deceased participants was 8.1 (SD 4.2) years. A graded relationship was found between SRH and mortality in models adjusted for age and gender, and then additionally for diseases (Fig. 3). In the third model, the analysis was further adjusted for ten biomarkers that were significantly associated with SRH in model ii (results shown in Tables 2 and 3): leptin, apolipoprotein B, cotinine, HbA1C, HDL:total cholesterol ratio, HDL cholesterol, triglycerides, 25-hydroxyvitamin D, gamma-glutamyltransferase and CRP. The association between SRH and mortality was weakened after the addition of these biomarkers to the model, but it still remained significant: in this fully adjusted model, the hazard ratios (95% CIs) for rather good, moderate and poor SRH, with good SRH as the reference, were 1.1 (0.9, 1.4), 1.3 (1.0, 1.5) and 2.0 (1.6, 2.5), respectively.
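Cox model hazard ratios are conventionally reported as exponentiated log-hazard coefficients, with a Wald 95% CI computed on the log scale. A CI that excludes 1 corresponds to p < 0.05, which is why an estimate such as 1.1 (0.9, 1.4) is not statistically significant whereas 2.0 (1.6, 2.5) is. The minimal Python sketch below shows that arithmetic. The coefficient and standard error are hypothetical inputs rather than Health 2000 estimates, and the Wald interval is an assumption about how the CIs were produced, although it is what standard Cox software typically reports.

```python
import math


def hazard_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Return (HR, lower, upper) for a Cox log-hazard coefficient.

    Uses the Wald interval on the log scale: exp(beta - z*se), exp(beta + z*se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def excludes_one(lower: float, upper: float) -> bool:
    """True when the CI lies entirely above or below 1 (Wald p < 0.05 for z = 1.96)."""
    return lower > 1.0 or upper < 1.0


# Hypothetical coefficient and standard error for one SRH category vs the reference.
hr, lower, upper = hazard_ratio_ci(beta=0.5, se=0.2)
print(f"HR {hr:.1f} ({lower:.1f}, {upper:.1f}); CI excludes 1: {excludes_one(lower, upper)}")
```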
Analyses for individuals without diseases. All analyses were also conducted separately for subsamples of individuals who had no disease diagnoses (MARK-AGE n = 1,657, CAMB n = 2,499, Health 2000 n = 2,574). Descriptive statistics for these participants are shown in Supplementary information 2: Table S3. In linear regression models adjusted for age and gender, 8 biomarkers that were available for analysis in two or three data sets (p < 0.05 in all data sets) and 8 biomarkers that were available in one data set only (Bonferroni-adjusted p < 0.05) were significantly associated with SRH (Supplementary information 2: Tables S1, S4 and S5). As in the full samples, CRP, triglycerides and HDL cholesterol showed a significant association in all three data sets, and vitamin D in the two data sets where it was available. For the biomarkers that were significantly associated with SRH in the full data, the direction of the associations among the "healthy" participants was, with only a few exceptions, the same, although the associations in this group did not always reach statistical significance. Results for all biomarkers from the analysis in healthy individuals are shown in Supplementary information 3.
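The two-tier significance criterion (nominal p < 0.05 in every data set for biomarkers measured in two or three data sets, and a Bonferroni-adjusted threshold for biomarkers measured in one data set only) can be expressed as a small decision function. This Python sketch is illustrative only. In particular, the size of the Bonferroni family (`n_tests`) is a hypothetical parameter, because the text does not state exactly how many tests the correction was applied over.

```python
from typing import Sequence


def passes_threshold(p_values: Sequence[float], n_tests: int, alpha: float = 0.05) -> bool:
    """Two-tier significance rule for one biomarker.

    p_values: one p-value per data set in which the biomarker was measured.
    n_tests:  number of tests in the Bonferroni family (an assumption: the
              text does not state exactly how this family was defined).
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    if len(p_values) >= 2:
        # Measured in two or three data sets: nominal p < alpha in every one.
        return all(p < alpha for p in p_values)
    # Measured in one data set only: Bonferroni, i.e. p < alpha / n_tests.
    return p_values[0] < alpha / n_tests
```

For example, with a hypothetical family of 100 tests, a single-data-set biomarker needs p < 0.0005, whereas a biomarker measured in two data sets with p = 0.01 and p = 0.03 passes.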
For the subsample of individuals without diagnoses in the Health 2000 data (n = 2,408), the association between SRH and mortality was analysed using Cox regression. During the 15-year follow-up, 193 participants (8%) died, and the average survival time of the deceased participants in this subsample was 9.2 (SD 4.0) years. In this subsample, SRH was a significant, graded (dose-response) predictor of mortality (Supplementary information 2: Figure S2). When the biomarkers that showed a significant association with SRH in the full data analysis (Tables 2 and 3, model ii) were included in the model, with good SRH as the reference category, the hazard ratios (95% CIs) for rather good, moderate and poor SRH were 1.2 (0.9, 1.7), 1.3 (0.9, 1.9) and 2.3 (1.3, 3.8), respectively.

Figure 1.
A summary of (A) the number of biomarkers in the analysis in the three data sets and (B) the biomarkers associated with SRH, adjusted for age and gender (model i, in the full samples). Sample types are indicated with colors: red = blood, yellow = urine. An asterisk (*) indicates that the association was also significant after adjustment for diseases and physical functioning (model ii, in the full samples).

Discussion
The underlying assumption of this work was that SRH is a more comprehensive and sensitive indicator of the condition of the human organism than medical diagnoses or measures of physical functioning alone. We hypothesized that, if this were true, SRH should be associated with blood and urine biomarkers that reflect the physiological regulation of the organism. We therefore analysed 150 biomarkers from almost 15,000 participants enrolled in three population-based studies. Altogether 57 biomarkers showed a significant association with SRH, and for 26 of them the association was upheld when the number of chronic diseases and physical functioning were taken into account. In subsamples of individuals without chronic diseases, 16 biomarkers were associated with SRH. These associations were almost exclusively in a logical direction, i.e. a "worse" biomarker level was associated with poorer SRH and vice versa. Moreover, biomarkers weakened the association between SRH and mortality. We had no a priori hypothesis as to which biomarkers are important for SRH; in this exploratory study we included all blood and urine measures that were available in the study samples. Our results confirm the previously reported associations with SRH for most of the biomarkers in question, and they additionally reveal a large number of new associations. These biomarkers describe various biological systems of the human body, including inflammation (e.g. CRP), lipid and glucose metabolism (e.g. cholesterol and HbA1C), oxidative stress (e.g. protein carbonyls) and tissue damage (cell-free DNA), as well as lifestyles and environmental exposures (e.g. carotenoids, vitamin D, cotinine). Many of the biomarkers associated with SRH are also known to be biomarkers of ageing 23 .
When selected biomarkers (CRP, HDL cholesterol, HbA1C, 25-hydroxyvitamin D, zeaxanthin, apolipoprotein B, cell-free DNA and protein carbonyls) were examined as quartiles, poorer biomarker levels were fairly consistently associated with higher odds of poorer SRH in all data sets (Fig. 2). CRP is a proinflammatory marker known to be associated with SRH [13][14][15] . A few studies have also reported an association between poorer SRH and lower HDL cholesterol 3,7-9 and higher HbA1C 18 levels. In our study, a lower vitamin D level was associated with poorer SRH in both data sets where it was available, and studies in other samples have likewise shown an association between vitamin D and SRH 19,20 . Recent studies have connected low vitamin D concentration with multiple extra-skeletal processes such as cancer progression, coronary heart disease, depression and a range of immune functions [24][25][26] . The mechanisms of these associations are not well known, but it has been suggested that a low vitamin D level should be understood as a marker of ill health rather than a causal factor 27,28 . This is the first study to report associations between SRH and, for example, zeaxanthin, apolipoprotein B, cell-free DNA and protein carbonyls. Zeaxanthin is a carotenoid pigment present in the eye and obtained from the diet (e.g. egg yolk and orange peppers) 29 . It has antioxidative properties and has been suggested to have a protective role against eye diseases (especially age-related macular degeneration) as well as cardiovascular diseases and cancer 30 . Apolipoprotein B is best known as the LDL carrier protein, and it is an important contributor to atherosclerosis and cardiovascular disease 31 .
Circulating cell-free DNA is a marker of cellular death and tissue damage in many acute and chronic conditions (e.g. sepsis, trauma, aseptic inflammation, cardiovascular diseases and cancer) [32][33][34][35][36][37] . Elevated levels of protein carbonyls (i.e. plasma protein oxidation levels) are a marker of oxidative stress and are observed in various pathologies such as Alzheimer's disease, rheumatoid arthritis, diabetes, sepsis, renal dysfunction and respiratory failure 38 .

Figure 3. […] 15-year follow-up. In Cox proportional hazards models (iii), hazard ratios (HRs) and 95% CIs were adjusted for (1) age and gender; (2) additionally for the number of diseases; and (3) additionally for 10 biomarkers.
We suggest that there are three main pathways through which biomarkers measured in blood or urine can affect SRH. First, several biomarkers are characteristic of clinical diagnoses. For certain biomarkers, such as cholesterol and glucose, this diagnostic role is well known, as is their significance as risk factors for disease. Individuals who are asked to rate their own health may be inclined to interpret high values (if they know them) as signs of poorer health. Yet respondents may not necessarily consider their biomarker levels, or even be aware of them, when asked to assess their health; instead, they may consider their disease diagnoses, symptoms or the decreased physical functioning caused by their diseases. In this case the association between biomarkers and SRH is indirect and mediated by the association of SRH with diseases known to the respondents. This disease-pathway hypothesis is supported by the finding that associations between biomarkers and SRH were more marked among individuals with disease than among those without.
Second, it is known that individuals without major health problems in particular take health-related lifestyles and behavioural risk factors into account as components of SRH 6,39 . In the present study, better SRH was associated with higher plasma levels of carotenoids (zeaxanthin, beta-carotene, lutein and beta-cryptoxanthin) and higher urine levels of hippurate and trigonelline. These molecules serve as markers of fruit and vegetable intake. Worse SRH was associated with higher cotinine and gamma-glutamyltransferase levels, which serve as markers of smoking exposure and alcohol consumption, respectively. Again, it is plausible that the route from biomarkers to SRH is indirect, i.e. that respondents assess their health as good or poor not on the basis of their biomarker levels but on the basis of particular lifestyles that are considered healthy or unhealthy.
Third, an interesting but poorly understood mechanism is the possibility that a biomarker level, or a change in biomarker level, might stimulate physical sensations, and that these sensations are interpreted as information about the state of one's health. This is not a novel hypothesis: it was suggested by Stenback as early as 1964 and later by Kaplan and Camacho in 1983 as one potential explanation for the association between SRH and mortality 40,41 . Since then, evidence has continued to accumulate about the interoceptive processes through which information on internal states of the body is communicated to the brain to enable the regulation of vital inner processes and the maintenance of physiological stability 21,[42][43][44] . Most of the research on interoceptive signalling of humoral processes, i.e. changes in the levels of blood-borne substances, concerns inflammation: higher circulating levels of inflammatory cytokines are known to underlie symptoms such as fatigue, general malaise, poor appetite and low mood 21,22 , and they are known to be associated with poor SRH 45,46 . In our study, higher levels of inflammatory markers such as CRP and IL-18 were associated with poor SRH, and for CRP this was true in all three data sets independently of diseases and physical functioning. Yet the empirical evidence on interoceptive signalling of humoral processes remains fragmentary, and for blood-borne substances other than inflammatory markers it is almost non-existent.
In our study, as in many previous ones, SRH showed a strong, robust association with mortality. Poorer SRH predicted mortality even after adjusting for chronic conditions in the total sample and in the subsample without chronic conditions. Adjusting for biomarkers weakened this association in both situations, which supports our initial hypothesis.
We were able to use three large population-based data sets from multiple European countries, but, unfortunately, not all biomarkers were available in more than one data set, which would have allowed us to follow the principles of conventional replication studies. For some of the biomarkers investigated, the association with SRH had already been reported earlier. The strengths of our study include the large number of new biomarkers available; the fact that several of them were available in more than one data set; the availability of data on chronic conditions and physical functioning; and the possibility of investigating mortality in one sample. Multimorbidity indicated by the number of disease diagnoses is an effective descriptor of health and has prognostic value [47][48][49] . In our analysis, the six clinical diagnoses available in the three study samples were common and chronic. However, several important diagnostic categories were absent from our analysis, and we had no information on disease severity other than physical functioning. Therefore, the available data were not ideal for adjusting for disease. These are the major limitations of the study. In our exploratory analyses, linear regression models were used to provide an easily understandable overview of the associations of SRH, basically an ordinal variable, with biomarkers; this approach is consistent with several earlier studies and is based on the observed continuity of SRH in its association with many other health variables. Further, because of the exploratory approach, we decided not to construct organ-specific or cluster-based groupings of the biomarkers, and as the role of individual biomarkers in relation to SRH is not known, we decided to include all available measures in our analyses without selection. The mechanisms linking biomarkers with SRH were also beyond the scope of this study.
Further studies are needed to shed light on the full pathways between SRH and the biological state of the body.

In conclusion, our study demonstrated strong and logical associations of SRH with numerous biomarkers measured in blood and urine, even independently of chronic diseases and functional status. Poorer SRH was associated with worse biomarker levels and vice versa. These biomarkers were descriptive of many different organ systems and bodily processes. The findings suggest that SRH has a solid biological basis. Our results also lend support to the notion that SRH is a robust, comprehensive but non-specific indicator that can capture health-related processes more exhaustively than many conventional measures of health and disease. To verify the potential of SRH in research and in clinical practice, multidisciplinary research is needed to explore the mechanisms that convey messages from body biology to individuals' subjective assessments. No human participants were directly involved in the current study; only existing data were used.

Methods
Measures. SRH was assessed in interviews and questionnaires. In MARK-AGE and CAMB, SRH was measured with the question "In general, would you say your health is…?", and in Health 2000 with the question "Is your present state of health…?". The response options were "poor", "fair", "good", "very good" or "excellent" (CAMB and MARK-AGE) and "poor", "rather poor", "moderate", "rather good" or "good" (Health 2000). In linear regression models, SRH was used as a continuous variable ranging from 0 to 4, with a higher value indicating poorer SRH. For the other analyses, the two poorest SRH categories (poor & fair in MARK-AGE and CAMB; poor & rather poor in Health 2000) were combined into one. Then, as an outcome in the logistic regression analyses (Fig. 2), SRH was dichotomized as poor versus all other categories. In the mortality analysis (Health 2000), SRH was grouped into four categories: (1) good, (2) rather good, (3) moderate and (4) poor.
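The three codings of SRH described above (a 0 to 4 score, four groups with the two poorest categories merged, and a binary outcome) can be illustrated with a simple lookup. This Python sketch assumes, as the wording suggests but does not state outright, that "poor" in the logistic models refers to the merged poorest group. The cohort keys and function names are hypothetical.

```python
# Response options ordered from best to worst, as listed in the Methods.
SRH_OPTIONS = {
    "MARK-AGE": ["excellent", "very good", "good", "fair", "poor"],
    "CAMB": ["excellent", "very good", "good", "fair", "poor"],
    "Health 2000": ["good", "rather good", "moderate", "rather poor", "poor"],
}


def srh_continuous(cohort: str, answer: str) -> int:
    """0 (best) to 4 (worst); used as the dependent variable in linear regression."""
    return SRH_OPTIONS[cohort].index(answer)


def srh_four_groups(cohort: str, answer: str) -> int:
    """0 to 3, with the two poorest categories merged into group 3."""
    return min(srh_continuous(cohort, answer), 3)


def srh_poor(cohort: str, answer: str) -> bool:
    """Binary outcome for the logistic models (assumed): merged poorest group vs all others."""
    return srh_four_groups(cohort, answer) == 3
```

Because both questionnaires have five ordered options, the same index-based coding serves all three cohorts, even though the wording of the options differs.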
A total of 150 biomarkers measured in blood and urine were available for analysis (full list shown in Supplementary information 1): 134 biomarkers in MARK-AGE, 14 in CAMB and 28 in Health 2000. All measurements were carried out in accordance with relevant guidelines and regulations. Altogether 17 biomarkers were available in two or three of the study populations. A few biomarkers were measured with two different but equivalent methods. The proportion of missing biomarker data ranged from 0.05% to 25%. A few biomarkers representing different biological domains were selected for inclusion in Fig. 2 as examples of previously reported and new associations with SRH. For this illustration, biomarker levels were categorized as quartiles.
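Quartile categorization splits each biomarker's observed distribution at Q1, the median and Q3. The text does not say which quantile definition was used, so the `"inclusive"` interpolation method below is an assumption, and the values are toy data rather than study measurements. For biomarkers where low values are unfavourable (for example HDL cholesterol or vitamin D), the "poorer" quartile is the lowest one.

```python
import bisect
import statistics
from typing import Sequence


def quartile_cutpoints(values: Sequence[float]) -> list[float]:
    """Q1, median and Q3 of the observed (non-missing) biomarker values."""
    return statistics.quantiles(values, n=4, method="inclusive")


def to_quartile(value: float, cuts: Sequence[float]) -> int:
    """Map a value to quartile 1..4; ties at a cut point fall into the lower quartile."""
    return bisect.bisect_left(cuts, value) + 1


levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy data, not study values
cuts = quartile_cutpoints(levels)
print([to_quartile(v, cuts) for v in levels])  # [1, 1, 2, 2, 3, 3, 4, 4]
```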
The indicator of physical functioning came from interviews, questionnaires and hand grip strength measurements. A summary variable for physical functioning was constructed out of three components: (1) ability to walk 0.5 mile (in MARK-AGE), 0.25 mile (in CAMB) or 0.5 km (in Health 2000); (2) ability to run 100 m (in CAMB and Health 2000) or do vigorous activities such as running, lifting heavy objects, participating in strenuous sports (in MARK-AGE); and (3) hand grip strength (MARK-AGE, CAMB, Health 2000). In each of these three components, more points corresponded to poorer functioning. The components of walking and running & vigorous activities were scored as 0 = no limitations, 1 = moderate limitations and 2 = highly limited or cannot do at all. Hand grip strength was grouped in tertiles (categories 0, 1 and 2). The scores from the three components were added together to obtain a sum score of physical functioning, ranging from 0 to 6.
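The physical functioning sum score can be sketched as follows. This Python version is illustrative. The reference group in which grip-strength tertiles are computed (for example, per data set or separately by gender) is not stated in the text, so it is left to the caller as an assumption. The quantile method and all names are hypothetical.

```python
import statistics
from typing import Sequence

# Scores for the walking and running/vigorous-activity items, as given in the Methods.
LIMITATION_POINTS = {
    "no limitations": 0,
    "moderate limitations": 1,
    "highly limited or cannot do at all": 2,
}


def grip_tertile_cuts(grips: Sequence[float]) -> tuple[float, float]:
    """Two cut points splitting a reference group's grip strengths into tertiles."""
    low, high = statistics.quantiles(grips, n=3, method="inclusive")
    return low, high


def grip_points(grip: float, cuts: tuple[float, float]) -> int:
    """0 = strongest tertile, 2 = weakest (more points = poorer functioning)."""
    low, high = cuts
    if grip > high:
        return 0
    if grip > low:
        return 1
    return 2


def physical_functioning_score(walking: str, running: str, grip: float,
                               cuts: tuple[float, float]) -> int:
    """Sum of the three components, ranging from 0 (best) to 6 (worst)."""
    return LIMITATION_POINTS[walking] + LIMITATION_POINTS[running] + grip_points(grip, cuts)
```

Grip strength is reversed relative to its raw scale so that, as for the two self-reported items, more points always mean poorer functioning.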
Disease diagnoses, including cardiovascular diseases, hypertension, diabetes, cancer/tumour, respiratory diseases and arthritis, were obtained from interview and questionnaire data. The variable "number of diseases" ranged from 0 to 6, but in the final analyses it was categorized as 0, 1, 2, 3 or 4+ diseases. In addition, subsamples of participants without any of the above-mentioned diagnoses were extracted from each data set (MARK-AGE n = 1,657; CAMB n = 2,499; Health 2000 n = 2,574; characteristics in Supplementary information 2: Table S3).
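The disease-count categorization and the selection of the disease-free subsamples can be sketched in Python as set operations over the six listed diagnoses. The diagnosis labels and function names are hypothetical. Diagnoses outside the list do not affect either the count or subsample membership, which follows the wording "without any of the above-mentioned diagnoses".

```python
# The six diagnoses listed in the Methods (labels are illustrative).
DIAGNOSES = frozenset({
    "cardiovascular disease", "hypertension", "diabetes",
    "cancer/tumour", "respiratory disease", "arthritis",
})


def disease_category(diagnoses: set[str]) -> str:
    """Count the listed diagnoses (0-6) and cap the category at '4+'."""
    n = len(diagnoses & DIAGNOSES)
    return "4+" if n >= 4 else str(n)


def in_disease_free_subsample(diagnoses: set[str]) -> bool:
    """True when none of the six listed diagnoses is present."""
    return not (diagnoses & DIAGNOSES)
```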
Mortality data were only available for the Health 2000 sample. Dates of death were drawn from the National Register on Causes of Death maintained by Statistics Finland, and the length of follow-up was 15 years.

Statistical analysis. The association between each individual biomarker and SRH was first explored using linear regression analysis in the three independent cross-sectional data sets (MARK-AGE, CAMB and Health 2000). SRH was the dependent variable, and the models were adjusted for (i) age as a continuous variable and gender, and (ii) additionally for the number of diseases and physical functioning. All MARK-AGE analyses were additionally adjusted for recruitment centre. For biomarkers that were available in two or three data sets, the nominal p-value threshold was set at 0.05 and had to be met in all data sets; to control for multiple testing, a Bonferroni-adjusted p-value threshold of 0.05 was used for biomarkers that were available in one data set only. Additionally, logistic regression models adjusted for age and gender were used to analyse the associations of eight selected biomarkers, categorized as quartiles, with poor SRH. The results for these eight biomarkers were visualized as forest plots (Fig. 2). Next, the association of SRH with all-cause mortality in the Health 2000 data set was analysed using Cox proportional hazards modelling (iii). The model was adjusted for (1) age and gender; (2) additionally for the number of diseases; and (3) additionally for the biomarkers that were associated with SRH in the linear regression analysis (model ii). The nominal p-value threshold was set at 0.05. These results were visualized as a forest plot (Fig. 3).
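The analyses were run in R and SPSS. As a dependency-free illustration of what model (i) estimates, the Python sketch below fits ordinary least squares through the normal equations, with SRH (0 = best, 4 = worst) regressed on a biomarker, age and gender. It is not the authors' code. The 1 = male gender coding is hypothetical, and model (ii) would append further columns whose exact coding is not given in the text (for example, dummy indicators for 1, 2, 3 and 4+ diseases and the physical functioning score, plus recruitment-centre dummies for MARK-AGE). A positive biomarker coefficient means higher levels go with poorer SRH. The sketch does not compute the p-values, which would also require standard errors.

```python
from typing import Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X must already contain a leading column of ones for the intercept.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear covariates)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [ar - factor * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def model_i_design(biomarker, age, male):
    """Model (i): intercept, biomarker, age (continuous) and gender (1 = male, 0 = female)."""
    return [[1.0, float(b), float(a), float(m)] for b, a, m in zip(biomarker, age, male)]
```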
Finally, we repeated the analyses of the associations between SRH and the biomarkers in all three data sets and the mortality analysis in the Health 2000 data in subsamples without disease diagnoses. The criteria for statistical significance were the same as in the main analysis.
The data was processed, analysed and visualized using R software (R 3.4.0) and IBM SPSS software version 24.0 (IBM Corp., Armonk, New York, USA). In each model, participants with missing data for a biomarker, mortality, age, gender, SRH, physical functioning or number of diseases were excluded from the analyses.
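Because exclusion is applied per model, each biomarker-specific analysis can have a different sample size. A minimal Python sketch of this complete-case filtering is shown below. The field names and the treatment of `None` and NaN as missing are assumptions made for illustration.

```python
import math
from typing import Any, Iterable, Mapping, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(records: Iterable[Mapping[str, Any]], fields: Sequence[str]) -> list:
    """Keep only records with a non-missing value for every field the model uses."""
    return [r for r in records if not any(is_missing(r.get(f)) for f in fields)]
```

For a model (ii) analysis of CRP, for instance, `fields` would list the hypothetical keys for CRP, SRH, age, gender, number of diseases and physical functioning. A Cox model would also include the mortality fields.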
Ethics approval. Human participants were not directly involved in the current study, and only existing data were used for the current analysis. The study was conducted in accordance with the ethical principles of the Declaration of Helsinki, and all research participants gave their informed consent to take part in the study. The studies (MARK-AGE, CAMB, Health 2000) were approved by the local ethics committees [51][52][53] .

Data availability
The data used in the current study are not publicly available for ethical reasons. However, data are available upon request from the Health 2000 survey, MARK-AGE and CAMB for researchers who meet the criteria for access to confidential data. Data from the MARK-AGE study are available from the MARK-AGE steering committee (contact: Alexander Bürkle, alexander.buerkle@uni-konstanz.de). Data from CAMB are available from the Copenhagen Aging and Midlife Biobank steering committee (https://camb.ku.dk/, contact: Rikke Lund, rilu@sund.ku.dk). Health 2000 data are available from THL on request, subject to the submission of approved study proposals and a data transfer agreement (contact: terveys-2000-2011@thl.fi).